--------------------------------------------------------------------------------
sort: sort tables
--------------------------------------------------------------------------------

sort orders the lines of a table by one column (or by further columns, should
two entries in the sort column have the same value) alphabetically, numerically
or by other attributes (e.g. month). The discussion below assumes GNU sort
(gsort).

------------------------------------------------------------------------
ARCANA: When sorting alphabetical data the basis (the collation order of the
locale) should be understood. Two common bases are en_US.utf8 and C. You can
choose one by preceding the sort command with a LANG setting

$ LANG=C sort ...

(LC_ALL, if set, overrides LANG.) Make sure that join uses the same basis as
sort if you plan to use sort followed by join.
------------------------------------------------------------------------

UNDER DEVELOPMENT

The default delimiter (field separator) for sort is a run of blanks (spaces,
tabs). The default order for sorting is alphabetical.

$ gsort -flags files

    -b           .. ignore leading blanks (relevant to alphabetical sort)
    -f           .. ignore case (fold lower case to upper case)
    -d           .. consider only blanks and alphanumeric characters
    -k KEYDEF    .. key for sorting
    -r           .. reverse sort
    -t SEP       .. SEP is a 1-character delimiter (IFS, OFS)
    -m           .. merge files which are already sorted
    -u           .. retain only one of each set of identical entries
                    (delete duplicate records)
    -n           .. numerical sort
    -M           .. sort by month (JAN, FEB, ...)
    -o FILENAME  .. write output to FILENAME
    -c           .. check if file is sorted. no output if sorted already
    --debug      .. annotate the part of each line used for sorting

First let us make an interesting file to sort

$ echo {a..j} | xargs -n 1 > alpha
$ paste <(jot -r 10) <(seq 10) <(A=$(printf "%s\n" {a..j}); echo "$A") > Infile
    #process substitution: each <(...) acts as a file. wow!
    #The default output separator of paste is tab. Check it out.

$ gcat -A Infile    #flag -A displays control characters. very useful.
69^I1^Ia$
2^I2^Ib$
47^I3^Ic$
90^I4^Id$
39^I5^Ie$
37^I6^If$
94^I7^Ig$
63^I8^Ih$
19^I9^Ii$
25^I10^Ij$

$ sort -k 1 -n Infile    #the key starts at field 1 and runs to end of line;
                         #-k1,1 would restrict it to field 1
2    2    b
19   9    i
25   10   j
37   6    f
39   5    e
47   3    c
63   8    h
69   1    a
90   4    d
94   7    g

$ sort -k2 -n -r Infile
25   10   j
19   9    i
63   8    h
94   7    g
37   6    f
39   5    e
90   4    d
47   3    c
2    2    b
69   1    a

$ sort -k3 Infile    #default sorting is alphabetical
    #output is same as input, as expected

$ sort -k3 -r Infile
25   10   j
19   9    i
63   8    h
94   7    g
37   6    f
39   5    e
90   4    d
47   3    c
2    2    b
69   1    a

Next we generate a file with output field separator set to ","

$ paste -d, <(jot -r 10) <(seq 10) alpha > InfileComma
$ gcat -A InfileComma
38,1,a$
31,2,b$
92,3,c$
46,4,d$
48,5,e$
16,6,f$
47,7,g$
19,8,h$
21,9,i$
11,10,j$

$ sort -k3 -r -t, InfileComma    #reverse sort on column 3 and IFS=,
11,10,j
21,9,i
19,8,h
47,7,g
16,6,f
48,5,e
46,4,d
92,3,c
31,2,b
38,1,a
--------------------------------------------------------------------------------
#file sorted?

$ cat a
6
4
2
$ sort -n -c a       #though sorted, we did not specify the order
sort: a:2: disorder: 4

$ sort -n -r -c a    #we gave the right description "-r"
    #echo $? and it should be 0, if file sorted

Summary: sort compares the input file against all the sort options (e.g.
-n, -r, -k, etc.)
--------------------------------------------------------------------------------
##### retain only unique lines

$ cat a.dat             #input file
1
3
3
1
5
$ sort -n a.dat         #sort it out
1
1
3
3
5
$ sort -n -u a.dat      #only unique lines are retained
1
3
5
--------------------------------------------------------------------------------
##### merge files

$ sort file1 file2      #simple way

#second way: input files already sorted
#caution: to use this feature the sort direction should be specified

$ cat a                 #note that the file is reverse sorted
6
4
2
$ cat b
5
3
1
$ sort -m -rn a b       #proper merge since "-r" option has been chosen
6
5
4
3
2
1
$ sort -m -n a b        #not a proper merge since "-r" was not chosen
6
4
2
5
3
1
--------------------------------------------------------------------------------
$ sort -n a.dat | awk 'a!=$0{a=$0;print}'    #for this input, same result as sort -u -n a.dat
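--------------------------------------------------------------------------------
##### sort on more than one key

The tie-breaking described at the top (a further column decides the order when two entries in the sort column have the same value) can be sketched with two -k options. This is a minimal sketch, not part of the original notes: the sample data and the variable names tmp and result are made up for illustration, and plain "sort" is assumed to be GNU sort (use gsort where it is installed under that name).

```shell
# Hypothetical sample data: a letter and a count, comma-separated.
tmp=$(mktemp)
printf '%s\n' 'b,10' 'a,2' 'b,9' 'a,10' > "$tmp"

# -k1,1  : first key is field 1 only (start and stop field both given)
# -k2,2n : ties on field 1 are broken by field 2, compared numerically;
#          the trailing "n" applies -n to this key alone
# LC_ALL=C pins the alphabetical basis so the result does not depend on locale
result=$(LC_ALL=C sort -t, -k1,1 -k2,2n "$tmp")
echo "$result"

rm -f "$tmp"
```

Without the stop field (-k1 instead of -k1,1) the first key would run to the end of the line, so the second key would rarely get a chance to decide anything.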