--------------------------------------------------------------------------------
sort: sort tables
--------------------------------------------------------------------------------

sort orders the lines of a table by one column, or by several columns when
two entries in the first sort column have the same value. Keys can be compared
alphabetically, numerically, or by other attributes (e.g. month). The
discussion below assumes GNU sort (gsort).
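
A minimal sketch of the three comparison modes named above, using throwaway
input from printf (the variable names are illustrative only). LC_ALL=C is set
so the alphabetical result does not depend on the local environment; -M is a
GNU/BSD extension.

```shell
# The same three lines, compared as text and then as numbers.
alpha_sorted=$(printf '10\n9\n2\n' | LC_ALL=C sort)     # character by character
num_sorted=$(printf '10\n9\n2\n' | LC_ALL=C sort -n)    # by numeric value

# Month names need -M to come out in calendar order.
month_sorted=$(printf 'MAR\nJAN\nFEB\n' | LC_ALL=C sort -M)

printf '%s\n' "$alpha_sorted" "$num_sorted" "$month_sorted"
```

The alphabetical run puts 10 before 2 because the character "1" sorts before
"2"; the numeric run gives 2, 9, 10.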

------------------------------------------------------------------------
ARCANA: When sorting alphabetical data, the collation order (locale) in use
should be understood. Two common choices are en_US.utf8 and C. You can choose
one by preceding the sort command with a locale setting (LC_ALL, if set,
overrides LANG):

$ LANG=C sort ...

Make sure that join uses the same locale as sort if you plan to use sort
followed by join.
------------------------------------------------------------------------
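
A small sketch of the locale point, assuming only that the C locale is
available (it always is). LC_ALL is used instead of LANG because it takes
precedence over the other locale variables. The file names left and right are
made up for the illustration.

```shell
# In the C locale comparison is by byte value: upper case sorts before lower case.
c_sorted=$(printf 'b\nA\na\nB\n' | LC_ALL=C sort)
echo "$c_sorted"

# Sort both inputs and run join under the same locale so they agree on the order.
tmpdir=$(mktemp -d)
printf 'b 2\nA 1\n' | LC_ALL=C sort -k1,1 > "$tmpdir/left"
printf 'b y\nA x\n' | LC_ALL=C sort -k1,1 > "$tmpdir/right"
joined=$(LC_ALL=C join "$tmpdir/left" "$tmpdir/right")
echo "$joined"
rm -r "$tmpdir"
```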

UNDER DEVELOPMENT

The default delimiter (field separator) for sort is a blank (one or more spaces or tabs).
The default order for sorting is alphabetical.

$ gsort -flags files

-b .. ignore leading blanks (relevant to alphabetical sort)
-f .. ignore case (fold lower case to upper case)
-d .. consider only blanks and alphanumeric characters

-k KEYDEF .. sort key, e.g. -k2 (field 2 to end of line) or -k2,2 (field 2 only)

-r .. reverse sort
-t SEP .. SEP is a 1-character delimiter (IFS, OFS)
-m .. merge files which are already sorted
-u .. retain only one of each set of identical records (delete duplicates)

-n .. numerical sort
-M .. sort by month (JAN, FEB, ...)

-o FILENAME .. write output to FILENAME
-c .. check whether the file is sorted; no output if already sorted

--debug .. annotate the part of each line used as the sort key (GNU sort)
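
A short sketch of how -t, -k and the per-key modifiers combine. The data
(name,count records) is invented for the illustration; the letters n and r
appended to a key apply only to that key.

```shell
# Comma-separated records: name,count
data=$(printf 'x,10\ny,2\nz,2\n')

# -t,      use "," as the field separator
# -k2,2n   first key: field 2 only, compared numerically
# -k1,1r   tie-breaker: field 1 only, reverse alphabetical
by_count=$(printf '%s\n' "$data" | LC_ALL=C sort -t, -k2,2n -k1,1r)
echo "$by_count"
```

The two records with count 2 come first, ordered z then y by the reversed
second key; x,10 comes last.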


First let us make an interesting file to sort 

$ echo {a..j} | xargs -n 1 > alpha       #one letter per line
$ paste <(jot -r 10) <(seq 10) <(printf "%s\n" {a..j}) > Infile  #process substitution: <(cmd) acts as a file. wow!
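
jot is a BSD utility and may be missing elsewhere. The following is a
hypothetical jot-free way to build a table of the same shape with awk alone,
offered as a sketch rather than as the method used for the listings in these
notes. The random values differ from run to run.

```shell
# Column 1: random integers in 1..100, column 2: row number,
# column 3: the letters a..j (character codes 97..106). Columns are tab-separated.
table=$(awk 'BEGIN {
    srand()
    for (i = 1; i <= 10; i++)
        printf "%d\t%d\t%c\n", int(rand() * 100) + 1, i, 96 + i
}')
printf '%s\n' "$table"     # redirect to Infile to reuse it in the examples
```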


#paste separates columns with a tab by default. Check it out.

$ gcat -A Infile   #flag -A displays control characters (tab as ^I, end of line as $). very useful.
69^I1^Ia$
2^I2^Ib$
47^I3^Ic$
90^I4^Id$
39^I5^Ie$
37^I6^If$
94^I7^Ig$
63^I8^Ih$
19^I9^Ii$
25^I10^Ij$


$ sort -k 1 -n  Infile
2	2	b
19	9	i
25	10	j
37	6	f
39	5	e
47	3	c
63	8	h
69	1	a
90	4	d
94	7	g
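
A note on key ranges: "-k 1" means "from field 1 to the end of the line". That
is harmless in the numeric example above, but it matters once a second key is
added. A minimal sketch with invented two-column data:

```shell
data=$(printf 'a 10\na 9\n')

# -k1: the first key is the whole line, so the lines already differ on key 1
# and the numeric second key is never consulted.
open_key=$(printf '%s\n' "$data" | LC_ALL=C sort -k1 -k2,2n)

# -k1,1: the first key is field 1 only; the tie falls through to -k2,2n.
closed_key=$(printf '%s\n' "$data" | LC_ALL=C sort -k1,1 -k2,2n)

printf 'open:\n%s\nclosed:\n%s\n' "$open_key" "$closed_key"
```

With -k1 the output keeps "a 10" before "a 9" (text order); with -k1,1 the
numeric tie-breaker puts "a 9" first.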

$ sort -k2 -n -r Infile
25	10	j
19	9	i
63	8	h
94	7	g
37	6	f
39	5	e
90	4	d
47	3	c
2	2	b
69	1	a

$ sort -k3 Infile  #default sorting is alphabetical
		   #output is same as input, as expected

$ sort -k3 -r Infile 
25	10	j
19	9	i
63	8	h
94	7	g
37	6	f
39	5	e
90	4	d
47	3	c
2	2	b
69	1	a

Next we generate a file with the output field separator set to ","
$ paste -d, <(jot -r 10) <(seq 10) alpha > InfileComma

$ gcat -A InfileComma
38,1,a$
31,2,b$
92,3,c$
46,4,d$
48,5,e$
16,6,f$
47,7,g$
19,8,h$
21,9,i$
11,10,j$


$ sort -k3 -r -t, InfileComma   #reverse sort on column 3 with "," as field separator
11,10,j
21,9,i
19,8,h
47,7,g
16,6,f
48,5,e
46,4,d
92,3,c
31,2,b
38,1,a

--------------------------------------------------------------------------------
	#is the file sorted?
$ cat a
6
4
2

$ sort -n -c a              #the file is sorted, but descending; -n alone asks about ascending
sort: a:2: disorder: 4

$ sort -n -r -c a       #we gave the right description "-r"
			#no output; echo $? prints 0 if the file is sorted

Summary: with -c, sort checks the input file against all the given sort options
(e.g. -n -r -k etc) and reports the first line that is out of order.
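
In a script the exit status is usually more convenient than the message. A
minimal sketch, recreating the file a from above in a temporary directory:

```shell
tmpdir=$(mktemp -d)
printf '6\n4\n2\n' > "$tmpdir/a"

# sort -c exits 0 when the file already matches the requested order,
# and non-zero (with a message on stderr) otherwise.
if sort -n -c "$tmpdir/a" 2>/dev/null; then asc=yes; else asc=no; fi
if sort -n -r -c "$tmpdir/a" 2>/dev/null; then desc=yes; else desc=no; fi

echo "ascending: $asc, descending: $desc"
rm -r "$tmpdir"
```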

--------------------------------------------------------------------------------
     ##### retain only unique lines
$ cat a.dat  #input file
1
3
3
1
5

$ sort -n a.dat       #sort it out 
1
1
3
3
5

$ sort -n -u a.dat    #only unique lines are retained
1
3
5
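
To write the de-duplicated result back to the same file, -o is the safe
route. A sketch using a temporary copy of a.dat:

```shell
tmpdir=$(mktemp -d)
printf '1\n3\n3\n1\n5\n' > "$tmpdir/a.dat"

# -o may name one of the input files: sort reads all input before writing.
# A plain "sort -n -u a.dat > a.dat" would let the shell truncate the file first.
sort -n -u -o "$tmpdir/a.dat" "$tmpdir/a.dat"

deduped=$(cat "$tmpdir/a.dat")
echo "$deduped"
rm -r "$tmpdir"
```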

--------------------------------------------------------------------------------
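	#merging files
$ sort file1 file2    #simple way: sort the contents of both files together

	#second way: -m merges input files that are already sorted
	#caution: the inputs must be sorted with the same options (direction
	#included) that are given to sort -m

$ cat a    # note that the file is sorted in reverse (descending) order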
6
4
2

$ cat b
5
3
1

$ sort -m -rn a b   #proper merge since "-r" option has been chosen
6
5
4
3
2
1

$ sort -m -n a b      #wrong: without "-r" sort assumes ascending inputs, so the files end up concatenated
5
3
1
6
4
2
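
One way to guard against a bad merge is to check each input with the same
flags first. A sketch with two ascending files (the names odd and even are
invented for the illustration):

```shell
tmpdir=$(mktemp -d)
printf '1\n3\n5\n' > "$tmpdir/odd"
printf '2\n4\n6\n' > "$tmpdir/even"

# Verify each input with the same flags that the merge will use.
if sort -n -c "$tmpdir/odd" 2>/dev/null && sort -n -c "$tmpdir/even" 2>/dev/null; then
    merged=$(sort -m -n "$tmpdir/odd" "$tmpdir/even")
else
    merged=$(sort -n "$tmpdir/odd" "$tmpdir/even")    # fall back to a full sort
fi

echo "$merged"
rm -r "$tmpdir"
```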
--------------------------------------------------------------------------------

$ sort -n a.dat | awk 'NR==1 || a!=$0{a=$0;print}'   # equivalent to sort -u -n a.dat; NR==1 keeps a first line that is 0
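
A quick check of that equivalence on the a.dat data, as a sketch. The
NR == 1 guard in the awk program ensures the first line is always printed:
an unset awk variable compares equal to a numeric 0, so without the guard a
leading 0 would be dropped.

```shell
data=$(printf '1\n3\n3\n1\n5\n')

with_u=$(printf '%s\n' "$data" | sort -n -u)
with_awk=$(printf '%s\n' "$data" | sort -n | awk 'NR == 1 || a != $0 { a = $0; print }')

# Both pipelines should leave the same three lines.
if [ "$with_u" = "$with_awk" ]; then echo "same output"; fi
```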