------------------------------------------------------------------------
Lexical collation of a LaTeX .bib file
------------------------------------------------------------------------

This problem was posed by Yuhan Yao (Caltech): "Could you please
sort my bib file alphabetically by the last name of the first
author?"

The bib file in question is at2019dge.bib. In the file supplied by
Yuhan, each reference is a BibTeX entry that ends with a
zero-character line ("blank line"). The first line of each entry
carries a key in which Yuhan has neatly captured the first author's
last name. The goal is to sort this file alphabetically, using the
first author's last name as the sorting key.

The file at2019dge.bib can be found at
	http://www.astro.caltech.edu/~srk/SRKUnix/Examples/at2019dge.bib
This file contains multi-line records. The record separator, RS,
is "" ("blank line").

$ cat at2019dge.bib
@ARTICLE{Zou2017,
       author = {{Zou}, Hu and {Zhang}, Tianmeng and {Zhou}, Zhimin and {Nie}, Jundan and
         {Peng}, Xiyan and {Zhou}, Xu and {Jiang}, Linhua and {Cai}, Zheng and
         {Dey}, Arjun and {Fan}, Xiaohui and {Fan}, Dongwei and {Guo}, Yucheng and
         {He}, Boliang and {Jiang}, Zhaoji and {Lang}, Dustin and
         {Lesser}, Michael and {Li}, Zefeng and {Ma}, Jun and {Mao}, Shude and
         {McGreer}, Ian and {Schlegel}, David and {Shao}, Yali and
         {Wang}, Jiali and {Wang}, Shu and {Wu}, Jin and {Wu}, Xiaohan and
         {Yang}, Qian and {Yue}, Minghao},
        title = "{The First Data Release of the Beijing-Arizona Sky Survey}",
      journal = {\aj},
      ..
      adsnote = {Provided by the SAO/NASA Astrophysics Data System}
}

@MISC{Bellm2016,
       author = {{Bellm}, Eric C. and {Sesar}, Branimir},
       ..
      adsnote = {Provided by the SAO/NASA Astrophysics Data System}
}

@ARTICLE{Arcavi2017,
       author = {{Arcavi}, Iair and {Hosseinzadeh}, Griffin and {Brown}, Peter J. and
       ..
      adsnote = {Provided by the SAO/NASA Astrophysics Data System}
}
..

There are two subtleties:
  1. In any alphabetical sort, capital versus lower case matters
     (because of the differing locations occupied by A-Z and a-z).
     So it is best to make a "case insensitive" sort.
  2. Yuhan's file has a not-uncommon problem: the last record
     does not end with an RS. I added a blank line.
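
Both subtleties can be tried on a toy example. This is only a minimal
sketch: the file toy.bib and its keys are made up, and LC_ALL=C is set
merely to make the byte ordering predictable.

```shell
# Toy illustration; toy.bib and its keys are hypothetical.
tmp=$(mktemp -d) && cd "$tmp" || exit 1

# Subtlety 1: in the C locale A-Z sort before a-z, so "bellm" lands after "Zou".
plain=$(printf 'Zou\nbellm\nArcavi\n' | LC_ALL=C sort | paste -s -d ' ' -)
folded=$(printf 'Zou\nbellm\nArcavi\n' | LC_ALL=C sort -f | paste -s -d ' ' -)
echo "plain:  $plain"     # plain:  Arcavi Zou bellm
echo "folded: $folded"    # folded: Arcavi bellm Zou

# Subtlety 2: append a blank line only if the last line is not already blank.
printf '@MISC{a,\n}\n\n@MISC{b,\n}\n' > toy.bib
[ -n "$(tail -n 1 toy.bib)" ] && echo >> toy.bib
last=$(tail -n 1 toy.bib)   # now empty: the file ends with the RS
```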

------------------------------------------------------------------------
Solution I: Use Unix tools to create index file & awk to write output
------------------------------------------------------------------------

This solution requires two passes over the data.

Our first job is to create a list of the keys, one per paper:

$ sed -n 's/^@.*{//p' at2019dge.bib | nl 
     1  Zou2017,
     2  Bellm2016,
     3  Arcavi2017,
     4  Arnett1982,
     5  Astropy-Collaboration2013,
     6  BC03,

We sort on the second column ("-f" means "fold", which is the same as
"case insensitive") and store the record numbers in the file index.list

$ sed -n 's/^@.*{//p' at2019dge.bib | nl | sort -k2 -f | cut -f1 | tee index.list
     3  
     4  
     5 
     6 
     2 
     ...
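
The index pipeline can be checked on a small file before trusting it
with the real one. A minimal sketch, assuming a made-up three-entry
file toy.bib (LC_ALL=C is there only to make the result reproducible):

```shell
# Toy check of the index pipeline; toy.bib is a hypothetical three-entry file.
tmp=$(mktemp -d) && cd "$tmp" || exit 1
cat > toy.bib <<'EOF'
@ARTICLE{Zou2017,
  title = "Z"
}

@MISC{bellm2016,
  title = "B"
}

@ARTICLE{Arcavi2017,
  title = "A"
}

EOF
# key list -> numbered -> folded sort on the key -> keep only the record number
sed -n 's/^@.*{//p' toy.bib | nl | LC_ALL=C sort -k2 -f | cut -f1 > index.list
order=$(tr -d ' \n' < index.list)   # squeeze out the padding to show the order
echo "$order"                       # 321
```

Record 3 (Arcavi2017) comes first and record 1 (Zou2017) last; the
lower-case key bellm2016 is placed between them because of "-f".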

     
We construct an awk program that reads the index file, reads the bib
file, and writes out the bib references in the order given by the
index file.

$ cat bib.awk
BEGIN{while((getline<"index.list")>0){ind[++i]=$0};RS="";ORS="\n\n"}  #read in index.list
     {rec[++j]=$0}                                         #read bib file
END{for (i=1;i<=length(ind);i++){print rec[ind[i]+0]}}     #print in order of index.list

$ awk -f bib.awk  at2019dge.bib

Note some awk arcana -- "ind[i]+0" is needed to ensure that ind[i] is
interpreted as an integer. Each line of index.list is a number padded
with blanks, and without "+0" the padded string itself would be used
as the subscript of rec[].
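
The need for "+0" can be seen in isolation. A minimal sketch with
made-up values: awk array subscripts are strings, so the padded line
"     3" names a different element than the number 3.

```shell
# Array subscripts in awk are strings: "     3" and 3 are different keys.
out=$(printf '     3\n' | awk '{
    rec[3] = "third record"
    raw = $0          # the padded text exactly as read from the file
    num = $0 + 0      # the same value coerced to a number
    print ((raw in rec) ? "found" : "missing"), ((num in rec) ? "found" : "missing")
}')
echo "$out"    # missing found
```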

An alternative which does the same job is as follows (and is shorter!):

$ awk -f bib2.awk index.list at2019dge.bib

$ cat bib2.awk
BEGIN{RS="";FS=","; ORS="\n\n"}
   FNR==NR{for (i=1;i<=NF;i++) {ind[i]=$i+0}}  #RS="" makes index.list one record; each line is a field
   FNR!=NR{rec[++j]=$0}
END{for (i=1;i<=length(ind);i++) {print rec[ind[i]]}}

------------------------------------------------------------------------
Solution II: Only one pass and no other Unix tools, only awk
------------------------------------------------------------------------

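$ awk -f sortbib.awk    at2019dge.bib

where sortbib.awk is listed below. Note that asort() is a gawk
extension, so "awk" must be GNU awk here, and that the comparison is
case sensitive as written.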

$ cat sortbib.awk
BEGIN {RS="";FS=","}
      { rec[++i]=$0
        a=$1; sub("^@.*{","",a)      # a=Zou2017
        b[i]=a", "i                  # b[1]=Zou2017, 1
      }
END   { n=asort(b)                   # sort b[] alphabetically
        for (k=1;k<=n;k++) {
          m=split(b[k],outb,",")     # outb[m]=" 3" (corresponding to Arcavi2017)
          ind=outb[m]
          print rec[ind+0] "\n"      # subtlety: coerce ind to behave as integer
        }
      }
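
Where only a POSIX awk is available, the same one-pass idea can be
sketched with a hand-written sort. This is not the author's method: it
swaps in a plain insertion sort and folds case with tolower(). The
names toy.bib, sortbib_portable.awk, key[] and ord[] are made up for
illustration.

```shell
tmp=$(mktemp -d) && cd "$tmp" || exit 1
# Hypothetical three-entry test file, each record ending with a blank line.
printf '@ARTICLE{Zou2017,\n  title = "Z"\n}\n\n@MISC{bellm2016,\n  title = "B"\n}\n\n@ARTICLE{Arcavi2017,\n  title = "A"\n}\n\n' > toy.bib

cat > sortbib_portable.awk <<'EOF'
BEGIN { RS = ""; FS = "," }
{
    rec[++n] = $0
    key[n] = $1
    sub(/^@[^{]*[{]/, "", key[n])     # "@ARTICLE{Zou2017" -> "Zou2017"
    key[n] = tolower(key[n])          # fold case for the comparison
    ord[n] = n
}
END {
    # insertion sort of the record numbers in ord[], compared by key
    for (i = 2; i <= n; i++) {
        v = ord[i]
        for (j = i - 1; j >= 1 && key[ord[j]] > key[v]; j--)
            ord[j + 1] = ord[j]
        ord[j + 1] = v
    }
    for (i = 1; i <= n; i++)
        print rec[ord[i]] "\n"        # record plus a blank separator line
}
EOF
awk -f sortbib_portable.awk toy.bib > sorted.bib
keys=$(sed -n 's/^@.*{//p' sorted.bib | paste -s -d ' ' -)
echo "$keys"    # Arcavi2017, bellm2016, Zou2017,
```

Insertion sort is quadratic in the number of entries, which is
harmless for a bibliography of a few hundred references.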

------------------------------------------------------------------------
Solution III: A solely Unix solution (a triumph!)
------------------------------------------------------------------------

$ gsed -e 's/\(^@.*{\)\(.*$\)/\2 \1\2/' \
       -e 's/^$/\x0/' at2019dge.bib |           #1a (first -e), 1b (second -e)
       sort -z                      |           #2
       sed '/@/s/^[^ ]*, //'        |           #3
       tr -d '\000'                 |           #4
       sed 1d > alpha_sort.bib                  #5


Step #1a
   Using the "playback" feature of sed (the back-references \1 and \2),
   extract the key and prepend it to the first line of each bib record.
Step #1b
   In the input file a blank line "^$" separates bibliographic
   records from each other.  Replace this line with a NUL character
   (\x00, which is visually displayed as ^@).
   IMPORTANT: you must use "gsed" (GNU sed) since "sed" does not deal with NUL.
   After step #1 the records look like this:

Bellm2016, @MISC{Bellm2016,
       author = {{Bellm}, Eric C. and {Sesar}, Branimir},
        title = "{pyraf-dbsp: Reduction pipeline for the Palomar Double Beam Spectrograph}",
     keywords = {Software},
         year = "2016",
        month = "Feb",
          eid = {ascl:1602.002},
        pages = {ascl:1602.002},
archivePrefix = {ascl},
       eprint = {1602.002},
       adsurl = {https://ui.adsabs.harvard.edu/abs/2016ascl.soft02002B},
      adsnote = {Provided by the SAO/NASA Astrophysics Data System}
}
^@
Zou2017, @ARTICLE{Zou2017,
         author = {{Zou}, Hu and {Zhang}, Tianmeng and {Zhou}, Zhimin and {Nie}, Jundan and
...

Step #2
  Sort on the first word (the key). The flag "-z" forces sort to
  consider \x00 as the record separator instead of the usual newline
  (\n). As written the sort is case sensitive; adding "-f", as in
  Solution I, folds case.
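
The effect of "-z" can be seen on two made-up multi-line records. In
this minimal sketch the NULs are turned into newlines at the end only
so that the result is printable:

```shell
# Two multi-line records, each terminated by NUL; sort -z orders whole records.
out=$(printf 'b\nline2\000a\nline2\000' | sort -z | tr '\000' '\n')
echo "$out"
# a
# line2
# b
# line2
```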

Step #3
  Now that the file has been sorted, delete the keyword that was
  inserted in step 1
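
Step #3 simply undoes step #1a. A minimal sketch on a single line
(plain "sed" is enough here because no NUL is involved):

```shell
line='@ARTICLE{Zou2017,'
tagged=$(printf '%s\n' "$line" | sed 's/\(^@.*{\)\(.*$\)/\2 \1\2/')   # step 1a
restored=$(printf '%s\n' "$tagged" | sed '/@/s/^[^ ]*, //')           # step 3
echo "$tagged"      # Zou2017, @ARTICLE{Zou2017,
echo "$restored"    # @ARTICLE{Zou2017,
```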

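Step #4
  Delete the NULs. Each NUL was placed on a line of its own in step
  #1b, so deleting it leaves an empty line "^$", which restores the
  blank-line separator.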

Step #5
  It appears that an extra blank line gets inserted at the top of the
  output. "sed 1d" deletes it.
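
Whichever solution is used, the result is worth a quick check. A
minimal sketch, assuming a hypothetical helper bib_is_sorted and two
made-up test files; it extracts the keys and lets "sort -c" report
whether they are already in folded order:

```shell
# Hypothetical helper: succeed if the entry keys of a .bib file are in folded order.
bib_is_sorted() {
    sed -n 's/^@.*{//p' "$1" | LC_ALL=C sort -f -c 2>/dev/null
}

tmp=$(mktemp -d) && cd "$tmp" || exit 1
printf '@MISC{Arcavi2017,\n}\n\n@MISC{bellm2016,\n}\n\n@MISC{Zou2017,\n}\n\n' > good.bib
printf '@MISC{Zou2017,\n}\n\n@MISC{Arcavi2017,\n}\n\n' > bad.bib

if bib_is_sorted good.bib; then good=yes; else good=no; fi
if bib_is_sorted bad.bib;  then bad=yes;  else bad=no;  fi
echo "good.bib sorted: $good, bad.bib sorted: $bad"
```

Comparing "grep -c '^@'" on the input and the output additionally
confirms that no entry was lost along the way.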