------------------------------------------------------------------------
IPAC Table format
------------------------------------------------------------------------

Recently I wanted to obtain a list of IPAC database holdings
(specifically lists of catalogs). After some effort I found this
statement "... see the 'description' and 'catname' columns  of
irsa_catalogs.tbl"

 "allwise_p3as_psd"

So I dutifully searched this file and found it
$ curl -o irsa_catalogs.tbl "https://irsa.ipac.caltech.edu/cgi-bin/Gator/nph-scan?mode=ascii" 

This is an unwieldy file as can be gathered from this one liner.
$ head -n 5 irsa_catalogs.tbl | awk '{print length}'                  
80
11
34
3704
3704

This file uses "IPAC Table Format" (ITF) which is not well suited
to modern tools.  After a day of work I developed a Unix based
approach to read ITF files.  The task was made enjoyable by the
forays I undertook as I tried to understand the origin of this
format. I learnt the history of Unix, IRAS and so on. As usual,
knowledge is highly interconnected.  If you are the sort who likes
broad gathering of knowledge and synthesis then do read the rest
of this article. You will both enjoy and learn new things.

HISTORY: UNIX.
Unix is now the standard operating system (OS). It triumphed over
OS for Data General, DEC and IBM. The standard languages in the
sixties through eighties was Fortran (for scientists) and COBOL
(for business).  Unix was released to Universities in 1973 and it
ran only on DEC equipment. In 1978 Unix was used on 600 DEC machines
in some form. The GNU project was born in 1983. The computer giant,
IBM, started offering Unix in 1986. However, the installed based
of Unix was small.  Unix took off with the launch of Sun workstation
(SPARC; 1987) and the famed Intel 30286 (1989).

Unix had two distinct features. First the OS was not written in
assembly language but in C and second it treated everything as a
file or file system. It is the latter that gives awesome power (via
pipelines).  In contrast to the prevailing standards of fixed format
files, Unix works on files that are organized by a simple algorithm
(e.g. tab separated variable file) and it is this generality that
makes it possible to use pipelines in Unix.

HISTORY: IPAC (& IPAC Table Format).
IPAC was created to support IRAS (launched in early 1983; operated
until the end of the year). The results from IRAS were still
reverberating in 1985 when I joined Caltech.

IPAC created an internal standard --the IPAC Table Format (ITF) --
to support IRAS data and over time ITF was used for 2MASS (1997-2001).
It is now a fixture in IRSA.  ITF uses "fixed width" fields which
belies its deep connection to Fortran. The fields are separated by
"|" (pipe).  The widths can be inferred from the first line in the
descriptor block.  An ITF has a pleasing appearance when printed
on a 132-column printer.  However, files with longer records are
decidedly ugly to view.

MODERN TIMES: 
ITF is a weird hybrid with the underlying spirit derived from Fortran
but with flexibility in defining the fields (widths, number of
fields, type of variable) which is reminiscent of Unix.  Here, I
show how IPAC Table format files can be read with Unix tools. Along
the way we will discover a (obscure) feature of awk which was
designed explicitly to read Fortran files!

------------------------------------------------------------------------
I. The IPAC Table Format 
------------------------------------------------------------------------

The IPAC Table format has three blocks. Header & comments (which
start off with  "\" and "\ ", respectively.  The descriptor block,
with "|" serving as the bookends for each line, follows the header
& comment block.  The descriptor to the table requires a minimum
of two lines.  The first line has the names of the column and the
next line the type of the value in each table (double, int, real,
char, logical..) There could up two additional descriptor block
lines.

The data lines have no "|" whatsoever and begin after descriptor
block.  The key to IPAC Table format is that values for each column
must lie between adjacent "|" (whose column numbers are fixed).

The standard example for the IPAC Table format can be found from the
IRSA homepage.

$cat IPAC.tbl

\catalog = 'Sample Catalog'
\ A sample table comment
|   ra      |    dec    |  n_obs  |    V     |    SpType  |
|   double  |    double |  int    |   real   |    char    |
|   deg     |    deg    |         |   mag    |            |
|   null    |    null   |  null   |   null   |    null    |
  165.466279  -34.704730     5       11.27        K6Ve 

There are two solutions to analyze ITF files. The first is to
substitute "|" at the appropriate locations in the data portion of
the file. The resulting ".csv" file (pipe separated value file) is
essentially a csv file and well suited to reading by awk or csvkit
etc.  The second is an arcane solution and takes advantage of
"FIELDWIDTH" feature of awk (explicitly designed to read FORTRAN
files).

Nota bene:  I found that using "|" resulted in erratic behavior.
I decided to stick with ";" which seems acceptable


------------------------------------------------------------------------
IIa Convert IPAC Table to csv format
------------------------------------------------------------------------

For this solution we have to replace blank character by ";" in
appropriate columns of the data file.  The locations of "|" in the
input file are obtained from the first line of the descriptor block.

|   ra      |    dec    |  n_obs  |    V     |    SpType  |

We assign index 1 to first character "|", index 2 to the next and
so on. The last character is a "|" and has index 59.  We determine
the index of each "|" as follows

$ PIPES=$(sed -n '3p' IPAC.tbl | gsed 's/./&\n/g' | nl | sed -n 's/|//p' | xargs)
$ echo $PIPES
1 13 25 35 46 59


This piece of code does the task we set out to do 
$ awk -v PIPES="$PIPES" 'BEGIN{split(PIPES,ind)}  \
          /^ /{n=split($0,a,""); delete a[1];     \
               for (i in ind){a[ind[i]]=";"}; delete a[1]; delete a[n]; \
               for (i in a){b=b a[i]};print b     \
               }' IPAC.tbl

line 1: pass the bash variable "$PIPES" to awk. In the BEGIN block,
        extract the values into "ind"
line 2: convert input line into an array of characters using split
line 3: effect the ";" substitution. delete the first and last characters 
        of the record (which are "|", in effect)
line 4: rejoin the characters of array into a single line and print line

This logic forms the basis of the stand alone utility "ipac2csv" (attached).

APPLICATION.

Recall my starting point was to interpret an IRSA ITF file
"irsa_catalog.tbl" which apparently lists all the catalog files of
IRSA.

$ curl -o irsa_catalogs.tbl "https://irsa.ipac.caltech.edu/cgi-bin/Gator/nph-scan?mode=ascii"

Let us decode the first line of the descriptor block, using awk
$ sed -n '4{p;q;}' $IRSA_FILE | tr '|' '\n' | sed 's/ //g' | nl
     1	cntr
     2	description
     3	subtitle
     4	ack
     5	infourl
     6	server
     7	database
     8	catname
     9	ddname
    10	ddlink
    11	xpffile
    12	cols
    13	nrows
    14	ext
    15	key
    16	pos
    17	rel
    18	coneradius
    19	uploadradius
    20	projectshort
    21	missionid
    22	groupid
    23	access_flag
    24	alg
    25	algv
    26	ex
    27	where1
    28	where2
    29	mojoin
    30	modataset
    31	toperiod
    32	periodresult
    33	sptlevel
    34	visibility
    35	dbms

Let us only print out "description" (2), "catname" (7), "database" (8). There
are lots of blanks. Also, let us restrict to the WISE mission (for now).
$ awk -F";" '{print $7,$8,$2}' OFS="\t" irsa_catalogs.csv | sed 's/  */ /g' | grep WISE | less -S
 wise_allwise    allwise_p3as_psd        AllWISE Source Catalog
 wise_allwise    allwise_p3as_mep        AllWISE Multiepoch Photometry Table
 wise_allwise    allwise_p3as_psr        AllWISE Reject Table
 wise_allwise    allwise_p3as_cdd        AllWISE Atlas Metadata Table
 wise_allwise    allwise_p3am_xrf        AllWISE Frame Cross-Reference Table
 wise_allwise    allwise_p3al_lod        AllWISE Atlas Inventory Table
 wise_allwise    allwise_p3am_cdd        AllWISE Atlas Image Inventory Table
 wise_allwise    allwise_mfpos   AllWISE Refined Pointing Information for the Single-
 neowiser        neowiser_p1bs_psd       NEOWISE-R Single Exposure (L1b) Source Table
 neowiser        neowiser_p1ba_mch       NEOWISE-R Known Solar System Object Possible
 neowiser        neowiser_p1bs_frm       NEOWISE-R Single Exposure (L1b) Frame Metada
 neowiser        neowiser_p1bl_lod       NEOWISE-R Single Exposure (L1b) Scan Invento
 neowiser        neowiser_p1bm_frm       NEOWISE-R Single Exposure (L1b) Image Inven
....
------------------------------------------------------------------------
IIb.  Using FIELDWIDTH feature of awk
------------------------------------------------------------------------

We deduce the field widths from the first line of the description block

$ IRSA_FILE=irsa_catalogs.tbl
$ FW=$(sed -n '/^|/{s:|:& :g;s:^|: :;p;q;}' $IRSA_FILE  | tr '|' '\n' | awk '{print length}' | sed '$d' | xargs | sed 's/.*/"&"/')

The field widths are captured into $FW.
$ echo $FW
"42 303 103 83 203 33 33 43 63 103 103 13 22 13 13 5 353 12 14 43 13 13 13 43 83 203 303 303 503 43 503 43 13 12 13"

Having obtained the field widths we use awk

$  sed '/^\\/d' $IRSA_FILE | awk 'BEGIN{FIELDWIDTHS='$FW'}{print $7,$8,$2}' OFS="\t" | sed 's/ //g' | column -t | grep WISE

Restricting to WISE  we find

wise_allwise      allwise_p3as_psd               AllWISESourceCatalog
wise_allwise      allwise_p3as_mep               AllWISEMultiepochPhotometryTable
wise_allwise      allwise_p3as_psr               AllWISERejectTable
wise_allwise      allwise_p3as_cdd               AllWISEAtlasMetadataTable
wise_allwise      allwise_p3am_xrf               AllWISEFrameCross-ReferenceTable
wise_allwise      allwise_p3al_lod               AllWISEAtlasInventoryTable
wise_allwise      allwise_p3am_cdd               AllWISEAtlasImageInventoryTable
wise_allwise      allwise_mfpos                  AllWISERefinedPointingInformationfortheSingle-exposureImages
neowiser          neowiser_p1bs_psd              NEOWISE-RSingleExposure(L1b)SourceTable
neowiser          neowiser_p1ba_mch              NEOWISE-RKnownSolarSystemObjectPossibleAssociationList(<ahref='http://wise2.ipac.caltech.edu/docs/release/neowise/expsup/kssopal.README.html'target='doc'>Caution</a>)
neowiser          neowiser_p1bs_frm              NEOWISE-RSingleExposure(L1b)FrameMetadataTable
neowiser          neowiser_p1bl_lod              NEOWISE-RSingleExposure(L1b)ScanInventoryTable
neowiser          neowiser_p1bm_frm              NEOWISE-RSingleExposure(L1b)ImageInventoryTable
wise_allsky       allsky_4band_p3as_psd          WISEAll-SkySourceCatalog
....