------------------------------------------------------------------------
fixing fits files
------------------------------------------------------------------------

Kishalay De has a few fits files (generated by DBSP) with errant
headers. He would like to identify the files and then fix the
aberrations. In Unix there are two basic types of files: text files,
composed of records (lines) with each record ending in a "\n", and
binary files, which are simply streams of bytes. The simplest fits
file has a text header and a binary body.

Most Unix utilities, including famous ones such as sed and awk, operate
on text files. In fact, the point of Unix was not to differentiate
between data streams, whether coming from a file or keyboard or
another program. Text files with line breaks are central to the
concept of Unix utilities. However, data files are specific to the
method of generation or subsequent manipulation (e.g. if integer,
1, 2 or 4 bytes; endian, little or big; if floating point, 4 or 8
bytes, IEEE or designer).

Astronomers, in defining the fits standard (in 1981, almost forty
years ago), were way ahead of the times. Having experienced the
chaos of the "pre-fits" era I can attest to the amazing reign
of orderliness that came with the introduction of the fits standard.

The fits architecture was revolutionary. The header had the keyword
and value architecture and the binary part could accommodate all
possible formats of integers and floating point (see Summary below).
The layout of the data file (and pixel units, pixel sky location
etc) could be figured out from the header.

Step 1: The "block" size for a fits file is 2880 bytes (which is 36x80
bytes). For now, let us assume that the header has nhdr blocks.

$ infile="FOC"
$ nhdr=2
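
Rather than guessing nhdr, it can be read off the file. What follows
is only a sketch: "fits_nhdr" is a name I made up, and it assumes the
header is plain ASCII with an END card (END followed by blanks). It
uses the "unblock" conversion of dd, which turns fixed 80-byte records
into ordinary lines.

```shell
# Hypothetical helper (not part of the original notes): print the number of
# 2880-byte blocks occupied by the primary header of a FITS file.
fits_nhdr() {
    # dd with cbs=80 conv=unblock turns each 80-byte card into a text line
    # (trailing blanks removed, "\n" appended). awk stops at the END card,
    # so the binary data that follows is never examined.
    dd if="$1" cbs=80 conv=unblock 2>/dev/null |
        LC_ALL=C awk '
            /^END *$/ { print int((NR + 35) / 36); found = 1; exit }  # 36 cards per block
            END       { if (!found) exit 1 }                          # no END card seen
        '
}
```

With this in place, nhdr=$(fits_nhdr ${infile}.fits) can replace the
hand-set value.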


Using "dd" I cleave the fits file, "FOC.fits", into "hdr" (the first
nhdr blocks) and "data" (the remaining blocks).

$ dd if=${infile}.fits bs=2880 skip=0     count=$nhdr of=hdr
$ dd if=${infile}.fits bs=2880 skip=$nhdr             of=data
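
Before touching the header it is worth confirming that the cleaving
lost nothing. A minimal sketch, with a made-up function name
"fits_split_check"; it writes "hdr" and "data" into the current
directory, as in the text.

```shell
# Hypothetical check: split FITS file $1 into "hdr" ($2 blocks) and "data",
# then confirm that nothing was lost or reordered.
fits_split_check() {
    dd if="$1" bs=2880 count="$2" of=hdr  2>/dev/null
    dd if="$1" bs=2880 skip="$2"  of=data 2>/dev/null
    # The header piece must be exactly $2 blocks long ...
    [ $(( $(wc -c < hdr) )) -eq $(( $2 * 2880 )) ] || return 1
    # ... and hdr followed by data must reproduce the input byte for byte.
    cat hdr data | cmp -s - "$1"
}
```

For example, fits_split_check ${infile}.fits $nhdr && echo "split ok".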

Step 2: "Punch" cards with 80 characters provided the standard way
to communicate with the computer. Naturally, fits headers were designed
to emulate the 80-character IBM punch cards. Thus a "line" or
"record" is 80 bytes long. However, the fits header has no "\n"
(that came later with Unix).

Using "gsed", I convert "hdr" into a regular text file with "\n"
inserted after every 80 characters (\n not included in the count).
This makes it easy to use modern tools such as awk and sed (invented
circa seventies). This step is executed with the first "gsed". [Unix
arcana: BSD sed, the default on macOS, does not treat \n in the
replacement text as a newline and so GNU sed or gsed has to be
used.]

Kishalay had a specific request: insert a line "AIRMASS=1.0", ahead
of the "END" record. I assume that the END record is at least
one record before the end of header file. [The last header block
is filled with blanks so as to reach 2880 characters].
This step is undertaken with the second invocation of gsed: the
first command saves the END record, overwrites its first 40
characters with the new card, prints the result and then restores
END. The substitution conforms to the fits standard of "= " in
columns 9-10.

Since we added a line we have to delete the last (blank) line, which
is done with the second command ('$d') of the second gsed.
Finally, we have to strip out "\n". This is done elegantly by "tr".

$ gsed 's/.\{80\}/&\n/g' hdr | \
gsed -e "/^END/{h;s:.\{40\}:AIRMASS = '1.000   '           / Airmass:p;x;}" -e '$d' | tr -d '\n'  > hdr_fix
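
A quick sanity check on "hdr_fix" costs little. The sketch below uses
a made-up function name, "fits_hdr_ok", and tests only three things:
the size is a whole number of blocks, no "\n" survived, and there is
exactly one END card. It says nothing about whether the keyword
values themselves are sensible.

```shell
# Hypothetical sanity check for a repaired header file such as "hdr_fix".
fits_hdr_ok() {
    # 1. The size must be a non-zero whole number of 2880-byte blocks.
    size=$(( $(wc -c < "$1") ))
    [ "$size" -gt 0 ] && [ $(( size % 2880 )) -eq 0 ] || return 1
    # 2. No "\n" may be left in the header.
    [ $(( $(tr -cd '\n' < "$1" | wc -c) )) -eq 0 ] || return 1
    # 3. Exactly one END card (unblock strips the trailing blanks).
    n=$(dd if="$1" cbs=80 conv=unblock 2>/dev/null | grep -c '^END$')
    [ "$n" -eq 1 ]
}
```

For example, fits_hdr_ok hdr_fix || echo "hdr_fix looks wrong".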


Step 3: Finally I concatenate the two files. The output file is a
repaired file, "FOC_fix.fits"

$ cat hdr_fix data > ${infile}_fix.fits

Of course, all this can be done without using any intermediate
files (with command substitutions and subshells). The statements
can be converted to a stand-alone utility which can fix files
one after another.
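
One possible shape for such a utility is sketched here. It is not the
pipeline above: I have swapped gsed for "dd conv=unblock" plus awk so
that it also runs where only BSD sed is installed, and the function
names "fits_cards" and "fits_add_card" are my own. It assumes the
header has an END card and that the output file differs from the
input file. Unlike the gsed version it re-pads the header, so it also
copes with an END card sitting in the last record of a block.

```shell
# Print the header of FITS file $1 as text, one card per line, through END.
fits_cards() {
    dd if="$1" cbs=80 conv=unblock 2>/dev/null |
        LC_ALL=C awk '{ print } /^END *$/ { exit }'
}

# Copy FITS file $1 to $2, inserting card $3 just ahead of the END card.
fits_add_card() {
    ncards=$(( $(fits_cards "$1" | wc -l) ))   # cards up to and including END
    nhdr=$(( (ncards + 35) / 36 ))             # header blocks in the input
    {
        # Re-emit every card padded to 80 bytes, with no "\n".
        fits_cards "$1" | LC_ALL=C awk -v card="$3" '
            /^END *$/ { printf "%-80.80s", card }     # new card goes before END
                      { printf "%-80s", $0; n++ }
            END {
                n++                                   # count the inserted card
                while (n % 36) { printf "%-80s", ""; n++ }  # blank-fill the block
            }'
        # Append the untouched data blocks.
        dd if="$1" bs=2880 skip="$nhdr" 2>/dev/null
    } > "$2"
}

# Example use (commented out): fix only the files that lack AIRMASS.
# for f in *.fits; do
#     fits_cards "$f" | grep -q '^AIRMASS ' ||
#         fits_add_card "$f" "${f%.fits}_fix.fits" \
#             "AIRMASS = '1.000   '           / Airmass"
# done
```

The commented loop covers both of Kishalay's requests: the grep
identifies the errant files and fits_add_card repairs them.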

In writing these notes to myself it occurred to me that in 2021 the
IAU should organize a celebration of "The FITS standard and its
impact on the world".

------------------------------------------------------------------------
Appendix A: fits format
------------------------------------------------------------------------
The "fits" format was introduced by Wells, Greisen & Harten (1981).

".fits" files have a human readable header (ascii) and data or data
units which are binary files (1,2,4 byte integers or 4,8 byte real
numbers in a variety of formats). Each header together with its data
unit is called an HDU, for Header/Data Unit. The header(s) and data
unit(s) are organized in blocks of 2880 bytes which is also 36x80
bytes.

"The header contain 80 bytes lines each of which consists of a
keyword of 8 bytes followed in most of the cases by '= ' in the
position 9 and 10 and then the value of the keyword. The rest of
the line is a comment string beginning with '/'. Each header begins
with the following lines

SIMPLE  =                    T / file conforms to FITS standard
BITPIX  =                   16 / number of bits per data pixel
NAXIS   =                    2 / number of data axes
NAXIS1  =                  440 / length of data axis 1
NAXIS2  =                  300 / length of data axis 2

which defines the format of the file as standard FITS, the data
format and the dimensions of the stored data.

One block of 2880 bytes contains 36 lines of 80 characters per line.
The header can have several blocks of 36 lines. The last block is
identified by the presence of the keyword 'END' The next 2880 bytes
block contains the first part of the data. The empty lines after
'END' keyword are filled with blanks and the unused bytes from the
end of the data to the end of the 2880 bytes block are filled with
NULLs."
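
The block arithmetic for the sample header quoted above can be worked
through in the shell. This is a sketch that assumes a positive BITPIX
(integer data, as in the sample); for floating point data BITPIX is
negative and its absolute value would be needed.

```shell
# Values copied from the sample header above.
bitpix=16 naxis1=440 naxis2=300

nbytes=$(( naxis1 * naxis2 * bitpix / 8 ))   # bytes of pixel data
nblocks=$(( (nbytes + 2879) / 2880 ))        # rounded up to whole blocks
npad=$(( nblocks * 2880 - nbytes ))          # fill bytes in the last block

echo "$nbytes $nblocks $npad"                # prints: 264000 92 960
```

So the 440x300 image of 2-byte integers occupies 92 data blocks, the
last of which carries 960 bytes of fill.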