------------------------------------------------------------------------
fixing fits files
------------------------------------------------------------------------

Kishalay De has a few fits files (generated by DBSP) with errant
headers. He would like to identify the files and then fix the
aberrations. In Unix there are two basic types of files: text files
composed of records (lines), each record ending with a "\n", and
binary files, which are simply streams of bytes. The simplest fits
file has a text header and a binary body.

Most Unix utilities, including famous ones such as sed and awk, operate
on text files. In fact, the point of Unix was not to differentiate
between data streams, whether coming from a file, the keyboard or
another program. Text files with line breaks are central to the
concept of Unix utilities. However, binary data files are specific to
the method of generation or subsequent manipulation (e.g. if integer,
1, 2 or 4 bytes; byte order, little- or big-endian; if floating point,
4 or 8 bytes, IEEE or custom).

Astronomers, in defining the fits standard (1981), were way ahead
of their time. I can personally attest (having experienced the chaotic
"pre-fits" era) to the amazing reign of orderliness that came with
the introduction of the fits standard.

The fits architecture was revolutionary. The header uses a keyword
and value layout and the binary part can accommodate the common
formats of integers and floating point (see Appendix A below).

Step 1: The "block" size for a fits file is 2880 bytes (which is 36x80
bytes). For now, let us assume that the errant entry is in the first
block. Using "dd" I cleave the fits file, "FOC.fits", into "hdr"
(1 block) and "data" (the rest of the file).

$ infile=FOC
$ dd if=${infile}.fits bs=2880 skip=0 count=1 of=hdr
$ dd if=${infile}.fits bs=2880 skip=1 of=data
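
A quick sanity check on the split can be sketched as follows. Since
"FOC.fits" is not at hand here, the sketch builds a small synthetic
stand-in ("demo.fits", a made-up name with made-up contents) and
assumes only the standard "printf", "dd" and "wc". The header piece
should be exactly 2880 bytes and the two pieces should add up to the
size of the original.

```shell
infile=demo    # hypothetical stand-in for FOC

# Synthetic file: 1 header block (5 cards + blank fill) and 1 data block.
{
  printf '%-80s' "SIMPLE  =                    T" \
                 "BITPIX  =                    8" \
                 "NAXIS   =                    1" \
                 "NAXIS1  =                 2880" \
                 "END"
  printf '%2480s' ""                            # pad header to 2880 bytes
  dd if=/dev/zero bs=2880 count=1 2>/dev/null   # 2880 bytes of zero "data"
} > ${infile}.fits

# The split from Step 1.
dd if=${infile}.fits bs=2880 skip=0 count=1 of=hdr  2>/dev/null
dd if=${infile}.fits bs=2880 skip=1         of=data 2>/dev/null

# $(( ... )) strips the leading blanks that some versions of wc print.
total=$(( $(wc -c < ${infile}.fits) ))
nh=$(( $(wc -c < hdr) ))
nd=$(( $(wc -c < data) ))
echo "total=$total hdr=$nh data=$nd leftover=$(( total % 2880 ))"
```

A non-zero "leftover" would mean the file is not a whole number of
2880-byte blocks and should not be trusted as it stands.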

Step 2: fits headers were designed to emulate 80-character IBM punch
cards.  Thus a natural "line" or "record" is 80 bytes long.
Using "gsed", I convert "hdr" into a regular text file with "\n"
inserted after every 80 characters (\n not included in the count).
Say line "n" of hdr has to be repaired.  A second invocation of
"gsed" edits that line (in this case replacing its first 14
characters by the 14-character string "AIRMASS = 1.0/", so that the
line stays 80 characters long). Following this editing I restore
the header to conform to the fits standard (which has no record
breaks).  This is done by using "tr" to get rid of "\n".
The resulting output file is "hdr_fix".

$ nline=34   #set for now
$ gsed 's/.\{80\}/&\n/g' hdr | gsed $nline's:^.\{14\}:AIRMASS = 1.0/:' |
    tr -d '\n'  > hdr_fix

[Unix arcana: some versions of sed (e.g. the BSD sed shipped with
macOS) do not interpret "\n" in the replacement text and so GNU sed,
or gsed, has to be used.]
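
To find the line number "n" in the first place, the header can be
viewed as 80-character lines without modifying it. The following is
only a sketch: it assumes the standard "fold" utility (not used in the
recipe above), and both the file "hdr_demo" and its misspelled
"AIRMAS" card are invented for illustration.

```shell
# Synthetic one-block header (hypothetical); card 4 has a misspelled keyword.
{
  printf '%-80s' "SIMPLE  =                    T" \
                 "BITPIX  =                   16" \
                 "NAXIS   =                    0" \
                 "AIRMAS  = 1.0" \
                 "END"
  printf '%2480s' ""                   # pad header to 2880 bytes
} > hdr_demo

# fold breaks the stream every 80 bytes; grep -n then reports line numbers.
fold -b -w 80 hdr_demo | grep -n '^AIRMAS '

# Keep just the line number for use as $nline in the gsed command.
nline=$(fold -b -w 80 hdr_demo | grep -n '^AIRMAS ' | cut -d: -f1)
echo "nline=$nline"
```

Because "fold" only reads the header, this is a safe way to look
before editing.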

Step 3: Now concatenate the two files; the output is the repaired
file, "FOC_fix.fits".

$ cat hdr_fix data > ${infile}_fix.fits
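
Before trusting the repaired file it seems worth checking that it has
the same size as the original and that the two differ only inside the
header block. A sketch on a synthetic file follows. The file names and
the wrong "AIRMASS = 9.9" card are invented, and "fold" plus plain
"sed" stand in for the two gsed calls so that the example does not
depend on gsed being installed.

```shell
orig=demo3.fits; fixed=demo3_fix.fits    # hypothetical file names

# Synthetic original: header block with a wrong AIRMASS (card 5) + data block.
{
  printf '%-80s' "SIMPLE  =                    T" \
                 "BITPIX  =                    8" \
                 "NAXIS   =                    1" \
                 "NAXIS1  =                 2880" \
                 "AIRMASS = 9.9" \
                 "END"
  printf '%2400s' ""                            # pad header to 2880 bytes
  dd if=/dev/zero bs=2880 count=1 2>/dev/null   # one block of zero "data"
} > $orig

# Steps 1-3, with fold + plain sed standing in for the two gsed calls.
dd if=$orig bs=2880 count=1 of=hdr3  2>/dev/null
dd if=$orig bs=2880 skip=1  of=data3 2>/dev/null
fold -b -w 80 hdr3 | sed '5s:^.\{14\}:AIRMASS = 1.0/:' | tr -d '\n' > hdr3_fix
cat hdr3_fix data3 > $fixed

# Check 1: the sizes agree.
s1=$(( $(wc -c < $orig) )); s2=$(( $(wc -c < $fixed) ))

# Check 2: no differing byte lies beyond the 2880-byte header.
# cmp -l lists the offset of every differing byte; it exits non-zero
# when the files differ, which is expected here, hence the "|| true".
nbad=$(( $( { cmp -l $orig $fixed || true; } | awk '$1 > 2880' | wc -l) ))
echo "sizes: $s1 $s2; differing bytes outside header: $nbad"
```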

------------------------------------------------------------------------
Stand-alone utility
------------------------------------------------------------------------

#!/bin/bash
# repair_fits file1[.fits] file2[.fits] file3[.fits] ...
# output: if a file lacks the AIRMASS keyword then file1_fix.fits, etc

nhdr=2    #no of 2880-byte blocks of header (set for DBSP files)

if [ $# = 0 ]; then        #need at least one file to work on
    echo "need at least one file"; exit 1
fi

for i in "$@"; do

  infile=${i%.fits}    #strip the optional .fits extension
  dd if=${infile}.fits bs=2880 skip=0 count=$nhdr of=hdr 2>/dev/null
  grep -q AIRMASS hdr

  if [ $? -ne 0 ]; then
    dd if=${infile}.fits bs=2880 skip=$nhdr of=data 2>/dev/null
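    # split the header into 80-char cards; at the END card insert an
    # AIRMASS card just before it (h, s, p, x); delete the last (blank)
    # card so the header keeps its length; then remove the "\n" again
    gsed 's/.\{80\}/&\n/g' hdr | \
    gsed -e "/^END/{h;s:.\{40\}:AIRMASS = '1.000   '           / Airmass:p;x;}" -e '$d' | tr -d '\n'  > hdr_fix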
    cat hdr_fix data > ${infile}_fix.fits
    echo "fixed" ${infile}.fits "->" ${infile}_fix.fits
  else
    echo ${infile}.fits "is good"
  fi

done
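
The script hard-codes nhdr=2, which suits DBSP files. For other files
the number of header blocks could instead be found by looking for the
END card, block by block. The following is a sketch only: the function
"count_hdr_blocks" and the file "demo4.fits" are hypothetical, "fold"
is assumed to be available, and only the primary header is considered.

```shell
# Hypothetical helper: print the number of 2880-byte blocks in the primary
# header of file $1, i.e. the position of the block that holds the END card.
count_hdr_blocks() {
  f=$1
  nblk=$(( $(wc -c < "$f") / 2880 ))
  n=0
  while [ "$n" -lt "$nblk" ]; do
    # Cut out block n, break it into 80-byte cards, look for a bare END card.
    if dd if="$f" bs=2880 skip="$n" count=1 2>/dev/null |
       fold -b -w 80 | grep -q '^END *$'; then
      echo $(( n + 1 ))
      return 0
    fi
    n=$(( n + 1 ))
  done
  return 1    # no END card found
}

# Synthetic two-block header: 36 cards without END, then END + blank fill.
{
  printf '%-80s' "SIMPLE  =                    T" \
                 "BITPIX  =                    8" \
                 "NAXIS   =                    0"
  k=0
  while [ $k -lt 33 ]; do
    printf '%-80s' "COMMENT filler card"
    k=$(( k + 1 ))
  done
  printf '%-80s' "END"
  printf '%2800s' ""
} > demo4.fits

nhdr=$(count_hdr_blocks demo4.fits)
echo "nhdr=$nhdr"
```

The value printed could then replace the fixed "nhdr=2" inside the
loop of the script.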

------------------------------------------------------------------------
Appendix A: fits format
------------------------------------------------------------------------

".fits" files have a human readable header (ascii) and data or data
units, which are binary (1-, 2- or 4-byte integers or 4- or 8-byte
real numbers).  Each header-plus-data pair is called an HDU, for
Header/Data Unit.  The header(s) and data unit(s) are organized in
blocks of 2880 bytes, which is also 36x80 bytes.

"The header contain 80 bytes lines each of which consists of a
keyword of 8 bytes followed in most of the cases by '= ' in the
position 9 and 10 and then the value of the keyword. The rest of
the line is a comment string beginning with '/'. Each header begins
with the following lines

SIMPLE  =                    T / file conforms to FITS standard
BITPIX  =                   16 / number of bits per data pixel
NAXIS   =                    2 / number of data axes
NAXIS1  =                  440 / length of data axis 1
NAXIS2  =                  300 / length of data axis 2

which defines the format of the file as standard FITS, the data
format and the dimensions of the stored data.

One block of 2880 bytes contains 36 lines of 80 characters per line.
The header can have several blocks of 36 lines. The last block is
identified by the presence of the keyword 'END' The next 2880 bytes
block contains the first part of the data. The empty lines after
'END' keyword are filled with blanks and the unused bytes from the
end of the data to the end of the 2880 bytes block are filled with
NULLs."
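
The description above implies a simple size check. The following
sketch computes the expected size of the data unit from the values in
the sample header, assuming a plain primary image with no extensions.
The variable names are mine. The absolute value of BITPIX is taken
because floating-point data are flagged by a negative BITPIX.

```shell
bitpix=16; naxis1=440; naxis2=300     # values from the sample header above

# |BITPIX|/8 bytes per pixel; ${bitpix#-} drops a leading minus sign.
nbytes=$(( ${bitpix#-} / 8 * naxis1 * naxis2 ))

# Round up to a whole number of 2880-byte blocks.
nblocks=$(( (nbytes + 2879) / 2880 ))
npad=$(( nblocks * 2880 - nbytes ))

echo "data: $nbytes bytes -> $nblocks blocks, $npad bytes of padding"
```

For the sample header this gives 264000 bytes of data in 92 blocks,
with 960 bytes of padding at the end of the last block.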