------------------------------------------------------------------------
fixing fits files
------------------------------------------------------------------------

Kishalay De has a few fits files (generated by DBSP) with errant
headers. He would like to identify the files and then fix the
aberrations.

In Unix there are two basic types of files: text files composed of
records (lines), each record ending with a "\n", and binary files,
which are simply streams of bytes. The simplest fits file has a text
header and a binary body.

Most Unix utilities, including famous ones such as sed and awk, operate
on text files. In fact, the point of Unix was not to differentiate
between data streams, whether coming from a file, the keyboard or
another program. Text files with line breaks are central to the concept
of Unix utilities. However, binary data files are specific to the
method of generation or subsequent manipulation (e.g. if integer, 1, 2
or 4 bytes; endian, little or big; if floating point, 4 or 8 bytes,
IEEE or designer).

Astronomers, in defining the fits standard (in 1981, almost forty years
ago), were way ahead of the times. Having experienced the chaos of the
"pre-fits" era I can attest to the amazing reign of orderliness that
came with the introduction of the fits standard. The fits architecture
was revolutionary. The header had the keyword and value architecture
and the binary part could accommodate all possible formats of integers
and floating point (see Appendix A below). The layout of the data file
(and pixel units, pixel sky location etc) could be figured out from the
header.

Step 1: The "block" size for a fits file is 2880 bytes (which is 36x80
bytes). For now, let us assume that the header has nhdr blocks.
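If one would rather not assume nhdr, it can be found by reading the
file one block at a time until a record containing only END turns up.
What follows is a minimal sketch, assuming a POSIX shell with dd, tr,
fold and grep. The function name "count_hdr_blocks" and the synthetic
file "demo.fits" are hypothetical, made up for this illustration.

```shell
# count_hdr_blocks FILE: print how many 2880-byte blocks the primary header spans.
# Hypothetical helper: scans block by block until a record holding only END is seen.
count_hdr_blocks() {
    n=0
    while :; do
        # Read block n and fold it into 80-byte records (NULs dropped for the shell's sake).
        block=$(dd if="$1" bs=2880 skip="$n" count=1 2>/dev/null |
                LC_ALL=C tr -d '\000' | LC_ALL=C fold -b -w 80)
        [ -n "$block" ] || return 1      # ran out of file before finding END
        n=$((n + 1))
        if printf '%s\n' "$block" | grep -q '^END *$'; then
            echo "$n"
            return 0
        fi
    done
}

# Synthetic test file (an assumption, not a real DBSP file):
# 40 header records, so END falls in the second block, plus one block of zeros as "data".
{
    printf '%-80s' "SIMPLE  =                    T"
    i=1
    while [ "$i" -le 38 ]; do
        printf '%-80s' "COMMENT filler $i"
        i=$((i + 1))
    done
    printf '%-80s' "END"
    printf '%2560s' ""                   # blanks up to 2 x 2880 = 5760 bytes
    dd if=/dev/zero bs=2880 count=1 2>/dev/null
} > demo.fits

nhdr=$(count_hdr_blocks demo.fits)
echo "nhdr=$nhdr"                        # prints: nhdr=2
```

The pattern '^END *$' insists on END followed only by blanks, so a
keyword that merely begins with the letters END is not mistaken for the
end of the header.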
$ infile=FOC
$ nhdr=2

Using "dd" I cleave the fits file, "FOC.fits", into "hdr" (the first
nhdr blocks) and "data" (the remaining blocks).

$ dd if=${infile}.fits bs=2880 skip=0 count=$nhdr of=hdr
$ dd if=${infile}.fits bs=2880 skip=$nhdr of=data

Step 2: "Punch" cards with 80 characters provided the standard way to
communicate with the computer. Naturally, fits headers were designed to
emulate the 80-character IBM punch cards. Thus a "line" or "record" is
80 bytes long. However, the fits header has no "\n" (the newline
convention came later, with Unix).

Using "gsed", I convert "hdr" into a regular text file with "\n"
inserted after every 80 characters (\n not included in the count). This
makes it easy to use tools such as awk and sed (both dating from the
1970s). This step is executed with the first "gsed". [Unix arcana: BSD
sed, the default on macOS, does not interpret "\n" in the replacement
text, and so GNU sed or gsed has to be used.]

Kishalay had a specific request: insert a line "AIRMASS=1.0" ahead of
the "END" record. I assume that the END record is at least one record
before the end of the header file. [The last header block is filled
with blanks so as to reach 2880 characters.] This step is undertaken
with the second invocation of gsed. On the END record, "h" saves a
copy, "s" overwrites the first 40 characters with the new card and
prints it, and "x" brings the END record back. The address matches END
followed only by blanks, so that a keyword merely beginning with END is
left alone. The substitution below conforms to the fits standard of
"= " in columns 9-10. The replacement text, trailing blanks included,
must be exactly 40 characters long so that the new record stays at 80
bytes. Next, because we added a line, we have to delete the last
(blank) line, which is done with the second command ('$d') of the
second gsed. Finally, we have to strip out "\n". This is done elegantly
by "tr".

$ gsed 's/.\{80\}/&\n/g' hdr | \
  gsed -e "/^END *$/{h;s:.\{40\}:AIRMASS = '1.000   ' / Airmass          :p;x;}" -e '$d' | tr -d '\n' > hdr_fix

Step 3: Finally I concatenate the two files. The output file is the
repaired file, "FOC_fix.fits".

$ cat hdr_fix data > ${infile}_fix.fits

Of course, all this can be done without using any intermediate files
(with command substitutions and subshells). The statements can be
converted to a stand-alone utility which can fix files, successively.

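As a sketch of what such a stand-alone utility might look like, here is
one way to do Steps 1-3 in a single pass with no intermediate files. It
swaps gsed for fold and awk so that it does not depend on GNU sed; that
substitution is mine, not part of the recipe above. The function name
"fix_fits", its argument order and the synthetic "demo.fits" are
hypothetical. It keeps the same assumption as before: the header ends
with at least one blank record that can be dropped.

```shell
# fix_fits IN OUT NHDR CARD: copy IN to OUT with CARD inserted just ahead of END.
# Hypothetical helper; assumes the header's last record is blank.
fix_fits() {
    {
        dd if="$1" bs=2880 count="$3" 2>/dev/null |
            LC_ALL=C fold -b -w 80 |
            awk -v card="$4" '
                # emit() prints the previously held record and holds the new one,
                # so the very last record of the header is never printed.
                function emit(s) { if (have) printf "%s", held; held = s; have = 1 }
                /^END *$/ { emit(sprintf("%-80.80s", card)) }   # pad/trim card to 80 bytes
                          { emit($0) }
                END { if (held !~ /^ *$/) {
                          print "fix_fits: no blank record to drop" > "/dev/stderr"
                          exit 1 } }
            ' &&
        dd if="$1" bs=2880 skip="$3" 2>/dev/null     # data blocks, untouched
    } > "$2"
}

# Synthetic one-block header plus one data block (an assumption, not a DBSP file).
{
    printf '%-80s' "SIMPLE  =                    T" \
                   "BITPIX  =                    8" \
                   "NAXIS   =                    1" \
                   "NAXIS1  =                 2880" \
                   "END"
    printf '%2480s' ""                               # blanks up to 2880 bytes
    dd if=/dev/zero bs=2880 count=1 2>/dev/null
} > demo.fits

fix_fits demo.fits demo_fix.fits 1 "AIRMASS = '1.000   ' / Airmass"

# Show the non-blank header records of the repaired file.
dd if=demo_fix.fits bs=2880 count=1 2>/dev/null | fold -b -w 80 | grep -v '^ *$'
```

Because the card is padded by "%-80.80s", there is no need to count
trailing blanks by hand, and the file size is unchanged since one
record is added and one blank record is dropped.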
In writing these notes to myself it occurred to me that in 2021 the IAU
should organize a celebration of "The FITS standard and its impact on
the world".

------------------------------------------------------------------------
Appendix A: fits format
------------------------------------------------------------------------

The "fits" format was introduced by Wells, Greisen & Harten (1981).
".fits" files have a human-readable header (ascii) and data or data
units which are binary (1, 2, 4 byte integers or 4, 8 byte real numbers
in a variety of formats). This combination is called an HDU, for
Header/Data Unit. The header(s) and data unit(s) are organized in
blocks of 2880 bytes, which is also 36x80 bytes.

"The header contain 80 bytes lines each of which consists of a keyword
of 8 bytes followed in most of the cases by '= ' in the position 9 and
10 and then the value of the keyword. The rest of the line is a comment
string beginning with '/'. Each header begins with the following lines

SIMPLE  =                    T / file conforms to FITS standard
BITPIX  =                   16 / number of bits per data pixel
NAXIS   =                    2 / number of data axes
NAXIS1  =                  440 / length of data axis 1
NAXIS2  =                  300 / length of data axis 2

which defines the format of the file as standard FITS, the data format
and the dimensions of the stored data. One block of 2880 bytes contains
36 lines of 80 characters per line. The header can have several blocks
of 36 lines. The last block is identified by the presence of the
keyword 'END'. The next 2880 bytes block contains the first part of the
data. The empty lines after 'END' keyword are filled with blanks and
the unused bytes from the end of the data to the end of the 2880 bytes
block are filled with NULLs."
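
As a worked check of the block arithmetic, the example header quoted
above (BITPIX = 16, NAXIS1 = 440, NAXIS2 = 300) implies the following
data size and padding. This is only a sketch in shell arithmetic; the
variable names are mine.

```shell
# Values taken from the example header above.
bitpix=16
naxis1=440
naxis2=300

nbytes=$(( naxis1 * naxis2 * bitpix / 8 ))   # bytes of pixel data
nblocks=$(( (nbytes + 2879) / 2880 ))        # round up to whole 2880-byte blocks
pad=$(( nblocks * 2880 - nbytes ))           # NULL bytes filling the last block

echo "data bytes=$nbytes blocks=$nblocks padding=$pad"
# prints: data bytes=264000 blocks=92 padding=960
```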