------------------------------------------------------------------------
fixing fits files
------------------------------------------------------------------------

Kishalay De has a few fits files (generated by DBSP) with errant headers. He would like to identify the files and then fix the aberrations.

In Unix there are two basic types of files: text files, composed of records (lines) each ending with a "\n", and binary files, which are simply streams of bytes. The simplest fits file has a text header and a binary body.

Most Unix utilities, including famous ones such as sed and awk, operate on text files. In fact, the point of Unix was not to differentiate between data streams, whether they come from a file, the keyboard or another program. Text files with line breaks are central to the concept of Unix utilities. Binary data files, however, are specific to the method of generation or subsequent manipulation (e.g. integers of 1, 2 or 4 bytes; little- or big-endian byte order; floating point numbers of 4 or 8 bytes, IEEE or vendor-specific).

Astronomers, in defining the fits standard (1981), were way ahead of their time. I can personally attest (having experienced the chaotic "pre-fits" era) to the amazing reign of orderliness that came with the introduction of the fits standard. The fits architecture was revolutionary. The header had the keyword and value architecture, and the binary part could accommodate all possible formats of integers and floating point numbers (see Appendix A below).

Step 1: The "block" size for a fits file is 2880 bytes (which is 36x80 bytes). For now, let us assume that the errant entry is in the first block. Using "dd" I cleave the fits file, "FOC.fits", into "hdr" (the first block) and "data" (the rest of the file).

$ infile=FOC
$ dd if=${infile}.fits bs=2880 skip=0 count=1 of=hdr
$ dd if=${infile}.fits bs=2880 skip=1 of=data

Step 2: fits headers were designed to emulate 80-character IBM punch cards. Thus a natural "line" or "record" is 80 bytes long.
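Before touching the header, it can help to confirm that the Step 1 split is lossless and to look at the header as 80-byte records. The sketch below is only an illustration under stated assumptions: it builds a small synthetic file, "demo.fits" (a hypothetical name; one 2880-byte header block plus one 2880-byte block of zero bytes), splits it with the same dd commands, checks the split with cmp, and lists the non-blank cards with the POSIX "fold -b -w 80", which is a swapped-in alternative to the gsed trick used below, not the method of this note.

```shell
# Sketch; assumptions: a POSIX shell with dd, cmp, fold and grep.
# demo.fits is a hypothetical synthetic file built here for illustration only.

# Build demo.fits: 80-byte cards padded with blanks to a 2880-byte header,
# followed by one 2880-byte data block of zero bytes.
{
  printf '%-80s' 'SIMPLE  =                    T / file conforms to FITS standard'
  printf '%-80s' 'BITPIX  =                    8 / number of bits per data pixel'
  printf '%-80s' 'NAXIS   =                    1 / number of data axes'
  printf '%-80s' 'NAXIS1  =                 2880 / length of data axis 1'
  printf '%-80s' 'END'
  printf "%$((2880 - 5 * 80))s" ''              # blank-fill the rest of the block
  dd if=/dev/zero bs=2880 count=1 2>/dev/null   # one block of zero bytes
} > demo.fits

# Step 1: cleave the file into the first block and the rest.
infile=demo
dd if=${infile}.fits bs=2880 skip=0 count=1 of=hdr 2>/dev/null
dd if=${infile}.fits bs=2880 skip=1 of=data 2>/dev/null

# The split is lossless if hdr followed by data reproduces the original exactly.
cat hdr data | cmp -s - ${infile}.fits && echo "split is lossless"

# List the non-blank cards: fold -b inserts a line break every 80 bytes.
fold -b -w 80 hdr | grep -v '^ *$'
```

For a real file, set infile to the file name without ".fits" and skip the synthetic-file section. With -s, cmp prints nothing and reports the result only through its exit status.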
Using "gsed", I converted "hdr" into a regular text file with "\n" inserted after every 80 characters ("\n" is not included in the count). Say line "n" of hdr has to be repaired. A second invocation of "gsed" edits that line: in this case it overwrites the first 14 characters of the line with "AIRMASS = 1.0/", which is also 14 characters long, so the record stays exactly 80 bytes. Following this edit I restore the header to conform to the fits standard (which has no record breaks) by using "tr" to delete every "\n". The resulting output file is "hdr_fix".

$ nline=34 #set for now
$ gsed 's/.\{80\}/&\n/g' hdr | gsed $nline's:^.\{14\}:AIRMASS = 1.0/:' | tr -d '\n' > hdr_fix

[Unix arcana: the BSD sed found on macOS does not turn "\n" in the replacement text into a newline, and so GNU sed, installed as "gsed", has to be used.]

Step 3: Now concatenate the two files. The output file, "FOC_fix.fits", is the repaired file.

$ cat hdr_fix data > ${infile}_fix.fits

------------------------------------------------------------------------
Stand-alone utility
------------------------------------------------------------------------

#!/bin/bash
# repair_fits file1[.fits] file2[.fits] file3[.fits] ...
# output: if a file's header lacks AIRMASS, then file1_fix.fits, etc.
# An AIRMASS card is inserted just before END and the last card (assumed
# blank) is dropped, so the header stays exactly nhdr*2880 bytes long.

nhdr=2  #no of 2880-byte blocks of header (set for DBSP files)
# the new card, padded with blanks to exactly 80 characters
card=$(printf "%-80s" "AIRMASS = '1.000   '          / Airmass")

if [ $# -eq 0 ]; then  #need at least one file to work on
  echo "need at least one file" >&2; exit 1
fi

for i in "$@"; do
  infile=${i%.fits}
  dd if="${infile}.fits" bs=2880 skip=0 count=$nhdr of=hdr 2>/dev/null
  if ! grep -q AIRMASS hdr; then
    dd if="${infile}.fits" bs=2880 skip=$nhdr of=data 2>/dev/null
    # 1st gsed: one card per line. 2nd gsed: on the END card, save END (h),
    # print the new card (p), swap END back in (x); '$d' drops the last card.
    gsed 's/.\{80\}/&\n/g' hdr | \
      gsed -e "/^END *\$/{h;s:.*:${card}:p;x;}" -e '$d' | tr -d '\n' > hdr_fix
    cat hdr_fix data > "${infile}_fix.fits"
    echo "fixed ${infile}.fits -> ${infile}_fix.fits"
  else
    echo "${infile}.fits is good"
  fi
done

------------------------------------------------------------------------
Appendix A: fits format
------------------------------------------------------------------------

".fits" files have a human-readable (ASCII) header and one or more data units, which are binary (1, 2 or 4 byte integers or 4 or 8 byte real numbers in a variety of formats). A header together with its data unit is called an HDU, for Header/Data Unit. The header(s) and data unit(s) are organized in blocks of 2880 bytes, which is also 36x80 bytes.

"The header contains 80-byte lines, each of which consists of a keyword of 8 bytes followed in most cases by '= ' in positions 9 and 10 and then the value of the keyword. The rest of the line is a comment string beginning with '/'. Each header begins with the following lines

SIMPLE  =                    T / file conforms to FITS standard
BITPIX  =                   16 / number of bits per data pixel
NAXIS   =                    2 / number of data axes
NAXIS1  =                  440 / length of data axis 1
NAXIS2  =                  300 / length of data axis 2

which define the format of the file as standard FITS, the data format and the dimensions of the stored data. One block of 2880 bytes contains 36 lines of 80 characters. The header can have several blocks of 36 lines. The last header block is identified by the presence of the keyword 'END'. The next 2880-byte block contains the first part of the data. The empty lines after the 'END' keyword are filled with blanks, and the unused bytes from the end of the data to the end of the last 2880-byte block are filled with NULLs."
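The rules quoted in Appendix A are already enough to flag files that are structurally broken, which covers part of the "identify the files" half of the task. The following is a minimal sketch, assuming a POSIX-like shell (bash or dash); "check_fits" is a hypothetical helper name, and it tests only three rules: the size is a whole number of 2880-byte blocks, the first card starts with "SIMPLE  = ", and an END card is present. It does not look for missing keywords such as AIRMASS; the grep in the stand-alone utility does that.

```shell
# Sketch: structural checks on FITS files, based on Appendix A.
# check_fits is a hypothetical helper; it tests only three rules.
check_fits() {
  f=$1
  size=$(wc -c < "$f" | tr -d ' ')
  # Rule 1: the file is a whole, non-zero number of 2880-byte blocks.
  if [ $((size % 2880)) -ne 0 ] || [ "$size" -eq 0 ]; then
    echo "$f: size $size is not a positive multiple of 2880"; return 1
  fi
  # Rule 2: the first card is SIMPLE, with '= ' in columns 9 and 10.
  card=$(dd if="$f" bs=80 count=1 2>/dev/null)
  case $card in
    "SIMPLE  = "*) ;;
    *) echo "$f: first card is not SIMPLE"; return 1 ;;
  esac
  # Rule 3: an END card (END followed only by blanks) closes the header.
  # Cards never straddle blocks (2880 = 36 x 80), so scan block by block.
  nblk=$((size / 2880)); i=0
  while [ "$i" -lt "$nblk" ]; do
    if dd if="$f" bs=2880 skip="$i" count=1 2>/dev/null |
         fold -b -w 80 | grep -q '^END *$'; then
      echo "$f: header is $((i + 1)) block(s)"; return 0
    fi
    i=$((i + 1))
  done
  echo "$f: no END card found"; return 1
}

# Usage: check every file named on the command line.
for file in "$@"; do check_fits "$file"; done
```

The block count it reports can be compared with the value of nhdr assumed in the stand-alone utility before running a repair.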