------------------------------------------------------------------------
XML(VO) files
------------------------------------------------------------------------

XML is a verbose format for conveying data that has been adopted
worldwide. A VO file appears to have four parts, described in the
Appendix. XML is used by the Virtual Observatory (VO), and so I will use
the terms VO table and XML table interchangeably. In the spirit of
swashbucklers, we will build an efficient bash tool that gives us a
delimited (tsv, csv) table.

We will call the input file $IF, the field separator $DFS and the record
separator $DRS. Good default values are

$ DFS="\t"
$ DRS="\n"

Inside double quotes these are the two-character strings \t and \n;
GNU sed (gsed, as it is usually installed on macOS) turns them into a
tab and a newline.

I. Extract the announcement

$ ANNOUNCE=$(sed -n '/<TABLE ID/,/<DESCRIPTION>/p' "$IF" | tr -d '\n' | sed 's/.*name="//;s/".*$//')
$ echo "$ANNOUNCE"

This picks the name attribute out of the <TABLE ID=...> element, i.e.
the table caption shown in Appendix A.III.

III. Extract the header

$ grep "<FIELD" "$IF" | sed 's/^.*name="//;s/".*$//'

This lists the field names, one per line. However, we wish to make the
collection of header fields into a single record: each name gets a
trailing $DFS, tr joins the lines, and the last gsed operation turns the
final, surplus $DFS into $DRS, so that the header ends like any data
record.

$ grep "<FIELD" "$IF" | sed 's/^.*name="//;s/".*$//' | gsed 's/$/'"$DFS"'/' | tr -d "\n" | gsed 's/'"$DFS"'$/'"$DRS"'/' > hdr

IV. Extract the data

The table data is the block of lines that starts with <TABLEDATA> and
ends with </TABLEDATA>. Within this block we have records (<TR>..</TR>)
and fields (<TD>..</TD>). Every </TR> (end of record) will be mapped to
$DRS, the record separator, and every </TD> (end of field) to $DFS, the
field separator.

To this end, we extract the lines within the TABLEDATA block. Since we
are interested in the end points </TR> and </TD>, we get rid of <TR>
and <TD>. To my knowledge, XML is not line oriented, and so I mush all
the lines into one long line with "tr -d '\n'". We then map the end
tags to $DRS and $DFS.
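The mapping can be tried on a small block before it is run on $IF. The following is a minimal sketch, assuming bash and GNU sed (looked up as gsed, then as sed); the function name map_tabledata is hypothetical, and the two records are copied from the data excerpt in the Appendix. One detail is a choice of this sketch: </TD></TR> is mapped to $DRS before the remaining </TD> are mapped to $DFS, so that no record ends with a stray $DFS.

```shell
#!/usr/bin/env bash
# Sketch only: the step IV mapping applied to two sample records.
# Assumes GNU sed, looked up as gsed (usual macOS name) and then as sed.
if command -v gsed >/dev/null 2>&1; then SED=gsed; else SED=sed; fi

DFS='\t'   # field separator: GNU sed turns \t into a tab
DRS='\n'   # record separator: GNU sed turns \n into a newline

# map_tabledata (hypothetical name): reads a <TABLEDATA> block on stdin,
# writes one $DRS-terminated record per <TR> on stdout.
map_tabledata() {
  sed '1d;$d;s/<TR>//g;s/<TD>//g' |  # drop the TABLEDATA lines and opening tags
    tr -d '\n' |                      # XML is not line oriented: join all lines
    "$SED" 's/<.TD><.TR>/'"$DRS"'/g;s/<.TD>/'"$DFS"'/g'  # records first, then fields
}

map_tabledata <<'EOF'
<TABLEDATA>
<TR><TD>Landolt U</TD><TD>0.35</TD><TD> 0.888</TD><TD>2011ApJ...737..103S</TD></TR>
<TR><TD>Landolt B</TD><TD>0.43</TD><TD> 0.743</TD><TD>2011ApJ...737..103S</TD></TR>
</TABLEDATA>
EOF
```

Each output line holds the four fields of one record, separated by tabs, with no empty fifth field.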
$ sed -n '/<TABLEDATA>/,/<.TABLEDATA>/p' "$IF" | sed '1d;$d;s/<TR>//g;s/<TD>//g' | tr -d "\n" | gsed 's/<.TD><.TR>/'"$DRS"'/g;s/<.TD>/'"$DFS"'/g' > data

Here sed '1d;$d' drops the <TABLEDATA> and </TABLEDATA> lines
themselves (assumed to sit on lines of their own), and the "." in
<.TD>, <.TR> and <.TABLEDATA> stands in for the "/" that would otherwise
need escaping. The pair </TD></TR> is mapped to $DRS before the
remaining </TD> are mapped to $DFS; mapping every </TD> first would
leave a stray $DFS, i.e. an empty last column, at the end of each
record.

We are now done:

$ cat hdr data > table

------------------------------------------------------------------------
Appendix A.I. The Announcement
------------------------------------------------------------------------

<VOTABLE version="1.1">
<DEFINITIONS>
<COOSYS ID="J2000" equinox="2000." epoch="2000." system="eq_FK5" />
</DEFINITIONS>

------------------------------------------------------------------------
Appendix A.II. Preamble "RESOURCE"
------------------------------------------------------------------------

<RESOURCE type="results">
<DESCRIPTION>
 Results from query to NASA/IPAC Extragalactic Database (NED),
 which is operated by the Jet Propulsion Laboratory, California
 Institute of Technology, under contract with the National Aeronautics
 and Space Administration.
 This work was (partially) supported by the US National Virtual
 Observatory development project, which is funded by the National
 Science Foundation under cooperative agreement AST0122449 with The
 Johns Hopkins University.
</DESCRIPTION>
<INFO name="QUERY_STATUS" value="OK"/>
<PARAM name="queryDateTime" ucd="time.creation" datatype="char" arraysize="*" value="2021-02-03T16:15:58PST"/>
<LINK content-role="query" content-type="text/xml" href="http://ned.ipac.caltech.edu/cgi-bin/calc?in_csys=Equatorial&in_equinox=J2000.0&obs_epoch=2000.0&lon=03h53m04.32000s&lat=+35d35m30.9984s&pa= 0.0&out_csys=Equatorial&out_equinox=J2000.0&of=xml_main&ext=1"/>
<TABLE ID="NED_ExtinctionCalculateResults" name="Galactic Extinction Calculation Results for given coordinates ">
<DESCRIPTION> Galactic Extinction Calculation Results for given coordinates.</DESCRIPTION>

------------------------------------------------------------------------
Appendix A.III. The Table Caption: "TABLE ID"
------------------------------------------------------------------------

<TABLE ID="NED_ExtinctionCalculateResults" name="Galactic Extinction Calculation Results for given coordinates ">
<DESCRIPTION> Galactic Extinction Calculation Results for given coordinates.</DESCRIPTION>

III. The Header of the Table: DESCRIPTION and TABLE FIELDS

<FIELD ID="gal_extinc_col1" name="Bandpass" ucd="phot.em" datatype="char" arraysize="*">
<DESCRIPTION> Bandpass common name</DESCRIPTION>
</FIELD>

------------------------------------------------------------------------
Appendix A.IV. The Data (finally)
------------------------------------------------------------------------

<DATA>
<TABLEDATA>
<TR><TD>Landolt U</TD><TD>0.35</TD><TD> 0.888</TD><TD>2011ApJ...737..103S</TD></TR>
<TR><TD>Landolt B</TD><TD>0.43</TD><TD> 0.743</TD><TD>2011ApJ...737..103S</TD></TR>
...
<TR><TD>UKIRT K</TD><TD>2.22</TD><TD> 0.073</TD><TD>1998ApJ...500..525S</TD></TR>
<TR><TD>UKIRT L'</TD><TD>3.81</TD><TD> 0.031</TD><TD>1998ApJ...500..525S</TD></TR>
</TABLEDATA>

The four sections are then closed off, innermost first, by

</DATA>
</TABLE>
</RESOURCE>
</VOTABLE>
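Returning to the steps in the main text, the header step (III) and the data step (IV) can be gathered into one function for repeated use. This is a minimal sketch under assumptions of my own rather than the author's: bash, GNU sed (looked up as gsed, then as sed), and a file laid out like the NED excerpt above, with one <FIELD> per line and <TABLEDATA>, </TABLEDATA> on lines of their own. The name vo2table and its optional separator arguments are hypothetical. It ends the header with $DRS and maps </TD></TR> to $DRS before the other </TD> to $DFS, so neither the header nor the records carry a stray trailing separator.

```shell
#!/usr/bin/env bash
# Sketch only: steps III and IV of the text collected into one function.
# Assumes GNU sed, looked up as gsed (usual macOS name) and then as sed.
if command -v gsed >/dev/null 2>&1; then SED=gsed; else SED=sed; fi

# vo2table FILE [DFS] [DRS]   (hypothetical name)
# Writes the header record, then the data records, to stdout.
# Assumes one <FIELD> per line and <TABLEDATA>, </TABLEDATA> on lines of their own.
vo2table() {
  local IF=$1
  local DFS='\t' DRS='\n'          # two-character strings; GNU sed expands them
  if [ $# -ge 2 ]; then DFS=$2; fi
  if [ $# -ge 3 ]; then DRS=$3; fi

  # III. Header: field names joined by $DFS, trailing $DFS replaced by $DRS.
  grep "<FIELD" "$IF" |
    sed 's/^.*name="//;s/".*$//' |
    "$SED" 's/$/'"$DFS"'/' |
    tr -d '\n' |
    "$SED" 's/'"$DFS"'$/'"$DRS"'/'

  # IV. Data: </TD></TR> closes a record, every other </TD> closes a field.
  sed -n '/<TABLEDATA>/,/<.TABLEDATA>/p' "$IF" |
    sed '1d;$d;s/<TR>//g;s/<TD>//g' |
    tr -d '\n' |
    "$SED" 's/<.TD><.TR>/'"$DRS"'/g;s/<.TD>/'"$DFS"'/g'
}
```

With the input file in $IF as in the text, a call would be: vo2table "$IF" > table. A second and third argument, if given, replace the tab and newline defaults.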