------------------------------------------------------------------------
Extracting light curves from a ZTF JSON file
------------------------------------------------------------------------

Lynne Hillenbrand posted this problem: "Given file ZTFS2020j.json how do
I extract light curves?"

The file in question can be found at this website:
http://www.astro.caltech.edu/~srk/SRKUnix/Examples/ZTFS2020j.json

We use the Unix tool "jq" to solve this problem. This tool is widely
described as "the sed for JSON files".

------------------------------------------------------------------------
Reconnaissance
------------------------------------------------------------------------

jq is designed to filter JSON data sequentially. An internal pipe, "|",
passes the output of one stage to the next. The simplest filter is ".",
the identity: it reproduces its input, and by default jq pretty-prints
the output.

   $ jq . ZTFS2020j.json > a

Review file "a" in "vi" or equivalent. Alternatively, or in addition,
open ZTFS2020j.json in Firefox, whose built-in JSON viewer gives an
excellent overview of the file structure (use "collapse" and "expand",
as needed).

Pedagogically, the explicit command to determine the overall structure
of the data file is

   $ jq '. | length,type' a     #number of keywords, and the type

Because a jq filter always acts on its input, the leading ". |" is
redundant and the command simplifies to

   $ jq 'length,type' a
   18                           #18 keywords
   "object"                     #an object, not an array

   $ jq 'keys_unsorted' a       #summary of keywords, in file order
   [
     "_id",
     "zvm_program_id",
     "ra",
     "dec",
     "l",
     "b",
     "coordinates",
     "p",
     "source_types",
     "source_flags",
     "history",
     "labels",
     "xmatch",
     "spec",
     "lc",
     "created_by",
     "created",
     "last_modified"
   ]

------------------------------------------------------------------------
The photometric data
------------------------------------------------------------------------

The light curve data are an array of objects associated with "lc".
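A quick way to see which top-level keys hold arrays, such as "lc", is to print the type of every top-level value. The sketch below runs on a small stand-in file, toy.json (a hypothetical name with made-up values; only the shape mimics ZTFS2020j.json). It also contrasts "keys", which sorts the key names, with "keys_unsorted", which keeps them in file order.

```shell
#!/bin/sh
# Hypothetical stand-in for ZTFS2020j.json: made-up values, same kind of
# top-level object, with the light curves under "lc".
cat > toy.json <<'EOF'
{"_id": "toy", "ra": 314.5451081, "dec": 43.8856188,
 "lc": [{"filter": 2, "data": [{"hjd": 2458335.86724, "mag": 15.664, "magerr": 0.015}]}]}
EOF

jq 'length, type' toy.json        # 4, then "object"
jq -c 'keys_unsorted' toy.json    # ["_id","ra","dec","lc"]  (file order)
jq -c 'keys' toy.json             # ["_id","dec","lc","ra"]  (sorted)

# One "key<TAB>type" line per top-level key; "lc" is reported as array
jq -r 'to_entries[] | "\(.key)\t\(.value | type)"' toy.json
```

Run on file "a", the last command should list all 18 keywords with their types, which shows at a glance which of them are arrays.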
Let us get the vital statistics of "lc":

   $ jq '.lc | type, length' a   #so lc is an array of three elements
   "array"
   3

   $ jq '.lc[0]' a               #let us review the first element
   {
     "_id": "jo1tkiex4sr41nr1t9cu2q5u",
     "telescope": "PO:1.2m",
     "instrument": "ZTF",
     "release": "ZTF_sources_20191101",
     "id": 11768202002814,
     "filter": 2,
     "lc_type": "temporal",
     "data": [
       {
         "catflags": 0,
         "chi": 0.779,
         "dec": 43.8856188,
         "expid": 58136421,
         "hjd": 2458335.86724,
         "mag": 15.664,
         "magerr": 0.015,
         "programid": 2,
         "ra": 314.5451081,
         "sharp": -0.032,
         "uexpid": 11768202058136420
       },
       ....
       ....
       ....
       {
         "catflags": 0,
         "chi": 1.419,
         "dec": 43.8856296,
         "expid": 58040686,
         "hjd": 2458334.90986,
         "mag": 15.701,
         "magerr": 0.015,
         "programid": 2,
         "ra": 314.5450891,
         "sharp": -0.035,
         "uexpid": 11768202058040686
       }
     ]
   }

From an inspection of the above we conclude that the photometric data
are in an array called "data".

   $ jq '.lc[].data | length' a  #there are 77, 36 and 218 epochs
   77
   36
   218

Using grep, I extracted the relevant summary lines for each data set.
The three data sets differ in filter ("filter": 2 for the first, 1 for
the other two) and in source "id".

   $ jq '.lc' a | grep -B5 "filter"
       "_id": "jo1tkiex4sr41nr1t9cu2q5u",   #first data set (index=0)
       "telescope": "PO:1.2m",
       "instrument": "ZTF",
       "release": "ZTF_sources_20191101",
       "id": 11768202002814,
       "filter": 2,
   --
       "_id": "kjq6u5ch9sxv9rrdntwiato2",   #second data set
       "telescope": "PO:1.2m",
       "instrument": "ZTF",
       "release": "ZTF_sources_20191101",
       "id": 11768201001469,
       "filter": 1,
   --
       "_id": "w9614x1x03z8z7y67yr25f2g",   #third data set
       "telescope": "PO:1.2m",
       "instrument": "ZTF",
       "release": "ZTF_sources_20191101",
       "id": 10730521008859,
       "filter": 1,

------------------------------------------------------------------------
Extracting the light curves
------------------------------------------------------------------------

Let us agree to extract only the time ("hjd"), "mag" and "magerr".
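The extraction commands that follow rely on two mechanisms: the jq comma operator emits .hjd, .mag and .magerr as three separate output lines, and "xargs -n 3", whose default command is echo, joins every three tokens into one line. A minimal sketch using a hypothetical file, toy_data.json, that holds the first and last epochs of the .lc[0] listing above:

```shell
#!/bin/sh
# Hypothetical file: two epochs copied from the .lc[0] listing above
cat > toy_data.json <<'EOF'
{"data": [
  {"hjd": 2458335.86724, "mag": 15.664, "magerr": 0.015},
  {"hjd": 2458334.90986, "mag": 15.701, "magerr": 0.015}]}
EOF

# The comma operator: six output lines, one value per line
jq '.data[] | .hjd, .mag, .magerr' toy_data.json

# xargs -n 3 runs echo on each group of three tokens: one epoch per line
jq '.data[] | .hjd, .mag, .magerr' toy_data.json | xargs -n 3
```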
   $ jq '.lc[0] | .data[] | .hjd,.mag,.magerr' a | xargs -n 3 > 0.dat
   $ jq '.lc[1] | .data[] | .hjd,.mag,.magerr' a | xargs -n 3 > 1.dat
   $ jq '.lc[2] | .data[] | .hjd,.mag,.magerr' a | xargs -n 3 > 2.dat

On the other hand, here is an "all jq" solution. The map(tostring) step
converts the numbers to strings, because join in older releases of jq
(1.5) accepts only strings:

   $ jq -r '.lc[0] | .data[] | [.hjd,.mag,.magerr] | map(tostring) | join(" ")' a > 0.dat

You can save typing with this one-liner:

   $ for ((i=0; i<$(jq '.lc|length' a); i++)); do jq ".lc[$i]|.data[]|.hjd,.mag,.magerr" a | xargs -n 3 > $i.dat; done
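Two refinements may be worth considering; the sketch below is one possible approach, not the method used above. First, the epochs need not be in time order: in the .lc[0] listing the last record (hjd 2458334.90986) is earlier than the first (hjd 2458335.86724), so sort_by(.hjd) is applied before writing. Second, the filter code is recorded in each file name; the lc<i>_filter<f>.dat naming is hypothetical. The index is passed with --argjson instead of being interpolated into the filter text, and @tsv writes tab-separated columns. The input, toy_lc.json, is a hypothetical stand-in: its first two epochs are copied from the listing, the third is made up.

```shell
#!/bin/sh
# Hypothetical stand-in: lc[0] reuses the two epochs listed above (out of
# time order, as in the listing); lc[1] is made up.
cat > toy_lc.json <<'EOF'
{"lc": [
  {"filter": 2, "data": [
    {"hjd": 2458335.86724, "mag": 15.664, "magerr": 0.015},
    {"hjd": 2458334.90986, "mag": 15.701, "magerr": 0.015}]},
  {"filter": 1, "data": [
    {"hjd": 2458336.5, "mag": 16.25, "magerr": 0.021}]}
]}
EOF

# Per-set summary: filter code and number of epochs
jq -c '.lc[] | {filter, epochs: (.data | length)}' toy_lc.json

n=$(jq '.lc | length' toy_lc.json)
i=0
while [ "$i" -lt "$n" ]; do
  f=$(jq --argjson i "$i" '.lc[$i].filter' toy_lc.json)   # filter code of set i
  # Sort epochs by hjd, then write one "hjd<TAB>mag<TAB>magerr" line each
  jq -r --argjson i "$i" \
    '.lc[$i].data | sort_by(.hjd) | .[] | [.hjd, .mag, .magerr] | @tsv' \
    toy_lc.json > "lc${i}_filter${f}.dat"
  i=$((i + 1))
done
```

Applied to file "a", the same loop should produce one time-ordered file per data set, with the filter code visible in each name.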