------------------------------------------------------------------------
Extracting light curves from a ZTF JSON file
------------------------------------------------------------------------

Lynne Hillenbrand posted this problem: 
   "Given file ZTFS2020j.json how do I extract light curves?"  
[The file in question can be found at this website:]
http://www.astro.caltech.edu/~srk/SRKUnix/Examples/ZTFS2020j.json

We use the Unix tool "jq" to solve this problem. This tool is widely
described as "the sed for JSON files".


------------------------------------------------------------------------
Reconnaissance
------------------------------------------------------------------------

jq is designed to filter JSON data in sequential stages. An internal
pipe, "|", passes the output of one stage to the next. The "." is the
simplest filter: it reproduces the input unchanged, and by default the
output is pretty-printed.
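A minimal sketch of "." and "|", assuming jq is on your PATH. The tiny JSON string is made up for illustration and only mimics the shape of ZTFS2020j.json.

```shell
#!/bin/sh
set -eu

# Toy input (hypothetical values), shaped loosely like the ZTF file.
json='{"lc":[{"filter":2,"data":[{"mag":15.5}]}]}'

# "." passes the input through unchanged; the output is pretty-printed.
echo "$json" | jq .

# -c (compact output) turns pretty printing off.
echo "$json" | jq -c .

# "|" feeds each stage into the next: .lc gives the array, .[0] its
# first element, .filter the value stored under that key. Prints 2.
echo "$json" | jq '.lc | .[0] | .filter'
```

On the real file the same chain, `jq '.lc | .[0] | .filter' a`, should print 2, the filter of the first data set.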

$ jq . ZTFS2020j.json > a

Review file "a" in "vi" or an equivalent editor. Alternatively, or in
addition, open ZTFS2020j.json in Firefox, whose built-in JSON viewer
gives an excellent overview of the file structure (use "collapse" and
"expand" as needed).

Pedagogically, the correct command to determine the overall structure
of the data file is

$ jq '. | length,type' a          #number of top-level keywords, and the type

$ jq 'length,type' a              #"." is the default input, so ". |" can be dropped
18                                #18 keywords
"object"                          #a JSON object (key/value pairs), not an array

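A sketch of what "length" and "type" report, on a made-up three-key object (an assumed stand-in, not the real file): for an object, length counts keys; for an array, it counts elements.

```shell
#!/bin/sh
set -eu

# Hypothetical stand-in for the top level of the file.
obj='{"ra":314.5,"dec":43.9,"lc":[{},{},{}]}'

# Object: length is the number of keys. Prints 3, then "object".
echo "$obj" | jq 'length, type'

# The leading ". |" is implicit, so this prints the same two lines.
echo "$obj" | jq '. | length, type'

# Array: length is the number of elements. Prints 3, then "array".
echo "$obj" | jq '.lc | length, type'
```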
$ jq 'keys_unsorted' a            #summary of keywords, in file order
[
  "_id",
  "zvm_program_id",
  "ra",
  "dec",
  "l",
  "b",
  "coordinates",
  "p",
  "source_types",
  "source_flags",
  "history",
  "labels",
  "xmatch",
  "spec",
  "lc",
  "created_by",
  "created",
  "last_modified"
]
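The difference between "keys" and "keys_unsorted" is easiest to see on a small example. A sketch on a made-up object (hypothetical input): "keys" returns the key names sorted, while "keys_unsorted" keeps them in the order they appear in the input.

```shell
#!/bin/sh
set -eu

# Hypothetical three-key object.
obj='{"ra":314.5,"dec":43.9,"lc":[]}'

echo "$obj" | jq -c 'keys'            # ["dec","lc","ra"]
echo "$obj" | jq -c 'keys_unsorted'   # ["ra","dec","lc"]

# Note: keys-unsorted (hyphen) parses as "keys - unsorted" and is
# rejected, because jq defines no filter called "unsorted".
```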

------------------------------------------------------------------------
The photometric data
------------------------------------------------------------------------

The light curve data are stored under "lc", an array of objects.
Let us get the vital statistics of "lc":

$ jq '.lc | type, length' a      #so lc is an array of three objects
"array"
3
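To confirm that the elements of "lc" are objects (each with its own metadata and "data" array) rather than arrays, ask jq for the type of an element. A sketch on a hypothetical, heavily trimmed input:

```shell
#!/bin/sh
set -eu

# Hypothetical input: three light curves, each an object.
json='{"lc":[{"filter":2,"data":[]},{"filter":1,"data":[]},{"filter":1,"data":[]}]}'

echo "$json" | jq '.lc | type, length'   # "array", then 3
echo "$json" | jq '.lc[0] | type'        # "object"
echo "$json" | jq '.lc[] | type'         # "object", once per element
```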

$ jq '.lc[0]' a                  #let us review the first element

{
  "_id": "jo1tkiex4sr41nr1t9cu2q5u",
  "telescope": "PO:1.2m",
  "instrument": "ZTF",
  "release": "ZTF_sources_20191101",
  "id": 11768202002814,
  "filter": 2,
  "lc_type": "temporal",
  "data": [
    {
      "catflags": 0,
      "chi": 0.779,
      "dec": 43.8856188,
      "expid": 58136421,
      "hjd": 2458335.86724,
      "mag": 15.664,
      "magerr": 0.015,
      "programid": 2,
      "ra": 314.5451081,
      "sharp": -0.032,
      "uexpid": 11768202058136420
    },
    ....
    ....
    ....
    {
      "catflags": 0,
      "chi": 1.419,
      "dec": 43.8856296,
      "expid": 58040686,
      "hjd": 2458334.90986,
      "mag": 15.701,
      "magerr": 0.015,
      "programid": 2,
      "ra": 314.5450891,
      "sharp": -0.035,
      "uexpid": 11768202058040686
    }
  ]
}

From an inspection of the above, we conclude that each element of "lc"
keeps its photometric data in an array called "data".

$ jq '.lc[].data | length' a     #there are 77, 36 and 218 epochs
77
36
218
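The "[]" in ".lc[].data" iterates over the array, so every later stage runs once per light curve. Wrapping the expression in "[...]" collects the results into one array, and "add" sums them. A sketch on hypothetical input with 2, 1 and 3 epochs:

```shell
#!/bin/sh
set -eu

# Hypothetical input: three light curves with 2, 1 and 3 epochs.
json='{"lc":[{"data":[{},{}]},{"data":[{}]},{"data":[{},{},{}]}]}'

echo "$json" | jq '.lc[].data | length'           # 2, 1, 3 on separate lines
echo "$json" | jq -c '[.lc[].data | length]'      # [2,1,3]
echo "$json" | jq '[.lc[].data | length] | add'   # 6
```

On file "a", the last form should print 331 (77 + 36 + 218).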

Using grep, I extracted the relevant summary lines for each dataset.
Clearly, the filter and the source "id" distinguish the three datasets.

$ jq '.lc' a | grep -B5 "filter"
    "_id": "jo1tkiex4sr41nr1t9cu2q5u",       #first data set (index=0)
    "telescope": "PO:1.2m",
    "instrument": "ZTF",
    "release": "ZTF_sources_20191101",
    "id": 11768202002814,
    "filter": 2,
--
    "_id": "kjq6u5ch9sxv9rrdntwiato2",       #second data set
    "telescope": "PO:1.2m",
    "instrument": "ZTF",
    "release": "ZTF_sources_20191101",
    "id": 11768201001469,
    "filter": 1,
--
    "_id": "w9614x1x03z8z7y67yr25f2g",      #third data set
    "telescope": "PO:1.2m",
    "instrument": "ZTF",
    "release": "ZTF_sources_20191101",
    "id": 10730521008859,
    "filter": 1,

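The grep step can also be done in jq itself. A sketch, assuming a hypothetical input with made-up ids: the object constructor {id, filter} is shorthand for {id: .id, filter: .filter}, and "select" keeps only the light-curve indices taken in a given filter.

```shell
#!/bin/sh
set -eu

# Hypothetical input with made-up ids.
json='{"lc":[{"id":101,"filter":2,"data":[{},{}]},{"id":102,"filter":1,"data":[{}]}]}'

# One compact summary line per light curve, including its epoch count.
echo "$json" | jq -c '.lc[] | {id, filter, n: (.data | length)}'
# {"id":101,"filter":2,"n":2}
# {"id":102,"filter":1,"n":1}

# Indices of the light curves taken in filter 1 (here only index 1).
echo "$json" | jq '.lc | range(length) as $i | select(.[$i].filter == 1) | $i'
```

With the real file, `jq -c '.lc[] | {id, filter, n: (.data | length)}' a` gives one line per data set.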

------------------------------------------------------------------------
Extracting the light curves
------------------------------------------------------------------------

Let us agree to extract only the time ("hjd"), "mag" and "magerr".

$ jq '.lc[0] | .data[] | .hjd,.mag,.magerr' a | xargs -n 3 > 0.dat
$ jq '.lc[1] | .data[] | .hjd,.mag,.magerr' a | xargs -n 3 > 1.dat
$ jq '.lc[2] | .data[] | .hjd,.mag,.magerr' a | xargs -n 3 > 2.dat
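What the two stages of these pipelines do, sketched on hypothetical input: jq prints hjd, mag and magerr of every epoch on separate lines, and "xargs -n 3" passes three values at a time to its default command, echo, which joins them into one row per epoch. A missing field would print as "null", so the rows stay aligned.

```shell
#!/bin/sh
set -eu

# Hypothetical input: one light curve with two epochs.
json='{"lc":[{"data":[{"hjd":2458335.5,"mag":15.5,"magerr":0.02},{"hjd":2458336.5,"mag":15.25,"magerr":0.03}]}]}'

# Stage 1: six lines, one value each.
echo "$json" | jq '.lc[0] | .data[] | .hjd, .mag, .magerr'

# Stage 2: regrouped into rows of three.
echo "$json" | jq '.lc[0] | .data[] | .hjd, .mag, .magerr' | xargs -n 3
# 2458335.5 15.5 0.02
# 2458336.5 15.25 0.03
```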

Alternatively, an "all jq" solution (join converts numbers to strings
in jq 1.6 and later):
$ jq -r '.lc[0] | .data[]|[.hjd,.mag,.magerr] | join(" ")' a > 0.dat
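Two variants of the all-jq approach, sketched on a hypothetical one-epoch input. Adding "map(tostring)" converts the numbers to strings explicitly before "join"; "@tsv" writes tab-separated columns instead, which awk or sort read directly. Both need -r so the row is printed without JSON quotes.

```shell
#!/bin/sh
set -eu

# Hypothetical input: a single epoch.
json='{"lc":[{"data":[{"hjd":2458335.5,"mag":15.5,"magerr":0.02}]}]}'

# Space-separated row, with an explicit number-to-string conversion.
echo "$json" | jq -r '.lc[0] | .data[] | [.hjd, .mag, .magerr] | map(tostring) | join(" ")'

# Tab-separated row.
echo "$json" | jq -r '.lc[0] | .data[] | [.hjd, .mag, .magerr] | @tsv'
```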

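You can save typing with this bash one-liner, which loops over every
element of "lc". The jq filter is in double quotes so that the shell
substitutes $i:

$ for ((i=0; i<$(jq '.lc|length' a); i++)); do jq ".lc[$i]|.data[]|.hjd,.mag,.magerr" a | xargs -n 3 > $i.dat; done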
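The loop runs jq once per light curve. A single pass is also possible; in this sketch (hypothetical input, run in a scratch directory) jq prefixes every row with the index of its light curve, and awk routes each row to the file named after that index, producing the same 0.dat, 1.dat, ... layout. With the real file, replace the echo with jq reading "a".

```shell
#!/bin/sh
set -eu

# Work in a scratch directory so no existing *.dat files are touched.
cd "$(mktemp -d)"

# Hypothetical input: light curve 0 has one epoch, light curve 1 has two.
json='{"lc":[{"data":[{"hjd":1.5,"mag":15.5,"magerr":0.02}]},
             {"data":[{"hjd":2.5,"mag":16.25,"magerr":0.03},
                      {"hjd":3.5,"mag":16.5,"magerr":0.04}]}]}'

# jq: emit "index hjd mag magerr" for every epoch of every light curve.
# awk: drop the index column and write the row to <index>.dat (each file
# is truncated on first use, then appended to for the rest of the run).
echo "$json" |
  jq -r '.lc | range(length) as $i | .[$i].data[]
         | "\($i) \(.hjd) \(.mag) \(.magerr)"' |
  awk '{ print $2, $3, $4 > ($1 ".dat") }'

cat 0.dat   # 1.5 15.5 0.02
cat 1.dat   # 2.5 16.25 0.03, then 3.5 16.5 0.04
```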