------------------------------------------------------------------------
jq ... read and manipulate JSON files
------------------------------------------------------------------------

jq allows the user to read, filter and otherwise manipulate
JSON (JavaScript Object Notation) files. The simplest use of jq is
to display JSON files in a pretty format. jq allows sequential
filtering of the data.


JSON structures are built up from keyword:value pairs. The values
can be strings (" "), numbers (e.g. 1 or 1E1 or 0.1), booleans
(true|false), arrays ([0,1,2]), objects or null. An "object" is a
collection of keyword:value pairs and is enclosed in curly brackets, {}.
Arrays are enclosed in square brackets, []. Objects can be nested
within objects and arrays can contain objects and so on.

JSON is widely used. Here is an example that is both an interesting
exercise and an excuse for procrastination.

$ curl -s http://api.open-notify.org/iss-now.json 
{"timestamp": 1582870593, "message": "success", "iss_position": {"latitude": "-3.3855", "longitude": "103.3779"}}

This is not so easy to read. jq comes to the rescue
$ curl -s http://api.open-notify.org/iss-now.json | jq .    #-s is silent
{
  "timestamp": 1582870636,
  "message": "success",
  "iss_position": {
    "latitude": "-5.5372",
    "longitude": "104.9153"
  }
}
The "." is the simplest filter: like echo, the output is the input. By default
the output is "pretty". If you do not want a pretty output then

$ jq -c . filename   #-c is for "compact"

You can extract only the latitude of ISS
$ curl -s http://api.open-notify.org/iss-now.json | jq .iss_position.latitude
"-5.5372"

For future use, let us save the "prettified" data as a JSON file
$ curl -s http://api.open-notify.org/iss-now.json | jq . | tee  iss.json
{
  "iss_position": {
    "latitude": "6.0713",
    "longitude": "-109.8659"
  },
  "timestamp": 1583004166,
  "message": "success"
}

Source material:
https://www.howtogeek.com/529219/how-to-parse-json-files-on-the-linux-command-line-with-jq/

https://programminghistorian.org/en/lessons/json-and-jq
[This is a good site for other tutorials]

------------------------------------------------------------------------
A useful application of jq
------------------------------------------------------------------------

It is easy to URL-encode something. Here we want to percent-encode
all reserved characters (as for a Uniform Resource Identifier or "URI",
which is a superset of URL).

$ date
Mon Mar 23 14:47:49 PDT 2020

$ date | jq -srR @uri
Mon%20Mar%2023%2014%3A47%3A19%20PDT%202020%0A

Other @foo possibilities are @csv (for arrays), @text, @json, @base64,
@sh (for shell), @html (convert <>&'" to &lt; &gt; &amp; &#39; &quot;)
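
As a small sketch of two of these formats (the sample strings are made
up for illustration), @base64 encodes a string and @html escapes markup
characters:

```shell
# Sample inputs are illustrative, not taken from any real file.
b64=$(echo '"hello"' | jq -r '@base64')     # base64-encode a JSON string
html=$(echo '"<a & b>"' | jq -r '@html')    # escape <, > and & as entities
echo "$b64"     # aGVsbG8=
echo "$html"    # &lt;a &amp; b&gt;
```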

------------------------------------------------------------------------
I  Invoking jq
------------------------------------------------------------------------

$ jq -options 'filter' file1 file2...  #files can be "-" or stream from a pipe
	#some of the options are
-c  "compact output" [\nl and \sp are stripped out]
-r  "raw output" [if the output is a pure string then "" are stripped out]
-j  "join" [like -r but jq will not print \nl after each output]
-S  Output the fields of each object with the keys in sorted order
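
A quick sketch of how these options combine, using made-up inputs.
Short options can be bundled (here -cS), just as -srR is bundled
elsewhere in these notes:

```shell
# -c gives one compact line, -S sorts the keys (sample object is made up)
sorted=$(echo '{"b":1,"a":2}' | jq -cS .)
echo "$sorted"      # {"a":2,"b":1}

# -j prints raw strings with no newline between successive outputs
joined=$(echo '["ab","cd"]' | jq -j '.[]')
echo "$joined"      # abcd
```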

In general, you want to use single quotes around the filter commands.
If you do use double quotes then to include double quotes you need to
escape those incantations (\").

$ echo '{"a":1,"b":5}'		#using single quotes
{"a":1,"b":5}

$ echo "{\"a\":1,\"b\":5}"      #using double quotes
{"a":1,"b":5}
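
When the value lives in a shell variable, one way to avoid the escaping
altogether is jq's --arg option, which binds a string to a jq variable so
that the filter can stay inside single quotes. A minimal sketch (the
names "who" and "$w" are arbitrary choices for this example):

```shell
who="Shri"
# -n: read no input; --arg w "$who" makes the string available as $w
obj=$(jq -cn --arg w "$who" '{Name: $w}')
echo "$obj"     # {"Name":"Shri"}
```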

------------------------------------------------------------------------
IIa Simple JSON record
------------------------------------------------------------------------

Let us construct a simple JSON structure.

$ cat in1.json       
{"Name":"Shri","Surname":"Kulkarni","job":"Astronomer","pets":"rabbits"}

JSON ignores whitespace (\s, \t, \n) between tokens. JSON also has
no comment syntax; a JSON file is supposed to be self-explanatory.


$ jq . in1.json		#pretty print
{
  "Name": "Shri",
  "Surname": "Kulkarni",
  "job": "Astronomer",
  "pets": "rabbits"
}

Separately, you can use Firefox to display a JSON file and it will
display it as "pretty print". This is sometimes helpful when you
are dealing with files on the web.

$ open -a Firefox in1.json   #will force Firefox to open in1.json

$ jq .Name in1.json             #extract a value associated with a given key
"Shri"

$ jq .name in1.json             #null value is returned for non-existent keys
null                  

You can get multiple values by using a comma-separated list.
$ jq .Name,.Surname in1.json
"Shri"
"Kulkarni"

However, some characters (e.g. space, [, ], etc.) are interpreted
by the shell, so you need to protect the filter with quotes. You can
use " or '. The latter is stricter (the shell leaves everything inside
untouched) and is generally preferred.

$ jq '.Name, .Surname' in1.json  #quoting allows use of space (clarity)
"Shri"
"Kulkarni"

------------------------------------------------------------------------
IIb. Types of values
------------------------------------------------------------------------

JSON has six types of values: null, booleans, numbers, strings,
conventional arrays and associative arrays ("hashes" in Unix,
"objects" in jq).

$ echo 'false"true"[]1{}null' | jq type   #type of each member
"boolean"
"string"
"array"
"number"
"object"
"null"

$ echo '13"ab"[]{}null' | jq length
13
2
0
0
0

$ echo '13"ab"[]{}null' | jq 'length, type'
13
"number"
2
"string"
0
"array"
0
"object"
0
"null"

------------------------------------------------------------------------
III Object Identifier-Index
------------------------------------------------------------------------
The two examples discussed so far ("iss.json" and "in1.json") provide
sufficient background to discuss how to read or index into objects.

$ jq . iss.json
{
  "message": "success",
  "timestamp": 1584150282,
  "iss_position": {
    "latitude": "32.9774",
    "longitude": "-68.3173"
  }
}

$ jq .iss_position iss.json
{
  "latitude": "32.9774",
  "longitude": "-68.3173"
}

This is exactly equivalent to
$ jq '. | .iss_position' iss.json
where "|" is jq's pipe, the analogue of the Unix pipe symbol: the output
of one filter feeds the next.

$ jq '.iss_position.latitude' iss.json
$ jq '. | .iss_position | .latitude' iss.json
"32.9774"

Next, if your subsequent program wants the bare value (and should not
see latitude as a quoted string)

$ jq -r .iss_position.latitude iss.json   #raw output (strip ")
32.9774

If you wanted the pair for the next program 
$ jq -r '.iss_position.longitude,.iss_position.latitude' iss.json
-68.3173
32.9774

If you wanted the pair as an array then you surround the pair by
[ .. ]
$ jq -r '[.iss_position.longitude,.iss_position.latitude]' iss.json 
[
  "-68.3173",
  "32.9774"
]

------------------------------------------------------------------------
IIIa Arrays
------------------------------------------------------------------------

Summary: If counting up, the first element is index 0, the next
one is 1 and so on. Equally, you can count down from the last element,
whose index is -1; the one before it is -2 and so on. To extract
elements from an initial index to the end of the array, do not specify
the upper index. You can specify a range, but the upper value has to
be one more than the last index you want!

Now let us add an array element to the JSON structure

$ cat in2.json 
{"Name":"Shri","Surname":"Kulkarni","job":"Astronomer",
"pets":["parvi","lakshmi","sarsi","rusty","malli","kuro"]}

$ jq . in2.json    #pretty print it
{
  "Name": "Shri",
  "Surname": "Kulkarni",
  "job": "Astronomer",
  "pets": [
    "parvi",
    "lakshmi",
    "sarsi",
    "rusty",
    "malli",
    "kuro"
  ]
}


$ jq '.pets' in2.json   #this is an array of strings
[
  "parvi",
  "lakshmi",
  "sarsi",
  "rusty",
  "malli",
  "kuro"
]

$ jq '.pets[0]' in2.json    #we need quotes for "[" and "]" to escape the shell
"parvi"

$ jq '.pets[-2]' in2.json   #can index from the end of the array
"malli"

$ jq '.pets[0,3]' in2.json   #extract index 0 and 3
"parvi"
"rusty"

$ jq '.pets[0:3]' in2.json   #extract a range of 0,1,2 (3 is an upper limit!)
[
  "parvi",
  "lakshmi",
  "sarsi"
]

$ jq '.pets[0:10]' in2.json    #no penalty if upper limit exceeds the array length
[
  "parvi",
  "lakshmi",
  "sarsi",
  "rusty",
  "malli",
  "kuro"
]

$ jq '.pets[2:]' in2.json   # display from index 2 through to the end
[
  "sarsi",
  "rusty",
  "malli",
  "kuro"
]

$ jq '.pets[-1]' in2.json
"kuro"

$ jq '.pets[:-1]' in2.json   #everything except the last element
[
  "parvi",
  "lakshmi",
  "sarsi",
  "rusty",
  "malli"
]


Now let us consider a different example: astronauts on ISS

$ curl -s http://api.open-notify.org/astros.json | jq . | tee astro.json
{
  "people": [
    {
      "craft": "ISS",
      "name": "Andrew Morgan"
    },
    {
      "craft": "ISS",
      "name": "Oleg Skripochka"
    },
    {
      "craft": "ISS",
      "name": "Jessica Meir"
    }
  ],
  "message": "success",
  "number": 3
}


$ jq 'length, type' astro.json   #top-level object has three keys
3
"object"

$ jq 'keys' astro.json     #keys of the top-level object
[
  "message",
  "number",
  "people"
]

people is an array of objects, each of which has two keys, "craft" and "name".

$ jq '.people[2].name' astro.json     #specify one index
"Jessica Meir"

$ jq '.people[0,1,2].name' astro.json   #specify multiple indices
"Andrew Morgan"
"Oleg Skripochka"
"Jessica Meir"

------------------------------------------------------------------------
IIIb Array/Object Value Iterator: .[]
------------------------------------------------------------------------
The construct ".[]" returns all the elements of an array or all the values
of an object.

$ echo "[1,5,3]" | jq '.[]'
1
5
3

$ echo "[1,5,3]" | jq '.[] | (.*. + 3)'    #parenthesis is a grouping operator. 
4
28
12


$ echo '{"a":1,"b":5}' | jq .		
{
  "a": 1,
  "b": 5
}

$ echo '{"a":1,"b":5}' | jq '.[]'
1
5

$ echo '{"a":"hello","b":"kitty"}' | jq .
{
  "a": "hello",
  "b": "kitty"
}

$ echo '{"a":"hello","b":"kitty"}' | jq '.[]'
"hello"
"kitty"


$ jq '.people[].name' astro.json  
"Andrew Morgan"
"Oleg Skripochka"
"Jessica Meir"

Note that the above construct is exactly equal to these two constructs
$ jq '.people[] .name ' astro.json   
$ jq '.people[] | .name ' astro.json   

If you want to extract a range, note that a slice such as .people[0:2]
returns an array. You first unwrap that array with ".[]" and then
filter on "name".

$ jq '.people[0:2] | .[] | .name' astro.json
"Andrew Morgan"
"Oleg Skripochka"


To peek at nested values you need to specify the path properly

$ jq '.iss_position' iss.json  
{
  "latitude": "-6.3152",
  "longitude": "-101.0441"
}

$ jq '.iss_position.latitude' iss.json
$ jq '.iss_position .latitude' iss.json
$ jq '.iss_position | .latitude' iss.json
"-6.3152"

$ jq '.iss_position[]' iss.json     #gets rid of the keywords
"-6.3152"
"-101.0441"

But, having "" around numbers is not helpful if you wish to feed this
to an analysis program.

$ jq -r '.iss_position[]' iss.json    #-r is for "raw-output"
-6.3152
-101.0441
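
If the next program needs true JSON numbers rather than raw text, jq's
tonumber function converts a numeric string into a number. A small
sketch, with made-up coordinates in the same shape as iss.json:

```shell
# Made-up sample in the same shape as iss.json
iss='{"iss_position":{"latitude":"32.5","longitude":"-68.25"}}'
# iterate over the values, convert each string, collect into an array
nums=$(echo "$iss" | jq -c '[.iss_position[] | tonumber]')
echo "$nums"    # [32.5,-68.25]
```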

------------------------------------------------------------------------
IV Elementary Functions (type, length, keys, del, has, select)
------------------------------------------------------------------------

$ jq 'keys' iss.json           #default is that keys are lexically sorted
[
  "iss_position",
  "message",
  "timestamp"
]

$ jq 'keys_unsorted' iss.json      #no sorting (as it is)
[
  "iss_position",
  "timestamp",
  "message"
]

$ jq 'keys | .[]' in1.json	#extracts the keys as a simple list
"Name"
"Surname"
"job"
"pets"

$ jq -r 'keys|.[]' in1.json     #raw output 
Name
Surname
job
pets

$ jq 'del(.job)' in1.json       #deletes specified keyword:value pair
{
  "Name": "Shri",
  "Surname": "Kulkarni",
  "pets": "rabbits"
} 

$ jq '. | has("Name")' in1.json    #see if field "Name" exists
true

$ jq '.|select(.Name=="Shri")' in1.json
{
  "Name": "Shri",
  "Surname": "Kulkarni",
  "job": "Astronomer",
  "pets": "rabbits"
}

$ jq '.|select(.Name=="shri")' in1.json 
          #no output since the name is "Shri" and not "shri"

However, there is a catch!

$ jq '.|select(.Name="shri")' in1.json   #produces a result!!!
{
  "Name": "Shri",
  "Surname": "Kulkarni",
  "job": "Astronomer",
  "pets": "rabbits"
}

The reason is that the stuff in (...) is evaluated for truthiness.
.Name="shri" is not an equality test (that needs "=="); it is an
assignment, which returns the modified object. Any value other than
false or null counts as true, so select() passes the input through.
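
To see what select() is actually testing, it can help to evaluate the
two expressions on their own. In jq, "=" is the assignment operator and
yields the modified object, whereas "==" yields a Boolean. A minimal
sketch using a cut-down record:

```shell
rec='{"Name":"Shri"}'
assigned=$(echo "$rec" | jq -c '.Name = "shri"')    # assignment: yields an object
compared=$(echo "$rec" | jq -c '.Name == "shri"')   # comparison: yields a Boolean
echo "$assigned"    # {"Name":"shri"}
echo "$compared"    # false
```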

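Other functions:
	has(key)	#input is an object (or array); is the key (or index) present?
	in(object)	#inverse of has: input is the key, argument is the object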
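
A short sketch of the two functions side by side, with made-up inputs.
Note that for in() the key is the input and the object is the argument:

```shell
# has(key): the input is the object, the argument is the key
h=$(echo '{"Name":"Shri"}' | jq 'has("Name")')
# in(object): the input is the key, the argument is the object
i=$(echo '"job"' | jq 'in({"Name":"Shri"})')
echo "$h"   # true
echo "$i"   # false
```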


------------------------------------------------------------------------
Manipulating & Editing JSON structures
------------------------------------------------------------------------
You can create new structures from a JSON file. For instance, in the
iss.json file the "message" keyword is of little use.

$ jq "[.iss_position.latitude, .iss_position.longitude, .timestamp]" iss.json
[
  "-6.3152",
  "-101.0441",
  1583004411
]
We provided "[" and "]" so that the output is now a properly formed array.

$ jq "[.iss_position.latitude, .iss_position.longitude, .timestamp-1570000000]" iss.json 
[
  "-6.3152",
  "-101.0441",
  13004411
]
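
In the same way, surrounding the list with { } and naming each entry
builds a new object instead of an array. A sketch, assuming a copy of
the iss.json record inlined as a shell variable; the key names lat, lon
and t are arbitrary:

```shell
iss='{"iss_position":{"latitude":"-6.3152","longitude":"-101.0441"},"timestamp":1583004411,"message":"success"}'
# build a flat object; "message" is simply left out
pos=$(echo "$iss" | jq -c '{lat: .iss_position.latitude, lon: .iss_position.longitude, t: .timestamp}')
echo "$pos"     # {"lat":"-6.3152","lon":"-101.0441","t":1583004411}
```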


You can also delete a keyword:value pair.
$ jq 'del(.message)' iss.json 
{
  "iss_position": {
    "latitude": "-6.3152",
    "longitude": "-101.0441"
  },
  "timestamp": 1583004411
}

------------------------------------------------------------------------
Complex JSON objects
------------------------------------------------------------------------

We call upon a NASA site which stores information on meteor impact sites
around the world.

$ curl -s https://data.nasa.gov/resource/y77d-th95.json | jq . > strikes.json

Inspection of the file shows the file starts with a "[" and thus it is an array.

$ jq 'type,length' strikes.json
"array"
1000

Apparently, there are 1,000 elements within this array.
$ jq '.[0]' strikes.json
{
  "name": "Aachen",
  "id": "1",
  "nametype": "Valid",
  "recclass": "L5",
  "mass": "21",
  "fall": "Fell",
  "year": "1880-01-01T00:00:00.000",
  "reclat": "50.775000",
  "reclong": "6.083330",
  "geolocation": {
    "type": "Point",
    "coordinates": [
      6.08333,
      50.775
    ]
  }
}

$ jq '.[995].name' strikes.json    #name of place for entry 995
"Tirupati"
	or equivalently 
$ jq '.[995]| .name ' strikes.json
"Tirupati"

$ jq '.[] | .name' strikes.json   #list of places with meteor strikes

You can extract parts of the name
$ jq '.[995].name[0:1] ' strikes.json
"T"
$ jq '.[].name[0:1]' strikes.json    #then pipe to sort and uniq -c !
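
A sketch of that first-letter tally on a three-record stand-in for
strikes.json (the inline array is an assumption, to keep the example
self-contained). The awk step only normalizes the padding that uniq -c
puts in front of each count:

```shell
strikes='[{"name":"Aachen"},{"name":"Tirupati"},{"name":"Tissint"}]'
# -r strips the quotes so that sort and uniq see bare letters
tally=$(echo "$strikes" | jq -r '.[].name[0:1]' | sort | uniq -c | awk '{print $2, $1}')
echo "$tally"
# A 1
# T 2
```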


If you want to analyze a range, remember that a slice returns an array.
To process each object in it you need an intermediate step, ".[]", which
iterates over the elements, as shown below.

$ jq '.[995:] | .[] | .name' strikes.json
"Tirupati"
"Tissint"
"Tjabe"
"Tjerebon"
"Tomakovka"

You can apply functions as you go along

$ jq '.[995:]| .[] | .name | length' strikes.json   
8
7
5
8
9



To retrieve multiple values from each object we use a comma separated list

$ jq '.[450:455] | .[] | .name, .mass' strikes.json
"Kaptal-Aryk"
"3500"
"Karakol"
"3000"
"Karatu"
"2220"
"Karewar"
"180"
"Karkh"
"22000"

The join function allows you to make a "csv" file

$ jq -r ".[450:455] | .[] | [.name, .mass] | join(\", \")" strikes.json
Kaptal-Aryk, 3500
Karakol, 3000
Karatu, 2220
Karewar, 180
Karkh, 22000
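
For output that follows CSV quoting rules, the @csv format is an
alternative to join: it wraps strings in double quotes and leaves numbers
bare. A sketch on two records inlined from the listing above, with mass
converted by tonumber:

```shell
rows='[{"name":"Karakol","mass":"3000"},{"name":"Karatu","mass":"2220"}]'
# one array per record, then format each array as a CSV line
csv=$(echo "$rows" | jq -r '.[] | [.name, (.mass | tonumber)] | @csv')
echo "$csv"
# "Karakol",3000
# "Karatu",2220
```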

$ jq '.[995:] | .[] | has("year")' strikes.json
true
true
true
true
true


------------------------------------------------------------------------
Ic. Simple scalars, vectors and strings 
------------------------------------------------------------------------
 ... are completely compatible with JSON format

$ echo 1 | jq .
1

$ echo \"hello, jello\" | jq .
"hello, jello"

$ echo "true" | jq .
true

$ echo "[1,2]" | jq .
[
  1,
  2
]

$ echo "[\"hello\",\"kitty\"]" | jq .
[
  "hello",
  "kitty"
]