------------------------------------------------------------------------
jq ... read and manipulate JSON files
------------------------------------------------------------------------

jq allows the user to read, filter and otherwise manipulate JSON
(JavaScript Object Notation) files. The simplest use of jq is to display
JSON files in a pretty format. jq also allows sequential filtering of
the data.

JSON structures are built up from keyword:value pairs. The values can be
strings ("..."), numbers (e.g. 1 or 1E1 or 0.1), booleans (true|false),
arrays ([0,1,2]), objects or null. An "object" is a collection of
keyword:value pairs and is enclosed in curly brackets, {}. Arrays are
enclosed in square brackets, []. Objects can be nested within objects,
arrays can contain objects, and so on.

JSON is widely used. Here is an example that is both an interesting
exercise and an opportunity for procrastination.

$ curl -s http://api.open-notify.org/iss-now.json
{"timestamp": 1582870593, "message": "success", "iss_position": {"latitude": "-3.3855", "longitude": "103.3779"}}

This is not so easy to read. jq comes to the rescue.

$ curl -s http://api.open-notify.org/iss-now.json | jq .  #curl's -s is for "silent"
{
  "timestamp": 1582870636,
  "message": "success",
  "iss_position": {
    "latitude": "-5.5372",
    "longitude": "104.9153"
  }
}

The "." is the simplest filter: like echo, the output is the input. By
default the output is "pretty". If you do not want pretty output then

$ jq -c . filename  #-c is for "compact"

You can extract only the latitude of the ISS

$ curl -s http://api.open-notify.org/iss-now.json | jq .iss_position.latitude
"-5.5372"

For future use, let us save the "prettified" data as a JSON file

$ curl -s http://api.open-notify.org/iss-now.json | jq . \
    | tee iss.json
{
  "iss_position": {
    "latitude": "6.0713",
    "longitude": "-109.8659"
  },
  "timestamp": 1583004166,
  "message": "success"
}

Source material:
  https://www.howtogeek.com/529219/how-to-parse-json-files-on-the-linux-command-line-with-jq/
  https://programminghistorian.org/en/lessons/json-and-jq
    [This is a good site for other tutorials]

------------------------------------------------------------------------
A useful application of jq
------------------------------------------------------------------------

It is easy to URL-encode something. Here we want to percent-encode all
reserved characters (for a Uniform Resource Identifier or "URI", which
is a superset of URL).

$ date
Mon Mar 23 14:47:49 PDT 2020
$ date | jq -srR @uri  #-R reads raw text, -s slurps it into one string, -r prints raw
Mon%20Mar%2023%2014%3A47%3A19%20PDT%202020%0A

Other @foo possibilities are @csv (for arrays), @text, @json, @base64,
@sh (for shell) and @html (converts <>&'" to &lt; &gt; &amp; &#39; &quot;).

------------------------------------------------------------------------
I Invoking jq
------------------------------------------------------------------------

$ jq -options 'filter' file1 file2...  #files can be "-" or stream from a pipe

Some of the options are
  -c  "compact output" [newlines and spaces are stripped out]
  -r  "raw output" [if the output is a pure string then the "" are stripped out]
  -j  "join" [like -r but jq will not print a newline after each output]
  -S  output the fields of each object with the keys in sorted order

In general, you want to use single quotes around the filter commands. If
you do use double quotes then any double quotes inside the filter must
be escaped (\").

$ echo '{"a":1,"b":5}'  #using single quotes
{"a":1,"b":5}
$ echo "{\"a\":1,\"b\":5}"  #using double quotes
{"a":1,"b":5}

------------------------------------------------------------------------
IIa Simple JSON record
------------------------------------------------------------------------

Let us construct a simple JSON structure.
$ cat in1.json
{"Name":"Shri","Surname":"Kulkarni","job":"Astronomer","pets":"rabbits"}

JSON ignores whitespace between tokens: spaces, tabs and newlines (\s,
\t, \n) do not matter. Also, JSON does not have any comment syntax; JSON
is supposed to be self-explanatory.

$ jq . in1.json  #pretty print
{
  "Name": "Shri",
  "Surname": "Kulkarni",
  "job": "Astronomer",
  "pets": "rabbits"
}

Separately, you can use Firefox to display a JSON file and it will
display it as "pretty print". This is sometimes helpful when you are
dealing with files on the web.

$ open -a Firefox in1.json  #will force Firefox to open in1.json

$ jq .Name in1.json  #extract a value associated with a given key
"Shri"
$ jq .name in1.json  #null is returned for non-existent keys (keys are case-sensitive)
null

You can get multiple values by using a comma separated list.

$ jq .Name,.Surname in1.json
"Shri"
"Kulkarni"

However, some characters (e.g. space, [, ] etc.) are interpreted by the
shell, so you need to protect the filter from the shell. You can use "
or '. The latter is stricter and in general is preferred.

$ jq '.Name, .Surname' in1.json  #quoting allows use of space (clarity)
"Shri"
"Kulkarni"

------------------------------------------------------------------------
IIb. Types of values
------------------------------------------------------------------------

JSON has six types of values: null, booleans, numbers, strings,
conventional arrays and associative arrays ("hashes" in Unix, "objects"
in jq).
$ echo 'false"true"[]1{}null' | jq type  #type of each member
"boolean"
"string"
"array"
"number"
"object"
"null"
$ echo '13"ab"[]{}null' | jq length
13
2
0
0
0
$ echo '13"ab"[]{}null' | jq 'length, type'
13
"number"
2
"string"
0
"array"
0
"object"
0
"null"

------------------------------------------------------------------------
III Object Identifier-Index
------------------------------------------------------------------------

The two examples discussed so far ("iss.json" and "in1.json") provide
sufficient background to discuss how to read or index into objects.

$ jq . iss.json
{
  "message": "success",
  "timestamp": 1584150282,
  "iss_position": {
    "latitude": "32.9774",
    "longitude": "-68.3173"
  }
}
$ jq .iss_position iss.json
{
  "latitude": "32.9774",
  "longitude": "-68.3173"
}

This is exactly equivalent to

$ jq '. | .iss_position' iss.json

where "|" is jq's pipe operator. It works like the Unix pipe symbol: the
output of the filter on the left becomes the input of the filter on the
right. Filters can be chained.

$ jq '.iss_position.latitude' iss.json
$ jq '. | .iss_position | .latitude' iss.json
"32.9774"

Next, if your subsequent program wants the bare value and not a quoted
JSON string

$ jq -r .iss_position.latitude iss.json  #raw output (strip ")
32.9774

If you wanted the pair for the next program

$ jq -r '.iss_position.longitude,.iss_position.latitude' iss.json
-68.3173
32.9774

If you wanted the pair as an array then you surround the pair by [ .. ]

$ jq -r '[.iss_position.longitude,.iss_position.latitude]' iss.json  #-r only strips quotes from top-level strings
[
  "-68.3173",
  "32.9774"
]

------------------------------------------------------------------------
II. Arrays
------------------------------------------------------------------------

Summary: If counting up, the first element has index 0, the next one
has index 1, and so on. Equally, you can count from the end: the last
element has index -1, the one before it has index -2, and so on. For the
former, if you want to extract elements from an initial index to the end
of the array, do not specify the upper index.
You can specify a range, but the upper value has to be one more than the
last index you want!

Now let us add an array element to the JSON structure

$ cat in2.json
{"Name":"Shri","Surname":"Kulkarni","job":"Astronomer",
"pets":["parvi","lakshmi","sarsi","rusty","malli","kuro"]}

$ jq . in2.json  #pretty print it
{
  "Name": "Shri",
  "Surname": "Kulkarni",
  "job": "Astronomer",
  "pets": [
    "parvi",
    "lakshmi",
    "sarsi",
    "rusty",
    "malli",
    "kuro"
  ]
}
$ jq '.pets' in2.json  #this is an array of strings
[
  "parvi",
  "lakshmi",
  "sarsi",
  "rusty",
  "malli",
  "kuro"
]
$ jq '.pets[0]' in2.json  #we need quotes to protect "[" and "]" from the shell
"parvi"
$ jq '.pets[-2]' in2.json  #can index from the end of the array
"malli"
$ jq '.pets[0,3]' in2.json  #extract index 0 and 3
"parvi"
"rusty"
$ jq '.pets[0:3]' in2.json  #extract a range of 0,1,2 (3 is an upper limit!)
[
  "parvi",
  "lakshmi",
  "sarsi"
]
$ jq '.pets[0:10]' in2.json  #no penalty if upper limit exceeds the array length
[
  "parvi",
  "lakshmi",
  "sarsi",
  "rusty",
  "malli",
  "kuro"
]
$ jq '.pets[2:]' in2.json  #display from index 2 through to the end
[
  "sarsi",
  "rusty",
  "malli",
  "kuro"
]
$ jq '.pets[-1]' in2.json  #the last element
"kuro"
$ jq '.pets[:-1]' in2.json  #everything except the last element
[
  "parvi",
  "lakshmi",
  "sarsi",
  "rusty",
  "malli"
]

Now let us consider a different example: astronauts on the ISS

$ curl -s http://api.open-notify.org/astros.json | jq . | tee astro.json
{
  "people": [
    {
      "craft": "ISS",
      "name": "Andrew Morgan"
    },
    {
      "craft": "ISS",
      "name": "Oleg Skripochka"
    },
    {
      "craft": "ISS",
      "name": "Jessica Meir"
    }
  ],
  "message": "success",
  "number": 3
}
$ jq 'length, type' astro.json  #the top-level object has three keys
3
"object"
$ jq 'keys' astro.json  #keys of the top-level object
[
  "message",
  "number",
  "people"
]

"people" is an array of objects, each of which has two keys, "craft"
and "name".
$ jq '.people[2].name' astro.json  #specify one index
"Jessica Meir"
$ jq '.people[0,1,2].name' astro.json  #specify multiple indices
"Andrew Morgan"
"Oleg Skripochka"
"Jessica Meir"

------------------------------------------------------------------------
IIIb Array/Object Value Iterator: .[]
------------------------------------------------------------------------

The construct ".[]" returns all the elements of an array or all the
values of an object.

$ echo "[1,5,3]" | jq '.[]'
1
5
3
$ echo "[1,5,3]" | jq '.[] | (.*. + 3)'  #parentheses are a grouping operator
4
28
12
$ echo '{"a":1,"b":5}' | jq .
{
  "a": 1,
  "b": 5
}
$ echo '{"a":1,"b":5}' | jq '.[]'
1
5
$ echo '{"a":"hello","b":"kitty"}' | jq .
{
  "a": "hello",
  "b": "kitty"
}
$ echo '{"a":"hello","b":"kitty"}' | jq '.[]'
"hello"
"kitty"
$ jq '.people[].name' astro.json
"Andrew Morgan"
"Oleg Skripochka"
"Jessica Meir"

Note that the above construct is exactly equivalent to these two
constructs

$ jq '.people[] .name ' astro.json
$ jq '.people[] | .name ' astro.json

A range such as .people[0:2] returns an array. To filter on "name" you
first need to unpack that array into its elements, and the way to do
that is the iterator ".[]".

$ jq '.people[0:2] | .[] | .name' astro.json
"Andrew Morgan"
"Oleg Skripochka"

To peek at nested values you need to specify the path properly

$ jq '.iss_position' iss.json
{
  "latitude": "-6.3152",
  "longitude": "-101.0441"
}
$ jq '.iss_position.latitude' iss.json
$ jq '.iss_position .latitude' iss.json
$ jq '.iss_position | .latitude' iss.json
"-6.3152"
$ jq '.iss_position[]' iss.json  #gets rid of the keywords
"-6.3152"
"-101.0441"

But having "" around numbers is not helpful if you wish to feed this to
an analysis program.

$ jq -r '.iss_position[]' iss.json  #-r is for "raw-output"
-6.3152
-101.0441

------------------------------------------------------------------------
II. Elementary Functions (type, length, keys, del, has, select)
------------------------------------------------------------------------

$ jq 'keys' iss.json  #by default the keys are sorted
[
  "iss_position",
  "message",
  "timestamp"
]
$ jq 'keys_unsorted' iss.json  #no sorting (order as in the file)
[
  "iss_position",
  "timestamp",
  "message"
]
$ jq 'keys | .[]' in1.json  #extracts the keys as a simple list
"Name"
"Surname"
"job"
"pets"
$ jq -r 'keys|.[]' in1.json  #raw output
Name
Surname
job
pets
$ jq 'del(.job)' in1.json  #deletes specified keyword:value pair
{
  "Name": "Shri",
  "Surname": "Kulkarni",
  "pets": "rabbits"
}
$ jq '. | has("Name")' in1.json  #see if field "Name" exists
true
$ jq '.|select(.Name=="Shri")' in1.json
{
  "Name": "Shri",
  "Surname": "Kulkarni",
  "job": "Astronomer",
  "pets": "rabbits"
}
$ jq '.|select(.Name=="shri")' in1.json  #no output since the name is "Shri" and not "shri"

However, there is a catch!

$ jq '.|select(.Name="shri")' in1.json  #produces a result!!!
{
  "Name": "Shri",
  "Surname": "Kulkarni",
  "job": "Astronomer",
  "pets": "rabbits"
}

The reason is that the expression in (...) is evaluated for its truth
value. .Name="shri" is not an equality test (that needs "==") but an
assignment, which returns the updated object. In jq every value other
than false and null counts as true, so select passes its input through
unchanged.

Other functions: has(key), in(object)

------------------------------------------------------------------------
Manipulating & Editing JSON structures
------------------------------------------------------------------------

You can create new structures from a JSON file. For instance, in the
iss.json file the "message" keyword is of little use.

$ jq '[.iss_position.latitude, .iss_position.longitude, .timestamp]' iss.json
[
  "-6.3152",
  "-101.0441",
  1583004411
]

We provided "[" and "]" so that the output is now a properly formed
array.

$ jq '[.iss_position.latitude, .iss_position.longitude, .timestamp-1570000000]' iss.json
[
  "-6.3152",
  "-101.0441",
  13004411
]

You can also delete a keyword:value pair.
$ jq 'del(.message)' iss.json
{
  "iss_position": {
    "latitude": "-6.3152",
    "longitude": "-101.0441"
  },
  "timestamp": 1583004411
}

------------------------------------------------------------------------
Complex JSON objects
------------------------------------------------------------------------

We call upon a NASA site which stores information on meteor impact sites
around the world.

$ curl -s https://data.nasa.gov/resource/y77d-th95.json | jq . > strikes.json

Inspection of the file shows that it starts with a "[" and thus it is
an array.

$ jq 'type,length' strikes.json
"array"
1000

So there are 1,000 elements within this array.

$ jq '.[0]' strikes.json
{
  "name": "Aachen",
  "id": "1",
  "nametype": "Valid",
  "recclass": "L5",
  "mass": "21",
  "fall": "Fell",
  "year": "1880-01-01T00:00:00.000",
  "reclat": "50.775000",
  "reclong": "6.083330",
  "geolocation": {
    "type": "Point",
    "coordinates": [
      6.08333,
      50.775
    ]
  }
}
$ jq '.[995].name' strikes.json  #name of place for entry 995
"Tirupati"

or equivalently

$ jq '.[995]| .name ' strikes.json
"Tirupati"
$ jq '.[] | .name' strikes.json  #list of places with meteor strikes

You can extract parts of the name

$ jq '.[995].name[0:1] ' strikes.json
"T"
$ jq '.[].name[0:1]' strikes.json  #(and pipe to sort and uniq -c !)

If you want to analyze a range then you need to tell jq to process each
object in that range. A range returns an array, so this requires an
intermediate ".[]" step as shown below.
$ jq '.[995:] | .[] | .name' strikes.json
"Tirupati"
"Tissint"
"Tjabe"
"Tjerebon"
"Tomakovka"

You can apply functions as you go along

$ jq '.[995:]| .[] | .name | length' strikes.json
8
7
5
8
9

To retrieve multiple values from each object we use a comma separated
list

$ jq '.[450:455] | .[] | .name, .mass' strikes.json
"Kaptal-Aryk"
"3500"
"Karakol"
"3000"
"Karatu"
"2220"
"Karewar"
"180"
"Karkh"
"22000"

The join function allows you to make a "csv" file

$ jq -r '.[450:455] | .[] | [.name, .mass] | join(", ")' strikes.json
Kaptal-Aryk, 3500
Karakol, 3000
Karatu, 2220
Karewar, 180
Karkh, 22000
$ jq '.[995:] | .[] | has("year")' strikes.json
true
true
true
true
true

------------------------------------------------------------------------
Ic. Simple scalars, vectors and strings
------------------------------------------------------------------------

... are completely compatible with JSON format

$ echo 1 | jq .
1
$ echo \"hello, jello\" | jq .
"hello, jello"
$ echo "true" | jq .
true
$ echo "[1,2]" | jq .
[
  1,
  2
]
$ echo "[\"hello\",\"kitty\"]" | jq .
[
  "hello",
  "kitty"
]
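
Because small inline values like the ones above are valid JSON, they are
a convenient way to try out a filter before pointing it at a real file.
The sketch below is only an illustration: the inline sample and the
shell variable names (sample, on_iss, rows) are made up for this example
and are not data from astro.json. It combines the iterator ".[]" with
select and "==", and then uses @csv (listed earlier among the @foo
formats) instead of join to get quoted CSV rows.

```shell
# Hypothetical inline sample, shaped like the "people" array in astro.json.
sample='{"people":[{"craft":"ISS","name":"A"},{"craft":"Other","name":"B"}]}'

# select() with "==" keeps only the objects whose craft is exactly "ISS".
on_iss=$(printf '%s\n' "$sample" | jq -r '.people[] | select(.craft == "ISS") | .name')
printf '%s\n' "$on_iss"

# @csv turns each [name, craft] array into one quoted CSV row.
rows=$(printf '%s\n' "$sample" | jq -r '.people[] | [.name, .craft] | @csv')
printf '%s\n' "$rows"
```

With this sample the first command should print only A, and the second
should print one row per person with each string wrapped in double
quotes. Unlike join, @csv quotes the fields, which matters when a name
itself contains a comma.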