------------------------------------------------------------------------
TSA daily passenger flux
------------------------------------------------------------------------

Bearing Covid in mind, the TSA has provided a comparison of the daily
flux of passengers between 2019 and 2020:

    https://www.tsa.gov/coronavirus/passenger-throughput

However, the number of entries per page is limited to 250, so for a full
year you need to make two calls. The second call looks like

    https://www.tsa.gov/coronavirus/passenger-throughput?page=1

Let us capture the URL in a bash variable. Notice that the value of the
page has been left out:

$ URL="https://www.tsa.gov/coronavirus/passenger-throughput?page="

These two calls will produce the two files a0 and a1:

$ curl -o a0 "$URL"0   # we quote $URL so the shell does not interpret "?"
$ curl -o a1 "$URL"1

$ awk -F">" '/tbody/,/\/tbody/{print $2}' a0 | sed 's/<.*$//;s/,//g' | \
  cat -s | awk '{print $1,$2,$3}' RS="" FS="\n"

will produce the necessary three-column ASCII table, suitable for
ingestion into a Python or MATLAB program. To understand the sequence of
commands, you may wish to inspect the stream at each break point
(e.g. after the first awk command, after the sed command, etc.).

Why stop there? Get rid of the intermediate files!

$ :> TSA.dat      # creates TSA.dat if it does not exist, or truncates it if it does
$ true > TSA.dat  # equivalent

$ for i in {0..1}; do
    curl "$URL"$i | awk -F">" '/tbody/,/\/tbody/{print $2}' | \
    sed 's/<.*$//;s/,//g' | cat -s | awk '{print $1,$2,$3}' RS="" FS="\n" >> TSA.dat
  done

Now you can analyze TSA.dat in MATLAB or Python.

$ tail -5 TSA.dat
3/5/2020 2130015 2402692
3/4/2020 1877401 2143619
3/3/2020 1736393 1979558
3/2/2020 2089641 2257920
3/1/2020 2280522 2301439
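To see the parsing pipeline work without hitting the TSA site, you can run it against a miniature stand-in for the page. The markup below is a hypothetical fragment for illustration only; the real page's HTML is more elaborate, but the same <tbody>-to-</tbody> range and tag-stripping logic apply:

```shell
# A tiny stand-in for the TSA page (hypothetical markup, one table row):
cat > sample.html <<'EOF'
<table>
<tbody>
<tr>
<td>3/5/2020</td>
<td>2,130,015</td>
<td>2,402,692</td>
</tr>
</tbody>
</table>
EOF

# Extract the cell contents between <tbody> and </tbody>, strip the
# trailing tags and thousands commas, squeeze blank lines, then join
# each blank-line-separated record into one three-column row:
awk -F">" '/tbody/,/\/tbody/{print $2}' sample.html | sed 's/<.*$//;s/,//g' | \
  cat -s | awk '{print $1,$2,$3}' RS="" FS="\n"
# → 3/5/2020 2130015 2402692
```

Running each stage in isolation (stop after the first awk, then after the sed, and so on) shows how the stray "</td" fragments and commas disappear step by step.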
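You do not even need MATLAB or Python for a quick summary; awk can reduce the table directly. The sketch below reuses the five rows from the tail output above as a sample file, assuming column 2 holds the 2020 count and column 3 the matching 2019 count, as on the TSA page:

```shell
# Sample data: the last five rows of TSA.dat shown above (date, 2020, 2019).
cat > TSA_sample.dat <<'EOF'
3/5/2020 2130015 2402692
3/4/2020 1877401 2143619
3/3/2020 1736393 1979558
3/2/2020 2089641 2257920
3/1/2020 2280522 2301439
EOF

# Sum both years and report 2020 traffic as a percentage of 2019 traffic.
awk '{y20 += $2; y19 += $3} END {printf "%.1f%%\n", 100 * y20 / y19}' TSA_sample.dat
# → 91.2%
```

Over these early-March days, 2020 traffic was still at roughly nine tenths of the 2019 level; running the same one-liner over the full TSA.dat shows the collapse that follows.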