------------------------------------------------------------------------
NAS Members
------------------------------------------------------------------------

My friend Dale Frail wanted to correlate Stanford's "top 2%" astronomers
with members of the NAS. To this end I visited the NAS site

    http://www.nasonline.org/member-directory/

and selected "astronomy". I found 100 astronomers, but each page
displayed only 10.

Inspecting the first page I found that each member is noted on a line
like this:

    .... <a href="http://www.nasonline.org/member-directory/members/53898.html">K. I. Kellermann</a></p> ...

So we filter on "/members/" and get rid of the html stuff. Use awk to add
"\t" preceding the last field, i.e. the last name (this can handle Dutch
names like Harry van Woerden, which is sorted under Woerden); sort on the
last name and use tr to get rid of the "\t".

A simple loop downloads successive pages and quits if no members are
found in the latest download. The downloaded data are concatenated to a
file which is analyzed as described above.

NOTE: I am using "zsh", which allows for both "print" and graceful brace
expansion.

$ cat listNAS
#!/bin/zsh
#usage: listNAS section
#  listNAS                 #lists members of 12-astronomy (living & dead)
#  listNAS 13-physics      #lists members of 13-physics (")
#  listNAS 11-mathematics  #lists members of 11-mathematics (")
#
# output on screen & "Members.txt" and "PastMembers.txt"
# BUG: maximum number of pages to download is 100

TFILE="t_list"                  #temporary file
CLASS=${1:-"12-astronomy"}      #default is "12-astronomy"
PAGEMAX=100                     #maximum number of pages to download

:>$TFILE                        #empty the output file

for PAGENO in {1..$PAGEMAX}
do
    curl -s "http://www.nasonline.org/member-directory/member-search-results.html?primary_section_new=section-$CLASS&page=$PAGENO" > a
    [ $(grep -c '/members/' a) -eq 0 ] && break   #quit when a page lists no members
    cat a >> $TFILE
    print "page " $PAGENO
done

#living members
print "NAS members (astronomy)\n"
grep '/members/' $TFILE | sed 's/^.*">//;s/<.*$//' |\
    awk '{$NF="\t"$NF;print}' |\
    sort -t$'\t' -k2 | tr '\t' ' ' | tee Members.txt

#dead members
print "\nDeceased Astronomers\n"
grep '/deceased-members/' $TFILE | sed 's/^.*">//;s/<.*$//' |\
    awk '{$NF="\t"$NF;print}' |\
    sort -t$'\t' -k2 | tr '\t' ' ' | tee PastMembers.txt
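
To see what the grep/sed/awk/sort/tr chain does without touching the network, here is a minimal sketch that runs it on two mock directory lines. The function names sort_by_last_name and sample are my own, the member id 00000 is a made-up placeholder, and the sketch deletes the tab with "tr -d" (my choice, to avoid a doubled space) where listNAS turns it into a space. It is written for plain sh so it also runs under zsh and bash.

```shell
#!/bin/sh
# Sketch only: reproduce the listNAS name filter on mock input.

# Keep the link text, put a tab before the last word (the last name),
# sort on the text after the tab, then drop the tab again.
sort_by_last_name() {
    tab=$(printf '\t')                      # portable literal tab for sort -t
    grep '/members/' |
        sed 's/^.*">//;s/<.*$//' |
        awk '{$NF="\t"$NF; print}' |
        LC_ALL=C sort -t "$tab" -k2 |
        tr -d '\t'
}

# Two mock lines in the format shown above; id 00000 is a placeholder.
sample() {
    printf '%s\n' \
      '<p><a href="http://www.nasonline.org/member-directory/members/00000.html">Harry van Woerden</a></p>' \
      '<p><a href="http://www.nasonline.org/member-directory/members/53898.html">K. I. Kellermann</a></p>'
}

sample | sort_by_last_name
```

With this input, "K. I. Kellermann" is printed before "Harry van Woerden", because the sort key is the last word (Kellermann, Woerden) and not the start of the line.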