------------------------------------------------------------------------
NAS Members
------------------------------------------------------------------------

My friend Dale Frail wanted to correlate Stanford's "top 2%" astronomers with members of the NAS. To this end I visited the NAS site

http://www.nasonline.org/member-directory/

and selected "astronomy". I found 100 astronomers, but each page displays only 10. Inspecting the first page, I found that each member is listed on a line like this:

.... K. I. Kellermann

... So we filter on it (the script below greps for '/members/') and get rid of the html stuff. Use awk to add "\t" preceding the last field, i.e. the last name (this handles Dutch names like Harry van Woerden); sort on the last name and use tr to get rid of the "\t".

A simple loop downloads successive pages and quits if no members are found in the latest download. The downloaded data are concatenated to a file which is analyzed as described above.

NOTE: I am using "zsh" which allows for both "print" and graceful brace expansion.

$ cat listNAS
#!/bin/zsh
#usage: listNAS section
#    listNAS                   #lists members of 12-astronomy (living & dead)
#    listNAS 13-physics        #lists members of 13-physics (")
#    listNAS 11-mathematics    #lists members of 11-mathematics (")
#
# output on screen & "Members.txt" and "PastMembers.txt"
# BUG: maximum number of pages to download is 100

TFILE="t_list"                 #temporary file
CLASS=${1:-"12-astronomy"}     #default is "12-astronomy"
PAGEMAX=100                    #maximum number of pages to download

:>$TFILE                       #empty the output file

for PAGENO in {1..$PAGEMAX}
do
    curl -s "http://www.nasonline.org/member-directory/member-search-results.html?primary_section_new=section-$CLASS&page=$PAGENO" > a
    [ $(grep -c '/members/' a) -eq 0 ] && break    #no members on this page: stop
    cat a >> $TFILE
    print "page " $PAGENO
done

#living members
print "NAS members (astronomy)\n"
grep '/members/' $TFILE | sed 's/^.*">//;s/<.*$//' |\
    awk '{$NF="\t"$NF;print}' |\
    sort -t$'\t' -k2 | tr -d '\t' | tee Members.txt    #tr -d deletes the tab

#dead members
print "\nDeceased Astronomers\n"
grep '/deceased-members/' $TFILE | sed 's/^.*">//;s/<.*$//' |\
    awk '{$NF="\t"$NF;print}' |\
    sort -t$'\t' -k2 | tr -d '\t' | tee PastMembers.txt
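
To see the last-name sort in isolation, here is a small sketch that needs no download. The function name sort_by_last_name is made up for this illustration, and the three input names are simply taken from the text above; they are not output from the NAS directory. It assumes every name ends in the surname, so "Harry van Woerden" is filed under W. It uses tr -d to delete the tab rather than turn it into a space.

```shell
#!/bin/sh
# sort_by_last_name (hypothetical helper): read one name per line on stdin
# and print the names sorted by their last whitespace-separated field.
#   awk  : put a tab in front of the last field (assumed to be the surname)
#   sort : use the tab as field separator and sort on what follows it
#   tr   : delete the tab again
sort_by_last_name() {
    awk '{$NF="\t"$NF; print}' |
    sort -t "$(printf '\t')" -k2 |
    tr -d '\t'
}

# Sample input only; these names come from the text, not from a download.
printf '%s\n' 'Harry van Woerden' 'K. I. Kellermann' 'Dale Frail' |
    sort_by_last_name
```

The tab is only a temporary marker: it gives sort a single, unambiguous place to split the line, however many initials or particles precede the surname.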