{"id":2459,"date":"2008-05-13T09:36:51","date_gmt":"2008-05-13T13:36:51","guid":{"rendered":"https:\/\/www.quirkspace.com\/quirkblog\/?p=1241"},"modified":"2008-05-13T09:36:51","modified_gmt":"2008-05-13T13:36:51","slug":"unadulterated-geekery","status":"publish","type":"post","link":"https:\/\/www.quirkspace.com\/quirkblog\/?p=2459","title":{"rendered":"Unadulterated Geekery"},"content":{"rendered":"<p>Ok, so there&#8217;s a bit of geekery below the cut for those of you who like looking through shell script source code.  All others should move on&#8230;<\/p>\n<p>Still there?  Ok, you have been warned.<\/p>\n<p>Here&#8217;s the background: during the summer, my new job tends to have a few slack periods when people aren&#8217;t bringing in new work.  Rather than sit around twiddling our thumbs, we go back through our old work and see what needs improvement.  There&#8217;s a rather large database of images that includes a &#8220;quality control&#8221; feedback form.  Images that score low on quality get flagged for review and repair.<\/p>\n<p>The trouble is, we&#8217;ve got literally tens of thousands of images in hundreds of directories.  Gathering up the bad ones by hand takes hours.  So I wrote a script to do that for us.  
It worked through about 10,000 images and found the 120 or so that we needed in about seven minutes.<\/p>\n<p>Code is below the break&#8230;<\/p>\n<p><!--more--><\/p>\n<p><code>#!\/bin\/bash<\/p>\n<p>##<\/p>\n<p># This script will page through a list of file names and then<\/p>\n<p># perform a recursive search of all files and directories in a<\/p>\n<p># user-specified \"source\" path, copying any matching files to a<\/p>\n<p># second, user-specified \"destination\" directory in their local<\/p>\n<p># machine's home directory.<\/p>\n<p>##<\/p>\n<p>## Has the script been invoked correctly?<\/p>\n<p>if [ -z \"$3\" ]; then<\/p>\n<p> echo \"usage: $0 source destination matchlist\"<\/p>\n<p> exit 1<\/p>\n<p>fi<\/p>\n<p>## Assign command-line arguments to variables<\/p>\n<p>source=$1<\/p>\n<p>destination=~\/$2<\/p>\n<p>matchlist=$3<\/p>\n<p>##<\/p>\n<p># NOTE!  The ~\/ in the destination assignment forces the script<\/p>\n<p># to copy files to the user's home directory.  This should help<\/p>\n<p># avoid a potential recursive condition where the destination<\/p>\n<p># directory is one of the ones being searched...<\/p>\n<p>##<\/p>\n<p>## Check that the source and matchlist exist and the destination does not<\/p>\n<p>if [ ! -d \"$source\" ]; then<\/p>\n<p> echo \"$source is not a directory\"<\/p>\n<p> exit 1<\/p>\n<p>fi<\/p>\n<p>if [ -e \"$destination\" ]; then<\/p>\n<p> echo \"$destination already exists\"<\/p>\n<p> exit 1<\/p>\n<p>fi<\/p>\n<p>if [ ! -f \"$matchlist\" ]; then<\/p>\n<p> echo \"$matchlist does not exist or is not a file\"<\/p>\n<p> exit 1<\/p>\n<p>fi<\/p>\n<p>## Create destination directory<\/p>\n<p>if ! mkdir \"$destination\"; then<\/p>\n<p> echo \"error creating destination directory\"<\/p>\n<p> exit 1<\/p>\n<p>fi<\/p>\n<p>## Create empty log files<\/p>\n<p>touch \"$destination\/not_found.txt\"<\/p>\n<p>touch \"$destination\/logfile.txt\"<\/p>\n<p>## Create lookup file from source directories<\/p>\n<p>echo Building lookup table...<\/p>\n<p>ls -R \"$source\" > \"$destination\/files_searched.txt\"<\/p>\n<p>## Count the names in the matchlist (for information only)<\/p>\n<p>line_count=$(wc -l < \"$matchlist\")<\/p>\n<p>echo there are $line_count lines in the search file<\/p>\n<p>##<\/p>\n<p># Now to go through the specified matchlist file line by line.<\/p>\n<p># First, 'grep' checks to see if 'find' would be fruitful -<\/p>\n<p># this saves a lot of time when there are large numbers of<\/p>\n<p># files to search through.  The results of the lookup check are<\/p>\n<p># logged, and the script continues with the 'find' command to<\/p>\n<p># get the full directory path, and to make sure that the hit<\/p>\n<p># isn't for a directory instead of a file.  If 'find' turns up<\/p>\n<p># no regular file, it logs the query to the not_found.txt file.<\/p>\n<p># On success, it copies each matching file and logs the action.<\/p>\n<p># Reading with 'while read' (instead of counting lines) also<\/p>\n<p># avoids an extra pass with an empty name at the end of the list.<\/p>\n<p>##<\/p>\n<p>while IFS= read -r query || [ -n \"$query\" ]; do<\/p>\n<p> ## Skip blank lines in the matchlist<\/p>\n<p> if [ -z \"$query\" ]; then<\/p>\n<p>  continue<\/p>\n<p> fi<\/p>\n<p> echo looking for \"$query\"<\/p>\n<p>## Check the lookup file first - it's faster<\/p>\n<p> if ! grep -qiF -- \"$query\" \"$destination\/files_searched.txt\"; then<\/p>\n<p>  echo \"$query not found in lookup file\"<\/p>\n<p>  echo \"$query not found in lookup file\" >> \"$destination\/not_found.txt\"<\/p>\n<p>  continue<\/p>\n<p> fi<\/p>\n<p>## If it's in the lookup file, get exact location(s) and copy<\/p>\n<p> found=0<\/p>\n<p> while IFS= read -r -d '' result; do<\/p>\n<p>  if cp -- \"$result\" \"$destination\/\"; then<\/p>\n<p>   echo \"copied $result to $destination\/\" >> \"$destination\/logfile.txt\"<\/p>\n<p>  fi<\/p>\n<p>  found=1<\/p>\n<p> done < <(find \"$source\" -type f -iname \"$query\" -print0)<\/p>\n<p> if [ \"$found\" -eq 0 ]; then<\/p>\n<p>  echo \"NOTICE - $query not found after file search - it may be a directory\"<\/p>\n<p>  echo \"NOTICE - $query not found after file search - it may be a directory\" >> \"$destination\/not_found.txt\"<\/p>\n<p> fi<\/p>\n<p>done < \"$matchlist\"<\/p>\n<p>exit 0<\/code><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Ok, so there&#8217;s a bit of geekery below the cut for those of you who like looking through shell script source code. All others should move on&#8230; Still there? Ok, you have been warned. 
Here&#8217;s the background: during the summer, &hellip; <a href=\"https:\/\/www.quirkspace.com\/quirkblog\/?p=2459\">Continue reading <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[],"tags":[26],"class_list":["post-2459","post","type-post","status-publish","format-standard","hentry","tag-technical-notes"],"_links":{"self":[{"href":"https:\/\/www.quirkspace.com\/quirkblog\/index.php?rest_route=\/wp\/v2\/posts\/2459","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.quirkspace.com\/quirkblog\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.quirkspace.com\/quirkblog\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.quirkspace.com\/quirkblog\/index.php?rest_route=\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.quirkspace.com\/quirkblog\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=2459"}],"version-history":[{"count":0,"href":"https:\/\/www.quirkspace.com\/quirkblog\/index.php?rest_route=\/wp\/v2\/posts\/2459\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.quirkspace.com\/quirkblog\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=2459"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.quirkspace.com\/quirkblog\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=2459"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.quirkspace.com\/quirkblog\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=2459"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}