Ok, so there’s a bit of geekery below the cut for those of you who like looking through shell script source code. All others should move on…
Still there? Ok, you have been warned.
Here’s the background: during the summer, my new job tends to have a few slack periods when people aren’t bringing in new work. Rather than sit around twiddling our thumbs, we go back through our old work and see what needs improvement. There’s a rather large database of images which includes a “quality control” feedback form. Images that score low on quality get flagged for review and repair.
The trouble is, we’ve got literally tens of thousands of images in hundreds of directories. Gathering up the bad ones takes hours. So I wrote a script to do that for us. It was able to go through about 10,000 images and find 120 or so of the ones that we needed in about seven minutes.
Code is below the break…
#!/bin/bash
# This script will page through a list of file names and then
# perform a recursive search of all files and directories in a
# user specified "source" path, copying any matching files to a
# second, user specified "destination" directory in their local
# machine's home directory.
## Has the script been invoked correctly
if [ -z "$3" ]; then
    echo "command $0 needs source, destination and matchlist"
    exit 1
fi
## Assign command-line arguments to variables
# NOTE! The ~/ in the destination assignment forces the script
# to copy files to the user's home directory. This should help
# avoid a potential recursive condition where the destination
# directory is one of the ones being searched...
source="$1"
destination=~/"$2"
matchlist="$3"
## Check if the source, destination and matchlist actually exist
if [ ! -d "$source" ]; then
    echo "$source is not a directory"
    exit 1
fi
if [ -e "$destination" ]; then
    echo "$destination already exists"
    exit 1
fi
if [ ! -e "$matchlist" ]; then
    echo "$matchlist does not exist"
    exit 1
fi
## Create destination directory
if ! mkdir "$destination"; then
    echo error creating destination directory
    exit 1
fi
## Create empty log files
## Create lookup file from source directories
echo Building lookup table...
ls -R "$source" > "$destination/files_searched.txt"
## Prepare to loop through files in source
line_count=$(wc -l < "$matchlist")
echo "there are $line_count lines in the search file"
##
# Now to go through the specified matchlist file line by line.
# First, 'grep' checks to see if 'find' would be fruitful -
# this saves a lot of time when there are large numbers of
# files to search through. The results of the lookup check are
# logged, and the script continues with the 'find' command to
# get the full directory path, and to make sure that the hit
# isn't for a directory instead of a file. If 'find' fails, it
# logs the query to the not_found.txt file. On success, it
# copies the file and logs the action.
##
while IFS= read -r query || [ -n "$query" ]
do
    echo "looking for $query"
    ## Check the lookup file first - it's faster
    grep -i -- "$query" "$destination/files_searched.txt"
    check=$?
    if [ ! $check -eq 0 ]; then
        echo "$query not found in lookup file"
        echo "$query not found in lookup file" >> "$destination/not_found.txt"
    else
        ## If it's in the lookup file, get exact location and copy
        result=$(find "$source" -iname "$query" -type f)
        if [ -z "$result" ]; then
            echo "NOTICE - $query not found after file search - it may be a directory"
            echo "NOTICE - $query not found after file search - it may be a directory" \
                >> "$destination/not_found.txt"
        else
            # $result is left unquoted so that each match is copied;
            # paths containing spaces will not survive this.
            cp $result "$destination"
            echo "copied $source/$query to $destination/$query"
        fi
    fi
done < "$matchlist"
exit 0
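
For anyone who wants to play with the idea on its own, here is a minimal sketch of the two-step lookup (cheap grep against a cached listing first, then a targeted find) wrapped up as a single function. The name lookup_and_copy and its argument order are made up for illustration and are not part of the script above. It also assumes a find that understands -iname, which both the GNU and BSD versions do. Unlike the script, it matches the query as a fixed string in the first step and lets find do the copying, so file names with spaces should be fine.

```shell
#!/bin/bash
# Sketch only: lookup_and_copy is a hypothetical helper, not part of the
# original script. Usage: lookup_and_copy SOURCE_DIR DEST_DIR FILE_NAME
lookup_and_copy() {
    local src="$1"
    local dest="$2"
    local query="$3"
    local listing="$dest/files_searched.txt"

    # Build the cached listing the first time through.
    [ -e "$listing" ] || ls -R "$src" > "$listing"

    # Step 1: case-insensitive, fixed-string check against the listing.
    # If the name is not in there, skip the expensive find entirely.
    grep -qiF -- "$query" "$listing" || return 1

    # Step 2: let find locate regular files only and copy each hit.
    # -print runs only after a successful cp, so $hits lists what was copied.
    local hits
    hits=$(find "$src" -type f -iname "$query" -exec cp {} "$dest" \; -print)

    # Succeed only if at least one file was actually copied.
    [ -n "$hits" ]
}
```

The function returns 0 when at least one file was copied and non-zero otherwise, so a caller can decide for itself where to log the misses, for example: lookup_and_copy "$source" "$destination" "$query" || echo "$query" >> "$destination/not_found.txt"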