Monday, July 05, 2010

Scripting in Linux for n00bs

As ever, I'm standing on the shoulders of giants in order to see a bit further. Or to really annoy the giants, if they don't like having scrawny Englishmen clambering over them in their quest to see a bit further. Really, it's not very polite of us. And to think, we're meant to be famous for our gentility. An English gentleman never gives offence unknowingly. But they never said anything about hauling yourself over gigantic people without asking first.  Usual rider about technological gubbins for what follows...

Anyway, there's some helpful advice here on how to make a script iterate through an array of values.  This is quite useful, because I want to do the same thing to 20 different words, combined with one another in every possible way.  Running a script 400 times and changing the parameters each time would not be very interesting.

If you want to iterate through an array, you need to have something in your array first.  I could hardcode the elements, but I might want to run my program more than once against different sets of data, or I might want to change the words that I'm hunting for.  So here's a quick explanation of how to populate your array (the third example is the one we want, for reading values in from a file).
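For the record, here is a minimal sketch of the read-from-a-file idea in bash. It is not the linked example, just one way of doing it: the temporary file stands in for a real word list, and the names wordfile and words are made up for the demo.

```shell
#!/usr/bin/env bash
# Sketch only: build a throwaway word list so the example runs on its own.
# In real use you would point this at your own file instead.
wordfile=$(mktemp)
printf '%s\n' aardvark beetle cat > "$wordfile"

# Read one word per line into an array, skipping blank lines.
words=()
while IFS= read -r line; do
    if [ -n "$line" ]; then
        words+=("$line")
    fi
done < "$wordfile"

echo "Read ${#words[@]} words"
for w in "${words[@]}"; do
    echo "$w"
done

rm -f "$wordfile"
```

Reading line by line like this keeps each line as a single array element, which matters if an entry ever contains a space.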

OK.  Now I have a way to get a set of words from a file, and iterate through them, building combinations of each word in the set with all the words in the set.  Now I want to do something with these word combinations - which is going to be the same each time.  So I need to create a script to do that; and that's just a file with a bunch of commands in it, but with each parameter in the script modified to be either $1 or $2, depending on whether it's the first or second word in my process. Thanks to here for some helpful examples.
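To show what $1 and $2 actually do, here is a tiny sketch. It uses a function rather than a separate script file so that it runs on its own, but positional parameters behave the same way in both. The name describe_pair and the message it prints are invented for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for a per-pair script.
# Inside a function (or a script file), $1 is the first argument
# and $2 is the second.
describe_pair() {
    local firstword=$1
    local lastword=$2
    echo "looking between '$firstword' and '$lastword'"
}

describe_pair aardvark beetle
```

Saved as a file of its own, the body of that function would be called as ./some_script aardvark beetle, with the two words arriving as $1 and $2.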

Like so:



filearray=( `cat commonwords | tr '\n' ' '`)

for firstword in "${filearray[@]}"
do
    for lastword in "${filearray[@]}"
    do
        ./word_extractor_script "$firstword" "$lastword"
    done
done


What's going on here?


filearray=( `cat commonwords | tr '\n' ' '`) reads all the words in from my commonwords file.  There are then two for loops.  They're nested: the outer loop looks at each word in the file in turn, and the inner loop combines it with every word in the file; so if the only words in there were aardvark, beetle, and cat, it would end up looping through
aardvark aardvark
aardvark beetle
aardvark cat
beetle aardvark
beetle beetle
beetle cat
cat aardvark
cat beetle
cat cat
Now reading that was pretty tedious, but typing it was even worse, let me assure you.  Happily, because of our script above us, we could generate all those combinations without having to do any more than run the script.  It might be a bit difficult to read, but I've colour coded those two loops for you, so you can see how they fit into one another like Russian dolls.
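If you'd like to check the arithmetic without running anything real, this sketch does the same nested loop over the three example words and just collects the pairs instead of calling a script. The array name pairs is made up for the demo.

```shell
#!/usr/bin/env bash
# Same nested-loop shape as the script above, with the word list hardcoded
# and the pairs stored in an array rather than passed to another script.
words=(aardvark beetle cat)
pairs=()

for firstword in "${words[@]}"; do
    for lastword in "${words[@]}"; do
        pairs+=("$firstword $lastword")
    done
done

printf '%s\n' "${pairs[@]}"
echo "${#pairs[@]} pairs"
```

Three words give 3 x 3 = 9 pairs, and by the same logic 20 words give 20 x 20 = 400.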


Finally, the script I was writing before, word_extractor_script, gobbles up my input file (hardcoded...oops) and then looks for all the verbiage between $firstword and $lastword, and saves it all out.  And without further ado, I go and eat a biscuit while my trusty little laptop thinks for three minutes and then spits out 400 files (one for each pair of words) for me to feed into the handy phrase extraction tool that I haven't actually built yet.
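The extractor itself isn't shown here, so what follows is only a guess at the general shape, not the real thing. It assumes both words appear on the same line and contain no special regular expression characters, and the function name extract_between and the sample text are invented.

```shell
#!/usr/bin/env bash
# Rough sketch: print the stretch of each line that runs from the first word
# to the last word. Assumes plain words and matches within a single line.
extract_between() {
    local firstword=$1
    local lastword=$2
    local input=$3
    # grep -o prints only the part of the line that matched the pattern.
    grep -o -- "$firstword.*$lastword" "$input"
}

# Throwaway input so the sketch runs on its own.
input=$(mktemp)
printf '%s\n' 'the aardvark ate a beetle today' 'no match here' > "$input"

result=$(extract_between aardvark beetle "$input")
echo "$result"

rm -f "$input"
```

Redirecting that output to a file named after the two words would give one file per pair. Note that .* is greedy, so on a line where the last word appears twice it will run to the final occurrence.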


So sorry to leave you on tenterhooks, but we're now oh-so-close to the output.  Hurrah!
