Learning MySQL (2007)

Part V. Interacting with MySQL Using Perl

Chapter 16. Perl

One of the most useful functions of a database is allowing regular reports to be created that reflect important trends that a user wants to track. For example, a business might be interested in the relative increase in sales that can be directly attributed to a television advertising campaign; similarly, a university academic might be interested in identifying the exam questions on which students struggle. Report generation is best automated in a program that issues SQL statements and summarizes and formats the results into something easier to digest. You can create powerful tools that allow users to interact with the data, and also to interchange data with applications such as statistical analysis tools.

A general-purpose client such as the MySQL monitor allows you to execute any SQL query you like. However, most applications use only a limited set of SQL queries to add data to a database, modify existing data, or list data, so you can write a custom client to perform these frequent queries.

Perl has been a hugely popular scripting language since its first release in 1987; while newer languages such as PHP, Python, and Ruby have appeared since then, Perl remains very popular. Perl is very flexible; in fact, the Perl motto is, “There’s more than one way to do it,” often mentioned as the acronym TMTOWTDI.

Perl is also one of the most portable languages, with support available on a large number of operating systems, including the three that we focus on in this book. This means that you generally don’t need to rewrite your Perl scripts if you want to use them on a different operating system.

It also benefits from many function libraries for applications as diverse as data manipulation programs for Personal Digital Assistants (PDAs), word processors and spreadsheet programs, network programming applications, and even full-color graphical games such as Frozen Bubble (http://www.frozen-bubble.org). More than 10,500 such libraries are available from the Comprehensive Perl Archive Network (CPAN) (http://www.cpan.org).

Perl includes powerful support for data manipulation and interfacing with databases, and is the scripting language most closely linked to MySQL. Most of the scripts that are distributed with MySQL are written in Perl. Over the next three chapters, you’ll learn how to write simple Perl scripts that can be run from the command line (such as the Linux or Mac OS X shell or the Windows command prompt), as well as CGI scripts that run on a web server.

Command-line scripts are typically used to import data from other software or export data from the database. For example, you can import data from a spreadsheet program or export data from the database to a spreadsheet program.

You can also write command-line applications to run reporting queries. For example, using the GeoIP (http://www.maxmind.com/app/perl) database, you can write a script that takes the IP address of a computer and looks up the country the computer is located in.

You’ll also see how to use Perl to access your database from the Web. For example, you can use the techniques you learn to design a music store application in Perl that reads product data from a backend MySQL database and generates a web page containing this information. Note that PHP is a more appropriate choice for new large-scale web database applications.

Writing Your First Perl Program

In this section, we’ll take a very quick look at the Perl language. A Perl script is simply a text file containing statements that the Perl interpreter reads and executes. As with most things, the best way to learn is by doing, so we’ll walk you through your very first Perl script. Open a text editor following the instructions in “Using a Text Editor” in Chapter 2 and create a text file containing the following lines:

#!/usr/bin/perl

print "Hello,\nworld!\n1\t2\t3\n";

The first two characters on the first line should be the pound or hash symbol (#) followed by the exclamation mark symbol (!). Together, these two characters form the “shebang” or “hash-bang” marker that tells the shell how to run the script. Immediately after these two characters, specify the path to the location of the Perl interpreter (called perl) on your system. If you’re not sure where this is, check the instructions in “Checking Your Existing Setup” in Chapter 2. Save this text file as HelloWorld.pl in your current directory; you can exit your editor if you wish.

On a Linux or Mac OS X system, add the executable permission for the user who owns the file (you) using the chmod program. Here, we grant read, write, and executable permissions for the owning user, and no permissions to the group or other users:

$ chmod u=rwx,g=,o= HelloWorld.pl

You need to do this only once for a file; the permissions don’t change if you edit or move the file. We discuss permission settings in “Restricting access to files and directories” in Chapter 2.

You can now run your program from the Linux or Mac OS X command line by typing its name:

$ ./HelloWorld.pl

Hello,

world!

1       2       3

Most Linux distributions and Mac OS X do not look for programs in the current directory, so the initial dot and slash (./) is needed to tell the operating system where to find the program file. Windows doesn’t use the shebang line, but to improve the portability of your scripts, it’s good to include a line such as #!/usr/bin/perl at the top of any scripts you write. You should follow the instructions of “Installing Perl modules under Windows” in Chapter 2 and associate your Perl interpreter with the .pl extension. Windows always looks for the program file in the current directory, so you can simply type:

C:\> HelloWorld.pl

Hello,

world!

1       2       3

Congratulations! You’ve just written and executed your first Perl script.

Scripting With Perl

Let’s examine the first line of Perl that you wrote earlier:

print "Hello,\nworld!\n1\t2\t3\n";

The print command or function takes the text in the quotes (known as a string of characters) and displays it. Be sure to put a semicolon at the end of each Perl statement; if you forget one, Perl gets quite confused and prints error messages that can in turn confuse you!

You’ve probably noticed already that the \n and \t weren’t printed on the screen. The backslash indicates an escape character that should be handled in a special way. A \n indicates that a new line should be started at this point. Similarly, a \t tells Perl to jump ahead to the next tab stop, which is useful if you want to show columns of information. Note that the print command doesn’t insert any line breaks on its own, even when a program finishes; you have to tell it to do so explicitly through \n.

A program that prints out exactly what we’ve written isn’t very exciting. Perl, like most programming languages, allows us to use placeholders, or variables, to store values; we can manipulate these variables and then display them. For example, we can define a variable called $TemperatureToday to store today’s temperature:

my $TemperatureToday;

The keyword my is used to declare the variable for the first time. Variables that contain a single value are known as scalar variables and are identified with a dollar ($) symbol. We’ll discuss other types of variables later in this chapter. We can assign a value to this variable; for example, we can set today’s temperature to be 33 (Celsius):

$TemperatureToday=33;

The equals (=) symbol assigns the value on the righthand side (33) to the variable. We can also merge the declaration and the assignment into a single statement:

my $TemperatureToday=33;

We can define another variable, $TemperatureYesterday, to store yesterday’s temperature:

my $TemperatureYesterday=30;

We can use mathematical operations on variables; therefore, to display the difference in temperature between today and yesterday, we can write:

print "\nThe temperature difference is: " .

    ($TemperatureToday-$TemperatureYesterday) . "\n";

Here, we’ve used the concatenation (.) operator to connect several strings of characters together and display the resulting string with a single print statement. We’ve also used the subtraction (-) operator to find the difference between the two values. Let’s have a quick look at some other common mathematical operators.
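
The statement above can be tried out as a complete script. This is only a minimal sketch: the $Difference and $Message variables are introduced here purely for illustration. The parentheses around the subtraction matter, because Perl gives the concatenation (.) and subtraction (-) operators the same precedence and works through them from left to right; without the parentheses, the text would be joined to $TemperatureToday first, and the subtraction would then be applied to that string.

```perl
#!/usr/bin/perl
use strict;

my $TemperatureToday=33;
my $TemperatureYesterday=30;

# Do the subtraction first (note the parentheses), then build the message
# by concatenating the pieces.
my $Difference=($TemperatureToday-$TemperatureYesterday);
my $Message="The temperature difference is: " . $Difference . "\n";

print $Message;   # prints: The temperature difference is: 3
```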

Mathematical Operators

Mathematical operators can be used to manipulate numbers and variables. There are a few that are easy enough to understand:

=

Assigns the value on the right to the variable on the left, for example:

$Today="23rd November";

$Age=33;

+

Adds one value to another, for example:

$RetailPrice=$CostPrice+$Profit+$Tax;

$Answer=2+2;

-

Subtracts one value from another, for example:

$Loss=$PriceSold-$PriceBought;

*

Multiplies one value by another, for example:

$TemperatureInFahrenheit=$TemperatureInCelsius*1.8+32;

/

Divides one value by another, for example:

$CakePortionSize=1/$NumberOfPeople;

%

Calculates the remainder of dividing one number by another, for example:

print "Dividing 27 by 4 leaves: ". 27%4;

If we want to change the value of a variable based on its existing value, we would use the variable on both the lefthand side and the righthand side of the assignment operator:

$CakesLeft=$CakesLeft-$CakesEaten;

$Counter=$Counter+1;

This syntax is so common that there’s a shorthand way to write it that omits the same variable on the righthand side by merging the two operators:

$CakesLeft -= $CakesEaten;

$Counter += 1;

It’s also very common to increment or decrement a value, so +=1 and -=1 can be written simply as ++ and --, respectively:

++

Increments a number by one, for example:

$Counter++;

--

Decrements a number by one, for example:

$SecondsLeft--;
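
To see several of these operators working together, here is a short sketch that can be saved and run like HelloWorld.pl. The starting values (100 degrees, 12 cakes, and so on) are made up for the illustration.

```perl
#!/usr/bin/perl
use strict;

# Multiplication and addition: convert Celsius to Fahrenheit
my $TemperatureInCelsius=100;
my $TemperatureInFahrenheit=$TemperatureInCelsius*1.8+32;

# Remainder of dividing 27 by 4
my $Remainder=27%4;

# Shorthand assignment: same as $CakesLeft=$CakesLeft-$CakesEaten
my $CakesLeft=12;
my $CakesEaten=5;
$CakesLeft -= $CakesEaten;

# Two ways of adding one to a counter
my $Counter=0;
$Counter += 1;
$Counter++;

print "Fahrenheit: $TemperatureInFahrenheit\n";   # 212
print "Remainder:  $Remainder\n";                 # 3
print "Cakes left: $CakesLeft\n";                 # 7
print "Counter:    $Counter\n";                   # 2
```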

Finally, there are comparison operators that are used to compare two values and return true or false depending on the result of the comparison. Perl considers a zero value or empty string to be false, and a nonzero value to be true. If we try to print the result of a comparison, we’ll get a 1 for a true outcome, and nothing (an empty string) for a false outcome:

==

Tests whether two values are equal:

print "Equal to 33:                 [". ($TemperatureToday==33) . "]\n";

produces the result:

Equal to 33:                 [1]

!=

Tests whether two values are unequal:

print "Not equal to 33:             [". ($TemperatureToday!=33) . "]\n";

produces the result:

Not equal to 33:             []

The false result is displayed as an empty string.

<

Tests whether the first value is less than the second:

print "Less than 33:                [". ($TemperatureToday<33) . "]\n";

produces the result:

Less than 33:                []

>

Tests whether the first value is greater than the second:

print "Greater than 33:             [". ($TemperatureToday>33) . "]\n";

produces the result:

Greater than 33:             []

<=

Tests whether the first value is less than or equal to the second:

print "Less than or equal to 33:    [". ($TemperatureToday<=33) . "]\n";

produces the result:

Less than or equal to 33:    [1]

>=

Tests whether the first value is greater than or equal to the second:

print "Greater than or equal to 33: [". ($TemperatureToday>=33) . "]\n";

produces the result:

Greater than or equal to 33: [1]
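
When a comparison is mixed with the concatenation (.) operator, the comparison should be wrapped in parentheses, because concatenation is carried out before comparison. One way to keep things readable, sketched below, is to store each result in a variable first; the names $IsEqual and $IsLess are hypothetical and used only for this illustration.

```perl
#!/usr/bin/perl
use strict;

my $TemperatureToday=33;

# Store the outcome of each comparison: 1 for true, empty string for false
my $IsEqual=($TemperatureToday==33);
my $IsLess=($TemperatureToday<33);

print "Equal to 33:  [" . $IsEqual . "]\n";   # prints: Equal to 33:  [1]
print "Less than 33: [" . $IsLess . "]\n";    # prints: Less than 33: []
```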

Operator precedence

The instructions we’ve looked at so far have been simple, with only one operator. What happens if we have complex expressions with multiple operators? For example, what’s the value of $answer after this statement is executed?

$answer=1+2-3*4/5;

You may remember from your high school math that mathematical operators have an order of precedence: multiplication and division are performed before addition and subtraction. However, relying on the order of precedence can try your memory, and your code will be hard to read. Parentheses override the order of precedence, allowing you to be sure that expressions will be evaluated as you expect. Our example would actually be evaluated as:

$answer=(1+2)-(3*4/5);

but using parentheses, you could specify that it should be evaluated as:

$answer=1+((2-3)*4)/5;

We recommend that you make liberal use of parentheses to keep your code readable and to avoid ambiguity.
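
The three forms discussed above can be compared directly. This is a small sketch; the variable names $Default, $Explicit, and $Regrouped are chosen here for illustration and are not part of the earlier examples.

```perl
#!/usr/bin/perl
use strict;

# No parentheses: Perl applies its built-in order of precedence
my $Default=1+2-3*4/5;

# The same grouping spelled out with parentheses
my $Explicit=(1+2)-((3*4)/5);

# A different grouping forced by parentheses
my $Regrouped=1+((2-3)*4)/5;

print "Default:   $Default\n";     # 0.6
print "Explicit:  $Explicit\n";    # 0.6
print "Regrouped: $Regrouped\n";   # 0.2
```

The first two results match because the parentheses in the second statement only make the default order visible; the third differs because the parentheses change which operations are performed first.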

More on Variables

Variables can be used to store things other than numbers. In Example 16-1, we use variables to store and display text and numbers.

Example 16-1. Perl script to add several variables and display the totals

#!/usr/bin/perl

use strict;

# Declare variables to store animal names, and assign values to them

my $AnimalNameOne="cats";

my $AnimalNameTwo="dogs";

my $AnimalNameThree="fish";

# Declare variables to store animal counts, and assign values to them

my $AnimalCountOne=3;

my $AnimalCountTwo=7;

my $AnimalCountThree=4;

# Calculate the sum of the animal counts

my $Total=$AnimalCountOne+$AnimalCountTwo+$AnimalCountThree;

# Display the counts and total

print "Pet roll call:\n".

 "===========\n".

 "$AnimalNameOne:\t$AnimalCountOne\n".

 "$AnimalNameTwo:\t$AnimalCountTwo\n".

 "$AnimalNameThree:\t$AnimalCountThree\n".

 "===========\n".

 "Total:\t$Total\n";

In this program, we store animal names and counts in variables, and place the total count into the $Total variable. Save this program as animals.pl and run it; you’ll see the following output:

Pet roll call:

===========

cats:   3

dogs:   7

fish:   4

===========

Total:  14

The second line of this script is a use strict; instruction (also known as a pragma) to the Perl interpreter to ensure that all variables are explicitly declared with the my keyword before they are used. This helps avoid problems with mistyped variable names. You should try to include this line in all your scripts. Otherwise, if you mistype a variable name in one place, Perl assumes you want to create a new variable and doesn’t warn you about the problem, so the program could fail or produce incorrect output that’s hard to detect.

Any braces (also known as curly brackets) enclosing the variable declaration limit the scope of the declaration. For example, here the $Time variable is declared only inside the braces and is not available outside them:

my $Seconds=97;

{

 my $Time=$Seconds+1;

 print "\nTime: ", $Time;

}

However, the variable $Seconds is available both outside and inside the braces.

A variable declared inside braces hides any existing variable with the same name outside the braces; the outer variable is left unchanged and becomes visible again after the closing brace. For example, we can have two different variables called $counter:

#!/usr/bin/perl

my $counter=10;

print "Before braces: $counter\n";

{

 my $counter=33;

 print "Within braces: $counter\n";

}

print "After braces:  $counter\n";

This produces the results:

Before braces: 10

Within braces: 33

After braces:  10

It’s generally not good practice to use different variables with the same names, so avoid doing so when you can. We’ve just shown this here to help you understand existing code and possible causes of problems.

Notice that we’ve left blank lines between several statements and substrings; Perl ignores such whitespace outside strings. Perl also ignores any lines starting with the hash, or pound, symbol (#); this allows us to write explanatory comments alongside the code. Judicious use of whitespace and comments can help keep your programs readable and easy to understand.

Single and double quotes

Till now, we’ve used the double-quote (") character to indicate the start and end of a string. We can also enclose strings with the single-quote (') character, but there is an interesting difference. If you run the script:

#!/usr/bin/perl

use strict;

my $Answer=42;

print "The answer is: $Answer\n";

print 'The answer is: $Answer\n';

you’ll see the following output:

The answer is: 42

The answer is: $Answer\n

When the string is enclosed in single quotes, variables are not replaced by their values, and escape characters are treated as normal text.

You may wonder how we can include one of the quote symbols within a string. For example, you can’t have the string:

print 'This is Sarah's bag.';

since the string would end immediately after “Sarah”, and the remainder of the sentence would confuse Perl.

The solution is to enclose one type of quote in a string enclosed by the other type:

print "This is Sarah's bag.";

print 'He said, "This is fun!"';

or to add a backslash symbol to escape the quote symbol and indicate that it should be processed in a special way:

print 'This is Sarah\'s bag.';

print "He said, \"This is fun!\"";

There is a third way of creating strings that is peculiar to Perl. The constructs q(text) and qq(text) have the same effect as enclosing the text in single and double quotes, respectively, but have the advantage that quotes don’t need to be escaped. Thus, for example, the following two statements work as expected:

print q(This is Sarah's bag.);

print qq(He said, "This is fun!");
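
The difference between q and qq mirrors the difference between single and double quotes when a variable appears in the text. The sketch below assumes a made-up $Speaker variable to show this.

```perl
#!/usr/bin/perl
use strict;

my $Speaker="Sarah";

# q() behaves like single quotes: $Speaker is left as literal text
my $Literal=q($Speaker said, "It's fun!");

# qq() behaves like double quotes: $Speaker is replaced by its value
my $Interpolated=qq($Speaker said, "It's fun!");

print $Literal . "\n";        # prints: $Speaker said, "It's fun!"
print $Interpolated . "\n";   # prints: Sarah said, "It's fun!"
```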

Arrays and Hashes

Let’s look again at our addition script. We used a different variable to store the name and count of each animal:

my $AnimalNameOne="cats";

my $AnimalNameTwo="dogs";

my $AnimalNameThree="fish";

my $AnimalCountOne=3;

my $AnimalCountTwo=7;

my $AnimalCountThree=4;

Such scalar variables work well enough for three animals but would be difficult to use if we were trying to keep track of the hundreds of species in a zoo. A better way to manage similar values is to store them as a list in a single array variable, with the data on each animal stored in a numbered element in the array. Example 16-2 rewrites the script in Example 16-1 accordingly.

Example 16-2. Perl script using array variables

#!/usr/bin/perl

use strict;

my @AnimalName=("cats", "dogs", "fish");

my @AnimalCount=(3, 7, 4);

my $Total=$AnimalCount[0]+$AnimalCount[1]+$AnimalCount[2];

print "Pet roll call:\n".

 "===========\n".

 "$AnimalName[0]:\t$AnimalCount[0]\n".

 "$AnimalName[1]:\t$AnimalCount[1]\n".

 "$AnimalName[2]:\t$AnimalCount[2]\n".

 "===========\n".

 "Total:\t$Total\n";

The @AnimalName array contains three elements with the values cats, dogs, and fish. Elements in the list are labeled starting from zero, so the first element is element 0, the second is element 1, the third is element 2, and so on. Array variables are indicated by the at (@) symbol; the individual elements in the array are scalar variables, so they are indicated with the dollar ($) symbol. For example, the second element in the @AnimalName array is $AnimalName[1], with the value dogs.

Instead of referring to elements by their index number, we can use a third type of variable, the hash, which allows us to map elements using a text identifier or key. As shown in Example 16-3, we can store the animal counts in a hash called %Animals, with the animal names as the key.

Example 16-3. Perl script using hash variables

#!/usr/bin/perl

use strict;

print "\nHash:\n";

my %Animals=( cats=>3, dogs=>7, fish=>4);

my $Total= $Animals{cats}+ $Animals{dogs}+ $Animals{fish};

print "Pet roll call:\n".

 "===========\n".

 "cats:\t$Animals{cats}\n".

 "dogs:\t$Animals{dogs}\n".

 "fish:\t$Animals{fish}\n".

 "===========\n".

 "Total:\t$Total\n";

Notice that the hash is indicated by a percent (%) symbol and that, as with arrays, the individual scalar elements are indicated by a dollar symbol. For example, the number of cats is contained in $Animals{cats}; it’s common to enclose the identifier in single or double quotes, as in $Animals{'cats'} or $Animals{"cats"}.

Note that array elements are enclosed in square brackets—$AnimalName[1]—whereas hash elements are enclosed in curly braces—$Animals{'cats'}.

In this example, we’ve written the hash keys in the program itself. This is called hardcoding and is not good practice: any change to the keys requires a change to the program. If we don’t know the keys, we can still access the elements by first extracting the keys into an array using the keys function. We can then use the elements in this array to access the hash elements; for example, instead of typing $Animals{"cats"}, we can write $Animals{ $AnimalName[0] }. This may be hard to read, but think of it this way: Perl looks inside the braces and finds $AnimalName[0]. This denotes the first element of the @AnimalName array; suppose that it is cats (keys returns the keys in no particular order). Perl then plugs cats in where $AnimalName[0] was, in order to select the proper value from the %Animals hash. Using this syntax in a program, we can do calculations and printouts:

# Extract the keys of the Animals hash into the AnimalName array

my @AnimalName = keys %Animals;

my $Total=

 $Animals{$AnimalName[0]}+

 $Animals{$AnimalName[1]}+

 $Animals{$AnimalName[2]};

print "Pet roll call:\n".

 "===========\n".

 "$AnimalName[0]:\t$Animals{$AnimalName[0]}\n".

 "$AnimalName[1]:\t$Animals{$AnimalName[1]}\n".

 "$AnimalName[2]:\t$Animals{$AnimalName[2]}\n".

 "===========\n".

 "Total:\t$Total\n";

While it’s nice to be able to use a single variable to store the data, there’s still a lot of ugly manual referencing going on in the print statement; if we had a hundred types of animals, we’d need to reference them all individually. If you thought that we don’t really have to do this, you’re right! In the next section, we’ll look at how loops can help simplify processing of arrays.

Before we end our discussion of arrays and hashes, we note that they can also be created with the qw “quote word” construct. For example, the following two statements to create an array are equivalent:

my @AnimalName=("cats", "dogs", "fish");

my @AnimalName=qw(cats dogs fish);

and the following two statements to create a hash are equivalent:

my %Animals=(

 "cats"=>3,

 "dogs"=>7,

 "fish"=>4);

my %Animals=qw(

 cats 3

 dogs 7

 fish 4);

You need to be comfortable with only one approach, but it’s good to understand what’s happening if you see the other format in other people’s code.
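
As a quick check that the qw forms behave like the longer ones, the following sketch builds the array and the hash with qw and then uses them exactly as before.

```perl
#!/usr/bin/perl
use strict;

# Array and hash created with the qw "quote word" construct
my @AnimalName=qw(cats dogs fish);
my %Animals=qw(cats 3 dogs 7 fish 4);

# The elements are accessed in the usual way
my $Total=$Animals{cats}+$Animals{dogs}+$Animals{fish};

print "Second animal: $AnimalName[1]\n";   # prints: Second animal: dogs
print "Total:\t$Total\n";                  # prints: Total:  14 (tab-separated)
```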

Control Structures: Loops and Conditionals

We often require computers to do a single task many times; for example, we might write a program to count the number of cars that travel along a particular road, or the number of seconds left till a space rocket blasts off. Instead of writing out statements many times, we can write them out once and use a loop construct to repeat them as many times as required.

There are several flavors of loop in Perl; we’ll look at these in the context of a simple example: counting from 1 to 10. The simplest is the for loop:

for(my $counter=1; $counter<=10; $counter++)

{

 print "\nThe value is: $counter";

}

Here, we initialize the $counter variable to 1, then execute the statement between the braces (the body of the loop) as long as the counter is less than or equal to 10. Each time we pass through the loop, we increment the counter using the ++ operator. The body of the loop contains a single statement that displays the value of the counter.

In the while loop, the body is executed as long as the condition in the parentheses is true. We can write the previous counter as:

my $counter=1;

while($counter<=10)

{

 print "\nThe value is: $counter";

 $counter++;

}

Notice that the while loop does not include a particular place for initializing or incrementing the counter. In fact, we generally use the for loop when we know exactly how many times we want to run the loop, and we use other loop constructs such as the while loop when we don’t.

The do...while loop is almost identical to the while loop, with one difference: the condition is first evaluated only after the loop body has been executed once. This means that the body is always executed at least once, even if the condition is false from the start, which is useful in some circumstances:

my $counter=1;

do

{

 print "\nThe value is: $counter";

 $counter++;

}while($counter<=10);

Finally, the until loop is identical to the while loop but inverts the condition; the loop is executed as long as the condition is false:

my $counter=1;

until($counter>10)

{

 print "\nThe value is: $counter";

 $counter++;

}
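
The difference between while and do...while shows up only when the condition is false from the very beginning. The sketch below starts the counter at 11 and counts how often each loop body runs; the variables $WhilePasses and $DoWhilePasses are introduced just for this illustration.

```perl
#!/usr/bin/perl
use strict;

# while: the condition is tested first, so the body never runs
my $counter=11;
my $WhilePasses=0;
while($counter<=10)
{
 $WhilePasses++;
 $counter++;
}

# do...while: the body runs once before the condition is tested
$counter=11;
my $DoWhilePasses=0;
do
{
 $DoWhilePasses++;
 $counter++;
}while($counter<=10);

print "while body ran $WhilePasses time(s)\n";        # 0
print "do...while body ran $DoWhilePasses time(s)\n"; # 1
```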

Iterating Through Arrays and Hashes

Earlier in “Arrays and Hashes,” we accessed the individual values in the %Animals hash by looking up each key through its index number in the @AnimalName array; for example:

my $Total=

 $Animals{$AnimalName[0]}+

 $Animals{$AnimalName[1]}+

 $Animals{$AnimalName[2]};

We can use the foreach construct to walk through all the keys of the %Animals hash (given by keys %Animals) and assign each key in turn to the scalar variable $AnimalName:

my %Animals=( "cats"=>3, "dogs"=>7, "fish"=>4);

my $Total=0;

print "Pet roll call:\n".

      "===========\n";

foreach my $AnimalName (keys %Animals)

{

 $Total+=$Animals{$AnimalName};

 print "$AnimalName:\t$Animals{$AnimalName}\n";

}

print "===========\n".

      "Total:\t$Total\n";

For each value of $AnimalName, the statements between the braces are executed. First, the += operator is used to increase the value of the $Total variable by the count of that animal, and then the name and count of each animal is printed. Note that we initialized the value of $Total to zero before we start adding values to it.

The foreach construct shown here extracts the keys of the %Animals hash, but we have to use the hash together with the key to find each value. The following while statement does the same thing, but in a cleaner way:

while( (my $AnimalName, my $Count) = each(%Animals) )

{

 print "$AnimalName:\t$Count\n";

 $Total+=$Count;

}

Each time round the loop, the each construct assigns the animal name (the key) and count (the value) to the $AnimalName and $Count variables. The loop is repeated, and the statements within the braces are executed until all the items in the hash are exhausted.
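
Both keys and each hand back the items of a hash in no particular order, so the animals may not be listed in the order in which they were typed. If a predictable order is wanted, one option (not used in the examples above) is to pass the list of keys through Perl’s built-in sort function, which by default arranges strings alphabetically. The sketch below also collects the output in a hypothetical $Report variable using the .= operator, which appends text in the same way that += adds a number.

```perl
#!/usr/bin/perl
use strict;

my %Animals=( "cats"=>3, "dogs"=>7, "fish"=>4);
my $Total=0;
my $Report="";

# Walk through the keys in alphabetical order
foreach my $AnimalName (sort keys %Animals)
{
 $Total+=$Animals{$AnimalName};
 # Append one line per animal to the report text
 $Report.="$AnimalName:\t$Animals{$AnimalName}\n";
}

print $Report;
print "Total:\t$Total\n";
```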

Conditional Statements

Sometimes, we want to execute a statement only if something is true, or only if it’s false. The if construct allows this:

# Numerical comparison

my $var1=786;

if($var1 < 786)

{

    print "The value is less than 786.\n";

}

if($var1 >= 786)

{

    print "The value is greater than or equal to 786.\n";

}

if($var1 == 786)

{

    print "The value is equal to 786.\n";

}

If we want to compare strings, rather than numbers, we need to use the string comparison operators. The important ones are eq (equal), lt (alphabetically earlier than), and gt (alphabetically later than):

# String comparison

my $username="Ali";

if($username lt "N")

{

    print "The username appears in the first half of the alphabet.\n";

}
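
Choosing between the numerical and string operators matters, because the same two values can compare differently. In the sketch below, 10 is not less than 9 as a number, but the string "10" sorts before the string "9" because the comparison goes character by character and "1" comes before "9". The variables $Numeric and $String are made up for this illustration.

```perl
#!/usr/bin/perl
use strict;

# Numerical comparison
my $Numeric="no";
if(10 < 9)
{
 $Numeric="yes";
}

# String comparison of the same values
my $String="no";
if("10" lt "9")
{
 $String="yes";
}

print "Is 10 less than 9 as a number? $Numeric\n";   # no
print "Is 10 earlier than 9 as a string? $String\n"; # yes
```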

With the if...else construct, we can have some code that is executed when the condition is true, and other code that’s executed when the condition is false:

if($username eq "Ali")

{

    print "Hi Dad!\n";

}

else

{

    print "Hello!\n";

}

If the $username variable has the value "Ali", the "Hi Dad!" message will be displayed; otherwise, the message “Hello!” is displayed instead.

To handle other possible conditions, we can use the if...elsif...else construct. For example:

if($username eq "Ali")

{

    print "Hi Dad!\n";

}

elsif($username eq "Sadri")

{

    print "Hi Mom!\n";

}

else

{

    print "Hello!\n";

}

If the $username variable has the value "Ali", the "Hi Dad!" message will be displayed (and the later checks will not be performed); if the $username variable has the value "Sadri", the "Hi Mom!" message will be displayed, and if neither condition is satisfied, the "Hello!" message will be displayed.

We can combine conditions using the Boolean operators AND (&&), OR (||), and NOT (!). For example, we can print a message if two conditions are met:

# Combining conditions

my $temperature=19;

# Boolean AND

if( ($temperature > 18) && ($temperature < 35) )

{

    print "The weather is fine.\n";

}

or if either condition is met:

# Boolean OR

if( ($temperature < 18) || ($temperature > 35) )

{

    print "The weather isn't fine.\n";

}

or if a condition is not met:

# Boolean NOT (negating the condition)

if( !($temperature < 18) )

{

    print "The weather isn't cold.\n";

}

You will often see the Boolean operators written in the long form: and, or, and not. For example, you can write:

# Symbolic and long form of Boolean expressions

my $value=74;

# A combined expression...

if( ($value > 80) || ( ($value < 75) && ! ($value == 73)) )

{

    print "The value is greater than 80 or less than 75, but is not 73\n";

}

as:

# ...and the equivalent in long form

if( ($value > 80) or ( ($value < 75) and not ($value == 73)) )

{

    print "The value is greater than 80 or less than 75, but is not 73\n";

}

The long forms and and or aren’t in fact identical to their symbolic counterparts && and ||. Perl assigns the long forms a very low operator precedence; as we noted earlier in “Operator precedence,” it’s best to use parentheses to express the precedence you want.
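
A small sketch shows where the difference in precedence can bite. The assignment operator (=) ranks below || but above or, so the two statements that look alike are grouped differently. The variable names here are hypothetical.

```perl
#!/usr/bin/perl
use strict;

my $Fallback=5;

# Grouped as: $Symbolic = (0 || $Fallback)
my $Symbolic = 0 || $Fallback;

# Grouped as: ($LongForm = 0) or $Fallback
my $LongForm = 0 or $Fallback;

print "Symbolic:  $Symbolic\n";   # 5
print "Long form: $LongForm\n";   # 0
```

Writing the intended grouping with parentheses, as in $LongForm = (0 or $Fallback), removes the surprise.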

Reading Input from the Command Line and from Files

Consider our sample animals script; the names of the animals and the numbers of each are hardcoded into the program. A better solution is to allow the program to use values provided by the user from the command line or from a file.

Reading in values from the command line

One option is to specify values after the program name on the command line; these values are called command-line arguments and are saved in the special @ARGV array variable. For example, if we type:

$ ./program.pl Argument_1 Argument_2 Argument_3

$ARGV[0] will contain Argument_1, $ARGV[1] will contain Argument_2, and $ARGV[2] will contain Argument_3. The number of arguments entered at the command line is the same as the number of elements in the @ARGV array; you can find this number by using the name of the array where Perl expects a single value, such as in a numerical comparison. For example, @ARGV==3 is true if three arguments are typed in at the command line.
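
The following sketch shows the array name being used to obtain the number of arguments. So that it produces the same result however it is run, it fills @ARGV by hand; in a real script the values would come from the command line, and the assignment on the first line would be left out. The $ArgumentCount variable is introduced for illustration.

```perl
#!/usr/bin/perl
use strict;

# Stand-in for arguments typed at the command line (for demonstration only)
@ARGV=("Argument_1", "Argument_2", "Argument_3");

# Assigning an array to a scalar variable gives the number of elements
my $ArgumentCount=@ARGV;

print "Number of arguments: $ArgumentCount\n";   # 3
print "First argument: $ARGV[0]\n";              # Argument_1
```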

Example 16-4 modifies our Animals script to read in the number of cats, dogs, and fish as command-line arguments.

Example 16-4. Reading in numbers from the command line

#!/usr/bin/perl

use strict;

my %Animals;

# If the user hasn't provided the correct number of command-line

# arguments, provide a helpful error message.

if(@ARGV!=3)

{

 die("Syntax: $0 [count of cats] [count of dogs] [count of fish]\n");

}

# If the user has provided the command-line arguments, fill in the

# Animals hash with the corresponding values.

%Animals=(

 "cats"=>$ARGV[0],

 "dogs"=>$ARGV[1],

 "fish"=>$ARGV[2]);

# Process the data to calculate the total; code beyond this point is

# identical to our previous example, and doesn't deal with the

# command-line arguments.

my $Total=0;

print "Pet roll call:\n".

      "===========\n";

while ((my $Animal, my $Count) = each(%Animals))

{

 print "$Animal:\t$Count\n";

 $Total+=$Count;

}

print "===========\n".

      "Total:\t$Total\n";

If an incorrect number of arguments is provided, the die statement prints the message string between the parentheses and then stops the program. Since the program will run beyond this point only if a correct number of arguments is provided, there’s no need to include an else clause to handle such a case.

Save this script as animals.commandline.pl and run it with command-line arguments:

$ ./animals.commandline.pl 3 7 4

Pet roll call:

===========

cats:   3

dogs:   7

fish:   4

===========

Total:  14

As part of the error message, we’ve used the $0 variable, which is the command used to run the script. If, by mistake, you use too many or too few arguments, you get the helpful error message shown below:

$ ./animals.commandline.pl 3 7 4 1

Syntax: ./animals.commandline.pl [count of cats] [count of dogs] [count of fish]

Here, $0 is replaced by ./animals.commandline.pl.

Notice that we still have the animal names hardcoded in the program; each time we want to change the list of animals, we need to change the program code. We can instead change our program to read in both the animal names and counts from the command line, as shown in Example 16-5:

Example 16-5. Reading in both the animal names and counts from the command line

#!/usr/bin/perl

use strict;

my %Animals;

# If the user hasn't provided a nonzero, even number of command-line

# arguments, provide a helpful error message.

if( (@ARGV==0) || ( (@ARGV%2)!=0) )

{

 die("Syntax: $0 [Animal One Name] [Animal One Count] ".

  "[Animal Two Name] [Animal Two Count] ...\n");

}

# If the user has provided the command-line arguments, fill in the

# Animals hash with the corresponding values.

while(@ARGV)

{

 # Read in an argument and take this as the animal name

 my $AnimalName=shift(@ARGV);

 # Read in another argument and take this as the count for this animal

 my $AnimalCount=shift(@ARGV);

 # Add an entry to the Animals hash for this animal name and

 # count pair:

 $Animals{$AnimalName}=$AnimalCount;

}

# Process the data to calculate the total; code beyond this point is

# identical to our previous example and doesn't deal with the

# command-line arguments.

my $Total=0;

print "Pet roll call:\n".

      "===========\n";

while ((my $Animal, my $Count) = each(%Animals))

{

 print "$Animal:\t$Count\n";

 $Total+=$Count;

}

print "===========\n".

      "Total:\t$Total\n";

This compact program combines many of the features of Perl you’ve learned so far. Loops allow you to process as many data items as required; when you’re writing a program, you generally don’t know exactly how many rows of data will be returned by the database. The example also illustrates the if control statement, the || logical OR operator, and an appropriately used array and hash.

You can also see how the program uses scope to limit the visibility of its variables. The first while block defines two variables ($AnimalName and $AnimalCount) that can be used only within the loop. The second while block defines two more variables within the while statement itself; these can be used only within that block. Because these variables serve a temporary function inside the loop and aren’t needed outside it, defining them within the scope of the block is good coding practice.

In our test for command-line arguments, we print the error message if the user hasn’t provided any command-line arguments (the number of arguments is zero), or if the arguments don’t form a whole number of animal name and count pairs (which we’ll know because there will be a remainder when we divide the number of arguments by two: @ARGV%2).

To read in the command-line arguments, we use the shift function to pick up one argument from the list. We expect a name and a count, so we call shift twice for each animal. The while loop continues as long as there are additional arguments, so we can provide data for as many animals as we like.
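The validation test and the paired shift calls can also be tried in isolation. The following is a minimal sketch rather than one of the numbered examples: it works on a hardcoded @Args array standing in for @ARGV, and the subroutine name parse_pairs is hypothetical (writing your own subroutines is covered later in this chapter).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: turn a flat list of name/count arguments into a hash.
# Returns an empty list if the arguments don't form complete pairs.
sub parse_pairs
{
    my @List=@_;
    my %Pairs;

    # Same test as in Example 16-5: reject an empty or odd-length list
    if( (@List==0) || ((@List%2)!=0) )
    {
        return ();
    }

    # Each pass through the loop removes two items from the front of the list
    while(@List)
    {
        my $Name=shift(@List);
        my $Count=shift(@List);
        $Pairs{$Name}=$Count;
    }

    return %Pairs;
}

# @Args stands in for @ARGV so the sketch runs without any arguments
my @Args=("dogs", 7, "fish", 33);
my %Animals=parse_pairs(@Args);

print "dogs: $Animals{dogs}\n";
print "fish: $Animals{fish}\n";
```

Example 16-5 applies the same two steps directly to @ARGV instead of wrapping them in a subroutine.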

Let’s try the program out:

$ ./animals.commandline.types.pl dogs 7 fish 33 cats 4 elephants 1 giraffes 3

Pet roll call:

===========

giraffes:       3

cats:   4

elephants:      1

dogs:   7

fish:   33

===========

Total:  48

Notice that the counts aren’t aligned properly; this is because the longer animal names (giraffes and elephants) reach the end of the first tab column, and so the \t in the print statement moves the count into the next tab column.
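One way to avoid the misalignment is to pad each name to a fixed width instead of relying on tabs. The sketch below assumes the printf function described later in this chapter; the sample hash, the column widths of 10 and 5, and the use of sort to fix the display order are all choices made for this illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample data standing in for the command-line arguments
my %Animals=("giraffes"=>3, "elephants"=>1, "dogs"=>7, "fish"=>33);

my $Total=0;

print "Pet roll call:\n".
      "===============\n";

# sort() gives a predictable alphabetical order; each() does not
foreach my $Animal (sort(keys(%Animals)))
{
    # %-10s pads the label to 10 characters, left-aligned;
    # %5d right-aligns the count in a 5-character column
    printf("%-10s%5d\n", "$Animal:", $Animals{$Animal});
    $Total+=$Animals{$Animal};
}

print "===============\n";
printf("%-10s%5d\n", "Total:", $Total);
```

The widths must be at least as large as the longest label; a longer animal name would still push its count to the right.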

Reading in values from a file

Instead of typing in the data as command-line arguments, we can ask our program to read in the data from a file. A popular and simple format for data interchange between applications is the comma-separated values (CSV) format. This is a plain-text format with the data separated by commas. Create the following CSV file in a text editor and save it as animals.csv:

cats,2

dogs,5

fish,3

emus,4

Now, let’s write a simple program to read in a specified file and print the contents on the screen. Example 16-6 uses the open function to open the file and a while loop to read it in line by line.

Example 16-6. Perl script to read in a text file and display the contents

#!/usr/bin/perl

use strict;

# If the user hasn't provided one command-line argument, provide a

# helpful error message.

if(@ARGV!=1)

{

    die("Syntax: $0 [Input file]\n");

}

# Open the file specified on the command line; if we can't open it,

# print an error message and stop.

open(INPUTFILE, $ARGV[0])

    or die("Failed opening $ARGV[0]\n");

# Read in the input file line by line

while(my $Line=<INPUTFILE>)

{

    print $Line;

}

# Close the input file

close(INPUTFILE);

Here, we’ve used the open() function to open the file with the name specified on the command line and configure a file handle to access this file; in our example, we’ve used INPUTFILE for the file handle. Note that unlike other types of Perl variables, file handles don’t have a symbol such as the dollar symbol ($) or the at symbol (@) before them.

Every standard Perl function ends by passing back a value to the code that called it. In fact, many functions do nothing but return a value. The open() function returns a nonzero (true) value to indicate that it succeeded in opening the file, and returns a false value if it failed. Common causes of file-access errors include mistyped filenames and insufficient privileges to access a particular file or directory. We can use an if statement to check for a false value; if the file-open operation failed, we can use the die() function to display an error message and stop the script:

if(!open(INPUTFILE, $ARGV[0]))

{

  die("Failed opening $ARGV[0]\n");

}

This combination of an if statement and an open function is worth noting; we’ve previously used if on logical tests such as $Username eq "Ali", but if is flexible enough to directly test a single value, or the result of a function such as open. We can also use the simpler or construct to call the die() function if the open() function fails:

open(INPUTFILE, $ARGV[0])

  or

  die("Failed opening $ARGV[0]\n");

Save this program as readfile.pl, and then get it to read in and display the contents of the animals.csv file:

$ ./readfile.pl animals.csv

cats,2

dogs,5

fish,3

emus,4
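When open() fails, Perl records the operating system’s reason in the special variable $!, and including it in the message helps tell a mistyped filename from a permissions problem. The sketch below is an illustration only: the filename no_such_file.csv is hypothetical and assumed not to exist, and the script uses print rather than die() so that it runs to completion.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical filename, chosen because it is unlikely to exist
my $FileName="no_such_file.csv";

# Try to open the file; on failure, $! holds the system's reason
if(open(INPUTFILE, $FileName))
{
    print "Opened $FileName\n";
    close(INPUTFILE);
}
else
{
    # In a real script, the same text could be passed to die() instead
    print "Failed opening $FileName: $!\n";
}
```

The exact wording of the reason depends on the operating system and its language settings.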

Instead of simply printing out the file contents, let’s load them into our own data structures and process the data. We have to remove the invisible newline at the end of each line of the text file using the chomp() function, then load the contents of each line into array elements by the location of the commas using the split() function. For convenience, we assign the first value to the scalar variable $AnimalName and the second value to the scalar variable $AnimalCount. We then use these to populate the %Animals hash.

For example, the line:

cats,2

is split at the comma into the @AnimalsData array, with:

AnimalsData[0]: cats

AnimalsData[1]: 2

and these values are assigned to the variables:

AnimalName:  cats

AnimalCount: 2

The statement:

$Animals{$AnimalName}=$AnimalCount;

is effectively:

$Animals{cats}=2;

In this way, we add entries to the %Animals hash for each animal. The complete program code is listed in Example 16-7.

Example 16-7. Perl script to read in data from a CSV file

#!/usr/bin/perl

use strict;

# If the user hasn't provided one command-line argument, provide a

# helpful error message.

if(@ARGV!=1)

{

    die("Syntax: $0 [Input file]\n");

}

# Open the file specified on the command line; if we can't open it,

# print an error message and stop.

if(!open(INPUTFILE, $ARGV[0]))

{

    die("Failed opening $ARGV[0]\n");

}

my %Animals;

# Read in from input file line by line; each line is

# automatically placed in $_

while(<INPUTFILE>)

{

    # Remove the newline at the end of the line

    chomp($_);

    # Split the line by commas and load into the AnimalsData array

    my @AnimalsData=split(",", $_);

    # Assign the text before the first comma to the name

    my $AnimalName=$AnimalsData[0];

    # Assign the text between the first comma and the second comma

    # (if any) to the count

    my $AnimalCount=$AnimalsData[1];

    # Add an entry to the Animals hash for this animal name and

    # count pair:

    $Animals{$AnimalName}=$AnimalCount;

}

# Close the input file

close(INPUTFILE);

# Process the data to calculate the total; code beyond this point is

# identical to our previous example and doesn't deal with the

# command-line arguments.

my $Total=0;

print "Pet roll call:\n".

      "===========\n";

while ((my $Animal, my $Count) = each(%Animals))

{

    print "$Animal:\t$Count\n";

    $Total+=$Count;

}

print "===========\n".

      "Total:\t$Total\n";
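Example 16-7 trusts every line to contain a name and a count. As a hedged sketch of how the loop could tolerate imperfect input, the following variation skips lines that don’t split into two fields. The @Lines array is sample data standing in for the file, and next (which jumps to the next pass of the loop) is a standard Perl keyword that may not have been introduced yet.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample lines standing in for the file contents; the blank line and
# the "emus" line are deliberately incomplete
my @Lines=("cats,2\n", "dogs,5\n", "\n", "emus\n", "fish,3\n");

my %Animals;

foreach my $Line (@Lines)
{
    # Remove the newline at the end of the line
    chomp($Line);

    # Assign the two fields directly to named variables
    my ($AnimalName, $AnimalCount)=split(",", $Line);

    # Skip lines that split() could not break into a name and a count
    if(!defined($AnimalName) || !defined($AnimalCount))
    {
        next;
    }

    $Animals{$AnimalName}=$AnimalCount;
}

foreach my $Animal (sort(keys(%Animals)))
{
    print "$Animal:\t$Animals{$Animal}\n";
}
```

Only cats, dogs, and fish end up in the hash; whether to skip bad lines silently or report them is a design decision that depends on the application.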

Reading in values from standard input

The console’s standard input is a special file that captures data typed in at the console, sent to the program using a pipe (|), or redirected from a file using the redirection operator (<). Using the standard input, we can skip the process of opening and closing the file using a file handle (INPUTFILE in our example), and instead use the built-in Perl STDIN file handle, as shown in Example 16-8.

Example 16-8. Perl script to read in data from a CSV file from standard input

#!/usr/bin/perl

use strict;

my %Animals;

# Read in from standard input line by line; each line is

# automatically placed in $_

while(<STDIN>)

{

     # Remove the newline at the end of the line

     chomp($_);

     # Split the line by commas and load it into the AnimalsData array

     my @AnimalsData=split(",", $_);

     # Assign the text before the first comma to the name

     my $AnimalName=$AnimalsData[0];

     # Assign the text between the first comma and the second comma

     # (if any) to the count

     my $AnimalCount=$AnimalsData[1];

     # Add an entry to the Animals hash for this animal name and

     # count pair:

     $Animals{$AnimalName}=$AnimalCount;

}

# Process the data to calculate the total; code beyond this point is

# identical to our previous example and doesn't deal with the

# command-line arguments.

my $Total=0;

print "Pet roll call:\n".

      "===========\n";

while ((my $Animal, my $Count) = each(%Animals))

{

    print "$Animal:\t$Count\n";

    $Total+=$Count;

}

print "===========\n".

      "Total:\t$Total\n";

We can then run this as:

$ ./Animals.command_line.tofile.pl < animals.csv

or as:

$ cat animals.csv | ./Animals.command_line.tofile.pl

on a Linux or Mac OS X system, or as:

C:\> type animals.csv | Animals.command_line.tofile.pl

on a system running Windows.
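Examples 16-7 and 16-8 differ only in where the lines come from, so the reading loop can be written once and handed whichever file handle is appropriate. The sketch below is one possible arrangement, not the book’s method: load_animals is a hypothetical subroutine (subroutines are covered later in this chapter), and for a self-contained demonstration it reads from a string, which Perl 5.8 and later allow by opening a file handle onto a scalar variable.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: read name,count lines from any open file handle
# and return them as a hash
sub load_animals
{
    my $Handle=shift;
    my %Loaded;

    while(my $Line=<$Handle>)
    {
        chomp($Line);
        my ($AnimalName, $AnimalCount)=split(",", $Line);
        $Loaded{$AnimalName}=$AnimalCount;
    }

    return %Loaded;
}

# Sample data held in a string so the sketch needs no external file
my $SampleData="cats,2\ndogs,5\n";
open(my $Sample, "<", \$SampleData)
    or die("Failed opening sample data: $!\n");

my %Animals=load_animals($Sample);
close($Sample);

print "cats: $Animals{cats}\n";
print "dogs: $Animals{dogs}\n";

# To read standard input instead, pass a reference to the built-in handle:
#   my %Animals=load_animals(\*STDIN);
```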

Writing values to a file or standard output

You’ll often want to permanently store the output of your program in a file. Example 16-9 modifies the program to take a second command-line argument to specify the name of the output file.

Example 16-9. Perl script to read in data from a CSV file and save results to an output file

#!/usr/bin/perl

use strict;

my %Animals;

# If the user hasn't provided exactly two command-line arguments, provide a

# helpful error message.

if(@ARGV!=2)

{

    die("Syntax: $0 [Input file] [Output file]\n");

}

# Open the file specified on the command line; if we can't open it,

# print an error message and stop.

if(!open(INPUTFILE, $ARGV[0]))

{

    die("Failed opening $ARGV[0]\n");

}

# Open the output file specified on the command line; if we can't open it,

# print an error message and stop.

if(!open(OUTPUTFILE, ">$ARGV[1]"))

{

    die("Failed opening $ARGV[1]\n");

}

# Read in from input file line by line; each line is

# automatically placed in $_

while(<INPUTFILE>)

{

    # Remove the newline at the end of the line

    chomp;

    # Split the line by commas

    my @AnimalsData=split(",", $_);

    # Assign the text before the first comma to the name

    my $AnimalName=$AnimalsData[0];

    # Assign the text between the first comma and the second comma

    # (if any) to the count

    my $AnimalCount=$AnimalsData[1];

    # Add an entry to the Animals hash for this animal name and

    # count pair:

    $Animals{$AnimalName}=$AnimalCount;

}

close(INPUTFILE);

# Process the data to calculate the total, then write to the output file

my $Total=0;

print OUTPUTFILE "Pet roll call:\n".

                 "===========\n";

while ((my $Animal, my $Count) = each(%Animals))

{

    print OUTPUTFILE "$Animal:\t$Count\n";

    $Total+=$Count;

}

print OUTPUTFILE "===========\n".

                 "Total:\t$Total\n";

# Close the output file

close(OUTPUTFILE);

We’re providing the name of the output file as the second command-line argument ($ARGV[1]). The interesting part of this program starts from the second open() statement; since we want to write to the file, we add a greater-than (>) symbol before the name of the output file, which creates the file or overwrites it if it already exists. We also specify the output file handle OUTPUTFILE immediately after the print keyword in each print statement.

If we don’t specify an output file handle, program output is sent to the system standard output, known as STDOUT. This is almost always the display screen. As with STDIN, we can use STDOUT without needing to explicitly open and close it. We can also print to standard output by putting STDOUT as the file handle in the print statement:

print STDOUT "$Animal:\t$Count\n";

Since the program output is sent to standard output by default anyway, STDOUT is assumed when no other file handle is specified, and we can safely omit it (as we have in all our previous scripts):

print "$Animal:\t$Count\n";
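Because print accepts any file handle, the report code can be written once and pointed at either a file or the screen. The following sketch shows the idea under some assumptions: print_report is a hypothetical subroutine (subroutines are introduced in the next section), the sample hash replaces the CSV input, and the names are sorted only to make the output order predictable.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: write the roll call to whichever file handle
# is passed as the first argument
sub print_report
{
    my ($Handle, %Data)=@_;
    my $Sum=0;

    print $Handle "Pet roll call:\n".
                  "===========\n";

    foreach my $Animal (sort(keys(%Data)))
    {
        print $Handle "$Animal:\t$Data{$Animal}\n";
        $Sum+=$Data{$Animal};
    }

    print $Handle "===========\n".
                  "Total:\t$Sum\n";
}

# Sample data standing in for the contents of animals.csv
my %Animals=("cats"=>2, "dogs"=>5);

# \*STDOUT passes the built-in standard output handle; a handle
# returned by a successful open() could be passed the same way
print_report(\*STDOUT, %Animals);
```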

Writing Your Own Perl Functions

As we’ve seen, a function is a statement such as print() or open() that performs an operation for your program. Functions can take arguments within parentheses, and can return a value.

You can define your own functions in Perl. Sometimes, you might want to perform a task in several places of your program, such as to repeatedly display messages or perform calculations. You can define your own function to perform the task and then call the function whenever you need the task to be performed.

Example 16-10 is a program with two small functions: one called sum() to calculate the sum of a list of numbers, and the other called average() to average a list of numbers. The average() function uses the sum() function in its calculations.

Example 16-10. Perl script with functions to sum and average numbers

#!/usr/bin/perl

use strict;

print "\nThe total is:   ", sum(1, 5, 7);

print "\nThe average is: ", average(1, 5, 7);

# Function to calculate the sum of all the numbers passed to it

sub sum

{

    my $Total=0;

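    # defined() keeps the loop running when a value is 0, which Perl

    # would otherwise treat as false and stop early

    while(defined(my $value=shift))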

    {

        $Total+=$value;

    }

    return $Total;

}

# Function to calculate the average of all the numbers passed to it

# This function calls the sum function internally.

sub average

{

    return sum(@_)/(@_);

}

The sum() function uses the shift keyword to iterate through the provided values one by one, assigning them in turn to the $value variable. When all the values have been seen and added to the $Total, the function returns the $Total value to the part of the program that called it. This means that sum(1, 5, 7) has the value of $Total, which is 13.

The special array @_ contains all the values passed to the function when it is called. The average() function passes this list—in this example 1, 5, 7—to the sum() function to get the total, and then divides this total by the number of values in the list, given by the array name @_. Finally, the statement returns the resulting average:

return sum(@_)/(@_);
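As a variation on Example 16-10, the arguments can be copied from @_ into a named array before they are used. The sketch below is one possible arrangement, not a replacement for the example: it loops with foreach, so a value of 0 is treated like any other number, and it returns 0 for an empty list instead of dividing by zero, which is an assumption about what a caller would want.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Add up every number in the argument list, including zeros
sub sum
{
    my @Numbers=@_;
    my $Total=0;

    foreach my $Value (@Numbers)
    {
        $Total+=$Value;
    }

    return $Total;
}

# Average the argument list; an empty list gives 0 (an assumption)
# rather than a division-by-zero error
sub average
{
    my @Numbers=@_;

    if(@Numbers==0)
    {
        return 0;
    }

    # scalar() makes it explicit that we want the number of items
    return sum(@Numbers)/scalar(@Numbers);
}

print "The total is:   ", sum(1, 0, 7), "\n";
print "The average is: ", average(2, 4), "\n";
print "Empty average:  ", average(), "\n";
```

This prints a total of 8, an average of 3, and an empty average of 0.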

Note that the $Total variable is defined only within the sum() function, since it’s enclosed by the function braces.

Save this program as sum_average.functions.pl and then run it by typing:

$ ./sum_average.functions.pl

The total is:   13

The average is: 4.33333333333333

Of course, we can use variables instead of hardcoding values in the program. For example, to accept the list of numbers from the command line, we can rewrite the two print lines as:

print "\nThe total is:   ", sum(@ARGV);

print "\nThe average is: ", average(@ARGV);

allowing us to call this program as:

$ ./sum_average.functions.pl 19 313 110

The total is:   442

The average is: 147.333333333333

The value for the average has more precision than we’d generally need, and the numbers aren’t aligned. We can use the printf function to format the values using a format specifier before printing them. The format specifiers you are most likely to come across are:

%d

Integer number (decimal)

%f

Number with a decimal fraction (floating point)

%s

String of characters

For our example, we could write:

printf "\nThe total is:   %10d",   sum(1, 5, 7);

printf "\nThe average is: %10.2f", average(1, 5, 7);

The value of sum(1, 5, 7) is mapped to the format specifier %10d, which sets aside 10 characters for the sum. Similarly, in the second statement, the value of average(1, 5, 7) is mapped to the format specifier %10.2f, which sets aside 10 characters total for the average and specifies that only 2 decimal places should be displayed. In other words, we leave room for 7 places to the left of the decimal point, 1 character for the decimal point itself, and 2 places for the decimal part of the number. With these statements, the program output would be:

The total is:           13

The average is:       4.33

which looks much nicer.

Adding a minus (-) symbol immediately after the percent symbol makes the display left-aligned. For example, the statement:

printf("\n%15s", "hello");

would display:

(ten spaces) hello

whereas adding the minus symbol as shown here:

printf("\n%-15s", "hello");

would display:

hello (ten spaces)

It’s typical to display numbers right-aligned, and to display text left-aligned.
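Padding made of spaces is hard to see on screen, so it can help to wrap the formatted value in visible markers. The sketch below uses sprintf, the standard Perl function that formats exactly like printf but returns the string instead of printing it; the square brackets and the variable names are purely illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Right-aligned text in a 15-character field
my $Right=sprintf("[%15s]", "hello");

# Left-aligned text in a 15-character field
my $Left=sprintf("[%-15s]", "hello");

# A number in a 10-character field with 2 decimal places
my $Number=sprintf("[%10.2f]", 13/3);

print "$Right\n";
print "$Left\n";
print "$Number\n";
```

The first line shows hello preceded by ten spaces, the second shows it followed by ten spaces, and the third shows 4.33 preceded by six spaces, all inside the brackets.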

Resources

To learn more about Perl, we recommend these resources:

§  The Perl.org page for people learning Perl (http://learn.perl.org)

§  Learning Perl by Randal L. Schwartz et al. (O’Reilly)

§  The Comprehensive Perl Archive Network web site (http://www.cpan.org)

Exercises

1.    What are the strengths of Perl?

2.    What is the difference between an array and a hash?

3.    What does the following Perl script do?

#!/usr/bin/perl

use strict;

my $Answer;

while(@ARGV)

{

    $Answer+=shift(@ARGV);

}

print "Answer: $Answer\n";