Learning MySQL and MariaDB (2015)

Part III. Basics of Handling Data

Chapter 7. Selecting Data

Previous chapters discussed the important topics of organizing your tables well and getting data in to them. In this chapter, we will cover a key objective that makes the others pay off: retrieving the data stored in a database. This is commonly called a database query.

The simplest way to retrieve data from a MySQL or MariaDB database — to select data — is to use the SQL statement, SELECT. We used this SQL statement a few times in previous chapters. In this chapter, we will cover it in greater detail. It’s not necessary to know or use all of the may options, but some techniques such as joining tables together are basic to using relational databases.

We’ll begin this chapter by reviewing the basics of the SELECT statement, and then progress to more involved variants. When you finish this chapter, you will hopefully have a good understanding of how to use SELECT for most of your needs as you start out as a database developer, as well as be prepared for the many possibilities and special situations that may arise over the years of developing databases with MySQL and MariaDB.

In previous chapters, especially in the exercises, you were asked to enter data into the tables that we created and altered in the chapters of the previous part of this book. Entering data on your own was good for training purposes, but we now need much more data in our database to better appreciate the examples in this chapter. If you haven’t done so already, go to this book’s website and download the dump files that contain tables of data.

Download rookery.sql to get the whole rookery database, with plenty of data for use in our explorations. Once you have the dump file on your system (let’s assume you put it in /tmp/rookery.sql), enter the following from the command line:

mysql --user='your_name' -p \

rookery < /tmp/rookery.sql

The command prompts for your password, logs you in using the username assigned to you, and runs the statements in the rookery.sql file on the rookery database. If everything goes well, there should be no message in response, just the command-line prompt when it’s finished.

Basic Selection

The basic elements of the syntax for the SELECT statement are the SELECT keyword, the column you want to select, and the table from which to retrieve the data:

SELECT column FROM table;

If you want to select more than one column, list them separated by commas. If you want to select all of the columns in a table, you can use the asterisk as a wildcard instead of listing all of the columns. Let’s use the rookery database you just loaded with data to see a practical example of this basic syntax. Enter the following SQL statement in mysql to get a list of all of the columns and rows in the birds table:

USE rookery;

SELECT * FROM birds;

This is the most minimal SELECT statement that you can execute successfully. It tells MySQL to retrieve all of the data contained in the birds table. It displays the columns in the order you defined them in the table’s CREATE or ALTER statements, and displays rows in the order they are found in the table, which is usually the order that the data was entered into the table.

To select only certain columns, do something like this:

SELECT bird_id, scientific_name, common_name

FROM birds;

This SELECT statement selects only three columns from each row found in the birds table. There are also many ways to choose particular rows, change the order in which they are displayed, and limit the number shown. These are covered in the following sections of this chapter.

Selecting by a Criteria

Suppose that we want to select only birds of a certain family, say the Charadriidae (i.e., Plovers). Looking in the bird_families table, we find that its family_id is 103. Using a WHERE clause with the SELECT statement, we can retrieve a list of birds from the birds table for this particular family of birds like so:

SELECT common_name, scientific_name

FROM birds WHERE family_id = 103

LIMIT 3;

+----------------------+-------------------------+

| common_name          | scientific_name         |

+----------------------+-------------------------+

| Mountain Plover      | Charadrius montanus     |

| Snowy Plover         | Charadrius alexandrinus |

| Black-bellied Plover | Pluvialis squatarola    |

+----------------------+-------------------------+

This SELECT statement requests two columns, in a different order from the way the data is listed in the table — in the table itself, scientific_name precedes common_name. I also added the LIMIT clause to keep the results down to the first three rows in the table. We’ll talk more about the LIMITclause in a little while.

NOTE

Because we separated families into a separate table, you had to look at the bird_families table to get the right ID before selecting birds from the birds table. That seems round-about. There is a streamlined way to ask for a family name such as Charadriidae instead of a number. They’re called joins. We’ll cover them later.

This is all fairly straightforward and in line with what we’ve seen in several other examples in previous chapters. Let’s move on and take a look at how to change the order of the results.

Ordering Results

The previous example selected specific columns from the birds table and limited the results with the LIMIT clause. However, the rows were listed in whatever order they were found in the table. We’ve decided to see only a tiny subset of the birds in the Charadriidae family, so ordering can make a difference. If we want to put the results in alphabetical order based on the values of the common_name column, we add an ORDER BY clause like this:

SELECT common_name, scientific_name

FROM birds WHERE family_id = 103

ORDER BY common_name

LIMIT 3;

+-----------------------+----------------------+

| common_name           | scientific_name      |

+-----------------------+----------------------+

| Black-bellied Plover  | Pluvialis squatarola |

| Mountain Plover       | Charadrius montanus  |

| Pacific Golden Plover | Pluvialis fulva      |

+-----------------------+----------------------+

Notice that the ORDER BY clause is located after the WHERE clause and before the LIMIT clause. Not only will this statement display the rows in order by common_name, but it will retrieve only the first three rows based on the ordering. That is to say, MySQL will first retrieve all of the rows based on the WHERE clause, store those results in a temporary table behind the scenes, order the data based on the ORDER BY clause, and then return to the mysql client the first three rows found in that temporary table based on the LIMIT clause. This activity is the reason for the positioning of each clause.

By default, the ORDER BY clause uses ascending order, which means from A to Z for an alphabetic column. If you want to display data in descending order, add the DESC option, as in ORDER BY DESC. There’s also a contrasting ASC option, but you probably won’t need to use it because ascending order is the default.

To order by more than one column, give all the columns in the ORDER BY clause in a comma-separated list. Each column can be sorted in ascending or descending order. The clause sorts all the data by the first column you specify, and then within that order by the second column, etc. To illustrate this, we’ll select another column from the birds table, family_id, and we’ll get birds from a few more families. We’ll select some other types of shore birds: Oystercatchers (i.e., Haematopodidae), Stilts (e.g., Recurvirostridae), and Sandpipers (e.g., Scolopacidae). First, we need thefamily_id for each of these families. Execute the following on your server:

SELECT * FROM bird_families

WHERE scientific_name

IN('Charadriidae','Haematopodidae','Recurvirostridae','Scolopacidae');

+-----------+------------------+------------------------------+----------+

| family_id | scientific_name  | brief_description            | order_id |

+-----------+------------------+------------------------------+----------+

|       103 | Charadriidae     | Plovers, Dotterels, Lapwings |      102 |

|       160 | Haematopodidae   | Oystercatchers               |      102 |

|       162 | Recurvirostridae | Stilts and Avocets           |      102 |

|       164 | Scolopacidae     | Sandpipers and Allies        |      102 |

+-----------+------------------+------------------------------+----------+

In this SELECT statement, we added another item to the WHERE clause, the IN operator. It lists, within parentheses, the various values we want in the scientific_name column. Let’s use the IN operator again to get a list of birds and also test the LIMIT clause:

SELECT common_name, scientific_name, family_id

FROM birds

WHERE family_id IN(103, 160, 162, 164)

ORDER BY common_name

LIMIT 3;

+-------------+--------------------------------+-----------+

| common_name | scientific_name                | family_id |

+-------------+--------------------------------+-----------+

|             | Charadrius obscurus aquilonius |       103 |

|             | Numenius phaeopus phaeopus     |       164 |

|             | Tringa totanus eurhinus        |       164 |

+-------------+--------------------------------+-----------+

Notice that we didn’t put the numeric values in quotes as we did with the family names in the previous SQL statement. Single or double quotes are necessary for strings, but they’re optional for numeric values. However, it’s a better practice to not use quotes around numeric values. They can affect performance and cause incorrect results if you mix them with strings.

There is one odd thing about the results here: there aren’t any common names for the birds returned. That’s not a mistake. About 10,000 birds in the birds table are true species of birds, and about 20,000 are subspecies. Many subspecies don’t have a unique common name. With about 30,000 species and subspecies of birds, with all of the minor nuances between the subspecies bird families, there just aren’t common names for all of them. Each bird has a scientific name assigned by ornithologists, but everyday people who use the common names for birds don’t see the subtle distinctions that ornithologists see. This is why the scientific_name column is necessary and why the common_name column cannot be a key column in the table.

Let’s execute that SQL statement again, but add another factor to the WHERE clause to show only birds with a value for the common_name column:

SELECT common_name, scientific_name, family_id

FROM birds

WHERE family_id IN(103, 160, 162, 164)

AND common_name != ''

ORDER BY common_name

LIMIT 3;

+-----------------------+-----------------------+-----------+

| common_name           | scientific_name       | family_id |

+-----------------------+-----------------------+-----------+

| African Oystercatcher | Haematopus moquini    |       160 |

| African Snipe         | Gallinago nigripennis |       164 |

| Amami Woodcock        | Scolopax mira         |       164 |

+-----------------------+-----------------------+-----------+

In the WHERE clause, we added the AND logical operator to specify a second filter. For a row to match the WHERE clause, the family_id must be one in the list given and the common_name must not be equal to a blank value.

Nonprogrammers will have to learn a few conventions to use large WHERE clauses. We’ve seen that an equals sign says, “The column must contain this value,” but the != construct says, “The column must not contain this value.” And in our statement, we used '' to refer to an empty string. So we’ll get the rows where the common name exists.

In this case, we couldn’t ask for non-NULL columns. We could have set up the table so that birds without common names had NULL in the common_name column, but we chose to instead use empty strings. That’s totally different in meaning: NULL means there is no value, whereas the empty string is still a string even if there are no characters in it. We could have used NULL, but having chosen the empty string, we must use the right value in our WHERE clause.

Incidentally, != is the same as <> (i.e., less-than sign followed by greater-than sign).

Limiting Results

The birds table has nearly 30,000 rows, so selecting data without limits can return more rows than you might want to view at a time. We’ve already used the LIMIT clause to resolve this problem. We limited the results of the SELECT statement to three rows, the first three rows based on theWHERE and ORDER BY clauses. If we’d like to see the subsequent rows, maybe the next two based on the criteria we gave previously, we could change the LIMIT clause to show five rows. But an alternative, which is often a better choice, is to do something like this:

SELECT common_name, scientific_name, family_id

FROM birds

WHERE family_id IN(103, 160, 162, 164)

AND common_name != ''

ORDER BY common_name

LIMIT 3, 2;

+------------------------+-------------------------+-----------+

| common_name            | scientific_name         | family_id |

+------------------------+-------------------------+-----------+

| American Avocet        | Recurvirostra americana |       162 |

| American Golden-Plover | Pluvialis dominica      |       103 |

+------------------------+-------------------------+-----------+

This LIMIT clause has two values: the point where we want the results to begin, then the number of rows to display. The result is to show rows 3 and 4. Incidentally, LIMIT 3 used previously is the same as LIMIT 0, 3: the 0 tells MySQL not to skip any rows.

Combining Tables

So far in this chapter we’ve been working with just one table. Let’s look at some ways to select data from more than one table. To do this, we will have to tell MySQL the tables from which we want data and how to join them together.

For an example, let’s get a list of birds with their family names. To keep the query simple, we’ll select birds from different families, but all in the same order of birds. In earlier examples where we got a list of shore birds, they all had the same order_id of 102. We’ll use that value again. Enter this SELECT statement on your server:

SELECT common_name AS 'Bird',

bird_families.scientific_name AS 'Family'

FROM birds, bird_families

WHERE birds.family_id = bird_families.family_id

AND order_id = 102

AND common_name != ''

ORDER BY common_name LIMIT 10;

+------------------------+------------------+

| Bird                   | Family           |

+------------------------+------------------+

| African Jacana         | Jacanidae        |

| African Oystercatcher  | Haematopodidae   |

| African Skimmer        | Laridae          |

| African Snipe          | Scolopacidae     |

| Aleutian Tern          | Laridae          |

| Amami Woodcock         | Scolopacidae     |

| American Avocet        | Recurvirostridae |

| American Golden-Plover | Charadriidae     |

| American Oystercatcher | Haematopodidae   |

| American Woodcock      | Scolopacidae     |

+------------------------+------------------+

This SELECT statement returns one column from the birds table and one from the bird_families table. This is a hefty SQL statement, but don’t let it fluster you. It’s like previous statements in this chapter, but with some minor changes and one significant one. First, let’s focus on the one significant change: how we’ve drawn data from two tables.

The FROM clause lists the two tables, separated by a comma. In the WHERE clause, we indicated that we want rows in which the value of family_id in the two tables is equal. Otherwise, we would have duplicate rows in the results. Because those columns have the same name (family_id) in both tables, to prevent ambiguity, we put the table name before the colum name, separated by a dot (e.g., birds.family_id). We did the same thing for the scientific name in the column list (bird_families.scientific_name). If we don’t do that, MySQL would be confused as to whether we want thescientific_name from the birds or the bird_families table. This would generate an error like this:

ERROR 1052 (23000): Column 'scientific_name' in field list is ambiguous

You may have noticed that another new item was added to the SELECT statement: the AS keyword. This specifies a substitute name, or alias, for the heading in the results set for the column. Without the AS keyword for the column containing the family names, the heading would saybird_families.scientific_name. That’s not as attractive. This is another style factor, but it can have more practical aspects that we’ll see later. The keyword AS can also be used to specify a table name like so:

SELECT common_name AS 'Bird',

families.scientific_name AS 'Family'

FROM birds, bird_families AS families

WHERE birds.family_id = families.family_id

AND order_id = 102

AND common_name != ''

ORDER BY common_name LIMIT 10;

In this example, we provided an alias for the bird_families table. We set it to the shorter name families. Note that aliases for table names must not be in quotes.

After setting the alias, we must use it wherever we want to refer to the table. So we have to change the column selected in the field list from bird_families.scientific_name to families.scientific_name. We also have to change the column name bird_families.family_id in the WHERE clause to families.family_id. If we don’t make this final change, we’ll get the following error:

ERROR 1054 (42S22):

Unknown column 'bird_families.family_id' in 'where clause'

Let’s add a third table to the previous SQL statement, to get the name of the order of birds to which the birds belong. You can do that by entering this SQL statement on your server:

SELECT common_name AS 'Bird',

families.scientific_name AS 'Family',

orders.scientific_name AS 'Order'

FROM birds, bird_families AS families, bird_orders AS orders

WHERE birds.family_id = families.family_id

AND families.order_id = orders.order_id

AND families.order_id = 102

AND common_name != ''

ORDER BY common_name LIMIT 10, 5;

+------------------+------------------+-----------------+

| Bird             | Family           | Order           |

+------------------+------------------+-----------------+

| Ancient Murrelet | Alcidae          | Charadriiformes |

| Andean Avocet    | Recurvirostridae | Charadriiformes |

| Andean Gull      | Laridae          | Charadriiformes |

| Andean Lapwing   | Charadriidae     | Charadriiformes |

| Andean Snipe     | Scolopacidae     | Charadriiformes |

+------------------+------------------+-----------------+

Let’s look at the changes from the previous statement to this one. We added the third table to the FROM clause and gave it an alias of orders. To properly connect the third table, we had to add another evaluator to the WHERE clause: families.order_id = orders.order_id. This allows the SELECTto retrieve the right rows containing the scientific names of the orders that correspond to the rows we select from the families. We also added a column to the field list to display the name of the order. Because the families we’ve selected are all from the same order, that field seems a little pointless in these results but can be useful as we search more orders in the future. We gave a starting point for the LIMIT clause so that we could see the next five birds in the results.

NOTE

It’s not necessary to put the field alias name for a column in quotes if the alias is only one word. However, if you use a reserved word (e.g., Order), you will need to use quotes.

Expressions and the Like

Let’s change the latest SELECT statement to include birds from multiple orders. To do this, we’ll focus in on the operator in the WHERE clause for the common_name:

AND common_name != ''

We’ll change the simple comparison here (i.e., the LIKE operator, which we saw in Chapter 6) to select multiple names that are similar. Among many families of birds, there are often bird species that are similar but have different sizes. The smallest is sometimes referred to as the least in the common name. So let’s search the database for birds with Least in their name:

SELECT common_name AS 'Bird',

families.scientific_name AS 'Family',

orders.scientific_name AS 'Order'

FROM birds, bird_families AS families, bird_orders AS orders

WHERE birds.family_id = families.family_id

AND families.order_id = orders.order_id

AND common_name LIKE 'Least%'

ORDER BY orders.scientific_name, families.scientific_name, common_name

LIMIT 10;

+------------------+---------------+------------------+

| Bird             | Family        | Order            |

+------------------+---------------+------------------+

| Least Nighthawk  | Caprimulgidae | Caprimulgiformes |

| Least Pauraque   | Caprimulgidae | Caprimulgiformes |

| Least Auklet     | Alcidae       | Charadriiformes  |

| Least Tern       | Laridae       | Charadriiformes  |

| Least Sandpiper  | Scolopacidae  | Charadriiformes  |

| Least Seedsnipe  | Thinocoridae  | Charadriiformes  |

| Least Flycatcher | Tyrannidae    | Passeriformes    |

| Least Bittern    | Ardeidae      | Pelecaniformes   |

| Least Honeyguide | Indicatoridae | Piciformes       |

| Least Grebe      | Podicipedidae | Podicipediformes |

+------------------+---------------+------------------+

In the preceding example, using the LIKE operator, MySQL selected rows in which the common_name starts with Least and ends with anything (i.e., the wildcard, %). We also removed the families.order_id = 102 clause, so that we wouldn’t limit the birds to a single order. The results now have birds from a few different orders.

We also changed the ORDER BY clause to have MySQL order the results in the temporary table first by the bird order’s scientific name, then by the bird family’s scientific name, and then by the bird’s common name. If you look at the results, you can see that’s what it did: it sorted the orders first. If you look at the rows for the Charadriiformes, you can see that the families for that order are in alphabetical order. The two birds in the Caprimulgidae family are in alphabetical order.

NOTE

You cannot use alias names for columns in the ORDER BY clause, but you can use alias table names. In fact, they’re required if you’ve used the aliases in the FROM clause.

The previous example used the LIKE operator, which has limited pattern matching abilities. As an alternative, you can use REGEXP, which has many pattern matching characters and classes. Let’s look at a simpler example, of the previous SELECT statement, but using REGEXP. In the previous example we searched for small birds, birds with a common name starting with the word Least. The largest bird in a family is typically called Great. To add these birds, enter the following SQL statement on your server:

SELECT common_name AS 'Birds Great and Small'

FROM birds

WHERE common_name REGEXP 'Great|Least'

ORDER BY family_id LIMIT 10;

+-----------------------------+

| Birds Great and Small       |

+-----------------------------+

| Great Northern Loon         |

| Greater Scaup               |

| Greater White-fronted Goose |

| Greater Sand-Plover         |

| Great Crested Tern          |

| Least Tern                  |

| Great Black-backed Gull     |

| Least Nighthawk             |

| Least Pauraque              |

| Great Slaty Woodpecker      |

+-----------------------------+

The expression we’re giving with REGEXP, within the quote marks, contains two string values: Great and Least. By default, MySQL assumes the text given for REGEXP is meant to be for the start of the string. To be emphatic, you can insert a carat (i.e., ^) at the start of these string values, but it’s unnecessary. The vertical bar (i.e., |) between the two expressions signifies that either value is acceptable — it means or.

In the results, you can see some common bird names starting with Greater, not just Great. If we don’t want to include the Greater birds, we can exclude them with the NOT REGEXP operator. Enter the following on your server:

SELECT common_name AS 'Birds Great and Small'

FROM birds

WHERE common_name REGEXP 'Great|Least'

AND common_name NOT REGEXP 'Greater'

ORDER BY family_id LIMIT 10;

+--------------------------+

| Birds Great and Small    |

+--------------------------+

| Great Northern Loon      |

| Least Tern               |

| Great Black-backed Gull  |

| Great Crested Tern       |

| Least Nighthawk          |

| Least Pauraque           |

| Great Slaty Woodpecker   |

| Great Spotted Woodpecker |

| Great Black-Hawk         |

| Least Flycatcher         |

+--------------------------+

Using NOT REGEXP eliminated all of the Greater birds. Notice that it was included with AND, and not as part of the REGEXP.

Incidentally, we’re ordering here by family_id to keep similar birds together in the list and to have a good mix of Great and Least birds. The results may seem awkward, though, as the names of the birds are not ordered. We could add another column to the ORDER BY clause to alphabetize them within each family.

REGEXP and NOT REGEXP are case insensitive. If we want an expression to be case sensitive, we’ll need to add the BINARY option. Let’s get another list of birds to see this. This time we’ll search for Hawks, with the first letter in uppercase. This is because we want only Hawks and not other birds that have the word, hawk in their name, but are not a Hawk. For instance, we don’t want Nighthawks and we don’t want Hawk-Owls. The way the data is in the birds table, each word of a common name starts with an uppercase letter — the names are in title case. So we’ll eliminate birds such as Nighthawks by using the BINARY option to require that “Hawk” be spelled with an uppercase H and the other letters in lowercase. We’ll use NOT REGEXP to not allow Hawk-Owls. Try the following on your server:

SELECT common_name AS 'Hawks'

FROM birds

WHERE common_name REGEXP BINARY 'Hawk'

AND common_name NOT REGEXP 'Hawk-Owl'

ORDER BY family_id LIMIT 10;

+-------------------+

| Hawks             |

+-------------------+

| Red-tailed Hawk   |

| Bicolored Hawk    |

| Common Black-Hawk |

| Cuban Black-Hawk  |

| Rufous Crab Hawk  |

| Great Black-Hawk  |

| Black-faced Hawk  |

| White-browed Hawk |

| Ridgway's Hawk    |

| Broad-winged Hawk |

+-------------------+

I stated that REGEXP and NOT REGEXP are case insensitive, unless you add the BINARY option as we did to stipulate the collating method as binary (e.g., the letter H has a different binary value fromn the letter h). For the common_name column, though, we didn’t need to add the BINARY option because the column has a binary collation setting. We did this unknowingly when we created the rookery database near the beginning of Chapter 4. See how we created the database by entering this from the mysql client:

SHOW CREATE DATABASE rookery \G

*************************** 1. row ***************************

       Database: rookery

Create Database: CREATE DATABASE `rookery` /*!40100 DEFAULT

                 CHARACTER SET latin1 COLLATE latin1_bin */

The COLLATE clause is set to latin1_bin, meaning Latin1 binary. Any columns that we create in tables in the rookery database, unless we specify otherwise, will be collated using latin1_bin. Execute the following statement to see how the common_name column in the birds table is set:

SHOW FULL COLUMNS

FROM birds LIKE 'common_name' \G

*************************** 1. row ***************************

     Field: common_name

      Type: varchar(255)

 Collation: latin1_bin

      Null: YES

       Key:

   Default: NULL

     Extra:

Privileges: select,insert,update,references

   Comment:

This shows information just on the common_name column. Notice that the Collation is latin1_bin. Because of that, regular expressions using REGEXP are case sensitive without having to add the BINARY option.

Looking through the birds table, we discover some common names for birds that contain the words, “Hawk Owls,” without the hyphen in between. We didn’t allow for that in the expression we gave. We discover also that there are birds in which the word “Hawk” is not in title case — so we can’t count on looking for the uppercase letter, H. Our previous regular expression left those birds out of the results. So we’ll have to change the expression and try a different method. Enter this on your server:

SELECT common_name AS 'Hawks'

FROM birds

WHERE common_name REGEXP '[[:space:]]Hawk|[[.hyphen.]]Hawk'

AND common_name NOT REGEXP 'Hawk-Owl|Hawk Owl'

ORDER BY family_id;

This first, rather long REGEXP expression uses a character class and a character name. The format of character classes and character names is to put the type of character between two sets of double brackets. A character class is given between a pair of colons (e.g., [[:alpha:]] for alphabetic characters). A character name is given between two dots (e.g., [[.hyphen.]] for a hyphen). Looking at the first expression, you can deduce that we want rows in which the common_name contains either “Hawk” or “-Hawk” — that is to say, Hawk preceded by a space or a hyphen. This won’t allow for Hawk preceded by a letter (e.g., Nighthawk). The second expression excludes Hawk-Owl and Hawk Owl.

Pattern matching in regular expressions in MySQL tends to be more verbose than they are in other languages like Perl or PHP. But they do work for basic requirements. For elaborate regular expressions, you’ll have to use an API like the Perl DBI to process the data outside of MySQL. Because that may be a performance hit, it’s better to try to accomplish such tasks within MySQL using REGEXP.

Counting and Grouping Results

In many of our examples, we displayed only a few rows of data because the results could potentially contain thousands of rows. Suppose we’d like to know how many are contained in the table. We can do that by adding a function to the statement. In this case, we want the COUNT() function. Let’s see how that would work:

SELECT COUNT(*) FROM birds;

+----------+

| COUNT(*) |

+----------+

|    28891 |

+----------+

We put an asterisk within the parentheses of the function to indicate that we want all of the rows. We could put a column name instead of an asterisk to count only rows that have data. Using a column prevents MySQL from counting rows that have a NULL value in that column. But it will count rows that have a blank or empty value (i.e., '').

It’s nice to know how many rows are in the birds table, but suppose we’d like to break apart that count. Let’s use COUNT() to count the number of rows for a particular family of birds, the Pelecanidae — those are Pelicans. Enter this SQL statement in the mysql client on your server:

SELECT families.scientific_name AS 'Family',

COUNT(*) AS 'Number of Birds'

FROM birds, bird_families AS families

WHERE birds.family_id = families.family_id

AND families.scientific_name = 'Pelecanidae'

+-------------+-----------------+

| Family      | Number of Birds |

+-------------+-----------------+

| Pelecanidae |              10 |

+-------------+-----------------+

As you can see, there are 10 bird species recorded for the Pelecanidae family in the birds table. In this example, we used the WHERE clause to limit the results to the Pelecanidae family. Suppose we want to know the number of birds for other bird families in the same order to which Pelicans belong, to the order called Pelecaniformes. To do this, we’ll add the bird_orders table to the previous SELECT statement. Enter the following from the mysql client:

SELECT orders.scientific_name AS 'Order',

families.scientific_name AS 'Family',

COUNT(*) AS 'Number of Birds'

FROM birds, bird_families AS families, bird_orders AS orders

WHERE birds.family_id = families.family_id

AND families.order_id = orders.order_id

AND orders.scientific_name = 'Pelecaniformes';

+----------------+-------------+-----------------+

| Order          | Family      | Number of Birds |

+----------------+-------------+-----------------+

| Pelecaniformes | Pelecanidae |             224 |

+----------------+-------------+-----------------++

This tells us that there are 224 birds in the birds table that belong to Pelecaniformes. There are five families in that order of birds, but it returned only the first family name found. If we want to know the name of each family and the number of birds in each family, we need to get MySQL to group the results. To do this, we have to tell it the column by which to group. This is where the GROUP BY clause comes in. This clause tells MySQL to group the results based on the columns given with the clause. Let’s see how that might look. Enter the following on your server:

SELECT orders.scientific_name AS 'Order',

families.scientific_name AS 'Family',

COUNT(*) AS 'Number of Birds'

FROM birds, bird_families AS families, bird_orders AS orders

WHERE birds.family_id = families.family_id

AND families.order_id = orders.order_id

AND orders.scientific_name = 'Pelecaniformes'

GROUP BY Family;

+----------------+-------------------+-----------------+

| Order          | Family            | Number of Birds |

+----------------+-------------------+-----------------+

| Pelecaniformes | Ardeidae          |             157 |

| Pelecaniformes | Balaenicipitidae  |               1 |

| Pelecaniformes | Pelecanidae       |              10 |

| Pelecaniformes | Scopidae          |               3 |

| Pelecaniformes | Threskiornithidae |              53 |

+----------------+-------------------+-----------------+

We gave the GROUP BY clause the Family alias, which is the scientific_name column from the bird_families table. MySQL returns one results set for all five families, for one SELECT statement.

The GROUP BY clause is very useful. You’ll use it often, so learn it well. This clause and related functions are covered in greater detail in Chapter 12.

Summary

The SELECT statement offers quite a number of parameters and possibilities that I had to skip to keep this chapter from becoming too lengthy and too advanced for a learning book. For instance, there are several options for caching results and a clause for exporting a results set to a text file. You can learn about these from other sources if you need them.

At this point, make sure you’re comfortable with the SELECT statement and its main components: choosing columns and using field aliases; choosing multiple tables in the FROM clause; how to construct a WHERE clause, including the basics of regular expressions; using the ORDER BY and the GROUP BY clauses; and limiting results with the LIMIT clause. It will take time and practice to become very comfortable with all of these components. Before moving on to Chapter 8, make sure to complete the exercises in the next section.

Exercises

The following exercises will help cement your understanding of the SELECT statement. The act of typing SQL statements, especially ones that you will use often like SELECT, helps you to learn, memorize, and know them well.

1.    Construct a SELECT statement to select the common names of birds from the birds table. Use the LIKE operator to select only Pigeons from the table. Order the table by the common_name column, but give it a field alias of Bird'. Don’t limit the results; let MySQL retrieve all of the rows that match. Execute the statement on your server and look over the results.
Next, use the same SELECT statement, but add a LIMIT clause. Limit the results to the first ten rows and execute it. Compare the results to the previous SELECT statement to make sure the results show the 1st through 10th row. Then modify the SELECT statement again to display the next 10 rows. Compare these results to the results from the first SELECT statement to make sure you retrieved the 11th through 20th row. If you didn’t, find your mistake and correct it until you get it right.

2.    In this exercise, you’ll begin with a simple SELECT statement and then make it more complicated. To start, construct a SELECT statement in which you select the scientific_name and the brief_description from the bird_orders table. Give the field for the scientific_name an alias ofOrder — and don’t forget to put quotes around it because it’s a reserved word. Use an alias of Types of Birds in Order for brief_description. Don’t limit the results. When you think that you have the SELECT statement constructed properly, execute it. If you have errors, try to determine the problem and fix the statement until you get it right.
Construct another SELECT statement in which you retrieve data from the birds table. Select the common_name and the scientific_name columns. Give them field aliases: Common Name of Bird and Scientific Name of Bird. Exclude rows in which the common_name column is blank. Order the data by the common_name column. Limit the results to 25 rows of data. Execute the statement until it works without an error.
Merge the first and second SELECT statements together to form one SELECT statement that retrieves the same four columns with the same alias from the same two tables (this was covered in Combining Tables). It involves giving more than one table in the FROM clause and providing value pairs in the WHERE clause for temporarily connecting the tables to each other. This one may seem tricky. So take your time and don’t get frustrated. If necessary, refer back to Combining Tables.
Limit the results to 25 rows. If you do it right, you should get the same 25 birds from the second SELECT of this exercise, but with two more fields of data. Be sure to exclude rows in which the common_name column is blank.

3.    Use the SELECT statement in conjunction with REGEXP in the WHERE clause to get a list of birds from the birds table in which the common_name contains the word “Pigeon” or “Dove” (this was covered in Expressions and the Like). Give the field for the common_name column the alias >Type of Columbidae — that’s the name of the family to which Doves and Pigeons belong.