Sams Teach Yourself PHP, MySQL and Apache All in One (2012)
Part IV. PHP and MySQL Integration
Chapter 15. Understanding the Database Design Process
In this chapter, you learn the following:
• Some advantages to good database design
• Three types of table relationships
• How to normalize your database
• How to implement a good database design process
In this chapter, you learn the reasoning behind designing a relational database. After this concept-focused chapter, you jump headlong into learning the basic MySQL commands in preparation for integrating MySQL in your own applications.
The Importance of Good Database Design
A good database design is crucial for a high-performance application, just as an aerodynamic body is important to a race car. If a car does not have smooth lines, it produces drag and goes slower. Without optimized relationships, your database will not perform as efficiently as possible. Thinking about relationships and database efficiency—which includes ease of maintenance, minimizing duplications, and avoiding inconsistencies—is part of normalization.
Specifically, normalization refers to the process of structuring data to minimize duplication and inconsistencies.
Beyond the issue of performance is the issue of maintenance—your database should be easy to maintain. This includes storing only a limited amount (if any) of repetitive data. If you have a lot of repetitive data and one instance of that data undergoes a change (such as a name change), that change has to be made for all occurrences of the data. To eliminate duplication and enhance your ability to maintain the data, you might create a table of possible values and use a key to refer to the value. That way, if the value changes names, the change occurs only once—in the master table. The reference remains the same throughout other tables.
For example, suppose that you are responsible for maintaining a database of students and the classes in which they are enrolled. If 35 of these students are in the same class, let’s call it Advanced Math, this class name would appear 35 times in the table. Now, if the instructor decides to change the name of the class to Mathematics IV, you must change 35 records to reflect the new name of the class. If the database were designed so that class names appeared in one table and just the class ID number was stored with the student record, you would have to change only 1 record—not 35—to update the name change.
The benefits of a well-planned and designed database are numerous, and it stands to reason that the more work you do up front, the less you have to do later. A really bad time for a database redesign is after the public launch of the application using it—although it does happen, and the results are costly.
So, before you even start coding an application, spend a lot of time designing your database. Throughout the rest of this chapter, you learn more about relationships and normalization, two important pieces to the design puzzle.
Types of Table Relationships
Table relationships come in several forms:
• One-to-one relationships
• One-to-many relationships
• Many-to-many relationships
For example, suppose that you have a table called employees that contains each person’s Social Security number, name, and the department in which he or she works. Suppose that you also have a separate table called departments, containing the list of all available departments, made up of a Department ID and a name. In the employees table, the Department ID field matches an ID found in the departments table. You can see this type of relationship in Figure 15.1. The PK next to the field name indicates the primary key for the table.
Figure 15.1 The employees and departments tables are related through the DeptID key.
In the following sections, you take a closer look at each of the relationship types.
In a one-to-one relationship, a key appears only once in a related table. The employees and departments tables do not have a one-to-one relationship because many employees undoubtedly belong to the same department. A one-to-one relationship exists, for example, if each employee is assigned one computer within a company. Figure 15.2 shows the one-to-one relationship of employees to computers.
Figure 15.2 One computer is assigned to each employee.
The employees and computers tables in your database would look something like Figure 15.3, which represents a one-to-one relationship.
Figure 15.3 One-to-one relationship in the data model.
In a one-to-many relationship, keys from one table appear multiple times in a related table. The example shown in Figure 15.1, indicating a connection between employees and departments, illustrates a one-to-many relationship. A real-world example is an organizational chart of the department, as shown in Figure 15.4.
Figure 15.4 One department contains many employees.
The one-to-many relationship is the most common type of relationship. Another practical example is the use of a state abbreviation in an address database; each state has a unique identifier (CA for California, PA for Pennsylvania, and so on), and each address in the United States has a state associated with it.
If you have eight friends in California and five in Pennsylvania, you use only two distinct abbreviations in your table. One abbreviation (CA) represents a one-to-eight relationship, and the other (PA) represents a one-to-five relationship.
The many-to-many relationship often causes problems in practical examples of normalized databases, so much so that it is common to simply break many-to-many relationships into a series of one-to-many relationships. In a many-to-many relationship, the key value of one table can appear many times in a related table. So far, it sounds like a one-to-many relationship, but here’s the curveball: The opposite is also true, meaning that the primary key from that second table can also appear many times in the first table.
Think of such a relationship this way, using the example of students and classes: A student has an ID and a name. A class has an ID and a name. A student usually takes more than one class at a time, and a class always contains more than one student, as you can see in Figure 15.5.
Figure 15.5 Students take classes, and classes contain students.
As you can see, this sort of relationship does not present an easy method for relating tables. Your tables could look like Figure 15.6, seemingly unrelated.
Figure 15.6 The students table and the classes table, unrelated.
To make the theoretical many-to-many relationship, you create an intermediate table, one that sits between the two tables and essentially maps them together. You might build such a table similar to the one in Figure 15.7.
Figure 15.7 The students_classes_map table acts as an intermediary.
If you take the information in Figure 15.5 and put it into the intermediate table, you can create something like Figure 15.8.
Figure 15.8 The students_classes_map table populated with data.
As you can see, many students and many classes happily coexist within the students_classes_map table.
With this introduction to the types of relationships, learning about normalization should be a snap.
Normalization is simply a set of rules that will ultimately make your life easier when you are acting as a database administrator. It is the art of organizing your database in such a way that your tables relate where appropriate and are flexible for future growth.
The sets of rules used in normalization are called normal forms. If your database design follows the first set of rules, it is considered in the first normal form. If the first three sets of rules of normalization are followed, your database is said to be in the third normal form.
Throughout this chapter, you learn about each rule in the first, second, and third normal forms—the most foundational of the nine different normalizations—and, I hope, will follow them as you create your own applications. In the chapter, you use a sample set of tables for a students-and-courses database and take it to the third normal form.
Problems with the Flat Table
Before launching into the first normal form, you have to start with something that needs to be fixed. In the case of a database, it’s the flat table. A flat table is like a spreadsheet—it has many, many columns. There are no relationships between multiple tables; all the data you could possibly want is right there in that flat table. This scenario is inefficient and consumes more physical space on your hard drive than a normalized database.
In your students-and-courses database, assume that you have the following fields in your flat table:
• StudentName—The name of the student.
• CourseID1—The ID of the first course taken by the student.
• CourseDescription1—The description of the first course taken by the student.
• CourseInstructor1—The instructor of the first course taken by the student.
• CourseID2—The ID of the second course taken by the student.
• CourseDescription2—The description of the second course taken by the student.
• CourseInstructor2—The instructor of the second course taken by the student.
• Repeat CourseID, CourseDescription, and CourseInstructor columns many more times to account for all the classes students can take during their academic career.
With what you’ve learned so far, you should be able to identify the first problem area: CourseID, CourseDescription, and CourseInstructor columns are repeated groups.
Eliminating redundancy is the first step in normalization, so next you take this flat table to first normal form. If your table remained in its flat format, you could have a lot of unclaimed space and a lot of space being used unnecessarily—not an efficient table design.
First Normal Form
The rules for the first normal form are as follows:
• Eliminate repeating information.
• Create separate tables for related data.
If you think about the flat table design with many repeated sets of fields for the students-and-courses database, you can identify two distinct topics: students and courses. Taking your students-and-courses database to the first normal form means that you create two tables: one for students and one for courses, as shown in Figure 15.9.
Figure 15.9 Breaking the flat table into two tables.
Your two tables now represent a one-to-many relationship of one student to many courses. Students can take as many courses as they want and are not limited to the number of CourseID/CourseDescription/CourseInstructor groupings that existed in the flat table.
The next step is to put the tables into second normal form.
Second Normal Form
The rule for the second normal form is as follows:
• No nonkey attributes depend on a portion of the primary key.
In plain English, this means that if fields in your table are not entirely related to a primary key, you have more work to do. In the students-and-courses example, you need to break out the courses into their own table and modify the students_courses table.
CourseID, CourseDescription, and CourseInstructor can become a table called courses with a primary key of CourseID. The students_courses table should then just contain two fields: StudentID and CourseID. You can see this new design in Figure 15.10.
Figure 15.10 Taking your tables to second normal form.
This structure should look familiar to you as a many-to-many relationship using an intermediary mapping table. The third normal form is the last form you look at here, and you should find that it is just as simple to understand as the first two.
Third Normal Form
The rule for the third normal form is as follows:
• No attributes depend on other nonkey attributes.
This rule simply means that you need to look at your tables and see whether you have more fields that can be broken down further and that are not dependent on a key. Think about removing repeated data and you’ll find your answer: instructors. Inevitably, an instructor teaches more than one class. However, CourseInstructor is not a key of any sort. So, if you break out this information and create a separate table purely for the sake of efficiency and maintenance (as shown in Figure 15.11), that is the third normal form.
Figure 15.11 Taking your tables to third normal form.
Third normal form is usually adequate for removing redundancy and allowing for flexibility and growth. The next section gives you some pointers for the thought process involved in database design and where it fits in the overall design process of your application.
Following the Design Process
The greatest problem in application design is a lack of forethought. As it applies to database-driven applications, the design process must include a thorough evaluation of your database—what it should hold, how data relates to each other, and most important, whether it is scalable.
The general steps in the design process are as follow:
• Define the objective.
• Design the data structures (tables, fields).
• Discern relationships.
• Define and implement business rules.
• Create the application.
Creating the application is the last step, not the first. Many developers take an idea for an application, build it, and then go back and try to make a set of database tables fit into it. This approach is completely backward, inefficient, and will cost a lot of time and money.
Before you start any application design process, sit down and talk it out. If you cannot describe your application—including the objectives, audience, and target market—you are not ready to build it, let alone model the database.
After you can describe the actions and nuances of your application to other people and have it make sense to them, you can start thinking about the tables you want to create. Start with big flat tables because after you write them down, your newfound normalization skills will take over. You will be able to find your redundancies and visualize your relationships. As you become more experienced, you will be able to minimize the steps in this process, but there’s nothing wrong with stepping through them carefully and explicitly.
The next step is to do the normalization. Go from a flat table to the first normal form and so on up to the third normal form if possible. Use paper, pencils, sticky notes, or whatever helps you to visualize the tables and relationships. There’s no shame in data modeling on sticky notes until you are ready to create the tables themselves. Plus, using sticky notes is a lot cheaper than buying software to do it for you; modeling software ranges from one hundred to several thousands of dollars.
After you have a preliminary data model, look at it from the application’s point of view. Or look at it from the point of view of the person using the application you are building. This is the point where you define business rules and see whether your data model breaks. An example of a business rule for an online registration application is, “Each user must have one email address, and it must not belong to any other user.” If EmailAddress is not a unique field in your data model, your model will break based on the business rule.
After your business rules have been applied to your data model, only then can application programming begin. You can rest assured that your data model is solid and you will not be programming yourself into a brick wall. The latter event is all too common.
Following proper database design is the only way your application will be efficient, flexible, and easy to manage and maintain. An important aspect of database design is to use relationships between tables instead of throwing all your data into one long flat file. Types of relationships include one-to-one, one-to-many, and many-to-many.
Using relationships to properly organize your data is called normalization. There are many levels of normalization, but the primary levels are the first, second, and third normal forms. Each level has a rule or two that you must follow. Following all the rules helps ensure that your database is well organized and flexible.
To take an idea from inception through to fruition, you should follow a design process. This process essentially says, “Think before you act.” Discuss rules, requirements, and objectives, and only then create the final version of your normalized tables.
Q. Are there only three normal forms?
A. No, there are more than three normal forms. Additional forms are the Boyce-Codd normal form, fourth normal form, and fifth normal form/Join-Projection normal form. These forms are not often followed in practical application development because the benefits of doing so are outweighed by the cost in man-hours and database efficiency (but it is certainly fine if you implement them). For more information, please see http://en.wikipedia.org/wiki/Database_normalization#Normal_forms.
The workshop is designed to help you review what you have learned and begin putting your knowledge into practice.
1. Name three types of data relationships.
2. Because many-to-many relationships are difficult to represent in an efficient database design, what should you do?
3. Name a few ways you can create visualizations of data relationships.
1. One-to-one, one-to-many, many-to-many.
2. Create a series of one-to-many relationships using intermediary mapping tables.
3. You can use a range of tools, from sticky notes and string (where notes are the tables and string shows the relationships between tables) to software used to draw diagrams, to software programs that interpret your SQL statements and produce visualizations.
Explain each of the three normal forms to a person who works with spreadsheets and flat tables.