Sams Teach Yourself PHP, MySQL and Apache All in One (2012)

Part V. Basic Projects

Chapter 28. Working with XML and JSON


In this chapter, you learn the following:

• How to create a basic XML document structure

• How to access XML in PHP using DOM functions

• How to access XML in PHP using SimpleXML functions

• How to work with JSON data


This chapter introduces you to working with XML documents and JSON data via PHP, but in no way is it comprehensive—entire books have been written on XML alone. However, for people new to XML, XML manipulation via PHP, and to receiving and working with JSON data, some sets of functions are more manageable than others; this chapter introduces you to a few of them.

What Is XML?

The name XML comes from the full name of the language, Extensible Markup Language. Although markup is in the name of the language, do not think of XML as you do HTML. XML is used for the storage and exchange of data within tag pairs of your own designation, whereas HTML is a presentation language that allows you to delineate the structure and presentation of the document being viewed, regardless of the data it contains.

Basic XML Document Structure

XML documents contain two major elements: the prolog and the body. The prolog contains the XML declaration statement and any processing instructions and comments you want to add.


Note

For a complete definition of XML documents, read the XML specification at http://www.w3.org/TR/REC-xml.


The following snippet is a valid prolog:

<?xml version="1.0" ?>
<!-- Sample XML document -->

After the prolog comes the content structure. XML is hierarchical, like a book: books have titles and chapters, each of which contains paragraphs, and so forth. There is only one root element in an XML document. Continuing the book example, the element might be called Books, and the tags <Books></Books> surround all other information:

<Books>

Next, add any subsequent elements—called children—to your document. Continuing the book example, you need a master book element and then within it elements for title, author, and publishing information. Call these child elements Title, Author, and PublishingInfo. But the publishing information will likely contain more than one bit of information; you need a publisher’s name, location, and year of publication. Not a problem. Just create another set of child elements within your parent element (which also happens to be a child element of the root element). For example, just the <PublishingInfo> element could look like this:

<PublishingInfo>
    <PublisherName>Sams Publishing</PublisherName>
    <PublisherCity>Indianapolis</PublisherCity>
    <PublishedYear>2012</PublishedYear>
</PublishingInfo>

All together, a sample books.xml document with one entry could look something like this:

<?xml version="1.0" ?>
<!--Sample XML document -->
<Books>
    <Book>
        <Title>A Very Good Book</Title>
        <Author>Jane Doe</Author>
        <PublishingInfo>
            <PublisherName>Sams Publishing</PublisherName>
            <PublisherCity>Indianapolis</PublisherCity>
            <PublishedYear>2012</PublishedYear>
        </PublishingInfo>
    </Book>
</Books>

Keep in mind two important rules for creating well-formed XML documents:

• XML is case sensitive, so <Book> and <book> are considered different elements.

• All XML tags must be properly closed, XML tags must be properly nested, and no overlapping tags are allowed.
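
One way to see these rules in action is to hand a few strings to PHP's XML parser and check whether it accepts them. The following is only a sketch: isWellFormed() is a hypothetical helper name, the sample strings are made up for illustration, and the scalar type declarations assume PHP 7 or later.

```php
<?php
// Hypothetical helper: returns true when $xml parses as well-formed XML.
function isWellFormed(string $xml): bool
{
    // Collect parser errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    $result = simplexml_load_string($xml);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // simplexml_load_string() returns false on failure; compare strictly,
    // because a parsed but empty element can also evaluate as falsy.
    return $result !== false;
}

$good       = '<Books><Book><Title>A Very Good Book</Title></Book></Books>';
$badCase    = '<Books><Book><Title>A Very Good Book</title></Book></Books>'; // Title vs. title
$badNesting = '<Books><Book><Title>A Very Good Book</Book></Title></Books>'; // overlapping tags

var_dump(isWellFormed($good));
var_dump(isWellFormed($badCase));
var_dump(isWellFormed($badNesting));
```

Only the first string should pass; the other two break the case-sensitivity and nesting rules, respectively.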

Add some dummy entries to the books.xml file and place it in the document root of your web server for use in later examples. You will use the same XML file throughout the different interface examples shown in this chapter.

When Might You Use XML and PHP?

The short (and snarky) answer to this question is “anytime you want,” but the serious answer is, as you can imagine, a little more complex than that. Earlier in this section, I noted that XML defines and carries content. This is still true. But what does that look like in “real life”?

The examples in this chapter use XML to store a small catalog of books. Imagine a large catalog of books stored in a proprietary database format, but one that has the capability to output data in XML format. If you need to get your hands on that catalog of books, but have no intention of purchasing or using the proprietary software in which it lives, XML would be the answer: you would use XML as a data interchange format. The owner of the data, which is stored in a proprietary format or in a manner that precludes direct access by a third party (that is, you), exports the catalog into XML format. You can then parse and display the XML however you want, using one of the methods described in this chapter.

You’ll next learn about parsing XML documents using two different function families: DOM functions and SimpleXML functions. Although they both produce the same results (parsing XML documents to provide you with data you can use later in your script), the approach is slightly different for each of them. For example, SimpleXML functions are much simpler to use than DOM functions, as you’ll see after you work through the code listings, but there can be a performance loss when working with large files because the entire XML document is loaded into memory and parsed before you can even begin to work with it. There are tradeoffs to be had with either choice, which is why this chapter simply introduces you to two sets of functions and leaves you to your own devices to figure out what really does (or does not) work for you and your development situation.

Accessing XML in PHP Using DOM Functions

The DOM XML extension has been part of PHP since version 4, but was completely overhauled in PHP 5. The primary change was to include the DOM functionality within a default installation of PHP, which is to say that no additional libraries or extensions need to be installed or configured to use these functions.


Note

DOM stands for Document Object Model. For more information about DOM, visit http://www.w3.org/TR/DOM-Level-2-Core/core.html.


The purpose of DOM functions is to enable you to work with data stored in an XML document using the DOM API. The most basic DOM function is DOMDocument->load(), which creates a new DOM tree from the contents of a file. After you create the DOM tree, you can use other DOM functions to manipulate the data. In Listing 28.1, DOM functions are used to loop through a DOM tree and retrieve stored values for later display.

Listing 28.1 Loop Through an XML Document Using DOM Functions


1:  <?php
2:  $dom = new DomDocument;
3:  $dom->load("books.xml");
4:
5:  foreach ($dom->documentElement->childNodes as $books) {
6:      if (($books->nodeType == 1) && ($books->nodeName == "Book")) {
7:
8:          foreach ($books->childNodes  as $theBook) {
9:              if (($theBook->nodeType == 1) &&
10:             ($theBook->nodeName == "Title")) {
11:                 $theBookTitle = $theBook->textContent;
12:             }
13:
14:             if (($theBook->nodeType == 1) &&
15:             ($theBook->nodeName == "Author")) {
16:                 $theBookAuthor = $theBook->textContent;
17:             }
18:
19:             if (($theBook->nodeType == 1) &&
20:             ($theBook->nodeName == "PublishingInfo")) {
21:
22:                 foreach ($theBook->childNodes as $thePublishingInfo) {
23:                     if (($thePublishingInfo->nodeType == 1) &&
24:                     ($thePublishingInfo->nodeName == "PublisherName")) {
25:                         $theBookPublisher = $thePublishingInfo->textContent;
26:                      }
27:
28:                     if (($thePublishingInfo->nodeType == 1) &&
29:                     ($thePublishingInfo->nodeName == "PublishedYear")) {
30:                          $theBookPublishedYear =
31:                             $thePublishingInfo->textContent;
32:                      }
33:                 }
34:             }
35:         }
36:
37:         echo "
38:         <p><em>".$theBookTitle."</em>
39:         by ".$theBookAuthor."<br/>
40:         published by ".$theBookPublisher." in ".$theBookPublishedYear."</p>";
41:
42:         unset($theBookTitle);
43:         unset($theBookAuthor);
44:         unset($theBookPublisher);
45:         unset($theBookPublishedYear);
46:     }
47: }
48: ?>


Line 2 creates a new DOM document, and line 3 loads the contents of books.xml into this document. The document tree is now accessible through $dom, as you can see in later lines. Line 5 begins the master loop through the document tree, assigning each child node of the root element in turn to a variable called $books.

Line 6 looks for an element called Book, and processing continues if it finds one. Remember, the <Book></Book> tag pair surrounds each entry for a book in the books.xml file. If processing continues, line 8 loops through the child nodes of that entry, assigning each one in turn to $theBook, and the if statements in lines 9–12 and 14–17 look for specific nodes called Title and Author, respectively, and place the values into the variables $theBookTitle and $theBookAuthor for later use.
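
The nodeType == 1 tests in Listing 28.1 are there because the child nodes of an element include more than elements: the line breaks and indentation between tags are text nodes as well. The following sketch uses a tiny inline document (an assumption made so the example runs on its own, rather than books.xml) to show what the loop actually encounters.

```php
<?php
$dom = new DOMDocument();
// Inline stand-in for books.xml, indented the same way.
$dom->loadXML("<Books>\n    <Book></Book>\n</Books>");

// Record the type and name of every child of the root element.
$seen = [];
foreach ($dom->documentElement->childNodes as $node) {
    $seen[] = $node->nodeType . ':' . $node->nodeName;
}
print_r($seen);

// The DOM extension defines named constants for these values.
var_dump(XML_ELEMENT_NODE === 1);
var_dump(XML_TEXT_NODE === 3);
```

The whitespace before and after <Book> shows up as nodes of type 3 named #text, which is why each if statement checks for type 1 before comparing names. Writing XML_ELEMENT_NODE in place of the literal 1 is equivalent and can be easier to read.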

Line 19 begins a similar if statement, but because this line looks for a node called PublishingInfo and you know that the <PublishingInfo></PublishingInfo> tag pair contains its own set of child nodes, another looping construct is needed to obtain the information in the next level of data. On line 22, the loop assigns each child node in turn to $thePublishingInfo, and then if statements in lines 23–26 and lines 28–32 look for specific nodes called PublisherName and PublishedYear, respectively, and place the values into the variables $theBookPublisher and $theBookPublishedYear for later use.

After the loop created in line 8 is closed in line 35, lines 37–40 echo a marked-up string to the browser, using values stored in $theBookTitle, $theBookAuthor, $theBookPublisher, and $theBookPublishedYear variables. After these values are used, they are unset in lines 42–45, and the loop continues for the next Book entry in the document tree.

Save this listing as domexample.php and place it in the document root of your web server. When viewed through your web browser you should see something like Figure 28.1.

Figure 28.1 Text extracted and displayed using DOM functions.

For a complete listing of all DOM-related classes, methods, and related functions in PHP, visit the PHP Manual at http://www.php.net/dom.
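
Listing 28.1 walks the tree one level at a time. As an alternative sketch (not the approach used in the listing), the getElementsByTagName() method can find elements by name at any depth, which removes most of the nested loops. The inline XML string, the firstText() helper name, and the use of htmlspecialchars() to escape the output are all assumptions added for illustration; the type declarations assume PHP 7 or later.

```php
<?php
// Inline stand-in for books.xml so the sketch runs on its own.
$xml = <<<XML
<?xml version="1.0" ?>
<Books>
    <Book>
        <Title>A Very Good Book</Title>
        <Author>Jane Doe</Author>
        <PublishingInfo>
            <PublisherName>Sams Publishing</PublisherName>
            <PublisherCity>Indianapolis</PublisherCity>
            <PublishedYear>2012</PublishedYear>
        </PublishingInfo>
    </Book>
</Books>
XML;

$dom = new DOMDocument();
$dom->loadXML($xml);

// Hypothetical helper: text of the first descendant of $parent named $tag, or ''.
function firstText(DOMElement $parent, string $tag): string
{
    $node = $parent->getElementsByTagName($tag)->item(0);
    return $node === null ? '' : $node->textContent;
}

// Build one marked-up line per <Book> element.
$lines = [];
foreach ($dom->getElementsByTagName('Book') as $book) {
    $lines[] = "<p><em>" . htmlspecialchars(firstText($book, 'Title')) . "</em> by "
        . htmlspecialchars(firstText($book, 'Author')) . "<br/> published by "
        . htmlspecialchars(firstText($book, 'PublisherName')) . " in "
        . htmlspecialchars(firstText($book, 'PublishedYear')) . "</p>";
}

echo implode("\n", $lines), "\n";
```

Because getElementsByTagName() searches all descendants, PublisherName is found without a separate loop over PublishingInfo. The trade-off is that it would also match an element of the same name nested elsewhere inside the entry.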

In the next section, you use the same books.xml file, but retrieve and display its values using the SimpleXML family of functions rather than DOM.

Accessing XML in PHP Using SimpleXML Functions

SimpleXML is enabled by default in PHP 5 and requires no additional installation or configuration steps. It lives up to its description in the PHP Manual of being “a very simple and easily usable toolset to convert XML” while still being powerful.

Unlike the DOM family of functions, there are only a few SimpleXML functions and methods. The most basic SimpleXML function parses the XML data into an object that you can directly access and manipulate without SimpleXML-specific functions to do so (in other words, as you would work with any object). The first function you need to know about is simplexml_load_file(), which loads a file and creates an object out of the data:

$object_with_data = simplexml_load_file("somefile.xml");

Listing 28.2 uses a short bit of code to create a SimpleXML object and then displays the hierarchy of the data stored in the object.

Listing 28.2 Load and Display Data Using SimpleXML


1: <?php
2: $theData = simplexml_load_file("books.xml");
3: echo "<pre>";
4: print_r($theData);
5: echo "</pre>";
6: ?>


Line 2 uses simplexml_load_file() to load the contents of books.xml into an object called $theData. In line 4, the print_r() function outputs a human-readable version of the data stored in the object, surrounded by the <pre></pre> tag pair.
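
Listing 28.2 assumes that books.xml exists and parses cleanly. If the file is missing or malformed, simplexml_load_file() returns false rather than an object. The following sketch wraps the call in a hypothetical helper named loadXmlFile() that reports failure as null; the nullable return type assumes PHP 7.1 or later.

```php
<?php
// Hypothetical helper: returns the parsed document, or null if the file
// cannot be read or does not contain well-formed XML.
function loadXmlFile(string $path): ?SimpleXMLElement
{
    if (!is_readable($path)) {
        return null;
    }

    // Keep parser warnings out of the page while loading.
    $previous = libxml_use_internal_errors(true);
    $data = simplexml_load_file($path);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $data === false ? null : $data;
}

$theData = loadXmlFile("books.xml");
if ($theData === null) {
    echo "Could not load books.xml";
} else {
    echo "<pre>";
    print_r($theData);
    echo "</pre>";
}
```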

Save this listing as simplexml_dump.php and place it in the document root of your web server. When viewed through your web browser, you should see something like Figure 28.2.

Figure 28.2 Data dumped from a SimpleXML object.

Dumping out data is not all that spectacular, but it does show you the structure of the object, which in turn lets you know how to access the data in a hierarchical fashion. For instance, the output of simplexml_dump.php shows the entry for a book:

[0] => SimpleXMLElement Object
(
     [Title] => A Very Good Book
     [Author] => Jane Doe
     [PublishingInfo] => SimpleXMLElement Object
     (
          [PublisherName] => Sams Publishing
          [PublisherCity] => Indianapolis
          [PublishedYear] => 2012
     )
)

To reference this record directly, you use the following:

$theData->Book

You access the elements in the record like this:

• $theData->Book->Title for the Title

• $theData->Book->Author for the Author

• $theData->Book->PublishingInfo->PublisherName for the Publisher Name

• $theData->Book->PublishingInfo->PublisherCity for the Publisher City

• $theData->Book->PublishingInfo->PublishedYear for the Published Year
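
When the file holds more than one Book entry, a reference without an index points at the first match, and a zero-based index selects a specific one. The sketch below assumes a two-entry document supplied as an inline string (rather than books.xml) and casts each element to a string, because what SimpleXML returns is an object rather than plain text.

```php
<?php
// Inline stand-in for a books.xml file with two entries.
$theData = simplexml_load_string(
    '<Books>'
    . '<Book><Title>A Very Good Book</Title><Author>Jane Doe</Author></Book>'
    . '<Book><Title>An Academic Book</Title><Author>Anne Smith</Author></Book>'
    . '</Books>'
);

$howMany = count($theData->Book);              // number of <Book> elements
$first   = (string) $theData->Book->Title;     // no index means the first match
$second  = (string) $theData->Book[1]->Title;  // explicit zero-based index

echo $howMany, "\n", $first, "\n", $second, "\n";
```

The (string) cast matters when you compare values or store them for later use; echo performs the conversion for you, which is why the listings in this chapter work without it.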

But because you likely would want to loop through all the records and not just the first one, the references to the data are a little different, as you can see in Listing 28.3.

Listing 28.3 Loop Through an XML Document Using SimpleXML


1:  <?php
2:  $theData = simplexml_load_file("books.xml");
3:
4:  foreach($theData->Book as $theBook) {
5:      $theBookTitle = $theBook->Title;
6:      $theBookAuthor = $theBook->Author;
7:      $theBookPublisher = $theBook->PublishingInfo->PublisherName;
8:      $theBookPublisherCity = $theBook->PublishingInfo->PublisherCity;
9:      $theBookPublishedYear = $theBook->PublishingInfo->PublishedYear;
10:
11:     echo "
12:     <p><em>".$theBookTitle."</em>
13:     by ".$theBookAuthor."<br/>
14:     published by ".$theBookPublisher." (".$theBookPublisherCity.")
15:     in ".$theBookPublishedYear."</p>";
16:
17:     unset($theBookTitle);
18:     unset($theBookAuthor);
19:     unset($theBookPublisher);
20:     unset($theBookPublishedYear);
21: }
22: ?>


In line 2, the contents of books.xml are loaded using simplexml_load_file() into an object called $theData. In line 4, the loop steps through $theData->Book, which is to say all the individual records, assigning each one in turn to $theBook. Lines 5–9 gather the value of specific elements, beginning at the level of $theBook, and these values are output in lines 11–15. Lines 17–20 unset the value of the variables for the next pass through the loop.

Save this listing as simplexmlexample.php and place it in the document root of your web server. When viewed through your web browser, you should see something like Figure 28.3.

Figure 28.3 Text extracted and displayed using SimpleXML functions.

Note that the output looks quite similar to the output of Listing 28.1, and in fact the SimpleXML example is just a simpler (or more concise) version of the DOM-based example you saw earlier.

For more information about the SimpleXML functions in PHP, visit the PHP Manual at http://www.php.net/simplexml.

Working with JSON

JSON, which stands for JavaScript Object Notation, is another data interchange format (like XML) that is simple for both humans and machines to read and write. Because of this simplicity, JSON output has become increasingly popular and may one day eclipse (if it hasn’t already) the use of XML output for data exposed via application programming interfaces (APIs).

Using JSON, you can have collections of name/value pairs (which take the form of objects) and you can have an ordered list of values (which take the form of an array). If you were to redo an entry in the books.xml file from earlier in the chapter, into JSON format, it might look like the following snippet:

{
 "book":[
   {
     "title":"A Very Good Book",
     "author":"Jane Doe",
     "publisher_name":"Sams Publishing",
     "publisher_city":"Indianapolis",
     "publisher_year":"2012"
   }
  ]
}

Adding two other entries would give you some JSON-formatted data like that shown in Listing 28.4.

Listing 28.4 JSON-Formatted Books Data


{
   "book":[
      {
         "title":"A Very Good Book",
         "author":"Jane Doe",
         "publisher_name":"Sams Publishing",
         "publisher_city":"Indianapolis",
         "publisher_year":"2012"
      },
      {
         "title":"An Academic Book",
         "author":"Anne Smith",
         "publisher_name":"University of California Press",
         "publisher_city":"Berkeley",
         "publisher_year":"2011"
      },
      {
         "title":"Some Fluff Fiction",
         "author":"Jimbo Jones",
         "publisher_name":"Fluffy Press",
         "publisher_city":"New York",
         "publisher_year":"2009"
      }
   ]
}



Note

To learn much more about JSON, see http://www.json.org. A useful tool when creating JSON for the first time is the JSON Parser at http://json.parser.online.fr/, which enables you to paste text and find (and fix) syntax errors.


Once you have some JSON, you can use the PHP json_decode() function to take the well-formatted data and turn it into an object, just as you did with the SimpleXML example earlier. Listing 28.5 uses a short bit of code to load some JSON data and display the hierarchy of the data stored in the object.

Listing 28.5 Load and Display JSON Data


1: <?php
2: $theData = file_get_contents("books.txt");
3: echo "<pre>";
4: print_r(json_decode($theData));
5: echo "</pre>";
6: ?>


Line 2 uses file_get_contents() to load the contents of books.txt (a text file containing the JSON data shown in Listing 28.4) into a string called $theData. In line 4, the json_decode() function turns that string into an object, and the print_r() function outputs a human-readable version of it, surrounded by the <pre></pre> tag pair.
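
If the text handed to json_decode() is not valid JSON, the function returns null instead of an object, so it can be worth checking before using the result. The following is a sketch only: decodeJson() is a hypothetical helper, the sample strings are made up, and json_last_error_msg() assumes PHP 5.5 or later.

```php
<?php
// Hypothetical helper: returns the decoded data, or null with a message in $error.
function decodeJson(string $json, &$error = null)
{
    $decoded = json_decode($json);
    if (json_last_error() !== JSON_ERROR_NONE) {
        $error = json_last_error_msg();
        return null;
    }
    $error = null;
    return $decoded;
}

// Valid JSON decodes normally and leaves $error empty.
$good = decodeJson('{"book":[{"title":"A Very Good Book"}]}', $error);
echo $good->book[0]->title, "\n";

// A trailing comma is a common mistake that makes the text invalid JSON.
$bad = decodeJson('{"book":[{"title":"A Very Good Book",}]}', $error);
var_dump($bad);
echo $error, "\n";
```

Checking json_last_error() rather than only testing for null matters because the JSON text null is valid and also decodes to null.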

Save this listing as json_dump.php and place it in the document root of your web server. When viewed through your web browser, you should see something like Figure 28.4.

Figure 28.4 Data formerly in JSON format.

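To create a formatted version of this data, first store the decoded result in a variable of its own, for example $theBooks = json_decode($theData). Because book holds an array of records, you access the elements of the first record like this:

• $theBooks->book[0]->title for the Title

• $theBooks->book[0]->author for the Author

• $theBooks->book[0]->publisher_name for the Publisher Name

• $theBooks->book[0]->publisher_city for the Publisher City

• $theBooks->book[0]->publisher_year for the Published Year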
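
Because "book" in Listing 28.4 is a JSON array, the decoded property is a PHP array, so each record is reached through an index or a loop. The sketch below uses an inline string trimmed to two records (an assumption, in place of books.txt) and shows both the default object form and the associative-array form that json_decode() returns when its second argument is true.

```php
<?php
// Inline stand-in for books.txt, trimmed to two records from Listing 28.4.
$theData = '{"book":[
    {"title":"A Very Good Book","author":"Jane Doe","publisher_year":"2012"},
    {"title":"An Academic Book","author":"Anne Smith","publisher_year":"2011"}
]}';

// Default: JSON objects become stdClass objects, JSON arrays become PHP arrays.
$asObject = json_decode($theData);
echo $asObject->book[0]->title, "\n";

// Passing true as the second argument returns nested associative arrays instead.
$asArray = json_decode($theData, true);
echo $asArray['book'][1]['author'], "\n";

// Either form can be looped over one record at a time.
$titles = [];
foreach ($asObject->book as $theBook) {
    $titles[] = $theBook->title . " (" . $theBook->publisher_year . ")";
}
echo implode(", ", $titles), "\n";
```

Which form to use is largely a matter of taste; the object form reads much like the SimpleXML examples earlier in the chapter, whereas the array form works directly with PHP's array functions.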

As mentioned previously, one of the most popular ways of consuming JSON is as output from APIs. The following URL contains an example API endpoint for accessing the Google search API with two variables: v for the API version (1.0 in this case) and q for the search term (PHP in this case):

http://ajax.googleapis.com/ajax/services/search/web?v=1.0&q=PHP

Change line 2 of Listing 28.5 to the following and save the file as json_google_dump.php and place it in the document root of your web server:

$theData = file_get_contents("http://ajax.googleapis.com/ajax/services/search/web?v=1.0&q=PHP");

When you view this script in your web browser, you should see something like Figure 28.5.

Figure 28.5 JSON output from a Google search.

Once you know the basics of working with JSON data (which, after you have used the json_decode() function in PHP, is really just the basics of working with objects), you have all the data on the Internet in the palm of your hand. Well, not all of it, but a fair portion of it. For a comprehensive list and more information about APIs, visit ProgrammableWeb at http://www.programmableweb.com/.

Summary

This brief chapter introduced you to two sets of PHP functions used to manipulate XML (DOM functions and SimpleXML) and JSON. In addition to a brief overview of both topics, you saw examples of displaying information stored in XML or JSON data using these functions. The purpose of this chapter was just to introduce you to the concept of working with XML and JSON using PHP. If you are interested in using XML and PHP together, you might also want to look into AJAX (Asynchronous JavaScript and XML), which often uses PHP to produce or modify XML or JSON data before it is displayed to the client.

Q&A

Q. Why would I use XML to store data when MySQL is a great (and free) database?

A. XML can be used not only as a storage method, but also as an intermediary for data transfer. For instance, you might use XML in conjunction with a database, by extracting data and sending it to a third-party function that only interprets XML data. In addition, although it is true that MySQL is great (and free), some users might not have access to MySQL or any other database, in which case XML files can play the role of a database system.

Q. How do I create JSON from arrays and objects created in other parts of my scripts?

A. If you want to produce JSON output, you just use the json_encode() function, which takes your existing arrays and objects and puts them into JSON format. See http://www.php.net/json_encode for more information.
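
As a brief sketch of that answer, the following encodes a nested PHP array shaped like one record from Listing 28.4 and then decodes it again. The array contents are illustrative, and the JSON_PRETTY_PRINT option assumes PHP 5.4 or later.

```php
<?php
// Illustrative data, shaped like the records in Listing 28.4.
$books = [
    'book' => [
        [
            'title' => 'A Very Good Book',
            'author' => 'Jane Doe',
            'publisher_name' => 'Sams Publishing',
            'publisher_city' => 'Indianapolis',
            'publisher_year' => '2012',
        ],
    ],
];

// JSON_PRETTY_PRINT adds line breaks and indentation for readability.
$json = json_encode($books, JSON_PRETTY_PRINT);
echo $json, "\n";

// Decoding with true as the second argument restores the original array.
$roundTrip = json_decode($json, true);
var_dump($roundTrip === $books);
```

Associative arrays are encoded as JSON objects, and sequentially indexed arrays (like the list of records under 'book') are encoded as JSON arrays.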

Workshop

The workshop is designed to help you review what you’ve learned and begin putting your knowledge into practice.

Quiz

1. What should be the opening line of a valid XML document?

2. Does the following code put your XML content into a new DOM document?

$dom = new DomDocument;

3. What code would be used to load the contents of a file called my.xml into a SimpleXML object called $myData?

Answers

1. <?xml version="1.0"?>

2. No, it just creates a DOM document referenced as $dom. To load the content you must also use something like this:

$dom->load("books.xml");

3. $myData = simplexml_load_file("my.xml");

Activities

1. Create a script that formats the JSON-encoded books data to produce the same result in the browser as shown in Listings 28.1 and 28.3.

2. Make an array of data, either manually or through a script that retrieves information from a database, and then encode it in JSON format. If you then create another script that loads that output and formats it for use, you will have created and used an API for your own application.