Zend PHP 5 Certification Study Guide (2014)

Data Formats and Types

This chapter will explore JSON, Dates and Times, and XML. In addition to providing a comprehensive overview of ext/json and ext/datetime, we will discuss how to read, create, and manipulate XML data using SimpleXML, the DOM functions, and the XML Path Language (XPath).

JSON

While XML used to be the format of choice for communication between disparate systems, over the last couple of years it has been surpassed by JSON: JavaScript Object Notation. JSON is a simple, compact format that (as its name implies) hails from JavaScript and is now supported by most languages.

After starting out primarily for client-side asynchronous HTTP requests, JSON is now the default—and oftentimes the only—supported output for many Web services.

PHP has had support for JSON encoding and decoding since 5.2; however, this functionality has been incrementally improved upon since then.

Encoding Data

To encode JSON, simply use the json_encode() function.

All string data must be UTF-8 encoded for json_encode() to work properly.
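
As a minimal sketch of what happens otherwise (the byte string below is a made-up example, and the conversion step assumes the iconv extension is available), json_encode() on PHP 5.5 or later returns false for invalid UTF-8 and json_last_error() reports the reason:

```php
<?php
// "\xE9" is "é" in ISO-8859-1, which is not a valid UTF-8 byte sequence.
$latin1 = "caf\xE9";

$result = json_encode($latin1);
$error  = json_last_error();

var_dump($result === false);          // encoding failed
var_dump($error === JSON_ERROR_UTF8); // malformed UTF-8 was detected

// Converting the string to UTF-8 first lets the call succeed.
$fixed = json_encode(iconv('ISO-8859-1', 'UTF-8', $latin1));
echo $fixed, "\n";
```

Checking the return value against false, rather than echoing it directly, is what makes the failure visible.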

PHP will automatically output sequential, zero-based numerically indexed arrays as JSON arrays, and all other arrays and objects as JSON objects. For example:

$array = ["foo", "bar", "baz"];

echo json_encode($array);

will output:

["foo","bar","baz"]

Using string keys:

$array = ["one" => "foo", "two" => "bar", "three" => "baz"];

echo json_encode($array);

will output:

{"one":"foo","two":"bar","three":"baz"}

json_encode() also supports numerous options, most of which were added in PHP 5.3:

Option                   Description
JSON_HEX_TAG             Convert all < and > to their hex equivalents (\u003C and \u003E respectively).
JSON_HEX_AMP             Convert all & to their hex equivalent (\u0026).
JSON_HEX_APOS            Convert all apostrophes (') to their hex equivalent (\u0027).
JSON_HEX_QUOT            Convert all straight double quotes (") to their hex equivalent (\u0022).
JSON_FORCE_OBJECT        Output an object instead of an array even when numeric keys are used. This is particularly useful for empty arrays when the recipient is expecting an object.
JSON_NUMERIC_CHECK       Encode numeric strings as numbers. Added in PHP 5.3.3.
JSON_BIGINT_AS_STRING    Decode large integers as strings instead of floats. This option applies to json_decode(), not json_encode(). Added in PHP 5.4.
JSON_PRETTY_PRINT        Format JSON using whitespace to make it easier to read. Added in PHP 5.4.
JSON_UNESCAPED_SLASHES   Don’t escape /. Added in PHP 5.4.
JSON_UNESCAPED_UNICODE   Do not convert Unicode characters to escape sequences (\uXXXX). Added in PHP 5.4.
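
To make a few of these options concrete, here is a short sketch; the input strings are arbitrary examples chosen for illustration:

```php
<?php
// Slashes are escaped by default.
$default   = json_encode("a/b");
$unescaped = json_encode("a/b", JSON_UNESCAPED_SLASHES);

// The HEX options replace characters that are risky inside HTML.
$hex = json_encode("<&>", JSON_HEX_TAG | JSON_HEX_AMP);

// An empty array is ambiguous; JSON_FORCE_OBJECT resolves it.
$emptyList   = json_encode([]);
$emptyObject = json_encode([], JSON_FORCE_OBJECT);

echo $default, "\n";     // "a\/b"
echo $unescaped, "\n";   // "a/b"
echo $hex, "\n";         // "\u003C\u0026\u003E"
echo $emptyList, "\n";   // []
echo $emptyObject, "\n"; // {}
```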

These options are passed as a bitmask, so if you want to force objects, encode numeric strings as numbers, and pretty-print, you should OR them together like so:

Listing 8.1: JSON options

$array = [

    "name" => "Davey Shafik",

    "age" => "30",

];

$options = JSON_PRETTY_PRINT |

           JSON_NUMERIC_CHECK |

           JSON_FORCE_OBJECT;

echo json_encode($array, $options);

This will output:

{

    "name": "Davey Shafik",

    "age": 30

}

In PHP 5.5, a third parameter, depth, was added; this sets the maximum nesting depth of the data being encoded. If the data is nested more deeply than this limit, json_encode() fails and returns false.
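
A small sketch of the depth parameter in action, assuming PHP 5.5 or later; the nested array is an invented example:

```php
<?php
$nested = [[1, 2], [3, 4]]; // two levels of nesting

// A depth of 2 is enough for this structure.
$ok = json_encode($nested, 0, 2);

// A depth of 1 is exceeded by the inner arrays, so encoding fails.
$tooDeep = json_encode($nested, 0, 1);
$error   = json_last_error();

var_dump($ok);                          // string(13) "[[1,2],[3,4]]"
var_dump($tooDeep);                     // bool(false)
var_dump($error === JSON_ERROR_DEPTH);  // bool(true)
```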

Encoding Objects

With PHP 5.4, a new interface, JsonSerializable, was added that allows you to control what data is encoded when json_encode() is called on your object.

Listing 8.2: Implementing JsonSerializable

class User implements JsonSerializable

{

    public $first_name;

    public $last_name;

    public $email;

    public $password;

    public function jsonSerialize() {

        return [

            "name" => $this->first_name

                      . ' ' . $this->last_name,

            "email_hash" => md5($this->email),

        ];

    }

}

Now, when we call json_encode() on an instance of our User class, we get our custom dataset back. Given a user instance that looks like this:

class User#188 (4) {

  public $first_name =>

  string(5) "Davey"

  public $last_name =>

  string(6) "Shafik"

  public $email =>

  string(17) "davey@example.com"

  public $password =>

  string(60) "$2y$10$TeDnXI3Oz0P5Bgv9sADE9.v7SIGESaoWhFe28ctpVsU47f/BAtFFa"

}

when we pass this to json_encode(), we get a result like this:

{

    "name": "Davey Shafik",

    "email_hash": "c1c9ccab72904fe94c855dadf5c234ff"

}

This allows you to manipulate what data is encoded, and in what format.

Decoding Data

Decoding JSON is just as easy, using the json_decode() function:

$json = '{ "name": "Davey Shafik", "age": 30 }';

$data = json_decode($json);

This will create an object that is an instance of stdClass, like so:

class stdClass#1 (2) {

  public $name =>

  string(12) "Davey Shafik"

  public $age =>

  int(30)

}

If you want to force json_decode() to return an array, just pass true for the second argument, assoc:

$json = '{ "name": "Davey Shafik", "age": 30 }';

$data = json_decode($json, true);

This will give you an associative array, like so:

array(2) {

  'name' =>

  string(12) "Davey Shafik"

  'age' =>

  int(30)

}

Additionally, you can specify depth and options as the third and fourth arguments, respectively. These work the same as for json_encode(), except that JSON_BIGINT_AS_STRING is the only option supported.
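
The following sketch shows the option at work and how a decoding failure can be detected. It assumes PHP 5.4 or later; the JSON strings are invented for illustration:

```php
<?php
// This integer is larger than PHP_INT_MAX, even on 64-bit platforms.
$json = '{"id": 12345678901234567890}';

// By default the value is converted to a float and loses precision.
$asFloat = json_decode($json);

// With JSON_BIGINT_AS_STRING the original digits are preserved.
$asString = json_decode($json, false, 512, JSON_BIGINT_AS_STRING);

// Invalid JSON yields NULL; json_last_error() distinguishes this
// from a document that legitimately contains null.
$broken = json_decode('{"name": ');
$error  = json_last_error();

var_dump(is_float($asFloat->id));       // bool(true)
var_dump($asString->id);                // string(20) "12345678901234567890"
var_dump($error !== JSON_ERROR_NONE);   // bool(true)
```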

Dates and Times

Dates and times are some of the trickiest things to work with. Timezones, daylight saving, leap years…and let’s not even talk about doing date math!

Beginning with PHP 5.1, a default timezone identifier should be set, either using date_default_timezone_set($timezone_identifier) in your scripts or the INI setting date.timezone.

However, PHP 5.2 introduced the DateTime class, which greatly simplifies working with dates and times. With PHP 5.5, we also saw the introduction of DateTimeImmutable. Unlike DateTime, this class will return a new instance of DateTimeImmutable when making modifications, rather than modifying and returning itself ($this).
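
The difference between the two classes can be sketched as follows, assuming PHP 5.5 or later; the date used is arbitrary:

```php
<?php
$utc = new \DateTimeZone("UTC");

// DateTime::modify() changes the object and returns that same object.
$mutable = new \DateTime("2014-06-18 14:05:00", $utc);
$same    = $mutable->modify("+1 day");

// DateTimeImmutable::modify() leaves the original alone
// and returns a new instance instead.
$immutable = new \DateTimeImmutable("2014-06-18 14:05:00", $utc);
$changed   = $immutable->modify("+1 day");

echo $mutable->format("Y-m-d"), "\n";   // 2014-06-19
echo $immutable->format("Y-m-d"), "\n"; // 2014-06-18
echo $changed->format("Y-m-d"), "\n";   // 2014-06-19
```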

DateTime is recommended over older functions for working with dates and times, such as date(), mktime(), strtotime(), time(), etc.

The only thing that the DateTime extension doesn’t handle (at least as of PHP 5.6) is timestamps with microseconds. For these, microtime() should still be used.

Using DateTime is easy. The constructor acts like strtotime(), so it will accept most date/time-like values, and it defaults to the current date/time:

Listing 8.3: DateTime construction

// Current Time

$date = new \DateTime();

// Current Time

$date = new \DateTime("now");

// June 18th, 2014, at 2:05pm, Eastern Standard Time ("EST" is a fixed

// UTC-5 offset; use "America/New_York" to apply daylight saving rules)

$date = new \DateTime("2014-06-18 14:05 EST");

// Midnight at the start of yesterday (the time is reset to 00:00)

$date = new \DateTime("yesterday");

// Current time, two days ago

$date = new \DateTime("-2 days");

// Current time, same day last week

$date = new \DateTime("last week");

// Current time, same day, 3 weeks ago

$date = new \DateTime("-3 week");

Optionally, you can specify a timezone as the second argument. However, this will not override any timezone specified in the date/time string. The timezone is represented by a DateTimeZone object:

$timezone = new \DateTimeZone("Europe/London");

// The current default local time, adjusted to the

// specified timezone, 3 weeks ago

$date = new \DateTime("3 weeks ago", $timezone);

For example, if the current default timezone is set to America/New_York, or five hours behind Europe/London, and the current date and time is 7:05 p.m. on June 18, it will first find the date three weeks prior (May 28) and then adjust for the timezone, returning 12:05 a.m. on May 29.
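
The rule that a timezone in the date/time string wins over the second argument can be checked with a short sketch; the dates are arbitrary:

```php
<?php
$london = new \DateTimeZone("Europe/London");

// No timezone in the string: the second argument is used.
$fromArgument = new \DateTime("2014-06-18 14:05", $london);

// Timezone in the string: the second argument is ignored.
$fromString = new \DateTime("2014-06-18 14:05 UTC", $london);

echo $fromArgument->getTimezone()->getName(), "\n"; // Europe/London
echo $fromString->getTimezone()->getName(), "\n";   // UTC
```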

If you wish to change the date/time of the current DateTime object, you can simply use one of the following methods:

Method                     Description
DateTime->setDate()        Set the date, explicitly passing year/month/day
DateTime->setISODate()     Set the date, passing in the year, and week/day offsets
DateTime->setTime()        Set the time, passing in the hour, minutes and (optionally) seconds
DateTime->setTimestamp()   Set the date and time using a UNIX timestamp
DateTime->setTimezone()    Set the timezone by passing a DateTimeZone object

Additionally, you can change the date/time by passing relative date/time strings to DateTime->modify() (e.g., -3 week, or +32 hours 16 minutes).
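
As a rough illustration of these methods, the sketch below starts from an arbitrary fixed date in UTC so that the results are reproducible:

```php
<?php
$date = new \DateTime("2014-06-18 14:05:00", new \DateTimeZone("UTC"));

// Replace the date and time parts explicitly.
$date->setDate(2014, 12, 25);
$date->setTime(9, 30);
$afterSetters = $date->format("Y-m-d H:i:s"); // 2014-12-25 09:30:00

// Apply a relative change.
$date->modify("+32 hours 16 minutes");
$afterModify = $date->format("Y-m-d H:i:s");  // 2014-12-26 17:46:00

// Jump to a UNIX timestamp; the object's timezone is kept.
$date->setTimestamp(0);
$epoch = $date->format("Y-m-d H:i:s");        // 1970-01-01 00:00:00

// Same instant, shown in another timezone.
$date->setTimezone(new \DateTimeZone("America/New_York"));
$epochNewYork = $date->format("Y-m-d H:i:s"); // 1969-12-31 19:00:00
```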

Retrieving a Date/Time

To retrieve the date/time (usually to echo it to the client) from your object, use the DateTime->format() method. This method accepts the same values as the traditional date() function and returns a string. To assist you, there are also a number of constants that represent common date formats.

You can read the complete set of formatting options for outputting date strings at http://php.net/date.

Constant            Format string
DateTime::ATOM      Y-m-d\TH:i:sP
DateTime::COOKIE    l, d-M-Y H:i:s T
DateTime::ISO8601   Y-m-d\TH:i:sO
DateTime::RFC822    D, d M y H:i:s O
DateTime::RFC850    l, d-M-y H:i:s T
DateTime::RFC1036   D, d M y H:i:s O
DateTime::RFC1123   D, d M Y H:i:s O
DateTime::RFC2822   D, d M Y H:i:s O
DateTime::RFC3339   Y-m-d\TH:i:sP
DateTime::RSS       D, d M Y H:i:s O
DateTime::W3C       Y-m-d\TH:i:sP
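
A brief sketch of format() with two of these constants and one hand-written pattern; the date is an arbitrary example in UTC:

```php
<?php
$date = new \DateTime("2014-06-18 14:05:00", new \DateTimeZone("UTC"));

$atom    = $date->format(\DateTime::ATOM);
$rfc2822 = $date->format(\DateTime::RFC2822);
$custom  = $date->format("l, F jS, Y");

echo $atom, "\n";    // 2014-06-18T14:05:00+00:00
echo $rfc2822, "\n"; // Wed, 18 Jun 2014 14:05:00 +0000
echo $custom, "\n";  // Wednesday, June 18th, 2014
```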

Handling Custom Formats

Since PHP 5.3, it has also been much easier to handle dates that the strtotime() handler cannot manage. Using DateTime::createFromFormat(), you can specify the pattern on which to parse the input explicitly:

$ambiguousDate = '10/11/12';

$date = \DateTime::createFromFormat("d/m/y", $ambiguousDate);

Prefixing DateTime with a \ ensures we’re using the class from the global namespace.

This will yield a correctly parsed date (DateTime):

object(DateTime)#1 (3) {

  ["date"]=>

  string(26) "2012-11-10 09:05:13.000000"

  ["timezone_type"]=>

  int(3)

  ["timezone"]=>

  string(16) "America/New_York"

}

Note that createFromFormat() fills in any fields that are missing from the format (here, the time of day) with the current local time, and uses the default timezone when none is specified.
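
The sketch below shows how the same string parses under two formats. It also uses the ! format character, which resets unparsed fields to the UNIX epoch instead of the current time, and checks the false return value for unparseable input. The input strings are illustrative only:

```php
<?php
$utc = new \DateTimeZone("UTC");

// Day first: 10 November 2012. The "!" zeroes the time of day.
$dayFirst = \DateTime::createFromFormat("!d/m/y", "10/11/12", $utc);

// Month first: 11 October 2012.
$monthFirst = \DateTime::createFromFormat("!m/d/y", "10/11/12", $utc);

// Input that does not match the format returns false.
$invalid = \DateTime::createFromFormat("d/m/y", "not a date", $utc);

echo $dayFirst->format("Y-m-d H:i:s"), "\n";   // 2012-11-10 00:00:00
echo $monthFirst->format("Y-m-d H:i:s"), "\n"; // 2012-10-11 00:00:00
var_dump($invalid);                            // bool(false)
```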

DateTime Comparisons

One of the great features of DateTime is smart comparisons. For example, if we create two DateTime objects with the same date/time in different timezones, they will be equal:

Listing 8.4: Comparing dates

$date = new \DateTime("2014-05-31 1:30pm EST");

$tz = new \DateTimeZone("Europe/Amsterdam");

$date2 = new \DateTime("2014-05-31 8:30pm", $tz);

if ($date == $date2) {

    echo "These dates are the same date/time!";

}

In this example, the condition will be true. The DateTime class handles the tricky conversions needed to compare across timezones for us.
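
The other comparison operators behave the same way. A small sketch with arbitrary times:

```php
<?php
$utc = new \DateTime(
    "2014-05-31 12:00",
    new \DateTimeZone("UTC")
);
$amsterdam = new \DateTime(
    "2014-05-31 13:00",
    new \DateTimeZone("Europe/Amsterdam")
);

// 13:00 in Amsterdam (UTC+2 in summer) is 11:00 UTC,
// which is one hour before $utc.
$isEarlier = $amsterdam < $utc;
$isEqual   = $amsterdam == $utc;

var_dump($isEarlier); // bool(true)
var_dump($isEqual);   // bool(false)
```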

DateTime Math

As already touched on, you can use DateTime->modify() to perform date math:

$date = new \DateTime();

$date->modify("+1 month");

You can also add or subtract a specific number of days, months, years, hours, minutes, or seconds from a DateTime object using DateTime->add() or DateTime->sub().

Both of these methods accept an instance of DateInterval. There are two ways to create a DateInterval.

The first is to instantiate it by passing an interval spec to the constructor. The interval spec always starts with the letter P and then lists the number of a given time period, followed by a unit identifier. These must be listed from the largest unit to the smallest: years (Y), months (M), days (D) or weeks (W), hours (H), minutes (M), and seconds (S). If your spec has a time portion, you must separate it from the date portion with the letter T.

Weeks are converted to days, so these two types cannot be combined in a single spec.

Here are some example specs:

·        P38D — 38 Days

·        P1Y3M4D — 1 Year, 3 Months, 4 Days

·        PT45M — 45 Minutes

·        P1WT1H — 1 Week, 1 Hour

We can then pass this to DateTime->sub() or DateTime->add() to decrement or increment the date/time by the given interval.

Listing 8.5: Working with intervals

$date = new \DateTime();

$interval = new \DateInterval('P1Y3M4DT45M');

// Add 1 year, 3 months, 4 days, 45 minutes

$date->add($interval);

$date = new \DateTime();

// Subtract 1 year, 3 months, 4 days, 45 minutes

$date->sub($interval);
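
To see concrete results, the following sketch repeats the idea with a fixed starting date. It uses DateTimeImmutable (PHP 5.5 or later) purely so that the same starting point can be reused; the date is arbitrary:

```php
<?php
$start = new \DateTimeImmutable(
    "2014-06-18 14:05:00",
    new \DateTimeZone("UTC")
);

$interval = new \DateInterval("P1Y3M4DT45M");

$later   = $start->add($interval)->format("Y-m-d H:i:s");
$earlier = $start->sub($interval)->format("Y-m-d H:i:s");

// Weeks are stored as days on the interval object.
$week = new \DateInterval("P1W");

echo $later, "\n";   // 2015-09-22 14:50:00
echo $earlier, "\n"; // 2013-03-14 13:20:00
echo $week->d, "\n"; // 7
```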

Differences between Dates

We can also perform a “diff” between two DateTime objects, using DateTime->diff(). This will return a DateInterval object with properties that denote the difference.

Listing 8.6: Calculating date differences

$davey = new \DateTime(

    "1984-05-31 00:00",

    new \DateTimeZone("Europe/London")

);

$gabriel = new \DateTime(

    "2014-04-07 00:00",

    new \DateTimeZone("America/New_York")

);

$davey->diff($gabriel);

This will return a DateInterval that shows there is a difference of 29 years, 10 months, 7 days, and 5 hours (the timezone difference), or 10,903 days, between my own birthday and my son’s:

class DateInterval#178 (15) {

  public $y =>

  int(29)

  public $m =>

  int(10)

  public $d =>

  int(7)

  public $h =>

  int(5)

  public $i =>

  int(0)

  public $s =>

  int(0)

  public $invert =>

  int(0)

  public $days =>

  int(10903)

  ...

}

If we were to reverse the diff, passing $davey to $gabriel->diff(), then the invert property would be set to 1 to show that it is a negative difference.
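
A compact sketch of reading the resulting DateInterval, using two arbitrary dates in the same timezone:

```php
<?php
$utc   = new \DateTimeZone("UTC");
$start = new \DateTime("2014-01-01", $utc);
$end   = new \DateTime("2014-03-01", $utc);

$forward  = $start->diff($end);
$backward = $end->diff($start);

// Individual parts versus the total number of days.
echo $forward->m, " months, ", $forward->d, " days\n"; // 2 months, 0 days
echo $forward->days, "\n";                             // 59

// %R prints the sign, %a the total number of days.
echo $forward->format("%R%a days"), "\n";  // +59 days
echo $backward->format("%R%a days"), "\n"; // -59 days
```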

Extensible Markup Language (XML)

One of the most significant changes made in PHP 5 was the way in which PHP handles XML data. The underlying code in the PHP engine was transformed and re-architected to provide a seamless set of XML parsing tools that work together and comply with World Wide Web Consortium (W3C) recommendations. Whereas PHP 4 used a different code library to implement each XML tool, PHP 5 takes advantage of a standardized single library: the Gnome XML library (libxml2). In addition, PHP 5 introduces many new tools to make the task of working with XML documents simpler and easier.

XML is a subset of Standard Generalized Markup Language (SGML); its design goal is to be as powerful and flexible as SGML, with less complexity. If you’ve ever worked with Hypertext Markup Language (HTML), then you’re familiar with an application of SGML. If you’ve ever worked with Extensible Hypertext Markup Language (XHTML), then you’re familiar with an application of XML, since XHTML is a reformulation of HTML 4 as XML.

It is not within the scope of this book to provide a complete primer on XML. However, we will cover some basic XML and XPath concepts and terms.

In order to understand the concepts that follow, it is important that you know some basic principles of XML and how to create well-formed and valid XML documents. In fact, it is now important to define a few terms before proceeding:

·        Entity: An entity is a named unit of storage. In XML, entities can be used for a variety of purposes, such as providing convenient “variables” to hold data or to represent characters that cannot normally be part of an XML document (e.g., angle brackets and ampersand characters). Entity definitions can either be embedded directly in an XML document or included from an external source.

·        Element: A data object that is part of an XML document. Elements can contain other elements or raw textual data, as well as feature zero or more attributes.

·        Document Type Declaration: A set of instructions that describes the accepted structure and content of an XML file. Like entities, Document Type Declarations (DTDs) can either be externally defined or embedded.

·        Well-formed: An XML document is considered well-formed when it contains a single root level element, all tags are opened and closed properly, and all special characters (<, >, &, ', ") are properly escaped as entities. Specifically, it must conform to all “well-formedness” constraints as defined by the W3C XML recommendation.

·        Valid: An XML document is valid when it is both well-formed and obeys a referenced DTD. An XML document can be well-formed and not valid, but it can never be valid and not well-formed.

A well-formed XML document can be as simple as:

<?xml version="1.0"?>

<message>Hello, World!</message>

This example conforms fully to the definition described earlier: it has at least one element, and that element is delimited by start and end tags. However, it is not valid, because it doesn’t reference a DTD. Here’s an example of a valid version of the same document:

<?xml version="1.0"?>

<!DOCTYPE message SYSTEM "message.dtd">

<message>Hello, World!</message>

In this case, an external DTD is loaded from local storage, but the declarations may also be listed locally:

<?xml version="1.0"?>

<!DOCTYPE message [

  <!ELEMENT message (#PCDATA)>

]>

<message>Hello, World!</message>

In practice, most XML documents you work with will not contain a DTD—and, therefore, will not be valid. In fact, the DTD is not a requirement except to validate the structure of a document, which may not even be a requirement for your particular needs. However, all XML documents must be well-formed for PHP’s XML functionality to properly parse them, as XML itself is a strict language.
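
The distinction between well-formed and valid can be checked from PHP. The sketch below uses DOMDocument from the DOM extension, with libxml's internal error handling turned on so that problems are collected instead of raised as warnings; the variable names are illustrative only:

```php
<?php
// Collect libxml errors instead of emitting PHP warnings.
libxml_use_internal_errors(true);

$withDtd = <<<'XML'
<?xml version="1.0"?>
<!DOCTYPE message [
  <!ELEMENT message (#PCDATA)>
]>
<message>Hello, World!</message>
XML;

$withoutDtd = '<?xml version="1.0"?><message>Hello, World!</message>';
$malformed  = '<?xml version="1.0"?><message>Hello, World!';

// Well-formed and valid: loads, and validates against its DTD.
$doc = new \DOMDocument();
$isValidWithDtd = $doc->loadXML($withDtd) && $doc->validate();

// Well-formed but not valid: loads, but there is no DTD to obey.
$doc = new \DOMDocument();
$isWellFormed      = $doc->loadXML($withoutDtd);
$isValidWithoutDtd = $doc->validate();

// Not well-formed: the parser refuses to load it at all.
$doc = new \DOMDocument();
$malformedLoaded = $doc->loadXML($malformed);

libxml_clear_errors();

var_dump($isValidWithDtd, $isWellFormed, $isValidWithoutDtd, $malformedLoaded);
```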

Creating an XML Document

Unless you are working with a DTD or XML Schema Definition (XSD), which provides an alternate method for describing a document, creating XML is a free-form process, without any rigid constraints except those that define a well-formed document. The names of tags and attributes, and the order in which they appear, are all up to the creator of the XML document.

First and foremost, XML is a language that provides the means for describing data. Each tag and attribute should consist of a descriptive name for the data contained within it. For example, in XHTML, the <p> tag is used to describe paragraph data, while the <td> tag describes table data and the <em> tag describes data that is to be emphasized. In the early days of HTML and text-based Web browsers, HTML tags were intended merely to describe data, but as Web browsers became more sophisticated, HTML was used more for layout and display than as a markup language. For this reason, HTML was reformulated as an application of XML, in the form of XHTML. While many continue to use XHTML as a layout language, its main purpose is to describe types of data. Cascading style sheets (CSS) are now the preferred method for defining the layout of XHTML documents.

Since the purpose of XML is to describe data, it lends itself well to the transportation of data between disparate systems. There is no need for any of the systems that are parties to a data exchange to share the same software packages, encoding mechanisms, or byte order. As long as both systems know how to read and parse XML, they can talk to each other. To understand how to create an XML document, we will discuss one such system, which stores information about books. For the data, we have plucked five random books from our bookshelf. Here they are:

Title                          Author           Publisher          ISBN
The Moon Is a Harsh Mistress   R. A. Heinlein   Orb                0312863551
Fahrenheit 451                 R. Bradbury      Del Rey            0345342968
The Silmarillion               J.R.R. Tolkien   G. Allen & Unwin   0048231398
1984                           G. Orwell        Signet             0451524934
Frankenstein                   M. Shelley       Bedford            031219126X

This data may be stored in any number of ways on our system. For this example, assume that it is stored in a database and that we want other systems to access it using a Web service. As we’ll see later on, PHP will do most of the legwork for us.

From the table, it’s clear what types of data need to be described. We have the title, author, publisher, and ISBN columns, which make up a book. Therefore, these will form the basis of the names of the elements and attributes of the XML document. Keep in mind, though, that while you are free to choose to name the elements and attributes of your XML data model, there are a few commonly accepted XML data design guidelines.

One of the most frequently asked questions regarding the creation of an XML data model is when to use elements and when to use attributes. In truth, this doesn’t matter. There is no rule in the W3C recommendation for what kinds of data should be encapsulated in elements or attributes. However, as a general design principle, it is best to use elements to express essential information intended for communication, while attributes can express information that is peripheral or helpful only to process the main communication. In short, elements contain data, while attributes contain metadata. Some refer to this as the “principle of core content.”

To represent the book data in XML, this design principle means that the author, title, and publisher data form elements of the same name, while the ISBN, which we’ll consider peripheral data for the sake of this example, will be stored in an attribute. Thus, our elements are as follows: book, title, author, and publisher. The sole attribute of the book element is isbn. The XML representation of the book data is shown in the following listing:

Listing 8.7: Book XML

<?xml version="1.0"?>

<library>

  <book isbn="0345342968">

    <title>Fahrenheit 451</title>

    <author>R. Bradbury</author>

    <publisher>Del Rey</publisher>

  </book>

  <book isbn="0048231398">

    <title>The Silmarillion</title>

    <author>J.R.R. Tolkien</author>

    <publisher>G. Allen &amp; Unwin</publisher>

  </book>

  <book isbn="0451524934">

    <title>1984</title>

    <author>G. Orwell</author>

    <publisher>Signet</publisher>

  </book>

  <book isbn="031219126X">

    <title>Frankenstein</title>

    <author>M. Shelley</author>

    <publisher>Bedford</publisher>

  </book>

  <book isbn="0312863551">

    <title>The Moon Is a Harsh Mistress</title>

    <author>R. A. Heinlein</author>

    <publisher>Orb</publisher>

  </book>

</library>

You’ll notice that library is the root element, but this might just as easily have been books. What’s important is that it is the main container. All well-formed XML documents must have a root element; the library element contains all the book elements. This list could contain any number of books by simply having an element for each book; this sample, however, contains all data necessary for the sample presented earlier.

SimpleXML

Working with XML documents in PHP 4 was a difficult and confusing process involving many lines of code and a library that was anything but easy to use. In PHP 5.0, the process was greatly simplified by the introduction of a number of different libraries, all of which make heavy use of object orientation. One such library is SimpleXML, which, true to its name, provides an easy way to work with XML documents.

SimpleXML is not a robust tool for working with XML: it sacrifices the ability to satisfy complex requirements in favor of providing a simplified interface geared mostly towards reading and iterating through XML data. Luckily, because all of PHP’s XML-handling extensions are based on the same library, you can juggle a single XML document back and forth among them, depending on the level of complexity you are dealing with.

Many of the examples in the coming pages will rely on the book example we presented above; where we access data in a file, we’ll assume that it has been saved with the name library.xml.

Parsing XML Documents

All XML parsing is done by SimpleXML internally using the DOM parsing model. There are no special calls or tricks you need to perform to parse a document. The only constraint is that the XML document must be well-formed, or SimpleXML will emit warnings and fail to parse it. Also, while the W3C has published a recommended specification for XML 1.1, SimpleXML supports only version 1.0 documents. Again, SimpleXML will emit a warning and fail to parse the document if it encounters an XML document with a 1.1 version.

Because SimpleXML loads the entire XML document into memory when parsing, it is not suitable for very large XML documents.

All objects created by SimpleXML are instances of the SimpleXMLElement class. Thus, when parsing a document or XML string, you will need to create a new SimpleXMLElement; there are several ways to do this. The first two ways involve the use of procedural code, or functions, that return SimpleXMLElement objects. One such function, simplexml_load_string(), loads an XML document from a string, while the other, simplexml_load_file(), loads an XML document from a path. The following example illustrates the use of each, pairing file_get_contents() with simplexml_load_string(); however, in a real-world scenario, it would make much more sense to simply use simplexml_load_file():

// Load an XML string

$xmlstr = file_get_contents('library.xml');

$library = simplexml_load_string($xmlstr);

// Load an XML file

$library = simplexml_load_file('library.xml');

As SimpleXML was designed to work in an object-oriented (OOP) environment, it also supports an OOP-centric approach to loading a document. In the following example, the first method loads an XML string into a SimpleXMLElement, while the second loads an external document, which can be a local file path or a valid URL (if allow_url_fopen is set to “On” in php.ini, as explained in the Security chapter).

// Load an XML string

$xmlstr = file_get_contents('library.xml');

$library = new SimpleXMLElement($xmlstr);

// Load an XML file

$library = new SimpleXMLElement('library.xml', NULL, true);

Note here that the second method also passes two additional arguments to SimpleXMLElement's constructor. The second argument optionally specifies additional libxml parameters that influence the way the library parses the XML. It is not necessary to set any of these parameters at this point, so we set it to NULL. The third parameter is important, though, because it informs the constructor that the first argument represents the path to a file, rather than a string that contains the XML data itself.
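
Since parsing fails on documents that are not well-formed, it is worth sketching how a failure surfaces with each loading style. The broken XML string below is invented for the example; libxml_use_internal_errors() is used so that errors can be inspected rather than emitted as warnings:

```php
<?php
libxml_use_internal_errors(true);

$broken = '<library><book></library>'; // <book> is never closed

// Procedural style: the function returns false on failure.
$library    = simplexml_load_string($broken);
$errorCount = count(libxml_get_errors());

foreach (libxml_get_errors() as $error) {
    echo trim($error->message), " (line {$error->line})\n";
}
libxml_clear_errors();

// Object-oriented style: the constructor throws an Exception.
$threw = false;
try {
    new \SimpleXMLElement($broken);
} catch (\Exception $e) {
    $threw = true;
}
libxml_clear_errors();
```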

Accessing Children and Attributes

Now that you have loaded an XML document and have a SimpleXMLElement object, you will want to access child nodes and their attributes. Again, SimpleXML provides several methods for accessing these, well, simply.

The first method for accessing children and attributes is the simplest method and is one of the reasons SimpleXML is so attractive. When SimpleXML parses an XML document, it converts all its XML elements, or nodes, to properties of the resulting SimpleXMLElement object. In addition, it converts XML attributes to an associative array that may be accessed from the property to which they belong. Each of these properties is, in turn, also an instance of SimpleXMLElement, thus making it easier to access child nodes regardless of their nesting level.

Here’s an example:

Listing 8.8: SimpleXML usage

$library = new SimpleXMLElement('library.xml', NULL, true);

foreach ($library->book as $book) {

  echo $book['isbn'] . "\n";

  echo $book->title . "\n";

  echo $book->author . "\n";

  echo $book->publisher . "\n\n";

}

The major drawback of this approach is that it is necessary to know the names of every element and attribute in the XML document. Otherwise, it is impossible to access them. However, there are times when a provider may change the structure of their file so that, while the overall format remains the same, your code will be unable to access the proper data if you are forced to hard-code the name and nesting level of each node. Thus, SimpleXML provides a means to access children and attributes without needing to know their names. In fact, SimpleXML will even tell you their names.

The following example illustrates the use of SimpleXMLElement::children() and SimpleXMLElement::attributes(), as well as SimpleXMLElement::getName() (introduced in PHP 5.1.3), for precisely this purpose:

Listing 8.9: Iterating with SimpleXML

foreach ($library->children() as $child) {

  echo $child->getName() . ":\n";

  // Get attributes of this element

  foreach ($child->attributes() as $attr) {

    echo '  ' . $attr->getName() . ': ' . $attr . "\n";

  }

  // Get children

  foreach ($child->children() as $subchild) {

    echo '  ' . $subchild->getName() . ': ' . $subchild . "\n";

  }

  echo "\n";

}

What this example doesn’t show is that you may also iterate through the children and attributes of $subchild, and so forth, using either a recursive function or an iterator (explained in the Elements of Object-Oriented Design chapter). It is possible to access every single child and attribute at every depth of an XML document without knowing the structure in advance.
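
One possible recursive version is sketched below. The describeNode() function is a hypothetical helper written for this illustration, and the XML is a cut-down version of the library document:

```php
<?php
/**
 * Returns one line per element and attribute, indented by depth.
 * describeNode() is an illustrative helper, not part of SimpleXML.
 */
function describeNode(\SimpleXMLElement $node, $depth = 0)
{
    $indent = str_repeat('  ', $depth);
    $label  = $indent . $node->getName();

    // Only leaf elements carry text worth printing.
    $text = trim((string) $node);
    if ($node->count() === 0 && $text !== '') {
        $label .= ': ' . $text;
    }
    $lines = [$label];

    foreach ($node->attributes() as $attr) {
        $lines[] = $indent . '  @' . $attr->getName() . ': ' . $attr;
    }

    // Recurse into every child, one level deeper.
    foreach ($node->children() as $child) {
        $lines = array_merge($lines, describeNode($child, $depth + 1));
    }

    return $lines;
}

$library = new \SimpleXMLElement(
    '<library><book isbn="0451524934"><title>1984</title></book></library>'
);

$lines = describeNode($library);
echo implode("\n", $lines), "\n";
```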

XPath Queries

The XML Path Language (XPath) is a W3C standardized language that is used to access and search XML documents. It is used extensively in Extensible Stylesheet Language Transformations (XSLT) and forms the basis of XML Query (XQuery) and XML Pointer (XPointer). Think of it as a query language for retrieving data from an XML document. XPath can be very complex, but with this complexity comes a lot of power, which SimpleXML leverages with the SimpleXMLElement::xpath() method.

Using SimpleXMLElement::xpath(), you can run an XPath query on any SimpleXMLElement object. If used on the root element, the query will search the entire XML document. If used on a child, it will search the child and any children it may have. The following illustrates an XPath query on both the root element and a child node. The xpath() method returns an array of SimpleXMLElement objects, even if only a single element is returned.

Listing 8.10: XPath queries in SimpleXML

// Search the root element

$results = $library->xpath('/library/book/title');

foreach ($results as $title) {

  echo $title . "\n";

}

// Search the first child element

$results = $library->book[0]->xpath('title');

foreach ($results as $title) {

  echo $title . "\n";

}
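
XPath predicates allow more selective queries. The following sketch, run against a shortened inline copy of the library document, selects by attribute value, by child element value, and selects attribute nodes directly:

```php
<?php
$library = new \SimpleXMLElement(
    '<library>'
    . '<book isbn="0451524934"><title>1984</title>'
    . '<author>G. Orwell</author></book>'
    . '<book isbn="031219126X"><title>Frankenstein</title>'
    . '<author>M. Shelley</author></book>'
    . '</library>'
);

// Select a title by the isbn attribute of its book.
$byIsbn = $library->xpath('/library/book[@isbn="031219126X"]/title');

// Select a title by the value of a sibling element.
$byAuthor = $library->xpath('//book[author="G. Orwell"]/title');

// Select the attribute nodes themselves and cast them to strings.
$isbns = array_map('strval', $library->xpath('//book/@isbn'));

echo $byIsbn[0], "\n";          // Frankenstein
echo $byAuthor[0], "\n";        // 1984
echo implode(', ', $isbns), "\n"; // 0451524934, 031219126X
```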

Modifying XML Documents

Prior to PHP 5.1.3, SimpleXML had no means of adding elements and attributes to an XML document. True, it was possible to change the values of attributes and elements, but the only way to add new children and attributes was to export the SimpleXMLElement object to DOM, add the elements and attributes using the latter, and then import the document back into SimpleXML. Needless to say, this process was anything but simple. PHP 5.1.3, however, introduced two new methods to SimpleXML that now give it the power it needs to create and modify XML documents: SimpleXMLElement::addChild() and SimpleXMLElement::addAttribute().

The addChild() method accepts three parameters, the first of which is the name of the new element. The second is an optional value for this element, and the third is an optional namespace to which the child belongs. Since the addChild() method returns a SimpleXMLElement object, you may store this object in a variable to which you can append its own children and attributes. The following example illustrates this concept:

Listing 8.11: Adding children in SimpleXML

$book = $library->addChild('book');

$book->addAttribute('isbn', '0812550706');

$book->addChild('title', "Ender's Game");

$book->addChild('author', 'Orson Scott Card');

$book->addChild('publisher', 'Tor Science Fiction');

header('Content-type: text/xml');

echo $library->asXML();

This script adds a new book element to the $library object, thus creating a new object that we store in the $book variable so that we can add an attribute and three children to it. Finally, in order to display the modified XML document, the script calls the asXML() method of the $library SimpleXMLElement. Before doing so, though, it sets a Content-type header to ensure that the client (a Web browser in this case) knows how to handle the content.

Called without a parameter, the asXML() method returns an XML string. However, asXML() also accepts a file path as a parameter, which will cause it to save the XML document to the given path and return a Boolean value to indicate the operation’s success.

If a file with the same path already exists, a call to asXML() will overwrite it without warning (provided that the user account under which PHP is running has the proper permissions).

While SimpleXML provides the functionality for adding children and attributes, it does not provide the means to remove them. It is possible to remove child elements, though, using the following method:

$library->book[0] = NULL;

This only removes child elements and their attributes, however. It will not remove attributes from the element at the book level. Thus, the isbn attribute remains. You may set this attribute to NULL, but doing so will only cause it to become empty and will not actually remove it. To effectively remove children and attributes, you must export your SimpleXMLElement to DOM (explained later in this chapter), where this more powerful functionality is possible.
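
As a rough sketch of that DOM route, dom_import_simplexml() returns a DOMElement that shares the underlying document with the SimpleXMLElement, so changes made through DOM are visible from SimpleXML. The inline XML is a shortened stand-in for library.xml:

```php
<?php
$library = new \SimpleXMLElement(
    '<library>'
    . '<book isbn="0451524934"><title>1984</title></book>'
    . '<book isbn="031219126X"><title>Frankenstein</title></book>'
    . '</library>'
);

// Remove only the isbn attribute from the first book.
$first = dom_import_simplexml($library->book[0]);
$first->removeAttribute('isbn');

// Remove the second book element entirely.
$second = dom_import_simplexml($library->book[1]);
$second->parentNode->removeChild($second);

$bookCount = count($library->book);
$hasIsbn   = isset($library->book[0]['isbn']);

echo $library->asXML();
```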

Working With Namespaces

The use of XML namespaces allows a provider to associate certain elements and attribute names with namespaces identified by URIs. This qualifies the elements and attributes, avoiding any potential naming conflicts when two elements of the same name exist yet contain different types of data.

The library.xml document used thus far does not contain any namespaces—but suppose it did. For the purpose of example, it might look something like this:

Listing 8.12: library.xml

<?xml version="1.0"?>

<library xmlns="http://example.org/library"

    xmlns:meta="http://example.org/book-meta"

    xmlns:pub="http://example.org/publisher"

    xmlns:foo="http://example.org/foo">

  <book meta:isbn="0345342968">

    <title>Fahrenheit 451</title>

    <author>Ray Bradbury</author>

    <pub:publisher>Del Rey</pub:publisher>

  </book>

</library>

Since PHP 5.1.3, SimpleXML has had the ability to return all namespaces declared in a document and all namespaces used in a document, and register a namespace prefix used in making an XPath query. The first of these features is SimpleXMLElement::getDocNamespaces(), which returns an array of all namespaces declared in the document. By default, it returns only those namespaces declared in the root element referenced by the SimpleXMLElement object, but passing true to the method will cause it to behave recursively and return the namespaces declared in all children. Since our sample XML document declares four namespaces in the root element of the document, getDocNamespaces() returns four namespaces:

Listing 8.13: Returning document namespaces

$namespaces = $library->getDocNamespaces();

foreach ($namespaces as $key => $value) {

  echo "{$key} => {$value}\n";

}

/* outputs:

=> http://example.org/library

meta => http://example.org/book-meta

pub => http://example.org/publisher

foo => http://example.org/foo

*/

Notice that the foo namespace was listed but was never actually used. A call to SimpleXMLElement::getNamespaces() will return an array that only contains those namespaces that are actually used throughout the document. Like getDocNamespaces(), this method accepts a boolean value to turn on its recursive behavior.

Listing 8.14: Returning used namespaces

$namespaces = $library->getNamespaces(true);

foreach ($namespaces as $key => $value) {

  echo "{$key} => {$value}\n";

}

/* outputs:

=> http://example.org/library

meta => http://example.org/book-meta

pub => http://example.org/publisher

*/
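The third feature mentioned above, registering a prefix for use in XPath queries, is provided by SimpleXMLElement::registerXPathNamespace(). The sketch below assumes a trimmed-down copy of the namespaced library.xml, and the lib prefix is an arbitrary name chosen here for the default namespace.

```php
<?php
// Trimmed-down version of the namespaced library.xml (an assumption
// made so that the snippet is self-contained)
$xml = <<<XML
<library xmlns="http://example.org/library"
    xmlns:meta="http://example.org/book-meta"
    xmlns:pub="http://example.org/publisher">
  <book meta:isbn="0345342968">
    <title>Fahrenheit 451</title>
    <pub:publisher>Del Rey</pub:publisher>
  </book>
</library>
XML;

$library = new SimpleXMLElement($xml);

// The default namespace has no prefix in the document, so XPath needs
// one; "lib" is an arbitrary choice
$library->registerXPathNamespace('lib', 'http://example.org/library');
$library->registerXPathNamespace('pub', 'http://example.org/publisher');

$publishers = $library->xpath('//lib:book/pub:publisher');

foreach ($publishers as $publisher) {
    echo $publisher, "\n"; // Del Rey
}
```

Without a registered prefix, a query such as //book would not match, because the book elements belong to the default namespace.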

DOM

The PHP 5 DOM extension sounds like it would be similar to the PHP 4 DOMXML extension, but it has undergone a complete transformation and is easier to use. Unlike SimpleXML, DOM can be cumbersome and unwieldy at times. However, this is a trade-off for the power and flexibility it provides. Since SimpleXML and DOM objects are interoperable, you can use the former for simplicity and the latter for power on the same document, with minimal effort.

Loading and Saving XML Documents

There are two ways to import documents into a DOM tree. The first is by loading them from a file:

$dom = new DomDocument();

$dom->load("library.xml");

Alternatively, you can load a document from a string—which is handy when using REST Web services:

$dom = new DomDocument();

$dom->loadXML($xml);

You can also import HTML files and strings by calling the DomDocument::loadHTMLFile() and DomDocument::loadHTML() methods, respectively.

Just as simply, you can save XML documents using one of DomDocument::save() (to a file), DomDocument::saveXML() (to a string), DomDocument::saveHTML() (also to a string, but it saves an HTML document instead of an XML file), or DomDocument::saveHTMLFile() (to a file in HTML format).

Listing 8.15: Loading XML with DOM

$dom = new DomDocument();

$dom->load('library.xml');

// Do something with our XML here

// Flag choosing XML (true) or HTML (false) output; set here for illustration

$use_xhtml = true;

// Save to file

if ($use_xhtml) {

    $dom->save('library.xml');

} else {

    $dom->saveHTMLFile('library.xml');

}

// Output the data

if ($use_xhtml) {

    echo $dom->saveXML();

} else {

    echo $dom->saveHTML();

}

XPath Queries

One of the most powerful parts of the DOM extension is its integration with XPath—in fact, DomXPath is far more powerful than its SimpleXML equivalent:

Listing 8.16: XPath queries with DOM

$dom = new DomDocument();

$dom->load("library.xml");

$xpath = new DomXPath($dom);

$xpath->registerNamespace(

    "lib", "http://example.org/library"

);

$result = $xpath->query("//lib:title/text()");

foreach ($result as $book) {

    echo $book->data;

}

This example seems quite complex, but in actuality it shows just how flexible the DOM XPath functionality can be.

First, we instantiate a DomXPath object, passing in our DomDocument object so that the former will know what to work on. Next, we register only the namespaces we need: in this case, the default namespace, which we associate with the lib prefix. Finally, we execute our query and iterate over the results.

A call to DomXPath::query() will return a DomNodeList object; you can find out how many items it contains by using the length property, and then access any one of them with the item() method. You can also iterate through the entire collection using a foreach loop:

Listing 8.17: XPath query results with DOM

$result = $xpath->query("//lib:title/text()");

if ($result->length > 0) {

    // Random access

    $book = $result->item(0);

    echo $book->data;

    // Sequential access

    foreach ($result as $book) {

        echo $book->data;

    }

}

Modifying XML Documents

To add new data to a loaded document, you need to create new nodes: DomElement objects with the DomDocument::createElement() and DomDocument::createElementNS() methods, and DomText objects with DomDocument::createTextNode(). In the following example, we will add a new book to our library.xml document.

Listing 8.18: Adding an element with DOM

$dom = new DomDocument();

$dom->load("library.xml");

$book = $dom->createElement("book");

$book->setAttribute("meta:isbn", "9781940111001");

$title = $dom->createElement("title");

$text = $dom->createTextNode("Mastering the SPL Library");

$title->appendChild($text);

$book->appendChild($title);

$author = $dom->createElement("author","Joshua Thijssen");

$book->appendChild($author);

$publisher = $dom->createElement(

    "pub:publisher", "musketeers.me, LLC."

);

$book->appendChild($publisher);

$dom->documentElement->appendChild($book);

As you can see, in this example, we start by creating a book element and set its meta:isbn attribute with DomElement::setAttribute(). Next, we create a title element and a text node containing the book title, which is assigned to the title element using DomNode::appendChild(). For the author and pub:publisher elements, we again use DomDocument::createElement(), passing the node's text contents as the second argument. Finally, we append the entire structure to the node held in the DomDocument::documentElement property, which represents the root XML node.
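When the target document declares a default namespace, as the namespaced library.xml does, DomDocument::createElementNS() lets you place new elements in that namespace explicitly, so that a namespace-aware XPath query finds them immediately. The following is a minimal sketch under that assumption; the one-line document is a stand-in for library.xml.

```php
<?php
// Hypothetical one-book document with a default namespace
$dom = new DOMDocument();
$dom->loadXML(
    '<library xmlns="http://example.org/library">'
    . '<book><title>Fahrenheit 451</title></book>'
    . '</library>'
);

$ns = 'http://example.org/library';

// Create the new elements in the document's default namespace
$book = $dom->createElementNS($ns, 'book');
$title = $dom->createElementNS($ns, 'title', 'Mastering the SPL Library');
$book->appendChild($title);
$dom->documentElement->appendChild($book);

// Both the original and the new book now match a namespaced query
$xpath = new DOMXPath($dom);
$xpath->registerNamespace('lib', $ns);
$count = $xpath->query('//lib:book')->length;

echo $count; // 2
```

Elements created with createElement() carry no namespace of their own, so whether they need this treatment depends on how you intend to query the tree afterwards.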

Moving Data

The way to move data is not as obvious as you might expect, because the DOM extension doesn’t provide a method that takes care of that explicitly. Instead, you must use a combination of DomNode::appendChild() and DomNode::insertBefore().

Listing 8.19: Moving a node with DOM

$dom = new DOMDocument();

$dom->load("library.xml");

$xpath = new DomXPath($dom);

$xpath->registerNamespace(

    "lib", "http://example.org/library"

);

$result = $xpath->query("//lib:book");

$result->item(1)->parentNode->insertBefore(

    $result->item(1), $result->item(0)

);

Here, we take the second book element and place it before the first.

In the following example, on the other hand, we take the first book element and place it at the end:

Listing 8.20: Appending a node with DOM

$dom = new DOMDocument();

$dom->load("library.xml");

$xpath = new DomXPath($dom);

$xpath->registerNamespace(

    "lib", "http://example.org/library"

);

$result = $xpath->query("//lib:book");

$result->item(1)->parentNode->appendChild($result->item(0));

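DomNode::appendChild() and DomNode::insertBefore() will move the node to the new location. If you wish to duplicate a node, use DomNode::cloneNode() first. By default, cloneNode() copies only the node itself and its attributes; pass true to copy its descendants as well: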

Listing 8.21: Duplicating a node with DOM

$dom = new DOMDocument();

$dom->load("library.xml");

$xpath = new DomXPath($dom);

$xpath->registerNamespace(

    "lib", "http://example.org/library"

);

$result = $xpath->query("//lib:book");

// Pass true for a deep copy; without it, the clone has no child nodes

$clone = $result->item(0)->cloneNode(true);

$result->item(1)->parentNode->appendChild($clone);
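The difference between a shallow and a deep clone can be seen in a short sketch. The single-book document here is an assumption made to keep the snippet self-contained.

```php
<?php
// Hypothetical document with no whitespace between elements, so that
// firstChild is the book element itself
$dom = new DOMDocument();
$dom->loadXML(
    '<library><book isbn="1"><title>Fahrenheit 451</title></book></library>'
);

$book = $dom->documentElement->firstChild;

$shallow = $book->cloneNode();     // the element and its attributes only
$deep    = $book->cloneNode(true); // the element and all of its descendants

var_dump($shallow->hasChildNodes()); // bool(false)
var_dump($deep->hasChildNodes());    // bool(true)
```

Neither clone is part of the tree until you attach it with appendChild() or insertBefore().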

Modifying Data

When modifying data, you will typically want to edit the character data (the text) within a node. Apart from using the methods shown above, you can use XPath to find a text node and modify its contents directly:

Listing 8.22: Modifying XML with DOM

$xml = <<<XML

<xml>

    <text>some text here</text>

</xml>

XML;

$dom = new DOMDocument();

$dom->loadXML($xml);

$xpath = new DomXpath($dom);

$node = $xpath->query("//text/text()")->item(0);

$node->data = ucwords($node->data);

echo $dom->saveXML();

In this example, we apply ucwords() to the text() node’s data property. The transformation is applied to the original document, resulting in the following output:

<?xml version="1.0"?>

<xml>

    <text>Some Text Here</text>

</xml>

Removing Data

There are three types of data you may want to remove from an XML document: attributes, elements, and character data. DOM provides a different method for each of these tasks: DomElement::removeAttribute(), DomNode::removeChild(), and DomCharacterData::deleteData():

Listing 8.23: Removing data with DOM

$xml = <<<XML

<xml>

    <text type="misc">some text here</text>

    <text type="misc">some more text here</text>

    <text type="misc">yet more text here</text>

</xml>

XML;

$dom = new DOMDocument();

$dom->loadXML($xml);

$xpath = new DomXpath($dom);

$result = $xpath->query("//text");

$result->item(0)->parentNode->removeChild($result->item(0));

$result->item(1)->removeAttribute('type');

$result = $xpath->query('text()', $result->item(2));

$result->item(0)->deleteData(0, $result->item(0)->length);

echo $dom->saveXML();

In this example, we start by retrieving all of the text elements from our document; then we remove the first one by accessing its parent and passing the former to DomNode::removeChild(). Next, we remove the type attribute from the second element using DomElement::removeAttribute().

Finally, for the third element, we use XPath again to query for the corresponding text() node, passing in the third element as the context argument, and then delete the character data using DomCharacterData::deleteData(), passing in an offset of 0 and a count equal to the length of the text node.

Working With Namespaces

DOM is more than capable of handling namespaces on its own. For the most part, you can ignore them and pass attribute and element names with the appropriate prefix directly to most DOM functions:

Listing 8.24: Using Namespace prefixes in DOM

$dom = new DomDocument();

$node = $dom->createElement('ns1:somenode');

$node->setAttribute('ns2:someattribute', 'somevalue');

$node2 = $dom->createElement('ns3:anothernode');

$node->appendChild($node2);

// Set xmlns:* attributes

$node->setAttribute('xmlns:ns1', 'http://example.org/ns1');

$node->setAttribute('xmlns:ns2', 'http://example.org/ns2');

$node->setAttribute('xmlns:ns3', 'http://example.org/ns3');

$dom->appendChild($node);

echo $dom->saveXML();

We can try to simplify the use of namespaces somewhat by using the DomDocument::createElementNS() and DomElement::setAttributeNS() methods:

Listing 8.25: Namespaces in DOM

$dom = new DomDocument();

$node = $dom->createElementNS(

    'http://example.org/ns1', 'ns1:somenode'

);

$node->setAttributeNS(

    'http://example.org/ns2',

    'ns2:someattribute',

    'somevalue'

);

$node2 = $dom->createElementNS(

    'http://example.org/ns3', 'ns3:anothernode'

);

$node3 = $dom->createElementNS(

    'http://example.org/ns1', 'ns1:someothernode'

);

$node->appendChild($node2);

$node->appendChild($node3);

$dom->appendChild($node);

$dom->formatOutput = true;

echo $dom->saveXML();

This results in the following output:

<?xml version="1.0"?>

<ns1:somenode  xmlns:ns1="http://example.org/ns1"

               xmlns:ns2="http://example.org/ns2"

               xmlns:ns3="http://example.org/ns3"

               ns2:someattribute="somevalue">

    <ns3:anothernode xmlns:ns3="http://example.org/ns3"/>

    <ns1:someothernode/>

</ns1:somenode>
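Nodes created with the namespace-aware methods also expose their namespace details as properties, and can be located by namespace URI rather than by prefix. The following minimal sketch reuses the example URI from the listing above; it is an illustration, not part of the original listing.

```php
<?php
$dom = new DOMDocument();

$node = $dom->createElementNS(
    'http://example.org/ns1', 'ns1:somenode'
);
$dom->appendChild($node);

// Each namespaced node knows its prefix, local name, and namespace URI
echo $node->prefix, "\n";       // ns1
echo $node->localName, "\n";    // somenode
echo $node->namespaceURI, "\n"; // http://example.org/ns1

// Look the element up by namespace URI and local name
$found = $dom->getElementsByTagNameNS(
    'http://example.org/ns1', 'somenode'
);

echo $found->length, "\n"; // 1
```

Matching on the URI is more robust than matching on the prefix, since different documents may bind different prefixes to the same namespace.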

Interfacing with SimpleXML

As we mentioned earlier in the chapter, you can easily exchange loaded documents between SimpleXML and DOM, in order to take advantage of each system’s strengths where appropriate.

You can also import SimpleXML objects for use with DOM by using dom_import_simplexml():

Listing 8.26: Interfacing with SimpleXML from DOM

$sxml = simplexml_load_file('library.xml');

$node = dom_import_simplexml($sxml);

$dom = new DomDocument();

// importNode() returns a copy owned by $dom; append that copy, not the original

$node = $dom->importNode($node, true);

$dom->appendChild($node);

The opposite is also possible, by using the aptly named simplexml_import_dom() function:

Listing 8.27: Interfacing with DOM from SimpleXML

$dom = new DOMDocument();

$dom->load('library.xml');

$sxe = simplexml_import_dom($dom);

echo $sxe->book[0]->title;
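Because dom_import_simplexml() returns a DOM node that refers to the same underlying document, a change made through DOM is visible through SimpleXML as well. The following sketch uses this to remove an attribute with DOM; the one-book document is an assumption made for illustration.

```php
<?php
// Hypothetical one-book document used only for this sketch
$library = new SimpleXMLElement(
    '<library>'
    . '<book isbn="0345342968"><title>Fahrenheit 451</title></book>'
    . '</library>'
);

// Obtain the DOMElement behind the first book
$bookNode = dom_import_simplexml($library->book[0]);

// Remove the attribute using DOM
$bookNode->removeAttribute('isbn');

// The SimpleXML object reflects the change
echo $library->asXML();
```

No explicit conversion back is needed here, since both objects operate on the same tree.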