Zend PHP 5 Certification Study Guide (2014)

Files, Streams, and Network Programming

An often-forgotten feature of PHP is the streams layer. First introduced in PHP 4.3, the streams layer is most often used without the developer even knowing that it exists: whenever you access a file using fopen(), file(), readfile(), include, require, and a multitude of other functions, PHP uses the functionality provided by the streams layer to do the actual “dirty work.”

The streams layer is an abstraction layer for file access. The term “stream” refers to the fact that a number of different resources (files, but also network connections, compression protocols, and so on) can be considered “streams” of data to be read and/or written either in sequence or at random.

There are some security considerations connected with the use of file-access operations and the streams layer. They are discussed in the Security chapter.

PHP includes a number of default streams:

| Stream | Description |
| --- | --- |
| file:// | standard file access |
| http:// | access to remote resources via HTTP |
| ftp:// | access to remote resources via FTP |
| php:// | access various I/O such as STDIN/STDOUT, or raw post data |
| compress.zlib:// and compress.bzip2:// | access to compressed files (gzip/bzip2) using the zlib and bzip2 compression libraries |
| zip:// | access to compressed zip files (requires the zip extension) |
| data:// | RFC 2397 access to data in strings (added in PHP 5.2) |
| glob:// | find pathnames by matching pattern (e.g., glob()) |
| phar:// | PHP Archives (also known as PHAR) stream wrapper (added in PHP 5.3) |

If no protocol is specified, file:// is implied.
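As a rough illustration of the wrappers listed above, the following sketch reads from three of them with the same handful of functions. The sample strings and the `zce` temporary-file prefix are made up for the example; it also assumes a POSIX-style path and that allow_url_fopen is enabled, since data:// is treated as a URL wrapper.

```php
<?php
// data:// exposes an inline (here base64-encoded) string as a readable stream.
$inline = file_get_contents(
    'data://text/plain;base64,' . base64_encode('Hello, streams')
);

// php://memory is a scratch read/write stream held entirely in memory.
$mem = fopen('php://memory', 'w+');
fwrite($mem, "first line\nsecond line\n");
rewind($mem);
$first = fgets($mem);
fclose($mem);

// A bare path and an explicit file:// URL refer to the same local file.
$path = tempnam(sys_get_temp_dir(), 'zce');
file_put_contents($path, 'same file');
$viaWrapper = file_get_contents('file://' . $path);
unlink($path);

echo $inline, "\n", $first, $viaWrapper, "\n";
```

The point of the sketch is only that the calling code does not change with the wrapper; which wrappers are actually available depends on how PHP was built and configured.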

In addition to these, PHP supports stream filters that can be applied to the stream during I/O operations to transform the data on the fly:

| Filters | Description |
| --- | --- |
| string.rot13 | encodes the data stream using the ROT-13 algorithm |
| string.toupper | converts strings to uppercase |
| string.tolower | converts strings to lowercase |
| string.strip_tags | removes HTML and PHP tags from a stream, as strip_tags() does |
| convert.* | a family of filters that converts to and from the base64 and quoted-printable encodings |
| mcrypt.* | a family of filters that encrypts and decrypts data according to multiple algorithms |
| zlib.* | a family of filters that compresses and decompresses data using the zlib compression library |
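To make the filter list a little more concrete, here is a minimal sketch of two ways a built-in filter can be attached: on an already-open handle, and by naming a filter chain in a php://filter URL. The temporary file and its contents are invented for the example.

```php
<?php
$path = tempnam(sys_get_temp_dir(), 'zce');
file_put_contents($path, 'Hello');

// Attach a filter to an open stream; it transforms the data as we read it.
$fh = fopen($path, 'r');
stream_filter_append($fh, 'string.rot13', STREAM_FILTER_READ);
$rot13 = fread($fh, 1024);
fclose($fh);

// Alternatively, name a chain of filters (separated by |) in a php://filter URL.
$encoded = file_get_contents(
    'php://filter/read=string.toupper|convert.base64-encode/resource=' . $path
);
unlink($path);

echo $rot13, "\n", $encoded, "\n";
```

The second form is handy with functions such as file_get_contents() that never hand you a stream resource to attach a filter to.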

While this functionality in itself is very powerful, the real killer feature of streams lies in the ability to implement stream wrappers and filters in your PHP scripts. That is, you can create your own URL scheme that accesses data by any means you desire, or a filter that can be applied to any existing stream access. However, these “userland” streams and filters could fill a large book all by themselves, so in this chapter we will concentrate on general file manipulation and the elements of stream wrappers that will typically appear in the exam.

Accessing Files

PHP provides several different ways to create, read from, and write to files, depending on the type of operation you need to perform. First up, we have the more traditional, C-style functions. Just like their C counterparts, these functions open/create, read, write, and close a file handle. A file handle is a reference to an external resource. This means you are not loading the entire file into memory when manipulating it, but simply dealing with a reference to it. Thus, this family of functions is very resource friendly and—while considered somewhat antiquated and arcane in comparison to some of the more recent additions to PHP—is still best-practice material when it comes to dealing with large files:

Listing 6.1: Reading files with file handles

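$file = fopen("counter.txt", 'a+');
if ($file === false) {
    die("Unable to open/create file");
}
if (filesize("counter.txt") == 0) {
    $counter = 0;
} else {
    // 'a+' places the file pointer at the end, so rewind before reading
    rewind($file);
    $counter = (int) fgets($file);
}
// Writes in 'a+' mode always append; after truncation that is offset 0
ftruncate($file, 0);
$counter++;
fwrite($file, (string) $counter);
fclose($file);
echo "There have been $counter hits to this site.";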

In this example, we start by opening the file using fopen(); we will use the resulting resource when calling every other function that will work with our file. Note that fopen() returns false upon failure—and we must check for it explicitly to ensure that PHP doesn’t play any automatic-conversion tricks on us.

Next up, we use filesize() to make sure that the file is not empty and our counter has been started. If it is empty, we set the counter to 0; otherwise, we grab the first line using fgets(), which will continue to fetch data until it reaches a newline character.

Finally, we truncate the file using ftruncate(), increment the counter, and write the new counter value to the file using fwrite().

One thing to take notice of is the second argument to fopen(). This determines two things: first, whether we are reading, writing, or doing both things to the file at the same time, and second, whether the file pointer—the position at which the next byte will be read or written—is set at the beginning or at the end of the file. This flag can take on one of these values:

| Mode | Result |
| --- | --- |
| r | Opens the file for reading only; places the file pointer at the beginning of the file |
| r+ | Opens the file for reading and writing; places the file pointer at the beginning of the file |
| w | Opens the file for writing only; places the file pointer at the beginning of the file and truncates it to zero length |
| w+ | Opens the file for writing and reading; places the file pointer at the beginning of the file and truncates it to zero length |
| a | Opens the file for writing only; places the file pointer at the end of the file |
| a+ | Opens the file for reading and writing; places the file pointer at the end of the file |
| x | Creates a new file for writing only; fails if the file already exists |
| x+ | Creates a new file for reading and writing; fails if the file already exists |

Each of these modes can be coupled with a modifier that indicates how the data is to be read and written. The b flag (e.g., w+b) forces “binary” mode, which makes sure that all data is written to the file unaltered. There is also a Windows-only flag, t, which transparently translates UNIX newlines (\n) to Windows newlines (\r\n). In addition, the w, w+, a, and a+ modes automatically create the file if it doesn’t yet exist; in contrast, x and x+ only ever create a new file: if the file already exists, fopen() returns false and emits an E_WARNING.
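The differences between the modes are easiest to see side by side. The sketch below, which uses a made-up temporary file name, creates a file with x, shows that a second x fails, appends with a, and then truncates with w.

```php
<?php
$path = sys_get_temp_dir() . '/zce_modes_' . uniqid() . '.txt';

// 'x' creates the file; it only succeeds because the file does not exist yet.
$fh = fopen($path, 'x');
fwrite($fh, "one\n");
fclose($fh);

// A second 'x' fails: fopen() returns false (the E_WARNING is silenced here).
$second = @fopen($path, 'x');

// 'a' leaves the existing contents alone and writes at the end.
$fh = fopen($path, 'a');
fwrite($fh, "two\n");
fclose($fh);
$afterAppend = file_get_contents($path);

// 'w' truncates the file to zero length as soon as it is opened.
$fh = fopen($path, 'w');
fclose($fh);
clearstatcache();
$sizeAfterW = filesize($path);

unlink($path);
var_dump($second, $afterAppend, $sizeAfterW);
```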

Common C-like File Functions

As we mentioned above, PHP provides a complete set of functions that are compatible with C’s file-access library; in fact, there are a number of functions that, although written using a “C-style” approach, provide non-standard functionality.

The feof() function is used to determine when the internal pointer reaches the end of the file:

Listing 6.2: Detecting end-of-file

if (!file_exists("counter.txt")) {
    throw new Exception("The file does not exist");
}
$file = fopen("counter.txt", "r");
$txt = '';
// read one byte at a time until the pointer reaches the end of the file
while (!feof($file)) {
    $txt .= fread($file, 1);
}
fclose($file);
echo "There have been $txt hits to this site.";

The fread() function is used to read arbitrary data from a file. Unlike fgets(), it does not concern itself with newline characters: it only stops reading data when either the number of bytes specified in its argument has been transferred, or the pointer reaches the end of the file.

Note the use of the file_exists() function, which returns a Boolean value that indicates whether a given file is visible to the user under which the PHP interpreter runs.

The file pointer itself can be moved without reading or writing data by using the fseek() function, which takes three parameters: the file handle, the number of bytes by which the pointer is to be moved, and the position from which the move must take place. This last parameter can contain one of three values: SEEK_SET (start from the beginning of the file), SEEK_CUR (start from the current position), and SEEK_END (start from the end of the file):

$file = fopen('counter.txt', 'r+');

fseek($file, 10, SEEK_SET);

You should keep in mind that the value of the second parameter is added to the position you specify as a starting point. Therefore, when your starting position is SEEK_END, this number should always be zero or less, while when you use SEEK_SET, it should always be zero or more. When you specify SEEK_CUR as a starting point, the value can be either positive (move forward) or negative (move backwards); in this case, a value of zero, while perfectly legal, makes no sense.

To find the current position of the pointer, you should use ftell().
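A short sketch of the three starting points follows. It uses tmpfile() and a ten-byte string purely for illustration, so that the offsets are easy to follow.

```php
<?php
$fh = tmpfile();
fwrite($fh, 'abcdefghij');          // ten bytes; the pointer is now at offset 10

fseek($fh, 3, SEEK_SET);            // absolute: offset 3 from the start
$fromStart = fread($fh, 2);         // reads "de" and leaves the pointer at 5

fseek($fh, 2, SEEK_CUR);            // relative: 5 + 2
$position = ftell($fh);             // 7

fseek($fh, -1, SEEK_END);           // one byte before the end of the file
$lastByte = fread($fh, 1);          // "j"

fclose($fh);
var_dump($fromStart, $position, $lastByte);
```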

The last two functions that we are going to examine here are fgetcsv() and fputcsv(), which vastly simplify the task of accessing CSV files. As you can imagine, the former reads a row from a previously-opened CSV file into a numerically indexed array, while the latter writes the elements of an array in CSV format to an open file handle.

Both of these functions require a file handle as their first argument, and both accept an optional delimiter and enclosure character:

Listing 6.3: Reading CSV files

// open for reading and appending
$f = fopen('file.csv', 'a+');
// 'a+' places the file pointer at the end, so rewind before reading
rewind($f);
while (($row = fgetcsv($f)) !== false) {
    // handle values
}
$values = array(
    "Davey Shafik",
    "http://zceguide.com",
    "Win Prizes!"
);
// append line to csv file
fputcsv($f, $values);
fclose($f);

If you don’t specify a delimiter and an enclosure character, both fgetcsv() and fputcsv() use a comma and a double quotation mark (") respectively.
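As a sketch of non-default settings, the following round-trips one row through a semicolon-delimited file. The row contents are invented, and passing the escape character explicitly to fputcsv() assumes PHP 5.5.4 or later.

```php
<?php
$fh = tmpfile();
$row = array('Davey Shafik', 'a;b', 'say "hi"');

// Write with ; as the delimiter; fields containing ; or " get enclosed.
fputcsv($fh, $row, ';', '"', '\\');

// Read it back with the same delimiter, enclosure, and escape character.
rewind($fh);
$rows = array();
while (($fields = fgetcsv($fh, 0, ';', '"', '\\')) !== false) {
    $rows[] = $fields;
}
fclose($fh);

print_r($rows);
```

Because the second field contains the delimiter and the third contains the enclosure character, both are quoted on disk, yet they come back unchanged.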

Simple File Functions

In addition to the “traditional” C-like file-access functions, PHP provides a set of simplified functions that allow you to perform multiple file-related operations with a single function call.

As an example, readfile() will read a file and write it immediately to the script’s standard output. This is useful when you need to include static files, as it offers much better performance and resource utilization than C-style functions:

header("content-type: video/mpeg");

readfile("my_home_movie.mpeg");

Similarly, file() will let you read a file into an array of lines (that is, one array element for each line of text in the file). Prior to PHP 4.3.0, it was common to use this function together with implode() as a quick-and-dirty way to load an entire file into memory. More recent versions of PHP provide the file_get_contents() function specifically for this purpose:

// Old Way (file() keeps each line's newline, so join with an empty string)
$file = implode("", file("myfile.txt"));
// New Way
$file = file_get_contents("myfile.txt");

Loading an entire file in memory is not always a good idea—large files require a significant amount of system resources (primarily memory) and will very rapidly starve your server under load. You can, however, limit the amount of data read by file_get_contents() by specifying an appropriate set of parameters to the function.
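The parameters in question are the offset and maximum length, the fourth and fifth arguments of file_get_contents(). A minimal sketch, with an invented temporary file and contents:

```php
<?php
$path = tempnam(sys_get_temp_dir(), 'zce');
file_put_contents($path, 'Hello World, and a great deal more data');

// Skip the first 6 bytes, then read at most 5 bytes.
// The second and third arguments (use_include_path, context) keep their defaults.
$slice = file_get_contents($path, false, null, 6, 5);

unlink($path);
echo $slice, "\n";
```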

As of PHP 5.0.0, file_put_contents() was added to the language core to simplify the writing of data to files. Like file_get_contents(), file_put_contents() allows you to write the contents of a PHP string to a file in one pass:

$data = "My Data";

file_put_contents("myfile.txt", $data, FILE_APPEND);

$data = array("More Data", "And More", "Even More");

file_put_contents("myfile.txt", $data, FILE_APPEND);

As you can see, this function allows you to specify a number of flags to alter its behaviour:

| Flag | Result |
| --- | --- |
| FILE_USE_INCLUDE_PATH | causes the function to use the include_path to find the file |
| FILE_APPEND | appends the data to the file, rather than overwriting |
| LOCK_EX | acquires an exclusive lock before accessing the file (since PHP 5.1) |

In the example above, we pass an array to file_put_contents() instead of a string. The function will automatically apply the equivalent of implode("", $data) on the $data array and write the resulting string to the file. In addition, it is possible to pass file_put_contents() a stream resource instead of a string or an array; in this case, the unread remainder of the stream will be placed in the file.
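The following sketch combines two flags and also shows the stream-resource case described above. The file name and strings are illustrative only.

```php
<?php
$path = tempnam(sys_get_temp_dir(), 'zce');

// Without flags the file is overwritten.
file_put_contents($path, "first\n");

// Flags are combined with |; array elements are joined with no separator.
file_put_contents($path, array("second\n", "third\n"), FILE_APPEND | LOCK_EX);
$contents = file_get_contents($path);

// With a stream resource, only the unread remainder is written.
$source = fopen('php://memory', 'w+');
fwrite($source, 'skipped:kept');
fseek($source, 8);                  // move past the 8 bytes of "skipped:"
file_put_contents($path, $source);
fclose($source);
$fromStream = file_get_contents($path);

unlink($path);
var_dump($contents, $fromStream);
```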

Working with Directories

PHP offers a very powerful set of directory manipulation functions. The simplest one is chdir(), which, like the UNIX cd command, changes the current working directory of the interpreter:

$success = chdir('/usr/bin');

This function can fail for a number of reasons; for example, because the name you specify points to a directory that doesn’t exist, or because the account under which PHP runs does not have the requisite privileges for accessing it. In these cases, the function returns false and emits a warning.

Incidentally, you can find out what the current working directory is by calling getcwd():

echo "The current working directory is " . getcwd();

It is interesting to note that, on some UNIX systems, this function can fail and return false if any of the parent directories of the current directory do not have the proper permissions set.

Directory creation is just as simple, thanks to the mkdir() function:

// directories need the execute (search) bit, hence 0755 rather than 0666
if (!mkdir('newdir/mydir', 0755, true)) {
    throw new Exception("Unable to create directory");
}

This function accepts three parameters. The first is the path to the directory you want to create. Note that, normally, only the last directory in the path will be created, and mkdir() will fail if any other component of the path does not correspond to an existing directory. The third parameter to the function, however, allows you to override this behavior and actually create any missing directories in the path. The second parameter allows you to specify the access mode for the directory, an integer that most people prefer to specify in UNIX-style octal notation. Note that this parameter is ignored under Windows, where access control mechanisms are different.
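The effect of the third parameter can be sketched as follows. The directory names under the system temporary directory are made up for the example.

```php
<?php
$base   = sys_get_temp_dir() . '/zce_' . uniqid();
$nested = $base . '/newdir/mydir';

// Without the recursive flag, mkdir() fails because the parents are missing
// (the warning is silenced here so only the return value matters).
$plain = @mkdir($nested, 0755);

// With the recursive flag, every missing component of the path is created.
$recursive = mkdir($nested, 0755, true);
$isDir = is_dir($nested);

// Clean up from the innermost directory outwards; rmdir() needs empty directories.
rmdir($nested);
rmdir($base . '/newdir');
rmdir($base);

var_dump($plain, $recursive, $isDir);
```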

Controlling File Access

Access to a file is determined by a variety of factors, such as the type of operation we want to perform, and the filesystem’s permissions. For example, we can’t create a directory that has the same name as an existing file, any more than we can use fopen() on a directory.

Therefore, a whole class of functions exists for the sole purpose of helping you determine the type of a filesystem resource:

| Function | Description |
| --- | --- |
| is_dir() | checks if the path is a directory |
| is_executable() | checks if the path is executable |
| is_file() | checks if the path exists and is a regular file |
| is_link() | checks if the path exists and is a symlink |
| is_readable() | checks if the path exists and is readable |
| is_writable() | checks if the path exists and is writable |
| is_uploaded_file() | checks if the path is an uploaded file (sent via HTTP POST) |

Each of these functions returns a Boolean value. Note that PHP caches the results of these calls, so two calls to a given function on the same path during the same script can return the same value, regardless of whether the underlying file has changed in the meantime. Given the relatively short lifespan of a script, this is not generally a problem, but it is something to keep in mind when dealing with long-running scripts, or with scripts whose purpose is precisely that of waiting for a resource to change. For example, consider the following script:

$f = '/test/file.txt';
while (!is_readable($f)) {}
$data = file_get_contents($f);

Besides the obviously unhealthy practice of performing an operation inside an infinite loop, this code has the added handicap that, if /test/file.txt exists but is not readable when the script first enters the while() loop, the script may never stop running, even if the file later becomes readable, since the result is cached when is_readable() is first executed.

The internal cache maintained within PHP for these functions can be cleared by calling clearstatcache().
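One way to write such a waiting loop more safely is sketched below: it clears the cache before every check, sleeps between checks, and gives up after a timeout. The function name waitUntilReadable() and its parameters are hypothetical, not part of PHP.

```php
<?php
/**
 * Poll until $path is readable or $timeoutSeconds have elapsed.
 * Hypothetical helper, written only to illustrate clearstatcache().
 */
function waitUntilReadable($path, $timeoutSeconds, $pollMicroseconds = 100000)
{
    $deadline = microtime(true) + $timeoutSeconds;
    do {
        // Drop any cached result so is_readable() checks the filesystem again.
        clearstatcache();
        if (is_readable($path)) {
            return true;
        }
        usleep($pollMicroseconds);
    } while (microtime(true) < $deadline);

    return false;
}

$existing = tempnam(sys_get_temp_dir(), 'zce');
$found    = waitUntilReadable($existing, 1);
$missing  = waitUntilReadable($existing . '.does-not-exist', 0.2);
unlink($existing);

var_dump($found, $missing);
```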

File permissions on UNIX systems can be changed using a number of functions, including chmod(), chgrp() and chown(). For example:

chmod ('/test/file.txt', 0666);

Note how chmod() in particular takes a numeric value for the file’s permissions; textual permission specifiers like gu+w are not allowed. As you can see above, the octal notation makes it easier to use the same values that you would use when calling the chmod UNIX shell utility.
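The leading zero matters, because it is what makes the literal octal. The sketch below assumes a UNIX-like system (permission bits behave differently on Windows) and uses a throwaway temporary file.

```php
<?php
$path = tempnam(sys_get_temp_dir(), 'zce');

// 0644 is an octal literal: rw- for the owner, r-- for group and others.
chmod($path, 0644);
clearstatcache();

// fileperms() also reports the file type; mask it down to the permission bits.
$mode  = fileperms($path) & 0777;
$octal = sprintf('%04o', $mode);

// A common mistake: the decimal number 644 is a different bit pattern.
$decimalAsOctal = decoct(644);

unlink($path);
echo $octal, ' vs ', $decimalAsOctal, "\n";
```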

Accessing Network Resources

As we mentioned earlier, one of the strongest points of the streams layer is the fact that the same set of functionality that you use to access files can be used to access a number of network resources, often without the need for any special adjustments. This has two advantages: it greatly simplifies tasks like opening a remote web page or connecting to an FTP server, and it eliminates the need to learn another set of functions.

Simple Network Access

The easiest way to access a network resource is to treat it in exactly the same way as you would a file. For example, suppose you wanted to load up the main page of php[architect]:

Listing 6.4: Accessing network resources as files

// fopen() always needs a mode; HTTP resources are read-only
$f = fopen('http://www.phparch.com', 'r');
$page = '';
if ($f) {
    while ($s = fread($f, 1000)) {
        $page .= $s;
    }
    fclose($f);
} else {
    throw new Exception(
        "Unable to open connection to www.phparch.com"
    );
}

Clearly, not all file functions may work with a given network resource; for example, you cannot write to an HTTP connection, because doing so is not allowed by the protocol, and would not make sense.

One aspect of streams that is not always immediately obvious is the fact that they affect pretty much all of PHP’s file access functionality, including require() and include(). For example, the following is perfectly valid (depending on your configuration):

include 'http://phparch.com';

This capability is, of course, something that you should both love and fear: on one hand, it allows you to include remote files from a different server; on the other, it represents a potential security hole of monumental proportions if the wrong people get their hands on your code or, worse, if you are using a variable to indicate the remote file to include.

It is possible to disable URL-aware stream wrappers such as http:// and ftp:// (local access through file:// keeps working) by setting the allow_url_fopen INI setting to 0 (or off), or you can just disable their use in include and require (and their *_once variants) by setting the allow_url_include INI setting to 0 (or off).

Stream Contexts

Stream contexts allow you to pass options to the stream handlers that you transparently use to access network resources, thus allowing you to tweak a handler’s behavior in ways that go beyond what normal file functionality can do. For example, you can instruct the HTTP stream handler to perform a POST operation, which is very handy when you want to work with web services.

Stream contexts are created using stream_context_create():

Listing 6.5: Creating a stream context

$http_options = stream_context_create([

    'http' => [

        'user_agent' => "Davey Shafik's Browser",

        'max_redirects' => 3

    ]

]);

$file = file_get_contents(

    "http://localhost/", false, $http_options

);

In this example, we set context options for the http stream, providing our own custom user agent string (which is always the polite thing to do to help people identify the activity you perform on their server), and set the maximum number of transparent redirections to three. Finally, as you can see, we pass the newly-created context as a parameter to file_get_contents().
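The POST case mentioned earlier could be sketched along the following lines. The form fields and the endpoint URL are placeholders, so the actual request is left commented out and the sketch only inspects the context it built.

```php
<?php
// Hypothetical form fields, encoded the way an HTML form would send them.
$body = http_build_query(array('name' => 'Davey Shafik', 'topic' => 'streams'));

$context = stream_context_create(array(
    'http' => array(
        'method'  => 'POST',
        'header'  => "Content-Type: application/x-www-form-urlencoded\r\n",
        'content' => $body,
        'timeout' => 5,
    ),
));

// The options can be read back from the context resource.
$options = stream_context_get_options($context);
echo $options['http']['method'], ' with body: ', $options['http']['content'], "\n";

// The request itself would look like this (placeholder URL, not executed here):
// $response = file_get_contents('http://localhost/endpoint.php', false, $context);
```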

If you wish to set a default context for a stream, you can use stream_context_set_default(), which takes the same input as stream_context_create().

Listing 6.6: Setting a default stream context

stream_context_set_default([

    'http' => [

        'user_agent' => "Davey Shafik's Browser",

        'max_redirects' => 3

    ]

]);

$file = file_get_contents("http://localhost/");

Advanced Stream Functionality

While the built-in stream handlers cover the most common network and file operations, there are some instances—such as when dealing with custom protocols—when you need to take matters into your own hands. Luckily, the stream layer makes even this much easier to handle than, say, if you were using C. In fact, you can create socket servers and clients using the stream functions stream_socket_server() and stream_socket_client(), and then use the traditional file functions to exchange information:

$socket = stream_socket_server("tcp://0.0.0.0:1037");

while ($conn = stream_socket_accept($socket)) {

    fwrite($conn, "Hello World\n");

    fclose($conn);

}

fclose($socket);

You can then connect to this simple “Hello World” server using stream_socket_client().

$socket = stream_socket_client('tcp://127.0.0.1:1037');

while (!feof($socket)) {

    echo fread($socket, 100);

}

fclose($socket);

Finally, we can run our server just like any other PHP script:

$ php ./server.php &

and our client:

$ php ./client.php

Hello World
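For experimenting without two terminals, the same exchange can be sketched inside a single script. This is an assumption-laden variation rather than the listing above: it binds to 127.0.0.1 on port 0 so that the operating system picks a free port, and it relies on the pending connection being queued until stream_socket_accept() is called.

```php
<?php
// Port 0 asks the operating system for any free port.
$server = stream_socket_server('tcp://127.0.0.1:0', $errno, $errstr);
if ($server === false) {
    die("Unable to start server: $errstr ($errno)");
}

// Ask the server socket which address and port it was actually given.
$address = stream_socket_get_name($server, false);

// Connect; the connection waits in the backlog until it is accepted.
$client = stream_socket_client('tcp://' . $address, $errno, $errstr, 5);
$conn   = stream_socket_accept($server, 5);

// The server side writes with the ordinary file functions and hangs up.
fwrite($conn, "Hello World\n");
fclose($conn);

// The client reads until the server closes the connection.
$received = stream_get_contents($client);

fclose($client);
fclose($server);
echo $received;
```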

Stream Filters

Stream filters allow you to pass data in and out of a stream through a series of filters that can alter it dynamically, for example, changing it to uppercase, passing it through a ROT-13 encoder, or compressing it using bzip2. Filters on a given stream are organized in a chain. Thus, you can set them up so that the data passes through multiple filters, sequentially.

You can add a filter to a stream by using stream_filter_prepend() and stream_filter_append()—which, as you might guess, add a filter to the beginning and end of the filter chain respectively:

Listing 6.7: Filtering stream input

$socket = stream_socket_server("tcp://0.0.0.0:1037");

while ($conn = stream_socket_accept($socket)) {

    stream_filter_append($conn, 'string.toupper');

    stream_filter_append($conn, 'zlib.deflate');

    fwrite($conn, "Hello World\n");

    fclose($conn);

}

fclose($socket);

In this example, we apply the string.toupper filter to our server stream, which will convert the data to upper case, followed by the zlib.deflate filter to compress it whenever we write data to it.

We can then apply the zlib.inflate filter to the client, and complete the implementation of a compressed data stream between server and client:

$socket = stream_socket_client('tcp://127.0.0.1:1037');

stream_filter_append($socket, 'zlib.inflate');

while (!feof($socket)) {

    echo fread($socket, 100);

}

fclose($socket);

If you consider how complex the implementation of a similar compression mechanism would have normally been, it’s clear that stream filters are a very powerful feature.
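The same filter chain can be tried without a network connection. The sketch below writes through string.toupper and zlib.deflate into a temporary file and reads it back through zlib.inflate; it assumes the zlib extension is available, and the file name is arbitrary.

```php
<?php
$path = tempnam(sys_get_temp_dir(), 'zce');

// Filters run in the order they were appended: uppercase first, then compress.
$out = fopen($path, 'w');
stream_filter_append($out, 'string.toupper', STREAM_FILTER_WRITE);
stream_filter_append($out, 'zlib.deflate', STREAM_FILTER_WRITE);
fwrite($out, "Hello World\n");
fclose($out);                       // closing flushes the compressor

// What is stored on disk is the compressed form.
$raw = file_get_contents($path);

// Reading through zlib.inflate restores the (uppercased) text.
$restored = file_get_contents('php://filter/read=zlib.inflate/resource=' . $path);

unlink($path);
echo $restored;
```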

PHP Archives (PHAR)

PHP 5.3 added the phar extension, which allows you to create and distribute entire PHP applications as single file archives, known as phars, or PHP Archives.

PHP Archives can be in tar, zip, or phar format, each of which has its own merits. Tar and zip formats are standard formats that can be read by any standard tool, whereas the phar format requires ext/phar or the PHP_Archive PEAR package.

However, the phar format does not require the extension to run (though it is highly recommended), which makes distributing them much easier. The phar extension is enabled by default in PHP 5.3+.

If you do not have the ability to install extensions, and phar is not available, you can use the PHP_Archive PEAR package as a replacement if using the phar format.

A PHP Archive (regardless of format) contains 3 parts:

1.    A stub

2.    A manifest describing its contents

3.    The file contents

In addition, phar format archives may also contain a signature file for verifying the integrity of the archive.

Since PHP 5.3, it has been possible to add streams to your include_path, meaning you can put a PHP Archive in your include_path and PHP will be able to pull files out of it automatically.

Building a PHP Archive

We recommend using the phar format, as it is optimized specifically for this purpose, and provides you with the best feature set, although you are unable to extract it using standard tools (though of course, you can easily write a simple PHP script that will do this).

There are two ways to create PHP Archives: the PEAR package PHP_Archive and the phar extension itself. It is recommended that you use the phar extension.

Assuming we have an application that looks like this:

├── app

│   ├── cli

│   │   └── app.php

│   └── public

│       ├── css

│       ├── img

│       ├── index.php

│       └── js

└── build

Our application code lives in app, which contains a CLI frontend and a web-facing frontend (cli and public, respectively). We then have a build directory, where our resulting PHP Archive will end up, and finally a build.php script (not shown in the tree) that creates the archive.

Our build file might look like this:

Listing 6.8: Phar build file

// Writing archives requires phar.readonly=0 in php.ini
$phar = new Phar("build/app.phar", 0, "app.phar");
$phar->buildFromDirectory("./app");
// createDefaultStub() is a static method
$phar->setStub(
    Phar::createDefaultStub(
        "cli/app.php", "public/index.php"
    )
);

Running this creates app.phar in the build directory from the contents of our app directory. We then set the default stub, which supports using different index files for CLI (cli/app.php) and web (public/index.php). Note that we use paths relative to the application root, not the current working directory.

Using the default stub also makes it easier to run traditional apps from within the archive; if the phar extension is not installed, it will unpack the archive to a temporary directory and then run the code.

If you create the archive using PHP_Archive instead, it will bundle itself inside the stub, making it entirely self-contained—however, using the extension is recommended.

Now we have a single distributable file that contains all of our PHP, CSS, JavaScript, and images. We can either run this from the command line using php app.phar, or serve it via our web server.

Using a Custom Stub

While the phar extension has the ability to generate a simple default stub—which is often all we need—we can use custom stubs to do whatever we want.

The stub is simply PHP code that is run when the phar is included directly (include 'file.phar';), or run directly with PHP (e.g. php file.phar). It does not run when using the stream to access an individual file (e.g., include 'phar://app.phar/file.php';).

It is simply a bootstrap that, at its simplest, maps the phar (reads the archive and registers its manifest) and runs an index file.

Phar::mapPhar();

include 'phar://' . __FILE__ . '/public/index.php';

__HALT_COMPILER();

The stub must end with the __HALT_COMPILER() token.

To use the custom stub, just pass the code to Phar::setStub() instead of passing it the output of Phar::createDefaultStub().
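Put together, setting a custom stub might look like the sketch below. It is guarded so that it does nothing when archives cannot be written (phar.readonly is on by default); the archive name, alias, and file contents are invented for the example. Note that the sketch reads a file through the phar:// stream, so the stub itself is stored but not executed.

```php
<?php
$result = null;

// Phar::canWrite() is false when phar.readonly is enabled in php.ini.
if (class_exists('Phar') && Phar::canWrite()) {
    $file = sys_get_temp_dir() . '/zce_demo_' . uniqid() . '.phar';

    $phar = new Phar($file, 0, 'demo.phar');
    $phar->addFromString(
        'public/index.php',
        '<?php return "hello from inside the phar";'
    );

    // A custom stub: map the archive, run the index file, halt the compiler.
    $phar->setStub(
        '<?php Phar::mapPhar(); '
        . 'include "phar://" . __FILE__ . "/public/index.php"; '
        . '__HALT_COMPILER();'
    );
    unset($phar);                   // release the object so the archive is written out

    // Accessing a single file through the stream does not run the stub.
    $result = include 'phar://' . $file . '/public/index.php';
    @unlink($file);
}

var_dump($result);
```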

Summary

As you can see, streams penetrate to the deepest levels of PHP, from general file access to TCP and UDP sockets. It is even possible to create your own stream protocols and filters, making this the ultimate interface for sending and receiving data with any data source and encoding, from case-changes to stripping tags, to more complex compression and encryption.