PHP Web Services (2013)

Chapter 3. Headers

So far, we’ve seen various presentations of the HTTP format, and examined the idea that there is a lot more information being transferred in web requests and responses than what appears in the body of the response. The body is certainly the most important bit, and often is the meatiest, but the headers provide key pieces of information for both requests and responses, which allow the client and the server to communicate effectively. If you think of the body of the request as a birthday card with a check inside it, then the headers are the address, postmark, and perhaps the “do not open until…” instruction on the outside (see Figure 3-1).

This additional information gets the body data to where it needs to go and instructs the target on what to do with it when it gets there.


Figure 3-1. Envelope with stamp, address, and postmark

Request and Response Headers

Many of the headers you see in HTTP make sense in both requests and responses. Others might be specific to either a request or a response. Here’s a sample set of real request and response headers from when I request my own site from a browser (I’m using Chrome).

Request headers:

GET / HTTP/1.1

Host: www.lornajane.net

Connection: keep-alive

Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8

User-Agent: Mozilla/5.0 (X11; Linux i686) AppleWebKit/537.19 (KHTML, like Gecko) Chrome/25.0.1323.1 Safari/537.19

Accept-Encoding: gzip,deflate,sdch

Accept-Language: en-GB,en-US;q=0.8,en;q=0.6

Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.3

Response headers:

HTTP/1.1 200 OK

Server: Apache/2.2.14 (Ubuntu)

X-Powered-By: PHP/5.3.2-1ubuntu4.11

X-Pingback: http://www.lornajane.net/xmlrpc.php

Last-Modified: Thu, 06 Dec 2012 14:46:05 GMT

Cache-Control: no-cache, must-revalidate, max-age=0

Content-Type: text/html; charset=UTF-8

Content-Length: 25279

Date: Thu, 06 Dec 2012 14:46:05 GMT

X-Varnish: 2051611642

Age: 0

Via: 1.1 varnish

Connection: keep-alive

Here, you see Content-Type sent as a header of the response, but it would also be used when POSTing data with a request. Such multiple-use headers are called entity headers and relate to the body being sent with the HTTP request or response. Headers specific to requests include User-Agent, Accept, Authorization, and Cookie, while Set-Cookie is returned with responses.

Common HTTP Headers

The previous examples showed off a selection of common headers, while the next sections move on to take a look at the headers most often encountered when working with APIs. The following examples show how to send and receive various types of headers from PHP so that you can handle headers correctly in your own applications.

User-Agent

The User-Agent header gives information about the client making the HTTP request and usually includes information about the software client. Take a look at the header here:

User-Agent: Mozilla/5.0 (Linux; U; Android 2.3.4; en-gb; SonyEricssonSK17i Build/4.0.2.A.0.62) AppleWebKit/533.1 (KHTML, like Gecko) Version/4.0 Mobile Safari/533.1

What device do you think made this request? You would probably guess that it was my Sony Ericsson Android phone…and perhaps you would be right. Or perhaps I used a Curl command:

curl -H "User-Agent: Mozilla/5.0 (Linux; U; Android 2.3.4; en-gb; SonyEricssonSK17i Build/4.0.2.A.0.62) AppleWebKit/533.1 (KHTML, like Gecko) Version/4.0 Mobile Safari/533.1" http://requestb.in/example

We simply have no way of knowing, when a request is received with a User-Agent like this, if it really came from an Android phone, or if it came from something else pretending to be an Android phone. This information can be used to customize the response we send—after all, if someone wants to pretend to be a tiny Android phone, then it is reasonable to respond with the content that would normally be sent to this phone. It does mean, however, that the User-Agent header cannot be relied upon for anything more important, such as setting a custom header and using it as a means of authenticating users. Just like any other incoming data, it is wide open to abuse and must be treated with suspicion.

In PHP, it is possible both to parse and to send the User-Agent header, as suits the task at hand. Here’s an example of sending the header using streams:

<?php

$url = 'http://localhost/book/user-agent.php';

$options = array(

    "http" => array(

        "header"  => "User-Agent: Advanced HTTP Magic Client"

    )

);

$page = file_get_contents($url, false , stream_context_create($options));

echo $page;
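The same stream-context approach extends to several headers at once. The following is a minimal sketch rather than a definitive implementation: the helper name buildRequestContext() and the extra Accept header are assumptions made for illustration, and the request itself is left commented out so that the snippet runs without a web server.

```php
<?php

// Hypothetical helper (not part of PHP or of the earlier example): builds a
// stream context that sends every header in $headers with the request.
function buildRequestContext(array $headers)
{
    $lines = array();
    foreach ($headers as $name => $value) {
        $lines[] = $name . ": " . $value;
    }

    $options = array(
        "http" => array(
            // multiple header lines are separated by CRLF
            "header" => implode("\r\n", $lines),
        ),
    );

    return stream_context_create($options);
}

$context = buildRequestContext(array(
    "User-Agent" => "Advanced HTTP Magic Client",
    "Accept"     => "application/json",
));

// To send the request, pass the context exactly as before:
// $page = file_get_contents('http://localhost/book/user-agent.php', false, $context);
```

Keeping the header list in an array makes it easy to add or remove headers without editing one long string.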

We can set any arbitrary headers we desire when making requests, all using the same approach. Similarly, incoming headers can all be read in the same way in PHP: the data of interest is in $_SERVER, and in this case it is possible to inspect $_SERVER["HTTP_USER_AGENT"] to see what the User-Agent header was set to.

To illustrate, here’s a simple script:

<?php

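// The header is untrusted client input, so escape it before output.

// (FILTER_SANITIZE_STRING is deprecated as of PHP 8.1.)

echo "This request made by: "

    . htmlspecialchars($_SERVER['HTTP_USER_AGENT'], ENT_QUOTES, 'UTF-8');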
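Other request headers appear in $_SERVER in the same way, with an HTTP_ prefix, upper-case letters, and underscores in place of hyphens. As a rough sketch, the hypothetical helper extractRequestHeaders() below (the name and the sample data are assumptions for illustration) turns such keys back into header names. It takes the array as an argument so it can be tried on the command line with sample data instead of the real $_SERVER. Note that Content-Type and Content-Length usually appear without the HTTP_ prefix, so this sketch would skip them.

```php
<?php

// Hypothetical helper: rebuilds header names from a $_SERVER-style array.
function extractRequestHeaders(array $server)
{
    $headers = array();
    foreach ($server as $key => $value) {
        if (strpos($key, "HTTP_") !== 0) {
            continue; // not a request header entry
        }
        // HTTP_USER_AGENT -> "user agent" -> "User Agent" -> "User-Agent"
        $name = str_replace("_", " ", strtolower(substr($key, 5)));
        $name = str_replace(" ", "-", ucwords($name));
        $headers[$name] = $value;
    }
    return $headers;
}

// Sample data standing in for $_SERVER
$headers = extractRequestHeaders(array(
    "HTTP_USER_AGENT" => "Advanced HTTP Magic Client",
    "HTTP_ACCEPT"     => "application/json",
    "REQUEST_METHOD"  => "GET",
));

print_r($headers);
```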

It’s common when developing content for the mobile web to use headers such as User-Agent in combination with WURFL to detect what capabilities the consuming device has, and adapt the content accordingly. With APIs, however, it is better to expect the clients to use other headers so they can take responsibility for requesting the correct content types, rather than allowing the decision to be made centrally.

Headers for Content Negotiation

Commonly, the Content-Type header is used to describe the format of the data being delivered in the body of a request or a response; this allows the target to understand how to decode this content. Its sister header, Accept, allows the client to indicate what kind of content is acceptable, which is another way of allowing the client to specify what kind of content it actually knows how to handle. As seen in the earlier example showing headers, here’s the Accept header Google Chrome usually sends:

Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8

To read an Accept header, consider each of the comma-separated values as an individual entity. This client has stated a preference for (in order):

•  text/html

•  application/xhtml+xml

•  application/xml

•  */*

This means that if any of these formats are supplied, the client will understand our meaning. The last two entries, however, include some additional information: the q value. This is an indication of how much a particular option is preferred, on a scale from 0 to 1, where the default value is q=1.

Here, Chrome claims to be able to handle a content type of */*. The asterisks are wildcards, meaning it thinks it can handle any format that could possibly exist—which seems unlikely. If an imaginary format is implemented that both our client and server understand, for example, Chrome won’t know how to parse it, so */* is misleading.

Using the Accept and Content-Type headers together to describe what can be understood by the client, and what was actually sent, is called “Content Negotiation.” Using the headers to negotiate the usable formats means that meta-information is not tangled up with actual data as it would be when sending both kinds of parameters with the body or URL of the request. Including the headers is generally a better approach.

We can negotiate more than just content, too. The earlier example contained these lines:

Accept-Encoding: gzip,deflate,sdch

Accept-Language: en-GB,en-US;q=0.8,en;q=0.6

Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.3

These headers show other kinds of negotiation, such as declaring what encoding the client supports, which languages are preferred, and which character sets can be used. This enables decisions to be made about how to format the response in various ways, and how to determine which formats are appropriate for the consuming device.

Parsing an Accept header

Let’s start by looking at how to parse an Accept header correctly. All Accept headers have a comma-separated list of values, and some include a q value that indicates their level of preference. If the q value isn’t included for an entry, it can be assumed that q=1 for that entry. Using the Accept header from my browser again, I can parse it by taking all the segments, working out their preferences, and then sorting them appropriately. Here’s an example function that returns an array of supported formats in order of preference:

<?php

function parseAcceptHeader() {

    $hdr = $_SERVER['HTTP_ACCEPT'];

    $accept = array();

    foreach (preg_split('/\s*,\s*/', $hdr) as $i => $term) {

        $o = new \stdclass;

        $o->pos = $i;

        if (preg_match(",^(\S+)\s*;\s*(?:q|level)=([0-9\.]+),i", $term, $M)) {

            $o->type = $M[1];

            $o->q = (double)$M[2];

        } else {

            $o->type = $term;

            $o->q = 1;

        }

        $accept[] = $o;

    }

    usort($accept, function ($a, $b) {

        /* first tier: highest q factor wins */

        $diff = $b->q - $a->q;

        if ($diff > 0) {

            $diff = 1;

        } else if ($diff < 0) {

            $diff = -1;

        } else {

            /* tie-breaker: first listed item wins */

            $diff = $a->pos - $b->pos;

        }

        return $diff;

    });

    $accept_data = array();

    foreach ($accept as $a) {

        $accept_data[$a->type] = $a->type;

    }

    return $accept_data;

}

NOTE

The headers sent by your browser may differ slightly and result in different output when you try the previous code snippet.

When using the Accept header sent by my browser, I see the following output:

array(4) {

  ["text/html"]=>

  string(9) "text/html"

  ["application/xhtml+xml"]=>

  string(21) "application/xhtml+xml"

  ["application/xml"]=>

  string(15) "application/xml"

  ["*/*"]=>

  string(3) "*/*"

}

We can use this information to work out which format it would be best to send the data back in. For example, here’s a simple script that calls the parseAcceptHeader() function, then works through the formats to determine which it can support, and sends that information:

<?php

$data = array ("greeting" => "hello", "name" => "Lorna");

$accepted_formats = parseAcceptHeader();

$supported_formats = array("application/json", "text/html");

foreach($accepted_formats as $format) {

    if(in_array($format, $supported_formats)) {

        // yay, use this format

        break;

    }

}

switch($format) {

    case "application/json":

        header("Content-Type: application/json");

        $output = json_encode($data);

        break;

    case "text/html":

    default:

        $output = "<p>" . implode(',', $data) . "</p>";

        break;

}

echo $output;
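The loop above relies on the default branch of the switch when nothing matches, and it treats wildcards such as */* as unknown formats. One possible refinement is sketched here, assuming a hypothetical helper named chooseFormat() that is not part of the original example. It walks the accepted formats in preference order, lets */* and type/* ranges match a supported format, and returns null when nothing fits, so the caller can decide what to do (for example, fall back to a default or send a 406 Not Acceptable status).

```php
<?php

// Hypothetical helper: returns the first supported format that the client
// accepts, or null when nothing matches. $accepted is expected to be in
// preference order, as returned by a parser such as parseAcceptHeader().
function chooseFormat(array $accepted, array $supported)
{
    foreach ($accepted as $range) {
        foreach ($supported as $format) {
            // exact match, or the "anything" wildcard
            if ($range === $format || $range === "*/*") {
                return $format;
            }
            // "type/*" matches any subtype of that type
            if (substr($range, -2) === "/*"
                && strpos($format, substr($range, 0, -1)) === 0) {
                return $format;
            }
        }
    }
    return null;
}

$supported_formats = array("application/json", "text/html");

// A client that prefers XML but will accept anything gets the first
// format the server supports.
echo chooseFormat(array("application/xml", "*/*"), $supported_formats);
```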

There are many, many ways to parse the Accept header (and the same techniques apply to the Accept-Language, Accept-Encoding, and Accept-Charset headers), but it is vital to do so correctly. The importance of Accept header parsing can be seen in Chris Shiflett’s blog post, The Accept Header; the parseAcceptHeader() example shown previously came mostly from the comments on this post. You might use this approach, an existing library such as the PHP mimeparse port, a solution you build yourself, or one offered by your framework. Whichever you choose, make sure that it parses these headers correctly, rather than using a string match or something similar.
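To illustrate how the same q-value handling carries over to headers such as Accept-Language or Accept-Charset, here is a minimal sketch. The function name parseQualityList() is an assumption for illustration, and the function takes the header value as an argument rather than reading $_SERVER, so it can be tried with any sample string. It is a simplified parser, not a complete implementation of the HTTP grammar.

```php
<?php

// Hypothetical helper: splits a header value such as the one sent in
// Accept-Language into its entries, ordered by q (highest first).
// Entries without a q value are treated as q=1; ties keep header order.
function parseQualityList($header)
{
    $items = array();
    foreach (explode(",", $header) as $pos => $term) {
        $parts = explode(";", trim($term));
        $value = trim($parts[0]);
        if ($value === "") {
            continue; // skip empty segments
        }
        $q = 1.0;
        foreach (array_slice($parts, 1) as $param) {
            if (preg_match('/^\s*q\s*=\s*([0-9.]+)\s*$/i', $param, $m)) {
                $q = (float) $m[1];
            }
        }
        $items[] = array("value" => $value, "q" => $q, "pos" => $pos);
    }

    usort($items, function ($a, $b) {
        if ($a["q"] != $b["q"]) {
            return ($a["q"] < $b["q"]) ? 1 : -1; // higher q first
        }
        return $a["pos"] - $b["pos"]; // tie-breaker: first listed wins
    });

    $values = array();
    foreach ($items as $item) {
        $values[] = $item["value"];
    }
    return $values;
}

$languages = parseQualityList("en-GB,en-US;q=0.8,en;q=0.6");
print_r($languages);
```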

Demonstrating Accept headers with Curl

Using Curl from the command line, here are some examples of how to call exactly the same URL by setting different Accept headers and seeing different responses:

curl http://localhost/book/hello.php

<p>hello,Lorna</p>

curl -H "Accept: application/json" http://localhost/book/hello.php

{"greeting":"hello","name":"Lorna"}

curl -H "Accept: text/html;q=0.5,application/json" http://localhost/book/hello.php

{"greeting":"hello","name":"Lorna"}

To make these requests from PHP rather than from Curl, it is possible to simply set the desired headers as the request is made. Here’s an example that uses PHP’s curl extension to make the same request as the previous example:

<?php

$url = "http://localhost/book/hello.php";

$ch = curl_init($url);

curl_setopt($ch, CURLOPT_HTTPHEADER, array(

    "Accept: text/html;q=0.5,application/json",

));

curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

$response = curl_exec($ch);

echo $response;

curl_close($ch);

The number of headers you need to support in your application will vary. It is common and recommended to offer various content types such as JSON, XML, or even plain text. The selection of supported encodings, languages, and character sets will depend entirely on your application and users’ needs. If you do introduce support for variable content types, however, negotiating them through these headers is the best way to do it.

Securing Requests with the Authorization Header

Headers can provide information that allows an application to identify users. Again, keeping this type of information separate from the application data makes things simpler and, often, more secure. The key thing to remember when working on user security for APIs is that everything you already know about how to secure a website applies to web services. There’s no need for anything new or inventive, and in fact I’ve seen some mistakes made because new wheels were invented instead of existing standards being embraced.

HTTP basic authentication

One of the simplest ways to secure a web page is to use HTTP basic authentication. This means that an encoded version of the user’s credentials is sent in the Authorization header with every request. The underlying mechanics of this approach are simple: the client is given a username and password, and does the following:

1.    Arrange the username and password into the format username:password.

2.    Base64 encode the result.

3.    Send it in the header, like this: Authorization: Basic base64-encoded string.

4.    Because Base64 is an encoding rather than encryption, the credentials are effectively sent in plain text, so HTTPS should be used throughout.
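The steps above can be sketched in a few lines of PHP. The helper name basicAuthHeader() is hypothetical, and the credentials are the same placeholder values used elsewhere in this chapter.

```php
<?php

// Hypothetical helper: builds the Authorization header line by joining the
// credentials with a colon and Base64 encoding the result.
function basicAuthHeader($username, $password)
{
    return "Authorization: Basic " . base64_encode($username . ":" . $password);
}

echo basicAuthHeader("user", "pass");
```

Anyone who sees this header can reverse the encoding with base64_decode(), which is why the transport itself needs to be encrypted.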

We can either follow the steps here and manually create the correct header to send, or we can use the built-in features of our toolchain. Here’s PHP’s curl extension making a request to a page protected by basic authentication:

<?php

$url = "http://localhost/book/basic-auth.php";

$ch = curl_init($url);

curl_setopt($ch, CURLOPT_HTTPAUTH, CURLAUTH_BASIC ) ;

curl_setopt($ch, CURLOPT_USERPWD, "user:pass");

curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

$response = curl_exec($ch);

echo $response;

curl_close($ch);

In PHP, these details can be found on the $_SERVER superglobal. When basic authentication is in use, the username and password supplied by the user can be found in $_SERVER["PHP_AUTH_USER"] and $_SERVER["PHP_AUTH_PW"], respectively. When a request is made without credentials, or with invalid credentials, a 401 Unauthorized status code can be sent to tell the client why the server is not sending the content requested.
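A server-side check might look something like the following sketch. PHP exposes the supplied password as PHP_AUTH_PW alongside PHP_AUTH_USER. The helper name isAuthorized(), the hard-coded credentials, and the realm name are all assumptions for illustration; a real application would look the user up in its own user store. The function takes the server array as an argument so it can be tried on the command line, and the part that sends the 401 response is left commented out for the same reason. It assumes PHP 5.6 or later for hash_equals().

```php
<?php

// Hypothetical helper: checks the credentials PHP exposes for basic auth.
// $server stands in for $_SERVER; the expected values are placeholders.
function isAuthorized(array $server, $expectedUser, $expectedPassword)
{
    if (!isset($server["PHP_AUTH_USER"], $server["PHP_AUTH_PW"])) {
        return false; // no credentials were sent
    }

    // hash_equals() compares strings in constant time
    $userOk = hash_equals($expectedUser, $server["PHP_AUTH_USER"]);
    $passOk = hash_equals($expectedPassword, $server["PHP_AUTH_PW"]);

    return $userOk && $passOk;
}

// In a script served over HTTP, the check could be used like this:
// if (!isAuthorized($_SERVER, "user", "pass")) {
//     header('WWW-Authenticate: Basic realm="Example"');
//     http_response_code(401);
//     exit;
// }
```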

OAuth

An alternative for securing web services, especially when you have a third-party consumer accessing data that belongs to a user, is OAuth. OAuth sets up a standard way for a consumer to gain access to another user’s data that is held by a provider with whom the user already has a relationship, without the user giving away her password. The user visits the main provider’s site to verify her identity and grant access to the consumer, and can also revoke that access at any time. Using this approach, the provider can distinguish between requests made by the user and requests made by something or someone else on behalf of the user.

The OAuth approach is beyond the scope of this book (Getting Started with OAuth 2.0 [O’Reilly] is an excellent reference), but it does make use of the Authorization header and is widely used with APIs, so it is well worth a mention.

Custom Headers

As with almost every aspect of HTTP, the headers that can be used aren’t set in stone. It is possible to invent new headers if there’s more information to convey for which there isn’t a header. Headers that aren’t “official” can be used; by convention they have been prefixed with X-, although RFC 6648 has since deprecated that convention for newly defined headers.

A good example, often seen on the Web, is when a tool such as Varnish has been involved in serving a response, and it adds its own headers. I have Varnish installed in front of my own site, and when I request it, I see:

HTTP/1.1 302 Found

Server: Apache/2.2.14 (Ubuntu)

Location: http://www.lornajane.net/

Content-Type: text/html; charset=iso-8859-1

Content-Length: 288

Date: Tue, 11 Dec 2012 15:53:46 GMT

X-Varnish: 119643096 119643059

Age: 5

Via: 1.1 varnish

Connection: keep-alive

That additional X-Varnish header shows me that Varnish served the request. It isn’t an official header, but these X-* headers are used to denote all kinds of things in APIs and on the Web. A great example comes from GitHub. Here’s what happens when I make a request to fetch a list of the repositories associated with my user account:

HTTP/1.1 200 OK

Server: nginx

Date: Tue, 11 Dec 2012 16:01:00 GMT

Content-Type: application/json; charset=utf-8

Connection: keep-alive

Status: 200 OK

X-Content-Type-Options: nosniff

Cache-Control: public, max-age=60, s-maxage=60

X-GitHub-Media-Type: github.beta

X-RateLimit-Limit: 60

Content-Length: 106586

Last-Modified: Sat, 01 Dec 2012 11:23:32 GMT

Vary: Accept

X-RateLimit-Remaining: 59

ETag: "8c0bde8e577f52c7f68de5d7099e041b"

There are a few custom headers in this example, but the X-RateLimit-* headers are particularly worth noting: they tell the client how many requests it is allowed to make and how many it has left, so it can tell when too many requests are being made. Using custom headers like these, any additional data can be transferred between client and server that isn’t part of the body data, which means all parties can stay “on the same page” with the data exchange.
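A client can read custom headers like these and act on them. The sketch below assumes the response headers are available as an array of raw lines (for example, PHP's $http_response_header variable after a request made with the http stream wrapper); the helper name parseHeaderLines() and the sample lines are assumptions for illustration.

```php
<?php

// Hypothetical helper: turns raw "Name: value" lines into an array keyed by
// lower-cased header name. Lines without a colon (such as the status line)
// are skipped.
function parseHeaderLines(array $lines)
{
    $headers = array();
    foreach ($lines as $line) {
        $parts = explode(":", $line, 2);
        if (count($parts) !== 2) {
            continue;
        }
        $headers[strtolower(trim($parts[0]))] = trim($parts[1]);
    }
    return $headers;
}

// Sample lines taken from the response shown above
$headers = parseHeaderLines(array(
    "HTTP/1.1 200 OK",
    "X-RateLimit-Limit: 60",
    "X-RateLimit-Remaining: 59",
));

$remaining = (int) $headers["x-ratelimit-remaining"];

if ($remaining === 0) {
    echo "Rate limit reached; wait before making more requests\n";
}
```

Lower-casing the names on the way in avoids surprises, since HTTP header names are case-insensitive.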