Professional WordPress: Design and Development, 3rd Edition (2015)

Chapter 11. Migrating to WordPress

WHAT’S IN THIS CHAPTER?            

·     Planning to move an existing site or content pages to WordPress

·     Choosing among the different import options available

·     Listing of potential cleanup or manual fine-tuning steps needed to complete a migration

·     Using WP-CLI to simplify the process

WROX.COM CODE DOWNLOADS FOR THIS CHAPTER

The wrox.com code downloads for this chapter are found at www.wrox.com/go/wordpress3e on the Download Code tab. The code is in the Chapter 11 download file and individually named according to the filenames noted throughout the chapter.

The bulk of this book extols the virtues of WordPress and we hope it has made you more of a WordPress fan, evangelist, and expert. If you are ready, willing, and able to help WordPress conquer the world, but you are not starting with a clean slate, you will need to migrate existing content into WordPress. Alternatively, if you are adding “Family WordPress expert” to the title of “Family SysAdmin,” you are likely to have a line of friends and family asking you to help them get started. Finally, this chapter touches on taking a local test site to launch using WP-CLI, the WordPress command line tool.

A variety of reasons exist to move existing content into WordPress:

·     You want to move from static, time-invariant content to a narrative style. Rather than publishing “brochureware,” you want to tell a story, and the timeline element of a WordPress site is the best approach.

·     You expect comments and discussion around your content, and organizing by post (topic) corrals the discussion better than an unstructured bulletin board.

·     There is sufficient traffic coming to your site that online advertising or sponsorships are economically viable, and you need full control over the platform.

·     You want to customize the user experience, style, and presentation of your site, or you are in a position to take an existing site, perhaps hosted by your employer, to a self-hosted environment.

The first part of this chapter looks at various ways to move static pages and existing sites to WordPress, with the assumption that you already have content that you need to import. Although WordPress makes it easy to publish content or delegate the writing to other users, gaining a critical mass of pages and posts is essential for generating readership and establishing context for online advertising engines. There is no easier way to get there than by moving your content into a fresh WordPress installation. This content migration will form the foundation for any move of any WordPress site, whether it is into WordPress from another source, or from WordPress to WordPress, such as when moving hosts. This groundwork helps you understand the process when we show you simpler but powerful command line tools at the end of the chapter.

UNDERSTANDING THE PROCESS

The first step in migrating a site to WordPress is to make a plan. Planning is as unexciting as it is important. Without a map of all of the components and targets, you will either lose content or end up repeating steps until your content is imported in some usable fashion. Spending a little time up front to plan will save time in the long run, and definitely reduce future frustrations.

Content Sources

When planning a migration you must decide what data sources you will want to move. Certainly, you will want to move the actual content and the related media; otherwise this whole exercise is silly. “Actual content” has many different interpretations, however: Are you moving posts from an existing site, documents in a word processing system, static HTML from your current site, or some combination of all of these options? We discuss migrating static content briefly in the next section; however, most of this chapter focuses on the bulk import of posts from other blogging systems and building custom import scripts.

When you decide to migrate to WordPress, you have an opportunity to revisit the pages-versus-posts dilemma covered in Chapter 9. If you are moving a static brochureware website to WordPress, you can probably get away with mapping existing pages to WordPress pages. By thinking outside the box, you may choose to have content on your existing site translate into posts in WordPress. Using a category template page, you can replicate the presentation feel of your old site in WordPress and add functionality available through the post structure. Finally, mapping static pages to posts also allows you to add a chronological background to shed light on how an idea or topic developed.

It is possible to migrate from bulletin boards such as phpBB to WordPress, but the conversion from a threaded discussion structure to posts, pages, and comments requires that you carefully plan the disposition of each topic. One common approach is to make each topic a new category and then organize stories (discussions) into posts in those categories, but you are probably going to end up hand-editing the import script to get the desired result. The custom import script described in this chapter, along with the migration checklist, should help you build a toolset to extract threaded content in a useful form.

Finally, make sure you have the rights to re-use whatever content you are looking to appropriate for your new site. If all of the content is your own, this issue is trivial, but if you have been posting on your employer’s corporate site or sharing a site with coauthors, ensure you have appropriate rights to the content, including copyright, rights to redistribute, and rights for commercial use.

Migration Checklist

Migration is never a clean and simple process. Tools will help you automate the vast majority of the work, but the purpose of planning is to be sure you have accounted for all of the content, metadata, and supporting features to capture the desired intent of your migration to WordPress.

Here is a migration checklist:

·     Content identification—Build a site map, as described in the section “Building a Custom Import Script,” to be sure you do not orphan any pages in the process. If you are importing from word processor files, build a file inventory of what you want to capture.

·     Media—Prepare any actual media assets that are in your content, even though they are not the theme and presentation graphics. Do you have any images, graphs, PowerPoint presentations or other linked documents that need to be moved over? Plan ahead of time where you are going to house these assets in the WordPress site. Are you going to follow WordPress convention or keep them in their current directory structure?

·     Metadata—Is there any metadata that describes the content that you also need to move over and reapply, such as tag or category information, or will you let WordPress do the minimum import so you can fine-tune the categories and tags in post-import processing?

·     Authors and users—Are you moving a single-author or non-attributed website, or do you need to keep content and author associations? This is a further complicating factor for discussion group migration: Are all registered users of the forum also authors?

·     Theme and presentation—Rarely will an existing site's CSS and presentation HTML translate directly into a WordPress theme. Should you create a new theme for your target WordPress site that attempts to preserve the look and feel of the existing site, or are you going to break away and re-launch your new site with a whole new look and feel? You can review Chapter 9, which covers creating a new theme for your site. Also consider whether there is any special or unique content on your site that will require distinctive design concerns prompting custom page templates (again, Chapter 9) or custom post types (covered in Chapter 7).

·     Unique functionality—Working with financial institution websites, we often run into financial calculators and various applications (like the paper kind you fill out) that will not immediately convert to the new site, or will require individual attention. This could be nearly anything; it could be something that was custom-coded for the website such as a poll, or a map for directions or CRM integration. Often you can find WordPress plugins that will provide similar functionality for your new site. Or, as we have often said, a strength of WordPress is that you can always write your own plugin, as covered in Chapter 8. Other times you can use the custom page templates to wrap your custom integration code inside of the WordPress framework. This topic is covered more a little later in this chapter.

·     Cleanup—You will need to tweak and fine-tune your content, especially URLs. You will have to visually inspect a fair sampling of your new WordPress site for anomalies. You will want to map old URLs to new URLs so visitors can still find you and search engine results continue to work.

·     Launch—Bite the bullet and launch your new site. No website is ever complete or perfect, but at some point it has to be good enough to let it loose on the web. Remember, shipping is a feature.

Recognize up front that any migration is not going to be perfect. You are moving core website content from where it was happily living (or perhaps not happily, and that is why you are migrating) to a whole new shell. There is going to be some work involved. We just want to set some expectations—we will look over each step in a little more depth.
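The cleanup item in the checklist calls for mapping old URLs to new ones. A minimal sketch of such a lookup follows, assuming a hand-built array of old paths and new permalinks. The paths and the function name map_old_url are hypothetical and not part of WordPress.

```php
<?php
// Hypothetical map of old site paths to new WordPress permalinks.
$redirects = array(
    '/products.html'        => '/products/',
    '/about/company.htm'    => '/about/',
    '/news/2014/launch.php' => '/2014/06/launch/',
);

/**
 * Return the new location for an old request path, or null when the
 * path has no entry in the map.
 */
function map_old_url($path, array $redirects) {
    // Ignore any query string and normalize case before the lookup.
    $clean = strtolower((string) parse_url($path, PHP_URL_PATH));
    return isset($redirects[$clean]) ? $redirects[$clean] : null;
}

// Example: an old bookmark with a tracking parameter still resolves.
echo map_old_url('/Products.html?ref=mail', $redirects), "\n";
```

On a live site the returned value would typically feed a permanent redirect, for example header('Location: ' . $target, true, 301), or be translated into web server rewrite rules.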

Site Preparation

One quick thought before you get into the actual migration. You need a way to work on your new site while still serving up your content on your old site. How you go about this depends on what resources you have available to you. Setting up your development site, or import playground, also affects your URL structure and may require the manual editing changes discussed in the section “Cleaning Up,” later in this chapter.

If you are adding content to an existing site, our recommended method is to set up a whole new WordPress installation on a new subdomain. For example, if your current site is http://example.com, make a new DNS entry and website host for a subdomain like http://new.example.com or http://test.example.com. This lets you work on the development site without interrupting your existing site. It also permits you to use root-relative links and makes certain steps easier in the long run. On the other hand, if you are establishing a new WordPress site, use the basic installation as your starting point for importing content.

Although you can set up your test environment in other ways, this method can simplify some steps because you are working on what will eventually become your production site. Local development is a good method if you are the only developer working on the site. If you are developing your new site locally, you can use your hosts file to skip all the URL transitions discussed later. However, as you will see at the end of this chapter, handling URL changes in WordPress has been greatly simplified.

CONTENT IDENTIFICATION

All content migration follows a similar pattern: Extract bits from the existing repository, automate preparation for the new system as much as possible, and import the content, typically repeating that loop as you find steps that require manual editing or fine-tuning. This section identifies and prepares the content that we want to move to WordPress and walks you through the three major approaches to WordPress import functions, starting with the fully manual migration of text documents and then exploring the WordPress built-in administrative functions to convert popular formats. It concludes with an in-depth development of a custom extraction and import script.

Migrating Text Documents

We define “text documents” as content primarily associated with a word processor for manipulation, and many text documents will end up as pages in your WordPress site. The content is not static in the sense that it is fixed, but it is not part of the temporal narrative. These tend to be the product pages of your brochureware site.

Brute-force is often the simplest approach here. Copy and paste your text, or export a document as HTML and glue that into a WordPress page. Be warned, however, that most word processing applications insert a huge array of embedded HTML tags, local formatting, and other style elements that make the output page render the way it would as seen from the word processor, but not the way you would want it to within your WordPress theme structure. If you do not want to strip out all but the most elemental paragraph, table, and link information, consider exporting the file as raw text and then hand-editing it to match the style of your other pages. It is ugly, but so is removing half of the document in the form of HTML directives. Even something as simple as the Mac OS TextEdit application uses custom HTML tags for paragraphs if you use Save As for an HTML document. Save your text, without any formatting.
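If you would rather keep the basic paragraph, table, and link markup than start from raw text, the stripping can be partly automated. The following is a rough sketch, not a complete sanitizer: the list of allowed tags and the function name clean_exported_html are assumptions, and a regex-based cleanup like this will miss edge cases in real exports.

```php
<?php
/**
 * Reduce word-processor HTML to basic structural markup.
 * The allowed tag list is an assumption; adjust it to your theme.
 */
function clean_exported_html($html) {
    // Drop <style> blocks entirely, including their contents.
    $html = preg_replace('#<style\b[^>]*>.*?</style>#is', '', $html);
    // Keep only basic structural tags; every other tag is removed.
    $html = strip_tags(
        $html,
        '<p><a><ul><ol><li><table><tr><td><th><h2><h3><strong><em>'
    );
    // Remove inline style, class, and lang attributes from surviving tags.
    $html = preg_replace(
        '/\s+(?:style|class|lang)\s*=\s*("[^"]*"|\'[^\']*\')/i',
        '',
        $html
    );
    return trim($html);
}

// Hypothetical fragment in the style of a word processor export.
$exported = '<style>p.MsoNormal{margin:0}</style>'
          . '<p class="MsoNormal" style="margin:0">'
          . '<span style="font-family:Arial">Our <b>new</b> product</span></p>';

echo clean_exported_html($exported), "\n";
```

The example prints a bare paragraph. Whatever the tool leaves behind still deserves a visual check once it is pasted into a WordPress page.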

A variation on this theme is merging wiki entries into a site. Most wikis have their own somewhat arcane syntax, different enough from HTML to make copy and paste time-consuming but not worth automated editing unless you are moving a large wiki. If the wiki is really a collection of topics, it makes sense to migrate the wiki pages to WordPress posts, creating categories for each topic or set of topics and relying on tags to even more finely identify the content. The upside to migrating out of a wiki is that you gain use of the WordPress metadata functions and can parse comments and discussion into comment threads rather than endless edits to the wiki document; the downside is that you lose the edit history contained in platforms such as MediaWiki. If you are running a wiki with a MySQL database as the repository, you can use the custom extract and import script discussed later in this chapter to build a migration tool.

Built-In WordPress Import Tools

For most users looking to transport content from one home to another, WordPress offers a variety of built-in import facilities. This section covers the basic conversion process and the use of WordPress eXtended RSS (WXR) files for more flexible or powerful data conversion.

Site Conversion

WordPress offers basic importers for commonly used blogging platforms. You can find these built-in importers on the Import Screen in WordPress, and you will be prompted to install the selected plugin. These conversion tools fall into two main migration categories: They read a file exported by your current platform, or they use the source platform's API set to pull content and re-post it in WordPress. For example, LiveJournal and Tumblr are migrated using their APIs, whereas Movable Type and TypePad are handled by exporting the contents from those platforms into a file that is uploaded and executed by WordPress.

Using WordPress eXtended RSS Files

What if the simple site-to-site conversion does not work, or does not capture enough of the metadata, author information, or other content that you want to migrate? In the most basic case, what you are going to do is export your existing content into an XML import file in the WordPress eXtended RSS (WXR) format, which is, as it is named, an extension of the RSS format.

The process for creating a WXR file depends on the starting point for your source content. Some applications have a built-in WXR exporter function, which will create this file for you. For example, the WordPress Export dashboard will create this file, but this is only useful when moving content from an existing WordPress site to a new one. If your source content does not have this functionality, you can create the WXR file by hand in a text editor. The easiest way to start your WXR file, if you have to create it by hand, is to use the sitemap process.

To create a WXR file using a site map, first you need a site map of your source site. Start with a site map created for search engines, or use a whole site link checker such as Xenu (http://home.snafu.de/tilman/xenulink.html—the site is scary, but the tool works) to create one. Xenu is only available for Windows, but there are similar tools for Mac OS. This site map will list all the pages that need to be migrated.
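If the site map you start from is an XML file in the sitemaps.org format used for search engines, the page list can be pulled out with a few lines of PHP. This is a minimal sketch that assumes that format; the function name sitemap_urls and the sample URLs are illustrative only.

```php
<?php
/**
 * Extract page URLs from a sitemaps.org-style XML site map.
 * Returns an empty array when the XML cannot be parsed.
 */
function sitemap_urls($xml) {
    $doc = simplexml_load_string($xml);
    if ($doc === false) {
        return array();
    }
    $urls = array();
    // Each <url> element carries its address in a <loc> child.
    foreach ($doc->url as $entry) {
        $urls[] = trim((string) $entry->loc);
    }
    return $urls;
}

// Sample site map; in practice you would read the file from disk.
$sitemap = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://example.com/</loc></url>
  <url><loc>http://example.com/products.html</loc></url>
</urlset>
XML;

print_r(sitemap_urls($sitemap));
```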

We do so many migrations from static HTML or other content management systems that, rather than working out a migration plan for each site, we built a special PHP application that spiders a website and builds the WXR for us. Assuming the pages are reasonably consistent in structure, this works very well. Using a combination of PHP, curl, and jQuery, we can feed the page list in from the site map, parse the HTML, gather the content, and then write it out in WXR format.
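The parsing step of such a spider can be sketched with PHP's DOM extension. This is not the application described above, only an illustration of the idea. It assumes the old site wraps its body copy in a div with the id "content", and the function name extract_page is hypothetical. Fetching the page (for example with curl) is left out.

```php
<?php
/**
 * Pull the title and main content out of one fetched page.
 * Assumes the old site wraps its body copy in <div id="content">.
 */
function extract_page($html) {
    $dom = new DOMDocument();
    // Collect parser warnings quietly; old markup is rarely valid.
    libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();

    $titles = $dom->getElementsByTagName('title');
    $title  = $titles->length ? trim($titles->item(0)->textContent) : '';

    $xpath   = new DOMXPath($dom);
    $nodes   = $xpath->query('//div[@id="content"]');
    $content = '';
    if ($nodes->length) {
        // Serialize the children so the wrapper div itself is dropped.
        foreach ($nodes->item(0)->childNodes as $child) {
            $content .= $dom->saveHTML($child);
        }
    }
    return array('title' => $title, 'content' => trim($content));
}

// Hypothetical page from the old site.
$page = '<html><head><title>About Us</title></head><body>'
      . '<div id="nav">Home</div>'
      . '<div id="content"><p>We make widgets.</p></div></body></html>';

print_r(extract_page($page));
```

Because the selector is tied to one site's markup, expect to adjust the XPath expression for every source site.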

Once you have a WXR import file, you can then edit it to make any necessary changes to the file prior to importing. You probably will want to read this whole chapter ahead of time so you are aware of the pitfalls that can affect your import and you can benefit from some search and replace on the import file rather than hand-editing or fixing each occurrence later.

Because this WXR import file is straight up XML, you can edit it in your favorite text editor. This allows you to make some bulk changes to URLs, paths, authors, and anything ahead of doing the import. This can save you a lot of time in the long run, but do not go overboard. WordPress is set up to do an import and can automatically create a lot of information for you based on the import.

When editing the WXR import file, you can see how the import content will play out. The nice thing about this format is that it really is extended RSS, so the format is simple. Unfortunately, there is not much documentation about it on WordPress.org (http://codex.wordpress.org/Importing_Content#Importing_from_an_RSS_feed). There is also some example WXR formatting to use as a model in the Google Blog Converter Google Code website at http://code.google.com/p/google-blog-converters-appengine/source/browse/trunk/samples/wordpress-sample.wxr.
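To give a feel for the format, here is a reduced sketch that writes a WXR document from an array of pages. The element names and namespace URIs follow WXR version 1.2 as produced by the WordPress export screen, but this is a subset rather than the complete format, and the function name build_wxr and the sample data are assumptions. Compare the output against an export from your own WordPress version before relying on it.

```php
<?php
/**
 * Build a minimal WXR document from an array of pages keyed by ID.
 * A reduced sketch of WXR 1.2, not the complete format.
 */
function build_wxr(array $pages, $site_url) {
    $items = '';
    foreach ($pages as $id => $page) {
        // "]]>" would end the CDATA section early, so split it.
        $body = str_replace(']]>', ']]]]><![CDATA[>', $page['content']);
        $items .= "  <item>\n"
            . '    <title>' . htmlspecialchars($page['title']) . "</title>\n"
            . '    <dc:creator>' . htmlspecialchars($page['author']) . "</dc:creator>\n"
            . '    <content:encoded><![CDATA[' . $body . "]]></content:encoded>\n"
            . '    <wp:post_id>' . (int) $id . "</wp:post_id>\n"
            . '    <wp:post_date>' . htmlspecialchars($page['date']) . "</wp:post_date>\n"
            . '    <wp:post_name>' . htmlspecialchars($page['slug']) . "</wp:post_name>\n"
            . "    <wp:status>publish</wp:status>\n"
            . "    <wp:post_type>page</wp:post_type>\n"
            . "  </item>\n";
    }
    return '<?xml version="1.0" encoding="UTF-8"?>' . "\n"
        . '<rss version="2.0"' . "\n"
        . '  xmlns:content="http://purl.org/rss/1.0/modules/content/"' . "\n"
        . '  xmlns:dc="http://purl.org/dc/elements/1.1/"' . "\n"
        . '  xmlns:wp="http://wordpress.org/export/1.2/">' . "\n"
        . "<channel>\n"
        . "  <wp:wxr_version>1.2</wp:wxr_version>\n"
        . '  <link>' . htmlspecialchars($site_url) . "</link>\n"
        . $items
        . "</channel>\n</rss>\n";
}

// Hypothetical page list, for example the output of a site spider.
$pages = array(
    10 => array(
        'title'   => 'About & Co',
        'author'  => 'admin',
        'content' => '<p>We make widgets.</p>',
        'date'    => '2015-01-02 03:04:05',
        'slug'    => 'about',
    ),
);

echo build_wxr($pages, 'http://example.com');
```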

This import file is an easy way to move content from your existing site. As discussed later in this chapter, other ways exist to move sites from other content management systems, but we often find it easier to use the WXR import file method.

Outside of the WordPress built-in migration functions, WXR files represent the fastest path to get your content moved into WordPress. In many cases, automating only the “easy” parts is sufficient to get your new site up and running. Assuming your current site has an Export to WXR, a live RSS feed, or some other export mechanism that you can then hand-edit into a WXR format, this can be your simplest approach.

Building a Custom Import Script

More advanced than a simple WXR migration is to extract entries from an existing content management system that is database (ideally MySQL) based, and attempt a direct data manipulation to migrate the content. In this case, you need to have your old database and your WordPress database on the same MySQL server, reachable with the same credentials. Then you can run a set of SQL scripts that will read from the old database, transform the data appropriately, and import the content into the WordPress tables. This method can get tricky because you are juggling several SQL scripts to export, convert, and then import the content; however, it is also the most flexible and powerful approach that operates at the data management level.

Listing 11-1 explores an example script to import data directly into the WordPress database. Take a look at the full source before we break it down.

LISTING 11-1: MySQL import script

<?php
//set database connection info for database to import
$hostname = "localhost";
$username = "USERNAME";
$password = "PASSWORD";
$sourcedb = "DATABASE"; // database to import from
$sourcetable = "stories"; // table that stores posts to import
$sourcecomments = "comment"; // table that stores comments to import

//set database connection info for WordPress database
$destdb = "WORDPRESS-DATABASE"; // WordPress database
$wp_prefix = "wp_"; // WordPress table prefix

//database connection
$db_connect = @mysql_connect($hostname, $username, $password)
  or die("Fatal Error: ".mysql_error());
mysql_select_db($sourcedb, $db_connect);
$srcresult = mysql_query("select * from $sourcetable", $db_connect)
  or die("Fatal Error: ".mysql_error());

// used to generate the dashed titles in the URLs
function sanitize($title) {
  $title = strtolower($title);
  $title = preg_replace('/&.+?;/', '', $title); // kill entities
  $title = preg_replace('/[^a-z0-9 _-]/', '', $title);
  $title = preg_replace('/\s+/', ' ', $title);
  $title = str_replace(' ', '-', $title);
  $title = preg_replace('|-+|', '-', $title);
  $title = trim($title, '-');
  return $title;
}

while ($myrow = mysql_fetch_array($srcresult))
{
  //generate post title
  $my_title = mysql_escape_string($myrow['title']);
  //generate post content
  $my_content = mysql_escape_string($myrow['content']);
  //generate post permalink
  $myname = mysql_escape_string(sanitize($my_title));

  //generate SQL to insert data into WordPress
  //(table and column names take backticks; quoted values take single quotes)
  //categories live in the taxonomy tables and must be imported separately
  $sql = "INSERT INTO `" . $wp_prefix . "posts`
  (
    `ID` ,
    `post_author` ,
    `post_date` ,
    `post_date_gmt` ,
    `post_content` ,
    `post_title` ,
    `post_name` ,
    `post_excerpt` ,
    `post_status` ,
    `comment_status` ,
    `ping_status` ,
    `post_password` ,
    `to_ping` ,
    `pinged` ,
    `post_modified` ,
    `post_modified_gmt` ,
    `post_content_filtered` ,
    `post_parent`,
    `post_type` )
    VALUES
  (
    '$myrow[sid]',
    '1',
    '$myrow[time]',
    '0000-00-00 00:00:00',
    '$my_content',
    '$my_title',
    '$myname',
    '',
    'publish',
    'open',
    'open',
    '',
    '',
    '',
    '$myrow[time]',
    '0000-00-00 00:00:00',
    '',
    '0',
    'post' );";
    mysql_select_db($destdb, $db_connect);
    //execute query
    mysql_query($sql, $db_connect)
      or die("Fatal Error: ".mysql_error());

    // load the ID of the post we just inserted
    // (MAX(ID) is the new post only while imported IDs exceed every existing ID)
    $sql = "select MAX(ID) from " . $wp_prefix . "posts";
    $getID = mysql_query($sql, $db_connect);
    $currentID = mysql_fetch_array($getID);
    $currentID = $currentID['MAX(ID)'];

    // retrieve all associated post comments
    $mysid = (int) $myrow['sid'];
    mysql_select_db($sourcedb, $db_connect);
    $comments = mysql_query("select * from "
        .$sourcecomments. " where pn_sid = $mysid", $db_connect);

    //import post comments in WordPress
    while ($comrow = mysql_fetch_array($comments))
    {
      $myname = mysql_escape_string($comrow['pn_name']);
      $myemail = mysql_escape_string($comrow['pn_email']);
      $myurl = mysql_escape_string($comrow['pn_url']);
      $myIP = mysql_escape_string($comrow['pn_host_name']);
      $mycomment = mysql_escape_string($comrow['pn_comment']);
      $sql = "INSERT INTO `" . $wp_prefix . "comments`
      (
        `comment_ID` ,
        `comment_post_ID` ,
        `comment_author` ,
        `comment_author_email` ,
        `comment_author_url` ,
        `comment_author_IP` ,
        `comment_date` ,
        `comment_date_gmt` ,
        `comment_content` ,
        `comment_karma` ,
        `comment_approved` ,
        `user_id` )
        VALUES
        (
          '',
          '$currentID',
          '$myname',
          '$myemail',
          '$myurl',
          '$myIP',
          '$comrow[date]',
          '0000-00-00 00:00:00',
          '$mycomment',
          '0',
          '1',
          '0'
        );";
      mysql_select_db($destdb, $db_connect);
      mysql_query($sql, $db_connect)
        or die("Fatal Error: ".mysql_error());
    }
  }

  //Update comment count
  mysql_select_db($destdb, $db_connect);
  $tidyresult = mysql_query("select * from $wp_prefix" . "posts", $db_connect)
    or die("Fatal Error: ".mysql_error());
  while ($myrow = mysql_fetch_array($tidyresult))
  {
    $mypostid=$myrow['ID'];
    $countsql="select COUNT(*) from $wp_prefix" . "comments"
       . " WHERE `comment_post_ID` = " . $mypostid;
    $countresult=mysql_query($countsql) or die("Fatal Error: ".mysql_error());
    $commentcount=mysql_result($countresult,0,0);
    $countsql="UPDATE `" . $wp_prefix . "posts`
      SET `comment_count` = '" . $commentcount .
      "' WHERE `ID` = " . $mypostid . " LIMIT 1";
    $countresult=mysql_query($countsql) or die("Fatal Error: ".mysql_error());
  }

At first glance this looks a little complicated, so let’s break it down and discuss each section of the import script:

//set database connection info for database to import
$hostname = "localhost";
$username = "USERNAME";
$password = "PASSWORD";
$sourcedb = "DATABASE"; // database to import from
$sourcetable = "stories"; // table that stores posts to import
$sourcecomments = "comment"; // table that stores comments to import

//set database connection info for WordPress database
$destdb = "WORDPRESS-DATABASE"; // WordPress database
$wp_prefix = "wp_"; // WordPress table prefix

First you set your database connection info. You also set the table names for the source of the content you plan on importing. This example assumes both the source tables and WordPress tables exist in the same database server. Then you set your WordPress tables and table prefix.

Next you need to initialize your database connections. You will have to modify this depending on the MySQL connection libraries available in your PHP. The original mysql_* functions are shown here for compatibility with older hosts, but they are deprecated as of PHP 5.5 and removed in PHP 7, so on a current PHP release use PDO (preferred) or mysqli instead:

//database connection
$db_connect = @mysql_connect($hostname, $username, $password)
    or die("Fatal Error: ".mysql_error());
mysql_select_db($sourcedb, $db_connect);
$srcresult = mysql_query("select * from $sourcetable", $db_connect)
    or die("Fatal Error: ".mysql_error());

After setting your database connections, you execute a query to select the data from your source table. This is the data you are going to import into WordPress as posts. Next, you create your sanitize function for creating permalinks. This function removes and replaces any characters that are not legal for URLs and also replaces spaces with dashes to conform to the WordPress permalink structure.

// used to generate the dashed titles in the URLs
function sanitize($title) {
  $title = strtolower($title);
  $title = preg_replace('/&.+?;/', '', $title); // kill entities
  $title = preg_replace('/[^a-z0-9 _-]/', '', $title);
  $title = preg_replace('/\s+/', ' ', $title);
  $title = str_replace(' ', '-', $title);
  $title = preg_replace('|-+|', '-', $title);
  $title = trim($title, '-');
  return $title;
}
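To see what the function produces, the following snippet runs it on a few sample titles. The function is repeated so the snippet runs on its own, and the sample titles are made up. Inside a running WordPress installation the built-in sanitize_title() function serves the same purpose.

```php
<?php
// Copy of the chapter's sanitize() so this snippet is self-contained.
function sanitize($title) {
  $title = strtolower($title);
  $title = preg_replace('/&.+?;/', '', $title); // kill entities
  $title = preg_replace('/[^a-z0-9 _-]/', '', $title);
  $title = preg_replace('/\s+/', ' ', $title);
  $title = str_replace(' ', '-', $title);
  $title = preg_replace('|-+|', '-', $title);
  $title = trim($title, '-');
  return $title;
}

// Made-up titles covering punctuation, an entity, and stray spacing.
$samples = array(
  'Hello, World!',                // prints hello-world
  'Tom &amp; Jerry: The Movie',   // prints tom-jerry-the-movie
  '  Spaces   and --- dashes  ',  // prints spaces-and-dashes
);

foreach ($samples as $sample) {
  echo sanitize($sample), "\n";
}
```

Note that two different titles can collapse to the same slug, so a production import should check for duplicates before inserting.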

After your sanitize function is in place, you start your while loop to loop through the data you are going to import as posts in WordPress:

while ($myrow = mysql_fetch_array($srcresult))
{

Next, you set variables for your post title, content, and permalink values:

  //generate post title
  $my_title = mysql_escape_string($myrow['title']);
  //generate post content
  $my_content = mysql_escape_string($myrow['content']);
  //generate post permalink
  $myname = mysql_escape_string(sanitize($my_title));

Notice that you send the values through the mysql_escape_string function. This PHP function escapes a string for use in a MySQL query. Next, you create the query to insert the post data into the WordPress posts table:

  //generate SQL to insert data into WordPress
  //(table and column names take backticks; quoted values take single quotes)
  //categories live in the taxonomy tables and must be imported separately
  $sql = "INSERT INTO `" . $wp_prefix . "posts` (
    `ID` ,
    `post_author` ,
    `post_date` ,
    `post_date_gmt` ,
    `post_content` ,
    `post_title` ,
    `post_name` ,
    `post_excerpt` ,
    `post_status` ,
    `comment_status` ,
    `ping_status` ,
    `post_password` ,
    `to_ping` ,
    `pinged` ,
    `post_modified` ,
    `post_modified_gmt` ,
    `post_content_filtered` ,
    `post_parent`,
    `post_type` )
    VALUES (
    '$myrow[sid]',
    '1',
    '$myrow[time]',
    '0000-00-00 00:00:00',
    '$my_content',
    '$my_title',
    '$myname',
    '',
    'publish',
    'open',
    'open',
    '',
    '',
    '',
    '$myrow[time]',
    '0000-00-00 00:00:00',
    '',
    '0',
    'post' );";

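As you can see, you set a specific value for each column in the wp_posts WordPress table. At this point, you will need to match the values you want to import from the source table with the correct table fields in WordPress, and check the column list against the wp_posts schema of your WordPress version; current releases, for instance, store categories in the taxonomy tables rather than in a post_category column. The preceding script is just an example showing how that can be accomplished. Next, you execute the generated query: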

    mysql_select_db($destdb, $db_connect);
    //execute query
    mysql_query($sql, $db_connect)
      or die("Fatal Error: ".mysql_error());
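The INSERT statement above depends on every value having been escaped by hand. With PDO, which the chapter names as the preferable library, the same insert can use a prepared statement so that values never become part of the SQL text. The sketch below is one possible shape, not the chapter's script: the function names build_post_insert and insert_post are hypothetical, and the column names are taken from trusted code, never from imported data.

```php
<?php
/**
 * Build a parameterized INSERT for the posts table.
 * Returns the SQL with named placeholders plus the values to bind.
 * Column names come from the array keys, so pass only trusted keys.
 */
function build_post_insert($wp_prefix, array $row) {
    $placeholders = array();
    $params = array();
    foreach ($row as $column => $value) {
        $placeholders[] = ':' . $column;
        $params[':' . $column] = $value;
    }
    $sql = 'INSERT INTO `' . $wp_prefix . 'posts` (`'
         . implode('`, `', array_keys($row)) . '`) VALUES ('
         . implode(', ', $placeholders) . ')';
    return array($sql, $params);
}

/** Run the insert over an open PDO connection (not called in this demo). */
function insert_post(PDO $pdo, $wp_prefix, array $row) {
    list($sql, $params) = build_post_insert($wp_prefix, $row);
    $stmt = $pdo->prepare($sql);
    $stmt->execute($params);
    return $pdo->lastInsertId();
}

// Demo: quotes in the title need no manual escaping.
list($sql, $params) = build_post_insert('wp_', array(
    'post_title'   => "Bob's \"big\" day",
    'post_content' => 'Text with quotes stays untouched.',
    'post_status'  => 'publish',
));

echo $sql, "\n";
```

The demo only prints the generated statement. Calling insert_post() requires a PDO connection to the WordPress database, for example one created with a mysql: DSN.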

After your query has successfully run, the source data will start to populate in the WordPress posts table. Now you need to import the post comments and associate them with the correct posts. The first step to accomplish this is to load the ID of the post you just inserted, as shown here:

    // load the ID of the post we just inserted
    // (MAX(ID) is the new post only while imported IDs exceed every existing ID)
    $sql = "select MAX(ID) from " . $wp_prefix . "posts";
    $getID = mysql_query($sql, $db_connect);
    $currentID = mysql_fetch_array($getID);
    $currentID = $currentID['MAX(ID)'];

This is the ID used to associate a comment with a post in WordPress. Next you need to execute a query to retrieve all of the comments from the source table:

    // retrieve all associated post comments
    $mysid = (int) $myrow['sid'];
    mysql_select_db($sourcedb, $db_connect);
    $comments = mysql_query("select * from "
        .$sourcecomments. " where pn_sid = $mysid", $db_connect);

Next, you start a loop to loop through all of the comments attached to this post and insert them into the WordPress comments table:

    //import post comments in WordPress
    while ($comrow = mysql_fetch_array($comments))
    {
      $myname = mysql_escape_string($comrow['pn_name']);
      $myemail = mysql_escape_string($comrow['pn_email']);
      $myurl = mysql_escape_string($comrow['pn_url']);
      $myIP = mysql_escape_string($comrow['pn_host_name']);
      $mycomment = mysql_escape_string($comrow['pn_comment']);

You also set some variables with the comment data you are going to insert into WordPress. Remember that these values will need to be matched to whatever system you are importing from. Next, it is time to build the query and insert the comment data in WordPress:

      $sql = "INSERT INTO `" . $wp_prefix . "comments`
      (
        `comment_ID` ,
        `comment_post_ID` ,
        `comment_author` ,
        `comment_author_email` ,
        `comment_author_url` ,
        `comment_author_IP` ,
        `comment_date` ,
        `comment_date_gmt` ,
        `comment_content` ,
        `comment_karma` ,
        `comment_approved` ,
        `user_id` )
        VALUES
        (
          '',
          '$currentID',
          '$myname',
          '$myemail',
          '$myurl',
          '$myIP',
          '$comrow[date]',
          '0000-00-00 00:00:00',
          '$mycomment',
          '0',
          '1',
          '0'
        );";
      mysql_select_db($destdb, $db_connect);
      mysql_query($sql, $db_connect)
        or die("Fatal Error: ".mysql_error());
    }
  }

As you can see, you match each value to the correct WordPress table field in the INSERT query. After generating your query, you select the WordPress database and execute the query. Remember that this is in a loop, so if ten comments exist on this post, it will execute this INSERT statement for all ten comments.

The final section of code in the importer updates the comment_count value on your posts. When viewing total comments on a post, WordPress does not dynamically generate that number. Instead, it is stored as an integer value in the post record. The first step is to select every post in the WordPress posts table:

  //Update comment count

  mysql_select_db($destdb, $db_connect);

  $tidyresult = mysql_query("select * from $wp_prefix" . "posts", $db_connect)

    or die("Fatal Error: ".mysql_error());

  while ($myrow = mysql_fetch_array($tidyresult))

  {

You also start a while loop to loop through each one of the posts in the WordPress posts table. Next, you run a SELECT COUNT query to count how many comments this single post has:

    $mypostid=$myrow['ID'];

    $countsql="select COUNT(*) from $wp_prefix" . "comments"

       . " WHERE `comment_post_ID` = " . $mypostid;

    $countresult=mysql_query($countsql) or die("Fatal Error: ".mysql_error());

    $commentcount=mysql_result($countresult,0,0);

Once this code has executed, the variable $commentcount will contain the total number of comments attached to this post. The final part is to update the comment_count field in the WordPress posts table to match this value:

     $countsql="UPDATE `" . $wp_prefix . "posts`

      SET `comment_count` = '" . $commentcount .

      "' WHERE `ID` = " . $mypostid . " LIMIT 1";

    $countresult=mysql_query($countsql) or die("Fatal Error: ".mysql_error());

  }

The UPDATE query updates the comment count based on the value of $commentcount. This is a loop so it iterates through each post in the WordPress posts table and updates the comment count for each post.
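
The per-post loop above can also be collapsed into a single statement. The following shell sketch prints one UPDATE that recalculates comment_count for every post using a correlated subquery. The function name comment_count_sql and the default wp_ prefix are illustrative assumptions, not part of the importer script. The query counts only approved comments, which gives the same totals here because the importer inserts every comment with comment_approved set to 1.

```shell
#!/bin/sh
# Print a single UPDATE that recounts comments for every post.
# comment_count_sql and the default "wp_" prefix are illustrative names.
comment_count_sql() {
  prefix="${1:-wp_}"   # WordPress table prefix
  cat <<SQL
UPDATE ${prefix}posts AS p
SET p.comment_count = (
  SELECT COUNT(*)
  FROM ${prefix}comments AS c
  WHERE c.comment_post_ID = p.ID
    AND c.comment_approved = '1'
);
SQL
}
```

You could review the output first and then pipe it to the mysql command-line client, for example comment_count_sql wp_ | mysql your_database, where your_database is a placeholder for your WordPress database name.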

Remember that this is an example of how to create a script to do a direct import from a source database table into the WordPress database tables. The individual values set in this script would need to be matched to the appropriate values in your source database tables to import.

Whichever method you choose to transport your content, the next step is to import into WordPress. We recommend importing into a fresh installation of WordPress; failing that, make sure your import plan includes purging existing content in order to avoid duplicate entries. If you are layering your import on top of existing content, you will need to hand-edit your import files or scripts to make sure no conflicts occur.

Next, do a trial import and see where you end up. Even with up-front planning and consideration there is very little chance you can get it all right on the first try. Review the new site and see what needs to be hand-edited in the script. Again, there will be more “grep-fu” and find-and-replace fun.

If you are going the WXR route, you can use the WordPress Import screen to import this file. An important consideration is PHP's maximum upload file size and maximum execution time. If your import file is large, you may need to edit your PHP configuration to raise these limits.
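
The limits that usually matter for a large WXR upload are PHP's upload size, POST size, execution time, and memory settings. The fragment below is a sketch of the relevant php.ini directives. The values are illustrative assumptions, not recommendations, and should be sized to your own import file and to what your host allows.

```ini
; Illustrative values only; adjust to the size of your import file.
upload_max_filesize = 64M   ; largest single uploaded file
post_max_size = 64M         ; must be at least upload_max_filesize
max_execution_time = 300    ; seconds a script may run
max_input_time = 300        ; seconds PHP may spend receiving the upload
memory_limit = 256M         ; memory available to the import
```

Restart or reload your web server or PHP process after editing php.ini so the new values take effect, and consider restoring the original values once the import is complete.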

MEDIA MIGRATION

There are really two sets of media and asset files for your site: graphics that make up the theme and site frame, and graphics and documents that are embedded in the content.

We discuss the theme presentation later in this chapter. This section is about moving the media that is embedded in the content portions of your pages, for example, linked Word documents, PowerPoint presentations, and Adobe PDFs. This also includes any images in your content, like screenshots or graphs. In a traditional WordPress installation, these files are uploaded into your uploads folder and, depending on your configuration, are organized by date.

Odds are that your existing site is not going to have this sort of directory structure. But, as you know, link structures and naming conventions matter, and if you are moving from Windows to Linux, case now matters, too. You have options for moving your content and how to structure it in the new site. Each migration is different, so you will have to evaluate which method is going to work out best for your case.

Many sites have a top-level folder in the webroot for images, often called /img/ or /images/. The simplest way to move this directory is to keep it intact and put it in the top level of your WordPress site. This mostly means copying files from one directory tree on the source server to the new server, either with a File Transfer Protocol (FTP) utility or as a tarball. Really, we would rather you used a secure file transfer protocol (SFTP) utility to move files, but it all depends on what your server supports. Keeping the original files in the same directory structure on the new server as on the old (a documents folder such as /pdfs/ is another example) is beneficial because you may not have to remap each file reference in the content. However, you will see in the cleaning-up stage that changing the URLs is not that difficult, or you could edit the WXR import file ahead of time. This technique may be undesirable because it breaks from the WordPress convention of storing your assets in the wp-content/uploads directory, and your media assets will be split between two locations once you start using the WordPress Media Manager to upload new images.

The second option is to move all the images into the WordPress uploads folder on the new server. Make sure the target directory choice matches the WordPress configuration; otherwise, you are not really using this method effectively. You can set WordPress to organize uploads by month and year in the Media Settings Screen. This option pretty much guarantees that you will need to remap every image in your content, so you will either need to plan ahead for this in your import scripts or handle the image URL changes in the clean-up phase.

MOVING METADATA

If you need to maintain a certain site structure, you should establish that back in the planning phase. If your existing site already has categories and tag information, that information will likely transfer as part of your migration. Pay careful attention that category and tag names match exactly, or the import will create several near-duplicate categories.

Otherwise, you may just want WordPress to establish new categories for your content during the import. You should review the template files in your theme, and the template file hierarchy discussed in Chapter 9. You may find that some of the structure that you had to manually maintain in the old site is automatically created in WordPress simply through the WordPress site architecture.

Remember to consider your permalink structure and how it relates to your new structured content. Likewise, consider the category base and tag base URL settings in the Administration Screens. Setting all of these properly on your new site can save you a fair amount of time.

Preserving the site structure or at least the URLs is important if you are moving an established site. Search engines have been indexing your previous site and you have probably made some efforts to optimize the site for search engines, so having the search engines’ indexed links remain will continue to drive traffic to your site.

Even if the default WordPress URL for content is a different link than your original site, you will want to map the old URLs to the new ones. We cover this step in the section “Cleaning Up,” later in this chapter.

MOVING AUTHORS AND USERS

Most brochureware websites are author agnostic. That is, you do not really have content attributed to specific site authors because they are representing a business entity. You can continue with this method, even when using WordPress, which enforces authorship. All you need to do is turn off the author information in your theme.

However, if you are moving from a site that has authorship ingrained, or this is something you want to implement on the new site, you will need to set up your authors in WordPress and attribute the appropriate posts. If you are using the WXR method, your authors can be created for you automatically. If you are using the SQL conversions from another CMS, you will want to carefully build this information into your transformations.

If you are creating a multi-author WordPress site, you may also want to consider the multi-user functionality of WordPress Multisite. WordPress Multisite is covered in depth in Chapter 10.

THEME AND PRESENTATION

The presentation of your new site represents the next set of decisions for your migration. Is your new site going to look exactly like your old site or are you making a design change at the same time?

If you are making a design change, you can use an existing theme or build a new theme and not have to dwell on this step too much. Remember to evaluate whether certain content areas need specific, or unique, design considerations. Otherwise, activate your new theme and work out the kinks.

However, if you intend to keep the look and feel consistent across the migration, you are most likely in for building a new theme. Usually this is not too much of an undertaking because most websites are created with essentially the same building blocks: header, footer, content area, and sidebars. You can pretty easily map these to the proper theme template files.

Nevertheless, you will have to decide if you are going to take your current HTML files and add in the proper WordPress hooks and code, or start with a working WordPress starter theme or theme framework and style it to match. It really is a mixed bag, and you will have to decide the best path for yourself. For practicality, we recommend taking a theme framework and styling it to match your site’s look and feel. In the long run, this has worked out better for us.

UNIQUE FUNCTIONALITY

For integration and functionality, there is really nothing to migrate, except for the actual operations—for example, contact forms, event calendars, and polls. Unless you move the existing PHP code to the WordPress site through page templates, your best bet is to find equivalent plugins. There are many plugins for various tasks; you will simply have to pick the one that closely matches your needs, and activate and configure it.

CLEANING UP

Even when you are done moving the bulk of the visible content, there is always the final fit and finish work. You have done the heavy lifting and at least have something to look at on the new site. Now you need to go through and review all the content with spot tests, fix any glaring issues, and then do some cleaning up to put the final polish on the site.

This section also covers some steps of the “go live” process of launching a site. The balance between when you can make these final changes and when you are still testing can often be difficult to gauge. At some point you have to bite the bullet and make the change. Remember that shipping is a feature.

Though this is called the cleaning-up step, you could be making some drastic changes in this phase so we recommend doing your first backup of the new WordPress database. The wp-DBManager plugin by Lester Chan (http://wordpress.org/plugins/wp-dbmanager/) is an excellent plugin for the job. It allows you to make a backup of the database so you will have this point to roll back to. In addition, this plugin allows scheduling of the backups and some other database tools.

Manual Fine-Tuning

An important pre-launch task is to check all the page links, posts, and page names to make sure everything is how you want it. You can tweak all of the relative paths now. You will be changing the fully qualified URL at launch time. This is the time for you to go through the entire site and manually inspect and adjust your imported content. Rarely is a migration import flawless and nearly always your site will require some manual intervention, either because the import was incapable of making the necessary changes or because the effort required to automate, or fix and re-import, requires more than simply making the adjustments by hand.

Import Limitations

Be mindful of the PHP memory limit on your server. Because the entire import file is loaded into memory and processed, you can easily hit low limits. If you have a large number of posts to import, try breaking the source export file into pieces and running the import in sections.

Also, an import cannot catch everything. You will have to manually review all your content. It is time-consuming and painful, akin to being forced to listen to your own recorded voice, but it is worthwhile to get fully up and running on WordPress as well as to ensure that all of your content appears on the other side of the migration.

Updating URLs

If many of the links in your content contain hard-coded test site URLs, you can set these to site relative links to make the launch easier. By default, WordPress uses fully qualified links to reference assets and other pages. Changing these links to site relative means that when you make the final DNS switch, you will not have to run this process again, unless of course, you have added new content after this step.

To change some of the common absolute paths that you would encounter in your content, you can run some simple SQL queries. Use the SQL page of the wp-DBManager plugin, phpMyAdmin, or the mysql command-line client to run them.

Change all the in-site links from absolute to site relative. This query assumes you are running your WordPress site on the test.example.com domain:

UPDATE `wp_posts` SET post_content=replace(post_content,

'href="http://test.example.com/','href="/');

This query changes all your absolute image source links to site relative, again assuming you are running your new site on the test.example.com domain:

UPDATE `wp_posts` SET post_content=replace(post_content,

'src="http://test.example.com/','src="/');

Now all your internal site links and image sources are root relative, meaning you can check out your site for bad links either manually or with an automated tool and look for missing graphics. It does not matter what your test URL is. This is temporary and allows you to test all your images and graphics on the test site. Again, if you add any new content after this step, WordPress will use fully qualified URLs using the settings in the WordPress Administration Screens. As part of launching, you will want to run what is essentially the reverse of this SQL statement to set all the paths to the live URL. This preserves links in RSS feeds and other syndication systems to link back to your fully qualified domain name path.
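
The reverse step at launch time can be sketched as a small shell function that prints the two UPDATE statements for a given live URL. The function name absolute_links_sql, its arguments, and the defaults are illustrative assumptions. Note one caveat: any protocol-relative link in your content, such as href="//cdn.example.net/", also begins with href="/ and would be rewritten incorrectly, so search your content for those before running the output.

```shell
#!/bin/sh
# Print the launch-time queries that turn root-relative links back into
# fully qualified ones. absolute_links_sql and its arguments are
# illustrative names; the defaults are placeholders.
absolute_links_sql() {
  site_url="${1:-http://example.com}"   # live URL, no trailing slash
  prefix="${2:-wp_}"                    # WordPress table prefix
  for attr in href src; do
    cat <<SQL
UPDATE ${prefix}posts SET post_content = REPLACE(post_content,
  '${attr}="/', '${attr}="${site_url}/');
SQL
  done
}
```

Calling absolute_links_sql http://example.com wp_ prints one statement for href attributes and one for src attributes. Running the resulting queries a second time does no harm, because links that are already fully qualified no longer match the search string.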

Another method for updating hard-coded URLs in your content is the Search Regex plugin (http://wordpress.org/plugins/search-regex/). This plugin adds a powerful set of search-and-replace functionality to WordPress. It also allows you to view the content before and after without making any database changes so you can fully test your methods before pulling the trigger. Regular expression (regex) patterns can also be used for defining search-and-replace rules.

Redirection

This is a very important step. You will want search engines that have previously indexed your site to continue to refer visitors to the same content. You do not want to lose any investment of time and achievement just because you are switching underlying website platforms. The search spiders do not care how you make your website, other than being readable to them, but they do care where your content is located. After all, it is the only way they can find you.

You have a couple of options for maintaining your existing URL structure on your shiny new WordPress site, the most basic being permalinks. Depending on how your site was laid out and how the planning phase went, you may be able to duplicate the site structure by manually editing the permalink structure of your pages and posts.

A second option, if you are running on Apache, is to use .htaccess redirects to map the old URLs to the new URLs. This option is generally limited to Apache web servers: the Redirect lines shown here are handled by Apache's mod_alias module, while WordPress's own permalink rules rely on mod_rewrite. IIS and nginx offer comparable redirect features, but they are configured differently. The .htaccess method is an easy fit if you created a site map back in the planning or import file creation phase. With a small script, you can turn that site map into a list of redirects from each old URL to the new WordPress permalink. Your .htaccess then includes simple one-to-one matching, one match per line, such as:

Redirect 301 /about.php http://example.com/services/

Redirect 301 /portfolio.php http://example.com/category/portfolio/

Redirect 301 /cool-article.php http://example.com/2009/10/09/cool-article/

Make sure these redirect pairs are listed first in your .htaccess file, and make sure you include the WordPress rewrite stanza at the bottom of the file. Also note that if you change your permalink settings in the WordPress Administration Screens, WordPress may rewrite this file, so make a backup.
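
Generating the redirect lines from a site map can be sketched with a few lines of shell and awk. This sketch assumes the site map is a two-column, comma-separated file of old path and new URL with no commas inside the URLs; the function name make_redirects and that file layout are illustrative assumptions. It emits each redirect with a 301 status, which marks the move as permanent; without a status, Apache's Redirect directive issues a temporary (302) redirect.

```shell
#!/bin/sh
# Turn "old-path,new-url" lines read from standard input into Apache
# Redirect directives. make_redirects and the two-column CSV layout are
# illustrative assumptions.
make_redirects() {
  awk -F',' '
    /^[ \t]*$/ { next }   # skip blank lines
    /^[ \t]*#/ { next }   # skip comment lines
    NF >= 2 { printf "Redirect 301 %s %s\n", $1, $2 }
  '
}
```

For example, make_redirects < sitemap.csv > redirects.txt (both filenames are placeholders) produces a block you can review and then paste above the WordPress stanza in your .htaccess file.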

Finally, there is a great redirection plugin, aptly called Redirection (http://wordpress.org/plugins/redirection/). This plugin, made by John Godley, incorporates your redirect settings directly in the WordPress Administration Screens. This can make it very easy for you to set up the necessary redirects if you are unfamiliar with editing .htaccess files. The redirection plugin also supports WordPress-based redirects in case your site is not on an Apache host.

This plugin has several other notable features, including support for regular expression redirects, 404 logging to track broken links, and several import and export functions. Monitoring your 404 pages is an excellent way to see what you missed in a migration, and it will also help you in the long term when you add new content, because it shows whether you mislinked something in your own site. Reviewing 404 logs is not a fun task, but it is something that our developers do daily to ensure sites are fully functional.

LAUNCHING

At some point you have to bite the bullet and launch your new site. You have done all the manual review and automated updates you can, and for better or for worse you are going to make the move to switch sites.

The actual steps for launching your site vary depending on how you did your import and new site development. You may need to make the actual DNS change. If this is the case, you will also need to change the URLs in the General Settings Screen of WordPress to the live site URLs. Until the DNS change propagates, this will break the rendering of the site, because WordPress now points at a domain that does not yet resolve to the new server. You can temporarily add define('RELOCATE',true); to the wp-config.php file to regain access to your site. Remember to pull this setting back out when the DNS finally propagates. If you set your internal links to root relative during the testing phase, you can now set them to use your live URL. This is also a final opportunity to validate that your planning and the migration process worked. You should verify each and every page on your website and double-check all functionality.

Some other things to consider when moving your WordPress site to production include checking the privacy settings so that your site is visible to search engines. Set the admin e-mail address to the proper value. Often, if you are moving someone else’s site, you develop the new site with your own e-mail address so the final user does not get confused. Finalize the database backup plugin to regularly back up the database, or if you have some other backup plan, put it into effect. Likewise, confirm the settings of other plugins that should be active in the live site; caching is a good example. Double-check your 404 handling. Finally, make sure your web traffic statistics, like Google Analytics, are enabled. We usually disable analytics in the development phase.

WP-CLI

The first section of this chapter covered the essentials: the building blocks and the nitty-gritty of the migration process. As you have seen in this book, we are teaching the fundamentals of being a professional WordPress developer. We feel it is important for you to understand the process of what you are trying to accomplish before you learn the shortcuts. If you are importing content from a non-WordPress source into WordPress, you are going to have to brute-force the migration through trial and error, and the previously mentioned tools will give you a starting point. However, if you are moving from one WordPress site to another, such as changing hosting servers or domain names, this section is going to show you an easier way, using a newer tool.

What Is WP-CLI?

WP-CLI is exactly what its name suggests—a command line interface for managing WordPress. It's a PHP application that provides useful tools to control and manipulate WordPress functionality without browser interaction, therefore making it scriptable.

WP-CLI is available online at http://wp-cli.org. Originally created by Andreas Creten and Cristi Burca, it is now a community project maintained by Daniel Bachhuber.

WP-CLI includes many commands that allow you, the site administrator or developer, to install WordPress and take control of the features, functionality, and content of the site. Furthermore, because it is a community project, many plugin authors have contributed additional commands that extend the capabilities of the application.

Installing WP-CLI

WP-CLI requires a UNIX-like environment to operate. That means you are restricted to using Linux, Mac OS X, or Cygwin on Windows; WP-CLI will not work on Windows without Cygwin. WP-CLI also requires PHP 5.3.2 or later, and WordPress 3.5.2 or later. To be clear, WP-CLI is an advanced tool.

Installing is simple, and there are steps on the website. First, download the installer from GitHub.com. Because you are using a UNIX-like environment, you can use wget or curl to download this package, such as:

curl -O https://raw.githubusercontent.com/wp-cli/builds/gh-pages/phar/wp-cli.phar

Most users want to alias the command wp to use WP-CLI. In order to use wp as the command instead of php wp-cli.phar, you need to make the file executable and relocate it to somewhere in your system path. For example:

chmod +x wp-cli.phar

sudo mv wp-cli.phar /usr/local/bin/wp

Once you have WP-CLI in your executable path, give it a try to make sure you have everything working correctly. Type wp --info on the command line to see some basic information about WP-CLI. For example:

vagrant@vvv:/srv/www/wordpress-trunk$ wp --info

PHP binary: /usr/bin/php5

PHP version: 5.5.9-1ubuntu4

php.ini used: /etc/php5/cli/php.ini

WP-CLI root dir: /srv/www/wp-cli

WP-CLI global config:

WP-CLI project config:

WP-CLI version: 0.16.0

Depending on your situation, you may need to make configuration changes for finding the correct versions of your LAMP stack or tab completions. See the WP-CLI website for more configuration steps and support.

MIGRATION EXAMPLE

This example focuses on taking an existing WordPress site, for example your local development site, and launching it on a production server, such as launching test.example.com as example.com. We are going to assume you have WP-CLI installed on both your development and production servers.

First, we will focus on your local development environment and gather the necessary elements for moving to your production server. Export the database from your development site using WP-CLI.

wp db export

This will give you an SQL file of your content and configuration from WordPress. When doing this manually, as shown earlier, you would want to hand-edit this file or run it through some regex substitutions to make the necessary URL changes. However, using WP-CLI we will save this step for later.

The other files you need are those of your custom theme. You can either zip or tarball your development files, or if you are deploying a theme from source code control, you can skip this step and check out your theme files into your production location. Copy the SQL export file and your theme to your production server via SFTP or whatever access you have. You can store them in your home directory for now, as we will be moving them into the proper location shortly.

Now, ssh into your production server. Note that you should have already set up your web server to serve your domain name and installed WordPress in the document root. (However, you can also use WP-CLI to install WordPress.)

Change to your WordPress directory and then import your SQL file into WordPress using WP-CLI. Your paths may be different, but the following offers an example:

cd /var/www/example.com/htdocs

wp db import ~/test_example_com.sql

It is very important that you do not copy and leave your SQL file in your webserver document root as it can be accessed via the web and you might leak sensitive information.

At this point, you can install any plugins needed. With WP-CLI, this is as simple as:

wp plugin install wp-super-cache --activate

This command will download and activate the WP Super Cache plugin.

Copy your theme files to the appropriate directory so the theme is available to your WordPress installation. Using WP-CLI, you can view which themes are available on your site. The command

wp theme list

will display a table showing all the themes available, which theme is active, and their current versions, like so:

vagrant@vvv:/srv/www/wordpress-trunk$ wp theme list

+-------------------------+----------+-----------+---------+

| name                    | status   | update    | version |

+-------------------------+----------+-----------+---------+

| child-of-twentyfourteen | inactive | none      | 1.0     |

| responsive              | inactive | available | 1.9.6.9 |

| thematic                | inactive | none      | 1.0.4   |

| twentyeleven            | inactive | none      | 1.8     |

| twentyfourteen          | active   | none      | 1.1     |

| twentyten               | inactive | none      | 1.6     |

| twentythirteen          | inactive | none      | 1.2     |

| twentytwelve            | inactive | none      | 1.4     |

+-------------------------+----------+-----------+---------+

Next, activate your custom theme with WP-CLI. In this example, you will activate the Child of Twenty Fourteen theme that you created in Chapter 9:

wp theme activate child-of-twentyfourteen

You now have all the basics in place with one thing missing: all the content and configuration still has the test domain set. This is where WP-CLI really shines. You can use WP-CLI to change the domain names in the WordPress database. As you will recall from the first part of this chapter, this is a time-consuming and error-prone step. With WP-CLI, it becomes super simple. Using the following command, you can run a test to see everywhere the URL information will be changed in the database without making any changes yet.

wp search-replace test.example.com example.com --dry-run

This command will display a comprehensive table of all the places this change will be made. To actually run the changes, remove the --dry-run flag. As you can see, this method is significantly easier than the manual method.
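
Because the dry run and the real run differ only by one flag, the two can be wrapped in a small shell function so that the safe form is the default. This is a minimal sketch: the function name switch_domain, the --apply argument, and the WP_BIN variable are illustrative assumptions of this example and are not part of WP-CLI. Only the wp search-replace command and its --dry-run flag come from WP-CLI itself.

```shell
#!/bin/sh
# Run "wp search-replace" as a dry run unless --apply is given.
# switch_domain, --apply, and WP_BIN are illustrative names; WP_BIN lets
# you point at a differently named or located wp executable.
WP_BIN="${WP_BIN:-wp}"

switch_domain() {
  old="$1"          # domain to search for, e.g. test.example.com
  new="$2"          # replacement domain, e.g. example.com
  mode="${3:-}"     # pass --apply to make real changes
  if [ -z "$old" ] || [ -z "$new" ]; then
    echo "usage: switch_domain OLD NEW [--apply]" >&2
    return 2
  fi
  if [ "$mode" = "--apply" ]; then
    "$WP_BIN" search-replace "$old" "$new"
  else
    "$WP_BIN" search-replace "$old" "$new" --dry-run
  fi
}
```

Run switch_domain test.example.com example.com from your WordPress directory to review the report, and repeat the command with --apply as a third argument once you are satisfied with what will change. Taking a fresh wp db export beforehand gives you a point to roll back to.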

This example is just the tip of the iceberg of all the commands available in WP-CLI. Furthermore, with the scriptable capabilities, full WordPress migration and management scripts can be developed with minimal effort.

SUMMARY

In conclusion, transitioning a website to WordPress can seem like a daunting task. However, when you break it down into steps and spend a little time planning up front, the process easily falls into place. The real trick is to establish a new development environment with a fresh WordPress installation, and then you can iterate over trial imports until you get an import that is far enough along to finish up with some manual fine-tuning. Remember that a little elbow grease will go a long way and in the end, the long-term reward of using WordPress and all the built-in and plugin functionality will make the endeavor worthwhile. In addition, the ecosystem of managing WordPress is expanding through tools like WP-CLI, which make the administration of WordPress simpler.

Now that you have your site functioning in WordPress, the next chapter will focus on how you can improve the user experience for both human and non-human visitors.