Programming Google App Engine with Python (2015)
On the Internet, popularity is swift and fleeting. A mention of your website on a popular news site can bring 300,000 potential customers your way at once, all expecting to find out who you are and what you have to offer. But if you’re a small company just starting out, your hardware and software aren’t likely to be able to handle that kind of traffic. You’ve sensibly built your site to handle the 30,000 visits per hour you’re actually expecting in your first six months. Under heavy load, such a system would be incapable of showing even your company logo to the 270,000 others that showed up to look around. And those potential customers are not likely to come back after the traffic has subsided.
The answer is not to spend time and money building a system to serve millions of visitors on the first day, when those same systems are only expected to serve mere thousands per day for the subsequent months. If you delay your launch to build big, you miss the opportunity to improve your product by using feedback from your customers. Building big early risks building something your customers don’t want.
Historically, small companies haven’t had access to large systems of servers on day one. The best they could do was to build small and hope that meltdowns wouldn’t damage their reputation as they tried to grow. The lucky ones found their audience, got another round of funding, and halted feature development to rebuild their product for larger capacity. The unlucky ones, well, didn’t.
These days, there are other options. Large Internet companies such as Amazon.com, Google, and Microsoft are leasing parts of their high-capacity systems by using a pay-per-use model. Your website is served from those large systems, which are plenty capable of handling sudden surges in traffic and ongoing success. And because you pay only for what you use, there is no up-front investment that goes to waste when traffic is low. As your customer base grows, the costs grow proportionally.
Google’s offering, collectively known as Google Cloud Platform, consists of a suite of high-powered services and tools: virtual machines in a variety of sizes, multiple forms of reliable data storage, configurable networking, automatic scaling infrastructure, and even the big data analysis tools that power Google’s products. But Google Cloud Platform does more than provide access to Google’s infrastructure. It encapsulates best practices for application architecture that have been honed by Google engineers for their own products.
The centerpiece of Google Cloud Platform is Google App Engine, an application hosting service that grows automatically. App Engine runs your application so that each user who accesses it gets the same experience as every other user, whether there are dozens of simultaneous users or thousands. Your application code focuses on each individual user’s experience. App Engine takes care of large-scale computing tasks—such as load balancing, data replication, and fault tolerance—automatically.
The scalable model really kicks in at the point where a traditional system would outgrow its first database server. With such a system, adding load-balanced web servers and caching layers can get you pretty far, but when your application needs to write data to more than one place, you face a difficult problem. This problem is made more difficult when development up to that point has relied on features of database software that were never intended for data distributed across multiple machines. By thinking about your data in terms of Cloud Platform’s model up front, you save yourself from having to rebuild the whole thing later.
Often overlooked as an advantage, App Engine’s execution model helps to distribute computation as well as data. App Engine excels at allocating computing resources to small tasks quickly. This capability was originally designed for handling web requests from users, where generating a response for the client is the top priority. By combining this execution model with Cloud Platform’s task queue service, you can break medium-to-large computational tasks into chunks that execute in parallel. Tasks are retried until they succeed, which makes them resilient in the face of service failures. The execution model encourages designs optimized for the parallelization and robustness provided by the platform.
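The chunk-and-retry pattern can be illustrated locally with a minimal sketch. This is an analogy only, assuming Python 3's standard `concurrent.futures` module; it does not use App Engine's task queue API, and the names `TransientError`, `process_chunk`, `run_until_success`, and `parallel_sum` are hypothetical. Failures are simulated deterministically, and retries are capped at `max_attempts` so the sketch always terminates, whereas the text describes tasks being retried until they succeed.

```python
from concurrent.futures import ThreadPoolExecutor


class TransientError(Exception):
    """Hypothetical stand-in for a temporary service failure."""


def process_chunk(chunk, attempt, failures_before_success):
    # Hypothetical unit of work: fails a fixed number of times, then sums its chunk.
    if attempt <= failures_before_success:
        raise TransientError("simulated failure on attempt %d" % attempt)
    return sum(chunk)


def run_until_success(chunk, failures_before_success, max_attempts=5):
    # Retry the chunk until it succeeds; capped so this sketch cannot loop forever.
    for attempt in range(1, max_attempts + 1):
        try:
            return process_chunk(chunk, attempt, failures_before_success), attempt
        except TransientError:
            if attempt == max_attempts:
                raise


def parallel_sum(data, chunk_size=100):
    # Split the job into independent chunks and run them concurrently.
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    with ThreadPoolExecutor(max_workers=4) as pool:
        # Chunk i fails (i % 3) times before succeeding, to exercise the retry path.
        futures = [pool.submit(run_until_success, chunk, i % 3)
                   for i, chunk in enumerate(chunks)]
        results = [f.result() for f in futures]
    total = sum(value for value, _ in results)
    attempts = [n for _, n in results]
    return total, attempts


total, attempts = parallel_sum(list(range(1000)))
print(total)     # 499500
print(attempts)  # [1, 2, 3, 1, 2, 3, 1, 2, 3, 1]
```

Because each chunk is independent, a failure only costs a retry of that one chunk rather than a restart of the whole job, which is the property the task queue design relies on.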
Running on Google’s infrastructure means you never have to set up a server, replace a failed hard drive, or troubleshoot a network card. You don’t have to be woken up in the middle of the night by a screaming pager because an ISP hiccup confused a service alarm. And with automatic scaling, you don’t have to scramble to set up new hardware as traffic increases.
Google Cloud Platform and App Engine let you focus on your application’s functionality and user experience. You can launch early, enjoy the flood of attention, retain customers, and start improving your product with the help of your users. Your app grows with the size of your audience—up to Google-sized proportions—without having to rebuild for a new architecture. Meanwhile, your competitors are still putting out fires and configuring databases.
With this book, you will learn how to develop web applications that run on Google Cloud Platform, and how to get the most out of App Engine’s scalable execution model. A significant portion of the book discusses Google Cloud Datastore, a powerful data storage service that does not behave like the relational databases that have been a staple of web development for the past decade. The application model and the datastore together represent a new way of thinking about web applications that, while being almost as simple as the model we’ve known, requires reconsidering a few principles we often take for granted.
A Brief History of App Engine
If you read all that, you may be wondering why this book is called Programming Google App Engine and not Programming Google Cloud Platform. The short answer is that the capabilities of the platform as a whole are too broad for one book. In particular, Compute Engine, the platform’s raw virtual machine capability, can do all kinds of stuff beyond serving web applications.
By some accounts (mine, at least), App Engine started as an early rendition of the Cloud Platform idea, and evolved and expanded to include large-scale and flexible-scale computing. When it first launched in 2008, App Engine hosted web applications written in Python, with APIs for a scalable datastore, a task queue service, and services for common features that lay outside of the “container” in which the app code would run (such as network access). A “runtime environment” for Java soon followed, capable of running web apps based on Java servlets using the same scalable infrastructure. Containerized app code, schemaless data storage, and service-oriented architecture proved to be not only a good way to build a scalable web app, but a good way to make reliability a key part of the App Engine product: no more pagers.
App Engine evolved continuously, with several major functionality milestones. One such milestone was a big upgrade for the datastore, using a new Paxos-based replication algorithm. The new algorithm changed the data consistency guarantees of the API, so it was released as an opt-in migration (including an automatic migration tool). Another major milestone was the switch from isolated request handlers billed by CPU usage to long-running application instances billed by instance uptime. With the upgraded execution model, app code could push “warm-up” work to occur outside of user request logic and exploit local memory caches.
Google launched Compute Engine as a separate product, a way to access computation on demand for general purposes. With a Compute Engine VM, you can run any 64-bit Linux-based operating system and execute code written in any language compiled to (or interpreted by) that OS. Apps—running on App Engine or otherwise—can call into Compute Engine to start up any number of virtual machines, do work, and either shut down machines when no longer needed or leave them running in traditional or custom configurations.
App Engine and Compute Engine take different approaches to provide different capabilities. But these technologies are already starting to blend. In early 2014, Google announced Managed VMs, a new way to run VM-based code in an App Engine-like way. (This feature is not fully available as I write this, but check the Google Cloud Platform website for updates.) Overall, you’re able to adopt as much of the platform as you need to accomplish your goals, investing in flexibility when needed, and letting the platform’s automaticity handle the rest.
This book is being written at a turning point in App Engine’s history. Services that were originally built for App Engine are being generalized for Cloud Platform, and given REST APIs so you can call them from off the platform as well. App Engine development tools are being expanded, with a new universal Cloud SDK and Cloud Console. We’re even seeing the beginnings of new ways to develop and deploy software, with integrated Git-based source code revision control. As with any book about an evolving technology, what follows is a snapshot, with an emphasis on major concepts and long-lasting topics.
The focus of this book is building web applications using App Engine and related parts of the platform, especially Cloud Datastore. We’ll discuss services currently exclusive to App Engine, such as those for fetching URLs and sending email. We’ll also discuss techniques for organizing and optimizing your application, using task queues and offline processes, and otherwise getting the most out of Google App Engine.
Using This Book
Programming Google App Engine with Python covers App Engine’s runtime environment for the Python programming language. The Python runtime environment provides a fast interpreter for the Python language, and includes Python libraries for all of App Engine’s features. It is compatible with many major open source web application frameworks, such as Django and Flask.
App Engine supports three other runtime environments: Java, PHP, and Go. Java support includes a complete Java servlet environment, with a JVM capable of running bytecode produced by compilers for Java and other languages. The PHP environment runs a native PHP interpreter with the standard library and many extensions enabled, and is capable of running many off-the-shelf PHP applications such as WordPress and Drupal. With the Go runtime environment, App Engine compiles your Go code on the server and executes it at native CPU speeds.
The information contained in this book was formerly presented in a single volume, Programming Google App Engine, which also covered Java. To make it easy to find the information you need for your language, that book has been split into language-specific versions. You are reading the Python version. Programming Google App Engine with Java covers the same material using the Java language, as well as Java-specific topics.
We are considering PHP and Go versions of this book as a future endeavor. For now, the official App Engine documentation is the best resource for using these languages on the platform. If you’re interested in seeing versions of this book for PHP or Go, let us know by sending email to email@example.com.
The book is organized so you can jump to the subjects that are most relevant to you. The introductory chapters provide a lay of the land, and get you working with a complete example that uses several features. Subsequent chapters are arranged by App Engine’s various features, with a focus on efficient data storage and retrieval, communication, and distributed computation. Project life cycle topics such as deployment and maintenance are also covered.
Cloud Datastore is a large enough subject that it gets multiple chapters to itself. Starting with Chapter 6, datastore concepts are introduced alongside Python APIs related to those concepts. Python examples use the ndb data modeling library, provided in the Cloud SDK. Data modeling gets its own chapter, in Chapter 9.
Here’s a quick look at the chapters in this book:
Chapter 1, Introducing Google App Engine
A high-level overview of Google App Engine and its components, tools, and major features, as well as an introduction to Google Cloud Platform as a whole.
Chapter 2, Creating an Application
An introductory tutorial in Python, including instructions on setting up a development environment, using template engines to build web pages, setting up accounts and domain names, and deploying the application to App Engine. The tutorial application demonstrates the use of several App Engine features—Google Accounts, the datastore, and memcache—to implement a pattern common to many web applications: storing and retrieving user preferences.
Chapter 3, Configuring an Application
A description of how App Engine handles incoming requests, and how to configure this behavior. This introduces App Engine’s architecture, the various features of the frontend, app servers, and static file servers. We explain how the frontend routes requests to the app servers and the static file servers, and manages secure connections and Google Accounts authentication and authorization. This chapter also discusses quotas and limits, and how to raise them by setting a budget.
Chapter 4, Request Handlers and Instances
A closer examination of how App Engine runs your code. App Engine routes incoming web requests to request handlers. Request handlers run in long-lived containers called instances. App Engine creates and destroys instances to accommodate the needs of your traffic. You can make better use of your instances by writing threadsafe code and enabling the multithreading feature.
Chapter 5, Using Modules
Modules let you build your application as a collection of parts, where each part has its own scaling properties and performance characteristics. This chapter describes modules in full, including the various scaling options, configuration, and the tools and APIs you use to maintain the modules of your app.
Chapter 6, Datastore Entities
The first of several chapters on Cloud Datastore, a scalable object data storage system with support for local transactions and two modes of consistency guarantees (strong and eventual). This chapter introduces data entities, keys and properties, and Python APIs for creating, updating, and deleting entities from App Engine.
Chapter 7, Datastore Queries
An introduction to Cloud Datastore queries and indexes, and the Python APIs for queries. This chapter describes the features of the query engine in detail, and how each feature uses indexes. The chapter also discusses how to define and manage indexes for your application’s queries. Advanced features like query cursors and projection queries are also covered.
Chapter 8, Datastore Transactions
How to use transactions to keep your data consistent. Cloud Datastore uses local transactions in a scalable environment. Your app arranges its entities in units of transactionality known as entity groups. This chapter attempts to provide a complete explanation of how the datastore updates data, and how to design your data and your app to best take advantage of these features.
Chapter 9, Data Modeling with ndb
How to use the Python ndb data modeling library to enforce invariants in your data schema. The datastore itself is schemaless, a fundamental aspect of its scalability. You can automate the enforcement of data schemas by using App Engine’s data modeling interface.
Chapter 10, Datastore Administration
Managing and evolving your app’s datastore data. The Cloud Console, SDK tools, and administrative APIs provide a myriad of views of your data, and information about your data (metadata and statistics). You can access much of this information programmatically, so you can build your own administration panels. This chapter also discusses how to use the Remote API, a proxy for building administrative tools that run on your local computer but access the live services for your app.
Chapter 11, Using Google Cloud SQL with App Engine
Google Cloud SQL provides fully managed MySQL database instances. You can use Cloud SQL as a relational database for your App Engine applications. This chapter walks through an example of creating a SQL instance, setting up a database, preparing a local development environment, and connecting to Cloud SQL from App Engine. We also discuss prominent features of Cloud SQL such as backups, and exporting and importing data. Cloud SQL complements Cloud Datastore and Cloud Storage as a new choice for persistent storage, and is a powerful option when you need a relational database.
Chapter 12, The Memory Cache
App Engine’s memory cache service (“memcache”), and its Python APIs. Aggressive caching is essential for high-performance web applications.
Chapter 13, Fetching URLs and Web Resources
How to access other resources on the Internet via HTTP by using the URL Fetch service. Python applications can call this service using a direct API as well as via Python’s standard library.
Chapter 14, Sending and Receiving Email Messages
How to use App Engine services to send email. This chapter covers receiving email relayed by App Engine by using request handlers. It also discusses creating and processing messages by using tools in the API.
Chapter 15, Sending and Receiving Instant Messages with XMPP
How to use App Engine services to send instant messages to XMPP-compatible services (such as Google Talk), and receive XMPP messages via request handlers. This chapter discusses several major XMPP activities, including managing presence.
Chapter 16, Task Queues and Scheduled Tasks
How to perform work outside of user requests by using task queues. Task queues perform tasks in parallel by running your code on multiple application servers. You control the processing rate with configuration. Tasks can also be executed on a regular schedule with no user interaction.
Chapter 17, Optimizing Service Calls
A summary of optimization techniques, plus detailed information on how to make asynchronous service calls, so your app can continue doing work while services process data in the background. This chapter also describes AppStats, an important tool for visualizing your app’s service call behavior and finding performance bottlenecks.
Chapter 18, The Django Web Application Framework
How to use the Django web application framework with the Python runtime environment. This chapter discusses setting up a project by using the Django 1.5 library included in the runtime environment, and using Django features such as component composition, URL mapping, views, and templating. It also discusses how to use a newer version of Django than what is built into the runtime environment. The chapter introduces WTForms, a web form framework with special features for integrating with App Engine’s ndb data modeling library.
Chapter 19, Managing Request Logs
Everything you need to know about logging messages, browsing and searching log data in the Cloud Console, and managing and downloading log data. This chapter also introduces the Logs API, which lets you manage logs programmatically within the app itself.
Chapter 20, Deploying and Managing Applications
How to upload and run your app on App Engine, how to update and test an application using app versions, and how to manage and inspect the running application. This chapter also introduces other maintenance features of the Cloud Console, including billing. The chapter concludes with a list of places to go for help and further reading.
Conventions Used in This Book
The following typographical conventions are used in this book:
Italic
Indicates new terms, URLs, email addresses, filenames, and file extensions.
Constant width
Used for program listings, as well as within paragraphs to refer to program elements such as variable or function names, databases, data types, environment variables, statements, and keywords.
Constant width bold
Shows commands or other text that should be typed literally by the user.
Constant width italic
Shows text that should be replaced with user-supplied values or by values determined by context.
This icon signifies a tip, suggestion, or general note.
This icon indicates a warning or caution.