Writing Idiomatic Python (2013)

6. Organizing Your Code

6.1 Formatting

6.1.1 Use all capital letters when declaring global constant values

To distinguish constants defined at the module level (or global in a single script) from imported names, use all uppercase letters.

6.1.1.1 Harmful

seconds_in_a_day = 60 * 60 * 24

# ...

def display_uptime(uptime_in_seconds):
    percentage_run_time = (
        uptime_in_seconds/seconds_in_a_day) * 100
    # "Huh!? Where did seconds_in_a_day come from?"
    return 'The process was up {percent} percent of the day'.format(
        percent=int(percentage_run_time))

# ...

uptime_in_seconds = 60 * 60 * 24
display_uptime(uptime_in_seconds)

6.1.1.2 Idiomatic

SECONDS_IN_A_DAY = 60 * 60 * 24

# ...

def display_uptime(uptime_in_seconds):
    percentage_run_time = (
        uptime_in_seconds/SECONDS_IN_A_DAY) * 100
    # "Clearly SECONDS_IN_A_DAY is a constant defined
    # elsewhere in this module."
    return 'The process was up {percent} percent of the day'.format(
        percent=int(percentage_run_time))

# ...

uptime_in_seconds = 60 * 60 * 24
display_uptime(uptime_in_seconds)

6.1.2 Format your code according to PEP8

Python has a standard set of formatting rules, a style guide known as PEP8. If you’re browsing commit messages on Python projects, you’ll likely find them littered with references to PEP8 cleanup. The reason is simple: if we all agree on a common set of naming and formatting conventions, Python code as a whole becomes instantly more accessible to both novice and experienced developers. PEP8 is perhaps the most explicit example of idioms within the Python community. Read the PEP, install a PEP8 style-checking plugin for your editor (they all have one), and start writing your code in a way that other Python developers will appreciate. Listed below are a few examples.

Unless wildly unreasonable, abbreviations should not be used (acronyms are fine if in common use, like ‘HTTP’)

Identifier Type    Format               Example
Class              Camel case           class StringManipulator():
Variable           Words joined by _    joined_by_underscore = True
Function           Words joined by _    def multi_word_name(words):
Constant           All uppercase        SECRET_KEY = 42

Basically everything not listed should follow the variable/function naming conventions of ‘Words joined by an underscore’.
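
As a minimal sketch of the conventions in the table, the snippet below uses each identifier type once. The names MAX_RETRIES, RetryPolicy, should_retry, and default_policy are hypothetical and exist only to illustrate the casing rules.

```python
# All names here are hypothetical; they only illustrate the casing rules.

MAX_RETRIES = 3  # constant: all uppercase


class RetryPolicy(object):  # class: camel case
    """Decide whether a failed operation should be attempted again."""

    def __init__(self, max_retries=MAX_RETRIES):
        self.max_retries = max_retries  # attribute: words joined by _

    def should_retry(self, attempt_count):  # method: words joined by _
        """Return True if another attempt is allowed."""
        return attempt_count < self.max_retries


default_policy = RetryPolicy()  # variable: words joined by _
print(default_policy.should_retry(2))
```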

6.1.3 Avoid placing multiple statements on a single line

Though the language definition allows one to use ; to separate statements, doing so without reason makes one’s code harder to read. When multiple statements occur on the same line as an if, else, or elif, the situation is even more confusing.

6.1.3.1 Harmful

if this_is_bad_code: rewrite_code(); make_it_more_readable();

6.1.3.2 Idiomatic

if this_is_bad_code:
    rewrite_code()
    make_it_more_readable()

6.2 Documentation

6.2.1 Follow the docstring conventions described in PEP-257

Given that Python has an official stance on code formatting, it should come as no surprise that a similar set of recommendations exists for documentation. In particular, PEP-257 sets forth rules for writing and formatting a docstring. A docstring is (according to PEP-257), “a string literal that occurs as the first statement in a module, function, class, or method definition.” Basically, it’s a line or set of lines enclosed in triple-quotes which immediately follows def or class, or resides at the top of a file.

Writing a “good” docstring, like writing good documentation in general, takes practice. Following the rules in PEP-257 will help a good deal. Two in particular will get you 90% of the way there: give everything public or exported a docstring, and format those docstrings properly.

Writing documentation for all of a class’s public methods and everything exported by a module (including the module itself) may seem like overkill, but there’s a very good reason to do it: helping documentation tools. Third-party software like Sphinx is able to automatically generate documentation (in formats like HTML, LaTeX, man pages, etc.) from properly documented code. If you decide to use such a tool, all of your classes’ public methods and all exported functions in a module will automatically have entries in the generated documentation. If you’ve only written documentation for half of these, the documentation is far less useful to an end user. Imagine trying to read the official Python documentation for itertools if half of the functions listed only their signature and nothing more.

In addition, this is one of those rules that helps remove a cognitive burden on the programmer. By following this rule, you never have to ask yourself “does this function merit a docstring?” or try to determine the threshold for documentation. Just follow the rule and don’t worry about it. Of course, use common sense if there’s a good reason not to write documentation for something.

The formatting rules help both documentation tools and IDEs. Using a predictable structure to your documentation allows it to be parsed in a useful way. For example, the first line of a docstring should be a one-sentence summary. If more lines are necessary, they are separated from the first line by a blank line. This allows documentation tools and IDEs to present a summary of the code in question and hide more detailed documentation if it’s not needed. There’s really no good reason not to follow the formatting rules (that I can think of), and you’re only helping yourself by doing so.

6.2.1.1 Harmful

def calculate_statistics(value_list):
    # calculates various statistics for a list of numbers
    <The body of the function>

6.2.1.2 Idiomatic

def calculate_statistics(value_list):
    """Return the mean, median, and mode of a list of integers as a tuple.

    Arguments:
    value_list -- a list of integer values

    """
    <The body of the function>
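
To illustrate why the summary-line convention matters, here is a small sketch of how a tool might pull the summary out of a docstring. It uses inspect.getdoc from the standard library. The functions find_extremes and summary_line are hypothetical and exist only for this example.

```python
import inspect


def find_extremes(value_list):
    """Return the minimum and maximum of a list of integers as a tuple.

    Arguments:
    value_list -- a non-empty list of integer values

    """
    return (min(value_list), max(value_list))


def summary_line(obj):
    """Return the first line of obj's docstring, or '' if there is none."""
    doc = inspect.getdoc(obj)  # getdoc also strips the docstring's indentation
    return doc.splitlines()[0] if doc else ''


print(summary_line(find_extremes))
```

Because the summary sits alone on the first line, a tool can show it without parsing the rest of the docstring.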

6.2.2 Use Inline Documentation Sparingly

Novice programmers, if they document at all, tend to over-document their code. Writing an appropriate docstring is one thing, but writing inline comments about almost every line of code is quite another. Too much documentation is more of a burden on readers and maintainers than none at all. Your goal should be to write self-documenting code. Idiomatic Python is so clear that it reads as if it were documentation. If you frequently find the need to document a single line or small set of lines, it’s an indication your code isn’t as clear as it should be. Worse, if you come back and make changes in a month, you need to remember to change both the documentation and the code. The only thing worse than too much documentation is wrong documentation.

6.2.2.1 Harmful

def calculate_mean(numbers):
    """Return the mean of a list of numbers"""
    # If the list is empty, we have no mean!
    if not numbers:
        return 0
    # A variable to keep track of the running sum
    total = 0
    # Iterate over each number in the list
    for number in numbers:
        total += number
    # Divide the sum of all the numbers by how
    # many numbers were in the list
    # to arrive at the sum. Return this value.
    return total / len(numbers)

6.2.2.2 Idiomatic

def calculate_mean(numbers):
    """Return the mean of a list of numbers"""
    return sum(numbers) / len(numbers)
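
Note that the one-line version above raises ZeroDivisionError for an empty list, whereas the longer version returned 0. If that behavior needs to be retained, a sketch like the following keeps the guard while still avoiding line-by-line comments. Returning 0 for empty input is carried over from the longer example; it is not a general recommendation.

```python
def calculate_mean(numbers):
    """Return the mean of a list of numbers, or 0 if the list is empty."""
    if not numbers:
        return 0
    return sum(numbers) / len(numbers)


print(calculate_mean([2, 4, 9]))
print(calculate_mean([]))
```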

6.2.3 Document What Something Does, Not How

When writing a docstring, be sure to document what the code does rather than how it does it. Readers of the documentation don’t need to know how a function works to use it. Indeed, describing how a function works in its documentation creates a leaky abstraction. Further, it increases the probability that the code and documentation will diverge at some point. To see why, look at the harmful example below; changing the method for determining if a number is prime requires a corresponding change to the function’s docstring. If the docstring is not updated, the documentation now says one thing but the code says another. The idiomatic example requires no documentation changes when the underlying implementation changes.

6.2.3.1 Harmful

def is_prime(number):
    """Mod all numbers from 2 -> number and return True
    if the value is never 0"""
    for candidate in range(2, number):
        if number % candidate == 0:
            print(candidate)
            print(number % candidate)
            return False
    # 0 and 1 are not prime
    return number > 1

6.2.3.2 Idiomatic

def is_prime(number):
    """Return True if number is prime"""
    for candidate in range(2, number):
        if number % candidate == 0:
            return False
    # 0 and 1 are not prime
    return number > 1
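
To make the point concrete, here is a sketch of a different implementation that only tests candidates up to the square root of number. It assumes Python 3.8 or later for math.isqrt. The docstring is identical to the idiomatic one even though the method has changed.

```python
import math


def is_prime(number):
    """Return True if number is prime"""
    if number < 2:
        return False
    # A composite number always has a divisor no larger than its square root.
    for candidate in range(2, math.isqrt(number) + 1):
        if number % candidate == 0:
            return False
    return True


print([n for n in range(20) if is_prime(n)])
# prints [2, 3, 5, 7, 11, 13, 17, 19]
```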

6.3 Imports

6.3.1 Arrange your import statements in a standard order

As projects grow (especially those using web frameworks), so does the number of import statements. Put all of your import statements at the top of each file, choose a standard order for them, and stick with it. While the actual ordering is not as important, the following is the order recommended by Python’s Programming FAQ:

1.    standard library modules

2.    third-party library modules installed in site-packages

3.    modules local to the current project

Many choose to arrange the imports in (roughly) alphabetical order. Others think that’s ridiculous. In reality, it doesn’t matter. What matters is that you do choose a standard order (and follow it, of course).

6.3.1.1 Harmful

import os.path

# Some function and class definitions,
# one of which uses os.path
# ....

import concurrent.futures
from flask import render_template

# Stuff using futures and Flask's render_template
# ....

from flask import (Flask, request, session, g,
    redirect, url_for, abort,
    render_template, flash, _app_ctx_stack)
import requests

# Code using flask and requests
# ....

if __name__ == '__main__':
    # Imports when imported as a module are not so
    # costly that they need to be relegated to inside
    # an "if __name__ == '__main__'" block...
    import this_project.utilities.sentient_network as skynet
    import this_project.widgets
    import sys

6.3.1.2 Idiomatic

# Easy to see exactly what my dependencies are and where to
# make changes if a module or package name changes
import concurrent.futures
import os.path
import sys

from flask import (Flask, request, session, g,
    redirect, url_for, abort,
    render_template, flash, _app_ctx_stack)
import requests

import this_project.utilities.sentient_network as skynet
import this_project.widgets

6.3.2 Prefer absolute imports to relative imports

When importing a module, you have two choices of the import “style” to use: absolute imports or relative imports. Absolute imports specify a module’s location (like <package>.<module>.<submodule>) from a location which is reachable from sys.path.

Relative imports specify a module relative to the current module’s location on the file system. If you are the module package.sub_package.module and need to import package.other_module, you can do so using the dotted relative import syntax: from ..other_module import foo. A single . represents the current package a module is contained in (like in a file system). Each additional . is taken to mean “the parent package of”, one level per dot. Note that relative imports must use the from ... import ... style. import foo is always treated as an absolute import.

Alternatively, using an absolute import, you would write import package.other_module (possibly with an as clause to alias the module to a shorter name).

Why, then, should you prefer absolute imports to relative? Because relative imports must use the from ... import ... form, they bind names directly into your module’s namespace. By writing from foo import bar, you’ve bound the name bar in your module’s namespace. To those reading your code, it will not be clear where bar came from, especially if used in a complicated function or large module. foo.bar, however, makes it perfectly clear where bar is defined. The Python programming FAQ goes so far as to say, “Never use relative package imports.”

6.3.2.1 Harmful

# My location is package.sub_package.module
# and I want to import package.other_module.
# The following should be avoided:
from .. import other_module

6.3.2.2 Idiomatic

# My location is package.sub_package.module
# and I want to import package.other_module.
# Either of the following is acceptable:
import package.other_module
import package.other_module as other
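
As a small standard-library illustration of the same idea, the sketch below imports a nested module absolutely and aliases it. The choice of xml.etree.ElementTree and the sample XML string are assumptions made for the example.

```python
# Absolute import of a nested standard-library module, aliased with "as".
import xml.etree.ElementTree as ET

# Every use is prefixed with the alias, so the origin of fromstring is obvious.
root = ET.fromstring('<package><module name="other_module"/></package>')
module_names = [child.get('name') for child in root]
print(module_names)
# prints ['other_module']
```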

6.3.3 Do not use from foo import * to import the contents of a module.

Considering the previous idiom, this one should be obvious. Using an asterisk in an import (as in from foo import *) is an easy way to clutter your namespace. This may even cause issues if there are clashes between names you define and those defined in the package.

But what if you have to import a number of names from the foo package? Simple. Make use of the fact that parentheses can be used to group the names in an import statement. You won’t have to write 10 lines of import statements from the same module, and your namespace remains (relatively) clean.

Better yet, simply import the module itself (import foo) and refer to its contents as foo.bar. If the package/module name is too long, use an as clause to shorten it.

6.3.3.1 Harmful

from foo import *

6.3.3.2 Idiomatic

from foo import (bar, baz, qux,
        quux, quuux)

# or even better...
import foo
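
One well-known clash in the standard library shows how a star import can bite: os defines its own open function, so from os import * replaces the built-in open in the importing namespace. The sketch below assumes Python 3 (for the builtins module) and runs the star import inside a scratch namespace via exec so the effect can be inspected without polluting the current module. The names scratch_namespace and shadowed_open are introduced only for this example.

```python
import builtins
import os

# Run the star import in a throwaway namespace so this module stays clean.
scratch_namespace = {}
exec('from os import *', scratch_namespace)

# The name "open" now refers to os.open, not the built-in open.
shadowed_open = scratch_namespace['open']
print(shadowed_open is os.open)
print(shadowed_open is builtins.open)
```

With import os instead, the built-in open is untouched and the other function is spelled os.open, so there is no ambiguity.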

6.4 Modules and Packages

6.4.1 Use modules for encapsulation where other languages would use Objects

While Python certainly supports Object Oriented programming, it does not require it. Most experienced Python programmers (and programmers in general using a language that facilitates it) use classes and polymorphism relatively sparingly. There are a number of reasons why.

Most data that would otherwise be stored in a class can be represented using the simple list, dict, and set types. Python has a wide variety of built-in functions and standard library modules optimized (both in design and implementation) to interact with them. One can make a compelling case that classes should be used only when necessary and almost never at API boundaries.

In Java, classes are the basic unit of encapsulation. Each file represents a Java class, regardless of whether that makes sense for the problem at hand. If I have a handful of utility functions, into a “Utility” class they go! If we don’t intuitively understand what it means to be a “Utility” object, no matter. Of course, I exaggerate, but the point is clear. Once one is forced to make everything a class, it is easy to carry that notion over to other programming languages.

In Python, groups of related functions and data are naturally encapsulated in modules. If I’m using an MVC web framework to build “Chirp”, I may have a package named chirp with model, view, and controller modules. If “Chirp” is an especially ambitious project and the code base is large, those modules could easily be packages themselves. The controller package may have a persistence module and a processing module. Neither of those need be related in any way other than the sense that they intuitively belong under controller.

If all of those modules became classes, interoperability immediately becomes an issue. We must carefully and precisely determine the methods we will expose publicly, how state will be updated, and the way in which our class supports testing. And instead of a dict or list, we have Processing and Persistence objects we must write code to support.

Note that nothing in the description of “Chirp” necessitates the use of any classes. Simple import statements make code sharing and encapsulation easy. Passing state explicitly as arguments to functions keeps everything loosely coupled. And it becomes far easier to receive, process, and transform data flowing through our system.

To be sure, classes may be a cleaner or more natural way to represent some “things”. In many instances, Object Oriented Programming is a handy paradigm. Just don’t make it the only paradigm you use.
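
A minimal sketch of this style, assuming a hypothetical processing module in the “Chirp” controller package. The function names, the constant, and the shape of the message dict are invented for illustration; the point is that plain functions receive and return built-in types, with state passed explicitly.

```python
# Hypothetical contents of chirp/controller/processing.py.
# Messages are plain dicts, so any caller can build, inspect, or test them
# without needing a dedicated class.

MAX_MESSAGE_LENGTH = 140


def normalize_message(message):
    """Return a copy of message with its text stripped and truncated."""
    text = message['text'].strip()[:MAX_MESSAGE_LENGTH]
    return dict(message, text=text)


def extract_mentions(message):
    """Return the user names mentioned in message, in order of appearance."""
    words = message['text'].split()
    return [word[1:] for word in words
            if word.startswith('@') and len(word) > 1]


raw_message = {'author': 'alice', 'text': '  hello @bob and @carol  '}
clean_message = normalize_message(raw_message)
print(clean_message['text'])
print(extract_mentions(clean_message))
```

Another module would reach these functions with a plain import and call them with whatever dict it has on hand; nothing needs to be instantiated first.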

6.5 Executable Scripts

6.5.1 Use the if __name__ == '__main__' pattern to allow a file to be both imported and run directly

Unlike the main() function available in some languages, Python has no built-in notion of a main entry point. Rather, the interpreter immediately begins executing statements upon loading a Python source file. If you want a file to function both as an importable Python module and a stand-alone script, use the if __name__ == '__main__' idiom.

6.5.1.1 Harmful

import sys
import os

FIRST_NUMBER = float(sys.argv[1])
SECOND_NUMBER = float(sys.argv[2])

def divide(a, b):
    return a/b

# I can't import this file (for the super
# useful 'divide' function) without the following
# code being executed.
if SECOND_NUMBER != 0:
    print(divide(FIRST_NUMBER, SECOND_NUMBER))

6.5.1.2 Idiomatic

import sys
import os

def divide(a, b):
    return a/b

# Will only run if script is executed directly,
# not when the file is imported as a module
if __name__ == '__main__':
    first_number = float(sys.argv[1])
    second_number = float(sys.argv[2])
    if second_number != 0:
        print(divide(first_number, second_number))

6.5.2 Use sys.exit in your script to return proper error codes

Python scripts should be good shell citizens. It’s tempting to jam a bunch of code after the if __name__ == '__main__' statement and not return anything. Avoid this temptation.

Create a main function that contains the code to be run as a script. Use sys.exit in main to return error codes if something goes wrong or zero if everything runs to completion. The only code under the if __name__ == '__main__' statement should call sys.exit with the return value of your main function as the parameter.

By doing this, we allow the script to be used in Unix pipelines, to be monitored for failure without needing custom rules, and to be called by other programs safely.

6.5.2.1 Harmful

if __name__ == '__main__':
    import sys
    # What happens if no argument is passed on the
    # command line?
    if len(sys.argv) > 1:
        argument = sys.argv[1]
        result = do_stuff(argument)
        # Again, what if this is False? How would other
        # programs know?
        if result:
            do_stuff_with_result(result)

6.5.2.2 Idiomatic

# Imported at module level so that sys is also available
# to the entry point lines at the bottom of the file
import sys

def main():
    if len(sys.argv) < 2:
        # Calling sys.exit with a string automatically
        # prints the string to stderr and exits with
        # a value of '1' (error)
        sys.exit('You forgot to pass an argument')
    argument = sys.argv[1]
    result = do_stuff(argument)
    if not result:
        # We can also exit with just the return code
        sys.exit(1)
    do_stuff_with_result(result)
    # Optional, since the return value without this return
    # statement would default to None, which sys.exit treats
    # as 'exit with 0'
    return 0

# The two lines below are the canonical script entry
# point lines. You'll see them often in other Python scripts
if __name__ == '__main__':
    sys.exit(main())
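
A variation on this pattern, offered as a sketch rather than the canonical form: letting main accept the argument list and return an exit code, instead of calling sys.exit itself, makes the function easy to call from tests. The helper do_stuff is a stand-in defined only for this example, as is the choice of 2 as the code for a usage error.

```python
import sys


def do_stuff(argument):
    """Stand-in for real work: return the argument upper-cased ('' if blank)."""
    return argument.strip().upper()


def main(argv=None):
    """Run the script and return an exit code (0 on success)."""
    if argv is None:
        argv = sys.argv
    if len(argv) < 2:
        sys.stderr.write('You forgot to pass an argument\n')
        return 2
    result = do_stuff(argv[1])
    if not result:
        return 1
    print(result)
    return 0


# In a real script the entry point would remain:
#     if __name__ == '__main__':
#         sys.exit(main())
# Here main is called directly with explicit argument lists instead.
success_code = main(['script.py', 'hello'])
usage_error_code = main(['script.py'])
```

Since main only returns integers, a test can assert on the result without having to catch SystemExit.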