Learning Python (2013)

Part V. Modules and Packages

Chapter 24. Module Packages

So far, when we’ve imported modules, we’ve been loading files. This represents typical module usage, and it’s probably the technique you’ll use for most imports you’ll code early on in your Python career. However, the module import story is a bit richer than I have thus far implied.

In addition to a module name, an import can name a directory path. A directory of Python code is said to be a package, so such imports are known as package imports. In effect, a package import turns a directory on your computer into another Python namespace, with attributes corresponding to the subdirectories and module files that the directory contains.

This is a somewhat advanced feature, but the hierarchy it provides turns out to be handy for organizing the files in a large system and tends to simplify module search path settings. As we’ll see, package imports are also sometimes required to resolve import ambiguities when multiple program files of the same name are installed on a single machine.

Because it is relevant to code in packages only, we’ll also introduce Python’s recent relative imports model and syntax here. As we’ll see, this model modifies search paths in 3.X, and extends the from statement for imports within packages in both 2.X and 3.X. This model can make such intrapackage imports more explicit and succinct, but comes with some tradeoffs that can impact your programs.

Finally, for readers using Python 3.3 and later, its new namespace package model—which allows packages to span multiple directories and requires no initialization file—is also introduced here. This new-style package model is optional and can be used in concert with the original (now known as “regular”) package model, but it upends some of the original model’s basic ideas and rules. Because of that, we’ll explore regular packages here first for all readers, and present namespace packages last as an optional topic.

Package Import Basics

At a base level, package imports are straightforward—in the place where you have been naming a simple file in your import statements, you can instead list a path of names separated by periods:

import dir1.dir2.mod

The same goes for from statements:

from dir1.dir2.mod import x

The “dotted” path in these statements is assumed to correspond to a path through the directory hierarchy on your computer, leading to the file mod.py (or similar; the extension may vary). That is, the preceding statements indicate that on your machine there is a directory dir1, which has a subdirectory dir2, which contains a module file mod.py (or similar).

Furthermore, these imports imply that dir1 resides within some container directory dir0, which is a component of the normal Python module search path. In other words, these two import statements imply a directory structure that looks something like this (shown with Windows backslash separators):

dir0\dir1\dir2\mod.py               # Or mod.pyc, mod.so, etc.

The container directory dir0 needs to be added to your module search path unless it’s the home directory of the top-level file, exactly as if dir1 were a simple module file.

More formally, the leftmost component in a package import path is still relative to a directory included in the sys.path module search path list we explored in Chapter 22. From there down, though, the import statements in your script explicitly give the directory paths leading to modules in packages.

Packages and Search Path Settings

If you use this feature, keep in mind that the directory paths in your import statements can be only variables separated by periods. You cannot use any platform-specific path syntax in your import statements, such as C:\dir1, My Documents.dir2, or ../dir1—these do not work syntactically. Instead, use any such platform-specific syntax in your module search path settings to name the container directories.

For instance, in the prior example, dir0—the directory name you add to your module search path—can be an arbitrarily long and platform-specific directory path leading up to dir1. You cannot use an invalid statement like this:

import C:\mycode\dir1\dir2\mod      # Error: illegal syntax

But you can add C:\mycode to your PYTHONPATH variable or a .pth file, and say this in your script:

import dir1.dir2.mod

In effect, entries on the module search path provide platform-specific directory path prefixes, which lead to the leftmost names in import and from statements. These import statements themselves provide the remainder of the directory path in a platform-neutral fashion.[47]

As for simple file imports, you don’t need to add the container directory dir0 to your module search path if it’s already there—per Chapter 22, it will be if it’s the home directory of the top-level file, the directory you’re working in interactively, a standard library directory, or the site-packages third-party install root. One way or another, though, your module search path must include all the directories containing leftmost components in your code’s package import statements.
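If the container directory isn’t already on the path, one portable option is to extend sys.path at runtime before importing. The following sketch is self-contained: it builds a throwaway dir1\dir2\mod.py layout in a temporary directory, mirroring the running example, and then adds the container to the search path:

```python
import os, sys, tempfile

# Create a scratch container with the chapter's dir1\dir2\mod.py layout
container = tempfile.mkdtemp()
os.makedirs(os.path.join(container, 'dir1', 'dir2'))
for sub in ('dir1', os.path.join('dir1', 'dir2')):
    open(os.path.join(container, sub, '__init__.py'), 'w').close()
with open(os.path.join(container, 'dir1', 'dir2', 'mod.py'), 'w') as f:
    f.write('z = 3\n')

sys.path.append(container)        # Add the container, not dir1 itself
import dir1.dir2.mod              # Leftmost name resolved via sys.path
print(dir1.dir2.mod.z)            # → 3
```

In real programs you would more often use PYTHONPATH or a .pth file as described above, rather than mutating sys.path in every script, but the effect on the import search is the same.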

Package __init__.py Files

If you choose to use package imports, there is one more constraint you must follow: at least until Python 3.3, each directory named within the path of a package import statement must contain a file named __init__.py, or your package imports will fail. That is, in the example we’ve been using, both dir1 and dir2 must contain a file called __init__.py; the container directory dir0 does not require such a file because it’s not listed in the import statement itself.

More formally, for a directory structure such as this:

dir0\dir1\dir2\mod.py

and an import statement of the form:

import dir1.dir2.mod

the following rules apply:

§  dir1 and dir2 both must contain an __init__.py file.

§  dir0, the container, does not require an __init__.py file; this file will simply be ignored if present.

§  dir0, not dir0\dir1, must be listed on the module search path sys.path.

To satisfy the first two of these rules, package creators must create files of the sort we’ll explore here. To satisfy the third, dir0 must be an automatic path component (the home, libraries, or site-packages directories), or be given in PYTHONPATH or .pth file settings or manual sys.path changes.

The net effect is that this example’s directory structure should be as follows, with indentation designating directory nesting:

dir0\                               # Container on module search path

    dir1\

        __init__.py

        dir2\

            __init__.py

            mod.py

The __init__.py files can contain Python code, just like normal module files. Their names are special because their code is run automatically the first time a Python program imports a directory, and thus serves primarily as a hook for performing initialization steps required by the package. These files can also be completely empty, though, and sometimes have additional roles—as the next section explains.

NOTE

As we’ll see near the end of this chapter, the requirement of packages to have a file named __init__.py has been lifted as of Python 3.3. In that release and later, directories of modules with no such file may be imported as single-directory namespace packages, which work the same but run no initialization-time code file. Prior to Python 3.3, though, and in all of Python 2.X, packages still require __init__.py files. As described ahead, in 3.3 and later these files also provide a performance advantage when used.

Package initialization file roles

In more detail, the __init__.py file serves as a hook for package initialization-time actions, declares a directory as a Python package, generates a module namespace for a directory, and implements the behavior of from * (i.e., from package import *) statements when used with directory imports:

Package initialization

The first time a Python program imports through a directory, it automatically runs all the code in the directory’s __init__.py file. Because of that, these files are a natural place to put code to initialize the state required by files in a package. For instance, a package might use its initialization file to create required data files, open connections to databases, and so on. Typically, __init__.py files are not meant to be useful if executed directly; they are run automatically when a package is first accessed.
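For instance, an initialization file might create a data directory that the package’s modules rely on. The following self-contained sketch (the package name dbtools and its data directory are hypothetical, not from the book’s example) builds such a package on the fly and shows its __init__.py running exactly once, on first import:

```python
import os, sys, tempfile

# Build a scratch package whose __init__.py performs an initialization action
root = tempfile.mkdtemp()
pkg = os.path.join(root, 'dbtools')
os.mkdir(pkg)
with open(os.path.join(pkg, '__init__.py'), 'w') as f:
    f.write(
        "import os\n"
        "DATA_DIR = os.path.join(os.path.dirname(__file__), 'data')\n"
        "if not os.path.exists(DATA_DIR):\n"
        "    os.mkdir(DATA_DIR)\n"                  # Init-time setup step
        "print('dbtools initialized')\n"
    )

sys.path.append(root)
import dbtools                    # First import: prints, creates data dir
import dbtools                    # Later imports run no init code
print(os.path.isdir(dbtools.DATA_DIR))             # → True
```

The second import is silent because Python caches the package module after the first import; reload would be required to rerun the initialization code.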

Module usability declarations

Package __init__.py files are also partly present to declare that a directory is a Python package. In this role, these files serve to prevent directories with common names from unintentionally hiding true modules that appear later on the module search path. Without this safeguard, Python might pick a directory that has nothing to do with your code, just because it appears nested in an earlier directory on the search path. As we’ll see later, Python 3.3’s namespace packages obviate much of this role, but achieve a similar effect algorithmically by scanning ahead on the path to find later files.

Module namespace initialization

In the package import model, the directory paths in your script become real nested object paths after an import. For instance, in the preceding example, after the import the expression dir1.dir2 works and returns a module object whose namespace contains all the names assigned by dir2’s __init__.py initialization file. Such files provide a namespace for module objects created for directories, which would otherwise have no real associated module file.

from * statement behavior

As an advanced feature, you can use __all__ lists in __init__.py files to define what is exported when a directory is imported with the from * statement form. In an __init__.py file, the __all__ list is taken to be the list of submodule names that should be automatically imported when from * is used on the package (directory) name. If __all__ is not set, the from * statement does not automatically load submodules nested in the directory; instead, it loads just names defined by assignments in the directory’s __init__.py file, including any submodules explicitly imported by code in this file. For instance, the statement from submodule import X in a directory’s __init__.py makes the name X available in that directory’s namespace. (We’ll see additional roles for __all__ in Chapter 25: it serves to declare from * exports of simple files as well.)
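To make this role concrete, the following self-contained sketch (the package and module names mypkg, mod1, and mod2 are hypothetical) builds a two-module package whose __init__.py lists only one submodule in __all__; from * then loads that submodule but not its sibling:

```python
import os, sys, tempfile

# Build a scratch package with two submodules and an __all__ list
root = tempfile.mkdtemp()
pkg = os.path.join(root, 'mypkg')
os.mkdir(pkg)
with open(os.path.join(pkg, '__init__.py'), 'w') as f:
    f.write("__all__ = ['mod1']\n")          # Only mod1 auto-loads on from *
for name in ('mod1.py', 'mod2.py'):
    with open(os.path.join(pkg, name), 'w') as f:
        f.write('value = %r\n' % name)

sys.path.append(root)
from mypkg import *                          # Loads mod1 per __all__
print('mod1' in globals())                   # → True
print('mod2' in globals())                   # → False
```

Without the __all__ assignment, neither submodule would be loaded by the from * here, because the __init__.py assigns no names of its own.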

You can also simply leave these files empty, if their roles are beyond your needs (and frankly, they are often empty in practice). They must exist, though, for your directory imports to work at all.

NOTE

Don’t confuse package __init__.py files with the class __init__ constructor methods we’ll meet in the next part of the book. The former are files of code run when imports first step through a package directory in a program run, while the latter are called when an instance is created. Both have initialization roles, but they are otherwise very different.


[47] The dot path syntax was chosen partly for platform neutrality, but also because paths in import statements become real nested object paths. This syntax also means that you may get odd error messages if you forget to omit the .py in your import statements. For example, import mod.py is assumed to be a directory path import—it loads mod.py, then tries to load mod\py.py, and ultimately issues a potentially confusing “No module named py” error message. As of Python 3.3 this error message has been improved to say “No module named ‘mod.py’; mod is not a package.”

Package Import Example

Let’s actually code the example we’ve been talking about to show how initialization files and paths come into play. The following three files are coded in a directory dir1 and its subdirectory dir2—comments give the pathnames of these files:

# dir1\__init__.py

print('dir1 init')

x = 1

# dir1\dir2\__init__.py

print('dir2 init')

y = 2

# dir1\dir2\mod.py

print('in mod.py')

z = 3

Here, dir1 will be either an immediate subdirectory of the one we’re working in (i.e., the home directory), or an immediate subdirectory of a directory that is listed on the module search path (technically, on sys.path). Either way, dir1’s container does not need an __init__.py file.

import statements run each directory’s initialization file the first time that directory is traversed, as Python descends the path; print statements are included here to trace their execution:

C:\code> python               # Run in dir1's container directory

>>> import dir1.dir2.mod      # First imports run init files

dir1 init

dir2 init

in mod.py

>>> 

>>> import dir1.dir2.mod      # Later imports do not

Just like module files, an already imported directory may be passed to reload to force reexecution of that single item. As shown here, reload accepts a dotted pathname to reload nested directories and files:

>>> from imp import reload    # from needed in 3.X only

>>> reload(dir1)

dir1 init

<module 'dir1' from '.\\dir1\\__init__.py'>

>>> 

>>> reload(dir1.dir2)

dir2 init

<module 'dir1.dir2' from '.\\dir1\\dir2\\__init__.py'>

Once imported, the path in your import statement becomes a nested object path in your script. Here, mod is an object nested in the object dir2, which in turn is nested in the object dir1:

>>> dir1

<module 'dir1' from '.\\dir1\\__init__.py'>

>>> dir1.dir2

<module 'dir1.dir2' from '.\\dir1\\dir2\\__init__.py'>

>>> dir1.dir2.mod

<module 'dir1.dir2.mod' from '.\\dir1\\dir2\\mod.py'>

In fact, each directory name in the path becomes a variable assigned to a module object whose namespace is initialized by all the assignments in that directory’s __init__.py file. dir1.x refers to the variable x assigned in dir1\__init__.py, much as mod.z refers to the variable z assigned in mod.py:

>>> dir1.x

1

>>> dir1.dir2.y

2

>>> dir1.dir2.mod.z

3

from Versus import with Packages

import statements can be somewhat inconvenient to use with packages, because you may have to retype the paths frequently in your program. In the prior section’s example, for instance, you must retype and rerun the full path from dir1 each time you want to reach z. If you try to access dir2 or mod directly, you’ll get an error:

>>> dir2.mod

NameError: name 'dir2' is not defined

>>> mod.z

NameError: name 'mod' is not defined

It’s often more convenient, therefore, to use the from statement with packages to avoid retyping the paths at each access. Perhaps more importantly, if you ever restructure your directory tree, the from statement requires just one path update in your code, whereas imports may require many. The import as extension, discussed formally in the next chapter, can also help here by providing a shorter synonym for the full path, and a renaming tool when the same name appears in multiple modules:

C:\code> python

>>> from dir1.dir2 import mod             # Code path here only

dir1 init

dir2 init

in mod.py

>>> mod.z                                 # Don't repeat path

3

>>> from dir1.dir2.mod import z

>>> z

3

>>> import dir1.dir2.mod as mod           # Use shorter name (see Chapter 25)

>>> mod.z

3

>>> from dir1.dir2.mod import z as modz   # Ditto if names clash (see Chapter 25)

>>> modz

3

Why Use Package Imports?

If you’re new to Python, make sure that you’ve mastered simple modules before stepping up to packages, as they are a somewhat more advanced feature. They do serve useful roles, though, especially in larger programs: they make imports more informative, serve as an organizational tool, simplify your module search path, and can resolve ambiguities.

First of all, because package imports give some directory information in program files, they both make it easier to locate your files and serve as an organizational tool. Without package paths, you must often resort to consulting the module search path to find files. Moreover, if you organize your files into subdirectories for functional areas, package imports make it more obvious what role a module plays, and so make your code more readable. For example, a normal import of a file in a directory somewhere on the module search path, like this:

import utilities

offers much less information than an import that includes the path:

import database.client.utilities

Package imports can also greatly simplify your PYTHONPATH and .pth file search path settings. In fact, if you use explicit package imports for all your cross-directory imports, and you make those package imports relative to a common root directory where all your Python code is stored, you really only need a single entry on your search path: the common root. Finally, package imports serve to resolve ambiguities by making explicit exactly which files you want to import—and resolve conflicts when the same module name appears in more than one place. The next section explores this role in more detail.
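For instance, a single one-line .pth file naming the common root is enough to make every package under it importable. The filename and root path below are hypothetical; the file goes in a directory Python scans for .pth files, such as your install’s site-packages:

```
# mycode.pth, placed in your Python's site-packages directory
C:\code\root
```

With this in place, every cross-directory import in your code can be spelled as a package path relative to C:\code\root, and no further path settings are needed.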

A Tale of Three Systems

The only time package imports are actually required is to resolve ambiguities that may arise when multiple programs with same-named files are installed on a single machine. This is something of an install issue, but it can also become a concern in general practice—especially given the tendency of developers to use simple and similar names for module files. Let’s turn to a hypothetical scenario to illustrate.

Suppose that a programmer develops a Python program that contains a file called utilities.py for common utility code, and a top-level file named main.py that users launch to start the program. All over this program, its files say import utilities to load and use the common code. When the program is shipped, it arrives as a single .tar or .zip file containing all the program’s files, and when it is installed, it unpacks all its files into a single directory named system1 on the target machine:

system1\

    utilities.py        # Common utility functions, classes

    main.py             # Launch this to start the program

    other.py            # Import utilities to load my tools

Now, suppose that a second programmer develops a different program with files also called utilities.py and main.py, and again uses import utilities throughout the program to load the common code file. When this second system is fetched and installed on the same computer as the first system, its files will unpack into a new directory called system2 somewhere on the receiving machine—ensuring that they do not overwrite same-named files from the first system:

system2\

    utilities.py        # Common utilities

    main.py             # Launch this to run

    other.py            # Imports utilities

So far, there’s no problem: both systems can coexist and run on the same computer. In fact, you won’t even need to configure the module search path to use these programs on your computer—because Python always searches the home directory first (that is, the directory containing the top-level file), imports in either system’s files will automatically see all the files in that system’s directory. For instance, if you click on system1\main.py, all imports will search system1 first. Similarly, if you launch system2\main.py, system2 will be searched first instead. Remember, module search path settings are only needed to import across directory boundaries.

However, suppose that after you’ve installed these two programs on your machine, you decide that you’d like to use some of the code in each of the utilities.py files in a system of your own. It’s common utility code, after all, and Python code by nature “wants” to be reused. In this case, you’d like to be able to say the following from code that you’re writing in a third directory to load one of the two files:

import utilities

utilities.func('spam')

Now the problem starts to materialize. To make this work at all, you’ll have to set the module search path to include the directories containing the utilities.py files. But which directory do you put first in the path—system1 or system2?

The problem is the linear nature of the search path. It is always scanned from left to right, so no matter how long you ponder this dilemma, you will always get just one utilities.py—from the directory listed first (leftmost) on the search path. As is, you’ll never be able to import it from the other directory at all.

You could try changing sys.path within your script before each import operation, but that’s both extra work and highly error prone. And changing PYTHONPATH before each Python program run is too tedious, and won’t allow you to use both versions in a single file in any event. By default, you’re stuck.

This is the issue that packages actually fix. Rather than installing programs in independent directories listed on the module search path individually, you can package and install them as subdirectories under a common root. For instance, you might organize all the code in this example as an install hierarchy that looks like this:

root\

    system1\

        __init__.py

        utilities.py

        main.py

        other.py

    system2\

        __init__.py

        utilities.py

        main.py

        other.py

    system3\                    # Here or elsewhere

        __init__.py             # Need __init__.py here only if imported elsewhere

        myfile.py               # Your new code here

Now, add just the common root directory to your search path. If your code’s imports are all relative to this common root, you can import either system’s utility file with a package import—the enclosing directory name makes the path (and hence, the module reference) unique. In fact, you can import both utility files in the same module, as long as you use an import statement and repeat the full path each time you reference the utility modules:

import system1.utilities

import system2.utilities

system1.utilities.function('spam')

system2.utilities.function('eggs')

The names of the enclosing directories here make the module references unique.

Note that you have to use import instead of from with packages only if you need to access the same attribute name in two or more paths. If the name of the called function here were different in each path, you could use from statements to avoid repeating the full package path whenever you call one of the functions, as described earlier; the as extension in from can also be used to provide unique synonyms.
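For example, from with as renaming gives each system’s same-named function a unique short name. The sketch below is self-contained: it rebuilds the two hypothetical systems in a scratch root (with trivial stand-in utility functions) so the renaming imports can actually run:

```python
import os, sys, tempfile

# Recreate the tale-of-three-systems layout under a scratch root
root = tempfile.mkdtemp()
for system in ('system1', 'system2'):
    os.mkdir(os.path.join(root, system))
    open(os.path.join(root, system, '__init__.py'), 'w').close()
    with open(os.path.join(root, system, 'utilities.py'), 'w') as f:
        f.write("def function(arg): return '%s:' + arg\n" % system)

sys.path.append(root)
from system1.utilities import function as func1    # Unique synonyms
from system2.utilities import function as func2    # avoid the name clash

print(func1('spam'))     # → system1:spam
print(func2('eggs'))     # → system2:eggs
```

After the renaming imports, neither call needs to repeat its package path, yet both versions of the utility remain usable in the same file.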

Also, notice in the install hierarchy shown earlier that __init__.py files were added to the system1 and system2 directories to make this work, but not to the root directory. Only directories listed within import statements in your code require these files; as we’ve seen, they are run automatically the first time the Python process imports through a package directory.

Technically, in this case the system3 directory doesn’t have to be under root—just the packages of code from which you will import. However, because you never know when your own modules might be useful in other programs, you might as well place them under the common root directory as well to avoid similar name-collision problems in the future.

Finally, notice that both of the two original systems’ imports will keep working unchanged. Because their home directories are searched first, the addition of the common root on the search path is irrelevant to code in system1 and system2; they can keep saying just import utilities and expect to find their own files when run as programs—though not when used as packages in 3.X, as the next section explains. If you’re careful to unpack all your Python systems under a common root like this, path configuration also becomes simple: you’ll only need to add the common root directory once.

WHY YOU WILL CARE: MODULE PACKAGES

Because packages are a standard part of Python, it’s common to see larger third-party extensions shipped as sets of package directories, rather than flat lists of modules. The win32all Windows extensions package for Python, for instance, was one of the first to jump on the package bandwagon. Many of its utility modules reside in packages imported with paths. For instance, to load client-side COM tools, you use a statement like this:

from win32com.client import constants, Dispatch

This line fetches names from the client module of the win32com package—an install subdirectory.

Package imports are also pervasive in code run under the Jython Java-based implementation of Python, because Java libraries are organized into hierarchies as well. In recent Python releases, the email and XML tools are likewise organized into package subdirectories in the standard library, and Python 3.X groups even more related modules into packages—including tkinter GUI tools, HTTP networking tools, and more. The following imports access various standard library tools in 3.X (2.X usage may vary):

from email.message import Message

from tkinter.filedialog import askopenfilename

from http.server import CGIHTTPRequestHandler

Whether you create package directories or not, you will probably import from them eventually.

Package Relative Imports

The coverage of package imports so far has focused mostly on importing package files from outside the package. Within the package itself, imports of same-package files can use the same full path syntax as imports from outside the package—and as we’ll see, sometimes should. However, package files can also make use of special intrapackage search rules to simplify import statements. That is, rather than listing package import paths, imports within the package can be relative to the package.

The way this works is version-dependent: Python 2.X implicitly searches package directories first on imports, while 3.X requires explicit relative import syntax in order to import from the package directory. This 3.X change can enhance code readability by making same-package imports more obvious, but it’s also incompatible with 2.X and may break some programs.

If you’re starting out in Python with version 3.X, your focus in this section will likely be on its new import syntax and model. If you’ve used other Python packages in the past, though, you’ll probably also be interested in how the 3.X model differs. Let’s begin our tour with the latter perspective on this topic.

NOTE

As we’ll learn in this section, use of package relative imports can actually limit your files’ roles. In short, they can no longer be used as executable program files in both 2.X and 3.X. Because of this, normal package import paths may be a better option in many cases. Still, this feature has found its way into many a Python file, and merits a review by most Python programmers to better understand both its tradeoffs and motivation.

Changes in Python 3.X

The way import operations in packages work has changed slightly in Python 3.X. This change applies only to imports within files when files are used as part of a package directory; imports in other usage modes work as before. For imports in packages, though, Python 3.X introduces two changes:

§  It modifies the module import search path semantics to skip the package’s own directory by default. Imports check only paths on the sys.path search path. These are known as absolute imports.

§  It extends the syntax of from statements to allow them to explicitly request that imports search the package’s directory only, with leading dots. This is known as relative import syntax.

These changes are fully present in Python 3.X. The new from statement relative syntax is also available in Python 2.X, but the default absolute search path change must be enabled as an option there. Enabling this can break 2.X programs, but is available for 3.X forward compatibility.

The impact of this change is that in 3.X (and optionally in 2.X), you must generally use special from dotted syntax to import modules located in the same package as the importer, unless your imports list a complete path relative to a package root on sys.path, or your imports are relative to the always-searched home directory of the program’s top-level file (which is usually the current working directory).

By default, though, your package directory is not automatically searched, and intrapackage imports made by files in a directory used as a package will fail without the special from syntax. As we’ll see, in 3.X this can affect the way you will structure imports or directories for modules meant for use in both top-level programs and importable packages. First, though, let’s take a more detailed look at how this all works.

Relative Import Basics

In both Python 3.X and 2.X, from statements can now use leading dots (“.”) to specify that they require modules located within the same package (known as package relative imports), instead of modules located elsewhere on the module import search path (called absolute imports). That is:

§  Imports with dots: In both Python 3.X and 2.X, you can use leading dots in from statements’ module names to indicate that imports should be relative-only to the containing package—such imports will search for modules inside the package directory only and will not look for same-named modules located elsewhere on the import search path (sys.path). The net effect is that package modules override outside modules.

§  Imports without dots: In Python 2.X, normal imports in a package’s code without leading dots currently default to a relative-then-absolute search path order—that is, they search the package’s own directory first. However, in Python 3.X, normal imports within a package are absolute-only by default—in the absence of any special dot syntax, imports skip the containing package itself and look elsewhere on the sys.path search path.

For example, in both Python 3.X and 2.X a statement of the form:

from . import spam                        # Relative to this package

instructs Python to import a module named spam located in the same package directory as the file in which this statement appears. Similarly, this statement:

from .spam import name

means “from a module named spam located in the same package as the file that contains this statement, import the variable name.”

The behavior of a statement without the leading dot depends on which version of Python you use. In 2.X, such an import will still default to the original relative-then-absolute search path order (i.e., searching the package’s directory first), unless a statement of the following form is included at the top of the importing file (as its first executable statement):

from __future__ import absolute_import   # Use 3.X relative import model in 2.X

If present, this statement enables the Python 3.X absolute-only search path change. In 3.X, and in 2.X when enabled, an import without a leading dot in the module name always causes Python to skip the relative components of the module import search path and look instead in the absolute directories that sys.path contains. For instance, in 3.X’s model, a statement of the following form will always find a string module somewhere on sys.path, instead of a module of the same name in the package:

import string                             # Skip this package's version

By contrast, without the from __future__ statement in 2.X, if there’s a local string module in the package, it will be imported instead. To get the same behavior in 3.X, and in 2.X when the absolute import change is enabled, run a statement of the following form to force a relative import:

from . import string                      # Searches this package only

This statement works in both Python 2.X and 3.X today. The only difference in the 3.X model is that it is required in order to load a module that is located in the same package directory as the file in which this appears, when the file is being used as part of a package (and unless full package paths are spelled out).

Notice that leading dots can be used to force relative imports only with the from statement, not with the import statement. In Python 3.X, the import modname statement is always absolute-only, skipping the containing package’s directory. In 2.X, this statement form still performs relative imports, searching the package’s directory first. from statements without leading dots behave the same as import statements—absolute-only in 3.X (skipping the package directory), and relative-then-absolute in 2.X (searching the package directory first).

Other dot-based relative reference patterns are possible, too. Within a module file located in a package directory named mypkg, the following alternative import forms work as described:

from .string import name1, name2          # Imports names from mypkg.string

from . import string                      # Imports mypkg.string

from .. import string                     # Imports string sibling of mypkg

To understand these latter forms better, and to justify all this added complexity, we need to take a short detour to explore the rationale behind this change.

Why Relative Imports?

Besides making intrapackage imports more explicit, this feature is designed in part to allow scripts to resolve ambiguities that can arise when a same-named file appears in multiple places on the module search path. Consider the following package directory:

mypkg\

    __init__.py

    main.py

    string.py

This defines a package named mypkg containing modules named mypkg.main and mypkg.string. Now, suppose that the main module tries to import a module named string. In Python 2.X and earlier, Python will first look in the mypkg directory to perform a relative import. It will find and import the string.py file located there, assigning it to the name string in the mypkg.main module’s namespace.

It could be, though, that the intent of this import was to load the Python standard library’s string module instead. Unfortunately, in these versions of Python, there’s no straightforward way to ignore mypkg.string and look for the standard library’s string module located on the module search path. Moreover, we cannot resolve this with full package import paths, because we cannot depend on any extra package directory structure above the standard library being present on every machine.

In other words, simple imports in packages can be both ambiguous and error-prone. Within a package, it’s not clear whether an import spam statement refers to a module within or outside the package. As one consequence, a local module or package can hide another hanging directly off of sys.path, whether intentionally or not.

In practice, Python users can avoid reusing the names of standard library modules they need for modules of their own (if you need the standard string, don’t name a new module string!). But this doesn’t help if a package accidentally hides a standard module; moreover, Python might add a new standard library module in the future that has the same name as a module of your own. Code that relies on relative imports is also less easy to understand, because the reader may be confused about which module is intended to be used. It’s better if the resolution can be made explicit in code.

The relative imports solution in 3.X

To address this dilemma, imports run within packages have changed in Python 3.X to be absolute-only (and can be made so as an option in 2.X). Under this model, an import statement of the following form in our example file mypkg/main.py will always find a string module outside the package, via an absolute import search of sys.path:

import string                          # Imports string outside package (absolute)

A from import without leading-dot syntax is considered absolute as well:

from string import name                # Imports name from string outside package

If you really want to import a module from your package without giving its full path from the package root, though, relative imports are still possible if you use the dot syntax in the from statement:

from . import string                   # Imports mypkg.string here (relative)

This form imports the string module relative to the current package only and is the relative equivalent to the prior import example’s absolute form (both load a module as a whole). When this special relative syntax is used, the package’s directory is the only directory searched.

We can also copy specific names from a module with relative syntax:

from .string import name1, name2       # Imports names from mypkg.string

This statement again refers to the string module relative to the current package. If this code appears in our mypkg.main module, for example, it will import name1 and name2 from mypkg.string.

In effect, the “.” in a relative import is taken to stand for the package directory containing the file in which the import appears. An additional leading dot performs the relative import starting from the parent of the current package. For example, this statement:

from .. import spam                    # Imports a sibling of mypkg

will load a sibling of mypkg—i.e., the spam module located in the package’s own container directory, next to mypkg. More generally, code located in some module A.B.C can use any of these forms:

from . import D                        # Imports A.B.D     (. means A.B)

from .. import E                       # Imports A.E       (.. means A)

from .D import X                       # Imports A.B.D.X   (. means A.B)

from ..E import X                      # Imports A.E.X     (.. means A)
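These forms can be verified with a quick disk-level experiment—a minimal runnable sketch that builds the A.B package tree in a temporary directory (the names A, B, C, D, and E here simply mirror the text’s placeholders):

```python
import os
import sys
import tempfile
import textwrap

# Build package A with subpackage A.B in a temporary directory, then
# import A.B.C, whose relative imports load modules around it in the tree.
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, 'A', 'B'))

files = {
    ('A', '__init__.py'): '',
    ('A', 'E.py'): "X = 'A.E.X'\n",
    ('A', 'B', '__init__.py'): '',
    ('A', 'B', 'D.py'): "X = 'A.B.D.X'\n",
    ('A', 'B', 'C.py'): textwrap.dedent('''\
        from . import D            # Imports A.B.D    (. means A.B)
        from .. import E           # Imports A.E      (.. means A)
        from .D import X as DX     # Imports A.B.D.X  (. means A.B)
        from ..E import X as EX    # Imports A.E.X    (.. means A)
        '''),
}
for parts, text in files.items():
    with open(os.path.join(root, *parts), 'w') as f:
        f.write(text)

sys.path.insert(0, root)           # Make A's container directory importable
import A.B.C
print(A.B.C.DX)                    # -> A.B.D.X
print(A.B.C.EX)                    # -> A.E.X
```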

Relative imports versus absolute package paths

Alternatively, a file can sometimes name its own package explicitly in an absolute import statement, relative to a directory on sys.path. For example, in the following, mypkg will be found in an absolute directory on sys.path:

from mypkg import string                    # Imports mypkg.string (absolute)

However, this relies on both the configuration and the order of the module search path settings, while relative import dot syntax does not. In fact, this form requires that the directory immediately containing mypkg be included in the module search path. It probably is if mypkg is the package root (or else the package couldn’t be used from the outside in the first place!), but this directory may be nested in a much larger package tree. If mypkg isn’t the package’s root, absolute import statements must list all the directories below the package’s root entry in sys.path when naming packages explicitly like this:

from system.section.mypkg import string     # system container on sys.path only

In large or deep packages, that could be substantially more work to code than a dot:

from . import string                        # Relative import syntax

With this latter form, the containing package is searched automatically, regardless of the search path settings, search path order, and directory nesting. On the other hand, the full-path absolute form will work regardless of how the file is being used—as part of a program or package—as we’ll explore ahead.

The Scope of Relative Imports

Relative imports can seem a bit perplexing on first encounter, but it helps if you remember a few key points about them:

§  Relative imports apply to imports within packages only. Keep in mind that this feature’s module search path change applies only to import statements within module files used as part of a package—that is, intrapackage imports. Normal imports in files not used as part of a package still work exactly as described earlier, automatically searching the directory containing the top-level script first.

§  Relative imports apply to the from statement only. Also remember that this feature’s new syntax applies only to from statements, not import statements. It’s detected by the fact that the module name in a from begins with one or more dots (periods). Module names that contain embedded dots but don’t have a leading dot are package imports, not relative imports.

In other words, package relative imports in 3.X really boil down to just the removal of 2.X’s inclusive search path behavior for packages, along with the addition of special from syntax to explicitly request that relative package-only behavior be used. If you coded your package imports in the past so that they did not depend upon 2.X’s implicit relative lookup (e.g., by always spelling out full paths from a package root), this change is largely a moot point. If you didn’t, you’ll need to update your package files to use the new from syntax for local package files, or full absolute paths.

Module Lookup Rules Summary

With packages and relative imports, the module search story in Python 3.X that we have seen so far can be summarized as follows:

§  Basic modules with simple names (e.g., A) are located by searching each directory on the sys.path list, from left to right. This list is constructed from both system defaults and user-configurable settings described in Chapter 22.

§  Packages are simply directories of Python modules with a special __init__.py file, which enables A.B.C directory path syntax in imports. In an import of A.B.C, for example, the directory named A is located relative to the normal module import search of sys.path, B is another package subdirectory within A, and C is a module or other importable item within B.

§  Within a package’s files, normal import and from statements use the same sys.path search rule as imports elsewhere. Imports in packages using from statements and leading dots, however, are relative to the package; that is, only the package directory is checked, and the normal sys.path lookup is not used. In from . import A, for example, the module search is restricted to the directory containing the file in which this statement appears.

Python 2.X works the same, except that normal imports without dots also automatically search the package directory first before proceeding on to sys.path.

In sum, Python imports select between relative (in the containing directory) and absolute (in a directory on sys.path) resolutions as follows:

Dotted imports: from . import m

Are relative-only in both 2.X and 3.X

Nondotted imports: import m, from m import x

Are relative-then-absolute in 2.X, and absolute-only in 3.X
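Under 3.X’s rules, both resolutions can be demonstrated side by side—a minimal sketch using a hypothetical throwaway package (mypkg_demo here) whose own string.py shadows the standard library’s name:

```python
import os
import sys
import tempfile
import textwrap

# Build a package containing its own string.py; its main.py then imports
# "string" both absolutely (stdlib) and relatively (the package's own file).
root = tempfile.mkdtemp()
os.mkdir(os.path.join(root, 'mypkg_demo'))
open(os.path.join(root, 'mypkg_demo', '__init__.py'), 'w').close()
with open(os.path.join(root, 'mypkg_demo', 'string.py'), 'w') as f:
    f.write("WHERE = 'package'\n")
with open(os.path.join(root, 'mypkg_demo', 'main.py'), 'w') as f:
    f.write(textwrap.dedent('''\
        import string                    # Absolute-only in 3.X: skips the package
        from . import string as local    # Relative: searches this package only
        ABS = string.__name__            # 'string': the standard library module
        REL = local.WHERE                # 'package': this package's string.py
        '''))

sys.path.insert(0, root)
import mypkg_demo.main
print(mypkg_demo.main.ABS)               # -> string
print(mypkg_demo.main.REL)               # -> package
```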

As we’ll see later, Python 3.3 adds another flavor to modules—namespace packages—which is largely disjoint from the package-relative story we’re covering here. This newer model supports package-relative imports too, and is simply a different way to construct a package. It augments the import search procedure to allow package content to be spread across multiple simple directories as a last-resort resolution. Thereafter, though, the composite package behaves the same in terms of relative import rules.

Relative Imports in Action

But enough theory: let’s run some simple code to demonstrate the concepts behind relative imports.

Imports outside packages

First of all, as mentioned previously, this feature does not impact imports outside a package. Thus, the following finds the standard library string module as expected:

C:\code> c:\Python33\python

>>> import string

>>> string

<module 'string' from 'C:\\Python33\\lib\\string.py'>

But if we add a module of the same name in the directory we’re working in, it is selected instead, because the first entry on the module search path is the current working directory (CWD):

# code\string.py

print('string' * 8)

C:\code> c:\Python33\python

>>> import string

stringstringstringstringstringstringstringstring

>>> string

<module 'string' from '.\\string.py'>

In other words, normal imports are still relative to the “home” directory (the top-level script’s container, or the directory you’re working in). In fact, package relative import syntax is not even allowed in code that is not in a file being used as part of a package:

>>> from . import string

SystemError: Parent module '' not loaded, cannot perform relative import

In this section, code entered at the interactive prompt behaves the same as it would if run in a top-level script, because the first entry on sys.path is either the interactive working directory or the directory containing the top-level file. The only difference is that the start of sys.path is an absolute directory, not an empty string:

# code\main.py

import string                                         # Same code but in a file

print(string)

C:\code> C:\python33\python main.py                   # Equivalent results in 2.X

stringstringstringstringstringstringstringstring

<module 'string' from 'c:\\code\\string.py'>

Similarly, a from . import string in this nonpackage file fails the same as it does at the interactive prompt—programs and packages are different file usage modes.
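The same failure can be provoked and caught programmatically—a minimal sketch (note that the exception type varies by release: SystemError in Python 3.3, ImportError in later 3.X versions, and ValueError in 2.X):

```python
# Package-relative syntax is rejected outside package mode: running it in
# nonpackage code (a script or the interactive prompt) raises an error.
try:
    exec("from . import string")       # Relative syntax, but no package here
    outcome = 'imported'
except (ImportError, SystemError, ValueError):
    outcome = 'failed'
print(outcome)                         # -> failed
```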

Imports within packages

Now, let’s get rid of the local string module we coded in the CWD and build a package directory there with two modules, including the required but empty pkg\__init__.py file. Package roots in this section are located in the CWD, which is added to sys.path automatically, so we don’t need to set PYTHONPATH. I’ll also largely omit empty __init__.py files and most error message text for space (and non-Windows readers will have to pardon the shell commands here, and translate for your platform):

C:\code> del string*           # del __pycache__\string* for bytecode in 3.2+

C:\code> mkdir pkg

c:\code> notepad pkg\__init__.py

# code\pkg\spam.py

import eggs                    # <== Works in 2.X but not 3.X!

print(eggs.X)

# code\pkg\eggs.py

X = 99999

import string

print(string)

The first file in this package tries to import the second with a normal import statement. Because this is taken to be relative in 2.X but absolute in 3.X, it fails in the latter. That is, 2.X searches the containing package first, but 3.X does not. This is the incompatible behavior you have to be aware of in 3.X:

C:\code> c:\Python27\python

>>> import pkg.spam

<module 'string' from 'C:\Python27\lib\string.pyc'>

99999

C:\code> c:\Python33\python

>>> import pkg.spam

ImportError: No module named 'eggs'

To make this work in both 2.X and 3.X, change the first file to use the special relative import syntax, so that its import searches the package directory in 3.X too:

# code\pkg\spam.py

from . import eggs             # <== Use package relative import in 2.X or 3.X

print(eggs.X)

# code\pkg\eggs.py

X = 99999

import string

print(string)

C:\code> c:\Python27\python

>>> import pkg.spam

<module 'string' from 'C:\Python27\lib\string.pyc'>

99999

C:\code> c:\Python33\python

>>> import pkg.spam

<module 'string' from 'C:\\Python33\\lib\\string.py'>

99999

Imports are still relative to the CWD

Notice in the preceding example that the package modules still have access to standard library modules like string—their normal imports are still relative to the entries on the module search path. In fact, if you add a string module to the CWD again, imports in a package will find it there instead of in the standard library. Although you can skip the package directory with an absolute import in 3.X, you still can’t skip the home directory of the program that imports the package:

# code\string.py

print('string' * 8)

# code\pkg\spam.py

from . import eggs

print(eggs.X)

# code\pkg\eggs.py

X = 99999

import string                  # <== Gets string in CWD, not Python lib!

print(string)

C:\code> c:\Python33\python    # Same result in 2.X

>>> import pkg.spam

stringstringstringstringstringstringstringstring

<module 'string' from '.\\string.py'>

99999

Selecting modules with relative and absolute imports

To show how this applies to imports of standard library modules, reset the package again. Get rid of the local string module, and define a new one inside the package itself:

C:\code> del string*           # del __pycache__\string* for bytecode in 3.2+

# code\pkg\spam.py

import string                  # <== Relative in 2.X, absolute in 3.X

print(string)

# code\pkg\string.py

print('Ni' * 8)

Now, which version of the string module you get depends on which Python you use. As before, 3.X interprets the import in the first file as absolute and skips the package, but 2.X does not—another example of the incompatible behavior in 3.X:

C:\code> c:\Python33\python

>>> import pkg.spam

<module 'string' from 'C:\\Python33\\lib\\string.py'>

C:\code> c:\Python27\python

>>> import pkg.spam

NiNiNiNiNiNiNiNi

<module 'pkg.string' from 'pkg\string.py'>

Using relative import syntax in 3.X forces the package to be searched again, as it is in 2.X—by using absolute or relative import syntax in 3.X, you can either skip or select the package directory explicitly. In fact, this is the use case that the 3.X model addresses:

# code\pkg\spam.py

from . import string           # <== Relative in both 2.X and 3.X

print(string)

# code\pkg\string.py

print('Ni' * 8)

C:\code> c:\Python33\python

>>> import pkg.spam

NiNiNiNiNiNiNiNi

<module 'pkg.string' from '.\\pkg\\string.py'>

C:\code> c:\Python27\python

>>> import pkg.spam

NiNiNiNiNiNiNiNi

<module 'pkg.string' from 'pkg\string.py'>

Relative imports search packages only

It’s also important to note that relative import syntax is really a binding declaration, not just a preference. If we delete the string.py file and any associated byte code in this example now, the relative import in spam.py fails in both 3.X and 2.X, instead of falling back on the standard library (or any other) version of this module:

# code\pkg\spam.py

from . import string           # <== Fails in both 2.X and 3.X if no string.py here!

C:\code> del pkg\string*

C:\code> C:\python33\python

>>> import pkg.spam

ImportError: cannot import name string

C:\code> C:\python27\python

>>> import pkg.spam

ImportError: cannot import name string

Modules referenced by relative imports must exist in the package directory.
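A quick runnable sketch confirms that no fallback occurs—this builds a hypothetical package (pkg2, with module spam2) whose relative import names a module that doesn’t exist in the package:

```python
import os
import sys
import tempfile

# The package has no string.py, so its relative import raises ImportError
# rather than falling back to the standard library's string module.
root = tempfile.mkdtemp()
os.mkdir(os.path.join(root, 'pkg2'))
open(os.path.join(root, 'pkg2', '__init__.py'), 'w').close()
with open(os.path.join(root, 'pkg2', 'spam2.py'), 'w') as f:
    f.write('from . import string\n')  # No pkg2/string.py exists

sys.path.insert(0, root)
try:
    import pkg2.spam2
    outcome = 'imported'
except ImportError:
    outcome = 'ImportError'
print(outcome)                         # -> ImportError
```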

Imports are still relative to the CWD, again

Although absolute imports let you skip package modules this way, they still rely on other components of sys.path. For one last test, let’s define two string modules of our own. In the following, there is one module by that name in the CWD, one in the package, and another in the standard library:

# code\string.py

print('string' * 8)

# code\pkg\spam.py

from . import string           # <== Relative in both 2.X and 3.X

print(string)

# code\pkg\string.py

print('Ni' * 8)

When we import the string module with relative import syntax like this, we get the version in the package in both 2.X and 3.X, as desired:

C:\code> c:\Python33\python    # Same result in 2.X

>>> import pkg.spam

NiNiNiNiNiNiNiNi

<module 'pkg.string' from '.\\pkg\\string.py'>

When absolute syntax is used, though, the module we get varies per version again. 2.X interprets this as relative to the package first, but 3.X makes it “absolute,” which in this case really just means it skips the package and loads the version relative to the CWD—not the version in the standard library:

# code\string.py

print('string' * 8)

# code\pkg\spam.py

import string                  # <== Relative in 2.X, "absolute" in 3.X: CWD!

print(string)

# code\pkg\string.py

print('Ni' * 8)

C:\code> c:\Python33\python

>>> import pkg.spam

stringstringstringstringstringstringstringstring

<module 'string' from '.\\string.py'>

C:\code> c:\Python27\python

>>> import pkg.spam

NiNiNiNiNiNiNiNi

<module 'pkg.string' from 'pkg\string.pyc'>

As you can see, although packages can explicitly request modules within their own directories with dots, their “absolute” imports are otherwise still relative to the rest of the normal module search path. In this case, a file in the program using the package hides the standard library module the package may want. The change in 3.X simply allows package code to select files either inside or outside the package (i.e., relatively or absolutely). Because import resolution can depend on an enclosing context that may not be foreseen, though, absolute imports in 3.X are not a guarantee of finding a module in the standard library.

Experiment with these examples on your own for more insight. In practice, this is not usually as ad hoc as it might seem: you can generally structure your imports, search paths, and module names to work the way you wish during development. You should keep in mind, though, that imports in larger systems may depend upon context of use, and the module import protocol is part of a successful library’s design.

Pitfalls of Package-Relative Imports: Mixed Use

Now that you’ve learned about package-relative imports, you should also keep in mind that they may not always be your best option. Absolute package imports, with a complete directory path relative to a directory on sys.path, are still sometimes preferred over both implicit package-relative imports in Python 2.X, and explicit package-relative import dot syntax in both Python 2.X and 3.X. This issue may seem obscure, but will likely become important fairly soon after you start coding packages of your own.

As we’ve seen, Python 3.X’s relative import syntax and absolute search rule default make intrapackage imports explicit and thus easier to notice and maintain, and allow explicit choice in some name conflict scenarios. However, there are also two major ramifications of this model that you should be aware of:

§  In both Python 3.X and 2.X, use of package-relative import statements implicitly binds a file to a package directory and role, and precludes it from being used in other ways.

§  In Python 3.X, the new relative search rule change means that a file can no longer serve as both script and package module as easily as it could in 2.X.

The causes of these constraints are a bit subtle, but they follow because both of the following are simultaneously true:

§  Python 3.X and 2.X do not allow from . relative syntax to be used unless the importer is being used as part of a package (i.e., is being imported from somewhere else).

§  Python 3.X does not search a package module’s own directory for imports, unless from . relative syntax is used (or the module is in the current working directory or main script’s home directory).

Use of relative imports prevents you from creating directories that serve as both executable programs and externally importable packages in 3.X and 2.X. Moreover, some files can no longer serve as both script and package module in 3.X as they could in 2.X. In terms of import statements, the rules pan out as follows—the first is for package mode only in both Pythons, and the second is for program mode only in 3.X:

from . import mod      # Not allowed in nonpackage mode in both 2.X and 3.X

import mod             # Does not search file's own directory in package mode in 3.X

The net effect is that for files to be used in either 2.X or 3.X, you may need to choose a single usage mode—package (with relative imports) or program (with simple imports), and isolate true package module files in a subdirectory apart from top-level script files.

Alternatively, you can attempt manual sys.path changes (a generally brittle and error-prone task), or always use full package paths in absolute imports instead of either package-relative syntax or simple imports, and assume the package root is on the module search path:

from system.section.mypkg import mod   # Works in both program and package mode

Of all these schemes, the last—full package path imports—may be the most portable and functional, but we need to turn to more concrete code to see why.

The issue

For example, in Python 2.X it’s common to use the same single directory as both program and package, using normal undotted imports. This relies on the script’s home directory to resolve imports when used as a program, and the 2.X relative-then-absolute rule to resolve intrapackage imports when used as a package. This won’t quite work in 3.X, though—in package mode, plain imports no longer load modules in the same directory, unless that directory also happens to be the main file’s container or the current working directory (and hence is on sys.path).

Here’s what this looks like in action, stripped to a bare minimum of code (for brevity in this section I again omit __init__.py package directory files required prior to Python 3.3, and for variety use the 3.3 Windows launcher covered in Appendix B):

# code\pkg\main.py

import spam

# code\pkg\spam.py

import eggs                     # <== Works if in "." = home of main script file

# code\pkg\eggs.py

print('Eggs' * 4)               # But won't load this file when used as pkg in 3.X!

c:\code> python pkg\main.py     # OK as program, in both 2.X and 3.X

EggsEggsEggsEggs

c:\code> python pkg\spam.py

EggsEggsEggsEggs

c:\code> py −2                  # OK as package in 2.X: relative-then-absolute

>>> import pkg.spam             # 2.X: plain imports search package directory first

EggsEggsEggsEggs

C:\code> py −3                  # But 3.X fails to find file here: absolute only

>>> import pkg.spam             # 3.X: plain imports search only CWD plus sys.path

ImportError: No module named 'eggs'

Your next step might be to add the required relative import syntax for 3.X use, but it won’t help here. The following retains the single directory for both a main top-level script and package modules, and adds the required dots—in both 2.X and 3.X this now works when the directory is imported as a package, but fails when it is used as a program directory (including attempts to run a module as a script directly):

# code\pkg\main.py

import spam

# code\pkg\spam.py

from . import eggs              # <== Not a package if main file here (even if me)!

# code\pkg\eggs.py

print('Eggs' * 4)

c:\code> python                 # OK as package but not program in both 3.X and 2.X

>>> import pkg.spam

EggsEggsEggsEggs

c:\code> python pkg\main.py

SystemError: ... cannot perform relative import

c:\code> python pkg\spam.py

SystemError: ... cannot perform relative import

Fix 1: Package subdirectories

In a mixed-use case like this, one solution is to isolate all but the main files used only by the program in a subdirectory—this way, your intrapackage imports still work in all Pythons, you can use the top directory as a standalone program, and the nested directory still serves as a package for use from other programs:

# code\pkg\main.py

import sub.spam                 # <== Works if move modules to pkg below main file

# code\pkg\sub\spam.py

from . import eggs              # Package relative works now: in subdirectory

# code\pkg\sub\eggs.py

print('Eggs' * 4)

c:\code> python pkg\main.py     # From main script: same result in 2.X and 3.X

EggsEggsEggsEggs

c:\code> python                 # From elsewhere: same result in 2.X and 3.X

>>> import pkg.sub.spam

EggsEggsEggsEggs

The potential downside of this scheme is that you won’t be able to run package modules directly to test them with embedded self-test code, though tests can be coded separately in their parent directory instead:

c:\code> py −3 pkg\sub\spam.py  # But individual modules can't be run to test

SystemError: ... cannot perform relative import

Fix 2: Full path absolute import

Alternatively, full path package import syntax would address this case too—it requires the directory above the package root to be in your path, though this is probably not an extra requirement for a realistic software package. Most Python packages will either require this setting, or arrange for it to be handled automatically with install tools (such as distutils, which may store a package’s code in a directory on the default module search path such as the site-packages root; see Chapter 22 for more details):

# code\pkg\main.py

import spam

# code\pkg\spam.py

import pkg.eggs                 # <== Full package paths work in all cases, 2.X+3.X

# code\pkg\eggs.py

print('Eggs' * 4)

c:\code> set PYTHONPATH=C:\code

c:\code> python pkg\main.py     # From main script: Same result in 2.X and 3.X

EggsEggsEggsEggs

c:\code> python                 # From elsewhere: Same result in 2.X and 3.X

>>> import pkg.spam

EggsEggsEggsEggs

Unlike the subdirectory fix, full path absolute imports like these also allow you to run your modules standalone to test:

c:\code> python pkg\spam.py     # Individual modules are runnable too in 2.X and 3.X

EggsEggsEggsEggs

Example: Application to module self-test code (preview)

To summarize, here’s another typical example of the issue and its full path resolution. This uses a common technique we’ll expand on in the next chapter, but the idea is simple enough to include as a preview here (though you may want to review this again later—the coverage makes more sense here).

Consider the following two modules in a package directory, the second of which includes self-test code. In short, a module’s __name__ attribute is the string “__main__” when it is being run as a top-level script, but not when it is being imported, which allows it to be used as both module and script:

# code\dualpkg\m1.py

def somefunc():

    print('m1.somefunc')

# code\dualpkg\m2.py

...import m1 here...            # Replace me with a real import statement

def somefunc():

    m1.somefunc()

    print('m2.somefunc')

if __name__ == '__main__':

   somefunc()                   # Self-test or top-level script usage mode code

The second of these needs to import the first where the “...import m1 here...” placeholder appears. Replacing this line with a relative import statement works when the file is used as a package, but is not allowed in nonpackage mode by either 2.X or 3.X (results and error messages are omitted here for space; see the file dualpkg\results.txt in the book’s examples for the full listing):

# code\dualpkg\m2.py

from . import m1

c:\code> py −3

>>> import dualpkg.m2           # OK

C:\code> py −2

>>> import dualpkg.m2           # OK

c:\code> py −3 dualpkg\m2.py    # Fails!

c:\code> py −2 dualpkg\m2.py    # Fails!

Conversely, a simple import statement works in nonpackage mode in both 2.X and 3.X, but fails in package mode in 3.X only, because such statements do not search the package directory in 3.X:

# code\dualpkg\m2.py

import m1

c:\code> py −3

>>> import dualpkg.m2           # Fails!

c:\code> py −2

>>> import dualpkg.m2           # OK

c:\code> py −3 dualpkg\m2.py    # OK

c:\code> py −2 dualpkg\m2.py    # OK

And finally, using full package paths works again in both usage modes and Pythons, as long as the package’s root is on the module search path (as it must be to be used elsewhere):

# code\dualpkg\m2.py

import dualpkg.m1 as m1         # And: set PYTHONPATH=c:\code

c:\code> py −3

>>> import dualpkg.m2           # OK

C:\code> py −2

>>> import dualpkg.m2           # OK

c:\code> py −3 dualpkg\m2.py    # OK

c:\code> py −2 dualpkg\m2.py    # OK
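The transcripts above can be reproduced as a single self-contained experiment—a sketch that builds dualpkg in a temporary directory and exercises both usage modes (it assumes Python 3.7+ for subprocess’s capture_output option):

```python
import os
import subprocess
import sys
import tempfile
import textwrap

# Build the dualpkg example on disk: m2.py imports m1 via its full
# package path, so the same file works in package and program modes.
root = tempfile.mkdtemp()
os.mkdir(os.path.join(root, 'dualpkg'))
open(os.path.join(root, 'dualpkg', '__init__.py'), 'w').close()
with open(os.path.join(root, 'dualpkg', 'm1.py'), 'w') as f:
    f.write("def somefunc():\n    print('m1.somefunc')\n")
with open(os.path.join(root, 'dualpkg', 'm2.py'), 'w') as f:
    f.write(textwrap.dedent('''\
        import dualpkg.m1 as m1          # Full path: works in both usage modes
        def somefunc():
            m1.somefunc()
            print('m2.somefunc')
        if __name__ == '__main__':
            somefunc()                   # Self-test or top-level script mode
        '''))

sys.path.insert(0, root)                 # Like set PYTHONPATH=c:\code
import dualpkg.m2                        # Package mode: OK

env = dict(os.environ, PYTHONPATH=root)  # Program mode: OK too
result = subprocess.run(
    [sys.executable, os.path.join(root, 'dualpkg', 'm2.py')],
    capture_output=True, text=True, env=env)
print(result.stdout)                     # m1.somefunc, then m2.somefunc
```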

In sum, unless you’re willing and able to isolate your modules in subdirectories below scripts, full package path imports are probably preferable to package-relative imports—though they’re more typing, they handle all cases, and they work the same in 2.X and 3.X. There may be additional workarounds that involve extra tasks (e.g., manually setting sys.path in your code), but we’ll skip them here because they are more obscure and rely on import semantics, which is error-prone; full package imports rely only on the basic package mechanism.

Naturally, the extent to which this may impact your modules can vary per package; absolute imports may also require changes when directories are reorganized, and relative imports may become invalid if a local module is relocated.

NOTE

Be sure to also watch for future Python changes on this front. Although this book covers Python up to 3.3 only, at this writing there is talk in a PEP of possibly addressing some package issues in Python 3.4, perhaps even allowing relative imports to be used in program mode. On the other hand, this initiative’s scope and outcome are uncertain, and any change would apply only to 3.4 and later, while the full path solution given here is version-neutral—and 3.4 is more than a year away in any event. That is, you can wait for a possible 3.X-only change of limited reach, or simply use tried-and-true full package paths.

Python 3.3 Namespace Packages

Now that you’ve learned all about package and package-relative imports, I need to explain that there’s a new option that modifies some of the ideas we just covered. At least abstractly, as of release 3.3 Python has four import models. From original to newest:

Basic module imports: import mod, from mod import attr

The original model: imports of files and their contents, relative to the sys.path module search path

Package imports: import dir1.dir2.mod, from dir1.mod import attr

Imports that give directory path extensions relative to the sys.path module search path, where each package is contained in a single directory and has an initialization file, in Python 2.X and 3.X

Package-relative imports: from . import mod (relative), import mod (absolute)

The model used for intrapackage imports of the prior section, with its relative or absolute lookup schemes for dotted and nondotted imports, available but differing in Python 2.X and 3.X

Namespace packages: import splitdir.mod

The new namespace package model that we’ll survey here, which allows packages to span multiple directories, and requires no initialization file, introduced in Python 3.3

The first two of these are self-contained, but the third tightens up the search order and extends syntax for intrapackage imports, and the fourth upends some of the core notions and requirements of the prior package model. In fact, Python 3.3 (and later) now has two flavors of packages:

§  The original model, now known as regular packages

§  The alternative model, known as namespace packages

This is similar in spirit to the “classic” and “new style” class model dichotomy we’ll meet in the next part of this book, though the new is more an addition to the old here. The original and new package models are not mutually exclusive, and can be used simultaneously in the same program. In fact, the new namespace package model works as something of a fallback option, recognized only if normal modules and regular packages of the same name are not present on the module search path.

The rationale for namespace packages is rooted in package installation goals that may seem obscure unless you are responsible for such tasks, and is better addressed by this feature’s PEP document. In short, though, they resolve a potential for collision of multiple __init__.py files when package parts are merged, by removing this file completely. Moreover, by providing standard support for packages that can be split across multiple directories and located in multiple sys.path entries, namespace packages both enhance install flexibility and provide a common mechanism to replace the multiple incompatible solutions that have arisen to address this goal.

Though it is too early to judge their uptake, average Python users may find namespace packages to be a useful alternative extension to the regular package model—one that does not require initialization files, and allows any directory of code to be used as an importable package. To see why, let’s move on to the details.

Namespace Package Semantics

A namespace package is not fundamentally different from a regular package; it is just a different way of creating packages. Moreover, they are still relative to sys.path at the top level: the leftmost component of a dotted namespace package path must still be located in an entry on the normal module search path.

In terms of physical structure, though, the two can differ substantially. Regular packages still must have an __init__.py file that is run automatically, and reside in a single directory as before. By contrast, new-style namespace packages cannot contain an __init__.py, and may span multiple directories that are collected at import time. In fact, none of the directories that make up a namespace package can have an __init__.py, but the content nested within each of them is treated as a single package.

The import algorithm

To truly understand namespace packages, we have to look under the hood to see how the import operation works in 3.3. During imports, Python still iterates over each directory in the module search path, sys.path, just as in 3.2 and earlier. In 3.3, though, while looking for an imported module or package named spam, for each directory in the module search path, Python tests for a wider variety of matching criteria, in the following order:

1.    If directory\spam\__init__.py is found, a regular package is imported and returned.

2.    If directory\spam.{py, pyc, or other module extension} is found, a simple module is imported and returned.

3.    If directory\spam is found and is a directory, it is recorded and the scan continues with the next directory in the search path.

4.    If none of the above was found, the scan continues with the next directory in the search path.

If the search path scan completes without returning a module or package by steps 1 or 2, and at least one directory was recorded by step 3, then a namespace package is created.

The creation of the namespace package happens immediately, and is not deferred until a sublevel import occurs. The new namespace package has a __path__ attribute set to an iterable of the directory path strings that were found and recorded during the scan by step 3, but does not have a __file__.

The __path__ attribute is then used in later, deeper accesses to search all package components—each recorded entry on a namespace package’s __path__ is searched whenever further nested items are requested, much like the sole directory of a regular package.

Viewed another way, the __path__ attribute of a namespace package serves the same role for lower-level components that sys.path does at the top for the leftmost component of package import paths; it becomes the “parent path” for accessing lower items using the same four-step procedure just sketched.

The net result is that a namespace package is a sort of virtual concatenation of directories located via multiple sys.path entries. Once a namespace package is created, though, there is no functional difference between it and a regular package; it supports everything we’ve learned for regular packages, including package-relative import syntax.

Impacts on Regular Packages: Optional __init__.py

As one consequence of this new import procedure, as of Python 3.3 packages no longer require __init__.py files—when a single-directory package does not have this file, it will be treated as a single-directory namespace package, and no warning will be issued. This is a major relaxation of prior rules, but a commonly requested change; many packages require no initialization code, and it seemed extraneous to have to create an empty initialization file in such cases. This is finally no longer required as of 3.3.

At the same time, the original regular package model is still fully supported, and automatically runs code in __init__.py as before as an initialization hook. Moreover, when it’s known that a package will never be a portion of a split namespace package, there is a performance advantage to coding it as a regular package with an __init__.py. Creation and loading of a regular package occurs immediately when it is located along the path. With namespace packages, all entries in the path must be scanned before the package is created. More formally, regular packages stop the prior section’s algorithm at step 1; namespace packages do not.
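One way to observe this difference from code is with the standard library’s importlib machinery. The following is a sketch, assuming Python 3.4 or later for importlib.util.find_spec; the package names and temporary directories are invented for the demo. A regular package’s spec records its __init__.py as the file to run, while a namespace package’s spec has no origin file, only a list of search locations:

```python
# Probing package flavor with importlib (hypothetical demo directories).
import os, sys, tempfile
from importlib.util import find_spec

root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, 'ns_demo_pkg'))      # No __init__.py: namespace
os.makedirs(os.path.join(root, 'reg_demo_pkg'))     # Regular: has the file
open(os.path.join(root, 'reg_demo_pkg', '__init__.py'), 'w').close()

sys.path.insert(0, root)
ns_spec  = find_spec('ns_demo_pkg')
reg_spec = find_spec('reg_demo_pkg')

print(reg_spec.origin)                          # ...reg_demo_pkg/__init__.py
print(ns_spec.origin)                           # None: no file to run
print(list(ns_spec.submodule_search_locations)) # The namespace's parent path
```

The namespace spec’s submodule_search_locations plays the __path__ role described earlier: it lists every directory that contributes a portion of the package.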

Per this change’s PEP, there is no plan to remove support of regular packages—at least, that’s the story today; change is always a possibility in open source projects (indeed, the prior edition quoted plans on string formatting and relative imports in 2.X that were later abandoned), so as usual, be sure to watch for future developments on this front. Given the performance advantage and auto-initialization code of regular packages, though, it seems unlikely that they would be removed altogether.

Namespace Packages in Action

To see how namespace packages work, consider the following two modules and nested directory structure—with two subdirectories named sub located in different parent directories, dir1 and dir2:

C:\code\ns\dir1\sub\mod1.py

C:\code\ns\dir2\sub\mod2.py

If we add both dir1 and dir2 to the module search path, sub becomes a namespace package spanning both, with the two module files available under that name even though they live in separate physical directories. Here are the files’ contents and the required path settings on Windows; there are no __init__.py files here—in fact there cannot be in namespace packages, as this is their chief physical differentiation:

c:\code> mkdir ns\dir1\sub                # Two dirs of same name in different dirs

c:\code> mkdir ns\dir2\sub                # And similar outside Windows

c:\code> type ns\dir1\sub\mod1.py         # Module files in different directories

print(r'dir1\sub\mod1')

c:\code> type ns\dir2\sub\mod2.py

print(r'dir2\sub\mod2')

c:\code> set PYTHONPATH=C:\code\ns\dir1;C:\code\ns\dir2

Now, when imported directly in 3.3 and later, the namespace package is the virtual concatenation of its individual directory components, and allows further nested parts to be accessed through its single, composite name with normal imports:

c:\code> C:\Python33\python

>>> import sub

>>> sub                                   # Namespace packages: nested search paths

<module 'sub' (namespace)>

>>> sub.__path__

_NamespacePath(['C:\\code\\ns\\dir1\\sub', 'C:\\code\\ns\\dir2\\sub'])

>>> from sub import mod1

dir1\sub\mod1

>>> import sub.mod2                       # Content from two different directories

dir2\sub\mod2

>>> mod1

<module 'sub.mod1' from 'C:\\code\\ns\\dir1\\sub\\mod1.py'>

>>> sub.mod2

<module 'sub.mod2' from 'C:\\code\\ns\\dir2\\sub\\mod2.py'>

This is also true if we import through the namespace package name immediately—because the namespace package is made when first reached, the timing of path extensions is irrelevant:

c:\code> C:\Python33\python

>>> import sub.mod1

dir1\sub\mod1

>>> import sub.mod2                       # One package spanning two directories

dir2\sub\mod2

>>> sub.mod1

<module 'sub.mod1' from 'C:\\code\\ns\\dir1\\sub\\mod1.py'>

>>> sub.mod2

<module 'sub.mod2' from 'C:\\code\\ns\\dir2\\sub\\mod2.py'>

>>> sub

<module 'sub' (namespace)>

>>> sub.__path__

_NamespacePath(['C:\\code\\ns\\dir1\\sub', 'C:\\code\\ns\\dir2\\sub'])
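For readers following along on other platforms, the Windows setup and interactive sessions above can also be reproduced portably from a single Python script. This sketch builds the same two-directory tree under a temporary root that stands in for C:\code\ns (an invented stand-in path), and extends sys.path directly instead of setting PYTHONPATH:

```python
# Portable equivalent of the console demo: one "sub" package, two directories.
import os, sys, tempfile

root = tempfile.mkdtemp()                     # Stands in for C:\code\ns
for parent, mod, text in [('dir1', 'mod1.py', r'dir1\sub\mod1'),
                          ('dir2', 'mod2.py', r'dir2\sub\mod2')]:
    subdir = os.path.join(root, parent, 'sub')
    os.makedirs(subdir)                       # No __init__.py files anywhere
    with open(os.path.join(subdir, mod), 'w') as f:
        f.write('print(%r)\n' % text)

# Same effect as the PYTHONPATH setting in the console listing
sys.path[:0] = [os.path.join(root, 'dir1'), os.path.join(root, 'dir2')]

import sub.mod1, sub.mod2                     # One package, two directories
print(list(sub.__path__))                     # Both sub directories appear
```

Running this under 3.3 or later prints both module messages and a two-entry __path__, mirroring the transcripts above.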

Interestingly, relative imports work in namespace packages too—in the following, the relative import statement references a file in the package, even though the referenced file resides in a different directory:

c:\code> type ns\dir1\sub\mod1.py

from . import mod2                        # And "from . import string" still fails

print(r'dir1\sub\mod1')

c:\code> C:\Python33\python

>>> import sub.mod1                       # Relative import of mod2 in another dir

dir2\sub\mod2

dir1\sub\mod1

>>> import sub.mod2                       # Already imported module not rerun

>>> sub.mod2

<module 'sub.mod2' from 'C:\\code\\ns\\dir2\\sub\\mod2.py'>

As you can see, namespace packages are like ordinary single-directory packages in every way, except for having a split physical storage—which is why single-directory namespace packages without __init__.py files are exactly like regular packages, but with no initialization logic to be run.

Namespace Package Nesting

Namespace packages even support arbitrary nesting—once a namespace package is created, it serves essentially the same role at its level that sys.path does at the top, becoming the “parent path” for lower levels. Continuing the prior section’s example:

c:\code> mkdir ns\dir2\sub\lower          # Further nested components

c:\code> type  ns\dir2\sub\lower\mod3.py

print(r'dir2\sub\lower\mod3')

c:\code> C:\Python33\python

>>> import sub.lower.mod3                 # Namespace pkg nested in namespace pkg

dir2\sub\lower\mod3

c:\code> C:\Python33\python

>>> import sub                            # Same effect if accessed incrementally

>>> import sub.mod2

dir2\sub\mod2

>>> import sub.lower.mod3

dir2\sub\lower\mod3

>>> sub.lower                             # A single-directory namespace pkg

<module 'sub.lower' (namespace)>

>>> sub.lower.__path__

_NamespacePath(['C:\\code\\ns\\dir2\\sub\\lower'])

In the preceding, sub is a namespace package split across two directories, and sub.lower is a single-directory namespace package nested within the portion of sub physically located in dir2. sub.lower is also the namespace package equivalent of a regular package with no __init__.py.

This nesting behavior holds true whether the lower component is a module, regular package, or another namespace package—by serving as new import search paths, namespace packages allow all three to be nested within them freely:

c:\code> mkdir ns\dir1\sub\pkg

C:\code> type  ns\dir1\sub\pkg\__init__.py

print(r'dir1\sub\pkg\__init__.py')

c:\code> C:\Python33\python

>>> import sub.mod2                       # Nested module

dir2\sub\mod2

>>> import sub.pkg                        # Nested regular package

dir1\sub\pkg\__init__.py

>>> import sub.lower.mod3                 # Nested namespace package

dir2\sub\lower\mod3

>>> sub                                   # Modules, packages, and namespaces

<module 'sub' (namespace)>

>>> sub.mod2

<module 'sub.mod2' from 'C:\\code\\ns\\dir2\\sub\\mod2.py'>

>>> sub.pkg

<module 'sub.pkg' from 'C:\\code\\ns\\dir1\\sub\\pkg\\__init__.py'>

>>> sub.lower

<module 'sub.lower' (namespace)>

>>> sub.lower.mod3

<module 'sub.lower.mod3' from 'C:\\code\\ns\\dir2\\sub\\lower\\mod3.py'>

Trace through this example’s files and directories for more insight. As you can see, namespace packages integrate seamlessly with the prior import models, and extend them with new functionality.

Files Still Have Precedence over Directories

As explained earlier, part of the purpose of __init__.py files in regular packages is to declare the directory as a package—it tells Python to use the directory, rather than skipping ahead to a possible file of the same name later on the path. This avoids inadvertently choosing a noncode subdirectory that accidentally appears early on the path over a desired module of the same name.

Because namespace packages do not require these special files, they would seem to invalidate this safeguard. This isn’t the case, though—because the namespace algorithm outlined earlier continues scanning the path after a namespace directory has been found, files later on the path still have priority over earlier directories with no __init__.py. For example, consider the following directories and modules:

c:\code> mkdir ns2

c:\code> mkdir ns3

c:\code> mkdir   ns3\dir

c:\code> notepad ns3\dir\ns2.py

c:\code> type    ns3\dir\ns2.py

print(r'ns3\dir\ns2.py!')

The ns2 directory here cannot be imported in Python 3.2 and earlier—it’s not a regular package, as it lacks an __init__.py initialization file. This directory can be imported under 3.3, though—it’s a namespace package directory in the current working directory, which is always the first item on the sys.path module search path irrespective of PYTHONPATH settings:

c:\code> set PYTHONPATH=

c:\code> py -3.2

>>> import ns2

ImportError: No module named ns2

c:\code> py -3.3

>>> import ns2

>>> ns2                         # A single-directory namespace package in CWD

<module 'ns2' (namespace)>

>>> ns2.__path__

_NamespacePath(['.\\ns2'])

But watch what happens when the directory containing a file of the same name as a namespace directory is added later on the search path, via PYTHONPATH settings—the file is used instead, because Python keeps searching later path entries after a namespace package directory is found. It stops searching only when a module or regular package is located, or the path has been completely scanned. Namespace packages are returned only if nothing else was found along the way:

c:\code> set PYTHONPATH=C:\code\ns3\dir

c:\code> py -3.3

>>> import ns2                  # Use later module file, not same-named directory!

ns3\dir\ns2.py!

>>> ns2

<module 'ns2' from 'C:\\code\\ns3\\dir\\ns2.py'>

>>> import sys

>>> sys.path[:2]                # First '' means current working directory, CWD

['', 'C:\\code\\ns3\\dir']

In fact, setting the path to include a module works the same as it does in earlier Pythons, even if a same-named namespace directory appears earlier on the path; namespace packages are used in 3.3 only in cases that would be errors in earlier Pythons:

c:\code> py -3.2

>>> import ns2

ns3\dir\ns2.py!

>>> ns2

<module 'ns2' from 'C:\code\ns3\dir\ns2.py'>

This is also why none of the directories in a namespace package is allowed to have an __init__.py file: as soon as the import algorithm finds one that does, it returns a regular package immediately, and abandons the path search and the namespace package. Put more formally, the import algorithm chooses a namespace package only at the end of the path scan, and stops at step 1 or 2 if either a regular package or module file is found sooner.

The net effect is that both module files and regular packages anywhere on the module search path have precedence over namespace package directories. In the following, for example, a namespace package called sub exists as the concatenation of same-named directories under dir1 and dir2 on the path:

c:\code> mkdir ns4\dir1\sub

c:\code> mkdir ns4\dir2\sub

c:\code> set PYTHONPATH=c:\code\ns4\dir1;c:\code\ns4\dir2

c:\code> py -3

>>> import sub

>>> sub

<module 'sub' (namespace)>

>>> sub.__path__

_NamespacePath(['c:\\code\\ns4\\dir1\\sub', 'c:\\code\\ns4\\dir2\\sub'])

Much like a module file, though, a regular package added in the rightmost path entry takes priority over same-named namespace package directories too—the import path scan starts recording a namespace package tentatively in dir1 as before, but abandons it when the regular package is detected in dir2:

c:\code> notepad ns4\dir2\sub\__init__.py

c:\code> py -3

>>> import sub                  # Use later reg. package, not same-named directory!

>>> sub

<module 'sub' from 'c:\\code\\ns4\\dir2\\sub\\__init__.py'>

Though a useful extension, because namespace packages are available only in Python 3.3 and later, I’m going to defer to Python’s manuals for more details on the subject. See especially this change’s PEP document for its rationale, additional details, and more comprehensive examples.

Chapter Summary

This chapter introduced Python’s package import model—an optional but useful way to explicitly list part of the directory path leading up to your modules. Package imports are still relative to a directory on your module import search path, but your script gives the rest of the path to the module explicitly.

As we’ve seen, packages not only make imports more meaningful in larger systems, but also simplify import search path settings if all cross-directory imports are relative to a common root directory, and resolve ambiguities when there is more than one module of the same name—including the name of the enclosing directory in a package import helps distinguish between them.

Because it’s relevant only to code in packages, we also explored the newer relative import model here—a way for imports in package files to select modules in the same package explicitly using leading dots in a from, instead of relying on an older and error-prone implicit package search rule. Finally, we surveyed Python 3.3 namespace packages, which allow a logical package to span multiple physical directories as a fallback option of import searches, and remove the initialization file requirements of the prior model.

In the next chapter, we will survey a handful of more advanced module-related topics, such as the __name__ usage mode variable and name-string imports. As usual, though, let’s close out this chapter first with a short quiz to review what you’ve learned here.

Test Your Knowledge: Quiz

1.    What is the purpose of an __init__.py file in a module package directory?

2.    How can you avoid repeating the full package path every time you reference a package’s content?

3.    Which directories require __init__.py files?

4.    When must you use import instead of from with packages?

5.    What is the difference between from mypkg import spam and from . import spam?

6.    What is a namespace package?

Test Your Knowledge: Answers

1.    The __init__.py file serves to declare and initialize a regular module package; Python automatically runs its code the first time you import through a directory in a process. Its assigned variables become the attributes of the module object created in memory to correspond to that directory. It is also required prior to 3.3—you can’t import through a directory with package syntax unless it contains this file.

2.    Use the from statement with a package to copy names out of the package directly, or use the as extension with the import statement to rename the path to a shorter synonym. In both cases, the path is listed in only one place, in the from or import statement.

3.    In Python 3.2 and earlier, each directory listed in an executed import or from statement must contain an __init__.py file. Other directories, including the directory that contains the leftmost component of a package path, do not need to include this file.

4.    You must use import instead of from with packages only if you need to access the same name defined in more than one path. With import, the path makes the references unique, but from allows only one version of any given name (unless you also use the as extension to rename).

5.    In Python 3.X, from mypkg import spam is an absolute import—the search for mypkg skips the package directory and the module is located in an absolute directory in sys.path. A statement from . import spam, on the other hand, is a relative import—spam is looked up relative to the package in which this statement is contained only. In Python 2.X, the absolute import searches the package directory first before proceeding to sys.path; relative imports work as described.

6.    A namespace package is an extension to the import model, available in Python 3.3 and later, that corresponds to one or more directories that do not have __init__.py files. When Python finds these during an import search, and does not find a simple module or regular package first, it creates a namespace package that is the virtual concatenation of all found directories having the requested module name. Further nested components are looked up in all the namespace package’s directories. The effect is similar to a regular package, but content may be split across multiple directories.