Cython (2015)

Chapter 7. Wrapping C Libraries with Cython

Controlling complexity is the essence of computer programming.

— B. Kernighan

We have seen how Cython can take Python code and improve its performance with ahead-of-time compilation. This chapter will focus on the inverse: starting with a C library, how do we make it accessible to Python? Such a task is typically the domain of specialized tools like SWIG, SIP, Boost.Python, ctypes, cffi, or others. Cython, while not automating the process like some, provides the capability to wrap external libraries in a straightforward way. Cython also makes C-level Cython constructs available to external C code, which can be useful when we are embedding Python in a C application, for instance.

Because Cython understands both the C and Python languages, it allows full control over all aspects during interfacing. It accomplishes this feat while remaining Python-like, making Cython interfacing code easier to understand and debug. When wrapping C libraries in Cython, we are not restricted to a domain-specific wrapping language—we can bring to bear all of the Python language, its standard library, and any third-party libraries to help us, along with all the Cython constructs we have learned about in previous chapters.

When done well, Cython-wrapped libraries have C-level performance, minimal wrapper overhead, and a Python-friendly interface. End users need never suspect they are working with wrapped code.

Declaring External C Code in Cython

To wrap a C library with Cython, we must first declare in Cython the interface of the C components we wish to use. To this end, Cython provides the extern block statement. These declaration blocks are meant to tell Cython what C constructs we wish to use from a specified C header file. Their syntax is:[13]

cdef extern from "header_name":

    indented declarations from header file

The header_name goes inside a single- or double-quoted string.

Including the extern block has the following effects:

§  The cython compiler generates an #include "header_name" line inside the generated source file.

§  The types, functions, and other declarations made in the block body are accessible from Cython code.

§  Cython will check at compile time that the C declarations are used in a type-correct manner, and will produce a compilation error if they are not.

The declarations inside the extern block have a straightforward C-like syntax for variables and functions. They use the Cython-specific syntax for declaring structs and unions covered briefly in Chapter 3.

BARE EXTERN DECLARATIONS

Cython supports the extern keyword, which can be added to any C declaration in conjunction with cdef:

cdef extern external_declaration

When we use extern in this manner, Cython will place the declaration—which can be a function signature, variable, struct, union, or other such C declaration—in the generated source code with an extern modifier. The Cython extern declaration must match the C declaration.

This style of external declaration is not recommended, as it has the same drawbacks as using extern in C directly. The extern block is preferred.

If it is necessary to have an #include preprocessor directive for a specific header file, but no declarations are required, the declaration block can be empty:

cdef extern from "header.h":

    pass

Conversely, if the name of the header file is not necessary (perhaps it is already included by another header file that has its own extern block), but we would like to interface with external code, we can suppress #include statement generation with from *:

cdef extern from *:

    declarations

Before we go into the details of the declaration block, it is important to realize what extern blocks do not do.

Cython Does Not Automate Wrapping

The purpose of the extern block is straightforward, but can be misleading at first glance. In Cython, extern blocks (and extern declarations) exist to ensure we are calling and using the declared C functions, variables, and structs in a type-correct manner. The extern block does not automatically generate wrappers for the declared objects. As mentioned, the only C code that is generated for the entire extern block is a single #include "header.h" line. We still have to write def and cpdef (and possibly cdef) functions that call the C functions declared in the extern block. If we do not, then the external C functions declared in the extern block cannot be accessed from Python code. Cython does not parse C files and automate wrapping C libraries.

It would be nice if Cython automatically wrapped everything declared in an extern block (and there is an active project that builds on Cython to do the equivalent). Using Cython to wrap large C libraries with hundreds of functions, structs, and other constructs is a significant undertaking. Brave souls have successfully done just this for the MPI (mpi4py), PETSc (petsc4py), and HDF5 (h5py) libraries, for example. They chose Cython as their wrapping tool over other options (which can automatically wrap libraries) for various reasons:

§  Cython’s generated wrapper code is highly optimized, and the resulting wrappers can be up to an order of magnitude faster than those produced by other wrapping tools.

§  Often the goal is to customize, improve, simplify, or otherwise Pythonize the interface as it is wrapped, so an automated wrapping tool would not provide much gain.

§  The Cython language is a high-level, Python-like language and not limited to domain-specific interfacing commands, making complicated wrapping tasks easier.

Now that we realize what an extern block does and does not do, let’s look at the declarations in the extern block in more detail.

Declaring External C Functions and typedefs

The most common declarations placed inside an extern block are C functions and typedefs. These declarations translate almost directly from their C equivalents. Typically the only modifications necessary are to:

§  change typedef to ctypedef;

§  remove unnecessary and unsupported keywords such as restrict and volatile;

§  ensure the function’s return type and name are declared on a single line;

§  remove line-terminating semicolons.

It is possible to break up a long function declaration over several lines after the opening parenthesis of the argument list, as in Python.

For example, consider these simple C declarations and macros in the file header.h:

#define M_PI 3.1415926

#define MAX(a, b) ((a) >= (b) ? (a) : (b))

double hypot(double, double);

typedef int integral;

typedef double real;

void func(integral, integral, real);

real *func_arrays(integral[], integral[][10], real **);

The Cython declarations for them are, except for the macros, nearly copy and paste:

cdef extern from "header.h":

    double M_PI

    float MAX(float a, float b)

    double hypot(double x, double y)

    ctypedef int integral

    ctypedef double real

    void func(integral a, integral b, real c)

    real *func_arrays(integral[] i, integral[][10] j, real **k)

Note that when declaring the M_PI macro, we declare it as if it were a global variable of type double. Similarly, when declaring the MAX function-like macro, we declare it in Cython as if it were a regular C function named MAX that takes two float arguments and returns a float.

In the preceding extern block we added variable names for the function arguments. This is recommended but not mandatory: doing so allows us to call these functions with keyword arguments and, if the argument names are meaningful, helps document the interface. Neither benefit is available if argument names are omitted.

Cython supports the full range of C declarations, even the function-pointer-returning-array-of-function-pointers variety. Of course, simple type declarations—scalars of built-in numeric types, arrays, pointers, void, and the like—form the backbone of most C declarations and compose the majority of C header files. Most of the time, we can cut and paste straightforward C function declarations into the body of the extern block, remove the semicolons, and be on our way.

As an example of a more complicated declaration that Cython handles without difficulty, consider a header file, header.h, containing a function named signal that takes a function pointer and returns a function pointer. The extern block would look like:

cdef extern from "header.h":

    void (*signal(void(*)(int)))(int)

Because Cython uses extern blocks only to check type correctness, we can add a helper ctypedef to this extern block to make signal’s declaration easier to understand:

cdef extern from "header.h":

    ctypedef void (*void_int_fptr)(int)

    void_int_fptr signal(void_int_fptr)

The second declaration is equivalent to the first but markedly easier to understand. Because Cython does not declare the void_int_fptr typedef in generated code, we can use it to help make the C declarations more straightforward. The void_int_fptr ctypedef is only a Cython declaration convenience; there is no corresponding typedef in the header file.

Declaring and Wrapping C structs, unions, and enums

To declare an external struct, union, or enum in an extern block, we use the same syntax as described in Declaring and Using structs, unions, and enums, but we can omit the cdef, as that is implied:

cdef extern from "header_name":

    struct struct_name:

        struct_members

    union union_name:

        union_members

    enum enum_name:

        enum_members

These match the following C declarations:

struct struct_name {

    struct_members

};

union union_name {

    union_members

};

enum enum_name {

    enum_members

};

Cython generates struct struct_name declarations for the struct, and the equivalent for union and enum.

For the typedefed version of these:

typedef struct struct_name {

    struct_members

} struct_alias;

typedef union union_name {

    union_members

} union_alias;

typedef enum enum_name {

    enum_members

} enum_alias;

simply prefix with ctypedef on the Cython side and use the type alias name:

cdef extern from "header_name":

    ctypedef struct struct_alias:

        struct_members

    ctypedef union union_alias:

        union_members

    ctypedef enum enum_alias:

        enum_members

In this case, Cython will use just the alias type names for declarations and will not generate the struct, union, or enum as part of the declaration, as is proper.

To statically declare a struct variable in Cython code, use cdef with the struct name or the typedef alias name; Cython will generate the right thing for us in either case.

It is only necessary to declare the fields that are actually used in the preceding struct, union, and enum declarations in Cython. If no fields are used but it is necessary to use the struct as an opaque type, then the body of the struct should be the pass statement.

Wrapping C Functions

After we have declared the external functions we want to use, we still must wrap them in a def function, a cpdef function, or a cdef class to access them from Python.

For example, say we want to wrap a simple random-number generator (RNG). We will wrap the Mersenne twister, which requires us to expose at least two functions to Python. To initialize the RNG’s state we call init_genrand; after doing so we can call genrand_real1 to get a random real number on the closed interval [0, 1]. The init_genrand function takes an unsigned long int as a seed value, and genrand_real1 takes no arguments and returns a double.
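
For orientation, the reference mt19937ar.c implementation obtains genrand_real1 by scaling one 32-bit generator output by 1/(2^32 - 1), which is why both endpoints of [0, 1] are reachable. The sketch below mirrors that scaling in plain Python. The helper names real1_from_uint32 and rand_real1 are hypothetical, and random.Random (itself a Mersenne Twister, though seeded differently from init_genrand) merely stands in for the C generator, so its values will not match the C library's output.

```python
import random

UINT32_MAX = 2**32 - 1  # largest value a 32-bit generator output can take


def real1_from_uint32(value):
    """Map a 32-bit unsigned integer onto the closed interval [0, 1]."""
    if not 0 <= value <= UINT32_MAX:
        raise ValueError("value must fit in 32 unsigned bits")
    return value / float(UINT32_MAX)


def rand_real1(generator):
    """Draw one number on [0, 1] from a random.Random instance (a stand-in)."""
    return real1_from_uint32(generator.getrandbits(32))
```

The smallest input, 0, maps to 0.0 and the largest, 2^32 - 1, maps to 1.0, so the interval is closed at both ends.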

Declaring them in Cython is straightforward:

cdef extern from "mt19937ar.h":

    void init_genrand(unsigned long s)

    double genrand_real1()

We must provide def or cpdef functions so that these declarations can be called from Python:

def init_state(unsigned long s):

    init_genrand(s)

def rand():

    return genrand_real1()

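To compile everything together, we can use a setuptools build script, which we name setup.py (setuptools provides the same setup and Extension names as the older distutils, which was removed from the standard library in Python 3.12). We must be sure to include the mt19937ar.c source file in the sources list:

from setuptools import setup, Extension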

from Cython.Build import cythonize

ext = Extension("mt_random",

                sources=["mt_random.pyx", "mt19937ar.c"])

setup(

    name="mersenne_random",

    ext_modules = cythonize([ext])

)

Compiling is straightforward. Please see Chapter 2 for platform-specific command-line flags:

$ python setup.py build_ext --inplace

This command will generate several lines of output. If it is successful, the build will produce an extension module named mt_random.so on Mac OS X and Linux, or mt_random.pyd on Windows (newer Python versions insert an interpreter-specific tag before the extension).
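
The exact file name depends on the interpreter as well as the platform. As a quick check, the following standard-library sketch lists the file names the running interpreter would accept for a given extension module. The helper name candidate_filenames is hypothetical.

```python
from importlib.machinery import EXTENSION_SUFFIXES


def candidate_filenames(module_name):
    """Return the extension-module file names this interpreter can import."""
    return [module_name + suffix for suffix in EXTENSION_SUFFIXES]
```

Calling candidate_filenames("mt_random") and comparing the result with the contents of the build directory is a simple way to confirm that the build produced a file the interpreter will actually load.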

We can use it from IPython as follows:

In [1]: import mt_random

In [2]: mt_random.init_state(42)

In [3]: mt_random.rand()

Out[3]: 0.37454011439684315

Note that we cannot call either init_genrand or genrand_real1 from Python:

In [4]: mt_random.init_genrand(42)

Traceback (most recent call last):

  File "<ipython-input-2-34528a64a483>", line 1, in <module>

    mt_random.init_genrand(42)

AttributeError: 'module' object has no attribute 'init_genrand'

In [5]: mt_random.genrand_real1()

Traceback (most recent call last):

  File "<ipython-input-3-23619324ba3f>", line 1, in <module>

    mt_random.genrand_real1()

AttributeError: 'module' object has no attribute 'genrand_real1'

In about two dozen lines of code, we have wrapped a simple random-number generator with minimal overhead. One downside of the RNG’s design is that it uses a static global array to store the RNG’s state, allowing only one RNG at a time.

In the next section, we will wrap a version of the RNG API that supports concurrent generators.

Wrapping C structs with Extension Types

The improved API first forward-declares a struct typedef in the header file:

typedef struct _mt_state mt_state;

It then declares creation and destruction functions:

mt_state *make_mt(unsigned long s);

void free_mt(mt_state *state);

The random-number-generation functions take a pointer to a heap-allocated mt_state struct as an argument. We will wrap just one of them:

double genrand_real1(mt_state *state);

The Cython extern declaration for this new interface is, again, mostly copy and paste:

cdef extern from "mt19937ar-struct.h":

    ctypedef struct mt_state

    mt_state *make_mt(unsigned long s)

    void free_mt(mt_state *state)

    double genrand_real1(mt_state *state)

Because the mt_state struct is opaque and Cython does not need to access any of its internal fields, the preceding ctypedef declaration is sufficient. Essentially, mt_state is a named placeholder.

Again, Cython exposes none of these C extern declarations to Python. In this case, it is nice to wrap this improved version in an extension type named MT. The only attribute this extension type will hold is a private pointer to an mt_state struct:

cdef class MT:

    cdef mt_state *_thisptr

Because creating an mt_state heap-allocated struct must happen at the C level before an MT object is initialized, the proper place to do it is in a __cinit__ method:

cdef class MT:

    cdef mt_state *_thisptr

    def __cinit__(self, unsigned long s):

        self._thisptr = make_mt(s)

        if self._thisptr == NULL:

            msg = "Insufficient memory."

            raise MemoryError(msg)

The corresponding __dealloc__ just forwards its work to free_mt:

cdef class MT:

    # ...

    def __dealloc__(self):

        if self._thisptr != NULL:

            free_mt(self._thisptr)

These Cython methods allow us to properly create, initialize, and finalize an MT object. To generate random numbers, we simply define def or cpdef methods that call the corresponding C functions:

cdef class MT:

    # ...

    cpdef double rand(self):

        return genrand_real1(self._thisptr)

Declaring and interfacing the remaining generation functions is straightforward and is left as an exercise for the reader.

To try out our extension type wrapper, we must first compile it into an extension module. We compile the mt_random_type.pyx file together with the mt19937ar-struct.c source using setuptools. A script named setup_mt_type.py to take care of the gory details would look something like the following:

from setuptools import setup, Extension

from Cython.Build import cythonize

ext_type = Extension("mt_random_type",

                     sources=["mt_random_type.pyx",

                              "mt19937ar-struct.c"])

setup(

    name="mersenne_random",

    ext_modules = cythonize([ext_type])

)

As in the previous section, we compile it with the standard build_ext invocation:

$ python setup_mt_type.py build_ext --inplace

This generates an extension module that we can import as mt_random_type from Python:

In [1]: from mt_random_type import MT

In [2]: mt1, mt2 = MT(0), MT(0)

Here we have created two separate random-number generators with the same seed to verify that each has separate state:

In [3]: mt1.rand() == mt2.rand()

Out[3]: True

In [4]: for i in range(1000):

   ...:     assert mt1.rand() == mt2.rand()

   ...:

In [5]:

If they were using the same state, the MT objects would modify the same state array each time rand is called, leading to inconsistent results and failed assertions.
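
The same property can be observed without compiling anything, because CPython's own random module is built on the Mersenne Twister: each random.Random instance carries its own state, much as each MT object owns its own mt_state. The sketch below uses only the standard library and is an analogy, not the wrapped mt19937ar code. The function names are hypothetical.

```python
import random


def streams_match(seed, count=1000):
    """Two generators seeded identically, each with its own state, agree."""
    gen1, gen2 = random.Random(seed), random.Random(seed)
    return all(gen1.random() == gen2.random() for _ in range(count))


def streams_diverge_when_shared(seed, count=10):
    """Two consumers drawing alternately from ONE generator see different values."""
    shared = random.Random(seed)
    first, second = [], []
    for _ in range(count):
        first.append(shared.random())
        second.append(shared.random())
    return first != second
```

The second function imitates what would happen if both objects modified a single global state array: each call advances the shared state, so the two consumers no longer see the same sequence.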

The entire mt_random_type.pyx file is just 22 lines, and it is easily extensible to cover the remaining RNG functions. It provides a Pythonic interface to a useful RNG library that is familiar to anyone who has used Python classes before. Its performance is likely comparable to that of a hand-coded C extension type, while requiring a fraction of the effort and no manual reference counting.

For wrapping C structs in Cython, the pattern used in this example is common and recommended. The internal struct pointer is kept private and used only internally. The struct is allocated and initialized in __cinit__ and automatically deallocated in __dealloc__. Declaring methods cpdef when possible allows them to be called by external Python code, and efficiently from other Cython code. It also allows these methods to be overridden in Python subclasses.

Now that we have covered the basics of wrapping a C interface with Cython, let’s focus on some of the customization features that provide greater control.

Constants, Other Modifiers, and Controlling What Cython Generates

As mentioned in Chapter 3, the Cython language understands the const keyword, but it is not useful in cdef declarations. It is used in specific instances within cdef extern blocks to ensure Cython generates const-correct code.

The const keyword is not necessary for declaring function arguments, and can be included or omitted without effect. It may be required when we are declaring a typedef that uses const, or when a function return value is declared const:

typedef const int * const_int_ptr;

const double *returns_ptr_to_const(const_int_ptr);

We can carry these declarations over into Cython and use them as required:

cdef extern from "header.h":

    ctypedef const int * const_int_ptr

    const double *returns_ptr_to_const(const_int_ptr)

Other C-level modifiers, such as volatile and restrict, should be removed in Cython extern blocks; leaving them in results in a compile-time error.

Occasionally it is useful to use an alias for a function, struct, or typedef name in Cython. This allows us to refer to a C-level object with a name in Cython that is different from its actual name in C. This feature also provides a lot of control over exactly what is declared at the C level.

For instance, suppose we want to wrap a C function named print. We cannot use the name print in Cython, because it is a reserved keyword in Python 2 and it clashes with the print function in Python 3. To give such a function an alias, we can use the following declaration:

cdef extern from "printer.h":

    void _print "print"(fmt_str, arg)

The function is called _print in Cython, but it is called print in generated C. This also works for typedefs, structs, unions, and enums:

cdef extern from "pathological.h":

    # typedef void * class

    ctypedef void * klass "class"

    # int finally(void) function

    int _finally "finally"()

    # struct del { int a, b; };

    struct _del "del":

        int a, b

    # enum yield { ALOT, SOME, ALITTLE };

    enum _yield "yield":

        ALOT

        SOME

        ALITTLE

In all cases, the string in quotes is the name of the object in generated C code. Cython does no checking on the contents of this string, so this feature can be used (or abused) to control the C-level declaration.

EXPOSING CYTHON CODE TO C

As we saw in Chapter 3, Cython allows us to declare C-level functions, variables, and structs with the cdef keyword, and we saw how we can use these C-level constructs directly from Cython code. Suppose, for instance, that it would be useful to call a cdef Cython function from an external C function in an application, essentially wrapping Python in C. This use case is less frequent than wrapping a C library in Python, but it does arise. Cython provides two mechanisms to support this scenario.

The first mechanism is via the public keyword. We already saw public in the context of declaring the external visibility of extension type attributes; here we use it for a different purpose.

If we add the public keyword to a C-level type, variable, or function declared with cdef, then these constructs are made accessible to C code that is compiled or linked with the extension module.

For instance, suppose we have a file named transcendentals.pyx that uses the public keyword for a cdef variable and function:

cdef public double PI = 3.1415926

cdef public double get_e():

    print("calling get_e()")

    return 2.718281828

When we generate an extension module from transcendentals.pyx, the public declarations cause the cython compiler to output a transcendentals.h header in addition to transcendentals.c. This header declares the public C interface for the Cython source. It must be included in external C code that wants to call get_e or that wants to use PI.

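External C code that calls into our Cython code must also be sure both to initialize the Python interpreter with Py_Initialize and to initialize the module with inittranscendentals before using any public declarations. The name inittranscendentals is the Python 2 convention; under Python 3 the function is named PyInit_transcendentals, and it is normally registered with PyImport_AppendInittab before Py_Initialize is called, after which the module is imported. With Python 2, the calling code looks like this: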

#include "Python.h"

#include "transcendentals.h"

#include <math.h>

#include <stdio.h>

int main(int argc, char **argv)

{

    Py_Initialize();

    inittranscendentals();

    printf("pi**e: %f\n", pow(PI, get_e()));

    Py_Finalize();

    return 0;

}

After generating transcendentals.c:

$ cython transcendentals.pyx

we can then compile our main.c source file with the transcendentals.c source:

$ gcc $(python-config --cflags) \

      transcendentals.c main.c \

      $(python-config --ldflags)

and run the result:

$ ./a.out

calling get_e()

pi**e: 22.459157

The second mechanism uses the api keyword, which can be attached to C-level functions and extension types only:

cdef api double get_e():

    print("calling get_e()")

    return 2.718281828

Both api and public modifiers can be applied to the same object.

In a similar way to the public keyword, the api keyword causes cython to generate transcendentals_api.h. It can be used by external C code to call into the api-declared functions and methods in Cython. This method is more flexible in that it uses Python’s import mechanism to bring in the api-declared functions dynamically without explicitly compiling with the extension module source or linking against the dynamic library.

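The requirements are that the Python interpreter be initialized, because the import mechanism depends on it, and that import_transcendentals be called before we use get_e:

#include "Python.h"

#include "transcendentals_api.h"

#include <stdio.h>

int main(int argc, char **argv)

{

    /* The import machinery needs a running interpreter. */

    Py_Initialize();

    import_transcendentals();

    printf("e: %f\n", get_e());

    Py_Finalize();

    return 0;

}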

Note that we cannot access PI via this method—to access it using api, we would have to create an api function that returns PI, as the api method can work only with functions and extension types. This is the tradeoff for the flexibility the api mechanism provides via dynamic runtime importing.

Error Checking and Raising Exceptions

It is common for an external C function to communicate error states via return codes or error flags. To properly wrap these functions, we must test for these cases in the wrapper function and, when an error is signaled, explicitly raise a Python exception. It is tempting to use an except clause (see Functions and Exception Handling) to automatically convert a C error return code into a Python exception, but doing so will not work; this is not the purpose of the except clause. Cython cannot automatically detect when an external C function sets a C error state.
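
The shape of such a wrapper can be sketched in plain Python. Everything here is hypothetical: c_style_divide stands in for an external C function that reports failure through its return code and writes its result through an output argument, and the status constants are invented for illustration. In a real Cython wrapper the same checks would sit in the def function that calls the extern-declared C function.

```python
# Hypothetical status codes, standing in for a C library's return values.
STATUS_OK = 0
STATUS_DIVIDE_BY_ZERO = 1


def c_style_divide(numerator, denominator, result):
    """Stand-in for a C function that signals errors via its return code.

    The quotient is written to result[0], mimicking a C output pointer.
    """
    if denominator == 0:
        return STATUS_DIVIDE_BY_ZERO
    result[0] = numerator / float(denominator)
    return STATUS_OK


def divide(numerator, denominator):
    """Wrapper: test the status code and raise a Python exception on failure."""
    result = [0.0]
    status = c_style_divide(numerator, denominator, result)
    if status == STATUS_DIVIDE_BY_ZERO:
        raise ZeroDivisionError("c_style_divide reported division by zero")
    if status != STATUS_OK:
        raise RuntimeError("c_style_divide failed with status %d" % status)
    return result[0]
```

The caller of divide never sees a status code: success yields a value, and failure surfaces as an ordinary Python exception.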

The except clause can be used in conjunction with cdef callbacks, however. We will see an example of this in the next section.

Callbacks

As we saw previously, Cython supports C function pointers. Using this capability, we can wrap C functions that take function pointer callbacks. The callback can be a pure-C function that does not call the Python/C API, or it can call arbitrary Python code, depending on the use case. This powerful feature allows us to pass in a Python function created at runtime to control the behavior of the underlying C function.

Working with callbacks across language boundaries can get complicated, especially when it comes to proper exception handling.

To get started, suppose we want to wrap the qsort function from the C standard library. It is declared in stdlib.h:

cdef extern from "stdlib.h":

    void qsort(void *array, size_t count, size_t size,

                int (*compare)(const void *, const void *))

The first void pointer is to an array with count elements, and each element occupies size bytes. The compare function pointer callback takes two void pointers, a and b, into array. It must return a negative integer if a < b, 0 if a == b, and a positive integer if a > b.
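
The three-way contract is the same one that Python's functools.cmp_to_key expects, so it can be tried out in plain Python before any C is involved. A minimal sketch, with three_way_compare and sort_with as hypothetical names:

```python
from functools import cmp_to_key


def three_way_compare(a, b):
    """Return a negative, zero, or positive int as a <, ==, or > b."""
    return (a > b) - (a < b)


def sort_with(compare, values):
    """Return a sorted copy of values, ordered by a qsort-style comparison."""
    return sorted(values, key=cmp_to_key(compare))
```

For instance, sort_with(three_way_compare, [3, 1, 2]) returns [1, 2, 3], and passing a comparison that negates the result yields the descending order instead.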

For the sake of this example, we will create a function named pyqsort to sort a Python list of integers using C’s qsort with varying comparison functions.

The function proceeds in four steps:

1.    Allocate a C array of integers of the proper size.

2.    Convert the list of Python integers into the C int array.

3.    Call qsort with the proper compare function.

4.    Convert the sorted values back to Python and return.

The function definition looks like this:

cdef extern from "stdlib.h":

    void *malloc(size_t size)

    void free(void *ptr)

def pyqsort(list x):

    cdef:

        int *array

        int i, N

    # Allocate the C array.

    N = len(x)

    array = <int*>malloc(sizeof(int) * N)

    if array == NULL:

        raise MemoryError("Unable to allocate array.")

    # Fill the C array with the Python integers.

    for i in range(N):

        array[i] = x[i]

    # qsort the array...

    # Convert back to Python and free the C array.

    for i in range(N):

        x[i] = array[i]

    free(array)

To actually sort the array, we need to set up a compare callback. To do a standard sort, we can use a cdef function:

cdef int int_compare(const void *a, const void *b):

    cdef int ia, ib

    ia = (<int*>a)[0]

    ib = (<int*>b)[0]

    return ia - ib

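In int_compare, we convert the void pointer arguments into C integers. We learned in Chapter 3 that to dereference a pointer in Cython we index into it with index 0. The difference ia - ib is negative when ia < ib, zero when ia == ib, and positive when ia > ib, which is exactly what qsort expects, provided the subtraction does not overflow the range of a C int.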
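
One caveat with the subtraction idiom is that the difference of two C ints does not always fit in an int. The sketch below imitates 32-bit wraparound with ctypes.c_int32 and contrasts it with a comparison-based form that cannot overflow. Signed overflow is formally undefined in C; wraparound is only the commonly observed outcome and is assumed here for illustration. The function names are hypothetical.

```python
import ctypes

INT_MAX = 2**31 - 1   # assuming a 32-bit C int
INT_MIN = -2**31


def subtract_compare_32bit(ia, ib):
    """Imitate `return ia - ib` with the result wrapped to 32 bits."""
    return ctypes.c_int32(ia - ib).value


def safe_compare(ia, ib):
    """Overflow-free three-way comparison: returns -1, 0, or 1."""
    return (ia > ib) - (ia < ib)
```

Under these assumptions subtract_compare_32bit(INT_MAX, -1) comes out negative even though the first argument is the larger one, whereas safe_compare(INT_MAX, -1) returns 1. The expression (ia > ib) - (ia < ib) is also valid inside a Cython cdef function.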

We now have all the pieces we need to call qsort in pyqsort:

    # qsort the array...

    qsort(<void*>array, <size_t>N, sizeof(int), int_compare)

This version of the function works, but is fairly static. One way to expand its capability is to allow reverse-sorting the array by negating the return value of int_compare:

cdef int reverse_int_compare(const void *a, const void *b):

    return -int_compare(a, b)

By providing the optional reverse argument, the user can exert some control over sorting. Let’s also add a ctypedef to make working with the callback easier:

ctypedef int (*qsort_cmp)(const void *, const void *)

def pyqsort(list x, reverse=False):

    # ...

    cdef qsort_cmp cmp_callback

    # Select the appropriate callback.

    if reverse:

        cmp_callback = reverse_int_compare

    else:

        cmp_callback = int_compare

    # qsort the array...

    qsort(<void*>array, <size_t>N, sizeof(int), cmp_callback)

    # ...

Let’s try out our routine. First, we compile on the fly with pyximport and import the pyqsort function:

In [1]: import pyximport; pyximport.install()

Out[1]: (None, <pyximport.pyximport.PyxImporter at 0x101c7c650>)

In [2]: from pyqsort import pyqsort

In [3]: pyqsort?

Type:       builtin_function_or_method

String Form:<built-in function pyqsort>

Docstring:  <no docstring>

To test our function, we need a mixed-up list of integers:

In [4]: from random import shuffle

In [5]: intlist = range(10)

In [6]: shuffle(intlist)

In [7]: print intlist

[2, 1, 3, 7, 6, 4, 0, 9, 5, 8]

Calling pyqsort should sort the list in place:

In [8]: pyqsort(intlist)

In [9]: print intlist

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

And passing in reverse=True should reverse-sort:

In [10]: pyqsort(intlist, reverse=True)

In [11]: print intlist

[9, 8, 7, 6, 5, 4, 3, 2, 1, 0]

Our basic functionality is looking good.

For full control over the sorting, let’s allow users to pass in their own Python comparison function. For this to work, the C callback has to call the Python callback, converting arguments between C types and Python types.

We will use a module-global Python object, py_cmp, to store the Python comparison function. This allows us to set the Python callback at runtime, and the C callback wrapper can access it when needed:

cdef object py_cmp = None

Because qsort expects a C comparison function, we have to create a callback wrapper cdef function that matches the compare function pointer signature and that calls our py_cmp Python function:

cdef int py_cmp_wrapper(const void *a, const void *b):

    cdef int ia, ib

    ia = (<int*>a)[0]

    ib = (<int*>b)[0]

    return py_cmp(ia, ib)

Inside py_cmp_wrapper, we must cast the void pointer arguments to int pointers, dereference them to extract the underlying integers, and pass these integers to py_cmp. Because py_cmp is a Python function, Cython will automatically convert the C integers to Python integers for us. The return value from py_cmp will be converted to a C integer.

We can define a reverse_py_cmp_wrapper to invert the values to support reverse sorting:

cdef int reverse_py_cmp_wrapper(const void *a, const void *b):

    return -py_cmp_wrapper(a, b)

We now have four callbacks: int_compare and reverse_int_compare, which are in pure C; and py_cmp_wrapper and reverse_py_cmp_wrapper, which call a user-provided Python callback.

The logic to select the right callback looks something like the following:

def pyqsort(list x, cmp=None, reverse=False):

    global py_cmp

    # ...

    # Set up comparison callback.

    if cmp and reverse:

        py_cmp = cmp

        cmp_callback = reverse_py_cmp_wrapper

    elif cmp and not reverse:

        py_cmp = cmp

        cmp_callback = py_cmp_wrapper

    elif reverse:

        cmp_callback = reverse_int_compare

    else:

        cmp_callback = int_compare

    # qsort the array...

    qsort(<void*>array, <size_t>N, sizeof(int), cmp_callback)

There are four cases to consider: cmp is provided or left as None, and reverse is True or False. Each case results in cmp_callback being set to a different cdef function. If cmp is provided, then the global py_cmp is set to it so that the callback wrapper can access it.
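
The four-way choice can be tabulated. The following plain-Python sketch returns the callback's name as a string rather than a C function pointer, purely to make the mapping explicit. The helper select_callback is hypothetical and is not part of pyqsort.

```python
def select_callback(cmp=None, reverse=False):
    """Return the name of the cdef callback chosen for these arguments."""
    callbacks = {
        # (user comparison supplied?, reverse requested?) -> callback name
        (True, True): "reverse_py_cmp_wrapper",
        (True, False): "py_cmp_wrapper",
        (False, True): "reverse_int_compare",
        (False, False): "int_compare",
    }
    return callbacks[(bool(cmp), bool(reverse))]
```

Like the if/elif chain above, the lookup relies on the truthiness of cmp, so leaving it as None selects one of the two pure-C callbacks.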

Let’s try out the new functionality. First we import, using pyximport to recompile, and create a random array of positive and negative values:

In [13]: import pyximport; pyximport.install()

Out[13]: (None, <pyximport.pyximport.PyxImporter at 0x101c7c650>)

In [14]: from pyqsort import pyqsort

In [15]: from random import shuffle

In [16]: a = range(-10, 10)

In [17]: shuffle(a)

In [18]: print a

[-8, 3, -10, 5, -3, 8, 7, -6, 4, -4, -2, 2, -7, 0, -5, -1, 6, -9, 9, 1]

Suppose we want to sort a according to absolute value. We can create a Python comparison function for that, and call pyqsort with it:

In [19]: def cmp(a, b):

   ....:     return abs(a) - abs(b)

   ....:

In [20]: pyqsort(a, cmp=cmp)

In [21]: print a

[0, 1, -1, -2, 2, 3, -3, 4, -4, -5, 5, 6, -6, -7, 7, -8, 8, 9, -9, -10]

Reversing the result works as well:

In [22]: pyqsort(a, cmp=cmp, reverse=True)

In [23]: print a

[-10, 9, -9, 8, -8, 7, -7, -6, 6, 5, -5, -4, 4, -3, 3, -2, 2, 1, -1, 0]
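
For comparison, the same four steps (allocate, copy in, call qsort, copy back) can be reproduced with the standard library's ctypes module, one of the alternative tools named at the start of the chapter. This is a sketch under stated assumptions, not the chapter's Cython implementation: it assumes a POSIX-like system where the C library can be loaded as shown, and the names ctypes_qsort, CompareFunc, and c_callback are hypothetical.

```python
import ctypes
import ctypes.util

# Assumption: a POSIX-like system. find_library may return None, in which
# case CDLL(None) exposes the symbols already loaded into the process.
_libc = ctypes.CDLL(ctypes.util.find_library("c"))

# C signature: int (*compare)(const int *, const int *)
CompareFunc = ctypes.CFUNCTYPE(ctypes.c_int,
                               ctypes.POINTER(ctypes.c_int),
                               ctypes.POINTER(ctypes.c_int))

_libc.qsort.restype = None
_libc.qsort.argtypes = [ctypes.POINTER(ctypes.c_int), ctypes.c_size_t,
                        ctypes.c_size_t, CompareFunc]


def ctypes_qsort(values, cmp):
    """Sort a list of ints in place using C's qsort and a Python comparison."""
    count = len(values)
    array = (ctypes.c_int * count)(*values)    # steps 1 and 2: allocate, fill

    def c_callback(a, b):
        result = cmp(a[0], b[0])               # dereference, call into Python
        return (result > 0) - (result < 0)     # keep the result within a C int

    callback = CompareFunc(c_callback)         # hold a reference during the call
    _libc.qsort(array, count, ctypes.sizeof(ctypes.c_int), callback)  # step 3
    values[:] = list(array)                    # step 4: copy back
```

As with pyqsort, elements that compare equal (such as 3 and -3 under an absolute-value comparison) may come out in either order, because qsort is not guaranteed to be stable.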

What about error handling? For that, we can make use of the except * clause with our cdef callbacks.

Callbacks and Exception Propagation

Thus far, any Python exception raised in cmp is ignored. To address this limitation, we can use the except * clause when declaring our cdef callbacks. The except * clause is part of the function’s declaration, so we must update the qsort declaration as well to allow it to be exception-friendly:

cdef extern from "stdlib.h":

    void qsort(void *array, size_t count, size_t size,

                int (*compare)(const void *, const void *) except *)

We also add the except * clause to the qsort_cmp ctypedef, and to each of our four cdef callbacks:

ctypedef int (*qsort_cmp)(const void *, const void *) except *

cdef int int_compare(const void *a, const void *b) except *:

    # ...

cdef int reverse_int_compare(const void *a, const void *b) except *:

    # ...

cdef int py_cmp_wrapper(const void *a, const void *b) except *:

    # ...

cdef int reverse_py_cmp_wrapper(const void *a, const void *b) except *:

    # ...

With these trivial modifications, Cython now checks for an exception every time our callbacks are called, and properly unwinds the call stack. Let’s see it in action:

$ ipython --no-banner

In [1]: import pyximport; pyximport.install()

Out[1]: (None, <pyximport.pyximport.PyxImporter at 0x101c68710>)

In [2]: from pyqsort import pyqsort

In [3]: def cmp(a, b):

...:     raise Exception("Not very interesting.")

...:

In [4]: ll = range(10)

In [5]: pyqsort(ll, cmp=cmp)

Traceback (most recent call last):

  File "pyqsort.pyx", line 68, in pyqsort.py_cmp_wrapper (...)

    return py_cmp((<int*>a)[0], (<int*>b)[0])

  File "<ipython-input-3-747656ee32db>", line 2, in cmp

    raise Exception("Not very interesting.")

Exception: Not very interesting.

Because we use the except * clause, the callbacks check for an exception after every call. This means there is some overhead associated with this functionality. However, the improved error handling may be more than worth the small performance cost.

Exception propagation with cdef callbacks goes a long way toward providing a Pythonic interface to a pure-C library.

Summary

Compiling Python to C and wrapping C in Python are the yin and yang of Cython. There is no strict separation between the two: once a C function is declared in an extern block, it can be used and called as if it were a regular cdef function defined in Cython itself. All of the Python-specific parts can be used to help wrap C libraries. To the outside Python world, no one has to know whether we laboriously implemented an algorithm on our own or simply called out to a preexisting implementation defined elsewhere.

The concepts, techniques, and examples in this chapter cover basic and intermediate usage of Cython’s interfacing features. We will use these basics in the next chapter, where we cover interfacing with C++.


[13] To follow along with the examples in this chapter, please see https://github.com/cythonbook/examples.