Developing a Python package at DESC

The “standard” way of building packages in Python has gone through many changes in recent years, and if you have not been keeping track, it can be difficult to know the best method for putting a Python package together such that it is easy to install, easy to maintain and easy to publish.

Here we go over the (relatively) new standard for Python packaging, the pyproject.toml file. We recommend that developers at DESC strongly consider migrating to this packaging standard if they have not already, particularly when starting a new package from scratch. The intuitive, clean and maintainable format of pyproject.toml makes packages easy to work with and easy to publish with minimal effort.

This guide assumes some basic knowledge of putting together a piece of Python software, such as creating your own modules, etc. For those unfamiliar with creating Python software, check out the official Python guide. Also note this is by no means an exhaustive tutorial on the subject of Python packaging. A great additional resource for that is the Python Packaging Guide itself.

We write this guide under the assumption that the Python package you are creating will be hosted in the DESC GitHub repository; however, the majority of this guide still applies even if this is not the case. If you are not yet familiar with git or GitHub, or you need help getting set up on the DESC GitHub repository, check out this guide on the DESC Confluence page, or this more general getting started guide.

The accompanying repository, which hosts the demo package we often refer to in this guide, can be found here. The package, mydescpackage, is very simple, consisting of a few callable mathematical functions. You are welcome to download/fork it and use it as a starting template for your project.

Note

For those wondering what a TOML file is, TOML is a file format for configuration files, similar to YAML. It stands for “Tom’s Obvious Minimal Language”.

The directory structure

Let’s get started. First we go over the directory structure of a Python package, which should look something like this (replacing names where sensible):

/path/to/my/project/
├── README.md
├── pyproject.toml
├── LICENCE
├── .gitignore
├── src/
│   └── mydescpackage/
│       ├── __init__.py
│       ├── file1.py
│       └── file2.py
├── tests/
│   ├── test1.py
│   └── test2.py
├── .github/
│   └── workflows/
│       └── ci.yml
└── docs/

Not all of these files and directories are strictly required. As a minimum, however, you should have the README.md and pyproject.toml files in your base project directory, and your software code should populate the src/mydescpackage/ directory.
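As a rough illustration, the layout above could be scaffolded with a short Python script. This is only a sketch: the helper name create_skeleton is hypothetical, the file names simply mirror the example tree, and every file is created empty for you to fill in.

```python
from pathlib import Path
import tempfile

# Files taken from the example tree above; all are created empty.
SKELETON_FILES = [
    "README.md",
    "pyproject.toml",
    "LICENCE",
    ".gitignore",
    "src/mydescpackage/__init__.py",
    "tests/test1.py",
    ".github/workflows/ci.yml",
]
SKELETON_DIRS = ["docs"]


def create_skeleton(root):
    """Create an empty project skeleton under `root` (hypothetical helper)."""
    root = Path(root)
    for relative in SKELETON_FILES:
        path = root / relative
        path.parent.mkdir(parents=True, exist_ok=True)
        path.touch(exist_ok=True)
    for relative in SKELETON_DIRS:
        (root / relative).mkdir(parents=True, exist_ok=True)
    return root


if __name__ == "__main__":
    # Demonstrate in a throwaway directory so nothing real is overwritten.
    with tempfile.TemporaryDirectory() as tmp:
        project = create_skeleton(Path(tmp) / "project")
        for path in sorted(project.rglob("*")):
            print(path.relative_to(project))
```

To use it for a real project, call create_skeleton with your own project path instead of the temporary directory.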

What are these files and directories…

  • README.md: Usually a Markdown file that outlines the project, its requirements, installation instructions, authors, etc. The contents of this file are also displayed on your GitHub project landing page.

  • pyproject.toml: Where the build information, project dependencies, metadata, etc, of the Python package is stored. More in the next section.

  • LICENCE: Contains the license of the package, outlining any restrictions on its use. It is good practice to use a well-known license, such as a GNU, Apache, MIT or Creative Commons license, rather than a self-created one.

  • .gitignore: This file specifies intentionally untracked files that Git should ignore (see here).

  • src/mydescpackage/: Many people prefer placing their Python packages in a src/ folder in their project directory. This is a preference, not a requirement. However, it does break the habit of running the source code directly from the project directory, because the src/ layout forces the user to install the package in order to run it (don’t worry, you only have to install the package once with an editable install, see more about this later on).

  • tests/: Your tests go in here (see our guide on Continuous Integration).

  • .github/workflows/: Your GitHub Actions Continuous Integration workflows go in here (see our guide on Continuous Integration).

  • docs/: For extensive documentation, readthedocs files for example.

Once the directory structure is set up, we can move on to telling pip how to build and install our package.

The pyproject.toml file

The pyproject.toml configuration file was introduced in PEP 518 as a way of specifying the minimum build system requirements for Python projects. It tells build tools which packages are required during the build process itself (e.g., setuptools, wheel), so that users do not have to pre-install any build dependencies beforehand in order to install your package. The build requirements specified in pyproject.toml are installed in an isolated environment, used to build the package, and later discarded, keeping your base environment clean and tidy.

The build system

To specify which build-backend to use for installing your package, and any requirements needed during the build process, include this at the top of your pyproject.toml file.

[build-system]
requires = ["setuptools >= 61.0"]
build-backend = "setuptools.build_meta"

Here we are saying that we require the setuptools package during the build, and that we are going to use setuptools as the build-backend to build our Python package. Other common requirements during the build process are wheel and cython.

Note

You do not have to use setuptools as your build-backend; you can use alternative Python packaging tools such as Poetry or Flit. You can even put your own custom build-backend here if you have very specific requirements for building your package. However, if you are unsure, stick with setuptools.

In theory this is the minimum needed. If you were to install your package via pip at this stage, with pip install ., it would use the build system information specified in pyproject.toml and continue to install your package with some generic default values, or by looking for more information in the legacy setup.py and setup.cfg files.

However, we are now able to transfer all the information that has traditionally been put in the setup.py and setup.cfg files directly into pyproject.toml, making it the only configuration file you need (note you can still keep the traditional setup.* files for legacy purposes and backwards compatibility).

Project metadata

As of PEP 621 there is a standard format for storing project metadata in pyproject.toml, which setuptools>=61.0.0 conforms to (see their tutorial on metadata here). Below is the metadata for our demo package:

[project]
name = "mydescpackage"
description = "Example DESC Python package, some simple mathematical functions."
license = {text = "BSD 3-Clause License"}
classifiers = [
    "Programming Language :: Python :: 3",
]
dependencies = [
    'numpy',
    'importlib-metadata;python_version<"3.8"'
]
requires-python = ">=3"
version = "0.0.1"

All metadata goes under the [project] section, including the name of your package, the minimum required Python version, and the package dependencies. Here we are saying that our package will be installed as mydescpackage==0.0.1, that it requires Python >=3 to run, and that it depends on numpy (importlib.metadata was not part of the standard library before Python 3.8, so we include the importlib-metadata backport as a dependency in those cases). Many of the metadata fields are optional, but it is useful to be as thorough as possible when detailing the package, especially if you publish the package to PyPI, for example (for a list of all metadata options see here).
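To see how tools read this metadata, the sketch below parses a trimmed copy of the [project] table with a TOML parser. It assumes Python 3.11 or newer, where tomllib is in the standard library (on older versions the third-party tomli package offers the same interface); the helper name summarise_project is hypothetical.

```python
try:
    import tomllib  # standard library from Python 3.11
except ModuleNotFoundError:
    import tomli as tomllib  # third-party backport for older Pythons

# A trimmed copy of the demo metadata, inlined so the sketch is self-contained.
PYPROJECT_TEXT = """
[project]
name = "mydescpackage"
version = "0.0.1"
requires-python = ">=3"
dependencies = [
    'numpy',
    'importlib-metadata;python_version<"3.8"'
]
"""


def summarise_project(toml_text):
    """Return (name, version, dependencies) from a pyproject.toml string."""
    project = tomllib.loads(toml_text)["project"]
    return project["name"], project["version"], project.get("dependencies", [])


name, version, dependencies = summarise_project(PYPROJECT_TEXT)
print(f"{name}=={version}")
for requirement in dependencies:
    print("depends on:", requirement)
```

For a real project you would read the text from the pyproject.toml file on disk rather than from an inline string.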

[tool.setuptools.packages.find]
where = ["src"]

Because we are using the src/ directory to host our package’s code, we can aid setuptools by pointing to this directory in its search for our package’s source code (the default is .). Any directories (and subdirectories) of src/ with an __init__.py file will automatically be discovered by setuptools.
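The discovery rule described above can be pictured with a small standard-library sketch. This is a simplified illustration of the "directory containing __init__.py" rule, not setuptools' actual implementation; the helper find_package_dirs and the utils subpackage are hypothetical.

```python
from pathlib import Path
import tempfile


def find_package_dirs(where):
    """List dotted names of directories under `where` that contain __init__.py.

    Simplified illustration only: a directory without __init__.py is skipped,
    together with everything below it.
    """
    found = []

    def visit(directory, prefix):
        for child in sorted(directory.iterdir()):
            if child.is_dir() and (child / "__init__.py").is_file():
                name = f"{prefix}{child.name}"
                found.append(name)
                visit(child, name + ".")

    visit(Path(where), "")
    return found


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "src"
    # "utils" is a hypothetical subpackage; "notes" has no __init__.py.
    for package_dir in ["mydescpackage", "mydescpackage/utils"]:
        (src / package_dir).mkdir(parents=True)
        (src / package_dir / "__init__.py").touch()
    (src / "notes").mkdir()
    discovered = find_package_dirs(src)
    print(discovered)
```

In this toy layout the notes directory is ignored because it has no __init__.py file.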

Optional dependencies

The packages listed under [project] dependencies should be the minimum required for your Python software to operate. However, we can also declare optional dependencies for other scenarios.

For example, in our demo package we have a test suite which we invoke using the pytest package during the Continuous Integration process. As we only need the pytest package during testing, we create an optional dependency list, labelled test.

[project.optional-dependencies]
test = ["pytest"]

which, when running pip install .[test], will install pytest along with the default dependencies.

Optional dependencies are also useful if you want MPI-specific installs, or installs to compile documentation, for example.
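Code that relies on an optional dependency usually needs to cope with that dependency being absent. A minimal sketch of one common pattern is shown below. The helper has_module is hypothetical, and mpi4py is used purely as an assumed example of what an MPI-specific extra might install; it is not taken from the demo package.

```python
import importlib.util


def has_module(name):
    """Return True if the top-level module `name` can be imported."""
    return importlib.util.find_spec(name) is not None


# mpi4py is only an assumed example of an optional, MPI-specific dependency.
if has_module("mpi4py"):
    print("MPI support available")
else:
    print("MPI support not installed; running in serial mode")
```

Checking with find_spec avoids importing the optional package just to test for its presence.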

Package entrypoints/scripts

Another extremely useful feature of Python packages is script entrypoints. These let you declare commands that can be run from the terminal and that directly execute functions within your package. For example, in our demo package we have a function that computes the numerical value of pi. As we keep forgetting the value of pi, we can register a command, display-pi, to help us, which calls the mydescpackage.pi.display_pi function directly (outputting the value of pi to the terminal).

[project.scripts]
display-pi = "mydescpackage.pi:display_pi"

Entrypoints are great for creating front-ends to your packages.
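For orientation, here is a sketch of what a module behind such an entrypoint might look like. It is not the demo package's actual code: it assumes the value is simply taken from math.pi, whereas the real display_pi may compute it differently.

```python
import math


def display_pi():
    """Print the value of pi to the terminal."""
    print(math.pi)


if __name__ == "__main__":
    # Allows running the file directly as well as via the entrypoint.
    display_pi()
```

The function named in [project.scripts] is called without arguments, so any command-line options would have to be read inside the function itself (for example with argparse).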

Automatic versioning

An extremely important attribute of your Python package is its version, which you should declare in the pyproject.toml metadata. It is a good practice to use the Semantic Versioning format for your code.

To avoid multiple manual declarations of the package version, both in the pyproject.toml file and in the source code, a useful trick is to use the importlib.metadata module to access the version dynamically within the code.

To do this, go to your __init__.py file in your mydescpackage directory and insert:

try:
    # For Python >= 3.8
    from importlib import metadata
except ImportError:
    # For Python < 3.8
    import importlib_metadata as metadata

__version__ = metadata.version("mydescpackage")

then any calls to mydescpackage.__version__ will be automatically up to date and correct.

Installing your package (from source)

Finally, once the pyproject.toml file is written, we can install the package using pip just like before. Within the project directory type:

pip install -e .

The -e flag requests an “editable install”, which is extremely useful when developing your packages. An editable installation works very similarly to a regular install with pip install ., except that it only installs your package dependencies, metadata and wrappers for console and GUI scripts, while Python points to the code directly in your project folder using a special link. This means that any changes to the Python source code take effect immediately, without requiring a new installation.
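One way to check that an editable install really points at your project folder is to ask Python where it would import the package from. The sketch below uses a hypothetical helper, module_location, and demonstrates it on the standard-library json module so that it runs even without mydescpackage installed.

```python
import importlib.util


def module_location(name):
    """Return the file a top-level module would be imported from, or None."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


# Replace "json" with "mydescpackage" after running `pip install -e .`.
print(module_location("json"))
```

After an editable install, the reported path for mydescpackage would be expected to lie inside your project's src/ directory rather than in site-packages.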

For this installation method, people will have to clone your git repository and install from source as shown above (which is fine). A slightly easier way for people to install your packages is via public repositories, such as PyPI and Conda, which we cover next.