Azure DevOps: Python Wheels

This is the third post in a series about Azure DevOps, and it covers making Python wheels. If you want to play nice with Python users, or you have a complex build, wheels will make your package far more accessible. Compared with a source distribution, a wheel is faster to install and to use, and more secure, because installing it does not run setup.py. We will quickly cover making universal wheels, then move on to fully compiled binaries, including C++14, manylinux2010, and other hot topics. This series was developed to update the testing and releasing of Python packages for Scikit-HEP. The results of this tutorial can be seen in the boost-histogram repository, under the .ci folder.

Introduction

You should know the basics of pipelines from my first post. We will be targeting Azure Release Pipelines, which you can read about in my second post. I will assume your package already has a nice setup.py or setup.cfg and uses setuptools, though most of this should be applicable to the other packaging tools as long as they make wheels.

SDist

You should provide wheels, but you should always provide an SDist (source distribution) as well. SDists are easy; there is just one thing to watch out for: accidentally including too much in your package. You can (and should) write a good MANIFEST.in (or whatever your packaging tool uses) to make sure you don’t pick up .pyc, .git, and other files like that. Building your SDist in CI, such as Azure, helps as well; since the directory can be packaged before building or running anything, you are less likely to pick up surprises. Always check your SDist when you first start building it, though; it is just a simple archive (usually a .tar.gz) and can be inspected easily.
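As a rough illustration of that inspection step, here is a minimal Python sketch that lists the members of an sdist and flags the ones that look accidental. The function name and the list of unwanted patterns are my own assumptions for illustration, and it assumes the sdist is a .tar.gz archive.

```python
import tarfile

# Illustrative patterns only; extend these for your own project.
UNWANTED_SUFFIXES = (".pyc", ".pyo", ".so", ".o")
UNWANTED_PARTS = {".git", "__pycache__", ".DS_Store"}


def suspicious_members(sdist_path):
    """Return archive members that probably do not belong in an sdist."""
    flagged = []
    with tarfile.open(sdist_path, "r:gz") as archive:
        for name in archive.getnames():
            # Any path component such as ".git" or "__pycache__" is a red flag.
            parts = set(name.split("/"))
            if name.endswith(UNWANTED_SUFFIXES) or parts & UNWANTED_PARTS:
                flagged.append(name)
    return sorted(flagged)
```

You could call suspicious_members on the file in dist/ after the sdist step and fail the job if the returned list is not empty.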

Adding a DevOps job to make an SDist is easy. Since you can run this right before making a universal wheel, it integrates into that workflow and you will see it in the next section. If you make binaries, however, it should probably be in its own dedicated job, since you only want it to run one time (so it can’t be in a matrix).

Universal wheels

If you do not have any compiled code, and your code will run on any version of Python that it supports [1], then you can make universal wheels. You can make them on any system, and run them on any system.

Here is my suggested azure-pipelines.yml entry:

- job: 'Package'
  pool:
    vmImage: 'ubuntu-16.04'
  steps:
    - template: .ci/azure-wheel.yml

Here, it doesn’t matter what image we use, so we pick a Linux image. The contents of .ci/azure-wheel.yml could be the following:

steps:
- task: UsePythonVersion@0
  inputs:
    versionSpec: '3.7'
    architecture: 'x64'

- script: |
    python -m pip install --upgrade pip
    python -m pip install --upgrade setuptools wheel
  displayName: 'Install dependencies'

- script: |
    python setup.py sdist
  displayName: 'Make sdist'

- script: |
    python setup.py bdist_wheel --universal=1
  displayName: 'Make wheel'

- task: PublishPipelineArtifact@0
  inputs:
    artifactName: 'artifact'
    targetPath: 'dist'

Here, we select a Python version (the choice doesn’t matter; in fact, the default may be fine). Next, we make sure pip, setuptools, and wheel are all latest-and-greatest. The first call to setup.py makes the sdist; we do this early to make sure we don’t pick up extra generated bits in our source distribution. We then make the wheel; you can either pass --universal=1 here, or (better) set the following in your setup.cfg:

[bdist_wheel]
universal=1

We end by publishing the artifacts to the Azure artifacts; this way you can download them later or use them in Release Pipelines.
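To confirm that a wheel really came out universal, you can look at the tags in its filename, which follow the PEP 427 layout (name, version, optional build tag, then Python, ABI, and platform tags). The helper names below are hypothetical; this is only a sketch of the check.

```python
def wheel_tags(filename):
    """Split a wheel filename into (python tag, abi tag, platform tag)."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel: " + filename)
    parts = filename[: -len(".whl")].split("-")
    # The last three dash-separated fields are always the tags.
    return tuple(parts[-3:])


def is_universal(filename):
    """True if the wheel claims Python 2 and 3, no ABI, and any platform."""
    python_tag, abi_tag, platform_tag = wheel_tags(filename)
    supports_both = {"py2", "py3"} <= set(python_tag.split("."))
    return supports_both and abi_tag == "none" and platform_tag == "any"
```

A universal wheel should be named something like my_package-0.1.0-py2.py3-none-any.whl; a compiled wheel carries a specific interpreter, ABI, and platform instead.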

Binary wheels

When making non-universal wheels, I like to set up Azure DevOps with two pipelines; I really don’t need every possible combination of Python and OS every time I want a PR tested. So I’ll have a test pipeline and a slower build (packaging) pipeline. You can always manually trigger a pipeline in the UI if you need to check all possible Python combinations for some reason.

For the following examples, I will assume you have the package name set as a variable, for example:

variables:
  # This is the output name, - is replaced by _
  package_name: my_package

Publishing from many jobs with unique names

I am going to start at the end, by making a template that will publish whatever I produce in ./dist, called .ci/azure-publish-dist.yml:

steps:
- task: PublishPipelineArtifact@0
  inputs:
    artifactName: 'artifact_$(Agent.OS)_$(Agent.JobName)_$(python.architecture)'
    targetPath: 'dist'

This tries very hard to make sure all the outputs have unique names, while still letting you collect them with the glob artifact_* (other tasks, like publishing tests and artifacts, also produce files here; if you do not use any other publishing steps you can simplify this a little). If you do not run in a matrix (for example, for the SDist), the JobName will not be very descriptive.
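The naming scheme can be sketched in a few lines of Python. The job names and the helper function here are hypothetical placeholders (the real values are filled in by Azure at run time); the point is only that each combination yields a distinct name that still matches artifact_*.

```python
from fnmatch import fnmatch


def artifact_name(agent_os, job_name, architecture):
    """Compose a name the same way the template's artifactName string does."""
    return "artifact_{}_{}_{}".format(agent_os, job_name, architecture)


# Hypothetical (OS, job name, architecture) triples, for illustration only.
jobs = [
    ("Linux", "Job", "none"),
    ("Linux", "64Bit", "x64"),
    ("Linux", "32Bit", "x86"),
    ("Windows_NT", "Python37", "x64"),
    ("Windows_NT", "Python37_32", "x86"),
]
names = [artifact_name(*job) for job in jobs]

# Every name is distinct, and all of them can be collected with one glob.
all_unique = len(set(names)) == len(names)
all_match = all(fnmatch(name, "artifact_*") for name in names)
```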

Making the SDist

We need a job that will just make an SDist; let’s keep it clearly separate:

- job: LinuxSDist
  pool:
    vmImage: 'ubuntu-16.04'
  variables:
    python.architecture: 'none'
  steps:
    - script: |
        python -m pip install setuptools
        python setup.py sdist
      displayName: Build sdist
    - template: .ci/azure-publish-dist.yml

That’s it for the sdist. We set python.architecture because the publish template uses it in the artifact name. We don’t need anything special to make an sdist, so the script is really just these two lines. A warning about multiline scripts: a failure on any line before the final one will not cause the step to fail unless you add set -e, as the ManyLinux job below does. However, this should be safe here.
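If you want to see the multiline behavior for yourself, the following sketch runs two tiny scripts through a POSIX shell and compares their exit codes. It assumes sh is available on your machine; the helper name is mine.

```python
import subprocess


def exit_code(script):
    """Run a multiline script with sh and return the script's exit status."""
    return subprocess.run(["sh", "-c", script]).returncode


# Without `set -e`, the failing first line is ignored; the last line decides.
unguarded = exit_code("false\ntrue\n")

# With `set -e`, the shell stops at the first failing line.
guarded = exit_code("set -e\nfalse\ntrue\n")
```

The first script reports success even though its first command failed; the second one reports the failure, which is what you usually want in CI.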

ManyLinux

Since there are many flavors of Linux, Python packagers have come up with a special subset of allowed interactions with the base operating system, and called that “ManyLinux1”. It’s based on CentOS 5, circa 2007; so in theory, most Linux distributions released after 2007 should be able to run your wheel. The common exceptions are the unusual distros, like Alpine Linux and Clear Linux, which will download the sdist and build it. But you can cover CentOS, Fedora, Ubuntu, and many others.

However, CentOS 5 (that is, Red Hat Enterprise Linux 5) has hit end-of-life, so compiler packages and such are no longer being produced; the latest developer toolset compiler is GCC 4.8, which does not support C++14. Recently, a new manylinux based on CentOS 6, called manylinux2010, was released. Your users need a very recent version of pip (19.0 or newer) to be able to install it. Note that it is also 64-bit only, if that matters to you on Linux. Here’s an example of a DevOps job matrix that builds both ManyLinux1 and ManyLinux2010 wheels:

- job: ManyLinux
  strategy:
    matrix:
      64Bit2010:
        arch: x86_64
        plat: manylinux2010_x86_64
        image: quay.io/pypa/manylinux2010_x86_64
        python.architecture: x64
      64Bit:
        arch: x86_64
        plat: manylinux1_x86_64
        image: quay.io/pypa/manylinux1_x86_64
        python.architecture: x64
      32Bit:
        arch: i686
        plat: manylinux1_i686
        image: quay.io/pypa/manylinux1_i686
        python.architecture: x86
  pool:
    vmImage: 'ubuntu-16.04'
  steps:
    - script: |
        set -ex
        docker run -e PLAT=$(plat) -e package_name=$(package_name) --rm -v `pwd`:/io $(image) /io/.ci/build-wheels.sh
        ls -lh wheelhouse/
        mkdir -p dist
        cp wheelhouse/$(package_name)*.whl dist/.
      displayName: Build wheels
    - template: .ci/azure-publish-dist.yml

The first few lines should be clear to you by now; we set up three jobs, each with some custom variables. For the script, we run docker and pass the variables into the container using -e VARIABLE=$(variable). We map the current working directory to /io in the container, and we run our script from its container path. After it runs, we list the contents of the wheelhouse directory (which is where we build our files instead of the more normal “dist” directory). Finally, we copy just the package-related wheels to “dist”; if you built some other wheels, like numpy, along the way, this keeps them out of your dist directory.
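The final copy step is just a filename filter. As a sketch of the same idea in Python (the function name and defaults are assumptions of mine, not part of the pipeline), only wheels whose names start with the package name are copied:

```python
import glob
import os
import shutil


def collect_wheels(package_name, wheelhouse="wheelhouse", dist="dist"):
    """Copy wheels whose filename starts with package_name into dist."""
    os.makedirs(dist, exist_ok=True)
    copied = []
    pattern = os.path.join(wheelhouse, package_name + "*.whl")
    for path in sorted(glob.glob(pattern)):
        shutil.copy(path, dist)
        copied.append(os.path.basename(path))
    return copied
```

Wheels for dependencies such as numpy stay behind in the wheelhouse, so they are never published as if they were yours.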

The helper file here is .ci/build-wheels.sh, and was based on the official example.

#!/bin/bash
set -e -x

# Collect the pythons
pys=(/opt/python/*/bin)

# Filter out Python 3.4
pys=(${pys[@]//*34*/})

# Compile wheels
for PYBIN in "${pys[@]}"; do
    "${PYBIN}/pip" install -r /io/dev-requirements.txt
    "${PYBIN}/pip" wheel /io/ -w wheelhouse/
done

# Bundle external shared libraries into the wheels
for whl in wheelhouse/$package_name-*.whl; do
    auditwheel repair --plat $PLAT "$whl" -w /io/wheelhouse/
done

# Install packages and test
for PYBIN in "${pys[@]}"; do
    "${PYBIN}/python" -m pip install $package_name --no-index -f /io/wheelhouse
    "${PYBIN}/pytest" /io/tests
done

The main differences here from the official example are the package name (which I pass in) and the filter for Python 3.4 (since NumPy does not provide Python 3.4 wheels, including it slows down the build a lot).
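The bash pattern substitution used for the filter is terse, so here is the same logic as a Python sketch. The directory names in the usage note are only examples of how the interpreters are laid out under /opt/python; the function name is hypothetical.

```python
from fnmatch import fnmatchcase


def filter_pythons(bin_dirs, exclude="*34*"):
    """Drop every interpreter directory whose path matches the glob."""
    return [path for path in bin_dirs if not fnmatchcase(path, exclude)]
```

For example, filter_pythons(["/opt/python/cp27-cp27mu/bin", "/opt/python/cp34-cp34m/bin", "/opt/python/cp37-cp37m/bin"]) keeps the 2.7 and 3.7 entries and drops the 3.4 one, which is what the script does to pys before the build loop.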

If you want to build ManyLinux1 wheels with a newer version of GCC, I’ve created a docker image skhep/manylinuxgcc-x86_64 (and skhep/manylinuxgcc-i686) with a custom build of GCC 8 or 9; see the formula here. The ManyLinux2010 image should make this obsolete eventually.

macOS

In order to support macOS, you need to pay attention to which version of macOS your Python was built for. Most sources of Python for macOS are built for a recent version of macOS; the official Python.org builds target the oldest version, and so should always be what you build your wheels against. So you should have something like this:

- script: .ci/macos-install-python.sh '$(python.version)'
  displayName: Install Python.org Python

If you share the setup with other operating systems, add conditions so that each OS gets the right step:

- script: .ci/macos-install-python.sh '$(python.version)'
  displayName: Install Python.org Python
  condition: and(succeeded(), eq(variables['Agent.OS'], 'Darwin')) 

- task: UsePythonVersion@0
  inputs:
    versionSpec: '$(python.version)'
    architecture: '$(python.architecture)'
  condition: and(succeeded(), ne(variables['Agent.OS'], 'Darwin')) 

The special setup only runs on macOS; other operating systems use the normal Azure Python task.

The contents of the macos-install-python.sh file:

#!/usr/bin/env bash

PYTHON_VERSION="$1"

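case $PYTHON_VERSION in
2.7)
  FULL_VERSION=2.7.16
  ;;
3.6)
  FULL_VERSION=3.6.8
  ;;
3.7)
  FULL_VERSION=3.7.3
  ;;
*)
  # Fail early instead of building a URL with an empty version
  echo "Unsupported Python version: $PYTHON_VERSION" >&2
  exit 1
  ;;
esac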

INSTALLER_NAME=python-$FULL_VERSION-macosx10.9.pkg
URL=https://www.python.org/ftp/python/$FULL_VERSION/$INSTALLER_NAME

PY_PREFIX=/Library/Frameworks/Python.framework/Versions

set -e -x

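# -f makes curl fail on an HTTP error instead of saving the error page
curl -fL "$URL" -o "$INSTALLER_NAME"

sudo installer -pkg "$INSTALLER_NAME" -target /

# -f: do not abort (under set -e) if the link does not exist yet
sudo rm -f /usr/local/bin/python
sudo ln -s /usr/local/bin/python$PYTHON_VERSION /usr/local/bin/python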

which python
python --version
python -m ensurepip
python -m pip install setuptools twine wheel numpy

This script can install Python 2.7, 3.6, or 3.7, depending on its argument. You have a choice here; the most recent releases of Python.org Python have special 64-bit only 10.9+ builds; if you prefer, you can use the older 10.6+ dual architecture builds. If you want to support Python 3.5 on macOS, you’ll need to use the 10.6+ builds as well as select an older patch release, because Python.org no longer provides binaries for the newest 3.5 releases. For any C++ build, you’ll probably have to make your code 10.9+ anyway (because libstdc++ was removed in 10.14, and you need 10.9+ to get the replacement, libc++).
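The version lookup and URL construction from the script can be written out as a small Python sketch, which makes the pattern easier to test. The function name and the macos_target parameter are my own additions; the pinned versions and the URL layout are copied from the shell script.

```python
# Pinned patch releases, copied from the shell script above.
FULL_VERSIONS = {"2.7": "2.7.16", "3.6": "3.6.8", "3.7": "3.7.3"}


def installer_url(python_version, macos_target="10.9"):
    """Build the Python.org installer URL for a major.minor version."""
    if python_version not in FULL_VERSIONS:
        raise ValueError("no pinned release for Python " + python_version)
    full_version = FULL_VERSIONS[python_version]
    installer = "python-{}-macosx{}.pkg".format(full_version, macos_target)
    return "https://www.python.org/ftp/python/{}/{}".format(full_version, installer)
```

Passing macos_target="10.6" would point at the older dual architecture builds instead, assuming one exists for the release you pinned.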

At this point, you can just make wheels the normal way, using the same code we will use on Windows (except for Python 2.7):

- script: |
    python -m pip wheel . -w wheelhouse/
  displayName: 'Build wheel'
 
# <INSERT TESTING HERE>
  
- script: |
    ls -lh wheelhouse
    mkdir -p dist
    cp wheelhouse/$(package_name)* dist/.
  displayName: 'Show wheelhouse'

We should end by delocating the wheels; like auditwheel above, this will try to make sure all dependencies are included and referenced properly:

- script: |
    python -m pip install delocate
    /Library/Frameworks/Python.framework/Versions/$(python.version)/bin/delocate-wheel dist/$(package_name)*.whl
  displayName: 'Delocate wheels'
  condition: and(succeeded(), eq(variables['Agent.OS'], 'Darwin')) 

This is macOS only, so I have added a condition here; you don’t need it unless you share code with Windows (or possibly Linux, but the Docker-centric build makes that unlikely).

Windows

If you don’t care about C++11 and Python 2.7 on Windows, then Windows is easy. First let’s assume you want a pretty standard matrix of versions. Make sure you include 32-bit for Windows; unlike on macOS (which removed it years ago) and Linux (which may remove it soon), 32-bit is the default download option from Python.org.

- job: Windows
  strategy:
    matrix:
      Python27:
        python.version: '2.7'
        python.architecture: 'x64'
      Python36:
        python.version: '3.6'
        python.architecture: 'x64'
      Python37:
        python.version: '3.7'
        python.architecture: 'x64'
      Python27_32:
        python.version: '2.7'
        python.architecture: 'x86'
      Python36_32:
        python.version: '3.6'
        python.architecture: 'x86'
      Python37_32:
        python.version: '3.7'
        python.architecture: 'x86'
  pool:
    vmImage: 'vs2017-win2016'
  steps:
    - template: .ci/azure-setup.yml
    - template: .ci/azure-steps.yml
    - template: .ci/azure-publish-dist.yml

This is a pretty standard matrix (one might complain that I didn’t use the “matrix” part of matrix where it could have been used, but this is simple). If you don’t need special compilers, this is pretty much trivial: just run the normal setup, bdist, and publish. You don’t even need to delocate the wheels [2].

Let’s show what it would look like if you need a more powerful compiler, such as MSVC 2017. Note: Do not do this unless you need C++11+! It will force your users to have the MSVC 2015+ redistributable installed, instead of the “normal” 2008 redistributable that Python 2.7 requires. However, I love pybind11 (as you may have noticed from my previous posts), so this is a requirement for me. Hopefully no one is using Windows and Python 2.7 together.
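Since the Windows matrix is just the product of versions and architectures, it could also be generated rather than typed out. This is only a sketch; the function name is hypothetical and the naming rule (Python27, Python27_32, and so on) simply mirrors the entries listed in the job.

```python
from itertools import product


def build_matrix(versions, architectures):
    """Return one matrix entry per (architecture, version) combination."""
    matrix = {}
    for architecture, version in product(architectures, versions):
        name = "Python" + version.replace(".", "")
        if architecture == "x86":
            name += "_32"  # mark the 32-bit entries, as in the job above
        matrix[name] = {
            "python.version": version,
            "python.architecture": architecture,
        }
    return matrix


matrix = build_matrix(["2.7", "3.6", "3.7"], ["x64", "x86"])
```

Writing the six entries by hand, as in the job, is perfectly fine at this size; generation only starts to pay off when the lists grow.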

Let’s look at the three files listed above. First, the Windows .ci/azure-setup.yml:

- task: UsePythonVersion@0
  inputs:
    versionSpec: '$(python.version)'
    architecture: '$(python.architecture)'

- script: |
    mkdir -p dist
    python -m pip install --upgrade pip
    python -m pip install --upgrade pytest wheel twine setuptools
  displayName: 'Install dependencies'

Everything there is normal. Next, .ci/azure-steps.yml:

- script: |
    call "C:\Program Files (x86)\Microsoft Visual Studio\2017\Enterprise\VC\Auxiliary\Build\vcvarsall.bat" $(python.architecture)
    set MSSdk=1
    set DISTUTILS_USE_SDK=1
    python -m pip wheel . -w wheelhouse/
  displayName: 'Build wheel (Windows Python 2.7)'
  condition: and(succeeded(), eq(variables['python.version'], '2.7')) 

- script: |
    python -m pip wheel . -w wheelhouse/
  displayName: 'Build wheel'
  condition: and(succeeded(), ne(variables['python.version'], '2.7'))

- script: |
    ls -lh wheelhouse
    mkdir -p dist
    cp wheelhouse/$(package_name)* dist/.
  displayName: 'Show wheelhouse'

# <INSERT TESTING HERE>

The special thing here is the setup for MSVC when you are running Python 2.7. You are forcing distutils (really setuptools) to ignore the built-in MSVC settings, and instead pick up the 2017 settings.
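The two environment variables are the switch that makes distutils trust the compiler environment that is already active. As a sketch of just that part (the function is hypothetical, and it does not replace running vcvarsall.bat, which sets up the compiler itself):

```python
import os


def build_environment(python_version, base_env=None):
    """Return a copy of the environment, with the SDK flags set for 2.7."""
    env = dict(os.environ if base_env is None else base_env)
    if python_version == "2.7":
        # Tell distutils to use the already-activated compiler environment.
        env["DISTUTILS_USE_SDK"] = "1"
        env["MSSdk"] = "1"
    return env
```

The returned dictionary could be passed as env= to subprocess.run when calling python -m pip wheel from a helper script.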

You have already seen the publish part.

Wrap up

With that, we have now covered how to make a complete set of wheels for ManyLinux, Windows, and macOS. You can see an example of all this in action in the boost-histogram package; look in the .ci folder.

If you have suggestions or corrections, either let me know in the comments below, or open an issue here, since this is an open source blog. I would like to thank Eduardo Rodrigues, who helped me edit these posts before they were published.


This work was supported by the National Science Foundation under Cooperative Agreement OAC-1836650.

Bonus: Operating system agnostic files

I actually share many of the files, at least between macOS and Windows. Here is what azure-setup.yml looks like:

steps:

- script: .ci/macos-install-python.sh '$(python.version)'
  displayName: Install Python.org Python
  condition: and(succeeded(), eq(variables['Agent.OS'], 'Darwin')) 

- task: UsePythonVersion@0
  inputs:
    versionSpec: '$(python.version)'
    architecture: '$(python.architecture)'
  condition: and(succeeded(), ne(variables['Agent.OS'], 'Darwin')) 

- script: |
    mkdir -p dist
    python -m pip install --upgrade pip
    python -m pip install --upgrade pytest wheel twine setuptools
  displayName: 'Install dependencies'

And here is .ci/azure-steps.yml, including testing:

- script: |
    call "C:\Program Files (x86)\Microsoft Visual Studio\2017\Enterprise\VC\Auxiliary\Build\vcvarsall.bat" $(python.architecture)
    set MSSdk=1
    set DISTUTILS_USE_SDK=1
    python -m pip wheel . -w wheelhouse/
  displayName: 'Build wheel (Windows Python 2.7)'
  condition: and(succeeded(), eq(variables['Agent.OS'], 'Windows_NT'), eq(variables['python.version'], '2.7')) 

- script: |
    python -m pip wheel . -w wheelhouse/
  displayName: 'Build wheel'
  condition: and(succeeded(), not(and(eq(variables['Agent.OS'], 'Windows_NT'), eq(variables['python.version'], '2.7'))))

- script: |
    ls -lh wheelhouse
    mkdir -p dist
    cp wheelhouse/$(package_name)* dist/.
  displayName: 'Show wheelhouse'

- script: |
    python -m pip install $(package_name) --no-index -f wheelhouse
  displayName: 'Install wheel'

- script: |
    python -m pytest --junitxml=junit/test-results.xml
  workingDirectory: tests
  displayName: 'Test with pytest'

- task: PublishTestResults@2
  inputs:
    testResultsFiles: '**/test-*.xml'
    testRunTitle: 'Publish test results for Python $(python.version)'
  condition: succeededOrFailed()

- script: |
    python -m pip install delocate
    /Library/Frameworks/Python.framework/Versions/$(python.version)/bin/delocate-wheel dist/$(package_name)*.whl
  displayName: 'Delocate wheels'
  condition: and(succeeded(), eq(variables['Agent.OS'], 'Darwin')) 
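The conditions in the shared file are easier to check as plain boolean functions. This sketch mirrors the three OS-dependent conditions (leaving out succeeded()); the function names are mine.

```python
def msvc_build_runs(agent_os, python_version):
    """Mirror of the 'Build wheel (Windows Python 2.7)' condition."""
    return agent_os == "Windows_NT" and python_version == "2.7"


def plain_build_runs(agent_os, python_version):
    """Mirror of the 'Build wheel' condition: not(and(...))."""
    return not (agent_os == "Windows_NT" and python_version == "2.7")


def delocate_runs(agent_os):
    """Mirror of the 'Delocate wheels' condition."""
    return agent_os == "Darwin"


combos = [
    (agent_os, version)
    for agent_os in ("Linux", "Darwin", "Windows_NT")
    for version in ("2.7", "3.6", "3.7")
]

# Exactly one of the two build steps should run for every combination.
exactly_one_build = all(
    msvc_build_runs(*combo) != plain_build_runs(*combo) for combo in combos
)
```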


  1. This may sound odd. The most common case where it might not be true for pure Python code is if you have Python 2 code that is converted into Python 3 code using 2to3 by setup.py. This was the expected method for adopting Python 3 when it first came out, but it was quickly found to be a complete mess and is no longer in use. Most code is written to support both in a single code base if both versions are supported.
  2. Okay, I can’t get by with making Windows look that good. The reason you can’t delocate the wheels is that the DLL lookup on Windows is terrible, and you have to be very careful to bundle in any DLLs by hand with unique names so that you don’t break another wheel.