deTOX 🥗

What and why

I spend most of my time building models and writing ad hoc code within notebooks. Sometimes I wonder what writing and shipping real software feels like, so I thought I’d have a crack at re-invigorating an old project of mine from a CI perspective.

The code

clean_py is a python-based utility that bundles the functionality of black, autopep8 and isort into a single CLI tool, allowing users to tidy and lint python source code as well as notebooks. Additionally, clean_py clears cell outputs within notebooks. I have another post describing the guts here, but the gist is that these libraries are invoked at the python level, as modules, and applied across the source strings of .py and .ipynb files. If you’re curious, you can test it out for yourself via pip install clean-py.
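
As a minimal sketch of that idea (not clean_py’s actual code, just the shape of it), each of those libraries exposes a python API that operates directly on a source string:

# minimal sketch, assuming recent black/isort/autopep8 APIs;
# each formatter is invoked as a module on a raw source string
import autopep8
import black
import isort

def clean_source(source: str) -> str:
    source = isort.code(source)                           # sort imports
    source = autopep8.fix_code(source)                    # pep8 fixes
    source = black.format_str(source, mode=black.Mode())  # reformat
    return source

print(clean_source("import sys\nimport os\nx=1"))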

Multi-env testing

I’ve been using clean_py at work for the past couple of years and was mostly happy with its performance: it linted and cleaned source code, though it also threw lots of terrible warnings and errors whenever I raked it across .ipynb source. Then I recently started at a new job where python 3.9 is the game, attempted to lint some source, and hit errors tied to a hardcoded python 3.6 function parameter. So clean_py was effectively pinned to python 3.6, which is not great.

I could bump it to 3.9, but it would probably run into the same problem later, and perhaps more importantly, this would break things for other people with their own specific environment requirements. I needed a way to run tests across a variety of different environments, so I punched this exact query into google and perused some options.

Two related tools I found were Tox and Nox. They both aim to provide an easy way to run (test) software in a variety of configurations, with a mind toward stable releases across environments. From what I can gather, Tox is the OG project, where the vibe is more config-driven, whereas Nox is written and executed imperatively and explicitly in python.

I could have gone either way, but the thing that tipped it for me was this line within the tox docs, which concisely describes a matrix of environments:

[tox]
envlist = py{36,37,38}-django{22,30}-{sqlite,mysql}

Where the specified python, django and sql version/type combinations are enumerated and tested, effectively a triple-nested for-loop. Very cool! I thought this might come in handy, since I would ideally like clean_py to be usable across most environments, so tox it is.
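
For reference, tox expands those factors into the full cross product, so listing the envs (via tox -l) against that config should yield something like all twelve combinations:

py36-django22-sqlite
py36-django22-mysql
py36-django30-sqlite
py36-django30-mysql
py37-django22-sqlite
py37-django22-mysql
py37-django30-sqlite
py37-django30-mysql
py38-django22-sqlite
py38-django22-mysql
py38-django30-sqlite
py38-django30-mysql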

Anyway, here’s my tox.ini file, which spins out the specified virtual envs and runs the test suite for clean_py via a call to tox in the root directory of the project:

[tox]
envlist = python3.6,python3.7,python3.8,python3.9

[testenv]
# pytest (plus the project requirements) must be installed inside each env
deps =
    pytest
    -rrequirements.txt
commands = pytest

Somewhat deflating in its simplicity. Some gotchas which stung me along the way:

  • I did have to create all the envs initially via tox --recreate (see the commands below)
  • and I did have to change my env descriptions from the conda-style p36 to the more invocation-friendly python3.6
  • also, drilling into the tox-created environments to debug tests in VS Code (at least) was manageable at the scale of clean_py (some env changes required), but could have become annoying, since python interpreter versions are usually pinned within VS Code configs for convenience
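
For the record, the two tox invocations I leaned on most here:

tox --recreate      # force (re)creation of every env in the envlist
tox -e python3.9    # run the test suite inside a single named env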

The CircleCI Pipeline

Building off of this excellent article from CircleCI, below is a sketch of how the branching structure is intended to interact with CircleCI:

Something like a “funnel” where CI becomes more onerous as code gets closer to master.

In particular, note the use of TestPyPI within the dev branch workflow; from the python packaging docs:

TestPyPI is a separate instance of the Python Package Index (PyPI) that allows you to try out the distribution tools and processes without worrying about affecting the real index. TestPyPI is hosted at test.pypi.org

Sounds... good? Sounds like... a staging build? In any case, it matches the branching structure I originally had, which is a good enough alignment for me. If you want to skip ahead, here’s the final .circleci/config.yml:

version: 2.1

# YAML referencing to reduce job filter duplication
dev_only: &dev_only
  filters:
    branches:
      only: dev

master_only: &master_only
  filters:
    branches:
      only: master

jobs:
  tests:
    docker:
      - image: cimg/python:3.10.4
    steps:
      - checkout
      - run: pip install tox
      - run: cd ./clean_py && tox

  pypi_publish_mock:
    docker:
      - image: cimg/python:3.10.4
    steps:
      - checkout
      - run:
          command: |
            python setup.py sdist bdist_wheel
            pip install twine
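            # twine reads TWINE_USERNAME / TWINE_PASSWORD from the
            # environment, set here as CircleCI project env vars (notes below)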
            twine upload --repository testpypi dist/*
  pypi_publish_prod:
    docker:
      - image: cimg/python:3.10.4
    steps:
      - checkout
      - run:
          command: |
            python setup.py sdist bdist_wheel
            pip install twine
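            # credentials again come from project env vars, real pypi this time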
            twine upload dist/*
workflows:
  tests_pypi_publish_mock:
    jobs:
      - tests:
          <<: *dev_only
      - pypi_publish_mock:
          <<: *dev_only
          requires:
            - tests
  tests_pypi_publish_prod:
    jobs:
      - tests:
          <<: *master_only
      - pypi_publish_prod:
          <<: *master_only
          requires:
            - tests

CircleCI Miscellaneous Observations

In no particular order:

Workflow image management. The plan is only to run some tests and publish some python packages, and since we’re using Tox to create the virtual environments to test within, all we need is a docker image with a python 3 installation, like the cimg/python image readily available from CircleCI. I’m flagging the decision explicitly here because I have been involved in CI configurations that required custom docker images to run (and repo access, see credential notes below) and/or built docker images as explicit artefacts of the build process.

Successfully executing a pipeline. Coming from programming land, one thing I was always curious about was how “success” is defined across intermediate steps, where the evaluation context sits outside of any single program (no explicit “variables”). Of course, and it seems obvious in hindsight, the context is moved “up” another level into a shell terminal, where the (bash) exit codes from each discrete element of your workflow are monitored. Anything other than a zero exit code is classified as “failure”, breaking the build.
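
For example, a run step boils down to a shell script whose exit status decides pass/fail:

# under CircleCI's default shell options (bash -eo pipefail), the first
# command to exit non-zero aborts the step and fails the build
pytest
echo "only printed if pytest exited 0"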

Remote build errors. I came across my first remote error within CircleCI when my tox tests couldn’t find the adjacent requirements.txt file. I found the “re-run with ssh” option super helpful here: it allowed me to examine the environment at the point of failure, reducing the “distance” between invocation and error that is often such a pain when working with hosted/remote tools.

Supplying credentials as environment variables within CircleCI. Something that has also fascinated me for ages: where do all the passwords and credentials go once you move the execution of your programs off your local machine (eg. into a CI environment)? I believe the technical name for this is “secrets management”, and the gist of it is to centralise application-dependent credentials, keys, and passwords into something like a programmatically accessible password manager. Within CircleCI you can configure environment variables that are available during workflow execution, so kind of like a secrets manager. In particular, we’re concerned with supplying pypi credentials so that build outputs can be uploaded to the index.
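
For what it’s worth, twine reads its credentials straight from the environment, so it’s enough to define the variables below in the CircleCI project settings (the names are twine’s real env vars; the values here are obviously placeholders, shown for token-based auth):

# set once in the CircleCI project settings UI, injected at runtime
export TWINE_USERNAME=__token__           # pypi's username for API-token auth
export TWINE_PASSWORD=<your-pypi-token>   # the secret itself, never committed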

Twine version upload requirements. Twine is a utility for publishing python packages to pypi. As I found out, it requires a new version number on every upload to either the test or the real index. This was a bit of a pain when I was initially trying to stand up the CI pipeline with broken/temp artefacts (to test the twine uploads), and was largely solved by incrementing the patch version, which appeased the index’s upload “uniqueness” requirement.
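
A hypothetical sketch of the fix, assuming the version string sits directly in setup.py (the 0.5.1 below is purely illustrative):

# setup.py (abridged): pypi rejects any version it has already seen,
# so each CI upload needs a bump, even if it's only the patch digit
from setuptools import find_packages, setup

setup(
    name="clean-py",
    version="0.5.1",  # bumped from 0.5.0 to satisfy the uniqueness check
    packages=find_packages(),
)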

Filtering workflows across branches. Recall that we sketched out distinct dev/master workflows. CircleCI allows us to filter when particular jobs/workflows are invoked (yay), but AFAIK the filters must be applied to each constituent job within a workflow (nay).

# nay, duplicated filtering within each workflow job
tests_pypi_publish_prod:
  jobs:
    - tests:
        filters:
          branches:
            only:
              - master
    - pypi_publish_prod:
        requires:
          - tests
        filters:
          branches:
            only:
              - master

# yay? a single filter applied to the whole workflow? (sadly, not supported)
tests_pypi_publish_prod:
  filters:
    branches:
      only:
        - master
  jobs:
    - tests
    - pypi_publish_prod:
        requires:
          - tests

The apparent fix to this is YAML referencing (anchors and aliases), which allows you to define a snippet once and reuse it in later parts of the YAML. Note the &dev_only anchors and the <<: *dev_only merge invocations within the final config above for an example.

The Final Run

After much fiddling (not pictured, over 25 misc. pipeline runs), I decided to run a mock feature branch through the whole CI pipeline, using the following sequence:

sh/ci_sanity_check
	> new branch on local machine
	> tweak package version
	> pull request into dev
dev
	> trigger dev ci build
	> pull request into master
master
	> trigger master ci build
	> validate package index updates

The branch bumps clean_py to a very auspicious version 0.5, to ensure the test pypi and pypi upload steps don’t fail the uniqueness check, and it does literally nothing else. Here are the CircleCI pipeline outputs:

So far so good.

And here are the associated version updates across test pypi and pypi, yee haw.

Test pypi.
Real pypi.