Database tests in Python

This post describes a setup for automated tests with a real database. These are the steps we’ll cover:

- Database setup with Docker
- Collecting test data with mysqldump
- Implementing useful pytest fixtures
- Writing a simple test case

I’m using a specific tech stack (GitLab, MySQL/MariaDB, pytest), but the overall strategy doesn’t depend on these details. It should be relatively easy to swap out, for example, GitLab for Bitbucket or MySQL for Postgres.

Database setup with Docker

In GitLab CI/CD we can run a database in a service container next to our test container. We only need to specify, under services in .gitlab-ci.yml, an image that comes with a database.

dbtests:
  services:
    - name: mariadb:11.1.2-jammy
      alias: mysql
  variables:
    MYSQL_ROOT_PASSWORD: $MYSQL_ROOT_PASSWORD
    MYSQL_PORT: 3306
    MYSQL_HOST: "mysql"
    DB_TESTS_ENABLED: 1
    FF_NETWORK_PER_BUILD: 1

We define an alias that we’ll use as the host name in the test container. It’s important to enable FF_NETWORK_PER_BUILD so that a Docker network is created per build and the database container can be resolved by that alias. DB_TESTS_ENABLED is a custom environment variable that we’ll use as a feature flag to enable the database tests.

Now this will work on the server, but how do we run the tests locally during development? We’ll use docker compose up -d to spin up the same database image as in the pipeline with this docker-compose.yml.

version: '3.8'
services:
  mysql:
    image: mariadb:11.1.2-jammy
    environment:
      MYSQL_ROOT_PASSWORD: testpassword
    ports:
      - '3306:3306'

volumes:
  mysql:
    driver: local

We’ll still be able to run the fast unit tests without database dependency by running pytest without DB_TESTS_ENABLED=1.

An alternative to this solution is using the testcontainers library for Python. With that library you can spin up Docker containers from your Python code, so you can use the same setup for local development and the pipeline on the server. The downside of that approach is that your test container needs to be able to launch other Docker containers [1], which might be a security issue or not even possible, depending on your setup.
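
For illustration, this is a rough sketch of what the testcontainers approach might look like. It’s not part of the setup described in this post; the image tag simply mirrors the one above, and the test name is made up. MySqlContainer and get_connection_url come from the testcontainers library.

from testcontainers.mysql import MySqlContainer


def test_with_testcontainers():
    # The container is started on entering the block and removed on exit
    with MySqlContainer("mariadb:11.1.2-jammy") as db:
        connection_url = db.get_connection_url()
        ...  # connect with your client of choice and run your queries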

Collecting test data with mysqldump

If you’re not dealing with sensitive production data, one simple way of obtaining test data is sampling data from your production database, for example, with mysqldump.

mysqldump --host "my.database.com" \
  --port 3306 \
  --user myuser \
  --single-transaction \
  "mydb" "mytable" \
  --where "myid IN ('1', '2', '3')" \
  > mydb/mytable.sql

In this case, we just select rows from mytable in the database mydb with three constant myids, but the where condition can be as complex as necessary. If you have to join several tables to be able to select the data you need, go ahead.

Implementing useful pytest fixtures

We implement a fixture that gives us a connection to our database. This will be useful both for preparing the database and for testing.

import os

import MySQLdb
import pytest


@pytest.fixture(scope="function")
def mysql_connection():
    with MySQLdb.connect(
        user="root",
        password=os.environ["MYSQL_ROOT_PASSWORD"],
        host=os.environ["MYSQL_HOST"],
        port=int(os.environ["MYSQL_PORT"]),
        client_flag=MySQLdb.constants.CLIENT.MULTI_STATEMENTS,
    ) as connection:
        yield connection

Here I use the mysqlclient lib. It’s a C wrapper, which means it’s fast but depends on a few system libs. If you can’t or don’t want to install these dependencies, use a pure-Python client like PyMySQL instead. The connection is configured with values from environment variables [2]. To set them locally, you can use a .env file.
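
The post doesn’t prescribe how to load that file; one common option (an assumption on my part, not part of the setup above) is the python-dotenv package, called for example at the top of conftest.py:

from dotenv import load_dotenv

# Read the key=value pairs from the .env file into os.environ,
# e.g. MYSQL_HOST, MYSQL_PORT and MYSQL_ROOT_PASSWORD
load_dotenv()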

Now we use the connection from the fixture to write our sampled test data to the database.

from pathlib import Path


@pytest.fixture(scope="function")
def prepare_db(mysql_connection):
    with mysql_connection.cursor() as cursor:
        # Create fresh DB
        cursor.execute("DROP DATABASE IF EXISTS mydb;")
        cursor.execute("CREATE DATABASE mydb;")

        # Insert data from mysql dump
        cursor.execute("USE mydb;")
        cursor.execute(Path("mydb/mytable.sql").read_text())

The fixture is function scoped to isolate the test cases from one another. If you use the same data in each test case and only read from the database, you could make it module or session scoped to speed up your tests.
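
Note that a session-scoped prepare_db can’t depend on the function-scoped mysql_connection fixture (pytest would report a scope mismatch), so the connection fixture would have to be widened as well. A sketch with illustrative names, not from the setup above:

@pytest.fixture(scope="session")
def mysql_connection_session():
    ...  # same body as mysql_connection above


@pytest.fixture(scope="session")
def prepare_db_session(mysql_connection_session):
    ...  # same body as prepare_db above, using mysql_connection_session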

For our convenience, we add a little helper function that allows us to hide tests behind a feature flag.

def requires_flag(flag_name: str):
    return pytest.mark.skipif(
        not os.environ.get(flag_name, False),
        reason=f"Environment variable {flag_name} needs to be set to run this test",
    )

The helper uses pytest’s skipif marker, so decorated tests are skipped unless the environment variable is set.

Writing a simple test case

Finally, we use our fixtures to run queries against a real database in our tests.

@requires_flag("DB_TESTS_ENABLED")
@pytest.mark.usefixtures("prepare_db")
def test_sql_query(mysql_connection):
    with mysql_connection.cursor() as cursor:
        # Whatever you want to test
        cursor.execute("SELECT * FROM mytable");

We hide the test behind the DB_TESTS_ENABLED flag, so it will only run if enabled explicitly.

DB_TESTS_ENABLED=1 pytest

That’s it. Go forth and write database tests.


  1. The test container needs a Docker client installation and access to the Docker daemon; see Docker-in-Docker (DinD).

  2. When running this locally, use 127.0.0.1 as the host instead of localhost; otherwise the client lib tries to use a Unix socket instead of TCP.