pyembedpg, a simple way to use Postgres in Python integration tests

Francois Dang Ngoc

Sr. Software Engineer

At Simulmedia, we use Postgres as part of our technology stack. Initially, when we started writing integration tests, we were using a local Postgres instance database that was installed on engineer laptops as well as on our continuous integration servers. Our tests would first create tables, populate them with some data and at the end clean up the data by dropping all the tables.

However as we kept adding features, so were the number of tests and tables. As a result, it was taking longer and longer to run the tests and do the cleanup. We first thought of running our tests in parallel to reduce the running time but that would require using different database instances so they wouldn’t conflict with one another. Also another problem we had was that the Postgres version installed on our laptop would sometimes differ from the production databases and so we could have unpredictable results.

In order to address these problems, we decided to develop pyembedpg a Python library that would allow us to:

  • run any Postgres version (so we can run tests on the same Postgres version that was used on production) by downloading the specified version from the postgres website
  • run tests in parallel by running separate Postgres instances on different ports using a pool of ports
  • cleaning up data fast by simply stopping the Postgres instance and delete the data files

Usage

pyembedpg is part of the pypi repository so you can install it as follows:
pip install pyembedpg

To use it in your tests, you can start a Postgres instance using a context as follows:
postgres = PyEmbedPg(‘9.4.0’)
with postgres.start(15432) as db:
postgres.create_user(‘scott’, ‘tiger’) # create a user scott with password tiger
postgres.create_database(‘testdb’, ‘scott’) # create database testdb with owner scott
# do your thing

or with no context:
postgres = PyEmbedPg(‘9.4.0’)
postgres.start(15432)
postgres.create_user(‘scott’, ‘tiger’)
postgres.create_database(‘testdb’, ‘scott’)
# do your thing
postgres.close()

Postgres version

One can choose to use any version of Postgres that is available on http://ftp.postgresql.org/pub/source. The source is downloaded, compiled and cached into $HOME/.pyembedpg/<version> so subsequent executions wouldnt require to go through these steps again. You can also choose to use the local version of Postgres that is installed on your computer using the local version (i.e., PyEmbedPg(local)). If the version is omitted, the latest version in the cache is used otherwise the latest version of Postgres is downloaded from the Postgres website.

Because the source is compiled, you will need gcc, autoconf and make to be installed. It takes a few minutes for Postgres to build.

Integration tests

Simple tests
Using pytest, one can write the tests as follows:
important unittest
important psycopg2
from pyembedpg import PyEmbedPg

class TestPyEmbedPgWithContext(unittest.TestCase):
def setUp(self):
self.postgres = PyEmbedPg(‘9.4.0’).start(15432)

def test_simple_run(self):
self.postgres.create_user(‘scott’, ‘tiger’)
self.postgres.create_database(‘testdb’, ‘scott’)

# Postgres is initialized, now run some queries
with psycopg2.connect(database=’postgres’, user=’scott’, password=’tiger’, port=15432) as conn:
with conn.cursor() as cursor:
cursor.execute(‘CREATE TABLE employee(name VARCHAR(32), age INT)’)
cursor.execute(“INSERT INTO employee VALUES (‘John’, 32)”)
cursor.execute(“INSERT INTO employee VALUES (‘Mary’, 22)”)
cursor.execute(‘SELECT * FROM employee ORDER BY age’)
assert cursor.fetchall() == [(‘Mary’, 22), (‘John’, 32)]

def tearDown (self):
self.postgres.shutdown()

Parallel tests
As mentioned earlier, running tests in parallel can greatly improve testing time. This can be done using the pytest plugin xdist ( https://pytest.org/latest/xdist.html). In pyembedpg, we allow to run PyEmbedPg using a pool of ports (i.e., if one is taken, it automatically tries the next one and so on) by passing an array to port instead of an int. For example:
postgres = PyEmbedPg(‘9.4.0’).start(range(15432, 15440))
print ‘Postgres is running on part: {port}’.format(port=postgres.running_port)
postgres.shutdown()

will try port 15432 and if a postgres instance is already running on this port, it will try port 15433 and so on until 15440. If no port is available, it throws a PyEmbedPgException.

Conclusion

Pyembedpg allows to run tests on temporary Postgres instances in isolation and in parallel and using different versions. It is available on pypi and on github under the Apache 2 License. Feel free to report bugs / feature requests or better yet submit pull requests.

Interested in getting the latest from Simulmedia?

News, insights, and events sent straight to your inbox!