DavidPratt512/pytest-with-data-files

This is a brief guide on how to set up a pytest project to retrieve test data from files.

I don't use pytest often, but when I do, I usually end up in a situation where this information would come in handy.

Example

Follow the structure of the column-extractor directory for an example of how to create pytest test cases from directories.

There are 3 components to get this working:

  • A pytest fixture in conftest.py.
  • A pytest test function in test_column_extractor.py that accepts the above fixture.
  • A set of tests defined by convention in the column-extractor-test-cases directory.

Why

If you use pytest, you are likely familiar with its pytest.mark.parametrize() decorator. If not, it is a simple way to run many test scenarios against a single criterion. For example, the following code block tests the double() function against four test cases.

import pytest


def double(n: int) -> int:
    return 2 * n

@pytest.mark.parametrize(
    'given,expected',
    [
        (-1, -2),
        ( 0,  0),
        ( 1,  2),
        ( 2,  4),
    ]
)
def test_double(given: int, expected: int):
    assert double(given) == expected

But what if you needed many test cases? One solution would be to execute the test many times against random inputs. But a programmer can quickly become stuck in a corner when using random test data. (Ask yourself how to test the double() function with random test data without re-implementing the double() function in the test... it's possible, but not obvious.)
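For the curious, here is one way the random-data approach could look. This is only a sketch: the helper name check_double_properties, the seed, and the input range are all made up for illustration. Instead of recomputing 2 * n in the test, it checks properties that doubling must satisfy: one fixed anchor value, additivity, and evenness.

```python
import random


def double(n: int) -> int:
    return 2 * n


def check_double_properties(trials: int = 1000, seed: int = 0) -> None:
    """Hypothetical helper: checks double() against random inputs."""
    rng = random.Random(seed)  # a fixed seed keeps any failure reproducible

    # One fixed anchor value pins down the scale of the function.
    assert double(1) == 2

    for _ in range(trials):
        a = rng.randint(-10**6, 10**6)
        b = rng.randint(-10**6, 10**6)
        # Doubling distributes over addition.
        assert double(a + b) == double(a) + double(b)
        # A doubled integer is always even.
        assert double(a) % 2 == 0


check_double_properties()
```

It works, but it takes some thought to find properties like these, and for a less trivial function they may not exist at all.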

Another, perhaps more obvious, solution is to just pack all of those new test cases next to the existing test cases. Now the test looks like this:

@pytest.mark.parametrize(
    'given,expected',
    [
        (-1, -2),
        ( 0,  0),
        # ...
        # ... one thousand lines later
        # ...
        ( 1,  2),
        ( 2,  4),
    ]
)
def test_double(given: int, expected: int):
    assert double(given) == expected

Imagine having to sift through one thousand lines of test cases inside a Python file! Yuck!

At this point, it is reasonable to ask "Why would anyone write a single function that has one thousand test cases?". I don't have a good answer for that. I would agree that there is likely a problem with the structure of the code in that scenario.

While it is potentially unreasonable to have one thousand test cases for a single function, it may not be unreasonable to have a handful of test cases that take up one thousand lines of code. For example, suppose a function takes a SQL CREATE TABLE script as an input and produces a set of column names present in the script. SQL scripts can get pretty long, and perhaps a decent test case would test the function against a "real-world" SQL script.

At this point, one may object again and say "That function does too many things! It shouldn't have to parse a SQL script and search for column names!". That's likely correct, and the code would certainly be more maintainable if it were broken up so each component had a single responsibility. But try writing a SQL tokenizer in 30 minutes with a comprehensive test suite.

It would be nice if there were only one test function, much like test_double(), accompanied by a comprehensive suite of SQL scripts stored separately from the Python code. Perhaps a test like this:

def test_column_extract(test_column_extract_data):
    assert column_extract(test_column_extract_data.sql) == test_column_extract_data.columns

... with a directory structure like this:

test/
    __init__.py
    conftest.py
    test_column_extractor.py
    column-extractor-test-cases/
        all-caps/
            given.sql
            expected-columns.json
        no-columns/
            given.sql
            expected-columns.json
        ...

A consequence of this layout is that new tests can be added to the column-extractor-test-cases directory without modifying any Python code. That's pretty slick.
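A fixture along these lines in conftest.py could tie the test function to the directory tree. Treat it as a minimal sketch rather than the exact code in this repository: the names CASES_DIR, ColumnExtractCase, load_case, and find_case_dirs are hypothetical, and it assumes that expected-columns.json holds a JSON array of column names.

```python
import json
from pathlib import Path
from typing import List, NamedTuple, Set

import pytest

# Assumed location: the test-case directory sits next to conftest.py.
CASES_DIR = Path(__file__).parent / "column-extractor-test-cases"


class ColumnExtractCase(NamedTuple):
    """Hypothetical container for one test case read from disk."""
    sql: str
    columns: Set[str]


def load_case(case_dir: Path) -> ColumnExtractCase:
    """Read given.sql and expected-columns.json from one case directory."""
    sql = (case_dir / "given.sql").read_text()
    # Assumption: the JSON file is a flat array such as ["ID", "NAME"].
    columns = set(json.loads((case_dir / "expected-columns.json").read_text()))
    return ColumnExtractCase(sql=sql, columns=columns)


def find_case_dirs(root: Path) -> List[Path]:
    """Return every sub-directory of root, sorted for a stable test order."""
    if not root.is_dir():
        return []
    return sorted(p for p in root.iterdir() if p.is_dir())


# One test run per case directory; the directory name becomes the test id.
@pytest.fixture(params=find_case_dirs(CASES_DIR), ids=lambda p: p.name)
def test_column_extract_data(request) -> ColumnExtractCase:
    return load_case(request.param)
```

With this in place, the all-caps and no-columns directories would show up as separate test ids in the output of pytest -v, and dropping a new directory into column-extractor-test-cases adds a test on the next run.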
