Refactor core code to use Pydantic objects #1790

Open
2 of 4 tasks
Tracked by #1784
ewels opened this issue Oct 30, 2022 · 0 comments


ewels commented Oct 30, 2022

Much of the core MultiQC code was written early on, when (a) I didn't know very much Python and (b) everything had to work on Python 2.7. I also wrote a lot of the code to be super forgiving: for example, accepting a string where a list was needed and coercing it into a list. There are also tonnes of defaults for everything, even where they don't make sense. This sounds like a nice thing to do, but it actually tends to confuse things, because code doesn't always fail when it probably should.

Now that MultiQC supports only Python 3, we can use static typing, and specifically the Pydantic library. I think it's time to do a major clean-up of the core library to sort out a lot of the mess that has crept in. We can use this opportunity to make the code much less forgiving and much clearer about what is expected (type hints will help developers a lot). By using Pydantic data models we can get a tonne of nice validation for free, without much additional code. We can also easily extend the validation methods for custom classes.
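As a rough sketch of what "less forgiving" could look like in practice: the model below declares that a field must be a list of strings, so a bare string is rejected instead of being quietly coerced. The class name `SearchPattern`, its fields, and the helper `load_pattern` are made up for illustration and are not existing MultiQC code.

```python
from typing import List

from pydantic import BaseModel, ValidationError


class SearchPattern(BaseModel):
    """Hypothetical model for illustration, not an existing MultiQC class."""

    module: str
    fn_patterns: List[str]  # must be a list; a bare string is rejected


def load_pattern(raw: dict) -> SearchPattern:
    # Validation runs when the model is constructed;
    # input of the wrong type raises ValidationError.
    return SearchPattern(**raw)


# A well-formed config is accepted.
ok = load_pattern({"module": "fastqc", "fn_patterns": ["*_fastqc.zip"]})

# A string where a list is expected fails loudly instead of being coerced.
try:
    load_pattern({"module": "fastqc", "fn_patterns": "*_fastqc.zip"})
    rejected = False
except ValidationError:
    rejected = True
```

The type hints double as documentation for developers, and the failure happens at the point where the bad value enters the code rather than somewhere downstream.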

pydantic-settings specifically (docs) has a lot of functionality that we currently do ourselves, or would like to have.

Pydantic should also be able to help with #1256 by providing access to a wide range of nice output formats. Of particular interest are Apache Parquet and JSON Schema, which I think we should be well positioned for.
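For the JSON Schema side, a minimal sketch of what a data model gives us, assuming Pydantic v2 (where the method is `model_json_schema()`). The `GeneralStatsRow` model and its fields are hypothetical and only stand in for whatever the real output objects end up being.

```python
import json

from pydantic import BaseModel


class GeneralStatsRow(BaseModel):
    """Hypothetical output record, not an existing MultiQC class."""

    sample: str
    percent_gc: float


# Assumes Pydantic v2; returns the schema as a plain dict.
schema = GeneralStatsRow.model_json_schema()
print(json.dumps(schema, indent=2))
```

The schema is derived from the same type hints used for validation, so it should not drift out of sync with the code.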

Progress

  1. core: refactoring
  2. core: back end
  3. core: back end
  4. core: refactoring