Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: check-in schema generated from higlass_schema #1078

Open
wants to merge 1 commit into
base: develop
Choose a base branch
from

Conversation

manzt
Copy link
Member

@manzt manzt commented Feb 3, 2022

Description

The file checked in was generated via:

pip install git+https://github.com/higlass/higlass-schema
python -m higlass_schema.main > app/schema.json

If the changes are desirable, we can discuss how to add higlass_schema as a dependency to this repo.

What was changed in this pull request?

This PR introduces schema.json generated from higlass/higlass-schema. It replaces the old schema with one that is generated automatically from python class definitions.

The initial python code was bootstrapped via a code-generation tool which used the schema.json from this repo as input. Classes were modified manually to better reflect the types described here.

Why is it necessary?

This is a step towards keeping the higlass client and any python API in sync with minimal manual refinement. The idea is that higlass/higlass-schema serves as the "source of truth" and schema changes should occur there.

Checklist

  • Set proper GitHub labels (e.g. v1.6+, ignore if you don't know)
  • Unit tests added or updated
  • Documentation added or updated
  • Example(s) added or updated
  • Update schema.json if there are changes to the viewconf JSON structure format
  • Screenshot for visual changes (e.g. new tracks or UI changes)
  • Updated CHANGELOG.md

@pkerpedjiev
Copy link
Member

Looks good. I didn't go through the schema line by line but it looks like a reasonable addition. A few questions:

  1. Did you happen to try running the schema on any of the viewconfs in test/view-configs?
  2. It might be worth trying to run in on a viewconf in a plugin track to see how it behaves:

https://github.com/higlass/higlass-pileup/blob/master/src/index.html#L61

Do you know if it's possible to do something like subclass a schema? I'm thinking in terms of plugin tracks it would be nice if they could take the existing higlass schema, add whatever they need and then use that? I presume it would just be a matter of subclassing the pydantic class and regenerating the schema?

@manzt
Copy link
Member Author

manzt commented Feb 3, 2022

Did you happen to try running the schema on any of the viewconfs in test/view-configs?

I just ran the test suite (sucessfully) locally. I can go through some of the view configs.

  1. It might be worth trying to run in on a viewconf in a plugin track to see how it behaves:

I'll try that out.

I presume it would just be a matter of subclassing the pydantic class and regenerating the schema?

So I ran into something similar (for different reasons) that I need for hg. I wanted my derived classes to have additional python methods not implemented on the core classes. I think I found a solution where we can expose this behavior via generics and pydantic.generics.GenericModel.

We can make the schema generic over different properties, and any concrete subclassing of the generic config will extend the schema.

from typing import Literal, Union
from higlass_schema as hgs

class MyCustomTrack(hgs.BaseTrack[Literal["a-custom-track"]):
  customField: str = "I'm a custom field"

hgs.Viewconf.schema_json() # base schema, "bounds" on generic types are used to generate schema
# equivalent to
hgs.Viewconf[hgs.View[hgs.Track]].schema_json()
# extended
hgs.Viewconf[hgs.View[Union[hgs.Track, MyCustomTrack]].schema_json() # explicit extension

The added benefit here is that we can control which parts of the schema are "extendable"

@manzt
Copy link
Member Author

manzt commented Feb 3, 2022

Let me know if that makes any sense or if you'd want to hop on a call to discuss anything.

@pkerpedjiev
Copy link
Member

Yeah, this makes sense. Thank you! I'm thinking that plugin repos could have a small chunk of Python code that extends the schema and higlass-schema and generates their own.

Mixing Python and JS is not ideal but as far as I know there's no JS equivalent to pydantic?

@manzt
Copy link
Member Author

manzt commented Feb 3, 2022

Just opened a PR to add a simple CLI for checking schemas:

Yeah, this makes sense. Thank you! I'm thinking that plugin repos could have a small chunk of Python code that extends the schema and higlass-schema and generates their own.

In that PR I expanded the Track definition to include BaseTrack which is a catch-all type for all non-CombinedTrack/EnumTrack/HeatmapTrack/. It's a more flexible type since I allowed extra fields on the model.

TL;DR - with this change, the pile-up schema you shared is valid but all the unrecognized tracks end up being BaseTrack in python.

from typing import Union
import higlass_schema as hgs

hgs.Viewconf[
	hgs.View[hgs.EnumTrack, hgs.CombinedTrack, hgs.HeatmapTrack]
].parse_file("./pileup.json") # fails because tracks aren't one of enum/combined/heatmaps

c = hgs.Viewconf[
	hgs.View[hgs.EnumTrack, hgs.CombinedTrack, hgs.HeatmapTrack, hgs.BaseTrack]
]parse_file("./pileup.json") # success!

track = c.views[0].tracks.top[0] # hgs.BaseTrack
print(track.type) # type == str, "chromosome-labels"
print(track.filetype) # type == Any, "chromsizes-tsv" , not defined on BaseTrack, but additional properties are allowed and will be added to the instance

By adding the BaseTrack to the root generic bound, the latter behavior is the default:

hgs.Viewconf.parse_file("./pileup.json") # ok!

Then in a python library like hg, we could expose an API to allow users to extend the model with more precise types (and not worry about it at all for higlass-schema). This would allow strictness only if folks desire it.

@pkerpedjiev
Copy link
Member

That seems reasonable.

By the way, why do you use the name EnumTrack? From what I can tell, that's meant to reference tracks that are in the base higlass distribution?

@manzt
Copy link
Member Author

manzt commented Feb 4, 2022

By the way, why do you use the name EnumTrack? From what I can tell, that's meant to reference tracks that are in the base higlass distribution?

I bootstrapped schema.py with a code generation tool that took the names from schema.json directly...

"enum_track": {

A track for base higlass is defined as anyOf [enum_track, heatmap_track, combined_track, independent_viewport_projection_track]. With python/pydantic is should be easy to refactor these types (without breaking anything) if that is desireable (e.g. use inheritance, define a minimum requirement fora track, etc). For now I've just left them mostly as they were auto generated, and extracted out a BaseTrack type which all tracks inherit from (to avoid redefining shared properties).

@manzt
Copy link
Member Author

manzt commented Sep 1, 2022

Schema tests had been turned off for a while. Good thing we didn't merge. Need to have a closer look.

@pkerpedjiev
Copy link
Member

Anything to be done here on my end?

@manzt
Copy link
Member Author

manzt commented Sep 5, 2022

Nope! I just need to spend some time and look at it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants