This repository has been archived by the owner on Jan 24, 2018. It is now read-only.

Allow "loose" referential integrity #1528

Open
david4096 opened this issue Jan 19, 2017 · 3 comments · May be fixed by #1596

Comments

@david4096
Member

We shouldn't impose a foreign key requirement on references when someone does not have them or doesn't want to manage them. This means it should be possible to add a VCF to a server that holds no other data apart from a dataset.

The problem then is that it is unclear which reference names to use when querying a variant set. From the server's perspective, that is a data management problem. Adding a reference set gives a more full-fledged offering, but it shouldn't be required.

@kozbo kozbo added this to the 2017-00 milestone Feb 2, 2017
@david4096
Member Author

david4096 commented Feb 7, 2017

To close this issue, we should make it possible to add variant sets without first adding a reference set. This is possible because variant searches identify the reference by referenceName in the search request. Reads search, however, uses an explicit reference ID, so we can't offer the same feature as easily.
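To make the difference concrete, here is a rough sketch of the two request shapes. Only `referenceName` and `referenceId` come from the discussion above. The other field names are written from memory of the GA4GH schemas and should be treated as assumptions, and `needs_registered_reference` is a hypothetical helper, not part of the server.

```python
# Hypothetical request bodies. Only referenceName / referenceId are taken
# from the discussion; the remaining field names are assumptions.
variant_search = {
    "variantSetId": "example-variant-set",
    "referenceName": "1",  # a plain name, no registry lookup required
    "start": 0,
    "end": 1000,
}

read_search = {
    "readGroupIds": ["example-read-group"],
    "referenceId": "example-reference-id",  # must resolve to a stored reference
    "start": 0,
    "end": 1000,
}


def needs_registered_reference(request):
    """Return True if the request addresses a reference by server-side ID."""
    return "referenceId" in request
```

A request that only carries a name can be answered from the variant data alone, while one that carries an ID needs something in the registry to resolve it.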

One way to still allow a BAM to be queryable without adding a reference would be to create a synthetic reference based on the BAM index or headers and add it to the registry. VCF headers do not contain enough information to know what references are used, but by inspecting the tabix index we might be able to do something similar.
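The synthetic-reference idea for BAMs could look roughly like this. It is a minimal sketch that assumes the header is available as SAM text (for example, as printed by `samtools view -H`) and reads the `SN` and `LN` tags of each `@SQ` line. The function name and the shape of the returned records are hypothetical, not the server's registry format.

```python
def synthetic_references_from_sam_header(header_text):
    """Build minimal reference records from the @SQ lines of a SAM header.

    Each @SQ line carries an SN (sequence name) and LN (sequence length) tag.
    The returned dicts are a hypothetical registry format.
    """
    references = []
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        # Header fields are tab-separated TAG:VALUE pairs after the record type.
        tags = {}
        for field in line.split("\t")[1:]:
            key, _, value = field.partition(":")
            tags[key] = value
        references.append({
            "name": tags["SN"],
            "length": int(tags["LN"]),
            "synthetic": True,  # marks the record as derived, not loaded from a FASTA
        })
    return references


header = (
    "@HD\tVN:1.6\tSO:coordinate\n"
    "@SQ\tSN:chr1\tLN:248956422\n"
    "@SQ\tSN:chrM\tLN:16569\n"
)
refs = synthetic_references_from_sam_header(header)
```

The records contain names and lengths only, so queries by position would work but nothing that needs the actual bases.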

For RNA, making the reference set optional presents no real differences in access pattern. I think there are some features that may only be present in specific gene builds, but that relational information is captured by the FeatureSetIDs.

@ejacox
Collaborator

ejacox commented Feb 8, 2017

Here is my view:

Currently, we require that the server has the references loaded. We then use internally generated ids to refer to those references from other sets (tables). The problem is that it could be unrealistic to expect every server to have all the necessary references stored internally. We should be able to use references that are defined outside of the server.

We would still like to maintain referential integrity; it just doesn't need to be enforced by the database. It can be checked during ingest by ensuring that all reference ids are known, either internally or externally, much like ontology terms. Alternatively, referential integrity could be checked later or even as part of compliance.
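An ingest-time check along these lines might be sketched as follows. All names here (`unknown_reference_ids`, `ingest`, the `strict` flag, and the example ids) are hypothetical and only illustrate the idea of checking ids against both internal and external sources instead of relying on a database constraint.

```python
def unknown_reference_ids(used_ids, internal_ids, external_ids):
    """Return the sorted reference ids that are known neither internally nor externally."""
    known = set(internal_ids) | set(external_ids)
    return sorted(set(used_ids) - known)


def ingest(used_ids, internal_ids, external_ids, strict=True):
    """Check references at ingest time.

    With strict=True, unknown ids abort the ingest. With strict=False they are
    returned so they can be reported later, e.g. by a compliance check.
    """
    missing = unknown_reference_ids(used_ids, internal_ids, external_ids)
    if missing and strict:
        raise ValueError("unknown reference ids: " + ", ".join(missing))
    return missing


internal = {"GRCh38"}   # references stored in this server
external = {"GRCh37"}   # references known to exist elsewhere
missing = ingest(["GRCh38", "GRCh37", "hg18"], internal, external, strict=False)
```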

In the short term, we can turn off foreign key checks. Longer term, this should be addressed in the larger discussion involving external ids, ontology terms, federated queries, etc.
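As an illustration of the short-term option, here is a small sketch that assumes an SQLite backend (an assumption; the thread does not name the database) and uses a made-up two-table schema. With `PRAGMA foreign_keys = ON`, a variant set pointing at a missing reference set is rejected; with the pragma off, the same insert goes through.

```python
import sqlite3

# Hypothetical schema, not the server's actual tables.
SCHEMA = """
CREATE TABLE reference_set (id TEXT PRIMARY KEY);
CREATE TABLE variant_set (
    id TEXT PRIMARY KEY,
    reference_set_id TEXT REFERENCES reference_set(id)
);
"""


def try_add_variant_set(conn, variant_set_id, reference_set_id, enforce):
    """Insert a variant set; return False if a foreign key check rejects it."""
    conn.execute("PRAGMA foreign_keys = " + ("ON" if enforce else "OFF"))
    try:
        conn.execute(
            "INSERT INTO variant_set VALUES (?, ?)",
            (variant_set_id, reference_set_id),
        )
        return True
    except sqlite3.IntegrityError:
        return False


# isolation_level=None gives autocommit mode, so the pragma is never issued
# inside an open transaction (where SQLite would ignore it).
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.executescript(SCHEMA)

strict_ok = try_add_variant_set(conn, "vs-strict", "GRCh38", enforce=True)
loose_ok = try_add_variant_set(conn, "vs-loose", "GRCh38", enforce=False)
```

With checks off, nothing stops dangling ids from accumulating, which is why a separate ingest or compliance check would still be needed.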

@david4096
Member Author

Thanks @ejacox, that's a good summary of the situation. The feature is that I don't need a FASTA to load data into the server. The aspiration is to maintain easy access patterns when data are distributed.

@david4096 david4096 mentioned this issue Feb 16, 2017
@kozbo kozbo modified the milestones: 2017-02, 2017-00 v0.3.6 Feb 24, 2017
@david4096 david4096 linked a pull request Mar 2, 2017 that will close this issue
3 participants