Skip to content
This repository has been archived by the owner on Jan 21, 2021. It is now read-only.

Features/veen 3 loading tag types #72

Open
wants to merge 40 commits into
base: projects/VEEN
Choose a base branch
from

Conversation

gijskant
Copy link

  • Exporting browse tags.
  • Loading i2b2 tags.
  • Conversion script from browse tag types to i2b2 tag types.

@cataphract
Copy link
Contributor

I just had a cursory look, but for now a few comments:

  • The build needs to pass.
  • This should go without saying, but it needs tests.
  • I see database queries strewn through a lot of classes; I think this would benefit from some refactoring, having them put in a DAO.
  • The (untested) Python script seems unnecessary. Why not to output the data already in the correct format? It seems the only point of exporting the browse tags is to import them as concept tags (unless we want to reimplement the export functionality of transmart here).

void writeHeader(Writer writer) throws IOException {
writer.write(['concept_key', 'tag_title', 'tag_description', 'index'].join('\t'))
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In Groovy this is done with fieldExtractor: { ['\\', it.value.type.displayName, it.value.description, 0] } as FieldExtractor and headerCallback: { it.write(['concept_key', 'tag_title', 'tag_description', 'index'].join('\t')) } as FlatFileHeaderCallback.

Plus, there's no need to have a wrapper class just for instantiating a FlatFileItemWriter. Either do this instantiation directly in the @Configuration class, or, if you really want to have a separate class, implement FactoryBean.

@cataphract
Copy link
Contributor

Please rebase on master, as it has a fix for a connection leak on Oracle which causes spurious failures.

@gijskant gijskant assigned gijskant and unassigned cataphract Apr 13, 2016
@gijskant gijskant force-pushed the features/VEEN-3_loading_tag_types branch 2 times, most recently from bd1de97 to ffda9d1 Compare April 13, 2016 19:27
@gijskant
Copy link
Author

gijskant commented Apr 14, 2016

  • Rebase on master
  • Add functional tests for tag type and tag import.
  • Add functional tests for browse tag export.
  • The build needs to pass for Postgres.
  • The build needs to pass for Oracle.
  • Refactoring code to have database queries separate in a DAO.
  • Change the browse tags export to write in the tag types import format (multiple values comma-separated on a single line)

@gijskant gijskant force-pushed the features/VEEN-3_loading_tag_types branch 4 times, most recently from 50de07b to 8511629 Compare April 14, 2016 14:43
@gijskant
Copy link
Author

@cataphract Comments addressed. Build passes for Postgres, not for Oracle, because there are no Oracle table definitions for tag types in transmart-data.

@gijskant gijskant assigned cataphract and forus and unassigned gijskant and cataphract Apr 14, 2016
@Override
void write(List<? extends TagType> items) throws Exception {
insertTagTypeService.deleteAllTagsForTagTypes()
insertTagTypeService.deleteAllTagTypes()
Copy link
Contributor

@forus forus Apr 15, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gijskant The cleanup of the tables in this place doesn't look right. The write method would be executed multiple times (for each chunk) during the upload. At the end only items inserted at the last chunk would remain.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I'll make it a separate step/tasklet.

@cataphract
Copy link
Contributor

BTW Coverage report is here: https://codecov.io/github/thehyve/transmart-batch/commit/851162933c82e4a64421b3838edb593efcdc41d4

It's not been posted automaticaly because of the build failure. There's a bunch of uncovered code in BrowseTagAssociationDatabaseReader.

@cataphract
Copy link
Contributor

It seems you only allow type ANALYZED_STRING. Fields of this type are not appropriate for the showing up in the filters. Only non-analyzed string. I'll write my code assuming the values DATE, NON_ANALYZED_STRING, ANALYZED_STRING, INTEGER and FLOAT.

@cataphract
Copy link
Contributor

Also the convention I used for the field names are that they are lowercase except for predefined ones like the catch-all TEXT. I'll automatically lowercase them, but it would be good if the examples used lowercase as well and not that this field is case insensitive.

@@ -0,0 +1,3 @@
node_type title solr_field_name value_type shown_if_empty values index
STUDY Test tag TEST_TAG ANALYZED_STRING Y Test option 1,Test option 2,Test option 3 1
STUDY Programming language PROGRAMMING_LANGUAGE ANALYZED_STRING N Java,C,R,Python,Javascript,Pascal,Haskell 2
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rename this tsv.

@gijskant gijskant force-pushed the features/VEEN-3_loading_tag_types branch 6 times, most recently from c362a54 to 30cf873 Compare April 16, 2016 16:41
@gijskant gijskant force-pushed the features/VEEN-3_loading_tag_types branch from 5f7a3e3 to 36d0612 Compare May 17, 2016 14:32
@gijskant gijskant force-pushed the features/VEEN-3_loading_tag_types branch 3 times, most recently from 428f001 to 48243a9 Compare May 20, 2016 15:18
cdejonge and others added 6 commits May 23, 2016 13:53
Age will be floor-ed for storing it in PatientDimension (just as we humans do).
Age as Observation is unchanged.
…or-patientdimension

Ft 1819 truncate/round age down for patientdimension
@gijskant gijskant force-pushed the features/VEEN-3_loading_tag_types branch 5 times, most recently from 224415a to c3c40fc Compare May 24, 2016 13:53
forus and others added 7 commits May 24, 2016 17:03
miRNA data upload pipeline implementation.
Example of encoded path: Biomart_Data+MRNA (_ => space, + => \)
- Reuse this parsing from two places: column mapping file and subject sample mapping file read.
- We do not implicitly drop empty node names anymore (e.g. Biomart_Data\\MRNA\Lung; two slashes points to the empty node name)
- It's alowed now to do not start ConceptFragment string with slash. The complete ConceptPath still has to start with slash.
@gijskant gijskant force-pushed the features/VEEN-3_loading_tag_types branch from c3c40fc to ebafd72 Compare May 25, 2016 12:01
@gijskant gijskant force-pushed the features/VEEN-3_loading_tag_types branch from ebafd72 to 28fe984 Compare May 25, 2016 19:42
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants