Features/veen 3 loading tag types #72

gijskant · 2016-04-12T10:03:37Z

Exporting browse tags.
Loading i2b2 tags.
Conversion script from browse tag types to i2b2 tag types.

cataphract · 2016-04-12T14:15:55Z

I just had a cursory look, but for now a few comments:

The build needs to pass.
This should go without saying, but it needs tests.
I see database queries strewn through a lot of classes; I think this would benefit from some refactoring, having them put in a DAO.
The (untested) Python script seems unnecessary. Why not output the data in the correct format to begin with? It seems the only point of exporting the browse tags is to import them as concept tags (unless we want to reimplement the export functionality of transmart here).

cataphract · 2016-04-12T14:21:44Z

src/main/groovy/org/transmartproject/batch/browsetag/BrowseTagAssociationToI2b2Writer.groovy

+                    void writeHeader(Writer writer) throws IOException {
+                        writer.write(['concept_key', 'tag_title', 'tag_description', 'index'].join('\t'))
+                    }
+                }


In Groovy this is done with fieldExtractor: { ['\\', it.value.type.displayName, it.value.description, 0] } as FieldExtractor and headerCallback: { it.write(['concept_key', 'tag_title', 'tag_description', 'index'].join('\t')) } as FlatFileHeaderCallback.

Plus, there's no need to have a wrapper class just for instantiating a FlatFileItemWriter. Either do this instantiation directly in the @Configuration class, or, if you really want to have a separate class, implement FactoryBean.
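
As a rough sketch of the first option, the writer could be declared directly in a @Configuration class along the following lines. The class name, bean name, and output file name are hypothetical, and the item shape (an entry whose value exposes type.displayName and description) is inferred from the snippet in the comment above rather than taken from the actual code base. It assumes Spring Batch and Spring are on the classpath.

```groovy
import org.springframework.batch.item.file.FlatFileHeaderCallback
import org.springframework.batch.item.file.FlatFileItemWriter
import org.springframework.batch.item.file.transform.DelimitedLineAggregator
import org.springframework.batch.item.file.transform.FieldExtractor
import org.springframework.context.annotation.Bean
import org.springframework.context.annotation.Configuration
import org.springframework.core.io.FileSystemResource

@Configuration
class BrowseTagExportConfiguration {   // hypothetical class name

    @Bean
    FlatFileItemWriter browseTagWriter() {   // hypothetical bean name
        new FlatFileItemWriter(
                // Hypothetical output location; a real job would likely take this from a job parameter.
                resource: new FileSystemResource('browse_tags.tsv'),
                lineAggregator: new DelimitedLineAggregator(
                        delimiter: '\t',
                        // One output row per item, using the columns from the review comment.
                        fieldExtractor: { item ->
                            ['\\', item.value.type.displayName, item.value.description, 0] as Object[]
                        } as FieldExtractor),
                // Writes the header line once, before the first row.
                headerCallback: { Writer writer ->
                    writer.write(['concept_key', 'tag_title', 'tag_description', 'index'].join('\t'))
                } as FlatFileHeaderCallback)
    }
}
```

With this in place the separate wrapper class is no longer needed, because the step can inject the FlatFileItemWriter bean directly.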

cataphract · 2016-04-13T08:48:23Z

Please rebase on master, as it has a fix for a connection leak on Oracle which causes spurious failures.

gijskant · 2016-04-14T07:57:44Z

Rebase on master
Add functional tests for tag type and tag import.
Add functional tests for browse tag export.
The build needs to pass for Postgres.
The build needs to pass for Oracle.
Refactoring code to have database queries separate in a DAO.
Change the browse tags export to write in the tag types import format (multiple values comma-separated on a single line)

gijskant · 2016-04-14T15:39:06Z

@cataphract Comments addressed. Build passes for Postgres, not for Oracle, because there are no Oracle table definitions for tag types in transmart-data.

forus · 2016-04-15T14:32:22Z

src/main/groovy/org/transmartproject/batch/tag/TagTypeWriter.groovy

+    @Override
+    void write(List<? extends TagType> items) throws Exception {
+        insertTagTypeService.deleteAllTagsForTagTypes()
+        insertTagTypeService.deleteAllTagTypes()


@gijskant The cleanup of the tables in this place doesn't look right. The write method is executed multiple times during the upload (once per chunk), so at the end only the items inserted in the last chunk would remain.

gijskant · 2016-04-15

Okay, I'll make it a separate step/tasklet.
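
The problem and the proposed fix can be illustrated with a small stand-alone simulation. This is only a sketch: FakeTagTypeTable and runChunks are hypothetical stand-ins for the real tables and for chunk-oriented step execution, not code from the pull request.

```groovy
// In-memory stand-in for the tag type table (hypothetical).
class FakeTagTypeTable {
    List<String> rows = []
    void deleteAll() { rows.clear() }
    void insert(List<String> items) { rows.addAll(items) }
}

// Mimics chunk-oriented processing: write is called once per chunk.
def runChunks(List<String> items, int chunkSize, Closure write) {
    items.collate(chunkSize).each { chunk -> write(chunk) }
}

def items = ['tag1', 'tag2', 'tag3', 'tag4', 'tag5']

// Problematic variant: the cleanup runs inside write, so every chunk wipes the previous one.
def buggy = new FakeTagTypeTable()
runChunks(items, 2) { chunk ->
    buggy.deleteAll()
    buggy.insert(chunk)
}

// Fixed variant: the cleanup runs once, as its own step, before any chunk is written.
def fixed = new FakeTagTypeTable()
fixed.deleteAll()
runChunks(items, 2) { chunk -> fixed.insert(chunk) }

println buggy.rows   // [tag5]
println fixed.rows   // [tag1, tag2, tag3, tag4, tag5]
```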

cataphract · 2016-04-15T14:45:59Z

BTW Coverage report is here: https://codecov.io/github/thehyve/transmart-batch/commit/851162933c82e4a64421b3838edb593efcdc41d4

It has not been posted automatically because of the build failure. There's a bunch of uncovered code in BrowseTagAssociationDatabaseReader.

cataphract · 2016-04-15T14:54:35Z

It seems you only allow type ANALYZED_STRING. Fields of this type are not appropriate for showing up in the filters; only non-analyzed strings are. I'll write my code assuming the values DATE, NON_ANALYZED_STRING, ANALYZED_STRING, INTEGER and FLOAT.
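
A loader could validate the value_type column against exactly these five values. The following is a minimal sketch under that assumption; the enum name TagValueType and the helper parseValueType are hypothetical, and tolerating surrounding whitespace is a choice made here, not something the comment specifies.

```groovy
// The five value types listed in the review comment.
enum TagValueType {
    DATE, NON_ANALYZED_STRING, ANALYZED_STRING, INTEGER, FLOAT
}

// Hypothetical helper: maps a raw column value to the enum, failing with a readable message.
TagValueType parseValueType(String raw) {
    try {
        return TagValueType.valueOf(raw.trim())
    } catch (IllegalArgumentException e) {
        throw new IllegalArgumentException(
                "Unknown value_type '${raw}'; expected one of ${TagValueType.values()*.name()}", e)
    }
}

println parseValueType('NON_ANALYZED_STRING')
```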

cataphract · 2016-04-15T14:56:00Z

Also, the convention I used for the field names is that they are lowercase, except for predefined ones like the catch-all TEXT. I'll automatically lowercase them, but it would be good if the examples used lowercase as well and noted that this field is case insensitive.
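
The lowercasing convention might look roughly like this. It is a sketch only: the class name SolrFieldNames is hypothetical, and the set of predefined names contains just TEXT because that is the only one named in the comment.

```groovy
class SolrFieldNames {   // hypothetical helper class
    // Predefined field names that keep their spelling. Only TEXT is known from the
    // discussion; any further entries would be an assumption.
    static final Set<String> PREDEFINED = ['TEXT'] as Set

    // Lowercases user-defined field names and leaves predefined ones untouched.
    static String normalize(String name) {
        String trimmed = name.trim()
        PREDEFINED.contains(trimmed) ? trimmed : trimmed.toLowerCase(Locale.ROOT)
    }
}

println SolrFieldNames.normalize('PROGRAMMING_LANGUAGE')   // programming_language
println SolrFieldNames.normalize('TEXT')                   // TEXT
```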

cataphract · 2016-04-15T14:58:41Z

global/tagtypes/tagtypes.txt

@@ -0,0 +1,3 @@
+node_type	title	solr_field_name	value_type	shown_if_empty	values	index
+STUDY	Test tag	TEST_TAG	ANALYZED_STRING	Y	Test option 1,Test option 2,Test option 3	1
+STUDY	Programming language	PROGRAMMING_LANGUAGE	ANALYZED_STRING	N	Java,C,R,Python,Javascript,Pascal,Haskell	2


Rename this to .tsv.
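
For reference, the tag types format shown in the diff (tab-separated columns, with multiple values comma-separated in a single cell) could be read along these lines. This is a simplified sketch: parseTagTypes is a hypothetical helper, and it assumes that cells contain no quoted tabs or commas.

```groovy
// Parses the tab-separated tag types format into one map per data row, keyed by column name.
List<Map<String, Object>> parseTagTypes(String text) {
    List<String> lines = text.readLines().findAll { it.trim() }
    List<String> header = lines.head().split('\t') as List
    lines.tail().collect { String line ->
        // Limit -1 keeps trailing empty cells, e.g. an empty last column.
        List<String> cells = line.split('\t', -1) as List
        Map<String, Object> row = [header, cells].transpose()
                .collectEntries { pair -> [(pair[0]): pair[1]] }
        row['shown_if_empty'] = (row['shown_if_empty'] == 'Y')
        // The values column holds a comma-separated list of options.
        row['values'] = row['values'] ? (row['values'] as String).split(',')*.trim() : []
        row['index'] = row['index'] as Integer
        row
    }
}

// Sample input built with explicit tab characters, using the first row from the diff.
String sample = [
        ['node_type', 'title', 'solr_field_name', 'value_type', 'shown_if_empty', 'values', 'index'],
        ['STUDY', 'Test tag', 'TEST_TAG', 'ANALYZED_STRING', 'Y',
         'Test option 1,Test option 2,Test option 3', '1']
].collect { it.join('\t') }.join('\n')

def rows = parseTagTypes(sample)
println rows[0]['values']   // [Test option 1, Test option 2, Test option 3]
```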

Example of encoded path: Biomart_Data+MRNA (_ => space, + => \).
- Reuse this parsing in two places: reading the column mapping file and reading the subject sample mapping file.
- We no longer implicitly drop empty node names (e.g. in Biomart_Data\\MRNA\Lung, the two slashes denote an empty node name).
- A ConceptFragment string is now allowed not to start with a slash. The complete ConceptPath still has to start with a slash.
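
The decoding rule described above can be sketched as follows. The helper names decodeFragment and nodeNames are hypothetical, and the sketch assumes there is no escaping mechanism for literal underscores or plus signs, since none is mentioned.

```groovy
// Decodes an encoded path: '_' becomes a space and '+' becomes the separator '\'.
String decodeFragment(String encoded) {
    encoded.replace('_', ' ').replace('+', '\\')
}

// Splits a decoded fragment into node names. Empty names in the middle are kept,
// and a leading separator is optional for a fragment.
List<String> nodeNames(String fragment) {
    String body = fragment.startsWith('\\') ? fragment.substring(1) : fragment
    // The regex '\\\\' matches a single backslash character.
    body.split('\\\\') as List
}

println decodeFragment('Biomart_Data+MRNA')                 // Biomart Data\MRNA
println nodeNames(decodeFragment('Biomart_Data++MRNA+Lung')) // [Biomart Data, , MRNA, Lung]
```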

gijskant assigned cataphract Apr 12, 2016

cataphract reviewed Apr 12, 2016
gijskant assigned gijskant and unassigned cataphract Apr 13, 2016

gijskant force-pushed the features/VEEN-3_loading_tag_types branch 2 times, most recently from bd1de97 to ffda9d1 on April 13, 2016 19:27

gijskant force-pushed the features/VEEN-3_loading_tag_types branch 4 times, most recently from 50de07b to 8511629 on April 14, 2016 14:43

gijskant assigned cataphract and forus and unassigned gijskant and cataphract Apr 14, 2016

forus reviewed Apr 15, 2016
cataphract reviewed Apr 15, 2016
gijskant force-pushed the features/VEEN-3_loading_tag_types branch 6 times, most recently from c362a54 to 30cf873 on April 16, 2016 16:41

Update CA certificate

7fa8fe4

gijskant force-pushed the features/VEEN-3_loading_tag_types branch from 5f7a3e3 to 36d0612 on May 17, 2016 14:32

cataphract and others added 4 commits May 17, 2016 17:22

Merge pull request #83 from thehyve/l-data-type

b34abc5

Upload of log transformed data points.

Fail when parameter has no value.

96d90a9

+ Some minor code improvements

Rename acgh to cnv everywhere.

5bae216

Merge pull request #86 from thehyve/acgh-cnv

0e0fef9

Acgh cnv

gijskant force-pushed the features/VEEN-3_loading_tag_types branch 3 times, most recently from 428f001 to 48243a9 on May 20, 2016 15:18

cdejonge and others added 6 commits May 23, 2016 13:53

FT-1819-truncate/round-age-down-for-patientdimension

f62a4c4

Age will be floor-ed for storing it in PatientDimension (just as we humans do). Age as Observation is unchanged.

Reactions on Gustavo's remarks.

89d6694

Adjusted imports

41ed977

Add funtional test.

5579d1b

Merge pull request #88 from thehyve/FT-1819-truncate/round-age-down-f…

6a76e72

…or-patientdimension Ft 1819 truncate/round age down for patientdimension

miRNA data upload pipeline implementation.

aa15f48

gijskant force-pushed the features/VEEN-3_loading_tag_types branch 5 times, most recently from 224415a to c3c40fc on May 24, 2016 13:53

forus and others added 7 commits May 24, 2016 17:03

Merge pull request #89 from thehyve/mirna

0afc768

miRNA data upload pipeline implementation.

Use SPLITTER constant instead.

630c4d8

Change Travis config to use project branches.

77f0176

Added loading of tag types.

48aadfd

Added export of browse tags.

b1850b7

Documentation on tag type loading.

c6908f9

gijskant force-pushed the features/VEEN-3_loading_tag_types branch from c3c40fc to ebafd72 on May 25, 2016 12:01

Updated browse tag export to properly quote tsv/csv.

28fe984

gijskant force-pushed the features/VEEN-3_loading_tag_types branch from ebafd72 to 28fe984 on May 25, 2016 19:42
