vdjtools CalcBasicStats and PoolSamples fail on an input file of 10xGenomics data from vdjdb #141

Open
ghost opened this issue Dec 22, 2020 · 0 comments

Describe the problem
I am trying to run the .Rmd tutorial for the paper 'A Framework for Annotation of Antigen Specificities in High-Throughput T-Cell Repertoire Sequencing Studies' (https://github.com/antigenomics/tcr-annotation-methodology), using the 10xGenomics data from vdjdb as the input dataset. I followed the section 'Customizing the Framework for Analysis of User-Provided Datasets' of the paper to prepare my input dataset before calling CalcBasicStats in the tutorial code. The PoolSamples routine of vdjtools, which I use to pre-process the 10x data, currently fails with the errors shown below.

To Reproduce
--Download 10xGenomics dataset from vdjdb:
-https://vdjdb.cdr3.net/search
-set parameters: species = human; gene (chain) = TRB, paired only; Meta: search for 10xGenomics and select the only result offered
-refresh table
-export as .tsv with paired-gene-export enabled

--From a terminal (I used the Windows command prompt) run "java -jar {path_to_vdj_tools}/vdjtools-1.2.1.jar PoolSamples -i strict {path_to_data}/{10xData file}.tsv {output_prefix}"
No changes were made to the .tsv file after downloading it from vdjdb. The errors from this run are listed under the 'Errors' header of this issue.
I also tried .txt (created using Python) and .txt.gz (created using 7zip) versions of the same file, and got the same errors.
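
For reference, a compressed copy like the one tried above can be produced with the Python standard library alone. This is a minimal sketch, not the exact commands used for this report; the file names are hypothetical placeholders. It only changes the container, not the columns, which fits the observation that every variant of the file fails in the same way.

```python
import gzip
import shutil
from pathlib import Path


def gzip_copy(src: Path, dst: Path) -> Path:
    """Write a gzip-compressed copy of src to dst; the table content is unchanged."""
    with open(src, "rb") as fin, gzip.open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dst


if __name__ == "__main__":
    # Hypothetical paths: substitute the actual vdjdb export.
    src = Path("data/10x_vdjdb.tsv")
    if src.exists():
        gzip_copy(src, Path("data/10x_vdjdb.txt.gz"))
```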

--Clone the 'tcr-annotation-methodology' repo from antigenomics and, in tutorial.Rmd, modify only this line so that it points to the 10x input file:
run_java("vdjtools-1.2.1",
"CalcBasicStats data/control.txt.gz data/{INPUT_FILENAME_preprocessed from PoolSamples}.{extension} output/",
T)

Run the script up to and including the 'CalcBasicStats' line, using the pre-processed input file. Out of curiosity, I also ran CalcBasicStats directly on the {10x_data_from_vdjdb}.tsv file (and its .txt and .txt.gz versions) exactly as downloaded from vdjdb, without any pre-processing, and got the same error as with PoolSamples.

Errors (with PoolSamples)

C:\Users\Stelios\Documents\tcr-annotation-methodology-master>java -Xmx1G -jar C:/Users/Stelios/vdjsw/vdjtools-1.2.1.jar PoolSamples -i strict data/10x_vdjdb.tsv preprocessed_10x_vdjdb
Executing com.antigenomics.vdjtools.operate.PoolSamples -i strict data/10x_vdjdb.tsv preprocessed_10x_vdjdb
[Tue Dec 22 11:56:20 EET 2020 PoolSamples] Reading samples
[Tue Dec 22 11:56:20 EET 2020 PoolSamples] 1 samples loaded
[Tue Dec 22 11:56:20 EET 2020 PoolSamples] Pooling with Strict, this may take a while
[Tue Dec 22 11:56:20 EET 2020 SampleStreamConnection] Loading sample 10x_vdjdb
[ERROR] java.lang.RuntimeException: Unable to parse clonotype string 1 TRB CASSEGWHSYEQYF TRBV6-1*01 TRBJ2-7*01 HomoSapiens HLA-A*03:01 B2M MHCI KLGGALQAK IE1 CMV https://www.10xgenomics.com/resources/application-notes/a-new-way-of-exploring-immunity-linking-highly-multiplexed-antigen-recognition-to-immune-repertoire-and-phenotype/# {"frequency": "1/11684", "identification": "dextramer-sort", "sequencing": "rna-seq", "singlecell": "yes", "verification": ""} {"cell.subset": "", "clone.id": "", "donor.MHC": "", "donor.MHC.method": "", "epitope.id": "", "replica.id": "", "samples.found": 1, "structure.id": "", "studies.found": 1, "study.id": "", "subject.cohort": "", "subject.id": "1", "tissue": ""} {"cdr3": "CASSEGWHSYEQYF", "cdr3_old": "CASSEGWHSYEQYF", "fixNeeded": false, "good": true, "jCanonical": true, "jFixType": "NoFixNeeded", "jId": "TRBJ2-7*01", "jStart": 8, "vCanonical": true, "vEnd": 5, "vFixType": "NoFixNeeded", "vId": "TRBV6-1*01"} 0 for VDJtools input type: For input string: "TRB", see _vdjtools_error.log for details
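
The message ends with `For input string: "TRB"`, which suggests that a numeric field is being read at the position where this file holds the chain name. The sketch below replays that check on the leading fields of the row quoted in the error. It assumes, rather than knows from the VDJtools source, that the two leading columns are expected to be numbers; `first_non_numeric` and its `n_numeric` parameter are hypothetical names used only for illustration.

```python
# Leading fields of the row quoted in the error message, in file order.
row = ["1", "TRB", "CASSEGWHSYEQYF"]


def first_non_numeric(fields, n_numeric=2):
    """Return (index, value) of the first leading field that is not a number.

    n_numeric is an assumption: the number of leading columns that the
    parser is taken to read as numeric values.
    """
    for i, value in enumerate(fields[:n_numeric]):
        try:
            float(value)
        except ValueError:
            return i, value
    return None


print(first_non_numeric(row))  # (1, 'TRB')
```

Under that assumption the failure depends on the column layout of the vdjdb export, not on the file extension or compression.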

_vdjtools_error.log:
[Tue Dec 22 11:56:20 EET 2020 BEGIN]
[Script]
PoolSamples
[CommandLine]
executing vdjtools-1.2.1.jar PoolSamples -i strict data/10x_vdjdb.tsv preprocessed_10x_vdjdb
[Message]
java.lang.RuntimeException: Unable to parse clonotype string 1 TRB CASSEGWHSYEQYF TRBV6-1*01 TRBJ2-7*01 HomoSapiens HLA-A*03:01 B2M MHCI KLGGALQAK IE1 CMV https://www.10xgenomics.com/resources/application-notes/a-new-way-of-exploring-immunity-linking-highly-multiplexed-antigen-recognition-to-immune-repertoire-and-phenotype/# {"frequency": "1/11684", "identification": "dextramer-sort", "sequencing": "rna-seq", "singlecell": "yes", "verification": ""} {"cell.subset": "", "clone.id": "", "donor.MHC": "", "donor.MHC.method": "", "epitope.id": "", "replica.id": "", "samples.found": 1, "structure.id": "", "studies.found": 1, "study.id": "", "subject.cohort": "", "subject.id": "1", "tissue": ""} {"cdr3": "CASSEGWHSYEQYF", "cdr3_old": "CASSEGWHSYEQYF", "fixNeeded": false, "good": true, "jCanonical": true, "jFixType": "NoFixNeeded", "jId": "TRBJ2-7*01", "jStart": 8, "vCanonical": true, "vEnd": 5, "vFixType": "NoFixNeeded", "vId": "TRBV6-1*01"} 0 for VDJtools input type: For input string: "TRB"
[StackTrace-Short]
com.antigenomics.vdjtools.io.parser.ClonotypeStreamParser.parse(ClonotypeStreamParser.groovy:189)
com.antigenomics.vdjtools.io.parser.ClonotypeStreamParser$_iterator_closure8.doCall(ClonotypeStreamParser.groovy:249)
com.antigenomics.vdjtools.io.parser.ClonotypeStreamParser$_iterator_closure8.doCall(ClonotypeStreamParser.groovy)
com.antigenomics.vdjtools.sample.Sample.fromInputStream(Sample.java:174)
com.antigenomics.vdjtools.sample.Sample$fromInputStream.call(Unknown Source)
com.antigenomics.vdjtools.io.SampleStreamConnection._load(SampleStreamConnection.groovy:128)
com.antigenomics.vdjtools.io.SampleStreamConnection.getSample(SampleStreamConnection.groovy:139)
com.antigenomics.vdjtools.sample.SampleCollection$_iterator_closure6.doCall(SampleCollection.groovy:320)
com.antigenomics.vdjtools.sample.SampleCollection$_iterator_closure6.doCall(SampleCollection.groovy)
com.antigenomics.vdjtools.pool.SampleAggregator.<init>(SampleAggregator.java:81)
com.antigenomics.vdjtools.operate.PoolSamples.run(PoolSamples.groovy:119)
com.antigenomics.vdjtools.operate.PoolSamples$run.call(Unknown Source)
com.antigenomics.vdjtools.misc.ExecUtil.run(ExecUtil.groovy:94)
com.antigenomics.vdjtools.misc.ExecUtil$run.call(Unknown Source)
com.antigenomics.vdjtools.VdjTools.run(VdjTools.groovy:226)
com.antigenomics.vdjtools.VdjTools.main(VdjTools.groovy)
[StackTrace-Full]
java.lang.RuntimeException: Unable to parse clonotype string 1 TRB CASSEGWHSYEQYF TRBV6-1*01 TRBJ2-7*01 HomoSapiens HLA-A*03:01 B2M MHCI KLGGALQAK IE1 CMV https://www.10xgenomics.com/resources/application-notes/a-new-way-of-exploring-immunity-linking-highly-multiplexed-antigen-recognition-to-immune-repertoire-and-phenotype/# {"frequency": "1/11684", "identification": "dextramer-sort", "sequencing": "rna-seq", "singlecell": "yes", "verification": ""} {"cell.subset": "", "clone.id": "", "donor.MHC": "", "donor.MHC.method": "", "epitope.id": "", "replica.id": "", "samples.found": 1, "structure.id": "", "studies.found": 1, "study.id": "", "subject.cohort": "", "subject.id": "1", "tissue": ""} {"cdr3": "CASSEGWHSYEQYF", "cdr3_old": "CASSEGWHSYEQYF", "fixNeeded": false, "good": true, "jCanonical": true, "jFixType": "NoFixNeeded", "jId": "TRBJ2-7*01", "jStart": 8, "vCanonical": true, "vEnd": 5, "vFixType": "NoFixNeeded", "vId": "TRBV6-1*01"} 0 for VDJtools input type: For input string: "TRB"
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
at java.lang.reflect.Constructor.newInstance(Unknown Source)
at org.codehaus.groovy.reflection.CachedConstructor.invoke(CachedConstructor.java:83)
at org.codehaus.groovy.reflection.CachedConstructor.doConstructorInvoke(CachedConstructor.java:77)
at org.codehaus.groovy.runtime.callsite.ConstructorSite$ConstructorSiteNoUnwrap.callConstructor(ConstructorSite.java:84)
at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCallConstructor(CallSiteArray.java:59)
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.callConstructor(AbstractCallSite.java:238)
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.callConstructor(AbstractCallSite.java:258)
at com.antigenomics.vdjtools.io.parser.ClonotypeStreamParser.parse(ClonotypeStreamParser.groovy:189)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:104)
at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:326)
at org.codehaus.groovy.runtime.metaclass.ClosureMetaClass.invokeMethod(ClosureMetaClass.java:352)
at org.codehaus.groovy.runtime.callsite.PogoMetaClassSite.callCurrent(PogoMetaClassSite.java:68)
at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCallCurrent(CallSiteArray.java:51)
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.callCurrent(AbstractCallSite.java:157)
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.callCurrent(AbstractCallSite.java:169)
at com.antigenomics.vdjtools.io.parser.ClonotypeStreamParser$_iterator_closure8.doCall(ClonotypeStreamParser.groovy:249)
at com.antigenomics.vdjtools.io.parser.ClonotypeStreamParser$_iterator_closure8.doCall(ClonotypeStreamParser.groovy)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:104)
at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:326)
at org.codehaus.groovy.runtime.metaclass.ClosureMetaClass.invokeMethod(ClosureMetaClass.java:264)
at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1041)
at groovy.lang.Closure.call(Closure.java:421)
at org.codehaus.groovy.runtime.ConvertedMap.invokeCustom(ConvertedMap.java:54)
at org.codehaus.groovy.runtime.ConversionHandler.invoke(ConversionHandler.java:124)
at com.sun.proxy.$Proxy4.next(Unknown Source)
at com.antigenomics.vdjtools.sample.Sample.fromInputStream(Sample.java:174)
at com.antigenomics.vdjtools.sample.Sample$fromInputStream.call(Unknown Source)
at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:47)
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:116)
at com.antigenomics.vdjtools.io.SampleStreamConnection._load(SampleStreamConnection.groovy:128)
at com.antigenomics.vdjtools.io.SampleStreamConnection.getSample(SampleStreamConnection.groovy:139)
at com.antigenomics.vdjtools.sample.SampleCollection$_iterator_closure6.doCall(SampleCollection.groovy:320)
at com.antigenomics.vdjtools.sample.SampleCollection$_iterator_closure6.doCall(SampleCollection.groovy)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:104)
at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:326)
at org.codehaus.groovy.runtime.metaclass.ClosureMetaClass.invokeMethod(ClosureMetaClass.java:264)
at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1041)
at groovy.lang.Closure.call(Closure.java:421)
at org.codehaus.groovy.runtime.ConvertedMap.invokeCustom(ConvertedMap.java:54)
at org.codehaus.groovy.runtime.ConversionHandler.invoke(ConversionHandler.java:124)
at com.sun.proxy.$Proxy4.next(Unknown Source)
at com.antigenomics.vdjtools.pool.SampleAggregator.<init>(SampleAggregator.java:81)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
at java.lang.reflect.Constructor.newInstance(Unknown Source)
at org.codehaus.groovy.reflection.CachedConstructor.invoke(CachedConstructor.java:83)
at org.codehaus.groovy.runtime.callsite.ConstructorSite$ConstructorSiteNoUnwrapNoCoerce.callConstructor(ConstructorSite.java:105)
at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCallConstructor(CallSiteArray.java:59)
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.callConstructor(AbstractCallSite.java:238)
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.callConstructor(AbstractCallSite.java:266)
at com.antigenomics.vdjtools.operate.PoolSamples.run(PoolSamples.groovy:119)
at com.antigenomics.vdjtools.operate.PoolSamples$run.call(Unknown Source)
at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:47)
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:116)
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:120)
at com.antigenomics.vdjtools.misc.ExecUtil.run(ExecUtil.groovy:94)
at com.antigenomics.vdjtools.misc.ExecUtil$run.call(Unknown Source)
at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:47)
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:116)
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:136)
at com.antigenomics.vdjtools.VdjTools.run(VdjTools.groovy:226)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:104)
at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:326)
at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1235)
at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1041)
at org.codehaus.groovy.runtime.InvokerHelper.invokePogoMethod(InvokerHelper.java:1018)
at org.codehaus.groovy.runtime.InvokerHelper.invokeMethod(InvokerHelper.java:1001)
at org.codehaus.groovy.runtime.InvokerHelper.runScript(InvokerHelper.java:423)
at org.codehaus.groovy.runtime.InvokerHelper$runScript.call(Unknown Source)
at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:47)
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:116)
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:136)
at com.antigenomics.vdjtools.VdjTools.main(VdjTools.groovy)
Caused by: java.lang.NumberFormatException: For input string: "TRB"
at sun.misc.FloatingDecimal.readJavaFormatString(Unknown Source)
at sun.misc.FloatingDecimal.parseDouble(Unknown Source)
at java.lang.Double.parseDouble(Unknown Source)
at java.lang.Double.valueOf(Unknown Source)
at org.codehaus.groovy.runtime.StringGroovyMethods.toDouble(StringGroovyMethods.java:3454)
at org.codehaus.groovy.runtime.dgm$1212.invoke(Unknown Source)
at org.codehaus.groovy.runtime.callsite.PojoMetaMethodSite$PojoMetaMethodSiteNoUnwrapNoCoerce.invoke(PojoMetaMethodSite.java:246)
at org.codehaus.groovy.runtime.callsite.PojoMetaMethodSite.call(PojoMetaMethodSite.java:55)
at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:47)
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:116)
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:120)
at com.antigenomics.vdjtools.io.parser.BaseParser.innerParse(BaseParser.groovy:65)
at com.antigenomics.vdjtools.io.parser.ClonotypeStreamParser.parse(ClonotypeStreamParser.groovy:159)
... 82 more
[END]
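
If the cause is a column-layout mismatch, as the `NumberFormatException` on "TRB" hints, one possible workaround is to reshape the export before passing it to vdjtools. The sketch below is untested against vdjtools itself and rests on several assumptions: the output header (`count`, `freq`, `cdr3nt`, `cdr3aa`, `v`, `d`, `j`, `VEnd`, `DStart`, `DEnd`, `JStart`) is taken to be the native VDJtools layout and should be checked against its documentation; the input column names (`Gene`, `CDR3`, `V`, `J`) are assumed to match the vdjdb export header; and `.` and `-1` are assumed to be acceptable placeholders for values the export does not contain. Because the export carries no nucleotide sequence, pooling by nucleotide sequence would not be meaningful on the converted file.

```python
import csv
import io
from collections import Counter

# Assumed VDJtools column order; verify against the VDJtools documentation.
VDJTOOLS_HEADER = ["count", "freq", "cdr3nt", "cdr3aa", "v", "d", "j",
                   "VEnd", "DStart", "DEnd", "JStart"]


def vdjdb_to_vdjtools(tsv_text, gene="TRB"):
    """Collapse vdjdb rows of one chain into a VDJtools-style table (as text).

    Input column names (Gene, CDR3, V, J) are assumptions about the export.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    # Count how often each (CDR3 amino acid, V, J) combination occurs.
    counts = Counter(
        (r["CDR3"], r["V"], r["J"]) for r in reader if r["Gene"] == gene
    )
    total = sum(counts.values())
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(VDJTOOLS_HEADER)
    for (cdr3aa, v, j), n in counts.most_common():
        # "." and -1 are assumed placeholders for unavailable fields.
        writer.writerow([n, n / total, ".", cdr3aa, v, ".", j, -1, -1, -1, -1])
    return out.getvalue()
```

A typical use would read the exported .tsv as text, pass it through `vdjdb_to_vdjtools`, and write the result to a new file that is then given to PoolSamples or CalcBasicStats.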

Expected behavior
PoolSamples should run without errors and output a pre-processed version of the given dataset. Then, after running the tutorial through the CalcBasicStats line, basicstats.txt and the other CalcBasicStats outputs should be produced for the control and sample files, again without errors.

Additional context
OS: Windows 10
Java version:
java version "1.8.0_271"
Java(TM) SE Runtime Environment (build 1.8.0_271-b09)
Java HotSpot(TM) 64-Bit Server VM (build 25.271-b09, mixed mode)
R version: 4.0.3
Python version: 3.8.5
