Problems with converting mixcr files to vdjtools #165

Open
andreaaran opened this issue Mar 23, 2023 · 7 comments

Comments

@andreaaran

I have used MiXCR several times to align and assemble my data and had no problems converting to VDJtools before, but now I am using the new MiXCR version 4.3.0 and I am unable to convert my .tsv clonotype tables to the VDJtools format. The error says "Some mandatory columns are absent in the input file". Has this happened to anyone else, and does anyone know how to fix it?

Thanks!

@mizraelson

Hi,
The new MiXCR has a slightly different output format, which is why vdjtools does not recognize it. Most of the post-analysis functions are now included in MiXCR, so you might want to try using it instead. Let me know if you need any help with it.

Nevertheless, there is a way to bypass the issue:

  1. The new MiXCR has columns named readCount, readFraction and uniqueMoleculeCount, uniqueMoleculeFraction (in case of UMIs). For vdjtools to work correctly, the following columns are required: cloneCount, cloneFraction. A simple rename will help. Keep in mind that if you have both reads and UMIs, you should pick which one you want to use and rename the columns of interest.

  2. Another thing is the region_not_covered value that is sometimes present if a gene region was not covered by the alignment. We used to leave these fields blank (if the region was not covered), but that confused some of our users, so we added this note.

As a bypass you can run the following command for the mixcr tsv output files:

sed -i 's/region_not_covered//g' sample1.tsv

Let me know if these tips help.
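
The two steps above could be combined into one pass, roughly as sketched below. This is only an illustration, not an official MiXCR or vdjtools tool: the function name `prepare_for_vdjtools` is made up here, and it assumes the first line of the file is a tab-separated header. By default it renames the read-based columns; pass `uniqueMoleculeCount` and `uniqueMoleculeFraction` as the third and fourth arguments if you prefer UMI-based counts.

```shell
# Sketch: prepare a MiXCR 4.x clones table for vdjtools (hypothetical helper).
# Usage: prepare_for_vdjtools in.tsv out.tsv [countColumn] [fractionColumn]
prepare_for_vdjtools() {
    in="$1"
    out="$2"
    count_col="${3:-readCount}"     # column to rename to cloneCount
    frac_col="${4:-readFraction}"   # column to rename to cloneFraction
    awk -F'\t' -v OFS='\t' -v c="$count_col" -v f="$frac_col" '
        NR == 1 {
            # Rename only exact header matches; other columns are untouched.
            for (i = 1; i <= NF; i++) {
                if ($i == c) $i = "cloneCount"
                else if ($i == f) $i = "cloneFraction"
            }
            print
            next
        }
        {
            # Blank out the region_not_covered placeholder in data rows.
            gsub(/region_not_covered/, "")
            print
        }
    ' "$in" > "$out"
}
```

For example, `prepare_for_vdjtools sample1.tsv sample1.vdjtools.tsv` writes a new file and leaves the original untouched, unlike `sed -i`.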

@c7cc5

c7cc5 commented May 23, 2023

Hi, after following all the steps you mentioned, I am getting this error. Would you please help? My command is: java -Xmx4g -jar vdjtools-1.2.1.jar Convert -S mixcr C-1.clones_TRB.tsv C-2.clones_TRB.tsv
"[ERROR] java.lang.RuntimeException: Unable to parse clonotype string 32 7.0 0.015555555555555555 GCAGATCCTGGGACAGGGCCCAAAGCTTCTGATTCAGTTTCAGAATAACGGTGTAGTGGATGATTCACAGTTGCCTAAGGATCGATTTTCTGCAGAGAGGCTCAAAGGAGTAGACTCCACTCTCAAGATCCAACCTGCAAAGCTTGAGGA,GTATCTCTGTGCCAGCAGCTTCGGGCGAGACCCAGTACTTCGGGCCAGGCACGCGGCTCCTGGTGCTCGAGGACCTGAAAAACGTGTTCCCACCCGAGGTCGCTGTGTTTGAGCCATCAGAAGCAGAGATCTCCCACACCCAAAAGGCCA [[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[,[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[)[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[ TRBV11-200(839) TRBJ2-500(212.7) TRBC2*00(405) 264|414|467|0|150|SG396A|734.0,423|444|467|0|21||105.0 ,25|68|68|26|69||215.0 , TTTCAGAATAACGGTGTA 58 TGTGCCAGCAGCTTCGGGCGAGACCCAGTACTTC 58 GGGCCAGGCACGCGGCTCCTGGTGCTCG 8 FQNNGV CASSFG_ETQYF GPGTRLLVL_ :::::::37:55:::::::::::::,:::::::::7:-3:21:::::26:-5:41:69:: for MiXcr input type: For input string: ",25", see _vdjtools_error.log for details"

@c7cc5

c7cc5 commented May 23, 2023

Hi, the commas in the tsv file were causing the trouble. Using the command
sed 's/,//g' clones_TRB.tsv > new.tsv solved the problem. Thanks.
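
For several files at once, the same substitution can be wrapped in a small loop, sketched below. The function name `strip_commas` and the `.nocomma.tsv` suffix are made up for this example. Note that the substitution removes every comma, including those separating multiple values inside a single field (as in the alignment field shown in the error above), so it seems safer to write new files and keep the originals.

```shell
# Sketch: strip all commas from each given TSV (hypothetical helper).
# Writes <name>.nocomma.tsv next to each input instead of editing in place.
strip_commas() {
    for f in "$@"; do
        sed 's/,//g' "$f" > "${f%.tsv}.nocomma.tsv"
    done
}
```

For example, `strip_commas C-1.clones_TRB.tsv C-2.clones_TRB.tsv` produces `C-1.clones_TRB.nocomma.tsv` and `C-2.clones_TRB.nocomma.tsv`.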

@zifengstu

zifengstu commented Dec 24, 2023

Hi, I have the same problem. Have you solved it?
For me, it still does not work after I replace readCount, readFraction with cloneCount, cloneFraction and run sed -i 's/region_not_covered//g' *.tsv and sed -i 's/,//g' *.tsv
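
When the conversion still fails after these edits, it may help to confirm that each edit actually took effect in the file. The sketch below checks only the three things mentioned in this thread (the cloneCount and cloneFraction header columns, leftover region_not_covered values, and leftover commas). The function name `check_vdjtools_input` is made up here, and passing this check does not guarantee that vdjtools will accept the file.

```shell
# Sketch: sanity-check a TSV against the fixes discussed in this thread
# (hypothetical helper). Prints each problem found; returns 1 if any, else 0.
check_vdjtools_input() {
    f="$1"
    status=0
    header=$(head -n 1 "$f")
    for col in cloneCount cloneFraction; do
        # Split the header on tabs and look for an exact column-name match.
        if ! printf '%s\n' "$header" | tr '\t' '\n' | grep -qx "$col"; then
            echo "missing column: $col"
            status=1
        fi
    done
    if grep -q 'region_not_covered' "$f"; then
        echo "region_not_covered still present"
        status=1
    fi
    if grep -q ',' "$f"; then
        echo "commas still present"
        status=1
    fi
    return $status
}
```

Running `check_vdjtools_input sample1.tsv` before the Convert step shows which of the three edits, if any, is still missing.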

@c7cc5

c7cc5 commented Dec 25, 2023

Yes. I solved the problem. What is your error message?

@li1311139481

I solved it, your command worked. Thanks

@pbpayal

pbpayal commented Apr 18, 2024

Hello,

I have changed the column names, but I am getting a different error.

Unknown symbol "0", see _vdjtools_error.log for details

What's the solution for this?

Thanks


6 participants