multi-region analysis: sidle/reconstructed/reconstructed_merged.tsv OCCASIONALLY mis-formatted #736

Closed
d4straub opened this issue Apr 18, 2024 · 1 comment
Labels
bug Something isn't working

Comments

@d4straub
Collaborator

Description of the bug

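sidle/reconstructed/reconstructed_merged.tsv was mis-formatted. In the excerpt below, the first data row carries the abundances in its last three columns, whereas the following rows lack them; the second data row is also missing its closing quote, and the rows after it are not quoted at all: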

"ID"	"Taxon"	"sa"	"sb"	"sc"
"AB361591.1.1439|AY486367.1.1434|DQ140184.1.1403|DQ302158.1.1386|EF509324.1.1494|EF509367.1.1462|EF509378.1.1458|EF509460.1.1463|EF509596.1.1469|EF509605.1.1477|EF510026.1.1464|EF510037.1.1401|EF510941.1.1400|EF511012.1.1465|EU139850.1.1411|EU661692.1.1480|EU874609.1.1395|FJ940905.1.1467|GU122959.1.1401|GU181421.1.1378|HM480353.1.1479|HQ232955.1.1433|HQ455027.1.1444|HQ455028.1.1424|HQ880674.1.1424|JF723552.1.1398|JN846903.1.1315|JQ900535.1.1456|JQ900537.1.1448|KC862289.1.1429|KR811027.1.1488|KU196753.1.1447|KU352734.1.1377|LLQC01000080.229.1641|LN558607.1.1223"	"D_0__Bacteria;D_1__Proteobacteria;D_2__Gammaproteobacteria;D_3__Pseudomonadales;D_4__Pseudomonadaceae;D_5__Pseudomonas;D_6__Pseudomonas aeruginosa|D_6__Pseudomonas sp. 38(2011)|D_6__Pseudomonas sp. B7|D_6__Pseudomonas sp. BS-161R|D_6__Pseudomonas sp. DBTC4|D_6__Pseudomonas sp. DBTSML|D_6__Pseudomonas sp. LJLP1-15|D_6__Pseudomonas sp. PYD-4|D_6__Pseudomonas sp. VITDM1"	0	0	616
"AB508839.1.1527|AB813716.1.1291|AY030329.1.1502|EF422864.1.1474|EU363702.1.1320|EU366382.1.1482|EU586319.1.1450|EU679368.1.1464|EU780733.1.1447|FJ157236.1.1449|FJ769135.1.1435|FJ789808.1.1250|FJ863109.1.1429|FN556453.1.1454|GQ199587.1.1262|GQ280077.1.1432|GQ280079.1.1417|GQ280082.1.1432|GQ301542.1.1528|GU121487.1.1396|GU121494.1.1359|GU122948.1.1451|GU366049.2.1213|HM150646.1.1434|HM588147.1.1452|HQ021420.1.1577|HQ436036.1.1463|HQ731028.1.1462|HQ731029.1.1457|HQ834863.1.1471|HQ844504.1.1456|JF708240.1.1478|JQ424889.1.1463|JX156418.1.1500|KC310835.1.1447|KC405250.1.1449|KC683890.1.1387|KC855545.1.1458|KC855547.1.1459|KF254579.1.1446|KF482851.1.1452|KF574386.1.1433|KF844068.1.1524|KF860141.1.1455|KF917163.1.1326|KF917168.1.1524|KF928702.1.1454|KF928703.1.1455|KJ162241.1.1402|KJ743290.1.1518|KJ752760.1.1525|KP743130.1.1409|KP851955.1.1453|KP877505.1.1455|KR131622.1.1367|KT149667.1.1347|KT200495.1.1261|KT247502.1.1441|KT250765.1.1412|KT266579.1.1506|KT722838.1.1501|KU157226.1.1478|KU230011.1.1202|KX082973.1.1412|KX262911.1.1501|KX262912.1.1566"	"D_0__Bacteria;D_1__Firmicutes;D_2__Bacilli;D_3__Bacillales;D_4__Bacillaceae;D_5__Bacillus;Ambiguous_taxa|D_6__Bacillus amyloliquefaciens|D_6__Bacillus mojavensis|D_6__Bacillus sp. 1.143|D_6__Bacillus sp. 12-82|D_6__Bacillus sp. BAB-3438|D_6__Bacillus sp. BAB-4129|D_6__Bacillus sp. BJC2.1|D_6__Bacillus sp. CM4(2015)|D_6__Bacillus sp. CZB26|D_6__Bacillus sp. HYC-1-3|D_6__Bacillus sp. LX-119|D_6__Bacillus sp. LX-120|D_6__Bacillus sp. RPT0001|D_6__Bacillus sp. TT1|D_6__Bacillus sp. Ti28|D_6__Bacillus sp. YBN13|D_6__Bacillus sp. YM2|D_6__Bacillus sp. sadinb1|D_6__Bacillus subtilis|D_6__Bacillus subtilis subsp. subtilis|D_6__Bacillus tequilensis|D_6__Bacillus vallismortis|D_6__Bacillus velezensis|D_6__Geobacillus sp. RSNPB7|D_6__Paenibacillus sp. BAB-3433|D_6__bacterium ARb05|D_6__bacterium B1-6-2|D_6__bacterium enrichment culture clone 16(2011)|D_6__bacterium enrichment culture clone 79(2011)
AB523727.1.1479	D_0__Bacteria;D_1__Proteobacteria;D_2__Gammaproteobacteria;D_3__Enterobacteriales;D_4__Enterobacteriaceae;D_5__Enterobacter;D_6__Enterobacteriaceae bacterium NES11
AB548850.1.1254|FJ901047.1.1308	D_0__Bacteria;D_1__Proteobacteria;D_2__Gammaproteobacteria;D_3__Pseudomonadales;D_4__Pseudomonadaceae;D_5__Pseudomonas;Ambiguous_taxa

This seems to stem from a mis-formatted reconstructed_taxonomy.tsv: line 3 starts with " but does not end with one, and from line 4 onward there are no " characters at all.

"ID"	"Taxon"
"AB361591.1.1439|AY486367.1.1434|DQ140184.1.1403|DQ302158.1.1386|EF509324.1.1494|EF509367.1.1462|EF509378.1.1458|EF509460.1.1463|EF509596.1.1469|EF509605.1.1477|EF510026.1.1464|EF510037.1.1401|EF510941.1.1400|EF511012.1.1465|EU139850.1.1411|EU661692.1.1480|EU874609.1.1395|FJ940905.1.1467|GU122959.1.1401|GU181421.1.1378|HM480353.1.1479|HQ232955.1.1433|HQ455027.1.1444|HQ455028.1.1424|HQ880674.1.1424|JF723552.1.1398|JN846903.1.1315|JQ900535.1.1456|JQ900537.1.1448|KC862289.1.1429|KR811027.1.1488|KU196753.1.1447|KU352734.1.1377|LLQC01000080.229.1641|LN558607.1.1223"	"D_0__Bacteria;D_1__Proteobacteria;D_2__Gammaproteobacteria;D_3__Pseudomonadales;D_4__Pseudomonadaceae;D_5__Pseudomonas;D_6__Pseudomonas aeruginosa|D_6__Pseudomonas sp. 38(2011)|D_6__Pseudomonas sp. B7|D_6__Pseudomonas sp. BS-161R|D_6__Pseudomonas sp. DBTC4|D_6__Pseudomonas sp. DBTSML|D_6__Pseudomonas sp. LJLP1-15|D_6__Pseudomonas sp. PYD-4|D_6__Pseudomonas sp. VITDM1"
"AB508839.1.1527|AB813716.1.1291|AY030329.1.1502|EF422864.1.1474|EU363702.1.1320|EU366382.1.1482|EU586319.1.1450|EU679368.1.1464|EU780733.1.1447|FJ157236.1.1449|FJ769135.1.1435|FJ789808.1.1250|FJ863109.1.1429|FN556453.1.1454|GQ199587.1.1262|GQ280077.1.1432|GQ280079.1.1417|GQ280082.1.1432|GQ301542.1.1528|GU121487.1.1396|GU121494.1.1359|GU122948.1.1451|GU366049.2.1213|HM150646.1.1434|HM588147.1.1452|HQ021420.1.1577|HQ436036.1.1463|HQ731028.1.1462|HQ731029.1.1457|HQ834863.1.1471|HQ844504.1.1456|JF708240.1.1478|JQ424889.1.1463|JX156418.1.1500|KC310835.1.1447|KC405250.1.1449|KC683890.1.1387|KC855545.1.1458|KC855547.1.1459|KF254579.1.1446|KF482851.1.1452|KF574386.1.1433|KF844068.1.1524|KF860141.1.1455|KF917163.1.1326|KF917168.1.1524|KF928702.1.1454|KF928703.1.1455|KJ162241.1.1402|KJ743290.1.1518|KJ752760.1.1525|KP743130.1.1409|KP851955.1.1453|KP877505.1.1455|KR131622.1.1367|KT149667.1.1347|KT200495.1.1261|KT247502.1.1441|KT250765.1.1412|KT266579.1.1506|KT722838.1.1501|KU157226.1.1478|KU230011.1.1202|KX082973.1.1412|KX262911.1.1501|KX262912.1.1566"	"D_0__Bacteria;D_1__Firmicutes;D_2__Bacilli;D_3__Bacillales;D_4__Bacillaceae;D_5__Bacillus;Ambiguous_taxa|D_6__Bacillus amyloliquefaciens|D_6__Bacillus mojavensis|D_6__Bacillus sp. 1.143|D_6__Bacillus sp. 12-82|D_6__Bacillus sp. BAB-3438|D_6__Bacillus sp. BAB-4129|D_6__Bacillus sp. BJC2.1|D_6__Bacillus sp. CM4(2015)|D_6__Bacillus sp. CZB26|D_6__Bacillus sp. HYC-1-3|D_6__Bacillus sp. LX-119|D_6__Bacillus sp. LX-120|D_6__Bacillus sp. RPT0001|D_6__Bacillus sp. TT1|D_6__Bacillus sp. Ti28|D_6__Bacillus sp. YBN13|D_6__Bacillus sp. YM2|D_6__Bacillus sp. sadinb1|D_6__Bacillus subtilis|D_6__Bacillus subtilis subsp. subtilis|D_6__Bacillus tequilensis|D_6__Bacillus vallismortis|D_6__Bacillus velezensis|D_6__Geobacillus sp. RSNPB7|D_6__Paenibacillus sp. BAB-3433|D_6__bacterium ARb05|D_6__bacterium B1-6-2|D_6__bacterium enrichment culture clone 16(2011)|D_6__bacterium enrichment culture clone 79(2011)
AB523727.1.1479	D_0__Bacteria;D_1__Proteobacteria;D_2__Gammaproteobacteria;D_3__Enterobacteriales;D_4__Enterobacteriaceae;D_5__Enterobacter;D_6__Enterobacteriaceae bacterium NES11
AB548850.1.1254|FJ901047.1.1308	D_0__Bacteria;D_1__Proteobacteria;D_2__Gammaproteobacteria;D_3__Pseudomonadales;D_4__Pseudomonadaceae;D_5__Pseudomonas;Ambiguous_taxa
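
A quick way to locate the line where the quoting breaks is to count the double quotes on each line. The following is only a minimal sketch: the function name and the shortened sample rows are hypothetical stand-ins, not part of the pipeline or the real file.

```python
def find_unbalanced_quote_lines(lines):
    """Return the 1-based numbers of lines that contain an odd number of double quotes."""
    return [n for n, line in enumerate(lines, start=1) if line.count('"') % 2 == 1]


# Hypothetical, shortened rows that mimic the pattern shown above.
sample = [
    '"ID"\t"Taxon"',
    '"A1|A2"\t"D_0__Bacteria;D_5__Pseudomonas"',
    '"B1|B2"\t"D_0__Bacteria;D_5__Bacillus',  # closing quote missing
    'C1\tD_0__Bacteria;D_5__Enterobacter',    # not quoted at all
]

unbalanced = find_unbalanced_quote_lines(sample)
print(unbalanced)
```

For a real file, the same function could be applied to the lines read from reconstructed_taxonomy.tsv.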

The QIIME2 barplot files looked fine, though, so the downstream results appear to be unaffected.

Command used and terminal output

# This command produced the mis-formatted file:
NXF_VER=23.10.1 nextflow run nf-core/ampliseq -r 2.9.0 -profile cfc --input illumina_multiregion_V1V2-V3V4-V6V8_samplesheet.tsv --multiregion illumina_multiregion_V1V2-V3V4-V6V8_multiregion.tsv --metadata illumina_multiregion_metadata.tsv --sidle_ref_taxonomy "silva=128" --skip_dada_taxonomy --skip_ancom --outdir ampliseq_illumina_multiregion_V1V2-V3V4-V6V8 -resume

# With this command I could not find any problems:
NXF_VER=23.10.1 nextflow run nf-core/ampliseq -r 2.9.0 -profile cfc --input illumina_multiregion_V1V3-V4V5-V7V9_samplesheet.tsv --multiregion illumina_multiregion_V1V3-V4V5-V7V9_multiregion.tsv --metadata illumina_multiregion_metadata.tsv --sidle_ref_taxonomy "silva=128" --skip_dada_taxonomy --skip_ancom --outdir ampliseq_illumina_multiregion_V1V3-V4V5-V7V9 -resume

Relevant files

No response

System information

No response

@d4straub d4straub added the bug Something isn't working label Apr 18, 2024
@d4straub d4straub changed the title from "multi-region analysis: sidle/reconstructed/reconstructed_merged.tsv mis-formatted" to "multi-region analysis: sidle/reconstructed/reconstructed_merged.tsv OCCASIONALLY mis-formatted" Apr 18, 2024
@d4straub
Collaborator Author

This seems to be caused by uncommon characters in the taxonomy strings, so it is more of a file-reading issue. The pipeline itself seems to be fine with it, though.
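
To illustrate how a single stray character can make the rest of a table look wrong to a reader, here is a minimal sketch using Python's csv module. It assumes the offending character is an unbalanced double quote; the issue does not say which characters or which reading tool were actually involved, and the sample rows are hypothetical.

```python
import csv
import io

# Hypothetical, shortened table with one unbalanced double quote on the third line.
text = (
    '"ID"\t"Taxon"\n'
    '"A1"\t"Bacteria;Bacillus sp. X\n'
    'B2\tBacteria;Enterobacter\n'
    'C3\tBacteria;Pseudomonas\n'
)

# Default quoting: the unclosed quote swallows every following line into one field.
default_rows = list(csv.reader(io.StringIO(text, newline=""), delimiter="\t"))

# Quoting disabled: every physical line is parsed as its own row,
# and quote characters are kept as literal text.
literal_rows = list(
    csv.reader(io.StringIO(text, newline=""), delimiter="\t", quoting=csv.QUOTE_NONE)
)

print(len(default_rows), len(literal_rows))
```

With quote handling switched off, the number of parsed rows matches the number of lines again, which is consistent with the file contents being usable even when a quote-aware reader displays them oddly.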
