CSV header/data length mismatch 5 != 3 on row that does not exist #1496

Open
vera-rykalina opened this issue Feb 15, 2024 · 2 comments

@vera-rykalina

Hi! I am joining 4 CSV tables with the same number of rows. The mlr command runs inside a Nextflow process.

Command:

script:
"""
mlr --csv join -u --ul --ur -j SequenceName -f ${stanford} ${comet} |
mlr --csv join -u --ul --ur -j SequenceName -f ${g2p} |
mlr --csv join -u --ul --ur -j SequenceName -f ${rega} > joint_${comet.getSimpleName().split('comet_')[1]}.csv
"""
comet, stanford, rega, and g2p are my CSV tables.

The join was working without any problem last week, but since today I have been getting this error:

CSV header/data length mismatch 5 != 3 at filename (stdin) row 1176.

The thing is that row 1176 does not exist in any of my CSV tables.
All my tables have 3 columns and 1175 rows each.

Any idea what is going on here?

Thanks,
Vera

@johnkerl
Owner

@vera-rykalina is it possible for you to share your data files, e.g. at gist.github.com?

@johnkerl
Owner

johnkerl commented Feb 15, 2024

Also, I suspect that the output of mlr --csv join -u --ul --ur -j SequenceName -f ${g2p} is intermediate data which does have 1176 rows (which can happen if there is a duplicate value of SequenceName) ...
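
One way to test the duplicate-key hypothesis without sharing the files is a small standalone check. The following is only a sketch, assuming Python is available next to the pipeline. The helper names `duplicate_keys` and `ragged_rows` and the sample data are hypothetical and are not part of Miller or Nextflow. The first helper counts repeated SequenceName values in one table. The second lists rows whose field count differs from the header, which is the condition the error message reports.

```python
import csv
import io
from collections import Counter


def duplicate_keys(csv_text, key="SequenceName"):
    """Return {value: count} for key values that occur more than once."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = Counter(row[key] for row in reader)
    return {value: n for value, n in counts.items() if n > 1}


def ragged_rows(csv_text):
    """Return (row_number, field_count) for rows whose width differs from the header.

    Row numbers are 1-based and count the header as row 1.
    Blank lines are skipped.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    width = len(rows[0])
    return [
        (number, len(row))
        for number, row in enumerate(rows[1:], start=2)
        if row and len(row) != width
    ]


# Hypothetical stand-in for one input table; the columns "a" and "b"
# are invented for illustration, only SequenceName comes from the thread.
sample = "SequenceName,a,b\ns1,1,2\ns2,3,4\ns1,5,6\n"
print(duplicate_keys(sample))

# Hypothetical stand-in for intermediate join output with one short row.
intermediate = "SequenceName,a,b,c,d\ns1,1,2,3,4\ns2,5,6\n"
print(ragged_rows(intermediate))
```

To run it on real data, read each file with `open(path, newline="").read()` and pass the text in. Saving the output of the first join to a file and passing that to `ragged_rows` would show whether the intermediate data is where the field counts diverge.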
