Running RMS tests failed. #70

Open
chenji333 opened this issue Jun 16, 2021 · 18 comments

Comments

@chenji333

Hello, I recently wanted to use methylpy to call DMRs, so I converted my methylation data into the allc file format:

8 7524770 + CTCGC 15 15 1
8 7524782 + GGCGC 19 19 1
8 7524784 + CGCGA 20 20 1
8 7524822 + GCCGC 21 21 1
8 7524826 + CACGC 21 21 1
8 7524867 + AACGC 21 21 1
My methylation file contains methylation data on 11 chromosomes.
I used the following command:
methylpy DMRfind --allc-files guo_1.tsv hua_1.tsv --samples FR FL --mc-type "CGN" --chroms 1 2 3 4 5 6 7 8 9 10 11 --num-procs 8 --output-prefix DMR_FR_FL

But I got an error like this:

Filtering allc files using 2 node(s).
Wed Jun 16 20:12:47 2021

Splitting allc files for chromosome 1
Wed Jun 16 20:12:56 2021

<class 'KeyError'> 179
'1'
Running RMS tests failed.

I don't know what the problem is. I would appreciate any advice. Thank you.

@yupenghe
Owner

Maybe it is the index file. Can you check whether the chromosome information is correctly stored in .idx files? For example, guo_1.tsv.idx and hua_1.tsv.idx.

@chenji333
Author

I don't have an index file, and my data is not BS-seq data, so I did not run the build-reference, single-end processing, or paired-end processing steps. I just converted my methylation results into an allc-like format, so the seventh column of my "allc file" is simply set to 1.
Can methylpy skip the previous steps and only calculate DMRs?

@yupenghe
Owner

Yes, DMRfind only needs allc files. Would you mind sharing the two files so that I can reproduce the error?

@chenji333
Author

OK. If this step can be achieved, it will be very helpful for me. Thank you very much. What is your email address? My file is a little big.

@yupenghe
Owner

Can you reproduce the error with just the first, say, 30 lines of the allc files?

@chenji333
Author

OK. My methylation data is:
HUA.tsv
chr pos strand CG count_modified coverage
1 37 + GGCGG 3 4
1 38 - ACCGC 3 3
1 74 + ACCGC 5 5
1 75 - GGCGG 2 2
1 138 + GGCGG 3 4
1 206 + TTCGG 6 6
1 207 - GCCGA 3 3
1 210 + GCCGA 6 6
1 211 - ATCGG 5 5
1 222 + TTCGC 0 4
1 223 - AGCGA 3 3
1 228 + GACGG 4 5
1 229 - CCCGT 5 5
1 232 + GGCGG 4 4
1 233 - CCCGC 1 2
1 304 + CCCGA 1 1
1 305 - CTCGG 4 4
1 325 + ATCGC 5 5
1 326 - GGCGA 2 2
1 349 + ATCGG 5 5
1 350 - ACCGA 5 5
1 373 + GGCGG 3 4
1 374 - ACCGC 4 4
1 410 + CACGC 4 5
1 411 - GGCGT 5 5
1 418 + TTCGG 5 5
1 419 - GCCGA 5 5
1 483 + GGCGG 5 5
1 484 - ACCGC 3 3
1 503 + ATCGT 4 5
YE.tsv
1 37 + GGCGG 5 5
1 38 - ACCGC 3 5
1 74 + ACCGC 4 5
1 75 - GGCGG 3 3
1 138 + GGCGG 4 6
1 139 - ACCGC 3 3
1 206 + TTCGG 7 8
1 207 - GCCGA 6 6
1 210 + GCCGA 7 7
1 211 - ATCGG 7 7
1 222 + TTCGC 2 10
1 223 - AGCGA 5 5
1 228 + GACGG 8 8
1 229 - CCCGT 6 6
1 232 + GGCGG 9 9
1 233 - CCCGC 5 7
1 304 + CCCGA 5 5
1 305 - CTCGG 7 7
1 325 + ATCGC 7 7
1 326 - GGCGA 6 6
1 349 + ATCGG 8 8
1 350 - ACCGA 7 7
1 373 + GGCGG 8 8
1 374 - ACCGC 5 6
1 410 + CACGC 7 8
1 411 - GGCGT 7 7
1 418 + TTCGG 8 8
1 419 - GCCGA 4 4
1 483 + GGCGG 6 8
1 484 - ACCGC 6 6
But I have a problem. The tutorial says the seventh column of the allc file has to be calculated, but I cannot derive it from my data using the calculation described there.
So I wrote 1 in the seventh column of the file.
The file format is:
1 37 + GGCGG 3 4 1
1 38 - ACCGC 3 3 1
1 74 + ACCGC 5 5 1
1 75 - GGCGG 2 2 1
1 138 + GGCGG 3 4 1
1 206 + TTCGG 6 6 1
1 207 - GCCGA 3 3 1
1 210 + GCCGA 6 6 1
1 211 - ATCGG 5 5 1
1 222 + TTCGC 0 4 1
1 223 - AGCGA 3 3 1
1 228 + GACGG 4 5 1
1 229 - CCCGT 5 5 1
1 232 + GGCGG 4 4 1
1 233 - CCCGC 1 2 1
1 304 + CCCGA 1 1 1
1 305 - CTCGG 4 4 1
1 325 + ATCGC 5 5 1
1 326 - GGCGA 2 2 1
1 349 + ATCGG 5 5 1
1 350 - ACCGA 5 5 1
1 373 + GGCGG 3 4 1
1 374 - ACCGC 4 4 1
1 410 + CACGC 4 5 1
1 411 - GGCGT 5 5 1
1 418 + TTCGG 5 5 1
1 419 - GCCGA 5 5 1
1 483 + GGCGG 5 5 1
1 484 - ACCGC 3 3 1
1 503 + ATCGT 4 5 1
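
The conversion described above (a six-column table plus a constant seventh column) could be scripted roughly as follows. This is only a sketch: the function name `add_constant_column` is hypothetical, the header row is assumed to start with `chr` as in HUA.tsv, and the output is written tab-separated because that is what is asked for later in this thread.

```python
def add_constant_column(lines, flag="1"):
    """Append a constant seventh column to six-column methylation records.

    Input lines may be separated by spaces or tabs; output lines are
    tab-separated. Blank lines and a header row starting with 'chr'
    are skipped.
    """
    out = []
    for line in lines:
        fields = line.split()  # split on any run of whitespace
        if not fields or fields[0] == "chr":
            continue  # blank line or header row
        if len(fields) != 6:
            raise ValueError(f"expected 6 fields, got {len(fields)}: {line!r}")
        out.append("\t".join(fields + [flag]))
    return out


if __name__ == "__main__":
    sample = [
        "chr pos strand CG count_modified coverage",
        "1 37 + GGCGG 3 4",
        "1 38 - ACCGC 3 3",
    ]
    for record in add_constant_column(sample):
        print(record)
```

Note that this only appends the column; it does not change the five-base context in the fourth column.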
My command is:
methylpy/bin/methylpy DMRfind --allc-files blue_guo_1.tsv blue_hua_1.tsv --samples FR FL --mc-type "CGN" --chroms 1 --output-prefix DMR_hua_1.tsv --samples guo_1 hua_1 --mc-type "CGN" --chroms 1 --output-prefix DMR_FR_FL
Filtering allc files using single node.
Mon Jun 21 11:20:19 2021

Splitting allc files for chromosome 1
Mon Jun 21 11:20:19 2021

<class 'KeyError'> 179
'1'
Running RMS tests failed.
Is the error caused by the seventh column of my allc file not being calculated?

@frimpz

frimpz commented Jun 21, 2021

I am facing the same problem. Have you been able to find the cause of the error?

@yupenghe
Owner

It is totally fine to set the last column to 1. The current issue is that the format of the context column (4th) in the input file is not supported by methylpy. Reformatting the sequence context as the last three bases should fix this problem. For example, ACCGC -> CGC, where the first C is the cytosine of interest.
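
A minimal sketch of that reformatting, assuming every record has a five-base context with the cytosine of interest in the middle (as in the sample rows posted above) and tab-separated fields. The helper names `trim_context` and `rewrite_line` are hypothetical and not part of methylpy.

```python
def trim_context(context, width=3):
    """Keep the central cytosine and the bases after it, e.g. ACCGC -> CGC."""
    center = len(context) // 2
    if context[center].upper() != "C":
        raise ValueError(f"no cytosine at the center of {context!r}")
    return context[center:center + width]


def rewrite_line(line):
    """Replace the 4th (context) field of a tab-separated record."""
    fields = line.rstrip("\n").split("\t")
    fields[3] = trim_context(fields[3])
    return "\t".join(fields)


if __name__ == "__main__":
    print(trim_context("ACCGC"))
    print(rewrite_line("1\t38\t-\tACCGC\t3\t3\t1"))
```

Raising an error when the middle base is not a C is a deliberate choice here: it flags rows whose context is laid out differently instead of silently producing a wrong context.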

@frimpz

frimpz commented Jun 22, 2021

Hi, is there a way to make the --chroms parameter (e.g. --chroms 1 2) accept names longer than one character? For example, my data is formatted as NC_037328.1, but the map function splits it into ["N", "C", "_", "0", "3", "7", "3", "2", "8", ".", "1"]. That is the cause of my error. Is there a way to set it? My data is very large and I am reluctant to reformat it.

@yupenghe
Owner

Methylpy should be able to handle chromosome names with more than one character, like chr1. Can you post the command you ran?
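
For illustration only, the difference between a list of names and a single string can be reproduced with plain Python. The parser below is a hypothetical stand-in built with `argparse`, not methylpy's own code, and whether this is what happened in the report above is an assumption.

```python
import argparse

# Hypothetical stand-in for a command-line option that takes several names.
parser = argparse.ArgumentParser()
parser.add_argument("--chroms", nargs="+")

# On the command line, each space-separated name arrives as one list item.
args = parser.parse_args(["--chroms", "NC_037328.1", "chr1"])
whole_names = list(map(str, args.chroms))

# If a bare string is passed where a list is expected (for example from
# Python code), iterating over it yields single characters instead.
split_chars = list(map(str, "NC_037328.1"))

if __name__ == "__main__":
    print(whole_names)
    print(split_chars)
```

If the function is being called from Python rather than from the shell, wrapping the name in a list, as in `["NC_037328.1"]`, avoids the character-by-character split shown in the second case.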

@frimpz

frimpz commented Jun 22, 2021

methylpy DMRfind \
--allc-files all_files/allc_ARS-UCD1_CTRL1.tsv.gz all_files/allc_ARS-UCD1_CTRL2.tsv.gz \
--samples ARS-UCD1_CTRL1 ARS-UCD1_CTRL2 \
--mc-type "CGN" \
--chroms NC_037328.1 \
--num-procs 64 \
--output-prefix DMR_CTRL1_CTRL2

@yupenghe
Owner

What version of methylpy are you using? I am not able to reproduce your error. Below is what I tried. Input files are attached. Are you able to run the command below without error?

methylpy DMRfind --allc-files allc_sample_1.tsv.gz allc_sample_2.tsv.gz --samples ARS-UCD1_CTRL1 ARS-UCD1_CTRL2 --mc-type "CGN" --chroms NC_037328.1 --num-procs 64 --output-prefix DMR_CTRL1_CTRL2

Input files:
allc_sample_1.tsv.gz
allc_sample_2.tsv.gz

@frimpz

frimpz commented Jun 22, 2021

I am using methylpy version 1.4.3. The example you gave works for me too, but my own input does not.

This is the exact error that I get:
Splitting allc files for chromosome NC_037328.1
Mon Jun 21 20:08:37 2021

<class 'KeyError'> 184
'NC_037328.1'
Running RMS tests failed.
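
Since the same command works on the sample files, one thing worth comparing is what the first tab-separated column of each input actually contains. The sketch below is a generic diagnostic, not part of methylpy; `open_text` and `chromosome_counts` are hypothetical helper names, and the path in the usage section is the one from the command above.

```python
import gzip
from collections import Counter


def open_text(path):
    """Open a plain or gzip-compressed text file for reading."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "r")


def chromosome_counts(lines):
    """Count records per value of the first tab-separated field."""
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if line.strip():
            counts[line.split("\t", 1)[0]] += 1
    return counts


if __name__ == "__main__":
    with open_text("all_files/allc_ARS-UCD1_CTRL1.tsv.gz") as handle:
        for name, n in chromosome_counts(handle).most_common(5):
            print(repr(name), n)
```

If the file is not really tab-separated, the "chromosome names" printed here will be whole lines, which makes that kind of problem easy to spot.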

@yupenghe
Owner

Would you mind sharing the first 20 lines of your allc files?

@frimpz

frimpz commented Jun 22, 2021

NC_037328.1,28599,+,CAG,0,1,1
NC_037328.1,34167,+,CTG,0,2,1
NC_037328.1,47181,-,CAT,0,1,1
NC_037328.1,134883,-,CAT,0,1,1
NC_037328.1,138299,-,CAT,0,2,1
NC_037328.1,138300,+,CCT,0,2,1
NC_037328.1,138301,+,CTG,0,2,1
NC_037328.1,138303,-,CAG,0,2,1
NC_037328.1,138306,-,CAT,0,2,1
NC_037328.1,138310,+,CAC,0,2,1
NC_037328.1,138312,+,CAG,0,2,1
NC_037328.1,138314,-,CTG,0,2,1
NC_037328.1,138317,+,CAA,0,2,1
NC_037328.1,138320,-,CTT,0,2,1
NC_037328.1,138322,-,CAC,0,2,1
NC_037328.1,140407,-,CTA,0,4,1
NC_037328.1,140408,-,CCT,0,4,1
NC_037328.1,140409,+,CAA,0,4,1
NC_037328.1,145179,-,CAG,0,1,1
NC_037328.1,145180,-,CCA,0,1,1
NC_037328.1,145868,-,CAA,0,3,1
NC_037328.1,146655,+,CAA,1,5,1
NC_037328.1,149309,-,CAG,0,1,1
NC_037328.1,149359,-,CAG,0,1,1
NC_037328.1,149361,-,CAC,0,1,1
NC_037328.1,149364,-,CAT,0,1,1
NC_037328.1,152099,-,CAT,0,1,1
NC_037328.1,152107,-,CTA,0,1,1
NC_037328.1,152109,-,CAC,0,1,1
NC_037328.1,153427,-,CAT,0,1,1
NC_037328.1,153435,-,CTA,0,1,1
NC_037328.1,153437,-,CAC,0,1,1
NC_037328.1,156494,-,CAT,0,1,1
NC_037328.1,156496,+,CTC,0,1,1
NC_037328.1,156498,+,CAT,0,1,1
NC_037328.1,156502,-,CTA,0,2,1
NC_037328.1,156504,-,CAC,0,2,1
NC_037328.1,156505,+,CAC,0,1,1
NC_037328.1,156507,+,CTC,0,1,1
NC_037328.1,156509,+,CTT,0,1,1
NC_037328.1,156512,+,CAC,0,1,1
NC_037328.1,157799,-,CAT,0,2,1
NC_037328.1,157801,+,CTC,0,2,1
NC_037328.1,157803,+,CAT,0,2,1
NC_037328.1,157807,-,CTA,0,2,1
NC_037328.1,157809,-,CAC,0,2,1
NC_037328.1,157810,+,CAC,0,2,1
NC_037328.1,157812,+,CTC,0,3,1
NC_037328.1,157814,+,CTT,0,3,1
NC_037328.1,157817,+,CAC,0,3,1
NC_037328.1,157819,+,CCT,0,3,1
NC_037328.1,158294,-,CAA,0,3,1
NC_037328.1,158509,-,CCT,0,7,1
NC_037328.1,158559,+,CAT,0,5,1
NC_037328.1,158562,+,CAA,0,5,1
NC_037328.1,158566,+,CAG,0,5,1
NC_037328.1,158590,+,CGC,4,5,1
NC_037328.1,158591,-,CGG,5,7,1
NC_037328.1,158592,+,CTA,0,5,1
NC_037328.1,158596,-,CTT,0,7,1
NC_037328.1,158597,+,CTG,0,5,1
NC_037328.1,158599,-,CAG,0,7,1
NC_037328.1,158600,-,CCA,0,7,1
NC_037328.1,158601,+,CAA,0,5,1
NC_037328.1,158606,-,CAA,0,6,1
NC_037328.1,158608,+,CCA,0,5,1
NC_037328.1,158609,+,CAG,0,5,1
NC_037328.1,158611,-,CTG,0,6,1
NC_037328.1,158612,+,CTG,0,5,1
NC_037328.1,158614,-,CAG,0,6,1
NC_037328.1,158617,-,CAT,0,6,1
NC_037328.1,158619,+,CCA,0,5,1
NC_037328.1,158620,+,CAA,0,5,1
NC_037328.1,158623,-,CTT,0,3,1
NC_037328.1,159987,+,CTG,0,4,1
NC_037328.1,159989,-,CAG,0,8,1
NC_037328.1,161149,+,CAT,0,6,1
NC_037328.1,161153,+,CTG,0,7,1
NC_037328.1,161155,-,CAG,0,1,1
NC_037328.1,161156,+,CTA,0,7,1
NC_037328.1,161160,-,CTT,0,1,1
NC_037328.1,161161,+,CTG,0,6,1
NC_037328.1,161163,-,CAG,0,1,1
NC_037328.1,161165,+,CAA,0,6,1
NC_037328.1,161169,+,CAT,0,6,1
NC_037328.1,161172,+,CAA,1,6,1
NC_037328.1,161176,+,CAG,0,4,1
NC_037328.1,161229,+,CCA,0,2,1
NC_037328.1,161230,+,CAG,0,2,1
NC_037328.1,161287,+,CAT,0,2,1
NC_037328.1,161291,+,CAC,0,2,1
NC_037328.1,161293,+,CCT,0,2,1
NC_037328.1,161294,+,CTC,0,2,1
NC_037328.1,161296,+,CAA,0,2,1
NC_037328.1,162233,+,CTG,0,1,1
NC_037328.1,162235,-,CAG,0,5,1
NC_037328.1,162237,+,CCA,0,1,1
NC_037328.1,163011,-,CTG,0,3,1
NC_037328.1,163141,+,CAT,0,5,1
NC_037328.1,163144,-,CAT,0,3,1
NC_037328.1,163145,+,CTG,0,5,1
NC_037328.1,163147,-,CAG,0,3,1
NC_037328.1,163149,-,CTC,0,3,1
NC_037328.1,163151,+,CGT,4,5,1
NC_037328.1,163152,-,CGT,2,3,1
NC_037328.1,163154,-,CAC,0,3,1
NC_037328.1,163156,-,CAC,0,3,1
NC_037328.1,163160,+,CCT,0,6,1
NC_037328.1,163161,+,CTG,0,6,1
NC_037328.1,163163,-,CAG,0,3,1
NC_037328.1,163164,+,CTT,0,6,1
NC_037328.1,163168,+,CTC,0,6,1
NC_037328.1,163170,+,CAG,0,6,1
NC_037328.1,163172,-,CTG,0,3,1
NC_037328.1,163173,+,CTG,0,6,1
NC_037328.1,163175,-,CAG,0,3,1
NC_037328.1,163176,+,CTG,0,6,1
NC_037328.1,163321,+,CGC,4,5,1
NC_037328.1,163322,-,CGC,1,1,1
NC_037328.1,163323,+,CAT,0,4,1
NC_037328.1,163347,+,CAT,0,4,1
NC_037328.1,163351,+,CAG,0,4,1
NC_037328.1,163353,-,CTG,0,1,1
NC_037328.1,163355,+,CAC,0,4,1
NC_037328.1,163357,+,CGT,2,4,1
NC_037328.1,163358,-,CGT,1,2,1
NC_037328.1,163360,-,CAC,0,1,1
NC_037328.1,163361,+,CTC,0,4,1
NC_037328.1,163363,+,CAC,0,4,1
NC_037328.1,163365,+,CTA,0,4,1
NC_037328.1,163368,+,CCT,0,4,1
NC_037328.1,163369,+,CTG,0,4,1
NC_037328.1,163371,-,CAG,0,1,1
NC_037328.1,163372,+,CTC,0,4,1
NC_037328.1,163374,+,CAG,0,4,1
NC_037328.1,163376,-,CTG,0,1,1
NC_037328.1,163378,+,CAT,0,4,1
NC_037328.1,163463,-,CAA,0,1,1
NC_037328.1,163465,-,CAC,0,1,1
NC_037328.1,163466,+,CAA,0,2,1
NC_037328.1,163470,+,CAA,0,2,1
NC_037328.1,163473,-,CTT,0,1,1
NC_037328.1,163474,-,CCT,0,1,1
NC_037328.1,163475,-,CCC,0,1,1
NC_037328.1,163478,+,CAG,0,2,1
NC_037328.1,163480,-,CTG,0,1,1
NC_037328.1,163481,+,CAC,0,2,1
NC_037328.1,163483,+,CAT,0,2,1
NC_037328.1,163570,-,CTT,0,1,1
NC_037328.1,163572,-,CAC,0,1,1
NC_037328.1,163858,-,CGA,3,3,1
NC_037328.1,163859,-,CCG,0,3,1
NC_037328.1,163860,-,CCC,0,3,1
NC_037328.1,163863,+,CAG,0,3,1
NC_037328.1,163865,-,CTG,0,3,1
NC_037328.1,163867,-,CAC,0,3,1
NC_037328.1,163868,+,CAT,0,3,1
NC_037328.1,164178,-,CAA,0,4,1
NC_037328.1,164499,+,CAG,0,2,1
NC_037328.1,164501,-,CTG,0,2,1
NC_037328.1,164662,+,CAC,0,5,1
NC_037328.1,164664,+,CGA,4,5,1
NC_037328.1,164668,+,CTG,0,5,1
NC_037328.1,164671,+,CCA,0,5,1
NC_037328.1,164672,+,CAC,0,5,1
NC_037328.1,164674,+,CGT,1,5,1
NC_037328.1,164843,-,CAT,1,3,1
NC_037328.1,165849,-,CAC,0,3,1
NC_037328.1,166080,+,CAT,0,3,1
NC_037328.1,166083,+,CTG,0,3,1
NC_037328.1,166085,-,CAG,0,2,1
NC_037328.1,166086,+,CTT,0,3,1
NC_037328.1,166383,-,CAA,0,2,1
NC_037328.1,166384,+,CGT,0,1,1
NC_037328.1,166385,-,CGC,1,2,1
NC_037328.1,166387,-,CAC,0,2,1
NC_037328.1,166392,-,CGA,1,2,1
NC_037328.1,166396,-,CGA,1,2,1
NC_037328.1,166398,-,CAC,0,2,1
NC_037328.1,166403,-,CTG,0,2,1
NC_037328.1,166707,+,CAT,0,1,1
NC_037328.1,166712,+,CAT,0,1,1
NC_037328.1,168185,+,CAA,0,6,1
NC_037328.1,168188,-,CTT,0,2,1
NC_037328.1,168189,+,CCT,0,6,1
NC_037328.1,168190,+,CTC,0,6,1
NC_037328.1,168192,+,CAG,0,6,1
NC_037328.1,168194,-,CTG,0,2,1
NC_037328.1,168195,-,CCT,0,2,1
NC_037328.1,168681,+,CTC,0,2,1
NC_037328.1,168683,+,CTG,0,2,1
NC_037328.1,168685,-,CAG,0,3,1
NC_037328.1,168686,-,CCA,0,3,1
NC_037328.1,168691,-,CTT,0,3,1
NC_037328.1,168692,+,CTG,0,2,1
NC_037328.1,168694,-,CAG,0,3,1
NC_037328.1,168769,+,CCA,0,2,1
NC_037328.1,168770,+,CAC,0,2,1
NC_037328.1,168810,+,CGA,2,2,1
NC_037328.1,168815,+,CAT,0,2,1
NC_037328.1,168819,+,CTT,0,1,1

@yupenghe
Owner

Ah, the fields need to be tab separated. Can we try fixing the format and running DMRfind?
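
A quick way to check the format before running DMRfind might look like the sketch below. It assumes the seven-column layout used throughout this thread (chromosome, position, strand, context, methylated count, coverage, and a seventh column); `check_allc_line` is a hypothetical helper, not a methylpy function.

```python
def check_allc_line(line):
    """Return a description of the first problem found, or None if the line looks fine."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) != 7:
        return f"expected 7 tab-separated fields, found {len(fields)}"
    chrom, pos, strand, context, mc, cov, flag = fields
    if not pos.isdigit():
        return "position is not an integer"
    if strand not in ("+", "-"):
        return "strand is not + or -"
    if not (mc.isdigit() and cov.isdigit()):
        return "counts are not integers"
    if int(mc) > int(cov):
        return "methylated count exceeds coverage"
    return None


if __name__ == "__main__":
    print(check_allc_line("NC_037328.1,28599,+,CAG,0,1,1"))
    print(check_allc_line("NC_037328.1\t28599\t+\tCAG\t0\t1\t1"))
```

A comma-separated record is reported as having a single field, while the tab-separated version of the same record passes.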

@frimpz

frimpz commented Jun 22, 2021

It's tab-separated; I only replaced the separator with a comma after splitting each line.

However, printing the lines gives this:

NC_037328.1 28599 + CAG 0 1 1

NC_037328.1 34167 + CTG 0 2 1

NC_037328.1 47181 - CAT 0 1 1

NC_037328.1 134883 - CAT 0 1 1

NC_037328.1 138299 - CAT 0 2 1

NC_037328.1 138300 + CCT 0 2 1

NC_037328.1 138301 + CTG 0 2 1

NC_037328.1 138303 - CAG 0 2 1

NC_037328.1 138306 - CAT 0 2 1

NC_037328.1 138310 + CAC 0 2 1

NC_037328.1 138312 + CAG 0 2 1

NC_037328.1 138314 - CTG 0 2 1

NC_037328.1 138317 + CAA 0 2 1

NC_037328.1 138320 - CTT 0 2 1

NC_037328.1 138322 - CAC 0 2 1

NC_037328.1 140407 - CTA 0 4 1

NC_037328.1 140408 - CCT 0 4 1

NC_037328.1 140409 + CAA 0 4 1

NC_037328.1 145179 - CAG 0 1 1

NC_037328.1 145180 - CCA 0 1 1

NC_037328.1 145868 - CAA 0 3 1

NC_037328.1 146655 + CAA 1 5 1

NC_037328.1 149309 - CAG 0 1 1

Could the extra space be the problem?

@yupenghe
Owner

It seems that there are no CGN sites in your allc files. Is that correct? If so, that could be the cause of the problem.

Do you also get the same error by running this?

methylpy DMRfind \
--allc-files all_files/allc_ARS-UCD1_CTRL1.tsv.gz all_files/allc_ARS-UCD1_CTRL2.tsv.gz \
--samples ARS-UCD1_CTRL1 ARS-UCD1_CTRL2 \
--mc-type "CAG" \
--chroms NC_037328.1 \
--num-procs 64 \
--output-prefix DMR_CTRL1_CTRL2
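
To see how many sites in a file fall under a given --mc-type value, a rough count can be done outside methylpy. The sketch below assumes that N stands for any base and H for any base except G, and that the context is the fourth tab-separated field; the function names are hypothetical and this is not how methylpy itself filters sites.

```python
import re

# Assumed wildcard meanings: N = any base, H = any base except G.
WILDCARDS = {"N": "[ACGT]", "H": "[ACT]"}


def mc_type_to_regex(mc_type):
    """Translate a context pattern such as 'CGN' into a compiled regex."""
    return re.compile("".join(WILDCARDS.get(base, re.escape(base)) for base in mc_type.upper()))


def count_matching_sites(lines, mc_type):
    """Count records whose 4th tab-separated field matches mc_type exactly."""
    pattern = mc_type_to_regex(mc_type)
    total = 0
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 4 and pattern.fullmatch(fields[3].upper()):
            total += 1
    return total


if __name__ == "__main__":
    records = [
        "NC_037328.1\t158566\t+\tCAG\t0\t5\t1",
        "NC_037328.1\t158590\t+\tCGC\t4\t5\t1",
        "NC_037328.1\t158591\t-\tCGG\t5\t7\t1",
    ]
    print(count_matching_sites(records, "CGN"))
    print(count_matching_sites(records, "CAG"))
```

Because the match is against the whole field, a five-base context such as ACCGC is counted as zero CGN sites, which matches the earlier advice in this thread to trim contexts to three bases.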
