[ti_index_core] the indexes overlap or are out of bounds #73

Open
molecule53 opened this issue Apr 11, 2023 · 7 comments

@molecule53

Hello,
I am trying to run pairix on a pairs file so that I can create a contact map with cooler, but I am running into this error:

(base) ubuntu@ip-172-31-18-119:/Data1$ pairix corrected2_porec_test.concatemers.pairs.txt.gz
[get_intv] the following line cannot be parsed and skipped: CONCAT0 + - UU 1 chr2 5443 chr1 3003 32 R1
[ti_index_core] the indexes overlap or are out of bounds

zcat corrected2_porec_test.concatemers.pairs.txt.gz | head -n 20

## pairs format v1.0.0

#shape: whole matrix
#genome_assembly: unknown
#chromsize: chr1 3577
#chromsize: chr2 7551
#samheader: @sq SN:chr1 LN:3577
#samheader: @sq SN:chr2 LN:7551
#samheader: CL:minimap2 -ay -t 2 @pg PN:minimap2 ID:minimap2 VN:2.24-r1122 map-ont -x
#samheader: PP:minimap2 CL:/home/epi2melabs/conda/bin/pore-c-py annotate - @pg PN:pore-c-py ID:pore-c-py-2 VN:2.0.1 --monomers porec_test.concatemers
#samheader: parse2 --output-stats porec_test.concatemers.stats.txt -c @pg ID:pairtools_parse2 PN:pairtools_parse2 CL:/home/epi2melabs/conda/bin/pairtools --single-end fasta.fai
#samheader: restrict -f fragments.bed -o @pg ID:pairtools_restrict PN:pairtools_restrict CL:/home/epi2melabs/conda/bin/pairtools extract_pairs.tmp porec_test.concatemers.pairs.gz
#columns: readID chrom1 pos1 chrom2 pos2 strand1 strand2 pair_type walk_pair_index walk_pair_type mapq1 mapq2 pos51 pos52 pos31 pos32 cigar1 cigar2 read_len1 read_len2 matched_bp1 matched_bp2 algn_ref_span1 algn_ref_span2 algn_read_span1 algn_read_span2 dist_to_51 dist_to_52 dist_to_31 dist_to_32 mismatches1 mismatches2 rfrag1 rfrag_start1 rfrag_end1 rfrag2 rfrag_start2 rfrag_end2
CONCAT0 + - UU 1 chr2 5443 chr1 3003 32 R1
CONCAT0 + - UU 2 chr1 1104 chr1 1103 60 R1
CONCAT0 + - UU 3 chr1 602 chr2 6455 60 R1
CONCAT0 + + UN 4 chr2 5530 ! 0 60 R1
CONCAT0 - - NU 5 ! 0 chr2 6538 0 R1
CONCAT0 + - UU 6 chr2 6456 chr2 5442 51 R1
CONCAT1 + - UU 1 chr1 3004 chr2 6538 60 R1
CONCAT2 + - UU 1 chr1 1104 chr1 601 60 R1

@SooLee
Member

SooLee commented Apr 11, 2023 via email

@molecule53
Author

I just tried sorting:

pairix sorted_Test_Galaxy_20230203_Pore-C-70K_C_fastq_to_bamsorted.pairs.txt.gz
[get_intv] the following line cannot be parsed and skipped: 0000575c-de6a-4338-bac8-cdd60d8c5a90 ! 0 ! 0 - + NN 1
[ti_index_core] the indexes overlap or are out of bounds

Here is my new sorted file before bgzip:
cat sorted_Test_Galaxy_20230203_Pore-C-70K_C_fastq_to_bamsorted.pairs.txt | tail -n 50
45bb6b15-44f2-4e56-82fa-8e21ac40c855 Chr5 26963502 Chr5 25595629 + + UU 3
2b35315c-299d-4d4b-aab3-4f6c41d02058 Chr5 26963502 Chr5 25602432 + - UU 6
c949465b-b890-4844-a28a-3806ceecb4f8 Chr5 26963502 Chr5 26961514 - + UU 4
712dbfd8-bd4a-4d2d-afd0-f6594617edb3 Chr5 26963505 Chr5 756533 - - UU 3
50f508a9-488e-4e39-8703-3341f0c9a70b Chr5 26963505 Chr5 26961514 - + UU 2
33cfaaa7-5c00-4ca3-9123-566f648944b6 Chr5 26963505 Chr5 26967237 - + UU 1
30a86009-39df-4fd1-9dc4-9411f673ee62 Chr5 26964548 Chr5 26566570 + + UU 2
9c1cf00f-1ab8-4447-9c77-3859f84dd40c Chr5 26965340 Chr5 63152 + + UU 2
7b40584f-1080-4e6c-a093-0df206ac0f62 Chr5 26965340 Chr5 26191756 + - UU 3
45a058c9-44e0-41bc-a4a8-0ab5fa43a3b8 Chr5 26965343 Chr5 11704004 - + UU 1
c949465b-b890-4844-a28a-3806ceecb4f8 Chr5 26965343 Chr5 26959362 - + UU 2
f3e8780e-8974-480a-b7d2-569686aeb629 Chr5 26965343 Chr5 26962556 - + UU 2
b0d06e56-181e-4dac-8398-345b061ad7e5 Chr5 26965343 Chr5 26962569 - + UU 2
a3ee7f1d-3dde-4ed8-9ebb-679eceb3eb9b Chr5 26965343 Chr5 26965344 - + UU 1
0f7fa3b6-57ad-478a-9c5b-81886b2b1a7b Chr5 26968475 Chr5 26968471 + - UU 4
0f7fa3b6-57ad-478a-9c5b-81886b2b1a7b Chr5 26968702 Chr5 1452557 + + UU 1
d2cfbf8e-f961-4a4f-b69f-2cfcb6f6ff2e Chr5 26968702 Chr5 20478337 + + UU 3
821da5c8-55cb-47d5-8c8f-3b2b27a96db1 Chr5 26968702 Chr5 26971328 + - UU 2
111685de-003c-4c25-af8a-6a70caa413d5 Chr5 26968705 Chr5 26576591 - - UU 1
33cfaaa7-5c00-4ca3-9123-566f648944b6 Chr5 26968705 Chr5 26963506 - + UU 2
f6c58d6b-1b2b-4bc1-986b-e8fdc038a1fe Chr5 26968705 Chr5 26971333 - + UU 3
d06413ae-e1f7-4a44-b1c5-dd8bee87606c Chr5 26968965 Chr5 689615 + - UU 2
7e829b73-1d2a-43e4-93ae-6051f2efbb46 Chr5 26968965 Chr5 3670881 + + UU 2
5696f59d-992e-44f3-bc43-c5d3e7b3310b Chr5 26968965 Chr5 26970537 + - UU 1
47eb03d3-49f1-4a13-92ad-faffe3a2d88b Chr5 26968965 Chr5 26972686 + - UU 2
52c598be-01bf-4989-9187-d499888473bd Chr5 26968965 Chr5 26972689 + - UU 2
6a5adb50-3aae-4ba9-b30e-9b0cc8b48c66 Chr5 26968965 Chr5 26972689 + - UU 1
1f297eda-db06-4b1c-82b2-c3a14f3ed04b Chr5 26968968 Chr5 3507431 - - UU 1
f6c58d6b-1b2b-4bc1-986b-e8fdc038a1fe Chr5 26968968 Chr5 26967251 - + UU 2
5696f59d-992e-44f3-bc43-c5d3e7b3310b Chr5 26969914 Chr5 26968964 + - UU 2
f2afa522-f322-4890-888a-fef4434b1441 Chr5 26970495 Chr5 24855805 - - UU 5
52c598be-01bf-4989-9187-d499888473bd Chr5 26971329 Chr5 26973320 + - UU 3
f3c3951b-c94f-4b2f-8e13-fd980e1e39e1 Chr5 26971332 Chr5 3398919 - - UU 1
f2afa522-f322-4890-888a-fef4434b1441 Chr5 26971332 Chr5 26376061 - - UU 1
d6d5e650-6579-4e6d-a015-d9d86ab9f83b Chr5 26971332 Chr5 26968706 - + UU 3
f6c58d6b-1b2b-4bc1-986b-e8fdc038a1fe Chr5 26971332 Chr5 26968706 - + UU 1
e012f3ec-dc9d-4a51-967e-b30b1aeb6cc0 Chr5 26972690 Chr5 26974046 + - UU 2
c8617e93-3a38-4334-8bac-ff627804bf88 Chr5 26972690 Chr5 26975502 + - UU 1
d6d5e650-6579-4e6d-a015-d9d86ab9f83b Chr5 26972693 Chr5 26968969 - + UU 2
259632b1-a672-41f7-86aa-b04605b15c69 Chr5 26972693 Chr5 26971333 - + UU 1
3be4e565-7a97-4bac-bc7d-775183af421f Chr5 26973548 Chr5 25085045 + - UU 3
74e4c442-d98e-451f-bbe3-a79e91288783 Chr5 26973548 Chr5 26975502 + - UU 2
cf14d609-5521-48bf-8944-3e4ba9c1152a Chr5 26973551 Chr5 88460 - + UU 7
ce284cca-8966-4db2-9220-e2f95e767bb9 Chr5 26973551 Chr5 2077524 - - UU 2
dd5c42b4-13db-4ef4-a30b-3e9604bd0208 Chr5 26973551 Chr5 26135458 - + UU 3
941c415e-b004-426b-9478-c48a9c05029e Chr5 26973551 Chr5 26766316 - - UU 1

## pairs format v1.0.0

#columns: readID chrom1 pos1 chrom2 pos2 strand1 strand2 pair_type
#genome_assembly: unknown
#shape: whole matrix
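
Note: the `tail` output above ends with the header lines, which suggests the header was sorted together with the data. A minimal Python sketch of one way to sort only the data lines and keep the header on top is shown below. It assumes the column layout `readID chrom1 pos1 chrom2 pos2 ...` from the `#columns` line, and the function name `sort_pairs_lines` is made up for illustration. Whether this alone resolves the pairix error is not established in this thread.

```python
def sort_pairs_lines(lines):
    """Return header lines (unchanged order) followed by sorted data lines.

    Data lines are ordered by chrom1, chrom2, pos1, pos2, mirroring
    `sort -k2,2 -k4,4 -k3,3n -k5,5n`. Chromosome names are compared by
    code point, which may differ from a locale-aware `sort`.
    """
    header = [ln for ln in lines if ln.startswith("#")]
    body = [ln for ln in lines if ln.strip() and not ln.startswith("#")]

    def key(ln):
        f = ln.split()  # works for tab- or space-separated fields
        # Assumed layout: readID chrom1 pos1 chrom2 pos2 ...
        return (f[1], f[3], int(f[2]), int(f[4]))

    return header + sorted(body, key=key)


# Small made-up example (read IDs shortened for readability).
example = [
    "## pairs format v1.0.0",
    "#columns: readID chrom1 pos1 chrom2 pos2 strand1 strand2 pair_type",
    "r3 Chr5 26963505 Chr5 756533 - - UU",
    "r1 Chr5 26963502 Chr5 25595629 + + UU",
    "r2 Chr1 100 Chr5 50 + - UU",
]
for line in sort_pairs_lines(example):
    print(line)
```

The sorted lines would then be written to a new file and compressed with bgzip before running pairix.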

@SooLee
Member

SooLee commented Apr 11, 2023 via email

@molecule53
Author

Sorry, it is me again.

Here is my original .pairs file ForSoring_porec_test.concatemers.pairs.txt.

Some lines have "!" instead of "chr#". Could that be the problem?

cat ForSoring_porec_test.concatemers.pairs.txt | head -n 20

## pairs format v1.0.0

#shape: whole matrix
#genome_assembly: unknown
#chromsize: chr1 3577
#chromsize: chr2 7551
#samheader: @sq SN:chr1 LN:3577
#samheader: @sq SN:chr2 LN:7551
#samheader: @pg PN:minimap2 ID:minimap2 VN:2.24-r1122
#samheader: @pg PN:pore-c-py ID:pore-c-py-2 VN:2.0.1
#samheader: @pg ID:pairtools_parse2 PN:pairtools_parse2 CL:/home/epi2melabs/conda/bin/pairtools
#samheader: @pg ID:pairtools_restrict PN:pairtools_restrict CL:/home/epi2melabs/conda/bin/pairtools
#columns: readID chrom1 pos1 chrom2
CONCAT0 chr2 5443 chr1 3003
CONCAT0 chr1 1104 chr1 1103
CONCAT0 chr1 602 chr2 6455
CONCAT0 chr2 5530 ! 0
CONCAT0 ! 0 chr2 6538
CONCAT0 chr2 6456 chr2 5442
CONCAT1 chr1 3004 chr2 6538
CONCAT2 chr1 1104 chr1 601
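
Note: the lines that pairix reports as "cannot be parsed" in this thread all have `!` as a chromosome and `0` as a position. A minimal Python sketch for dropping such records before sorting and indexing is shown below. It assumes the fifth column is `pos2` (the `#columns` line above is cut off after `chrom2`), and the names `is_mapped_pair` and `drop_unmapped` are made up for illustration. It is not confirmed in this thread that removing these lines is enough for indexing to succeed.

```python
def is_mapped_pair(line):
    """True if both sides have a real chromosome and a position >= 1."""
    f = line.split()
    # Assumed layout: readID chrom1 pos1 chrom2 pos2 ...
    chrom1, pos1, chrom2, pos2 = f[1], int(f[2]), f[3], int(f[4])
    return chrom1 != "!" and chrom2 != "!" and pos1 >= 1 and pos2 >= 1


def drop_unmapped(lines):
    """Keep header lines and fully mapped pairs; skip blank lines."""
    kept = []
    for ln in lines:
        if not ln.strip():
            continue  # ignore blank lines
        if ln.startswith("#") or is_mapped_pair(ln):
            kept.append(ln)
    return kept


# The data lines shown in the `head` output above.
records = [
    "CONCAT0 chr2 5443 chr1 3003",
    "CONCAT0 chr1 1104 chr1 1103",
    "CONCAT0 chr1 602 chr2 6455",
    "CONCAT0 chr2 5530 ! 0",
    "CONCAT0 ! 0 chr2 6538",
    "CONCAT0 chr2 6456 chr2 5442",
    "CONCAT1 chr1 3004 chr2 6538",
    "CONCAT2 chr1 1104 chr1 601",
]
for line in drop_unmapped(records):
    print(line)
```

On the eight records above, this keeps the six lines where both sides are mapped and drops the two that contain `!`.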

Then I sort:
.../Data1$ sort -k2,2 -k4,4 -k3,3n -k5,5n ForSoring_porec_test.concatemers.pairs.txt > Sorted_porec_test.concatemers.pairs.txt

.../Data1$ cat Sorted_porec_test.concatemers.pairs.txt | head -n 20
CONCAT107 ! 0 chr1 601
CONCAT148 ! 0 chr1 601
CONCAT171 ! 0 chr1 601
CONCAT175 ! 0 chr1 601
CONCAT180 ! 0 chr1 601
CONCAT185 ! 0 chr1 601
CONCAT211 ! 0 chr1 601
CONCAT27 ! 0 chr1 601
CONCAT277 ! 0 chr1 601
CONCAT31 ! 0 chr1 601
CONCAT312 ! 0 chr1 601
CONCAT353 ! 0 chr1 601
CONCAT471 ! 0 chr1 601
CONCAT491 ! 0 chr1 601
CONCAT512 ! 0 chr1 601
CONCAT514 ! 0 chr1 601
CONCAT593 ! 0 chr1 601
CONCAT611 ! 0 chr1 601
CONCAT619 ! 0 chr1 601
CONCAT638 ! 0 chr1 601

.../Data1$ cat Sorted_porec_test.concatemers.pairs.txt | tail -n 20
CONCAT829 chr2 6765 chr2 6764
CONCAT836 chr2 6765 chr2 6764
CONCAT841 chr2 6765 chr2 6764
CONCAT846 chr2 6765 chr2 6764
CONCAT870 chr2 6765 chr2 6764
CONCAT872 chr2 6765 chr2 6764
CONCAT875 chr2 6765 chr2 6764
CONCAT883 chr2 6765 chr2 6764
CONCAT885 chr2 6765 chr2 6764
CONCAT899 chr2 6765 chr2 6764
CONCAT901 chr2 6765 chr2 6764
CONCAT915 chr2 6765 chr2 6764
CONCAT931 chr2 6765 chr2 6764
CONCAT966 chr2 6765 chr2 6764
CONCAT989 chr2 6765 chr2 6764
CONCAT997 chr2 6765 chr2 6764

## pairs format v1.0.0

#columns: readID chrom1 pos1 chrom2
#genome_assembly: unknown
#shape: whole matrix

.../Data1$ bgzip Sorted_porec_test.concatemers.pairs.txt

.../Data1$ pairix Sorted_porec_test.concatemers.pairs.txt.gz
[get_intv] the following line cannot be parsed and skipped: CONCAT107 ! 0 chr1 601
[ti_index_core] the indexes overlap or are out of bounds

No index file generated!

@SooLee
Member

SooLee commented Apr 12, 2023 via email

@molecule53
Author

Hi,
Sorry, I am not sure what you mean. I have a sorted file at this point. How do I change it to a 1-based index, and at what step? Are there any specific instructions I can follow?

@maize821

I had the same problem. You can try these options: pairix -p pairs -f corrected2_porec_test.concatemers.pairs.txt.gz
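
Note: for anyone scripting this step, the suggested command could be wrapped roughly as in the Python sketch below. The helper name `pairix_index_command` is made up for illustration. To my understanding, `-p pairs` selects the pairs preset and `-f` forces an existing index to be overwritten, but check `pairix` usage output for your installed version.

```python
import shlex


def pairix_index_command(path, preset="pairs", force=True):
    """Build the argument list for the pairix call suggested above."""
    cmd = ["pairix", "-p", preset]
    if force:
        cmd.append("-f")  # assumed meaning: overwrite an existing index
    cmd.append(path)
    return cmd


cmd = pairix_index_command("corrected2_porec_test.concatemers.pairs.txt.gz")
print(shlex.join(cmd))
# prints: pairix -p pairs -f corrected2_porec_test.concatemers.pairs.txt.gz

# To actually run it (requires pairix on PATH and a bgzip-compressed input):
# import subprocess
# subprocess.run(cmd, check=True)
```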
