Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Restriction with Arima kit #391

Open
jeff-godwin opened this issue Apr 19, 2024 · 10 comments
Open

Restriction with Arima kit #391

jeff-godwin opened this issue Apr 19, 2024 · 10 comments

Comments

@jeff-godwin
Copy link

Dear team,

I would like to clarify if the --renz parameter in tadbit map supports the restriction site for the
ARIMA kit (https://arimagenomics.com/products/genome-wide-hic/) - ^GATC,G^ANTC.
If so, what is the string I need to use for the --renz parameter?

Many thanks,
Jeff

@fransua
Copy link
Member

fransua commented Apr 19, 2024

Hi Jeff,
yes, this would be the enzymes DpnII (or MboI) and HinfI. So you can just input --renz MboI Hinfl.
Let us know how this works.

cheers

@jeff-godwin
Copy link
Author

Hi @fransua,

Thank you for the reply.

I ran tadbit map with --renz MboI Hinfl and --renz MboI DpnII HinfI and it produced the following error -

Writing log to /home/jgodwin/Data/Hi-C/RK_data/2_Tadbit_Out/process.log
Writing versions of TADbit and dependencies
Generating Hi-C QC plot
Traceback (most recent call last):
File "/home/jgodwin/miniconda3/envs/tadbit_add/bin/tadbit", line 170, in
main(sys.argv)
File "/home/jgodwin/miniconda3/envs/tadbit_add/bin/tadbit", line 167, in main
args.func(args)
File "/home/jgodwin/miniconda3/envs/tadbit_add/lib/python3.7/site-packages/pytadbit/tools/tadbit_map.py", line 69, in run
savefig=fig_path)
File "/home/jgodwin/miniconda3/envs/tadbit_add/lib/python3.7/site-packages/pytadbit/utils/fastq_utils.py", line 314, in quality_plot
liges[k][max_seq_len // 2])
IndexError: list index out of range

The mapping runs fine with just --renz DpnII but the addition of HinfI produces the above error.
How do I troubleshoot this?

Thanks

@jeff-godwin
Copy link
Author

It appears the "HinfI" enzyme (STRING) is the issue. Any other combinations of enzymes proceeds normally. Running with just --renz HinfI produces the same error as above
Curiously it does appear in the list of available enzymes yet somehow is not identified by the fastq_utils.

@fransua
Copy link
Member

fransua commented Apr 19, 2024

this is indeed unexpected...
I had a look to it, with the current version of TADbit in this repo.
Seems to work fine with me (in the mapping stage at least).
image

but I am perhaps missing something, can you share your command line?

@jeff-godwin
Copy link
Author

My TADbit version is v1.1

I'm attaching my bash script for your reference -

fastq_dir="/home/jgodwin/Data/Hi-C/RK_data/1_Raw_data"
workdir="/home/jgodwin/Data/Hi-C/RK_data/2_Tadbit_Out"
yeast_ref="/home/jgodwin/Data/Hi-C/Yeast_ref"

tadbit map -w ${workdir} --fastq ${fastq_dir}/WT_L1_1.dsrc --read 1 --index ${yeast_ref}/sacCer3.gem --renz DpnII HinfI --genome ${yeast_ref}/sacCer3.fa -C 3
tadbit map -w ${workdir} --fastq ${fastq_dir}/WT_L1_2.dsrc --read 2 --index ${yeast_ref}/sacCer3.gem --renz DpnII HinfI --genome ${yeast_ref}/sacCer3.fa -C 3

I have tried with --renz MboI HinfI as well.

@fransua
Copy link
Member

fransua commented Apr 19, 2024

ok, I have tried with Saccharomyces reference genome and the QC step is generated properly:
image
(uggly because using mock data)

but it's not the genome either... can you try to reinstall TADbit? Perhaps it is something that has been fixed already.

(Note: MboI and DpnII are the same internally, it should not matter if you use one or the other... but perhaps you could try MboI and HindIII)

@jeff-godwin
Copy link
Author

jeff-godwin commented Apr 19, 2024

Unfortunately the problem exists after re-installation too.

Running with MboI and HindIII proceeds without a problem. Attaching the log here -

Writing log to /home/jgodwin/Data/Hi-C/RK_data/2_Tadbit_Out/process.log
Writing versions of TADbit and dependencies
Generating Hi-C QC plot

  • Dangling-ends (sensu-stricto): 10.191%
  • Dangling-ends (sensu-stricto): 0.138%
  • Ligation sites: 3.069%
  • Ligation sites: 0.011%
  • Ligation sites: 0.006%
  • Ligation sites: 0.000%
    mapping /home/jgodwin/Data/Hi-C/RK_data/1_Raw_data/WT_L1_1.dsrc read 1 to /home/jgodwin/Data/Hi-C/RK_data/2_Tadbit_Out
    Preparing FASTQ file
  • conversion to MAP format
    Mapping reads in window 1-end_5f7b312935...
    TO GEM 3 /home/jgodwin/Data/Hi-C/RK_data/2_Tadbit_Out_tmp_r1_5f7b312935/WT_L1_1_a90dcp1y
    /home/jgodwin/miniconda3/envs/tadbit-3d/bin/gem-mapper -I /home/jgodwin/Data/Hi-C/Yeast_ref/sacCer3.gem -t 3 -F SAM -o /home/jgodwin/Data/Hi-C/RK_data/2_Tadbit_Out_tmp_r1_5f7b312935/WT_L1_1_a90dcp1y_full_1-end_5f7b312935.map -i /home/jgodwin/Data/Hi-C/RK_data/2_Tadbit_Out_tmp_r1_5f7b312935/WT_L1_1_a90dcp1y
    Parsing result...
    x removing GEM input /home/jgodwin/Data/Hi-C/RK_data/2_Tadbit_Out_tmp_r1_5f7b312935/WT_L1_1_a90dcp1y
    x removing map /home/jgodwin/Data/Hi-C/RK_data/2_Tadbit_Out_tmp_r1_5f7b312935/WT_L1_1_a90dcp1y_full_1-end_5f7b312935.map
  • splitting into restriction enzyme (RE) fragments using ligation sites
  • ligation sites are replaced by RE sites to match the reference genome
    • enzymes: MboI & MboI, ligation site: GATCGATC, RE site: GATC & GATC
    • enzymes: MboI & HindIII, ligation site: GATCAGCTT, RE site: GATC & AAGCTT
    • enzymes: HindIII & MboI, ligation site: AAGCTGATC, RE site: AAGCTT & GATC
    • enzymes: HindIII & HindIII, ligation site: AAGCTAGCTT, RE site: AAGCTT & AAGCTT
      Preparing MAP file
      x removing pre-GEM input /home/jgodwin/Data/Hi-C/RK_data/2_Tadbit_Out_tmp_r1_5f7b312935/WT_L1_1_a90dcp1y_filt_1-end_5f7b312935.map
      Mapping fragments of remaining reads...
      TO GEM 3 /home/jgodwin/Data/Hi-C/RK_data/2_Tadbit_Out_tmp_r1_5f7b312935/WT_L1_1_a56ec3qt
      /home/jgodwin/miniconda3/envs/tadbit-3d/bin/gem-mapper -I /home/jgodwin/Data/Hi-C/Yeast_ref/sacCer3.gem -t 3 -F SAM -o /home/jgodwin/Data/Hi-C/RK_data/2_Tadbit_Out_tmp_r1_5f7b312935/WT_L1_1_a56ec3qt_frag_1-end_5f7b312935.map -i /home/jgodwin/Data/Hi-C/RK_data/2_Tadbit_Out_tmp_r1_5f7b312935/WT_L1_1_a56ec3qt
      Parsing result...
      x removing GEM input /home/jgodwin/Data/Hi-C/RK_data/2_Tadbit_Out_tmp_r1_5f7b312935/WT_L1_1_a56ec3qt
      x removing failed to map /home/jgodwin/Data/Hi-C/RK_data/2_Tadbit_Out_tmp_r1_5f7b312935/WT_L1_1_a90dcp1y_fail_5f7b312935.map
      x removing tmp mapped /home/jgodwin/Data/Hi-C/RK_data/2_Tadbit_Out_tmp_r1_5f7b312935/WT_L1_1_a56ec3qt_frag_1-end_5f7b312935.map
      ,---------------.
      | MAPPED_INPUTs |
      ,----.--------.------------.------.------.------.--------------.-----------------------------.---------------------------------------------------------------------------------.----------.-----------------.---------.
      | Id | PATHid | Entries | Trim | Frag | Read | Enzyme | Dangling_Ends | Ligation_Sites | WRKDIRid | MAPPED_OUTPUTid | INDEXid |
      |----+--------+------------+------+------+------+--------------+-----------------------------+---------------------------------------------------------------------------------+----------+-----------------+---------|
      | 1 | 2 | 23,709,683 | None | full | 1 | MboI-HindIII | MboI:10.191% HindIII:0.138% | MboI-MboI:3.069% MboI-HindIII:0.011% HindIII-MboI:0.006% HindIII-HindIII:0.000% | 1 | 5 | 3 |
      | 2 | 2 | 6,487,346 | None | frag | 1 | MboI-HindIII | MboI:10.191% HindIII:0.138% | MboI-MboI:3.069% MboI-HindIII:0.011% HindIII-MboI:0.006% HindIII-HindIII:0.000% | 1 | 6 | 3 |
      '----^--------^------------^------^------^------^--------------^-----------------------------^---------------------------------------------------------------------------------^----------^-----------------^---------'
      ,-------.
      | PATHs |
      ,----.-------.------------------------------------------------.--------------.
      | Id | JOBid | Path | Type |
      |----+-------+------------------------------------------------+--------------|
      | 1 | 1 | /home/jgodwin/Data/Hi-C/RK_data/2_Tadbit_Out | WORKDIR |
      | 2 | 1 | ../1_Raw_data/WT_L1_1.dsrc | MAPPED_FASTQ |
      | 3 | 1 | ../../Yeast_ref/sacCer3.gem | INDEX |
      | 4 | 1 | WT_L1_1.dsrc_MboI-HindIII_5f7b312935.png | FIGURE |
      | 5 | 1 | 01_mapped_r1/WT_L1_1_full_1-end_5f7b312935.map | SAM/MAP |
      | 6 | 1 | 01_mapped_r1/WT_L1_1_frag_1-end_5f7b312935.map | SAM/MAP |
      '----^-------^------------------------------------------------^--------------'
      ,------.
      | JOBs |
      ,----.------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------.---------------------.---------------------.------.----------------.
      | Id | Parameters | Launch_time | Finish_time | Type | Parameters_md5 |
      |----+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------+---------------------+------+----------------|
      | 1 | skip_mapping:0 genome:[/home/jgodwin/Data/Hi-C/Yeast_ref/sacCer3.fa] read:1 chr_name:[] noX:0 fast_fragment:0 cpus:3 mapper:gem mapper_binary:/home/jgodwin/miniconda3/envs/tadbit-3d/bin/gem-mapper mapper_param:{} gem_version:3 | 19/04/2024 17:50:30 | 19/04/2024 18:00:13 | Map | 5f7b312935 |
      '----^------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------^---------------------^---------------------^------^----------------'
      ,---------------.
      | MAPPED_INPUTs |
      ,----.--------.------------.------.------.------.--------------.-----------------------------.---------------------------------------------------------------------------------.----------.-----------------.---------.
      | Id | PATHid | Entries | Trim | Frag | Read | Enzyme | Dangling_Ends | Ligation_Sites | WRKDIRid | MAPPED_OUTPUTid | INDEXid |
      |----+--------+------------+------+------+------+--------------+-----------------------------+---------------------------------------------------------------------------------+----------+-----------------+---------|
      | 1 | 2 | 23,709,683 | None | full | 1 | MboI-HindIII | MboI:10.191% HindIII:0.138% | MboI-MboI:3.069% MboI-HindIII:0.011% HindIII-MboI:0.006% HindIII-HindIII:0.000% | 1 | 5 | 3 |
      | 2 | 2 | 6,487,346 | None | frag | 1 | MboI-HindIII | MboI:10.191% HindIII:0.138% | MboI-MboI:3.069% MboI-HindIII:0.011% HindIII-MboI:0.006% HindIII-HindIII:0.000% | 1 | 6 | 3 |
      '----^--------^------------^------^------^------^--------------^-----------------------------^---------------------------------------------------------------------------------^----------^-----------------^---------'
      ,-------.
      | PATHs |
      ,----.-------.------------------------------------------------.--------------.
      | Id | JOBid | Path | Type |
      |----+-------+------------------------------------------------+--------------|
      | 1 | 1 | /home/jgodwin/Data/Hi-C/RK_data/2_Tadbit_Out | WORKDIR |
      | 2 | 1 | ../1_Raw_data/WT_L1_1.dsrc | MAPPED_FASTQ |
      | 3 | 1 | ../../Yeast_ref/sacCer3.gem | INDEX |
      | 4 | 1 | WT_L1_1.dsrc_MboI-HindIII_5f7b312935.png | FIGURE |
      | 5 | 1 | 01_mapped_r1/WT_L1_1_full_1-end_5f7b312935.map | SAM/MAP |
      | 6 | 1 | 01_mapped_r1/WT_L1_1_frag_1-end_5f7b312935.map | SAM/MAP |
      '----^-------^------------------------------------------------^--------------'
      ,------.
      | JOBs |
      ,----.------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------.---------------------.---------------------.------.----------------.
      | Id | Parameters | Launch_time | Finish_time | Type | Parameters_md5 |
      |----+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------+---------------------+------+----------------|
      | 1 | skip_mapping:0 genome:[/home/jgodwin/Data/Hi-C/Yeast_ref/sacCer3.fa] read:1 chr_name:[] noX:0 fast_fragment:0 cpus:3 mapper:gem mapper_binary:/home/jgodwin/miniconda3/envs/tadbit-3d/bin/gem-mapper mapper_param:{} gem_version:3 | 19/04/2024 17:50:30 | 19/04/2024 18:00:13 | Map | 5f7b312935 |
      '----^------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------^---------------------^---------------------^------^----------------'
      cleaning temporary files

@jeff-godwin
Copy link
Author

It appears it's a problem with the reads. The mapping proceeds with the demo data used for the 3DARORC tutorial (SRR5077821) without a problem.
I'm rechecking the reads although I do believe it was prepared with the ARIMA double digestion kit.

@fransua
Copy link
Member

fransua commented Apr 19, 2024

Hi,
yes, it works for me too with other dataset... I suspect it is either the reference genome, can you try with this index: https://drive.google.com/file/d/1vBqQ3CjlEDnVo9p5cIHt81aB5blFNSDD
(guessing you are using gem v3).

or an heterogeneous read lengths, can you check that all your reads are the same size?

@jeff-godwin
Copy link
Author

Hi Francois,

The same issue shows up with the new index as well. My read lengths are the same (35bp - wonder if this is tripping something). I tried mapping with bowtie and hisat as well.
The reads are from publicly available data (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM5918358)
I was able to process them through the nextflow HiC pipeline (https://nf-co.re/hic/2.1.0/docs/usage)
with the --digestion 'arima' parameter. I will let you know if anything else comes up.

Thanks again.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants