Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue with cooltools.api.snipping.pileup() compatibility with bioframe/core/construction.py", line 150 #462

Open
haleybianchi opened this issue Oct 6, 2023 · 5 comments
Labels

Comments

@haleybianchi
Copy link

Hi There,

I am attempting to call cooltools.api.snipping.pileup(clr,features_df=peak_df , flank=300_000), where :

peak_df = pd.read_csv(f"ctcf.enhancer.promoters/{peak}.bed", sep="\t", names=['chrom', "start", "end"]), as per the cooltools feature_df requirements. I am not passing in a view_df, but when the API attempts to construct one, I get an error :
File "conda/envs/matrix/lib/python3.8/site-packages/bioframe/core/construction.py", line 229, in make_viewframe
view_df = from_any(regions, name_col=view_name_col, cols=cols)
File "conda/envs/matrix/lib/python3.8/site-packages/bioframe/core/construction.py", line 150, in from_any
raise ValueError(f"Unknown input format: {type(regions)}")
ValueError: Unknown input format: <class 'NoneType'>

@gfudenberg
Copy link
Member

not quite clear what is going on here-- can you:

  1. print peak_df
  2. print clr.info
  3. see if the snipping tutorial notebook works with your environment

thanks!

@haleybianchi
Copy link
Author

haleybianchi commented Oct 11, 2023

Hi, I was able to get the pileup to work after making a view_df with cooltools on genomic elements that are not CTCF, but when I do use CTCF, I always get the error

IndexError                                Traceback (most recent call last)
/tmp/ipykernel_3875287/3943304134.py in <module>
      2 dnase_overlap= overlap_dnase(ctcf)
      3 sides = 300_000
----> 4 stack= cooltools.pileup(clr,ctcf,view_df=view, flank=sides) #Expected output in stack should follow [xPos, yPox, nSites], while pileup() returns [nSites, xPos, yPox].
      5 # Aggregate. Note that some pixels might be converted to NaNs after IC, thus we aggregate by nanmean:
      6 mtx = np.nanmean(stack, axis=2)

/data/bianchiah/mymamba/envs/mambamatrix/lib/python3.7/site-packages/cooltools/api/snipping.py in pileup(clr, features_df, view_df, expected_df, expected_value_col, flank, min_diag, clr_weight_name, nproc)
    819     else:
    820         mymap = map
--> 821     stack = pileup_legacy(features_df, snipper.select, snipper.snip, map=mymap)
    822     if feature_type == "bed":
    823         stack = np.nansum([stack, np.transpose(stack, axes=(1, 0, 2))], axis=0)

/data/bianchiah/mymamba/envs/mambamatrix/lib/python3.7/site-packages/cooltools/api/snipping.py in pileup_legacy(features, data_select, data_snip, map)
    243             partial(_pileup, data_select, data_snip),
    244             # Note that unannotated regions will form a separate group
--> 245             features.groupby("region", sort=False),
    246         )
    247     )

/data/bianchiah/mymamba/envs/mambamatrix/lib/python3.7/site-packages/cooltools/api/snipping.py in _pileup(data_select, data_snip, arg)
    199 
    200     data = data_select(region1, region2)
--> 201     stack = list(map(partial(data_snip, data, region1, region2), zip(s1, e1, s2, e2)))
    202 
    203     return np.dstack(stack), feature_group["_rank"].values

/data/bianchiah/mymamba/envs/mambamatrix/lib/python3.7/site-packages/cooltools/api/snipping.py in snip(self, matrix, region1, region2, tup)
    413         if self.min_diag is not None:
    414             D = self.diag_indicators[region1][lo1:hi1, lo2:hi2] < self.min_diag
--> 415             snippet[D] = np.nan
    416         return snippet
    417 

IndexError: boolean index did not match indexed array along dimension 0; dimension is 61 but corresponding boolean dimension is 2

I tried clustering the CTCF sites by the resolution, but the error persisted

@gfudenberg
Copy link
Member

views are generally used for specifying things like chromosomes or chromosome arms-- see cell [7] of the tutorial: https://cooltools.readthedocs.io/en/latest/notebooks/pileup_CTCF.html

its not clear what your ctcf data frame is: does it have chrom, start, end, as in the tutorial?

@haleybianchi
Copy link
Author

haleybianchi commented Oct 24, 2023

Yes, it does.
for the view:
view= cooltools.lib.common.make_cooler_view(clr, ucsc_names=False) #Generate a full chromosome viewframe using cooler’s chromsizes
for the clr.info:
{'bin-size': 5000, 'bin-type': 'fixed', 'creation-date': '2020-08-16T14:48:11.708578', 'format': 'HDF5::Cooler', 'format-url': 'https://github.com/mirnylab/cooler', 'format-version': 3, 'generated-by': 'cooler-0.8.5', 'genome-assembly': 'unknown', 'metadata': {}, 'nbins': 545118, 'nchroms': 22, 'nnz': 156203230, 'storage-mode': 'symmetric-upper', 'sum': 572517229}
I tried using another bedfile I generated from deeptools multibigwig summary using 5kb bins, and I am getting a similar error:
`---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
c:\Users\BIANCH~1\AppData\Local\Temp\scp14986\vf\users\SenLab\shared\Haley\EPinteractions\scripts\transcription.length.ipynb Cell 33 line 1
13 top[["chrom", "start", "end"]]
15 sides = 300_000
---> 16 stack= cooltools.pileup(clr=clr,features_df=top,view_df=view, flank=sides)
17 # make_matrix(top, "Top 5% Regions with Bidirectional Nascent RNA")
18 # make_matrix(bottom, "Bottom 5% Regions Bidirectional Nascent RNA")
19 # pos_only = binned_rna[binned_rna.strand_neg == 0]
20 # make_matrix(pos_only, "Regions with only Positive Strand Nascent RNA")

File /data/bianchiah/mymamba/envs/mambamatrix2/lib/python3.8/site-packages/cooltools/api/snipping.py:986, in pileup(clr, features_df, view_df, expected_df, expected_value_col, flank, min_diag, clr_weight_name, nproc)
984 else:
985 mymap = map
--> 986 stack = _pileup(features_df, snipper.select, snipper.snip, map=mymap)
987 if feature_type == "bed":
988 stack = np.nansum([stack, np.transpose(stack, axes=(1, 0, 2))], axis=0)

File /data/bianchiah/mymamba/envs/mambamatrix2/lib/python3.8/site-packages/cooltools/api/snipping.py:275, in _pileup(features, data_select, data_snip, map)
271 features["_rank"] = range(len(features))
273 # cumul_stack = []
274 # orig_rank = []
--> 275 cumul_stack, orig_rank = zip(
276 *map(
277 partial(_extract_stack, data_select, data_snip),
...
463 D = self.diag_indicators[region1][lo1:hi1, lo2:hi2] < self.min_diag
--> 464 snippet[D] = np.nan
465 return snippet

IndexError: boolean index did not match indexed array along dimension 0; dimension is 121 but corresponding boolean dimension is 4
Output is truncated. View as a scrollable element or open in a text editor. Adjust cell output settings...`

@gfudenberg
Copy link
Member

I wonder if some of your snips are going out of bounds-- perhaps you can check if all of your sites are greater than the flank distance from chromosome starts/ends?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants