Identifying extra columns in BED files #54

Open
jwokaty opened this issue Dec 15, 2023 · 4 comments
jwokaty commented Dec 15, 2023

Hi,

I'm creating an R client for api.bedbase.org at https://github.com/jwokaty/BEDbaseR. I want to import the BED files into GRanges objects; however, I noticed that the BED files have a varying number of extra columns. Is there any way for me to know from the API what these columns are?

Also, when I look at bed/example, I see

{
  "genome": {
    "alias": "hg38",
    "digest": ""
  },
  "expected_partitions": {
    "path": "output/bedstat_output/421d2128e183424fcc6a74269bae7934/GSM6856752_S1_H3K27ac_peaks_expected_partitions.pdf",
    "title": "Expected distribution over genomic partitions",
    "thumbnail_path": "output/bedstat_output/421d2128e183424fcc6a74269bae7934/GSM6856752_S1_H3K27ac_peaks_expected_partitions.png"
  },
  "gc_content": null,
  "fiveutr_frequency": 2925,
  "intron_percentage": 0.4246,
  "pipestat_modified_time": "2023-10-19T19:15:01.945492",
  "cumulative_partitions": {
    "path": "output/bedstat_output/421d2128e183424fcc6a74269bae7934/GSM6856752_S1_H3K27ac_peaks_cumulative_partitions.pdf",
    "title": "Cumulative distribution over genomic partitions",
    "thumbnail_path": "output/bedstat_output/421d2128e183424fcc6a74269bae7934/GSM6856752_S1_H3K27ac_peaks_cumulative_partitions.png"
  },
...

Are files such as output/bedstat_output/421d2128e183424fcc6a74269bae7934/GSM6856752_S1_H3K27ac_peaks_expected_partitions.pdf available somewhere? I tried https://api.bedbase.org/output/bedstat_output/421d2128e183424fcc6a74269bae7934/GSM6856752_S1_H3K27ac_peaks_expected_partitions.pdf but I get {"detail":"Not Found"}. This is mostly curiosity at this point: I am still trying to understand the API, and my main interest is importing the files into a GRanges object.

Thanks for your help.

nsheff (Member) commented Dec 15, 2023

> BED files have a varying number of extra columns. Is there any way for me to know from the API what these columns are?

No, the API doesn't know that. Is this important? Do you suggest we change something here? Why are you interested in knowing the columns?

In reality, we may not even know what the columns are, depending on where the BED file came from, and right now we're not tracking that. We could work on that, though.
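
Since the API does not report the columns, a client has to inspect the file itself. Here is a minimal Python sketch of that idea; it is not part of BEDbase or BEDbaseR. It assumes tab-separated BED lines and treats everything after the three required fields (chrom, chromStart, chromEnd) as "extra". The function names are hypothetical, and the sketch cannot tell what the extra columns mean, only how many there are.

```python
from typing import Iterable, List, Tuple


def is_data_line(line: str) -> bool:
    """Return True for lines that hold an interval rather than a header.

    Assumption: header lines start with '#', 'track', or 'browser'.
    A chromosome whose name starts with 'track' would be misclassified.
    """
    stripped = line.strip()
    return bool(stripped) and not stripped.startswith(("#", "track", "browser"))


def split_bed_line(line: str) -> Tuple[Tuple[str, int, int], List[str]]:
    """Split one tab-separated BED line into required fields and extras."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 3:
        raise ValueError(f"expected at least 3 columns, got {len(fields)}")
    required = (fields[0], int(fields[1]), int(fields[2]))
    return required, fields[3:]


def count_extra_columns(lines: Iterable[str]) -> int:
    """Count the columns beyond the required three on the first data line."""
    for line in lines:
        if is_data_line(line):
            _, extra = split_bed_line(line)
            return len(extra)
    raise ValueError("no data lines found")


# Made-up example lines, not taken from a BEDbase file.
sample = [
    "track name=peaks\n",
    "chr1\t100\t200\tpeak1\t960\t.\t5.2\n",
]
n_extra = count_extra_columns(sample)
```

A client could then keep the required fields as the genomic ranges and attach the remaining values as unnamed metadata columns until the API exposes column information.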

> Are files such as output/bedstat_output/421d2128e183424fcc6a74269bae7934/GSM6856752_S1_H3K27ac_peaks_expected_partitions.pdf available somewhere?

Yes, the files are served on a separate S3-compatible server. To find the URLs for them, use the DRS endpoints. That's described here: https://api.bedbase.org/docs/guide

For this specific example: in the bed/example response you'll see the identifier for that BED record: "record_identifier": "421d2128e183424fcc6a74269bae7934"

You'll also see that it has an object called expected_partitions. Combine the two to make an object identifier: bed.421d2128e183424fcc6a74269bae7934.expected_partitions

You can pass this to the DRS endpoints to get the object metadata:

https://api.bedbase.org/objects/bed.421d2128e183424fcc6a74269bae7934.expected_partitions
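
As a sketch of the identifier construction described above, the Python snippet below joins the three parts with dots and appends the result to the /objects/ path. The function names and the `DRS_BASE_URL` constant are hypothetical, chosen for illustration, and the "bed" prefix is taken from this one example rather than from a documented rule.

```python
# Assumption: the base path is the one used in the example URL above.
DRS_BASE_URL = "https://api.bedbase.org/objects"


def make_object_id(record_type: str, record_identifier: str, result_name: str) -> str:
    """Join the three parts with dots, e.g. bed.<identifier>.<result name>."""
    return f"{record_type}.{record_identifier}.{result_name}"


def make_drs_url(object_id: str, base_url: str = DRS_BASE_URL) -> str:
    """Build the metadata URL for an object identifier."""
    return f"{base_url.rstrip('/')}/{object_id}"


object_id = make_object_id(
    "bed", "421d2128e183424fcc6a74269bae7934", "expected_partitions"
)
drs_url = make_drs_url(object_id)
```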

This has the URLs where you can get the object itself:

{
  "id": "bed.421d2128e183424fcc6a74269bae7934.expected_partitions",
  "name": null,
  "self_uri": "drs://api.bedbase.org/bed.421d2128e183424fcc6a74269bae7934.expected_partitions",
  "size": "unknown",
  "created_time": "2023-10-17T18:53:05.653831",
  "updated_time": "2023-10-19T19:15:01.945492",
  "checksums": "bed.421d2128e183424fcc6a74269bae7934.expected_partitions",
  "access_methods": [
    {
      "type": "http",
      "access_url": {
        "url": "https://data2.bedbase.org/output/bedstat_output/421d2128e183424fcc6a74269bae7934/GSM6856752_S1_H3K27ac_peaks_expected_partitions.pdf",
        "headers": null
      },
      "access_id": "http",
      "region": null
    },
    {
      "type": "s3",
      "access_url": {
        "url": "s3://data2.bedbase.org/output/bedstat_output/421d2128e183424fcc6a74269bae7934/GSM6856752_S1_H3K27ac_peaks_expected_partitions.pdf",
        "headers": null
      },
      "access_id": "s3",
      "region": null
    },
    {
      "type": "local",
      "access_url": {
        "url": "/static/output/bedstat_output/421d2128e183424fcc6a74269bae7934/GSM6856752_S1_H3K27ac_peaks_expected_partitions.pdf",
        "headers": null
      },
      "access_id": "local",
      "region": null
    }
  ],
  "description": null
}
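
To pull a download URL out of a response like this, a client can look through access_methods for the entry with the type it wants. The Python sketch below does that on an abbreviated copy of the response above, so it needs no network access. `pick_access_url` is a hypothetical helper, not a BEDbase function.

```python
from typing import Optional

# Abbreviated copy of the response shown above (JSON null becomes None).
drs_object = {
    "id": "bed.421d2128e183424fcc6a74269bae7934.expected_partitions",
    "access_methods": [
        {
            "type": "http",
            "access_url": {
                "url": "https://data2.bedbase.org/output/bedstat_output/421d2128e183424fcc6a74269bae7934/GSM6856752_S1_H3K27ac_peaks_expected_partitions.pdf",
                "headers": None,
            },
            "access_id": "http",
        },
        {
            "type": "s3",
            "access_url": {
                "url": "s3://data2.bedbase.org/output/bedstat_output/421d2128e183424fcc6a74269bae7934/GSM6856752_S1_H3K27ac_peaks_expected_partitions.pdf",
                "headers": None,
            },
            "access_id": "s3",
        },
    ],
}


def pick_access_url(obj: dict, access_type: str = "http") -> Optional[str]:
    """Return the URL of the first access method of the given type, if any."""
    for method in obj.get("access_methods", []):
        if method.get("type") == access_type:
            # access_url could be missing or null, so fall back to an empty dict.
            return (method.get("access_url") or {}).get("url")
    return None


http_url = pick_access_url(drs_object, "http")
s3_url = pick_access_url(drs_object, "s3")
```

In a real client the dictionary would come from parsing the JSON returned by the /objects/ endpoint instead of being written inline.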

You could also get these PDFs from the links on the splash page: https://dev.bedbase.org/bed/421d2128e183424fcc6a74269bae7934 (these point to the same files).

nsheff added the "question" label (Further information is requested) on Dec 15, 2023

jwokaty (Author) commented Dec 15, 2023

Thanks for the explanation. I am still trying to understand the API as I develop the client. If the column information was available, I wanted to provide that to the user. I am not proposing any changes at this point.

jwokaty (Author) commented Apr 8, 2024

I wanted to follow up on identifying the types of BED files. I see that there's been some development on api-dev.bedbase.org. Should I be developing my client against your development version?

khoroshevskyi (Member) commented

Hi, yes. I rewrote the BEDbase API and divided the endpoints into statistics, classification, files, plots, and raw metadata. All of these fields will be developed further. All endpoints now have schemas, so the API should be easier to understand.
Additionally, I would appreciate your feedback on the new API: what do you think should be added or changed?
