Identifying extra columns in BED files #54

jwokaty · 2023-12-15T19:29:55Z

Hi,

I'm creating an R client for api.bedbase.org at https://github.com/jwokaty/BEDbaseR. I want to import the BED files into GRanges objects; however, I noticed that the BED files have a varying number of extra columns. Is there any way for me to know from the API what these columns are?

Also, when I look at bed/example, I see

{
  "genome": {
    "alias": "hg38",
    "digest": ""
  },
  "expected_partitions": {
    "path": "output/bedstat_output/421d2128e183424fcc6a74269bae7934/GSM6856752_S1_H3K27ac_peaks_expected_partitions.pdf",
    "title": "Expected distribution over genomic partitions",
    "thumbnail_path": "output/bedstat_output/421d2128e183424fcc6a74269bae7934/GSM6856752_S1_H3K27ac_peaks_expected_partitions.png"
  },
  "gc_content": null,
  "fiveutr_frequency": 2925,
  "intron_percentage": 0.4246,
  "pipestat_modified_time": "2023-10-19T19:15:01.945492",
  "cumulative_partitions": {
    "path": "output/bedstat_output/421d2128e183424fcc6a74269bae7934/GSM6856752_S1_H3K27ac_peaks_cumulative_partitions.pdf",
    "title": "Cumulative distribution over genomic partitions",
    "thumbnail_path": "output/bedstat_output/421d2128e183424fcc6a74269bae7934/GSM6856752_S1_H3K27ac_peaks_cumulative_partitions.png"
  },
...

Are files such as output/bedstat_output/421d2128e183424fcc6a74269bae7934/GSM6856752_S1_H3K27ac_peaks_expected_partitions.pdf available somewhere? I tried https://api.bedbase.org/output/bedstat_output/421d2128e183424fcc6a74269bae7934/GSM6856752_S1_H3K27ac_peaks_expected_partitions.pdf but get {"detail":"Not Found"}. This is more of a curiosity at this point as I am mostly interested in importing into a GRanges object as I am still trying to understand the API.

Thanks for your help.


nsheff · 2023-12-15T19:54:37Z

BED files have a varying number of extra columns. Is there any way for me to know from the API what these columns are?

No, the API doesn't know that. Is this important? Do you suggest we change something here? Why are you interested in knowing the columns?

In reality, I suppose we may not even know what the columns are, depending on where the BED file came from... but as of right now we're not tracking that. We could work on that, though.
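Since the API does not currently report what the extra columns are, a client can only count them. The following is a minimal Python sketch of that idea, not part of any BEDbase client: it assumes tab-delimited input, treats only the first three BED columns (chrom, start, end) as known, and gives the rest generic names. The function name `read_bed_rows`, the `col4`, `col5`, ... naming, and the two sample lines are all made up for illustration.

```python
import io

def read_bed_rows(handle):
    """Parse BED lines into dicts, keeping unknown extra columns under generic names."""
    rows = []
    for line in handle:
        line = line.rstrip("\r\n")
        # Skip blank lines and the optional comment/header lines of the BED format.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        if len(fields) < 3:
            raise ValueError(f"expected at least 3 columns, got {len(fields)}")
        row = {"chrom": fields[0], "start": int(fields[1]), "end": int(fields[2])}
        # The meaning of the remaining columns is unknown here, so keep them
        # as strings under placeholder names (col4, col5, ...).
        for position, value in enumerate(fields[3:], start=4):
            row[f"col{position}"] = value
        rows.append(row)
    return rows

# Made-up sample data with two extra columns of unknown meaning.
sample = "chr1\t100\t200\tpeak1\t5.2\nchr2\t300\t450\tpeak2\t7.9\n"
rows = read_bed_rows(io.StringIO(sample))
extra_count = len(rows[0]) - 3
print(extra_count, rows[0])
```

A client could report the number of extra columns to the user and leave their interpretation open until the API tracks it.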

Are files such as output/bedstat_output/421d2128e183424fcc6a74269bae7934/GSM6856752_S1_H3K27ac_peaks_expected_partitions.pdf available somewhere?

Yes, the files are served on a separate S3-compatible server. To find the URLs for them, use the DRS endpoints. That's described here: https://api.bedbase.org/docs/guide

Here is how it works for this specific example. In the bed/example output you'll see the identifier for that BED record: "record_identifier": "421d2128e183424fcc6a74269bae7934"

You'll also see that the record has an object called expected_partitions. Combine the two, prefixed with "bed", to make an object identifier: bed.421d2128e183424fcc6a74269bae7934.expected_partitions

You can pass this to the DRS endpoints to get the object metadata:

https://api.bedbase.org/objects/bed.421d2128e183424fcc6a74269bae7934.expected_partitions
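The steps above can be sketched in Python as follows. This is only an illustration: `drs_object_id` and `drs_metadata_url` are hypothetical helper names, and the dot-joined identifier pattern and the `/objects/` path are inferred from the single example in this thread rather than from a formal specification.

```python
API_BASE = "https://api.bedbase.org"  # base URL as used in this thread

def drs_object_id(record_identifier: str, object_name: str, prefix: str = "bed") -> str:
    """Join the prefix, record identifier, and object name with dots."""
    return f"{prefix}.{record_identifier}.{object_name}"

def drs_metadata_url(object_id: str, api_base: str = API_BASE) -> str:
    """Build the URL of the DRS metadata endpoint for an object identifier."""
    return f"{api_base.rstrip('/')}/objects/{object_id}"

object_id = drs_object_id("421d2128e183424fcc6a74269bae7934", "expected_partitions")
metadata_url = drs_metadata_url(object_id)
print(metadata_url)
```

The printed URL can then be requested with any HTTP client to retrieve the object metadata.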

This has the URLs where you can get the object itself:

{
  "id": "bed.421d2128e183424fcc6a74269bae7934.expected_partitions",
  "name": null,
  "self_uri": "drs://api.bedbase.org/bed.421d2128e183424fcc6a74269bae7934.expected_partitions",
  "size": "unknown",
  "created_time": "2023-10-17T18:53:05.653831",
  "updated_time": "2023-10-19T19:15:01.945492",
  "checksums": "bed.421d2128e183424fcc6a74269bae7934.expected_partitions",
  "access_methods": [
    {
      "type": "http",
      "access_url": {
        "url": "https://data2.bedbase.org/output/bedstat_output/421d2128e183424fcc6a74269bae7934/GSM6856752_S1_H3K27ac_peaks_expected_partitions.pdf",
        "headers": null
      },
      "access_id": "http",
      "region": null
    },
    {
      "type": "s3",
      "access_url": {
        "url": "s3://data2.bedbase.org/output/bedstat_output/421d2128e183424fcc6a74269bae7934/GSM6856752_S1_H3K27ac_peaks_expected_partitions.pdf",
        "headers": null
      },
      "access_id": "s3",
      "region": null
    },
    {
      "type": "local",
      "access_url": {
        "url": "/static/output/bedstat_output/421d2128e183424fcc6a74269bae7934/GSM6856752_S1_H3K27ac_peaks_expected_partitions.pdf",
        "headers": null
      },
      "access_id": "local",
      "region": null
    }
  ],
  "description": null
}
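To pull a download URL out of a response like the one above, a client can scan `access_methods` for the wanted type. Below is a minimal Python sketch working on an already-parsed response; `pick_access_url` is a hypothetical helper name, and the `example` dict is a trimmed copy of the JSON shown above.

```python
from typing import Optional

def pick_access_url(drs_object: dict, access_type: str = "http") -> Optional[str]:
    """Return the URL of the first access method of the given type, or None."""
    for method in drs_object.get("access_methods") or []:
        if method.get("type") == access_type:
            # "access_url" is a nested object holding the actual "url" field.
            access_url = method.get("access_url") or {}
            return access_url.get("url")
    return None

# Trimmed copy of the response shown above (only the fields used here).
example = {
    "id": "bed.421d2128e183424fcc6a74269bae7934.expected_partitions",
    "access_methods": [
        {
            "type": "http",
            "access_url": {
                "url": "https://data2.bedbase.org/output/bedstat_output/421d2128e183424fcc6a74269bae7934/GSM6856752_S1_H3K27ac_peaks_expected_partitions.pdf",
                "headers": None,
            },
        },
        {
            "type": "s3",
            "access_url": {
                "url": "s3://data2.bedbase.org/output/bedstat_output/421d2128e183424fcc6a74269bae7934/GSM6856752_S1_H3K27ac_peaks_expected_partitions.pdf",
                "headers": None,
            },
        },
    ],
}

http_url = pick_access_url(example)
print(http_url)
```

Returning None for a missing access type lets the caller decide whether to fall back to another type or report an error.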

You could also get these PDFs from the links on the splash page: https://dev.bedbase.org/bed/421d2128e183424fcc6a74269bae7934 (these point to the same files).

jwokaty · 2023-12-15T20:43:07Z

Thanks for the explanation. I am still trying to understand the API as I develop the client. If the column information was available, I wanted to provide that to the user. I am not proposing any changes at this point.

jwokaty · 2024-04-08T19:32:12Z

I wanted to follow up on identifying the types of BED files. I see that there's been some development on api-dev.bedbase.org. Should I be developing my client against your development version?

khoroshevskyi · 2024-04-09T01:16:55Z

Hi, yes, I rewrote the BEDbase API and divided the endpoints into statistics, classification, files, plots, and raw metadata. All of these will be developed further. All endpoints now have schemas, so the API should be easier to understand.
Additionally, I would appreciate your feedback on the new API: what do you think should be added or changed?

nsheff added the question Further information is requested label Dec 15, 2023

khoroshevskyi assigned donaldcampbelljr Feb 3, 2024

nsheff mentioned this issue Dec 19, 2023

Specify different flavors of BED databio/bedboss#34

Closed

4 tasks
