Availability of processed datasets for example reproducibility #6

Open
ijmiller2 opened this issue Feb 17, 2020 · 0 comments

Dear Reymond Group,

Thank you very much for developing and releasing this great tool! As I started to look through some of the examples, I wondered whether you might be able to make the processed data available for the worked examples. I'm guessing my data will be most similar in format and shape to the RNA-Seq data, but I'm having trouble confirming that.

For instance, in the RNA Sequencing example, your input files are generically named ("data.csv.xz" and "labels.csv"):

```python
import pandas as pd
# Both files use their first column as the row index.
DATA = pd.read_csv("data.csv.xz", index_col=0, sep=",")
LABELS = pd.read_csv("labels.csv", index_col=0, sep=",")
```

I see at the top of that file that the data source is https://gdc.cancer.gov/about-data/publications/pancanatlas, but when I follow that URL, it isn't clear to me which file in particular I should download, or whether any further processing is required to produce "labels.csv".

Is it the EBPlusPlusAdjustPANCAN_IlluminaHiSeq_RNASeqV2.geneExp.tsv file (it's 1.88 GB in size)?
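If that is the right file, my best guess at the reshaping step is sketched below. It assumes, without confirmation, that the TSV has genes as rows, a first column named `gene_id`, and one column per sample, and that the example expects samples as rows. The helper name `tsv_to_samples_by_genes` is hypothetical, and the sketch does not attempt to build "labels.csv", since I don't know where those labels come from.

```python
import io

import pandas as pd


def tsv_to_samples_by_genes(source, index_col="gene_id"):
    """Read a genes-x-samples TSV and return a samples-x-genes DataFrame.

    Assumption: the first column (``index_col``) holds gene identifiers and
    every remaining column is one sample.
    """
    genes_by_samples = pd.read_csv(source, sep="\t", index_col=index_col)
    # Transpose so each row is a sample and each column is a gene.
    return genes_by_samples.T


if __name__ == "__main__":
    # Tiny in-memory stand-in for the real file (made-up values).
    toy_tsv = io.StringIO(
        "gene_id\tSAMPLE_A\tSAMPLE_B\tSAMPLE_C\n"
        "GENE1|1\t1.0\t2.0\t3.0\n"
        "GENE2|2\t4.0\t5.0\t6.0\n"
    )
    samples_by_genes = tsv_to_samples_by_genes(toy_tsv)
    print(samples_by_genes.shape)  # (3, 2): 3 samples, 2 genes
```

On the real data, `tsv_to_samples_by_genes(path).to_csv("data.csv.xz")` would write an xz-compressed CSV, because pandas infers the compression from the `.xz` suffix.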

So I'm wondering whether you could host the processed example data files on your site, or share more information on their shape and format.
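For reference, these are the checks I'd use to compare my own data with the example once `DATA` and `LABELS` are loaded. It is a minimal sketch that assumes, without confirmation, that both files are indexed by sample ID with one row per sample; `summarize_inputs` is a hypothetical helper, not part of the project, and the toy frames use made-up values.

```python
import pandas as pd


def summarize_inputs(data: pd.DataFrame, labels: pd.DataFrame) -> dict:
    """Report shapes and whether DATA and LABELS share the same sample index.

    Assumption: both frames are indexed by sample ID (index_col=0 on load).
    """
    shared = data.index.intersection(labels.index)
    return {
        "data_shape": data.shape,  # (rows, columns) of the data matrix
        "labels_shape": labels.shape,  # (rows, columns) of the labels table
        "n_shared_samples": len(shared),
        "same_order": bool(data.index.equals(labels.index)),
        "numeric_data": all(pd.api.types.is_numeric_dtype(t) for t in data.dtypes),
    }


if __name__ == "__main__":
    # Toy stand-ins; replace with the real DATA / LABELS frames.
    data = pd.DataFrame(
        {"GENE1": [1.0, 2.0, 3.0], "GENE2": [4.0, 5.0, 6.0]},
        index=["S1", "S2", "S3"],
    )
    labels = pd.DataFrame({"label": ["A", "B", "A"]}, index=["S1", "S2", "S3"])
    print(summarize_inputs(data, labels))
```

If `same_order` is False but `n_shared_samples` equals both row counts, `labels.loc[data.index]` would put the labels in the same order as the data.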

Thanks again,
Ian