
"No samples in this workspace to parse!" error in flowjo_to_gatingset, apparently bc of divergent sample name and FCS file name #113

Open
PedroMilanezAlmeida opened this issue Aug 24, 2020 · 4 comments


PedroMilanezAlmeida commented Aug 24, 2020

A workaround for #112 is to pass the directory containing the FCS files to `path` and the FCS filename to `subset`. For example:

```r
gs <- CytoML::flowjo_to_gatingset(ws,
                                  name = 1,
                                  path = dirname(sampleURI),
                                  subset = basename(sampleURI),
                                  extend_val = -Inf)
```

However, apparently if the sample name and the FCS filename don't match, the call fails with `Error in (function (ws, group_id, subset, execute, path, cytoset, h5_dir, : No samples in this workspace to parse!`. A workaround is:

```r
# sampleID: position of the sample of interest in fj_ws_get_samples(ws)
gs <- CytoML::flowjo_to_gatingset(ws,
                                  name = 1,
                                  path = dirname(sampleURI),
                                  subset = CytoML::fj_ws_get_samples(ws)$name[sampleID],
                                  extend_val = -Inf)
```

which works pretty well!

However, I am working with several samples that have the same sample name but different FCS files and different file names (these are all technical replicates of the same sample acquired on different days). Using the second code chunk above will, unfortunately, load all technical replicates with the same sample name, and, as mentioned in #112, I need to parse only one sample at a time.
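The problem can be modeled with a plain data frame standing in for the sample table. This is a minimal base-R sketch, not CytoML code: the table, its column names (`sampleID`, `name`, assumed to resemble what `fj_ws_get_samples(ws)` returns) and the sample names are all hypothetical. In such a table, selecting by name matches every replicate that shares the name, while selecting by row position matches exactly one.

```r
# Hypothetical stand-in for fj_ws_get_samples(ws): three technical replicates
# of the same sample ("Donor1") acquired on different days, plus one other sample.
samples <- data.frame(
  sampleID = c(1L, 2L, 3L, 4L),
  name     = c("Donor1", "Donor1", "Donor1", "Donor2"),
  stringsAsFactors = FALSE
)

# Selecting by sample name matches every replicate that shares the name...
by_name <- samples[samples$name == "Donor1", ]
nrow(by_name)   # 3

# ...whereas selecting by row position picks exactly one row.
by_index <- samples[2, ]
nrow(by_index)  # 1

# Names that occur more than once cannot identify a single FCS file.
dup_names <- unique(samples$name[duplicated(samples$name)])
dup_names       # "Donor1"
```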

Again, any help would be deeply appreciated.

PS: the help for subset indicates that FCS filenames can be used instead of sample names ("Or a character specifying the FCS filenames to be imported.")

@PedroMilanezAlmeida (author)

Just FYI, for my specific purpose, I found a workaround combining the second code chunk above and

```r
# Keep only the sample whose source FCS file is sampleURI
if (length(gs) > 1) {
  gs <- gs[flowWorkspace::keyword(gs, "FILENAME")$FILENAME == sampleURI]
}
```

which subsets the GatingSet to keep only the sample of interest.

This is not ideal since I have to load more samples than needed, slowing things down.
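If the post-hoc subsetting is kept, comparing only file basenames makes the match tolerant of the same file being referenced through different directory prefixes (for example, a relative vs. an absolute path). A minimal base-R sketch: `match_fcs_file` is a hypothetical helper, the file paths are made up, and it assumes basenames are unique among the loaded files, which may not hold if replicates from different days share a file name.

```r
# Hypothetical helper: TRUE for each entry of `filenames` whose basename
# equals the basename of `target`, so differing directory prefixes don't matter.
match_fcs_file <- function(filenames, target) {
  basename(filenames) == basename(target)
}

# Hypothetical FILENAME keyword values for two loaded replicates.
fn <- c("/data/day1/Donor1_d1.fcs", "/data/day2/Donor1_d2.fcs")
match_fcs_file(fn, "day2/Donor1_d2.fcs")  # FALSE TRUE

# With a GatingSet (not run here), the same idea would read:
# gs <- gs[match_fcs_file(flowWorkspace::keyword(gs, "FILENAME")$FILENAME, sampleURI)]
```

This still loads every replicate first; it only makes the filtering step less brittle.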

Also, please let me know whether I should keep this and #112 open (is this the expected behavior of `path` and `subset`?).

gfinak (member) commented Aug 24, 2020

`path` is really meant to point to the directory where the FCS files reside. We search for files based on the `$FIL` keyword, if I recall correctly. `subset` is a bit of a legacy argument: it subsets the table of sample ID, sample name and group name that is constructed from the XML, based on the sampleID or index. The subsetting API was implemented 10 years ago, when flowWorkspace was still a pure R package.
I think we probably need to take a second look at this interface and clean it up a bit.
Can you provide more details about your use case?

@mikejiang (member)

`subset` is indeed a legacy argument, but besides conventional selection by numeric index or FCS filename, it can also take an R filter expression to select samples based on the keyword values recorded in the XML (i.e. through `fj_ws_get_keywords` under the hood).
See https://www.bioconductor.org/packages/devel/bioc/vignettes/CytoML/inst/doc/flowjo_to_gatingset.html#24_Import_a_subset
for details.
If some keyword uniquely identifies these replicates, you can pass a filter expression on that keyword to the `subset` argument.
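Finding a keyword that separates the replicates amounts to comparing their keyword sets. The sketch below is plain R operating on named character vectors: `distinguishing_keywords` is a hypothetical helper, the keyword values are made up, and obtaining one such vector per replicate (for instance via `fj_ws_get_keywords(ws, sample_id)`, whose exact return shape should be checked) is an assumption.

```r
# Hypothetical helper: given one named character vector of keywords per
# replicate, return the keyword names whose values differ across replicates.
distinguishing_keywords <- function(kw_list) {
  # Only keywords present in every replicate can be compared.
  common <- Reduce(intersect, lapply(kw_list, names))
  varies <- vapply(common, function(k) {
    length(unique(vapply(kw_list, function(kw) kw[[k]], character(1)))) > 1
  }, logical(1))
  common[varies]
}

# Hypothetical keywords for two replicates of the same sample.
rep1 <- c("$FIL" = "Donor1_d1.fcs", "$DATE" = "03-AUG-2020", "$CYT" = "LSRII")
rep2 <- c("$FIL" = "Donor1_d2.fcs", "$DATE" = "10-AUG-2020", "$CYT" = "LSRII")
distinguishing_keywords(list(rep1, rep2))  # "$FIL" "$DATE"
```

A varying keyword other than the file name (here the hypothetical `$DATE`) would be a candidate for a `subset` filter expression as described in the vignette.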

If they differ only by FCS filename, you will need to pre-load the target file into a cytoset and pass it to the parser, using the feature introduced in #100
and illustrated here: https://rpubs.com/rglab/622259
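A minimal sketch of that approach, assuming the `cytoset` argument of `flowjo_to_gatingset()` accepts the pre-loaded data as introduced in #100. `parse_single_fcs` is a hypothetical wrapper, `sampleURI` is the full path to the FCS file as in the earlier snippets, and the workspace path is a placeholder. Only the file of interest is read, so the other replicates are never loaded.

```r
# Hypothetical wrapper: parse the FlowJo gating for exactly one FCS file.
# Packages are referenced with `::`, so defining the function does not load them.
parse_single_fcs <- function(ws, fcs_file, group = 1, ...) {
  # Read only the file of interest into a cytoset.
  cs <- flowWorkspace::load_cytoset_from_fcs(fcs_file)
  # Pass the pre-loaded data to the parser; further arguments
  # (e.g. extend_val = -Inf) are forwarded unchanged.
  CytoML::flowjo_to_gatingset(ws, name = group, cytoset = cs, ...)
}

# Usage (not run here; needs a workspace and the FCS file):
# ws <- CytoML::open_flowjo_xml("path/to/workspace.wsp")
# gs <- parse_single_fcs(ws, sampleURI, extend_val = -Inf)
```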

@PedroMilanezAlmeida (author)

Hey guys, thank you both for your replies.

@gfinak: pls see https://github.com/PedroMilanezAlmeida/ezDAFi for use case.

@mikejiang: yeah, that is what I ended up doing (pre-loading a cytoset with a single fcs).

I will leave this and #112 open, since the help page is not accurate (`path` as a data.frame gives an error, and `subset` as an FCS filename loads more than one FCS file in some cases). Please feel free to close them.
