Improve the documentation for `tidybulk` #282

stemangiola · 2023-07-22T02:41:25Z

improving examples for methods
solving documentation warnings (when packages are loading or installing)
Improving methods' argument description
improving the transparency of the method used in the backend in the description of the methods

aliosmancetin · 2023-07-28T19:40:09Z

Hi, I'd like to contribute this one, but I am not sure how it should be done or when is the deadline. I am a medical doctor but currently a PhD student as a computational biologist of my group. I know transcriptomics analysis with bioconductor and I like tidy paradigm. Since this is labeled as a "good first issue", if there will be enough guidance, I can give it a try, what do you think?

stemangiola · 2023-07-29T01:22:32Z

Thanks @aliosmancetin! Nice to meet you.

I am not sure when is the deadline

There is a soft deadline to commit and start working, so we can form the team for the upcoming paper. But there is no hard deadline for completing the task. We trust the community on that side. Of course, each challenge should not drag too long, so to keep the ball rolling.

I am not sure how it should be done

Assign yourself to this issue,
Read about Roxigen2 https://cran.r-project.org/web/packages/roxygen2/vignettes/roxygen2.html
Fork tidybulk
Look at all available methods in the R/methods.R script
For each of those methods, look at the documentation (e.g. ?test_differential_abundance)
For each argument in each method, judge if it is very, very easily understandable; if the answer is not (as I expect for the majority of them), try to improve it.
Is the output structure and class clearly described in the documentation? If not, you can improve that.
Are examples covering most argument combinations and use cases, if not you can add more examples

Then, other more challenging tasks are

solving documentation warnings (when packages are loading or installing)
Make sure that the backend code for each method is pasted and described in the @description part of the documentation to give transparency of what is happening underneath.
For all analysis methods, create messaging framework, to communicate to the user the operations that are running below (e.g, "tidybulk says: normalising with TMM", "tidybulk says: estimating .. with xxx")

Let me know if you want more guidance. @HelenaLC might be a good code reviewer, as she is an expert on R packages and standards.

aliosmancetin · 2023-07-29T07:11:09Z

Nice to meet you too! Thanks for your comments and outline. It seems doable for me, I'll start asap.

aliosmancetin · 2023-07-29T09:09:48Z

I couldn't figure out how to assign myself. Should ".take" work or not?

stemangiola · 2023-07-30T00:46:02Z

I assigned you. Not sure if users can choose their name from the drop-down menu

Looking forward to you PR!

stemangiola · 2023-08-02T21:08:19Z

Hello @aliosmancetin , just to do some planning on our side.

Do you have a timeline already to work on this challenge?

A note: for the tidy adapters, please follow the tidySingleCellExperiment package, which was very recently revamped.

aliosmancetin · 2023-08-03T00:09:45Z

Hi @stemangiola,

If you refer very structured work-plan with "timeline", no, I don't have unfortunately 😅 But, I've already started to read must-read documents. I am willing to commit some work on this issue, but I am not sure how long it takes.

stemangiola · 2023-08-06T22:19:52Z

Hi @stemangiola,

If you refer very structured work-plan with "timeline", no, I don't have unfortunately 😅 But, I've already started to read must-read documents. I am willing to commit some work on this issue, but I am not sure how long it takes.

I assigned you prematurely. Now I have emptied the assignee field, so you can assign yourself when you will be ready, with no rush :)

aliosmancetin · 2023-08-07T07:07:41Z

I assigned you prematurely. Now I have emptied the assignee field, so you can assign yourself when you will be ready, with no rush :)

Okay, let me focus on this one more week. Then we can discuss my progress and whether it's going to work or not.

stemangiola · 2023-08-09T00:28:55Z

@GrootJ welcome to this issue! As this issue is multifaceted, we can spread the work with other devs (@aliosmancetin potentially).

Please have a look at the description for the various goals. Sure, we can have a short Zoom now or any other time. I will be in NY until tomorrow

aliosmancetin · 2023-08-14T12:06:55Z

Hi @stemangiola,

I saw you assigned me again. I think I figured out how generally tidybulk and tidySummarizedExperiment work but I have couple of questions and I need some confirmations if I get this right. Then I could start improving the documentation. Do you prefer doing this under this issue or with email (questions may sound easy and basic, I don't know).

stemangiola · 2023-08-14T18:46:22Z

I was reorganising things and I figured out it was more informative to put potential contributors anyway :)

feel free to ask here. So other contributors will be able to learn as well.

GrootJ · 2023-08-16T21:01:18Z

@GrootJ welcome to this issue! As this issue is multifaceted, we can spread the work with other devs (@aliosmancetin potentially).

Please have a look at the description for the various goals. Sure, we can have a short Zoom now or any other time. I will be in NY until tomorrow

@stemangiola - got more set up, about to start on this - i will follow suggestions you provided to @aliosmancetin.
I forked tidybulk (main) - and opened R/methods.R and went through the readme
I also want to run through the README.Rmd workflow myself to make sure functions do/what steps they are used.
running README.Rmd seems to require a fork of dev as well? (please note I am newer to the development aspects here) - open to suggestions
either way - happy to connect, also on how to split up documentation editing amongst us..

cheers, Joost

stemangiola · 2023-08-16T23:45:34Z

Thanks @GrootJ and @aliosmancetin, for the enthusiasm! tidybulk really needed this. Welcome to the team and to the publication ;)

We can divide the tasks so @GrootJ and @aliosmancetin can pick and work independently.

If possible, do this with the perspective of a new user who should find everything intuitive and clear and should feel tidybulk as a black box (a thing that more experienced people often criticise). So tidybulk can be not only convenient but also an educational tool.

relevant for dplyr_methods.R, tidyr_methods.R Clean dplyr, tidyr methods replicating the new tidySingleCellExperiment roxygen style that @HelenaLC did.
relevant for methods.R Improve roxygen documentation (arguments names, examples across all possible backend methods) of the analysis methods.
relevant for methods.R, as SE_methods.R use the same method list Improve the analysis methods roxygen description, exposing some of the backend code (see what I did for test_differential_abundance, that can also be improved), so users are aware of what doce functions (e.g. deseq2, edger, GSEA, cibersort, etc..) are used. This would include references to papers on the methods in the documentation section.
relevant for methods.R and SE_methods.R, as both need messaging system Improve/create a messaging system inside analysis methods that update the user on what is going on underneath. We need to choose a messaging function. (message() I think prints just at the end, so it would not be suitable. cat() cannot be easily suppressed, so it is not suitable. But there is something I might be missing). The messages would be of this nature:

tidybulk says: scaling using TMM, or
tidybulk says: scaling using upper quartile, and
tidybulk says: calculating mean-variance trend using ...
tidybulk says: fitting using ...
tidybulk says: testing gene set enrichment using ...

You can put your name next to 1, 2, 3, 4 so we know what we are working on.

aliosmancetin · 2023-08-17T05:57:24Z

Now this task is more doable for us, welcome @GrootJ :)

@stemangiola At this point, my question is, should we focus on "methods.R" or as well as "methods_SE.R".
As far as I understand, methods directly work on SummarizedExperiment should be more important now.

Other question, you refer to "analysis methods". Are those in "methods.R", "methods_SE.R" files or "functions.R", "functions_SE.R" files?

I may have a suggestion about documentation, we may include a kind of "developer guide" to documentation. Maybe experienced users/developers can easily understand how tidybulk works (classes, generics, methods and backend implementation etc.) however, it took me some time to understand how it works exactly (after I saw there are feature__ and sample__ variables directly in utilities, it became easier :) ). At this point, I know tidyadapters generally work similarly, maybe other developers are working on this right now, I am not sure.

stemangiola · 2023-08-17T06:26:26Z

should we focus on "methods.R" or as well as "methods_SE.R".

Good point, I edited the to-do list specifying which files they apply to

As far as I understand, methods directly work on SummarizedExperiment should be more important now.

I would not say so. Still, we have to give equal importance to the tibble and SummarizedExperiment interfaces. They should behave consistently. (bare in mind that methods_SE.R does not have documentation and relies on methods.R)

Other question, you refer to "analysis methods". Are those in "methods.R", "methods_SE.R" files or "functions.R", "functions_SE.R" files?

"methods.R", "methods_SE.R" (I specified now in the to-do list). functions_*.R are only internal, definitely a lower priority. Although documented code (even if internal) will be appreciated by future developers.

I may have a suggestion about documentation, we may include a kind of "developer guide" to documentation. Maybe experienced users/developers can easily understand how tidybulk works (classes, generics, methods and backend implementation etc.) however, it took me some time to understand how it works exactly (after I saw there are feature__ and sample__ variables directly in utilities, it became easier :) ). At this point, I know tidyadapters generally work similarly, maybe other developers are working on this right now, I am not sure.

Amazing point. This is definitely a new/different "#tidyomics open challenge". I see this to be developed as a blog post and vignette as a guide. You @aliosmancetin and @GrootJ are great judges of the most confusing, less documented and intuitive aspects of the code base, on the front and back end. You could open a challenge/issue, and start listing all confusing stuff that you are figuring out with sweet and tiers, and that can be the material for a vignette/blog post.

maybe other developers are working on this right now, I am not sure.

None has tackled the developer side, but it is a good idea.

stemangiola · 2023-08-20T12:34:00Z

@aliosmancetin and @GrootJ when you manage to do your first commit, please add your authorship details here https://docs.google.com/spreadsheets/d/19XqhN3xAMekCJ-esAolzoWT6fttruSEermjIsrOFcoo/edit?usp=sharing

stemangiola · 2023-08-27T04:31:02Z

Hello Fellas,

@chilampoon would be on their way to resolve the conflicts of tidybulk for tidyomics package. (fixing roxigen for dplyr_methods.R and tidyr_methods.R)?

Is any of you (@aliosmancetin and @GrootJ ) actively working on this? I don't want to create duplications.

aliosmancetin · 2023-08-27T07:11:07Z

Hi @stemangiola,

I am not working on solving the conflicts.

GrootJ · 2023-08-28T04:47:26Z

Hi @stemangiola, others - I am not working on resolving those conflicts either (item 1 of your list above).

I am making some headway to get a bit better grip on some of the ongoing conversations you guys have (around tidybulk package development, tidiness in omics goals in general). I test-ran 2 tidybulk workflows (README.Rmd with se_mini dataset and diff feature abundance from the manuscript with the pasilla dataset) which gave me a better idea of the functions and documentation in there.

I think I see the opportunities for documentation improvement you mentioned (item 3 of your list above). E.g. adjust_abundance function documentation lists inputs and outputs but does not mention batch correction (using sva which it seems to auto-install+load) it seems to perform (between single and paired reads - type in the pasilla dataset). Is this is an example of documentation you like to get clarified? How to start writing such improvement also depends of on from which new user's perspective it would need to be made "easily" understandable; new users w basic familiarity with RNA-Seq and tidyverse?

getting the backend code exposed as you suggested sounds like I plan (so we can figure out what is under the hood)
getting references to source methods used in the function sounds like a plan
perhaps reference or even descriptions from those source methods (e.g. as with combat?)

Probably good to touch base on this (virtual chat or zoom?) and how we wil go about that (in version control, who which functions etc). i still need to get more familiar with roxigen too (not sure if that would fit the timeline for your publication?)

stemangiola · 2023-08-28T05:19:29Z

E.g. adjust_abundance function documentation lists inputs and outputs but does not mention batch correction (using sva which it seems to auto-install+load) it seems to perform (between single and paired reads - type in the pasilla dataset). Is this is an example of documentation you like to get clarified?

Yes, correct

How to start writing such improvement also depends of on from which new user's perspective it would need to be made "easily" understandable; new users w basic familiarity with RNA-Seq and tidyverse?

Very basic knowledge. But we should not rely on any jargon or abbreviation. Basically the English dictionary should be enough to understand tidybulk and the underlying methods.

getting the backend code exposed as you suggested sounds like I plan (so we can figure out what is under the hood)

getting references to source methods used in the function sounds like a plan

perhaps reference or even descriptions from those source methods (e.g. as with combat?)

Agree

Probably good to touch base on this (virtual chat or zoom?) and how we wil go about that (in version control, who which functions etc). i still need to get more familiar with roxigen too (not sure if that would fit the timeline for your publication?)

Sure. Where are you based? I could do it in 2 hours from now.

GrootJ · 2023-08-28T22:59:09Z

E.g. adjust_abundance function documentation lists inputs and outputs but does not mention batch correction (using sva which it seems to auto-install+load) it seems to perform (between single and paired reads - type in the pasilla dataset). Is this is an example of documentation you like to get clarified?

Yes, correct

How to start writing such improvement also depends of on from which new user's perspective it would need to be made "easily" understandable; new users w basic familiarity with RNA-Seq and tidyverse?

Very basic knowledge. But we should not rely on any jargon or abbreviation. Basically the English dictionary should be enough to understand tidybulk and the underlying methods.

Check - clearly an educational goal as well which can synergize with the laudable further standardization/tidy-ing you guys are working on. I guess there is a trade-off between level of summarization of documentation (and functions) where advanced users may want more technical detail (in functions I guess configuration options could deal with that). But from the sound of it, the aim would be plenty of "basic knowledge" users still learning about RNA-Seq in general.

Some initial ideas:

add a bit more extensive workflow example that also serves as an RNA-Seq "best practices" example that touches upon a variety of common data processing steps (implementation eased by tidybulk); taking away some "education" from function documentation, with cross-references of the workflow example in the function documentation.
start from tibble format inputs, add/explain summarized experiment (benefits) further down or separately
use more recent datasets representative of what many users will run into (e.g. paired-end read data processed with common NGS pipelines, with some known/established differential processes, ideally some multi-group)

Perhaps some of that sounds obvious - just trying to organize thoughts, get some initial feedback

Here 2 initial ideas for datasets (again, open to feedback/other suggestions)

GSE62944 TCGA data (eg BRCA tumor vs adj.normal) http://bioconductor.org/packages/release/bioc/vignettes/GSEABenchmarkeR/inst/doc/GSEABenchmarkeR.html#rna-seq-compendium
macrophage activation https://bioconductor.org/packages/release/data/experiment/html/macrophage.html

getting the backend code exposed as you suggested sounds like I plan (so we can figure out what is under the hood)

getting references to source methods used in the function sounds like a plan

perhaps reference or even descriptions from those source methods (e.g. as with combat?)

Agree

Probably good to touch base on this (virtual chat or zoom?) and how we wil go about that (in version control, who which functions etc). i still need to get more familiar with roxigen too (not sure if that would fit the timeline for your publication?)

Sure. Where are you based? I could do it in 2 hours from now.

I am on the US east coast - you're back in Melbourne again?
@aliosmancetin - where are you?

stemangiola · 2023-08-28T23:43:47Z

Great initiative!

add a bit more extensive workflow example that also serves as an RNA-Seq "best practices" example that touches upon a variety of common data processing steps (implementation eased by tidybulk); taking away some "education" from function documentation, with cross-references of the workflow example in the function documentation.

We don't necessarily want to suggest the best way to do analysis. In this first instance just give transparency of what is happening underneath.

start from tibble format inputs,

Although we are still supporting tibble input, we are not recommending it anymore, as the tidyomics uses SummarizedExperiment as the backend.

add/explain summarized experiment (benefits) further down or separately

Probably we don't want to go that basic.

Let's take this task as an onion-like. We can first tackle the high-return part, which exposes the backend code (just the really relevant part of the method called) in the @ description part.

Maybe let's start from the adjust_abundance, and improve the transparency of that, so I can give some feedback, and we will be aligned for the rest of the methods.

Here 2 initial ideas for datasets (again, open to feedback/other suggestions)

Thanks. Let's tackle this as a later task.

Having said all that. I think in the future, after tidybulk will be extremely well documented, we could

rewrite some of the books "Orchestrating ..." using tidy tools.
Add deeper educational material in the tidybulk repo itself.

One step at a time, we will climb the mountain ;)

stemangiola · 2023-08-29T01:30:49Z

@GrootJ I have added combat_seq to the adjust_abundance in the new master.

aliosmancetin · 2023-08-29T05:36:44Z

Hi guys @stemangiola @GrootJ,

Sorry for my late response. I was very busy recently, I am trying to move to new apartment. By the way, I am in Germany.

As far as I understand, first step we should do is to improve @description. It looks like most straightforward I think and requires less creativity more work. @GrootJ lets split up the methods that we will work on.

@stemangiola previously you mentioned documentation is based on methods.R. Isn't methods_SE.R included at all? This still confuses me because as you said before, although tidybulk is still supported, it is not recommended. Maybe default documentation should be changed with methods_SE.R. Tidybulk related differences or notes could be mentioned under @details section.

stemangiola · 2023-08-29T05:42:12Z

@stemangiola previously you mentioned documentation is based on methods.R. Isn't methods_SE.R included at all? This still confuses me because as you said before, although tidybulk is still supported, it is not recommended. Maybe default documentation should be changed with methods_SE.R. Tidybulk related differences or notes could be mentioned under @details section.

Yes I should have been clearer. We have three elements in tidybulk

Generic methods (http://adv-r.had.co.nz/S4.html)
tibble methods (also including the tidybulk enhanced tibbles)
SummarizedExperiment methods

The documentation is relative to (1). (2) and (3) do not need documentation, and they use the same underlying analysis methods, so the documentation for (1) will apply to (2) and (3)

stemangiola · 2023-10-31T21:16:18Z

Any news team?

chilampoon · 2023-11-13T19:32:57Z

Any news team?

will update soon, any new requests?

aliosmancetin · 2023-11-14T21:41:09Z

Any news team?

From my side, I don't have unfortunately.

stemangiola added documentation Improvements or additions to documentation good first issue Good for newcomers labels Jul 22, 2023

stemangiola changed the title ~~Improve the documentation for tidybulk~~ Improve the documentation for tidybulk Jul 28, 2023

stemangiola assigned aliosmancetin Jul 29, 2023

stemangiola unassigned aliosmancetin Aug 6, 2023

stemangiola mentioned this issue Aug 6, 2023

Improve the documentation of tidytranscriptomics adapters, tidySE, tidySCE, tidySPE, tidyseurat stemangiola/tidyseurat#66

Closed

stemangiola assigned aliosmancetin Aug 14, 2023

stemangiola assigned GrootJ Aug 21, 2023

stemangiola mentioned this issue Sep 11, 2023

resolve conflicts for dplyr/tidyr funcs #287

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improve the documentation for `tidybulk` #282

Improve the documentation for `tidybulk` #282

stemangiola commented Jul 22, 2023 •

edited

aliosmancetin commented Jul 28, 2023

stemangiola commented Jul 29, 2023 •

edited

aliosmancetin commented Jul 29, 2023

aliosmancetin commented Jul 29, 2023 •

edited

stemangiola commented Jul 30, 2023 •

edited

stemangiola commented Aug 2, 2023

aliosmancetin commented Aug 3, 2023

stemangiola commented Aug 6, 2023

aliosmancetin commented Aug 7, 2023

stemangiola commented Aug 9, 2023 •

edited

aliosmancetin commented Aug 14, 2023

stemangiola commented Aug 14, 2023

GrootJ commented Aug 16, 2023

stemangiola commented Aug 16, 2023 •

edited

aliosmancetin commented Aug 17, 2023 •

edited

stemangiola commented Aug 17, 2023 •

edited

stemangiola commented Aug 20, 2023

stemangiola commented Aug 27, 2023 •

edited

aliosmancetin commented Aug 27, 2023

GrootJ commented Aug 28, 2023 •

edited

stemangiola commented Aug 28, 2023

GrootJ commented Aug 28, 2023

stemangiola commented Aug 28, 2023 •

edited

stemangiola commented Aug 29, 2023

aliosmancetin commented Aug 29, 2023

stemangiola commented Aug 29, 2023 •

edited

stemangiola commented Oct 31, 2023

chilampoon commented Nov 13, 2023

aliosmancetin commented Nov 14, 2023

Improve the documentation for tidybulk #282

Improve the documentation for tidybulk #282

Comments

stemangiola commented Jul 22, 2023 • edited

aliosmancetin commented Jul 28, 2023

stemangiola commented Jul 29, 2023 • edited

aliosmancetin commented Jul 29, 2023

aliosmancetin commented Jul 29, 2023 • edited

stemangiola commented Jul 30, 2023 • edited

stemangiola commented Aug 2, 2023

aliosmancetin commented Aug 3, 2023

stemangiola commented Aug 6, 2023

aliosmancetin commented Aug 7, 2023

stemangiola commented Aug 9, 2023 • edited

aliosmancetin commented Aug 14, 2023

stemangiola commented Aug 14, 2023

GrootJ commented Aug 16, 2023

stemangiola commented Aug 16, 2023 • edited

aliosmancetin commented Aug 17, 2023 • edited

stemangiola commented Aug 17, 2023 • edited

stemangiola commented Aug 20, 2023

stemangiola commented Aug 27, 2023 • edited

aliosmancetin commented Aug 27, 2023

GrootJ commented Aug 28, 2023 • edited

stemangiola commented Aug 28, 2023

GrootJ commented Aug 28, 2023

stemangiola commented Aug 28, 2023 • edited

stemangiola commented Aug 29, 2023

aliosmancetin commented Aug 29, 2023

stemangiola commented Aug 29, 2023 • edited

stemangiola commented Oct 31, 2023

chilampoon commented Nov 13, 2023

aliosmancetin commented Nov 14, 2023

Improve the documentation for `tidybulk` #282

Improve the documentation for `tidybulk` #282

stemangiola commented Jul 22, 2023 •

edited

stemangiola commented Jul 29, 2023 •

edited

aliosmancetin commented Jul 29, 2023 •

edited

stemangiola commented Jul 30, 2023 •

edited

stemangiola commented Aug 9, 2023 •

edited

stemangiola commented Aug 16, 2023 •

edited

aliosmancetin commented Aug 17, 2023 •

edited

stemangiola commented Aug 17, 2023 •

edited

stemangiola commented Aug 27, 2023 •

edited

GrootJ commented Aug 28, 2023 •

edited

stemangiola commented Aug 28, 2023 •

edited

stemangiola commented Aug 29, 2023 •

edited