Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Some thoughts #43

Open
2 of 5 tasks
h-2 opened this issue Feb 15, 2021 · 4 comments
Open
2 of 5 tasks

Some thoughts #43

h-2 opened this issue Feb 15, 2021 · 4 comments

Comments

@h-2
Copy link
Member

h-2 commented Feb 15, 2021

First of all: it's great to have this!

I just looked through this repository and had the following very subjective thoughts. Please see this as constructive criticism :)

There are many things I found surprising to see / have never seen with applications before. This includes most of the test things, but also Doxygen. While the terms might be clear for a library developer, in the context of the APP, they are not clear to me. Does an app have an API? And if yes, isn't this the CLI? Why are tests for the CLI written inside C++? I don't think I have ever seen that before, and I am fairly sure regular app developers have never heard of GMock etc.
Similarly: what does the documentation cover? The interface of the binary or some functions inside the app source code? Why would users of the app be interested in the latter?

Don't get me wrong, I do think that these things might make sense for large applications that have multiple developers and are maintained for a longer period of time. And I also think that it is kind of fancy that one can write app-tests from inside C++ and have coverage checked... I just don't see 90% of this happening in most academic projects or even projects in my company. So for the assumed primary audience, it feels a little over-engineered -> there are a lot of things to understand that are likely not relevant, while those things that are, are not easy to do.

Being an app-developer myself, I am primarily interested in the following:

  • Repository that has SeqAn3 included as submodule and provides a CMakeLists
  • Help with setting up Github CI
  • Predefined make targets to export the command-line description. This is the documentation for the APP. In the context of Github, an export to GFMD would be great, but the exports to man and html that we currently have are also good! make doc-html, make doc-md...
  • An easy test-system where I only need to provide three things per test (CLI arguments, input data, output control data). Ideally, this would be a CMake-macro that takes exactly these three arguments and understands if the URL to the data is a local folder or a http-link with a tarball in which cases it fetches the stuff.
  • Nice-to-have: make update-seqan3 updates the SeqAn3 submodule to the latest stable release and prints the release notes :)
@marehr
Copy link
Member

marehr commented Feb 15, 2021

Hi thank you for the feedback!

What is a GFMD? And what should the result be?

An easy test-system where I only need to provide three things per test (CLI arguments, input data, output control data)

What is the output control data? How do you compare that? A simple diff if the files are equal?

In seqan2 we had some magic python scripts that did the testing, now we said that C++ is the main source for the tests (as the app itself will be written in C++).

I agree that the flexibility might be overwhelming when you start and if we can streamline the simple use cases that it would be awesome. Do you have any examples in seqan2 that are simple tests?

But looking at seqan2 apps, like bs_tools, I guess that a cmake command would be more complex than just 3 parameters.

    conf = app_tests.TestConf(
        program=path_to_bisar,
        redir_stdout=ph.outFile('other.stdout'),
        args=['-e3', str(4), '-e4', str(5),
              #-e3 4 -e4 5
              '-o', ph.outFile('reads_se_N6000_0.CT_GA.verified.sam'),
              ph.inFile('reads_se_N6000.CT_GA.sam'),
              ph.inFile('hg18_chr21_3000.fa'),
              ph.inFile('reads_se_N6000.fastq')],
        to_diff=[#(ph.inFile('STDOUT_FILE'),
                  #ph.outFile('STDOUT_FILE')),
                 (ph.inFile('reads_se_N6000_0.CT_GA.verified.sam'),
                  ph.outFile('reads_se_N6000_0.CT_GA.verified.sam'),
                  transforms)])
    conf_list.append(conf)

let's figure out some cmake syntax (based on https://cmake.org/cmake/help/latest/command/execute_process.html):

# ignoring the original post-processing of the data:

# some test description
cli_test (
    bisar_target
        -e3 4
        -e4 5
        -o reads_se_N6000_0.CT_GA.verified.sam
        reads_se_N6000.CT_GA.sam
        hg18_chr21_3000.fa.sam
        reads_se_N6000.fastq
    DATASOURCES
        reads_se_N6000.CT_GA.sam # will be copied into current working directory
        hg18_chr21_3000.fa.sam # will be copied into current working directory
        reads_se_N6000.fastq # will be copied into current working directory
    TEST_DATASOURCES
        reads_se_N6000_0.CT_GA.verified.sam.expected # will be copied into current working directory
    OUTPUT_FILE_DIFF
        # convention over configuration:
        #   diff reads_se_N6000_0.CT_GA.verified.sam with reads_se_N6000_0.CT_GA.verified.sam.expected
        reads_se_N6000_0.CT_GA.verified.sam
)

compared to

TEST_F(cli_test, some_test_description)
{
    cli_test_result result = execute_app("bisar", 
                                         "-e3", "4", "-e4", "5",
                                         "-o", "reads_se_N6000_0.CT_GA.verified.sam",
                                         data("reads_se_N6000.CT_GA.sam"),
                                         data("hg18_chr21_3000.fa.sam"),
                                         data("reads_se_N6000.fastq"));

    // simple diff
    EXPECT_FILE_CONTENT_EQ("reads_se_N6000_0.CT_GA.verified.sam", data("reads_se_N6000_0.CT_GA.verified.sam.expected"));
}

I think we can't hide any inherent test-declaration complexity. Neither by using bash / pyhton / cmake or C++.

I generally would argue that staying within one echo system is preferable. google-test is known to most of us.

@marehr
Copy link
Member

marehr commented Feb 15, 2021

* Nice-to-have: `make update-seqan3` updates the SeqAn3 submodule to the latest stable release and prints the release notes :)

We want to offer a github-action for that. Creating a new PR if a new release candidate / stable release is available.

@h-2
Copy link
Member Author

h-2 commented Feb 18, 2021

What is a GFMD? And what should the result be?

Github-flavored Markdown! The result should be something you can copy and paste into your README.

What is the output control data? How do you compare that? A simple diff if the files are equal?

I usually just compute checksums on the files.

compared to

Oh, this looks much better than I expected! I thought that users need to edit cli_test.hpp, but if you only need to do is what is included in fastq_to_fasta_options_test.cpp, this is actually really good 👍 🚀

[I would maybe move the struct fastq_to_fasta : public cli_test {}; to the cpp, so people really don't need to touch the hpp. Since multiple tests can be defined in one cpp, I would maybe also choose a "neutral" name for the filename, so users don't need to touch the CMakelists and can just edit the cpp in-place.]

And I would put more focus on these tests and not promote the others as much. I also found the folder names in the top-level directory a bit surprising.

I know that these things are subjective, but I personally would do it like this:

include      # SeqAn, other header-libraries
src          # application source (both headers and cpp)
test         # Only the "cli-test"
advanced     # "api-docs" and other "tests"
CMakeLists.txt
README.md

AFAIK include is only used for system and library headers and not for application headers, at least I have never seen the latter. I would use this name for the directory with SeqAn3, or possibly submodules as a more generic term that also clearly indicates that this git-repo depends on others.

src always included all application source-code on the projects I worked on. Why separate them?

test This includes what is typically understood to be the "application test".

advanced All the other cool stuff that people could/should do but probably won't. It should be clearly stated that this folder can be removed without breaking other things.

That way we have less top-lovel folders and the typical/easy things are immediately visible. People know that there is more fancy stuff ("advanced") but also know that they don't need to look at it or understand it to do the other stuff. [Prevents the "What's all of this?" -- reaction]

@joergi-w
Copy link
Member

[I would maybe move the struct fastq_to_fasta : public cli_test {}; to the cpp, so people really don't need to touch the hpp. Since multiple tests can be defined in one cpp, I would maybe also choose a "neutral" name for the filename, so users don't need to touch the CMakelists and can just edit the cpp in-place.]

This is already planned :)

And I like your suggestions about the directories and CLI documentation!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants