Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Alphabet] Alphabet types can no longer be constructed using their rank #2745

Open
tloka opened this issue Jun 9, 2021 · 12 comments
Open

[Alphabet] Alphabet types can no longer be constructed using their rank #2745

tloka opened this issue Jun 9, 2021 · 12 comments
Labels
documentation documentation is missing to close this issue ready to tackle Fully specified, discussed and understood. Can be tackled.

Comments

@tloka
Copy link
Contributor

tloka commented Jun 9, 2021

I am just working on an SeqAn3 upgrade to have the newest version running.
I recognized with the quality alphabets (in my case mostly phred42), that it is no longer possible to construct them using a rank value, e.g., what I could do before was:

seqan3::phred42 quality_value{32};

Is it the desired behavior that this is no longer possible?
If yes, why was it removed? Especially for quality values, it is very useful to allow initialization by rank.
What would be your recommendation to initialize an alphabet element efficiently and with single-line code (I didn't find something helpful in the snippets)? Is there some useful function I didn't find to quickly convert a numeric value to an alphabet type intepreting the numeric value as its rank? Assigning the rank after construction feels "wrong" and is probably also not the most efficient solution...

@joshuak94
Copy link
Contributor

I think this is to prevent ambiguity as described here under "Assigning and retrieving values".

For a one line initialization you could do this:

using namespace seqan3::literals;
seqan3::phred42 quality_value{'A'_phred42};

@tloka
Copy link
Contributor Author

tloka commented Jun 10, 2021

Thanks for your answer.

I am aware of the one line using the char representation, and I use it for nucleotide alphabets (where it makes sense, and is easily understandable).
However, for qualities this way of initialization imo is a bit counter-intuitive. When setting a quality of 9 (e.g., in a test), I don't really want to look up which character represents a value of 9.

I currently need this in my tests to init qualified types, e.g. I what I did before was something like:

    seqan3::qualified<dna5, phred42> nucleotide1{'A'_dna5, phred42{37}};
    seqan3::qualified<dna5, phred42> nucleotide2{'A'_dna5, phred42{37}};
    seqan3::qualified<dna5, phred42> nucleotide3{'A'_dna5, phred42{37}};

This is also why I need a one-liner. Having multiple lines just to init these qualified values is not appropriate. What I would have with your suggestion would be:

    seqan3::qualified<dna5, phred42> nucleotide1{'A'_dna5, '%'_phred42};
    seqan3::qualified<dna5, phred42> nucleotide2{'A'_dna5, 'C'_phred42};
    seqan3::qualified<dna5, phred42> nucleotide3{'A'_dna5, '!'_phred42};

which is not really intuitive to read (could you tell me what quality '%' stands for?). Of course I can always comment, but you know what I mean ...

For now, I am actually going with

   seqan3::qualified<dna5, phred42> nucleotide1{'A'_dna5, seqan3::phred42{}.assign_phred(37)};

which works and would even easily allow to use template types (which the operators don't), something like:

template<typename dna_t, typename qual_t>
void whatever_function()
{
   seqan3::qualified<dna_t, qual_t> nucleotide1{dna_t{}.assign_char('A'), qual_t{}.assign_phred(37)};
}

I just don't find this solution very nice and thought there might be a nicer solution for this.

@marehr
Copy link
Member

marehr commented Jun 14, 2021

Hi @tloka,

thank you for the input :)

One of our initial design goals for alphabets were to disallow initialising an alphabet by builtin types. For some reasons we did introduce this exception for phred-scores.

Another problem is that

SEQAN3_DEPRECATED_310 constexpr phred_base(phred_type const p) noexcept
{
static_cast<derived_type *>(this)->assign_phred(p);
}

this constructor is implicit. That means seqan3::phred42 p = 32; would be a valid expression and would allow for weird data conversion C++ magic. This is completely undesirable.

I recognized with the quality alphabets (in my case mostly phred42), that it is no longer possible to construct them using a rank value, e.g., what I could do before was:

seqan3::phred42 quality_value{32};

This shows that the API isn't clear. The given value is actually the PHRED-score value and not the rank.

We currently have three ways to assign a value to a phred alphabet, assign_char by char, assign_rank by rank or assign_phred by PHRED-score. All three are technically speaking a char (e.g. one "byte" of memory) and can't be differentiated at construction.

That means

seqan3::phred42 p = 0; // is that rank or phred score?
seqan3::phred42 p = -5; // is that rank or phred score?
seqan3::phred42 p = 'I'; // is that rank or phred score?

were all valid constructors.

To disambiguate those, we disallow a direct construction and use explicit initialisation by their assignment functions.

Furthermore, the most generic way would be to use

using some_phred_type = seqan3::phred42;
seqan3::assign_phred_to(some_phred_type{}, 34);

But, from what I read so far is that you don't really miss this particular constructor, but you want to have a short-way to initialise a known at compile time phred score value.

I think introducing something like this:

int main()
{
    auto cutoff_phred = 14_phred42; // short for seqan3::phred42.assign_phred(14)
    auto legacy = -3_phred68solexa;
    auto by_char = 'I'_phred42;
}

would solve that problem, too.

(See https://godbolt.org/z/3aKz3s41q for an example :))

@tloka
Copy link
Contributor Author

tloka commented Jun 14, 2021

Hey @marehr,
thanks a lot for this very detailed (!) explanation and the reasons for this design decision.
I was indeed not aware that the rank is not the same as the quality value (however, that's probably my fault).

Furthermore, the most generic way would be to use
using some_phred_type = seqan3::phred42;
seqan3::assign_phred_to(some_phred_type{}, 34);

I think this is solution is pretty much what I was looking for. I wasn't aware of this function, and the behavior that you can use it with an rvalue returning a copy. So, if I understand it correctly, it would work to use the following for qualified types:

using some_phred_type = seqan3::phred42
seqan3::qualified<dna5, some_phred_type> nucleotide1{'A'_dna5, seqan3::assign_phred_to(some_phred_type{}, 37)};

which I already find somehow nicer than the seqan3::phred42{}.assign_phred(37) notation I used.


For your last suggestion, wouldn't something like this

auto cutoff_phred = 14_phred42; // short for seqan3::phred42.assign_phred(14)

again produce the same ambiguity as before for the constructors (is it the rank or the phred score)? At the same time, these would not work when using a template type or type alias. It would probably be the solution with the shortest syntax, but very limited to use.

What I would probably like would be some generic initialization function template(s) for alphabets that you can clearly define in an unambiguous way, like

// Template function definition
template<typename T> // TODO: restrict for alphabets
constexpr T init_by_rank(int rank)
{
   return T{}.assign_rank(rank);
}

// ... and others

// ... and then use it like this

using some_alphabet_type = seqan3::phred42;
some_alphabet_type some_quality = seqan3::init_by_rank<some_alphabet_type>(37); // Available for all alphabet types
some_alphabet_type some_quality2 = seqan3::init_by_phred<some_alphabet_type>(25); // Only available for phred types
some_alphabet_type some_quality3 = seqan3::init_by_char<some_alphabet_type>('A'); // Available for all alphabet types
some_alphabet_type some_quality4 = seqan3::init_by_rank(37); // Auto deduction should be easily possible

// I guess, for qualified alphabets it would even enable
seqan3::qualified<seqan3::dna5, some_alphabet_type>{'A'_dna5, seqan3::init_by_phred(37)}; // (hopefully) auto deducted .. not sure though

// Note: I didn't think about all details, and probably it's not working exactly like this, but I guess you understand roughly what I mean..

Not as short, but unambiguous, intuitive (imo), possible to use with template types and aliases, possible to implement as constexpr for compile-time use (your alphabets are literals!?). But that's probably just my (very subjective) taste of writing code, from a functional perspective you can achieve the same with seqan3::assign_phred_to(some_phred_type{}, 37)}, so for now I am happy with this.

In every case, it would be helpful to have examples for seqan3::assign_phred_to(some_phred_type{}, 37)} etc. in the snippets for initialization of quality types (at least, I didn't find any).

@marehr
Copy link
Member

marehr commented Jun 14, 2021

For your last suggestion, wouldn't something like this

auto cutoff_phred = 14_phred42; // short for seqan3::phred42.assign_phred(14)

again produce the same ambiguity as before for the constructors (is it the rank or the phred score)? At the same time, these would not work when using a template type or type alias.

True, we would need some documentation to explain that no alphabet provides literals for the rank representation and this is actually the PHRED-score value.

It would probably be the solution with the shortest syntax, but very limited to use.

All literals only work for constant-expressions (e.g. values that are known at compile time).


some_alphabet_type some_quality4 = seqan3::init_by_rank(37); // Auto deduction should be easily possible

// I guess, for qualified alphabets it would even enable
seqan3::qualified<seqan3::dna5, some_alphabet_type>{'A'_dna5, seqan3::init_by_phred(37)}; // (hopefully) auto deducted .. not sure though

What should the return type of seqan3::init_by_rank(37) be? It does not know what it's target alphabet is.

I think your suggestion would lead to a completely different API. The main problem is that it assumes a unified way to construct/create an alphabet. Right now we don't require any alphabet to be constructible in a certain way, this allows us for example to have proxy-alphabets that can only be constructed with a reference alphabet (i.e. no default constructor).

  • seqan3::assign_char_to('A', seqan3::dna4{}),
  • seqan3::assign_rank_to(5, seqan3::dna4{})

are designed in such a way that they modify the state, but don't create the state. I think this is a more "generic"-way of API design, your solution wouldn't allow for that.

I understand the desire to have something equally short as 'A'_dna4 for the "variable" case, but I haven't seen a good solution
for that.

We currently have this free-function design:

// what's the difference between
seqan3::assign_rank_to(some_alphabet_type{}, 37); // note construction is done OUTSIDE the function, we assume how to construct it 
// and
seqan3::init_by_rank<some_alphabet_type>(37); //  init_by_rank needs to assume how to construct some_alphabet_type, we have no control over that
// ?

I don't think a strong-type like

seqan3::dna4{seqan3::alphabet_rank{42}}; // strong-type alphabet_rank

would be much better.

seqan3::dna4::by_rank(42); // strong-type alphabet_rank

something like that wouldn't be universal.

@eseiler
Copy link
Member

eseiler commented Jun 15, 2021

In every case, it would be helpful to have examples for seqan3::assign_phred_to(some_phred_type{}, 37)} etc. in the snippets for initialization of quality types (at least, I didn't find any).

Where would you expect to find such an example?

Some locations:

Independent of that, it seems like assign_phred_to could use a \relates/\relatesalso to the quality alphabets, or vice versa. Same with the other free functions.

@tloka
Copy link
Contributor Author

tloka commented Jun 15, 2021

ok, I think you are right, and what I would be looking for is simply not possible.
I thought that auto deduction might work in the cases above, but you are right, it doesn't (I am not really familiar with HOW exactly auto deduction works, so I thought it could maybe find the correct type from the left side of the assignment / the required input type for the qualified constructor - but you are right, this doesn't work).
So, in the end you are simply right with

// what's the difference between
seqan3::assign_rank_to(some_alphabet_type{}, 37); // note construction is done OUTSIDE the function, we assume how to construct it 
// and
seqan3::init_by_rank<some_alphabet_type>(37); //  init_by_rank needs to assume how to construct some_alphabet_type, we have no control over that
// ?

I think the remaining points are very subjective, so I don't see where I have a valid point left 🤪

However, I think my previous closing is still valid

In every case, it would be helpful to have examples for seqan3::assign_phred_to(some_phred_type{}, 37)} etc. in the snippets for initialization of quality types (at least, I didn't find any).

Also in the cookbook, you just show construction using literals and lvalue construction with follow-up assignment:

    // Two objects of seqan3::dna4 alphabet constructed with a char literal.
    seqan3::dna4 ade = 'A'_dna4;
    seqan3::dna4 gua = 'G'_dna4;
 
    // Two additional objects assigned explicitly from char or rank.
    seqan3::dna4 cyt, thy;
    cyt.assign_char('C');
    thy.assign_rank(3);

I think, you should add at least the following examples:

   // assign_rank and assign_char can also be used with rvalues. This is, for example, useful for one-line construction of a qualified type
   seqan3::qualified<seqan3::dna4, seqan3::phred42>(seqan3::dna4{}.assign_char('C'), seqan3::phred42{}.assign_rank(10));

   // The same can be achieve with free functions
   seqan3::qualified<seqan3::dna4, seqan3::phred42>(seqan3::assign_char_to(seqan3::dna4{}, 'C'), seqan3::assign_rank_to(seqan3::phred42{}, 10);

   // Importantly, this way of alphabet type construction also works with type aliases and template types
   using dna_t = seqan3::dna4;
   dna_t cyt2 = tna_t{}.assign_char('C');

@tloka
Copy link
Contributor Author

tloka commented Jun 15, 2021

In every case, it would be helpful to have examples for seqan3::assign_phred_to(some_phred_type{}, 37)} etc. in the snippets for initialization of quality types (at least, I didn't find any).

Where would you expect to find such an example?

Some locations:

* Class docs: https://docs.seqan.de/seqan/3-master-user/classseqan3_1_1phred42.html

* Free function: https://docs.seqan.de/seqan/3-master-user/group__alphabet__quality.html#gabd397385e3ae6a90f812db7b7760feac

@eseiler Maybe I am not using the documentation 100% as I should. I usually check the cookbook and the snippets directly (in the repo). For my exact example, in both places, I couldn't find how to construct a qualified type with specific values in a single line (EDIT: without using literals; I needed to use it with a type alias). I think, in both places, I even didn't find how to perform rvalue construction with specific values (examples only show the assignment to lvalues). But maybe I just missed something ...

EDIT2: I think the main point is the use of type aliases and template types. You mainly focus on literals in the examples - but these don't work in these situations. At the same time, I wasn't aware that the free function can be used with rvalues returning a copy (however, I found that in the documentation later) .. ok ,maybe it's just my fault 🤪

@eseiler
Copy link
Member

eseiler commented Jun 15, 2021

I think we could do (either or both):

  • a cookbook entry where we show all the "advanced" usage (and then refer there from the alphabets)
  • add "advanced" usage to the snippets

Depends on how much is "left over" after doing the snippets.

I think we should check if we can link assign_phred_to to the quality alphabets and then have snippets in the respective quality alphabets. Otherwise, since assign_phred_to is a free function, we would have to fix an alphabet for the snippet.

@tloka
Copy link
Contributor Author

tloka commented Jun 15, 2021

I think we should check if we can link assign_phred_to to the quality alphabets and then have snippets in the respective quality alphabets. Otherwise, since assign_phred_to is a free function, we would have to fix an alphabet for the snippet.

With thinking a bit about it and having one more look in the documentation, I think the documentation of the assign_phred_to etc. targets my case. Having some linking from the alphabet types there would probably help, at least I didn't find it (but as I said, maybe it's also just my fault..). Plus maybe an entry in the cookbook, at least for me this is usually the first place where I look for examples.

@marehr
Copy link
Member

marehr commented Jun 16, 2021

I have just glanced through all comments, and I'm not sure if you already suggested this. Maybe we should add a notice box to all member functions that are there to make class model a certain concept / customisation point object (CPO).

Oh, I just saw that alphabet_base does that for seqan3::alphabet_base::to_char, seqa3::alphabet_base::to_rank, etc...

But, adding a snippet there would be helpful, too.

seqan3::phred_base::assign_phred just says:

Assign from the numeric Phred score value.

Satisfies the seqan3::writable_quality_alphabet requirement via the seqan3::assign_rank_to() wrapper.

I think no-one knows what wrapper mean in this case. I think it should just say:

Provides an implementation for seqan3::assign_phred_to, required to model seqan3::writable_quality_alphabet.

PLUS a snippet showing "free" function (CPO) and member function usage.


Otherwise, since assign_phred_to is a free function, we would have to fix an alphabet for the snippet.

Technically speaking this isn't a free function, but a functor (i.e. class with call operator) instance with some special overload rules which is called a CPO. We need to fix an alphabet for the member functions, too, as they all inherit from seqan3::alphabet_base.


Plus maybe an entry in the cookbook, at least for me this is usually the first place where I look for examples.

Sounds reasonable.

@tloka
Copy link
Contributor Author

tloka commented Jun 17, 2021

Yeah, just to sum it up and sort it a bit from my side:

  • My initial question is answered. I initially just asked this because I didn't see a reason for the value constructor being removed and didn't quickly find an alternative way that looked nice for me to construct the qualities by value. The reasons you gave why you removed the constructor make total sense
  • The way to construct I was using now seqan3::phred42{}.assign_phred(37) is actually one of the two possible ways to achieve what I want. The second one is the use of the free functions / functors
  • I think not the documentation itself, but the examples / snippets could be extended to make it easier to find the functors, and to clarify that these can be used for valued construction seqan3::phred42 quality = seqan3::assign_phred_to(seqan3::phred42{}, 37) (well, the functor itself doesn't construct anything, but you know what I mean...) which is not obvious for me that it can be used in this way (again, the documentation of the function makes this clear - I just never arrived there when searching the documentation)
  • I think, such an example (at least with one of the functors) should be present in the snippets for each alphabet type. It might seem redundant, but (from my very subjective perspective) when I want to find out how to construct a phred42, I look into the documentation of phred42 (for other types accordingly), and not into some generic alphabet documentation (maybe I arrive there somehow later, but I would always start at the specific type I want to use).
  • Additionally, having this in the cookbook, especially to show that this is how construction works when using type aliases / template types where the literals cannot be used. Here, I would expect to find the broadest range of ways to construct alphabets, also the seqan3::phred42{}.assign_phred(37) which is missing there as well.

The documentation is really good and extensive once you find the correct place. For me, I was just not able to find the right places (though I must admit that I also didn't look too intensively into the documentation this time; but the first places where I looked didn't point me to the place where I would find the information I needed - this is what could probably be improved by adding the examples suggested above).

@marehr marehr added this to the SeqAn 3.1.0 milestone Sep 28, 2021
@smehringer smehringer removed this from the SeqAn 3.1.0 milestone Nov 8, 2021
@eseiler eseiler added the documentation documentation is missing to close this issue label Oct 20, 2022
@eseiler eseiler self-assigned this Oct 20, 2022
@smehringer smehringer added the ready to tackle Fully specified, discussed and understood. Can be tackled. label Oct 20, 2022
@eseiler eseiler changed the title Alphabet types can no longer be constructed using their rank [Alphabet] Alphabet types can no longer be constructed using their rank Oct 20, 2022
@eseiler eseiler removed their assignment Oct 20, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation documentation is missing to close this issue ready to tackle Fully specified, discussed and understood. Can be tackled.
Projects
None yet
Development

No branches or pull requests

5 participants