Change typing rate table to account for reads mapped to multiple markers at the same locus #174
@standage this is ready for review!
microhapulator/data/template.html
@@ -371,20 +371,22 @@ <h2>Read Mapping</h2>

 <a name="typing"></a>
 <h2>Haplotype Calling</h2>
-<p>Haplotypes are called empirically on a per-read basis using <code>mhpl8r type</code>. Reads that span all SNPs of interest in the corresponding marker are examined; all other reads are discarded. The haplotype tallies represent a <em>typing result</em> for each sample.</p>
+<p>Haplotypes are called empirically on a per-read basis using <code>mhpl8r type</code>. Reads that span all SNPs of interest in the corresponding marker are examined; all other reads are discarded. If a read maps to multiple markers at the same locus, haplotypes are called for each marker at the locus; therefore, the number of typing events can exceed the number of mapped reads. The haplotype tallies represent a <em>typing result</em> for each sample.</p>
I kind of struggle with the language here of how to define a typing event. Very open to alternative ways to describe this.
The following is a bit more verbose, but hopefully it gives the reader a clear and steady conceptual on-ramp. Let me know what you think.
Haplotypes are called empirically using `mhpl8r type` as follows. MicroHapulator examines each aligned read to determine its suitability for haplotype calling: this is a typing event. If the read alignment spans all SNPs of interest, the typing event succeeds and a haplotype call is made. If not, the typing event fails and no haplotype call is made. (Note that if more than one marker is defined at a given locus, MicroHapulator can attempt multiple typing events per read. In this case the number of Attempted Typing Events will exceed the number of Mapped Reads.) Collectively, the tallies of each observed haplotype represent a typing result for each sample. The typing rate is calculated as the number of successful typing events divided by the total number of attempted typing events.
This is much better! Thanks
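For illustration, the typing-event logic described in the suggested text could be sketched roughly as below. This is a minimal sketch, not MicroHapulator's actual implementation: the `Marker` and `Read` classes, the `count_typing_events` function, the marker names, and all coordinates are hypothetical, and it assumes every read shown maps to a single locus with two marker definitions.

```python
from dataclasses import dataclass


@dataclass
class Marker:
    # Hypothetical marker definition: SNP positions in 0-based reference coordinates.
    name: str
    snp_positions: list


@dataclass
class Read:
    # Hypothetical aligned read covering the half-open interval [start, end).
    name: str
    start: int
    end: int


def count_typing_events(reads, markers):
    """Attempt one typing event per (read, marker) pair at a single locus."""
    attempted = 0
    successful = 0
    for read in reads:
        for marker in markers:
            attempted += 1
            # The event succeeds only if the alignment spans every SNP of interest.
            if all(read.start <= pos < read.end for pos in marker.snp_positions):
                successful += 1
    return attempted, successful


# Two marker definitions at the same locus (made-up coordinates).
markers = [Marker("mh01-a", [10, 25, 40]), Marker("mh01-b", [25, 40, 60])]
reads = [Read("r1", 0, 50), Read("r2", 20, 70), Read("r3", 5, 65)]

attempted, successful = count_typing_events(reads, markers)
typing_rate = successful / attempted
print(f"mapped reads: {len(reads)}, attempted: {attempted}, successful: {successful}")
```

With two marker definitions at the locus, three mapped reads yield six attempted typing events, which is why the attempted count can exceed the mapped read count.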
-num_typed_reads = typing.TypedReads.sum()
-typing_total_reads = typing.TotalReads.sum()
+num_typing_success = typing.TypedReads.sum()
+num_typing_attempted = typing.TotalReads.sum()
Just changed the variable names here to be more consistent with the new table columns.
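As a rough illustration of how the renamed totals feed the typing rate, here is a minimal sketch. It assumes `typing` holds one row per marker with `TotalReads` and `TypedReads` columns, as the diff suggests; a plain list of dicts stands in for the project's actual table, and the marker names and counts are made up.

```python
# Hypothetical per-marker tallies; stands in for the real typing table.
typing = [
    {"Marker": "markerA", "TotalReads": 120, "TypedReads": 90},
    {"Marker": "markerB", "TotalReads": 80, "TypedReads": 60},
]

# Column sums, named to match the new report columns.
num_typing_attempted = sum(row["TotalReads"] for row in typing)
num_typing_success = sum(row["TypedReads"] for row in typing)

# Guard against an empty table to avoid dividing by zero.
typing_rate = (
    num_typing_success / num_typing_attempted if num_typing_attempted else 0.0
)
print(f"{num_typing_success} / {num_typing_attempted} = {typing_rate:.2f}")
```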
-<th class="alnrt">Typed Reads</th>
+<th class="alnrt">Attempted Typing Events</th>
+<th class="alnrt">Successful Typing Events</th>
These column names are good. I fiddled around a bit and tried more concise names—typing attempts, just typing events—but none of these was as clear and effective at communicating the nuances of what's going on as what you have here.
This PR closes #171. Now that we allow multiple marker definitions at a given locus, reads are typed multiple times. This PR changes the table in the report to show `Attempted Typing Events` and `Successful Typing Events`, where a "typing event" is defined as haplotype calling for 1 read at a single marker definition.