Commit

Deploying to gh-pages from @ b91949f 🚀
richard-rogers committed Apr 25, 2024
1 parent f490d92 commit 960f7cf
Showing 4 changed files with 86 additions and 19 deletions.
45 changes: 38 additions & 7 deletions _modules/whylogs/experimental/api/logger.html
@@ -570,18 +570,39 @@ <h1>Source code for whylogs.experimental.api.logger</h1><div class="highlight"><
<span class="k">return</span> <span class="n">averages</span>


<span class="k">def</span> <span class="nf">_all_strings</span><span class="p">(</span><span class="n">data</span><span class="p">:</span> <span class="n">pd</span><span class="o">.</span><span class="n">Series</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="nb">bool</span><span class="p">:</span>
<span class="k">return</span> <span class="nb">all</span><span class="p">([</span><span class="nb">all</span><span class="p">([</span><span class="nb">isinstance</span><span class="p">(</span><span class="n">y</span><span class="p">,</span> <span class="nb">str</span><span class="p">)</span> <span class="k">for</span> <span class="n">y</span> <span class="ow">in</span> <span class="n">x</span><span class="p">])</span> <span class="k">for</span> <span class="n">x</span> <span class="ow">in</span> <span class="n">data</span><span class="p">])</span>


<div class="viewcode-block" id="log_batch_ranking_metrics"><a class="viewcode-back" href="../../../../api/whylogs/experimental/api/logger/index.html#whylogs.experimental.api.logger.log_batch_ranking_metrics">[docs]</a><span class="k">def</span> <span class="nf">log_batch_ranking_metrics</span><span class="p">(</span>
<span class="n">data</span><span class="p">:</span> <span class="n">pd</span><span class="o">.</span><span class="n">core</span><span class="o">.</span><span class="n">frame</span><span class="o">.</span><span class="n">DataFrame</span><span class="p">,</span>
<span class="n">prediction_column</span><span class="p">:</span> <span class="n">Optional</span><span class="p">[</span><span class="nb">str</span><span class="p">]</span> <span class="o">=</span> <span class="kc">None</span><span class="p">,</span>
<span class="n">target_column</span><span class="p">:</span> <span class="n">Optional</span><span class="p">[</span><span class="nb">str</span><span class="p">]</span> <span class="o">=</span> <span class="kc">None</span><span class="p">,</span>
<span class="n">score_column</span><span class="p">:</span> <span class="n">Optional</span><span class="p">[</span><span class="nb">str</span><span class="p">]</span> <span class="o">=</span> <span class="kc">None</span><span class="p">,</span>
<span class="n">k</span><span class="p">:</span> <span class="n">Optional</span><span class="p">[</span><span class="nb">int</span><span class="p">]</span> <span class="o">=</span> <span class="kc">None</span><span class="p">,</span>
-<span class="n">convert_non_numeric</span><span class="o">=</span><span class="kc">False</span><span class="p">,</span>
<span class="n">schema</span><span class="p">:</span> <span class="n">Union</span><span class="p">[</span><span class="n">DatasetSchema</span><span class="p">,</span> <span class="kc">None</span><span class="p">]</span> <span class="o">=</span> <span class="kc">None</span><span class="p">,</span>
<span class="n">log_full_data</span><span class="p">:</span> <span class="nb">bool</span> <span class="o">=</span> <span class="kc">False</span><span class="p">,</span>
<span class="p">)</span> <span class="o">-&gt;</span> <span class="n">ViewResultSet</span><span class="p">:</span>
<span class="w"> </span><span class="sd">&quot;&quot;&quot;Log ranking metrics for a batch of data.</span>

<span class="sd"> You can call the function several ways:</span>
<span class="sd"> - Pass both prediction_column and target_column.</span>
<span class="sd"> - The named columns contain lists of strings. In this case, the prediction column contains the</span>
<span class="sd"> items the model has predicted are relevant, and the target column contains the items that</span>
<span class="sd"> are actually relevant. In this case, relevance is boolean.</span>

<span class="sd"> - The prediction column contains lists of integers and the target column contains lists of numbers</span>
<span class="sd"> or booleans. The value at the i-th position in the predicted list is the predicted rank of the i-th</span>
<span class="sd"> element of the domain. The value at the i-th position in the target list is the true relevance score of the</span>
<span class="sd"> i-th element of the domain. The score can be numeric or boolean. Higher scores indicate higher relevance.</span>

<span class="sd"> - Pass both target_column and score_column. The value at the i-th position in the target list is the true relevance</span>
<span class="sd"> of the i-th element of the domain (represented as a number, higher being more relevant; or boolean). The value at</span>
<span class="sd"> the i-th position in the score list is the model output for the i-th element of the domain.</span>

<span class="sd"> - Pass only target_column. The target column contains lists of numbers or booleans. The list entries are the true</span>
<span class="sd"> relevance of the items predicted by the model in prediction order.</span>

<span class="sd"> Parameters</span>
<span class="sd"> ----------</span>
<span class="sd"> data : pd.core.frame.DataFrame</span>
@@ -596,9 +617,6 @@ <h1>Source code for whylogs.experimental.api.logger</h1><div class="highlight"><
<span class="sd"> k : Optional[int], optional</span>
<span class="sd"> Consider the top k ranks for metrics calculation.</span>
<span class="sd"> If `None`, use all outputs, by default None</span>
-<span class="sd"> convert_non_numeric : bool, optional</span>
-<span class="sd"> Indicates whether prediction/target columns are non-numeric.</span>
-<span class="sd"> If True, prediction/target should be strings, by default False</span>
<span class="sd"> schema : Union[DatasetSchema, None], optional</span>
<span class="sd"> Defines the schema for tracking metrics in whylogs, by default None</span>
<span class="sd"> log_full_data : bool, optional</span>
@@ -665,19 +683,28 @@ <h1>Source code for whylogs.experimental.api.logger</h1><div class="highlight"><

<span class="sd"> binary_single_df = pd.DataFrame(</span>
<span class="sd"> {</span>
-<span class="sd"> &quot;raw_predictions&quot;: [</span>
+<span class="sd"> &quot;raw_targets&quot;: [</span>
<span class="sd"> [True, False, True], # First recommended item: Relevant, Second: Not relevant, Third: Relevant</span>
<span class="sd"> [False, False, False], # None of the recommended items are relevant</span>
<span class="sd"> [True, True, False], # First and second recommended items are relevant</span>
<span class="sd"> ]</span>
<span class="sd"> }</span>
<span class="sd"> )</span>

-<span class="sd"> result = log_batch_ranking_metrics(data=binary_single_df, prediction_column=&quot;raw_predictions&quot;, k=3)</span>
+<span class="sd"> result = log_batch_ranking_metrics(data=binary_single_df, target_column=&quot;raw_targets&quot;, k=3)</span>

<span class="sd"> &quot;&quot;&quot;</span>
<span class="n">formatted_data</span> <span class="o">=</span> <span class="n">data</span><span class="o">.</span><span class="n">copy</span><span class="p">(</span><span class="n">deep</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span> <span class="c1"># TODO: does this have to be deep?</span>

<span class="k">if</span> <span class="n">score_column</span> <span class="ow">is</span> <span class="ow">not</span> <span class="kc">None</span> <span class="ow">and</span> <span class="n">prediction_column</span> <span class="ow">is</span> <span class="ow">not</span> <span class="kc">None</span><span class="p">:</span>
<span class="k">raise</span> <span class="ne">ValueError</span><span class="p">(</span><span class="s2">&quot;Cannot specify both score_column and prediction_column&quot;</span><span class="p">)</span>

<span class="k">if</span> <span class="n">prediction_column</span> <span class="ow">is</span> <span class="kc">None</span> <span class="ow">and</span> <span class="n">score_column</span> <span class="ow">is</span> <span class="kc">None</span> <span class="ow">and</span> <span class="n">target_column</span> <span class="ow">is</span> <span class="ow">not</span> <span class="kc">None</span><span class="p">:</span>
<span class="c1"># https://github.com/whylabs/whylogs/issues/1505</span>
<span class="c1"># The column use logic is complex, so just swapping them here for this case</span>
<span class="c1"># rather than unraveling all the use cases.</span>
<span class="n">prediction_column</span><span class="p">,</span> <span class="n">target_column</span> <span class="o">=</span> <span class="n">target_column</span><span class="p">,</span> <span class="n">prediction_column</span>

<span class="k">if</span> <span class="n">prediction_column</span> <span class="ow">is</span> <span class="kc">None</span><span class="p">:</span>
<span class="k">if</span> <span class="n">score_column</span> <span class="ow">is</span> <span class="ow">not</span> <span class="kc">None</span> <span class="ow">and</span> <span class="n">target_column</span> <span class="ow">is</span> <span class="ow">not</span> <span class="kc">None</span><span class="p">:</span>
<span class="n">prediction_column</span> <span class="o">=</span> <span class="s2">&quot;__predictions&quot;</span>
@@ -687,7 +714,7 @@ <h1>Source code for whylogs.experimental.api.logger</h1><div class="highlight"><
<span class="k">lambda</span> <span class="n">row</span><span class="p">:</span> <span class="nb">list</span><span class="p">(</span><span class="n">np</span><span class="o">.</span><span class="n">argsort</span><span class="p">(</span><span class="n">np</span><span class="o">.</span><span class="n">argsort</span><span class="p">(</span><span class="o">-</span><span class="n">np</span><span class="o">.</span><span class="n">array</span><span class="p">(</span><span class="n">row</span><span class="p">)))</span> <span class="o">+</span> <span class="mi">1</span><span class="p">)</span>
<span class="p">)</span>
<span class="k">else</span><span class="p">:</span>
-<span class="k">raise</span> <span class="ne">ValueError</span><span class="p">(</span><span class="s2">&quot;Either prediction_column or score+target columns must be specified&quot;</span><span class="p">)</span>
+<span class="k">raise</span> <span class="ne">ValueError</span><span class="p">(</span><span class="s2">&quot;Either target_column or score+target columns must be specified&quot;</span><span class="p">)</span>

<span class="n">relevant_cols</span> <span class="o">=</span> <span class="p">[</span><span class="n">prediction_column</span><span class="p">]</span>

@@ -719,6 +746,10 @@ <h1>Source code for whylogs.experimental.api.logger</h1><div class="highlight"><
<span class="k">if</span> <span class="n">k</span> <span class="ow">and</span> <span class="n">k</span> <span class="o">&lt;</span> <span class="mi">1</span><span class="p">:</span>
<span class="k">raise</span> <span class="ne">ValueError</span><span class="p">(</span><span class="s2">&quot;k must be a positive integer&quot;</span><span class="p">)</span>

<span class="n">convert_non_numeric</span> <span class="o">=</span> <span class="n">_all_strings</span><span class="p">(</span><span class="n">formatted_data</span><span class="p">[</span><span class="n">prediction_column</span><span class="p">])</span> <span class="ow">and</span> <span class="n">_all_strings</span><span class="p">(</span>
<span class="n">formatted_data</span><span class="p">[</span><span class="n">target_column</span><span class="p">]</span>
<span class="p">)</span>

<span class="n">row_wise_functions</span> <span class="o">=</span> <span class="n">RowWiseMetrics</span><span class="p">(</span><span class="n">target_column</span><span class="p">,</span> <span class="n">prediction_column</span><span class="p">,</span> <span class="n">convert_non_numeric</span><span class="p">)</span>
<span class="n">formatted_data</span><span class="p">[</span><span class="s2">&quot;count_at_k&quot;</span><span class="p">]</span> <span class="o">=</span> <span class="n">formatted_data</span><span class="o">.</span><span class="n">apply</span><span class="p">(</span><span class="n">row_wise_functions</span><span class="o">.</span><span class="n">relevant_counter</span><span class="p">,</span> <span class="n">args</span><span class="o">=</span><span class="p">(</span><span class="n">k</span><span class="p">,),</span> <span class="n">axis</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
<span class="n">formatted_data</span><span class="p">[</span><span class="s2">&quot;count_all&quot;</span><span class="p">]</span> <span class="o">=</span> <span class="n">formatted_data</span><span class="o">.</span><span class="n">apply</span><span class="p">(</span><span class="n">row_wise_functions</span><span class="o">.</span><span class="n">relevant_counter</span><span class="p">,</span> <span class="n">args</span><span class="o">=</span><span class="p">(</span><span class="n">_max_k</span><span class="p">,),</span> <span class="n">axis</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
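Reviewer note: this change drops the `convert_non_numeric` argument and instead infers string mode from the data through the new `_all_strings` helper. The snippet below is a minimal sketch of that check, assuming plain Python lists in place of `pd.Series` so it runs without dependencies. The name `all_strings` and the sample data are illustrative stand-ins, not part of whylogs.

```python
from typing import Iterable


def all_strings(column: Iterable[Iterable[object]]) -> bool:
    """Return True only if every item of every row is a str (illustrative stand-in)."""
    return all(all(isinstance(item, str) for item in row) for row in column)


# Hypothetical sample columns: each row is a list of items.
string_predictions = [["cat", "dog"], ["bird"]]
string_targets = [["dog"], ["bird", "fish"]]
numeric_targets = [[1, 0], [0, 1]]

# As in the diff, string mode applies only when BOTH columns hold lists of strings.
both_strings = all_strings(string_predictions) and all_strings(string_targets)
mixed = all_strings(string_predictions) and all_strings(numeric_targets)
```

Because `all` over an empty sequence is true, an empty column or a column of empty lists counts as "all strings" under this check.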
27 changes: 21 additions & 6 deletions _sources/api/whylogs/experimental/api/logger/index.rst.txt
@@ -188,10 +188,28 @@ Attributes
-.. py:function:: log_batch_ranking_metrics(data: whylogs.core.stubs.pd.core.frame.DataFrame, prediction_column: Optional[str] = None, target_column: Optional[str] = None, score_column: Optional[str] = None, k: Optional[int] = None, convert_non_numeric=False, schema: Union[whylogs.core.DatasetSchema, None] = None, log_full_data: bool = False) -> whylogs.api.logger.result_set.ViewResultSet
+.. py:function:: log_batch_ranking_metrics(data: whylogs.core.stubs.pd.core.frame.DataFrame, prediction_column: Optional[str] = None, target_column: Optional[str] = None, score_column: Optional[str] = None, k: Optional[int] = None, schema: Union[whylogs.core.DatasetSchema, None] = None, log_full_data: bool = False) -> whylogs.api.logger.result_set.ViewResultSet
Log ranking metrics for a batch of data.

You can call the function several ways:
- Pass both prediction_column and target_column.
- The named columns contain lists of strings. In this case, the prediction column contains the
items the model has predicted are relevant, and the target column contains the items that
are actually relevant. In this case, relevance is boolean.

- The prediction column contains lists of integers and the target column contains lists of numbers
or booleans. The value at the i-th position in the predicted list is the predicted rank of the i-th
element of the domain. The value at the i-th position in the target list is the true relevance score of the
i-th element of the domain. The score can be numeric or boolean. Higher scores indicate higher relevance.

- Pass both target_column and score_column. The value at the i-th position in the target list is the true relevance
of the i-th element of the domain (represented as a number, higher being more relevant; or boolean). The value at
the i-th position in the score list is the model output for the i-th element of the domain.

- Pass only target_column. The target column contains lists of numbers or booleans. The list entries are the true
relevance of the items predicted by the model in prediction order.

:param data: Dataframe with the data to log.
:type data: pd.core.frame.DataFrame
:param prediction_column: Column name for the predicted values. If not provided, the score_column and target_column must be provided, by default None
@@ -204,9 +222,6 @@ Attributes
:param k: Consider the top k ranks for metrics calculation.
If `None`, use all outputs, by default None
:type k: Optional[int], optional
-:param convert_non_numeric: Indicates whether prediction/target columns are non-numeric.
-If True, prediction/target should be strings, by default False
-:type convert_non_numeric: bool, optional
:param schema: Defines the schema for tracking metrics in whylogs, by default None
:type schema: Union[DatasetSchema, None], optional
:param log_full_data: Whether to log the complete dataframe or not.
@@ -271,15 +286,15 @@ Attributes

binary_single_df = pd.DataFrame(
{
-"raw_predictions": [
+"raw_targets": [
[True, False, True], # First recommended item: Relevant, Second: Not relevant, Third: Relevant
[False, False, False], # None of the recommended items are relevant
[True, True, False], # First and second recommended items are relevant
]
}
)

-result = log_batch_ranking_metrics(data=binary_single_df, prediction_column="raw_predictions", k=3)
+result = log_batch_ranking_metrics(data=binary_single_df, target_column="raw_targets", k=3)
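Reviewer note: the call modes documented in this diff come down to a few checks on which column names are passed. The sketch below mirrors those checks in plain Python so they are easier to follow. `resolve_columns` and `ranks_from_scores` are hypothetical helpers written for illustration; they are not functions in whylogs. The ranking helper is a pure-Python analogue of the `np.argsort(np.argsort(-scores)) + 1` expression in the source, and it may order tied scores differently.

```python
from typing import List, Optional, Tuple


def resolve_columns(
    prediction_column: Optional[str],
    target_column: Optional[str],
    score_column: Optional[str],
) -> Tuple[Optional[str], Optional[str], bool]:
    """Hypothetical helper mirroring the argument checks shown in the diff.

    Returns (prediction_column, target_column, ranks_come_from_scores).
    """
    if score_column is not None and prediction_column is not None:
        raise ValueError("Cannot specify both score_column and prediction_column")

    # Target-only call: the diff swaps the two names so the rest of the
    # function can treat the single column as the prediction column.
    if prediction_column is None and score_column is None and target_column is not None:
        prediction_column, target_column = target_column, prediction_column

    if prediction_column is None:
        if score_column is not None and target_column is not None:
            # Ranks are derived from scores under a synthetic column name.
            return "__predictions", target_column, True
        raise ValueError("Either target_column or score+target columns must be specified")

    return prediction_column, target_column, False


def ranks_from_scores(scores: List[float]) -> List[int]:
    """Give rank 1 to the highest score, rank 2 to the next, and so on."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0] * len(scores)
    for rank, index in enumerate(order, start=1):
        ranks[index] = rank
    return ranks


target_only = resolve_columns(None, "raw_targets", None)
score_and_target = resolve_columns(None, "targets", "scores")
example_ranks = ranks_from_scores([0.1, 0.9, 0.5])
```

In the target-only case the single column ends up in the prediction slot, which matches the swap the commit added for issue 1505.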


