Version 3.22rc0
mborsetti committed Apr 25, 2024
1 parent 8310b87 commit 82a9e91
Showing 32 changed files with 1,516 additions and 1,013 deletions.
2 changes: 1 addition & 1 deletion .pre-commit-config.yaml
Expand Up @@ -169,7 +169,7 @@ repos:
require_serial: true
args: [--max-line-length, '120']
- id: mypy # https://github.com/python/mypy
name: Static typing for Python (mypy)
name: Check Python static typing (mypy)
entry: mypy
additional_dependencies:
- mypy
57 changes: 57 additions & 0 deletions CHANGELOG.rst
Expand Up @@ -33,6 +33,63 @@ can check out the `wish list <https://github.com/mborsetti/webchanges/blob/main/
Internals, for changes that don't affect users. [triggers a minor patch]
Version 3.22rc0
===================
Unreleased

⚠ Breaking Changes
------------------
* For developers hooking their own Python code, see "Internals" below.

Changed
-------
* Moved the snapshot database from the "user_cache" directory (which is typically not backed up) to within the
"user_data" directory (i.e. ``~/.local/share/webchanges`` or ``$XDG_DATA_HOME/webchanges`` in Linux,
``~/Library/Application Support/webchanges`` in macOS and ``%LOCALAPPDATA%\webchanges\webchanges`` in
Windows). Also renamed the file to ``snapshots.db`` to more clearly denote its contents. Many thanks to `Markus Weimar
<https://github.com/Markus00000>`__ for pointing this problem out in issue `#75
<https://github.com/mborsetti/webchanges/issues/75>`__.

- Added command line argument ``--database`` to specify the filename to use for the snapshot database (renamed from
``--cache``, which is deprecated but still supported).

* Command line argument ``--test-differ`` now takes a second argument, the maximum number of diffs to produce.
* Command line argument ``--dump-history`` shows the ``mime_type`` field.
* Various improvements to differs:

- Standardized the headers of ``deepdiff`` and ``imagediff`` to more closely resemble those of ``unified``;
- Various improvements to ``google_ai`` differ:

- Better error handling (differ will no longer fail if Google API returns an error, but will produce a report
containing the error and the unified diff);
- Improved the default prompt to ``Analyze this unified diff and create a summary listing only the
changes:\n\n{unified_diff}``.

- Documentation updates.


Fixed
-----
* AttributeError Exception when the fallback HTTP client package ``requests`` is not installed. Reported by `yubiuser
<https://github.com/yubiuser>`__ in issue `#76 <https://github.com/mborsetti/webchanges/issues/76>`__.
* ValueError when using ``--test-differ``; regression reported by `Markus Weimar
<https://github.com/Markus00000>`__ in issue `#79 <https://github.com/mborsetti/webchanges/issues/79>`__.
* To avoid missing changes, a new snapshot is not saved if a differ fails with an Exception.


Internals
---------
* We are now capturing the ``mime_type`` attribute of the data (and storing it alongside the data in the snapshot
database) to enable automated handling of filtering, diffing and reporting (in the future). For programmers using
their own hooked Python code, this change unfortunately requires updating the ``filter`` method of any class that
inherits from ``FilterBase`` and the ``retrieve`` method of any class inheriting from ``JobBase`` to handle
``mime_type``. You can see new
definitions in the `hooks documentation
<https://webchanges.readthedocs.io/en/stable/hooks.html#:~:text=Changed%20in%20version%203.22>`__.
* Object names containing ``cache`` have been updated to use ``ssdb`` (snapshot database) instead.
* Created a NamedTuple called ``Snapshot`` to simplify code retrieving and saving snapshots to the database.



Version 3.21
===================
2024-04-16
2 changes: 1 addition & 1 deletion docs/_static/css/webchanges.css
Expand Up @@ -6,7 +6,7 @@ See https://docs.readthedocs.io/en/stable/guides/adding-custom-css.html#overridi
/* This line is theme specific - it includes the base theme CSS */
/*@import '../alabaster.css'; /* for Alabaster */
@import 'theme.css'; /* for the Read the Docs theme */

@import url('https://fonts.googleapis.com/css?family=Roboto:400,700&display=swap');
.strike {
text-decoration: line-through;
}
10 changes: 10 additions & 0 deletions docs/cli.rst
Expand Up @@ -106,6 +106,13 @@ how diffs from job 1 look like in HTML if running on a machine with a web browse
webchanges --test-differ 1 --test-reporter browser


Optionally, you can specify the maximum number of comparisons (diffs) to run, instead of producing diffs for all
the snapshots that have been saved::

webchanges --test-differ 1 2 --test-reporter browser # run differ for job 1 a maximum of 2 times
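
The effect of the optional maximum can be pictured with a minimal Python sketch. It is not the code used by
**webchanges**: the function name ``snapshot_pairs`` and the assumption that snapshots are ordered newest first are
illustrative only.

```python
from itertools import islice
from typing import Iterator, Optional, Sequence, Tuple


def snapshot_pairs(
    snapshots: Sequence[str], max_diffs: Optional[int] = None
) -> Iterator[Tuple[str, str]]:
    """Yield (older, newer) pairs of consecutive snapshots, most recent pair first.

    Hypothetical helper: `snapshots` is assumed to be ordered newest first.
    """
    pairs = ((older, newer) for newer, older in zip(snapshots, snapshots[1:]))
    # islice(..., None) yields every pair, i.e. no maximum was given
    return islice(pairs, max_diffs)


history = ["v4", "v3", "v2", "v1"]  # newest first
print(list(snapshot_pairs(history, 2)))  # [('v3', 'v4'), ('v2', 'v3')]
```

Without a maximum, the same four snapshots would produce three comparisons.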



.. versionchanged:: 3.3
Will now display all saved snapshots instead of only the latest 10.

Expand All @@ -115,6 +122,9 @@ how diffs from job 1 look like in HTML if running on a machine with a web browse
.. versionchanged:: 3.9
Can be used in combination with ``--test-reporter``.

.. versionchanged:: 3.22
Added the maximum number of comparisons to perform (optional).


.. _test-reporter:

7 changes: 4 additions & 3 deletions docs/cli_help.txt
@@ -1,5 +1,5 @@
usage: webchanges [-h] [-V] [-v] [--jobs FILE] [--config FILE] [--hooks FILE]
[--cache FILE] [--list-jobs] [--errors [REPORTER]]
[--database FILE] [--list-jobs] [--errors [REPORTER]]
[--test [JOB]] [--no-headless] [--test-differ JOB [JOB ...]]
[--dump-history JOB] [--max-workers WORKERS]
[--test-reporter REPORTER] [--smtp-login] [--telegram-chats]
Expand Down Expand Up @@ -35,8 +35,9 @@ override file defaults:
matching a glob pattern
--config FILE read configuration from FILE
--hooks FILE use FILE as hooks.py module to import
--cache FILE use FILE as cache (snapshots database); FILE can be a
redis URI
--database FILE, --cache FILE
use FILE as snapshots database; FILE can be a redis
URI

job management:
--list-jobs list jobs and their index number
Binary file removed docs/differ_ai_google_example.png
Binary file not shown.
105 changes: 76 additions & 29 deletions docs/differs.rst
Expand Up @@ -171,14 +171,17 @@ advances in the technology and the prospect of integrating more generative AI mo
<https://github.com/mborsetti/webchanges/discussions>`__.

Prefaces a unified diff with a textual summary of changes generated by Google's `Gemini Pro 1.5 Generative AI model
<https://ai.google.dev/>`__ (in Preview) called via an API call. This is free of charge for most.
<https://ai.google.dev/>`__ (in `Preview <https://cloud.google.com/products?hl=en#product-launch-stages>`__) called via
an API call. This is free of charge for most developers.

.. important:: Requires a system environment variable ``GOOGLE_AI_API_KEY`` containing the Google Cloud AI Studio
API Key which you obtain `here <https://aistudio.google.com/app/apikey>`__. To access the Gemini Pro 1.5 model
during the Preview period, make a request `here <https://aistudio.google.com/app/waitlist/97445851>`__. Note that
API Key which you obtain `here <https://aistudio.google.com/app/apikey>`__ and which itself requires a `Google
Cloud <https://cloud.google.com/>`__ account. To access the Gemini Pro 1.5 model
during the `Preview <https://cloud.google.com/products?hl=en#product-launch-stages>`__ period, you may have to
make a request `here <https://aistudio.google.com/app/waitlist/97445851>`__. Please note that
starting on 2 May 2024, the use of Gemini API from a project that has billing enabled will be subject to
`pay-as-you-go pricing <https://ai.google.dev/pricing>`__. To avoid surprises, we recommend you set up your API key
on a project without billing or, at a minimum, set up a `budget
on a GCP project without billing or, at a minimum, set up a `budget
<https://console.cloud.google.com/billing/01457C-2ABCC1-8A6144/budgets>`__ with threshold notification.

Gemini Pro 1.5 is the first widely available model with a context window of up to 1 million tokens, which allows it
Expand All @@ -198,15 +201,39 @@ access is not gated.

Examples
````````
The below output used the default ``prompt`` and a summary is prefaced to the unified diff.
Using the default ``prompt``, a summary is prefaced to the unified diff:

.. image:: differ_ai_google_example.png
:width: 1039
:alt: Google AI differ example output
.. raw:: html

<embed>
<div style="padding:12px;margin-bottom:24px;font-family:Roboto,sans-serif;font-size:13px;
border:1px solid#e1e4e5;background:white;">
<strong>Summary of Changes:</strong><br><br>
The provided unified diff shows a single line change:<br><br>
<ul style="line-height:1.2em">
<li><strong>Line 1:</strong> The timestamp was updated from
<span style="font-family:monospace;white-space:pre-wrap">Sat Apr 6 10:46:13 UTC 2024</span> to
<span style="font-family:monospace;white-space:pre-wrap">Sat Apr 6 10:55:04 UTC 2024</span>. </li>
</ul>
<table style="border-collapse:collapse">
<tr><td style="font-family:monospace;color:darkred">--- @ Sat, 06 Apr 2024 10:46:13 +0000</td></tr>
<tr><td style="font-family:monospace;color:darkgreen">+++ @ Sat, 06 Apr 2024 10:55:04 +0000</td></tr>
<tr><td style="background-color:#fbfbfb">@@ -1 +1 @@</td></tr>
<tr style="background-color:#fff0f0;color:#9c1c1c;text-decoration:line-through">
<td>Sat Apr 6 10:46:13 UTC 2024</td>
</tr>
<tr style="background-color:#d1ffd1;color:#082b08"><td>Sat Apr 6 10:55:04 UTC 2024</td></tr>
</table>
<i><small>
---<br>
Summary generated by Google Generative AI (differ directive(s): model=gemini-1.5-pro-latest)
</small></i>
</div>
</embed>

The job directive below will uses a custom ``prompt`` to have the Generative AI make the comparison. This requires a
lot more tokens and time, but may work better in certain cases. More information about writing input prompts for
these models can be found `here <https://ai.google.dev/docs/prompt_best_practices>`__.
The job directive below uses a custom ``prompt`` to have the Generative AI make the comparison. This requires a lot
more tokens and time, but may work better in certain cases. More information about writing input prompts for these
models can be found `here <https://ai.google.dev/docs/prompt_best_practices>`__.

.. code-block:: yaml
Expand All @@ -226,25 +253,26 @@ This differ is currently in BETA and these directives MAY change in the future.
.. model default is retrievable from
https://generativelanguage.googleapis.com/v1beta/models/gemini-1.5-pro-latest?key=$GOOGLE_AI_API_KEY
* ``model``: A `model name <https://ai.google.dev/models/gemini>`__ (default: ``gemini-1.5-pro-latest``).
* ``prompt``: The prompt sent to the model; the strings ``{unified_diff}``, ``{old_data}`` and ``{new_data}`` will
be replaced by the respective content (default: ``Summarize this unified diff:\n\n{unified_diff}``).
* ``model`` (str): A `model name <https://ai.google.dev/models/gemini>`__ (default: ``gemini-1.5-pro-latest``).
* ``prompt`` (str): The prompt sent to the model; the strings ``{unified_diff}``, ``{old_data}`` and ``{new_data}`` will
be replaced by the respective content (default: ``Analyze this unified diff and create a summary listing only the
changes:\n\n{unified_diff}``).
* ``prompt_ud_context_lines`` (int): Number of context lines in the unified diff sent to the model if
``{unified_diff}`` is present in the ``prompt`` (default: 999). If the resulting prompt is estimated to be too big
for the model to handle, the unified diff will be recalculated with the default number of context lines (3).
Note that this unified diff is a different one than the one in the report itself,.
Note that this unified diff is a different one than the diff included in the report itself.
* ``timeout`` (float): The number of seconds before timing out the API call (default: 300).
* ``temperature`` (float between 0.0 and 1.0): The model's Temperature parameter, which controls randomness; higher
values increase diversity (see note below) (default: 0.0).
* ``top_k`` (int 1 or greater): The model's TopK parameter, i.e. sample from the k most likely next tokens at
* ``top_k`` (int of 1 or greater): The model's TopK parameter, i.e. sample from the k most likely next tokens at
each step; lower k focuses on higher probability tokens (see note below) (default: model-dependent, but typically 1,
see Google documentation; not available in ``gemini-1.5-pro-latest``).
* ``top_p`` (float between 0.0 and 1.0): The model's TopP parameter, or the cumulative probability cutoff for token
selection; lower p means sampling from a smaller, more top-weighted nucleus and reduce diversity (see note below)
selection; lower p means sampling from a smaller, more top-weighted nucleus and reduces diversity (see note below)
(default: model-dependent, but typically 0.95 or 1.0, see Google documentation).
* ``token_limit`` (int): An override of the maximum size of the model's context window (used for internal testing).
* ``unified`` (dict): directives passed to :ref:`unified differ <unified_diff>` for the unified differ attached to the
output.
* ``unified`` (dict): directives passed to :ref:`unified differ <unified_diff>`, which prepares the unified diff
attached to this report.
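
How the ``prompt`` placeholders and ``prompt_ud_context_lines`` interact can be sketched in a few lines of Python.
This is an illustration under stated assumptions, not the differ's actual implementation: ``build_prompt`` is a
hypothetical helper, and the real code may build the diff differently.

```python
import difflib
import re

DEFAULT_PROMPT = "Analyze this unified diff and create a summary listing only the changes:\n\n{unified_diff}"


def build_prompt(prompt: str, old_data: str, new_data: str, context_lines: int = 999) -> str:
    """Fill the {unified_diff}, {old_data} and {new_data} placeholders (hypothetical helper)."""
    values = {"old_data": old_data, "new_data": new_data}
    if "{unified_diff}" in prompt:
        # Only compute the diff when the prompt asks for it; a large number of
        # context lines effectively sends the whole document along with the changes.
        values["unified_diff"] = "".join(
            difflib.unified_diff(
                old_data.splitlines(keepends=True),
                new_data.splitlines(keepends=True),
                n=context_lines,
            )
        )
    # A single-pass substitution leaves any other braces in the prompt or data untouched.
    return re.sub(r"\{(unified_diff|old_data|new_data)\}", lambda m: values[m.group(1)], prompt)


print(build_prompt(DEFAULT_PROMPT, "a\nb\n", "a\nc\n"))
```

A single pass is used rather than chained ``str.replace`` calls so that placeholder-like text inside the data itself
is never substituted.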

Directives for the underlying :ref:`unified differ <unified_diff>` can be passed in as a key called ``unified``, as
follows:
Expand Down Expand Up @@ -276,9 +304,9 @@ follows:

command
-------
Executes an outside command to use an external differ (e.g. wdiff). The external program will have to exit with a
status of 0 if no differences were found, a status of 1 if any differences were found, or any other status for any
error.
Call an external differ (e.g. wdiff). The old data and the new data are each written to a temporary file, and the
names of the two files are appended to the command. The external program will have to exit with a status of 0 if no
differences were found, a status of 1 if any differences were found, or any other status for any error.

If ``wdiff`` is used, its output will be colorized when displayed on stdout (typically a screen) and for HTML reports.
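
The exit-status contract described above can be illustrated with a minimal external differ written in Python. This is
a sketch, not part of **webchanges**: the function ``main`` and the script name in the usage message are hypothetical,
and it uses the standard library's ``difflib`` rather than wdiff.

```python
import difflib
import sys
from pathlib import Path
from typing import List


def main(argv: List[str]) -> int:
    """Compare two text files; return 0 if identical, 1 if different, 2 on error."""
    if len(argv) != 2:
        print("usage: mydiffer.py OLD_FILE NEW_FILE", file=sys.stderr)
        return 2
    try:
        old_lines = Path(argv[0]).read_text(encoding="utf-8").splitlines(keepends=True)
        new_lines = Path(argv[1]).read_text(encoding="utf-8").splitlines(keepends=True)
    except (OSError, UnicodeDecodeError) as e:
        print(f"error: {e}", file=sys.stderr)
        return 2
    diff = list(difflib.unified_diff(old_lines, new_lines, "old", "new"))
    sys.stdout.writelines(diff)  # the differences are reported on stdout
    return 1 if diff else 0


# When saved as a standalone script, finish with:
#     sys.exit(main(sys.argv[1:]))
```

The two file names arrive as the last two arguments, so the script only needs to read them, print any differences
and translate the outcome into the exit status.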

data_type: xml
ignore_order: true
Example diff:

.. raw:: html

<embed>
<div style="padding:12px;margin-bottom:24px;font-family:Roboto,sans-serif;font-size:13px;
border:1px solid#e1e4e5;background:white;"><span style="font-family:monospace;white-space:pre-wrap;font-size:13px;">Differ: deepdiff for json
<span style="color:darkred">Old Sat, 13 Apr 2024 21:19:36 +0800</span>
<span style="color:darkgreen">New Sun, 14 Apr 2024 21:24:14 +0800</span>
------------------------------------
• Type of [&#39;Items&#39;][0][&#39;<wbr>CurrentInventory&#39;] changed from int to NoneType and value changed from <span style="background-color:#fff0f0;color:#9c1c1c;text-decoration:line-through">&quot;1&quot;</span> to <span style="background-color:#d1ffd1;color:#082b08">None</span>.
• Type of [&#39;Items&#39;][0][&#39;<wbr>Description&#39;] changed from str to NoneType and value changed from <span style="background-color:#fff0f0;color:#9c1c1c;text-decoration:line-through">&quot;Gadget&quot;</span> to <span style="background-color:#d1ffd1;color:#082b08">None</span>.
</span>
</div>
</embed>


Optional directives
```````````````````
* ``data_type`` (``json`` or ``xml``): The type of data being analyzed (default: ``json``).
Expand Down Expand Up @@ -367,17 +412,17 @@ Optional directives
```````````````````
This differ is currently in BETA and the directives may change in the future.

* ``data_type`` (``url``, ``filename``, ``ascii85`` or ``base64``): What the data represent: a link to the image, the
path to the file containing the image or the image itself as `Ascii85 <https://en.wikipedia.org/wiki/Ascii85>`__ or
`RFC 4648 <https://datatracker.ietf.org/doc/html/rfc4648.html>`__ `Base_64 <https://en.wikipedia.org/wiki/Base64>`__
text (default: ``url``).
* ``data_type`` (``url``, ``filename``, ``ascii85`` or ``base64``): The type of data to process: a link to the image,
the path to the file containing the image, or the image itself encoded as `Ascii85
<https://en.wikipedia.org/wiki/Ascii85>`__ or `RFC 4648 <https://datatracker.ietf.org/doc/html/rfc4648.html>`__
`Base_64 <https://en.wikipedia.org/wiki/Base64>`__ text (default: ``url``).
* ``mse_threshold`` (float): The minimum mean squared error (MSE) between two images to consider them changed;
requires the package ``numpy`` to be installed (default: 2.5).
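
To make ``mse_threshold`` concrete, here is a minimal pure-Python sketch of a mean squared error comparison. It is an
illustration only: the differ itself relies on ``numpy``, the helper names are hypothetical, images are assumed to be
flattened to equal-length sequences of pixel values, and treating an MSE equal to the threshold as a change is an
assumption drawn from the word "minimum" above.

```python
from typing import Sequence


def mean_squared_error(old_pixels: Sequence[float], new_pixels: Sequence[float]) -> float:
    """Average of the squared per-pixel differences between two images."""
    if len(old_pixels) != len(new_pixels):
        raise ValueError("images must have the same number of pixel values")
    if not old_pixels:
        raise ValueError("images must not be empty")
    return sum((a - b) ** 2 for a, b in zip(old_pixels, new_pixels)) / len(old_pixels)


def has_changed(old_pixels: Sequence[float], new_pixels: Sequence[float], mse_threshold: float = 2.5) -> bool:
    """Report a change only when the MSE reaches the threshold (assumed semantics)."""
    return mean_squared_error(old_pixels, new_pixels) >= mse_threshold


old = [10, 10, 10, 10]
print(has_changed(old, [10, 10, 10, 12]))  # MSE = 4 / 4 = 1.0 -> False
print(has_changed(old, [10, 10, 10, 14]))  # MSE = 16 / 4 = 4.0 -> True
```

Raising the threshold makes the differ more tolerant of small pixel-level noise such as compression artifacts.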

.. note:: If you pass a ``url`` or ``filename`` to the differ, it will detect changes only if the url or
filename changes, not if the image behind the url/filename does. To detect changes in an image when the url or
filename doesn't change, build a job that captures the image itself encoded in Ascii85 or Base64 (potentially using
the :ref:`ascii85` filter) and set ``data_type: ascii85`` or ``data_type: base64``.
filename doesn't change, build a job that captures the image itself encoded in Ascii85 (preferably; see the
:ref:`ascii85` filter) or Base64 and set ``data_type`` accordingly.

Required packages
`````````````````
Expand Down Expand Up @@ -411,6 +456,8 @@ deleted, or changed, but the HTML table format produced by Python's `difflib.Htm
.diff_chg { color: orange; background-color: lightyellow; }
</style>
<!-- Created in Python 3.12 -->
<div style="padding:12px;margin-bottom:24px;font-family:Roboto,sans-serif;font-size:13px;
border:1px solid#e1e4e5;background:white;">
<table class="diff" id="difflib_chg_to0__top" cellspacing="0" cellpadding="0" rules="groups" >
<colgroup></colgroup> <colgroup></colgroup> <colgroup></colgroup>
<colgroup></colgroup> <colgroup></colgroup> <colgroup></colgroup>
Expand Down Expand Up @@ -449,12 +496,12 @@ deleted, or changed, but the HTML table format produced by Python's `difflib.Htm
</tr>
</tbody>
</table>
</div>
</embed>

For backwards compatibility, this is the default differ for an ``html`` reporter with the configuration setting
``diff`` (deprecated) set to ``html``.


.. code-block:: yaml
url: https://example.net/table.html
