Releases: mborsetti/webchanges

v3.22

25 Apr 05:03

⚠ Breaking Changes

  • Developers integrating custom Python code (hooks.py) should refer to the "Internals" section below for important
    changes.

Changed

  • Snapshot database

    • Moved the snapshot database from the "user_cache" directory (typically not backed up) to the "user_data" directory.
      The new paths are (typically):

      • Linux: ~/.local/share/webchanges or $XDG_DATA_HOME/webchanges
      • macOS: ~/Library/Application Support/webchanges
      • Windows: %LOCALAPPDATA%\webchanges\webchanges
    • Renamed the file from cache.db to snapshots.db to more clearly denote its contents.

    • Introduced a new command line option --database to specify the filename for the snapshot database, replacing
      the previous --cache option (which is deprecated but still supported).

    • Many thanks to Markus Weimar <https://github.com/Markus00000>__ for pointing this problem out in issue #75 <https://github.com/mborsetti/webchanges/issues/75>__.

  • Modified the command line argument --test-differ to accept a second parameter, specifying the maximum number of
    diffs to generate.

  • Updated the command line argument --dump-history to display the mime_type attribute when present.

  • Enhanced differs functionality:

    • Standardized headers for deepdiff and imagediff to align more closely with those of unified.

    • Improved the ai_google differ:

      • Enhanced error handling: now, the differ will continue operation and report errors rather than failing outright
        when Google API errors occur.
      • Changed the default prompt to "Analyze this unified diff and create a summary listing only the changes:\n\n{unified_diff}" for improved results.
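
The snapshot database locations listed under "Snapshot database" above can be summarized in a small sketch. This is only an illustration built from the typical paths in this changelog, not webchanges' actual lookup code; the function name `snapshot_db_path` and its parameters are hypothetical.

```python
from pathlib import PurePosixPath, PureWindowsPath
from typing import Dict


def snapshot_db_path(platform: str, env: Dict[str, str], home: str) -> str:
    """Return the typical snapshots.db location for a platform (illustrative only)."""
    if platform == "win32":
        # Assumes LOCALAPPDATA is set, as it normally is on Windows.
        base = PureWindowsPath(env["LOCALAPPDATA"])
        return str(base / "webchanges" / "webchanges" / "snapshots.db")
    if platform == "darwin":
        base = PurePosixPath(home) / "Library" / "Application Support"
        return str(base / "webchanges" / "snapshots.db")
    # Linux: $XDG_DATA_HOME takes precedence when set, else ~/.local/share.
    base = PurePosixPath(env.get("XDG_DATA_HOME") or f"{home}/.local/share")
    return str(base / "webchanges" / "snapshots.db")
```

On a real system the arguments would come from `sys.platform`, `os.environ` and `Path.home()`; a different file can still be chosen with the --database option.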

Fixed

  • Fixed an AttributeError Exception when the fallback HTTP client package requests is not installed, as reported
    by yubiuser <https://github.com/yubiuser>__ in issue #76 <https://github.com/mborsetti/webchanges/issues/76>__.
  • Addressed a ValueError in the --test-differ command, a regression reported by Markus Weimar <https://github.com/Markus00000>__ in issue #79 <https://github.com/mborsetti/webchanges/issues/79>__.
  • To prevent overlooking changes, webchanges now refrains from saving a new snapshot if a differ operation fails
    with an Exception.
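
The reasoning behind the last fix can be pictured with a minimal sketch. The names `run_job`, `differ` and `snapshots` are hypothetical and not part of webchanges; the point is only that the stored snapshot stays untouched when the differ raises, so the same change is compared again on the next run instead of being silently absorbed.

```python
from typing import Callable, List


def run_job(new_data: str, snapshots: List[str], differ: Callable[[str, str], str]) -> str:
    """Diff new_data against the latest snapshot; store it only if the differ succeeds."""
    old_data = snapshots[-1]
    try:
        report = differ(old_data, new_data)
    except Exception as exc:
        # Differ failed: report the error and keep the old snapshot.
        return f"differ error: {exc}"
    # Differ succeeded: the new data becomes the latest snapshot.
    snapshots.append(new_data)
    return report
```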

Internals

  • New mime_type attribute: we are now capturing and storing the data type (as a MIME type) alongside data in the
    snapshot database to facilitate future automation of filtering, diffing, and reporting. Developers using custom
    Python code will need to update their filter and retrieval methods in classes inheriting from FilterBase and
    JobBase, respectively, to accommodate the mime_type attribute. Detailed updates are available in the hooks documentation <https://webchanges.readthedocs.io/en/stable/hooks.html#:~:text=Changed%20in%20version%203.22>__.
  • Updated terminology: References to cache in object names have been replaced with ssdb (snapshot database).
  • Int
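
As a rough illustration of the mime_type change described above, the sketch below shows a custom filter that receives a MIME type together with the data and returns both. The `FilterBase` class here is a stand-in defined only so the example runs on its own, and the method signature is an assumption; check the hooks documentation for the exact signatures expected by webchanges.

```python
from __future__ import annotations

from typing import Any


class FilterBase:
    """Stand-in for illustration only; NOT the real webchanges FilterBase."""


class UpperCaseFilter(FilterBase):
    """Hypothetical custom filter that upper-cases text and passes mime_type through."""

    __kind__ = "uppercase"

    def filter(
        self, data: str | bytes, mime_type: str, subfilter: dict[str, Any]
    ) -> tuple[str | bytes, str]:
        # Assumed shape: data and its MIME type come in, and both are returned.
        if isinstance(data, bytes):
            data = data.decode()
        return data.upper(), mime_type
```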

v3.21

16 Apr 04:52

Added

  • Job selectable differs: The differ, i.e. the method by which changes are detected and summarized, can now be
    selected job by job. Also gone is the restriction to have only unified diffs, HTML table diff, or calling an outside
    executable, as differs have become modular.

    • Python programmers can write their own custom differs using the hooks.py file.
    • Backward-compatibility is preserved, so your current jobs will continue to work.
  • New differs:

    • deepdiff to report element-by-element changes in JSON or XML structured data.
    • imagediff (BETA) to report an image showing changes in an image being tracked.
    • ai_google (BETA) to use a Generative AI model to provide a summary of changes (free API key required). We use
      Google's Gemini Pro 1.5 since it is the first model that can ingest 1M tokens, allowing it to analyze changes in
      long documents (up to 350,000 words, or about 700 pages single-spaced) such as terms and conditions, privacy
      policies, etc. where summarization adds the most value and which other models can't handle. The differ can call
      the Gen AI model to summarize a unified diff or to find and summarize the differences itself. Also supported is
      Gemini 1.0, but it can handle a lower number of tokens.

Changed

  • Filter absolute_links now converts URLs of the action, href and src attributes in any HTML tag, as
    well as the data attribute of the <object> tag; it previously converted only the href attribute of
    <a> tags.
  • Updated explanatory text and error messages for increased clarity.
  • You can now select jobs to run by their url/command instead of their number, e.g. webchanges https://test.com is
    just as valid as webchanges 1.
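
The broadened absolute_links behaviour described in the first item above can be approximated with the standard library. This is a simplified regex-based sketch, not webchanges' implementation (it ignores unquoted attribute values and ">" inside attribute values), and the function name merely mirrors the filter's name for readability.

```python
import re
from urllib.parse import urljoin

TAG = re.compile(r"<(\w+)\b([^>]*)>")
ATTR = re.compile(r"""(?<![\w-])(action|href|src|data)=(["'])(.*?)\2""", re.IGNORECASE)


def absolute_links(html: str, base_url: str) -> str:
    """Make action/href/src URLs absolute in any tag, and data only in <object>."""

    def fix_tag(tag: re.Match) -> str:
        name, attrs = tag.group(1), tag.group(2)

        def fix_attr(attr: re.Match) -> str:
            key, quote, value = attr.groups()
            if key.lower() == "data" and name.lower() != "object":
                return attr.group(0)  # leave data attributes of other tags alone
            return f"{key}={quote}{urljoin(base_url, value)}{quote}"

        return f"<{name}{ATTR.sub(fix_attr, attrs)}>"

    return TAG.sub(fix_tag, html)
```

For example, with a base URL of https://example.com/dir/page.html, an `<img src="img/a.png">` tag would come out pointing at https://example.com/dir/img/a.png.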

Deprecated

  • Job directive diff_tool. Replaced with the command differ (see here <https://webchanges.readthedocs.io/en/stable/differs.html#command_diff>__).

Fixed

  • webchanges --errors will no longer check jobs that have enabled: false (thanks to yubiuser <https://github.com/yubiuser>__ for reporting this in issue #73 <https://github.com/mborsetti/webchanges/issues/73>__).
  • Markdown links with no text were not clickable when converted to HTML; conversion now adds a 'Link without text'
    label.

Internals

  • Improved speed of creating a unified diff for an HTML report.
  • Reduced excessive logging from httpx's sub-modules hpack and httpcore when running with -vv.

v3.20.2

16 Mar 23:05

Fixed

  • Parsing of the "to" address for the sendmail email reporter.

v3.20.1

16 Mar 05:59

Fixed

  • Regression introduced in supporting sending to multiple "to" addresses.

v3.20

15 Mar 08:30

Added

  • re.findall filter to extract, delete or replace non-overlapping text using Python re.findall.
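
For readers unfamiliar with the underlying function, here is a short sketch of how Python's `re.findall` returns non-overlapping matches. The sample text and pattern are made up for illustration; how the filter's own options map onto this call is described in the webchanges documentation, not here.

```python
import re

text = "Price: $10.50 (was $12.00)"

# Without groups, findall returns every non-overlapping match as a string.
prices = re.findall(r"\$\d+\.\d{2}", text)

# With one capturing group, it returns only the captured part of each match.
amounts = re.findall(r"\$(\d+\.\d{2})", text)
```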

Changed

  • --test-reporter now allows testing of reporters that are not enabled; if a reporter is not enabled, a warning
    will be issued. This simplifies testing.
  • email reporter (both SMTP and sendmail) supports sending to multiple "to" addresses.

Fixed

  • Reports from jobs with monospace: true were not being rendered correctly in Gmail.

v3.19.1

07 Mar 00:10

Fixed

  • Added the Date header field to SMTP email messages to ensure the timestamp is present even when it is not added
    by the server upon receipt. Contributed by Dominik <https://github.com/DL6ER>__ in #71 <https://github.com/mborsetti/webchanges/pull/71>__.
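
A minimal sketch of what setting the Date header looks like with the Python standard library follows. It is not the reporter's actual code; the helper name `build_message` and its parameters are hypothetical.

```python
from email.message import EmailMessage
from email.utils import formatdate
from typing import List


def build_message(sender: str, recipients: List[str], subject: str, body: str) -> EmailMessage:
    """Build an email that carries its own Date header."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    msg["Subject"] = subject
    # Set the timestamp explicitly instead of relying on the receiving server.
    msg["Date"] = formatdate(localtime=True)
    msg.set_content(body)
    return msg
```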

v3.19

28 Feb 11:17

Fixed

  • Under certain circumstances, certain default job directives declared in the configuration file would not be applied
    to jobs.
  • Fixed automatic fallback to requests when the required HTTP client package httpx is not installed.

Added

  • block_elements directive for jobs with use_browser: true is supported again and can be used to improve
    speed by preventing binary and media content from loading, while still loading all elements required for a dynamic
    web page (see the advanced section of the documentation for a suggestion of elements to block). This was available
    under Pyppeteer and has been reintroduced for Playwright.
  • init_script directive for jobs with use_browser: true to execute JavaScript in Chrome after launching it
    and before navigating to url. This can be useful, e.g., to unset certain default Chrome navigator
    properties by calling a JavaScript function.

v3.18.1

20 Feb 01:22

Fixed

  • Fixed regression whereby configuration key empty-diff was inadvertently renamed empty_diff.

v3.18

19 Feb 10:49

Fixed

  • Fixed incorrect handling of HTTP client libraries when httpx is not installed (should gracefully fall back to
    requests). Reported by drws <https://github.com/drws>__ as an add-on to issue #66 <https://github.com/mborsetti/webchanges/issues/66>__.

Added

  • Job directive enabled to allow disabling of a job without removing or commenting it in the jobs file (contributed
    by James Hewitt <https://github.com/Jamstah>__ upstream <https://github.com/thp/urlwatch/pull/785>__).
  • webhook reporter has a new rich_text config option for preformatted rich text for Slack (contributed
    by K̶e̶v̶i̶n̶ <https://github.com/vimagick>__ upstream <https://github.com/thp/urlwatch/pull/780>__).

Changed

  • Command line argument --errors now uses conditional requests to improve speed. Do not use to test newly modified
    jobs since websites reporting no changes from the last snapshot stored by webchanges are skipped; use
    --test instead.
  • If the simplejson library is installed, it will be used instead of the built-in json module (see
    https://stackoverflow.com/questions/712791).
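
The optional simplejson behaviour in the last item follows a common Python pattern, sketched below. The alias `jsonlib` is a hypothetical name chosen for this example, and the snippet is not taken from the webchanges source.

```python
try:
    # Prefer the optional third-party package when it is installed.
    import simplejson as jsonlib
except ImportError:
    # Otherwise fall back to the standard library module.
    import json as jsonlib

parsed = jsonlib.loads('{"name": "webchanges", "jobs": 3}')
```

Because both modules expose the same `loads` and `dumps` functions, the rest of the code can use the alias without caring which one was imported.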

v3.17.2

11 Dec 17:44

Fixed

  • Exception in error handling when requests is not installed (reported by
    yubiuser <https://github.com/yubiuser>__ in #66 <https://github.com/mborsetti/webchanges/issues/66>__).