Releases: mborsetti/webchanges
v3.22
⚠ Breaking Changes
- Developers integrating custom Python code (hooks.py) should refer to the "Internals" section below for important changes.
Changed
- Snapshot database
  - Moved the snapshot database from the "user_cache" directory (typically not backed up) to the "user_data" directory. The new paths are (typically):
    - Linux: ~/.local/share/webchanges or $XDG_DATA_HOME/webchanges
    - macOS: ~/Library/Application Support/webchanges
    - Windows: %LOCALAPPDATA%\webchanges\webchanges
  - Renamed the file from cache.db to snapshots.db to more clearly denote its contents.
  - Introduced a new command line option --database to specify the filename for the snapshot database, replacing the previous --cache option (which is deprecated but still supported).
  - Many thanks to Markus Weimar <https://github.com/Markus00000>__ for pointing this problem out in issue #75 <https://github.com/mborsetti/webchanges/issues/75>__.
- Modified the command line argument --test-differ to accept a second parameter, specifying the maximum number of diffs to generate.
- Updated the command line argument --dump-history to display the mime_type attribute when present.
- Enhanced differs functionality:
  - Standardized headers for deepdiff and imagediff to align more closely with those of unified.
  - Improved the google_ai differ:
    - Enhanced error handling: the differ now continues operating and reports errors rather than failing outright when Google API errors occur.
    - Changed the default prompt to "Analyze this unified diff and create a summary listing only the changes:\n\n{unified_diff}" for improved results.
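The default prompt above contains a {unified_diff} placeholder. As a minimal sketch of how such a prompt could be filled in, the snippet below builds a unified diff with Python's standard difflib module and substitutes it into the prompt text. The function name build_prompt and the sample data are hypothetical, and this is not the differ's actual implementation.

```python
import difflib

# Default prompt text as quoted in the release notes; "\n\n" is two newlines.
DEFAULT_PROMPT = (
    'Analyze this unified diff and create a summary listing only the changes:'
    '\n\n{unified_diff}'
)


def build_prompt(old: str, new: str, prompt: str = DEFAULT_PROMPT) -> str:
    """Return the prompt with a unified diff of old vs. new substituted in.

    Hypothetical helper for illustration only.
    """
    diff_lines = difflib.unified_diff(
        old.splitlines(),
        new.splitlines(),
        fromfile='old',
        tofile='new',
        lineterm='',
    )
    return prompt.format(unified_diff='\n'.join(diff_lines))


if __name__ == '__main__':
    print(build_prompt('price: 10\nstock: yes', 'price: 12\nstock: yes'))
```

The resulting string is what would be sent to the model; only the placeholder is replaced, so a custom prompt can reuse the same mechanism.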
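The typical snapshot database locations listed under "Snapshot database" can be illustrated with a small standard-library sketch. The function default_snapshot_db is hypothetical, it assumes snapshots.db sits directly inside the listed directory, and webchanges itself may resolve these paths differently (for example through a helper library).

```python
from pathlib import PurePosixPath, PureWindowsPath

APP = 'webchanges'
DB_FILENAME = 'snapshots.db'  # new filename introduced in v3.22


def default_snapshot_db(platform: str, env: dict, home: str) -> str:
    """Return the typical snapshot database path for a platform.

    platform follows sys.platform values ('linux', 'darwin', 'win32').
    Hypothetical helper; the paths mirror the list in the release notes.
    """
    if platform == 'win32':
        # %LOCALAPPDATA%\webchanges\webchanges
        base = PureWindowsPath(env['LOCALAPPDATA']) / APP / APP
        return str(base / DB_FILENAME)
    if platform == 'darwin':
        # ~/Library/Application Support/webchanges
        base = PurePosixPath(home) / 'Library' / 'Application Support' / APP
        return str(base / DB_FILENAME)
    # Linux: $XDG_DATA_HOME/webchanges if set, else ~/.local/share/webchanges
    data_home = env.get('XDG_DATA_HOME') or str(PurePosixPath(home) / '.local' / 'share')
    return str(PurePosixPath(data_home) / APP / DB_FILENAME)


if __name__ == '__main__':
    print(default_snapshot_db('linux', {}, '/home/user'))
```

A different location can still be chosen explicitly with the --database option described above.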
Fixed
- Fixed an AttributeError exception when the fallback HTTP client package requests is not installed, as reported by yubiuser <https://github.com/yubiuser>__ in issue #76 <https://github.com/mborsetti/webchanges/issues/76>__.
- Addressed a ValueError in the --test-differ command, a regression reported by Markus Weimar <https://github.com/Markus00000>__ in issue #79 <https://github.com/mborsetti/webchanges/issues/79>__.
- To prevent overlooking changes, webchanges now refrains from saving a new snapshot if a differ operation fails with an exception.
Internals
- New mime_type attribute: we are now capturing and storing the data type (as a MIME type) alongside data in the snapshot database to facilitate future automation of filtering, diffing, and reporting. Developers using custom Python code will need to update their filter and retrieval methods in classes inheriting from FilterBase and JobBase, respectively, to accommodate the mime_type attribute. Detailed updates are available in the hooks documentation <https://webchanges.readthedocs.io/en/stable/hooks.html#:~:text=Changed%20in%20version%203.22>__.
- Updated terminology: References to cache in object names have been replaced with ssdb (snapshot database).
- Int
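To give a feel for the kind of change the mime_type attribute implies for custom filters, here is a rough sketch. It assumes the filter method now receives the MIME type as an extra argument and returns it together with the data; the linked hooks documentation is authoritative for the real signature. FilterBase below is a stand-in class so the sketch runs on its own, and the filter name is hypothetical.

```python
from __future__ import annotations

from typing import Any


class FilterBase:
    """Stand-in for the webchanges base class so this sketch runs on its own."""


class StripBlankLinesFilter(FilterBase):
    """Hypothetical custom filter that drops blank lines from text data."""

    __kind__ = 'strip_blank_lines'  # hypothetical filter name

    def filter(
        self, data: str | bytes, mime_type: str, subfilter: dict[str, Any]
    ) -> tuple[str | bytes, str]:
        if isinstance(data, bytes):
            # Leave binary data untouched and pass its MIME type through.
            return data, mime_type
        kept = [line for line in data.splitlines() if line.strip()]
        # Assumption: the method returns the data together with its MIME type.
        return '\n'.join(kept), mime_type
```

A filter that changes the nature of the data (for example, HTML to plain text) would presumably return the new MIME type instead of passing the incoming one through.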
v3.21
Added
- Job selectable differs: The differ, i.e. the method by which changes are detected and summarized, can now be selected job by job. Also gone is the restriction to have only unified diffs, HTML table diff, or calling an outside executable, as differs have become modular.
  - Python programmers can write their own custom differs using the hooks.py file.
  - Backward-compatibility is preserved, so your current jobs will continue to work.
- New differs:
  - deepdiff to report element-by-element changes in JSON or XML structured data.
  - imagediff (BETA) to report an image showing changes in an image being tracked.
  - ai_google (BETA) to use Generative AI to provide a summary of changes (free API key required). We use Google's Gemini Pro 1.5 since it is the first model that can ingest 1M tokens, allowing it to analyze changes in long documents (up to 350,000 words, or about 700 pages single-spaced) such as terms and conditions, privacy policies, etc. where summarization adds the most value and which other models can't handle. The differ can call the Gen AI model to summarize a unified diff or to find and summarize the differences itself. Also supported is Gemini 1.0, but it can handle a lower number of tokens.
Changed
- Filter absolute_links now converts URLs of the action, href and src attributes in any HTML tag, as well as the data attribute of the <object> tag; it previously converted only the href attribute of <a> tags.
- Updated explanatory text and error messages for increased clarity.
- You can now select jobs to run by using their url/command instead of their number, e.g. webchanges https://test.com is just as valid as webchanges 1.
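The broadened absolute_links behavior described above can be approximated with the standard library. This is a regex-based sketch under simplifying assumptions (quoted attribute values, no attempt to skip attribute-like text outside tags); the function make_links_absolute is hypothetical and is not how webchanges implements the filter.

```python
import re
from urllib.parse import urljoin

# action, href and src are converted in any tag.
_ANY_TAG_ATTR = re.compile(
    r'''(\s(?:action|href|src)\s*=\s*)(["'])(.*?)\2''', re.IGNORECASE | re.DOTALL
)
# The data attribute is converted only inside <object> tags.
_OBJECT_TAG = re.compile(r'<object\b[^>]*>', re.IGNORECASE)
_DATA_ATTR = re.compile(r'''(\sdata\s*=\s*)(["'])(.*?)\2''', re.IGNORECASE | re.DOTALL)


def make_links_absolute(html: str, base_url: str) -> str:
    """Rewrite relative URLs in selected attributes against base_url."""

    def join(match: re.Match) -> str:
        prefix, quote, url = match.groups()
        return f'{prefix}{quote}{urljoin(base_url, url)}{quote}'

    html = _ANY_TAG_ATTR.sub(join, html)
    # Apply the data-attribute rule within each <object ...> opening tag only.
    return _OBJECT_TAG.sub(lambda m: _DATA_ATTR.sub(join, m.group(0)), html)


if __name__ == '__main__':
    page = '<a href="/a">x</a><img src="b.png"><object data="/movie.swf"></object>'
    print(make_links_absolute(page, 'https://example.com/dir/page.html'))
```

A production implementation would normally use a real HTML parser rather than regular expressions.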
Deprecated
- Job directive diff_tool. Replaced with the command differ (see here <https://webchanges.readthedocs.io/en/stable/differs.html#command_diff>__).
Fixed
- webchanges --errors will no longer check jobs that have enabled: false (thanks to yubiuser <https://github.com/yubiuser>__ for reporting this in issue #73 <https://github.com/mborsetti/webchanges/issues/73>__).
- Markdown links with no text were not clickable when converted to HTML; conversion now adds a 'Link without text' label.
Internals
- Improved speed of creating a unified diff for an HTML report.
- Reduced excessive logging from httpx's sub-modules hpack and httpcore when running with -vv.
v3.20.2
Fixed
- Parsing the "to" address for the sendmail email reporter.
v3.20.1
Fixed
- Regression introduced in supporting sending to multiple "to" addresses.
v3.20
Added
- re.findall filter to extract, delete or replace non-overlapping text using Python re.findall.
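For readers unfamiliar with the underlying function, the following sketch shows what Python's re.findall returns on its own. The sample text and patterns are made up for illustration, and how the filter post-processes the matches is not shown here.

```python
import re

text = 'Widget A: $19.99, Widget B: $5.00, shipping: $3.50'

# Without groups, findall returns every non-overlapping match as a string.
prices = re.findall(r'\$\d+\.\d{2}', text)

# With one capturing group, findall returns only the captured part.
widget_names = re.findall(r'Widget (\w):', text)

if __name__ == '__main__':
    print(prices)
    print(widget_names)
```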
Changed
- --test-reporter now allows testing of reporters that are not enabled; if a reporter is not enabled, a warning will be issued. This simplifies testing.
- email reporter (both SMTP and sendmail) supports sending to multiple "to" addresses.
Fixed
- Reports from jobs with monospace: true were not being rendered correctly in Gmail.
v3.19.1
Fixed
- Added the Date header field to SMTP email messages to ensure the timestamp is present even when it is not added by the server upon receipt. Contributed by Dominik <https://github.com/DL6ER>__ in #71 <https://github.com/mborsetti/webchanges/pull/71>__.
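As an illustration of the fix above, a Date header can be set explicitly when composing a message with Python's standard email package. The helper build_message and the addresses are hypothetical; this is a sketch of the general technique, not the reporter's actual code.

```python
from email.message import EmailMessage
from email.utils import formatdate


def build_message(sender: str, recipient: str, subject: str, body: str) -> EmailMessage:
    """Compose a message that carries its own Date header."""
    msg = EmailMessage()
    msg['From'] = sender
    msg['To'] = recipient
    msg['Subject'] = subject
    # Set the timestamp ourselves instead of relying on the receiving server.
    msg['Date'] = formatdate(localtime=True)
    msg.set_content(body)
    return msg


if __name__ == '__main__':
    print(build_message('a@example.com', 'b@example.com', 'Test', 'Hello')['Date'])
```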
v3.19
Fixed
- Under certain circumstances, certain default job directives declared in the configuration file would not be applied to jobs.
- Fixed automatic fallback to requests when the required HTTP client package httpx is not installed.
Added
- block_elements directive for jobs with use_browser: true is supported again and can be used to improve speed by preventing binary and media content loading, while providing all elements required for dynamic web page loading (see the advanced section of the documentation for a suggestion of elements to block). This was available under Pyppeteer and has been reintroduced for Playwright.
- init_script directive for jobs with use_browser: true to execute JavaScript in Chrome after launching it and before navigating to url. This can be useful to e.g. unset certain default Chrome navigator properties by calling a JavaScript function to do so.
v3.18.1
Fixed
- Fixed regression whereby configuration key empty-diff was inadvertently renamed empty_diff.
v3.18
Fixed
- Fixed incorrect handling of HTTP client libraries when httpx is not installed (should gracefully fall back to requests). Reported by drws <https://github.com/drws>__ as an add-on to issue #66 <https://github.com/mborsetti/webchanges/issues/66>__.
Added
- Job directive enabled to allow disabling of a job without removing or commenting it in the jobs file (contributed by James Hewitt <https://github.com/Jamstah>__ upstream <https://github.com/thp/urlwatch/pull/785>__).
- webhook reporter has a new rich_text config option for preformatted rich text for Slack (contributed by K̶e̶v̶i̶n̶ <https://github.com/vimagick>__ upstream <https://github.com/thp/urlwatch/pull/780>__).
Changed
- Command line argument --errors now uses conditional requests to improve speed. Do not use to test newly modified jobs since websites reporting no changes from the last snapshot stored by webchanges are skipped; use --test instead.
- If the simplejson library is installed, it will be used instead of the built-in json module (see https://stackoverflow.com/questions/712791).
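The simplejson preference described above follows a common optional-import pattern, sketched below. The alias jsonlib and the sample data are illustrative only; the snippet runs whether or not simplejson is installed.

```python
try:
    import simplejson as jsonlib  # optional third-party package, used if present
except ImportError:
    import json as jsonlib  # standard-library fallback

# Both modules expose the same loads/dumps functions for basic use.
data = jsonlib.loads('{"name": "webchanges", "jobs": 3}')
text = jsonlib.dumps(data, sort_keys=True)

if __name__ == '__main__':
    print(jsonlib.__name__, text)
```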
v3.17.2
Fixed
- Exception in error handling when requests is not installed (reported by yubiuser <https://github.com/yubiuser>__ in #66 <https://github.com/mborsetti/webchanges/issues/66>__).