Releases: mborsetti/webchanges
v3.7.1
⚠ Breaking Changes
- Removed Python 3.6 support to simplify code. Older Python versions are supported for 3 years after being obsoleted by
a new major release; as Python 3.7 was released on 27 June 2018, the last date of Python 3.6 support was 26 June 2021
Changed
- Improved ``telegram`` reporter: it now uses MarkdownV2 and preserves most formatting of HTML sites processed by the
  ``html2text`` filter, e.g. clickable links, bolding, underlining, italics and strikethrough
Added
- New filter ``execute`` to filter the data using an executable without invoking the shell (as ``shellpipe`` does)
  and therefore without exposure to the additional security risks of shell execution
- New sub-directive ``silent`` for the ``telegram`` reporter to receive a notification with no sound (true/false)
  (default: false)
- GitHub Issues templates for bug reports and feature requests
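As an illustration, a minimal sketch of the two new options; the URL, the executable path and the values below are
hypothetical, and the exact configuration layout should be confirmed against the documentation:

```yaml
# jobs.yaml (fragment): filter the retrieved data through an executable,
# bypassing the shell (URL and executable path are illustrative)
url: https://example.com/data.json
filter:
  - execute: /usr/local/bin/my_filter

---
# config.yaml (fragment): Telegram notifications delivered without a sound
report:
  telegram:
    enabled: true
    silent: true  # default: false
```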
Fixed
- Job ``headers`` stored in the configuration file (``config.yaml``) are now merged correctly and case-insensitively
  with those present in the job (in ``jobs.yaml``). A header in the job replaces a header of the same name if already
  present in the configuration file; otherwise it is added to the ones present in the configuration file.
- Fixed ``TypeError: expected string or bytes-like object`` error in cookiejar (called by the ``requests`` module)
  caused by some ``cookies`` being read from the jobs YAML file in other formats
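The merge behavior can be sketched as follows; the section names and header values are illustrative, assuming
defaults are set under a ``job_defaults`` section of the configuration file:

```yaml
# config.yaml (fragment): headers applied to jobs by default (illustrative)
job_defaults:
  all:
    headers:
      Accept-Language: en-US
      User-Agent: default-agent

---
# jobs.yaml: 'user-agent' replaces the default 'User-Agent' (case-insensitive
# match), while 'Accept-Language' is inherited unchanged from config.yaml
url: https://example.com
headers:
  user-agent: custom-agent
```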
Internals
- Strengthened security with `bandit <https://pypi.org/project/bandit/>`__ to catch common security issues
- Standardized code formatting with `black <https://pypi.org/project/black/>`__
- Improved pre-commit speed by using local libraries when practical
- More improvements to type hinting (moving towards testing with `mypy <https://pypi.org/project/mypy/>`__)
- Removed module ``jobs_browser.py`` (needed only for Python 3.6)
v3.6.1
Reminder
Older Python versions are supported for 3 years after being obsoleted by a new major release. As Python 3.7 was
released on 27 June 2018, the codebase will be streamlined by removing support for Python 3.6 on or after 27 June 2021.
Added
- Clearer results messages for the ``--delete-snapshot`` command line argument
Fixed
- First run would fail when creating the new ``config.yaml`` file. Thanks to `David <https://github.com/notDavid>`__
  in issue `#10 <https://github.com/mborsetti/webchanges/issues/10>`__.
- Use the same duration precision in all reports
v3.6.0
Added
- Run a subset of jobs by adding their index number(s) as command line arguments. For example, run ``webchanges 2 3``
  to only run jobs #2 and #3 of your jobs list. Run ``webchanges --list`` to find the job numbers. Suggested by
  `Dan Brown <https://github.com/dbro>`__ upstream `here <https://github.com/thp/urlwatch/pull/641>`__. The API is
  experimental and may change in the near future.
- Support for ``ftp://`` URLs to download a file from an FTP server
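A minimal sketch of a job using the new scheme; the host and path are hypothetical:

```yaml
# jobs.yaml: watch a file on an FTP server (illustrative host and path)
name: FTP report
url: ftp://ftp.example.com/pub/report.txt
```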
Fixed
- Sequential job numbering (skip numbering empty jobs). Suggested by
  `Markus Weimar <https://github.com/Markus00000>`__ in issue
  `#9 <https://github.com/mborsetti/webchanges/issues/9>`__.
- Readthedocs.io failed to build autodoc API documentation
- Error processing jobs with URLs/URIs starting with ``file:///``
Internals
- Improvements to errors and DeprecationWarnings raised during the processing of job directives, and their inclusion
  in tests
- Additional testing, adding 3 percentage points of coverage to reach 75%
- The temporary database written during a run is now memory-first (handled by SQLite3), a speed improvement
- Updated the algorithm that assigns a job to a subclass based on the directives found
- Migrated to the `pathlib <https://docs.python.org/3/library/pathlib.html>`__ standard library
Known issues
- ``url`` jobs with ``use_browser: true`` (i.e. using ``Pyppeteer``) will at times display the below error message in
  stdout (terminal console). This does not affect ``webchanges``, as all data is downloaded, and hopefully it will be
  fixed in the future (see `Pyppeteer issue #225 <https://github.com/pyppeteer/pyppeteer/issues/225>`__)::

    future: <Future finished exception=NetworkError('Protocol error Target.sendMessageToTarget: Target closed.')>
    pyppeteer.errors.NetworkError: Protocol error Target.sendMessageToTarget: Target closed.
    Future exception was never retrieved
v3.5.1
Fixed
- Crash with ``RuntimeError: dictionary changed size during iteration`` when using custom headers; updated testing
  scenarios
- Autodoc not building API documentation
v3.5.0
Added
- New sub-directives for the ``strip`` filter:

  - ``chars``: Set of characters to be removed (default: whitespace)
  - ``side``: One-sided removal, either ``left`` (leading characters) or ``right`` (trailing characters)
  - ``splitlines``: Whether to apply the filter on each line of text (true/false) (default: ``false``, i.e. apply to
    the entire data)
- ``--delete-snapshot`` command line argument: removes the latest saved snapshot of a job from the database; useful
  if a change in a website (e.g. layout) requires modifying filters, as the invalid snapshot can be deleted and
  ``webchanges`` rerun to create a truthful diff
- ``--log-level`` command line argument to control the amount of logging displayed by the ``-v`` argument
- ``ignore_connection_errors``, ``ignore_timeout_errors``, ``ignore_too_many_redirects`` and
  ``ignore_http_error_codes`` directives now work with ``url`` jobs having ``use_browser: true`` (i.e. using
  ``Pyppeteer``)
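A sketch of the new ``strip`` sub-directives in use; the URL and the character set are illustrative:

```yaml
# jobs.yaml: remove trailing spaces and dots from each line of the data
url: https://example.com
filter:
  - strip:
      chars: ' .'        # set of characters to remove (default: whitespace)
      side: right        # trailing characters only
      splitlines: true   # apply per line instead of to the entire data
```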
Changed
- The ``additions_only`` diff-filter will no longer report additions that consist exclusively of added empty lines
  (issue `#6 <https://github.com/mborsetti/webchanges/issues/6>`__, contributed by
  `Fedora7 <https://github.com/Fedora7>`__)
- The ``deletions_only`` diff-filter will no longer report deletions that consist exclusively of deleted empty lines
- The job's index number is included in error messages for clarity
- ``--smtp-password`` now checks that the credentials work with the SMTP server (i.e. logs in)
Fixed
- First run after install was not creating new files correctly (inherited from ``urlwatch``); now ``webchanges``
  creates the default directory, config and/or jobs files if not found when running (issue
  `#8 <https://github.com/mborsetti/webchanges/issues/8>`__, contributed by
  `rtfgvb01 <https://github.com/rtfgvb01>`__)
- ``--test-diff`` command line argument was showing historical diffs in the wrong order; now showing the most recent
  first
- An error is now raised when a ``url`` job with ``use_browser: true`` returns no data due to an HTTP error (e.g.
  proxy_authentication_required)
- Jobs were included in the email subject line even if there was nothing to report after filtering with
  ``additions_only`` or ``deletions_only``
- ``hexdump`` filter now correctly formats lines with fewer than 16 bytes
- ``sha1sum`` and ``hexdump`` filters now accept data that is bytes (not just text)
- An error is now raised when a legacy ``minidb`` database is found but cannot be converted because the ``minidb``
  package is not installed
- Removed an extra unneeded file from being installed
- The wrong ETag was being captured when a URL redirection took place
Internals
- ``Pyppeteer`` (``url`` jobs with ``use_browser: true``) now captures and saves the ETag
- Snapshot timestamps are more accurate (they reflect when the job was launched)
- Each job now has a run-specific unique index_number, assigned sequentially when loading jobs, for use in errors and
  logs for clarity
- Improvements in the function that chunks text into numbered lines, used by certain reporters (e.g. Telegram)
- More tests, increasing code coverage by an additional 7 percentage points to 72% (although keyring testing had to
  be dropped due to issues with GitHub Actions)
- Additional cleanup of code and documentation
Known issues
- ``url`` jobs with ``use_browser: true`` (i.e. using ``Pyppeteer``) will at times display the below error message in
  stdout (terminal console). This does not affect ``webchanges``, as all data is downloaded, and hopefully it will be
  fixed in the future (see `Pyppeteer issue #225 <https://github.com/pyppeteer/pyppeteer/issues/225>`__)::

    future: <Future finished exception=NetworkError('Protocol error Target.sendMessageToTarget: Target closed.')>
    pyppeteer.errors.NetworkError: Protocol error Target.sendMessageToTarget: Target closed.
    Future exception was never retrieved
v3.4.1
Internals
- The temporary database (``sqlite3`` database engine) is copied to the permanent one exclusively using SQL code
  instead of partially using a Python loop
Known issues
- ``url`` jobs with ``use_browser: true`` (i.e. using ``Pyppeteer``) will at times display the below error message in
  stdout (terminal console). This does not affect ``webchanges``, as all data is downloaded, and hopefully it will be
  fixed in the future (see `Pyppeteer issue #225 <https://github.com/pyppeteer/pyppeteer/issues/225>`__)::

    future: <Future finished exception=NetworkError('Protocol error Target.sendMessageToTarget: Target closed.')>
    pyppeteer.errors.NetworkError: Protocol error Target.sendMessageToTarget: Target closed.
    Future exception was never retrieved
v3.4.0
⚠ Breaking Changes
- Fixed the database growing unbounded. The fix only works when running in Python 3.7 or higher and using the new,
  default ``sqlite3`` database engine; in this scenario, only the latest 4 snapshots are kept and older ones are
  purged after every run. The number is selectable with the new ``--max-snapshots`` command line argument. To keep
  the existing grow-to-infinity behavior, run ``webchanges`` with ``--max-snapshots 0``.
Added
- ``--max-snapshots`` command line argument sets the number of snapshots to keep stored in the database; defaults to
  4. If set to 0, an unlimited number of snapshots will be kept. Only applies to Python 3.7 or higher and only works
  if the default ``sqlite3`` database is being used.
- ``no_redirects`` job directive (for ``url`` jobs) to disable GET/OPTIONS/POST/PUT/PATCH/DELETE/HEAD redirection
  (true/false). Suggested by `snowman <https://github.com/snowman>`__ upstream
  `here <https://github.com/thp/urlwatch/issues/635>`__.
- Reporter ``prowl`` for the `Prowl <https://prowlapp.com>`__ push notification client for iOS (only). Contributed
  by `nitz <https://github.com/nitz>`__ upstream in PR `633 <https://github.com/thp/urlwatch/pull/633>`__.
- Filter ``jq`` to parse, transform, and extract ASCII JSON data. Contributed by
  `robgmills <https://github.com/robgmills>`__ upstream in PR `626 <https://github.com/thp/urlwatch/pull/626>`__.
- Filter ``pretty-xml`` as an alternative to ``format-xml`` (backwards-compatible with ``urlwatch`` 2.23)
- Alert the user when the jobs file contains unrecognized directives (e.g. a typo)
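For illustration, a hypothetical job combining two of the additions; the URL and the jq query are illustrative:

```yaml
# jobs.yaml: fetch JSON without following redirects, then extract one value
url: https://example.com/api/status.json
no_redirects: true
filter:
  - jq: '.status'
```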
Changed
- The job name is truncated to 60 characters when derived from the title of a page (i.e. when no ``name`` directive
  is found in a ``url`` job)
- ``--test-diff`` command line argument displays all saved snapshots (no longer limited to 10)
Fixed
- Diff (change) data is no longer lost if ``webchanges`` is interrupted mid-execution or encounters an error in
  reporting: the permanent database is updated only at the very end (after reports are dispatched)
- ``use_browser: false`` was not being interpreted correctly
- The jobs file (e.g. ``jobs.yaml``) is now loaded only once per run
Internals
- The ``sqlite3`` database engine now saves new snapshots to a temporary database, which is copied over to the
  permanent one at execution end (i.e. at database.close())
- Upgraded SMTP email message internals to use Python's
  `email.message.EmailMessage <https://docs.python.org/3/library/email.message.html#email.message.EmailMessage>`__
  instead of ``email.mime`` (obsolete)
- Pre-commit documentation linting using ``doc8``
- Added logging to the ``sqlite3`` database engine
- Additional testing, increasing overall code coverage by an additional 4 percentage points to 65%
- Renamed the legacy module ``browser.py`` to ``jobs_browser.py`` for clarity
- Renamed the class JobsYaml to YamlJobsStorage for consistency and clarity
Known issues
- ``url`` jobs with ``use_browser: true`` (i.e. using ``Pyppeteer``) will at times display the below error message in
  stdout (terminal console). This does not affect ``webchanges``, as all data is downloaded, and hopefully it will be
  fixed in the future (see `Pyppeteer issue #225 <https://github.com/pyppeteer/pyppeteer/issues/225>`__)::

    future: <Future finished exception=NetworkError('Protocol error Target.sendMessageToTarget: Target closed.')>
    pyppeteer.errors.NetworkError: Protocol error Target.sendMessageToTarget: Target closed.
    Future exception was never retrieved
v3.2.6
Changed
- Tweaked colors (esp. green) of HTML reporter to work with Dark Mode
- Restored API documentation using Sphinx's autodoc (removed in 3.2.4 as it was not building correctly)
Internals
- Replaced the custom atomic_rename function with the built-in
  `os.replace() <https://docs.python.org/3/library/os.html#os.replace>`__ (new in Python 3.3), which does the same
  thing
- Added type hinting to the entire code
- Added new tests, increasing coverage to 57%
- GitHub Actions CI now runs faster as it's set to cache required packages from prior runs
Known issues
- Discovered that upstream (legacy) ``urlwatch`` 2.22 code has the database growing to infinity; run
  ``webchanges --clean-cache`` periodically to discard old snapshots until this is addressed in a future release
- ``url`` jobs with ``use_browser: true`` (i.e. using ``Pyppeteer``) will at times display the below error message in
  stdout (terminal console). This does not affect ``webchanges``, as all data is downloaded, and hopefully it will be
  fixed in the future (see `Pyppeteer issue #225 <https://github.com/pyppeteer/pyppeteer/issues/225>`__)::

    future: <Future finished exception=NetworkError('Protocol error Target.sendMessageToTarget: Target closed.')>
    pyppeteer.errors.NetworkError: Protocol error Target.sendMessageToTarget: Target closed.
    Future exception was never retrieved
v3.2.4
.. Categories used (in order):
   ⚠ Breaking Changes for changes that break existing functionality.
   Added for new features.
   Changed for changes in existing functionality.
   Deprecated for soon-to-be removed features.
   Removed for now removed features.
   Fixed for any bug fixes.
   Security in case of vulnerabilities.
   Internals for changes that don't affect users.
Added
- Job directive ``note``: adds a freetext note appearing in the report after the job header
- Job directive ``wait_for_navigation`` for ``url`` jobs with ``use_browser: true`` (i.e. using Pyppeteer): wait for
  navigation to reach a URL starting with the specified one before extracting content. Useful when the URL redirects
  elsewhere before displaying the content you're interested in and Pyppeteer would capture the intermediate page.
- Command line switch ``--rollback-cache TIMESTAMP``: rolls back the snapshot database to a previous time, useful
  when you miss notifications; see `here <https://webchanges.readthedocs.io/en/stable/cli.html#rollback-cache>`__
- Command line switch ``--cache-engine ENGINE``: specify ``minidb`` to continue using the database structure used in
  prior versions and ``urlwatch`` 2. The default ``sqlite3`` creates a smaller database due to data compression with
  `msgpack <https://msgpack.org/index.html>`__; migration from the old minidb database is done automatically and the
  old database is preserved for manual deletion
- Job directive ``block_elements`` for ``url`` jobs with ``use_browser: true`` (i.e. using Pyppeteer) (⚠ ignored in
  Python < 3.7) (experimental feature): specify
  `resource types <https://developer.mozilla.org/en-US/docs/Mozilla/Add-ons/WebExtensions/API/webRequest/ResourceType>`__
  (elements) to skip requesting (downloading) in order to speed up retrieval of the content; only resource types
  `supported by Chromium <https://developer.chrome.com/docs/extensions/reference/webRequest/#type-ResourceType>`__
  are allowed (a typical list includes ``stylesheet``, ``font``, ``image``, and ``media``). ⚠ On certain sites it
  seems to totally freeze execution; test before use.
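A hypothetical job combining the new Pyppeteer-related directives; the URLs and the blocked resource types are
illustrative:

```yaml
# jobs.yaml: wait out an intermediate redirect and skip heavy resources
url: https://example.com/landing
use_browser: true
wait_for_navigation: https://example.com/final
block_elements:
  - stylesheet
  - font
  - image
  - media
```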
Changed
- A new, more efficient indexed database is used, and only the most recent saved snapshot is migrated the first time
  you run this version. This has no effect on the ordinary use of the program other than reducing the number of
  historical results from ``--test-diffs`` until more snapshots are captured. To continue using the legacy database
  format, launch with ``--database-engine minidb`` and ensure that the package ``minidb`` is installed.
- If any jobs have ``use_browser: true`` (i.e. are using Pyppeteer), the maximum number of concurrent threads is set
  to the number of available CPUs instead of the
  `default <https://docs.python.org/3/library/concurrent.futures.html#concurrent.futures.ThreadPoolExecutor>`__ to
  avoid instability due to Pyppeteer's high CPU usage
- The default configuration now specifies the use of Chromium revisions equivalent to Chrome 89.0.4389.72 827102 for
  ``url`` jobs with ``use_browser: true`` (i.e. using Pyppeteer) to increase stability. Note: if you already have a
  configuration file and want to upgrade to this version, see
  `here <https://webchanges.readthedocs.io/en/stable/advanced.html#using-a-chromium-revision-matching-a-google-chrome-chromium-release>`__.
  The Chromium revisions used now are 'linux': 843831, 'win64': 843846, 'win32': 843832, and 'macos': 843846.
- Temporarily removed code autodoc from the documentation as it wasn't building correctly
Fixed
- Specifying ``chromium_revision`` had no effect (bug introduced in version 3.1.0)
- Improved the text of the error message when ``jobs.yaml`` has a mistake in the job parameters
Internals
- Removed the dependency on the ``minidb`` package; now directly using Python's built-in ``sqlite3`` without an
  additional layer, allowing better control and increased functionality
- The database is now smaller due to data compression with `msgpack <https://msgpack.org/index.html>`__
- An old-schema database is automatically detected, and the last snapshot for each job is migrated to the new one,
  preserving the old database file for manual deletion
- No longer backing up the database to ``*.bak`` (introduced in version 3.0.0) now that it can be rolled back
- New command line argument ``--database-engine`` allows selecting the engine; it accepts ``sqlite3`` (default),
  ``minidb`` (legacy compatibility, requires the package by the same name) and ``textfiles`` (creates a text file of
  the latest snapshot for each job)
- When running in Python 3.7 or higher, jobs with ``use_browser: true`` (i.e. using Pyppeteer) are a bit more
  reliable as they are now launched using ``asyncio.run()``; Python therefore takes care of managing the asyncio
  event loop, finalizing asynchronous generators, and closing the threadpool, tasks that were previously handled by
  custom code
- An 11 percentage point increase in code testing coverage, now also testing jobs that retrieve content from the
  internet and (for Python 3.7 and up) use Pyppeteer
Known issues
- ``url`` jobs with ``use_browser: true`` (i.e. using ``Pyppeteer``) will at times display the below error message in
  stdout (terminal console). This does not affect ``webchanges``, as all data is downloaded, and hopefully it will be
  fixed in the future (see `Pyppeteer issue #225 <https://github.com/pyppeteer/pyppeteer/issues/225>`__)::

    future: <Future finished exception=NetworkError('Protocol error Target.sendMessageToTarget: Target closed.')>
    pyppeteer.errors.NetworkError: Protocol error Target.sendMessageToTarget: Target closed.
    Future exception was never retrieved