v3.2.4
.. Categories used (in order):
   ⚠ Breaking Changes for changes that break existing functionality.
   Added for new features.
   Changed for changes in existing functionality.
   Deprecated for soon-to-be removed features.
   Removed for now removed features.
   Fixed for any bug fixes.
   Security in case of vulnerabilities.
   Internals for changes that don't affect users.

Added
- Job directive ``note``: adds a freetext note appearing in the report after the job header
- Job directive ``wait_for_navigation`` for URL jobs with ``use_browser: true`` (i.e. using Pyppeteer): wait for
  navigation to reach a URL starting with the specified one before extracting content. Useful when the URL redirects
  elsewhere before displaying the content you're interested in and Pyppeteer would otherwise capture the intermediate
  page.
- Command line switch ``--rollback-cache TIMESTAMP``: roll back the snapshot database to a previous time, useful when
  you miss notifications; see `here <https://webchanges.readthedocs.io/en/stable/cli.html#rollback-cache>`__
- Command line switch ``--cache-engine ENGINE``: specify ``minidb`` to continue using the database structure used
  in prior versions and urlwatch 2. The default ``sqlite3`` creates a smaller database due to data compression with
  `msgpack <https://msgpack.org/index.html>`__; migration from an old minidb database is done automatically and the
  old database is preserved for manual deletion
- Job directive ``block_elements`` for URL jobs with ``use_browser: true`` (i.e. using Pyppeteer) (⚠ ignored in Python
  < 3.7) (experimental feature): specify `resource types
  <https://developer.mozilla.org/en-US/docs/Mozilla/Add-ons/WebExtensions/API/webRequest/ResourceType>`__ (elements) to
  skip requesting (downloading) in order to speed up retrieval of the content; only resource types `supported by
  Chromium <https://developer.chrome.com/docs/extensions/reference/webRequest/#type-ResourceType>`__ are allowed
  (a typical list includes ``stylesheet``, ``font``, ``image``, and ``media``). ⚠ On certain sites it seems to totally
  freeze execution; test before use.
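
The effect of ``--rollback-cache TIMESTAMP`` can be pictured as deleting every snapshot saved after a cutoff time. The following is only a minimal sketch of that idea, not the actual implementation: the ``snapshots`` table, its columns and the ``rollback`` helper are hypothetical names chosen for illustration.

```python
import sqlite3


def rollback(conn: sqlite3.Connection, timestamp: float) -> int:
    """Delete snapshots saved after `timestamp`; return how many rows were removed."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute("DELETE FROM snapshots WHERE saved_at > ?", (timestamp,))
    return cur.rowcount


# Hypothetical schema (an assumption): one row per saved snapshot of a job.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE snapshots (job_id TEXT, saved_at REAL, data BLOB)")
conn.executemany(
    "INSERT INTO snapshots VALUES (?, ?, ?)",
    [("job-a", 100.0, b"old"), ("job-a", 200.0, b"new"), ("job-b", 150.0, b"only")],
)
conn.commit()

# Roll back to time 160.0: only the snapshot saved at 200.0 is newer than the cutoff.
removed = rollback(conn, 160.0)
remaining = conn.execute(
    "SELECT job_id, saved_at FROM snapshots ORDER BY job_id, saved_at"
).fetchall()
print(removed, remaining)
```

After such a rollback, the next run would compare freshly retrieved content against the older snapshot, so a change that was missed gets reported again.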
Changed
- A new, more efficient indexed database is used and only the most recent saved snapshot is migrated the first time you
  run this version. This has no effect on the ordinary use of the program other than reducing the number of historical
  results from ``--test-diffs`` until more snapshots are captured. To continue using the legacy database format, launch
  with ``--database-engine minidb`` and ensure that the package ``minidb`` is installed.
- If any jobs have ``use_browser: true`` (i.e. are using Pyppeteer), the maximum number of concurrent threads is set to
  the number of available CPUs instead of the `default
  <https://docs.python.org/3/library/concurrent.futures.html#concurrent.futures.ThreadPoolExecutor>`__ to avoid
  instability due to Pyppeteer's high CPU usage
- Default configuration now specifies the use of Chromium revisions equivalent to Chrome 89.0.4389.72 827102
  for URL jobs with ``use_browser: true`` (i.e. using Pyppeteer) to increase stability. Note: if you already have a
  configuration file and want to upgrade to this version, see `here
  <https://webchanges.readthedocs.io/en/stable/advanced.html#using-a-chromium-revision-matching-a-google-chrome-chromium-release>`__.
  The Chromium revisions used now are 'linux': 843831, 'win64': 843846, 'win32': 843832, and 'macos': 843846.
- Temporarily removed code autodoc from the documentation as it wasn't building correctly
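
The thread cap for browser jobs described above can be sketched as follows. This is an illustration under stated assumptions, not the project's code: ``pick_max_workers`` and the job dictionaries are hypothetical. Passing ``max_workers=None`` leaves ``ThreadPoolExecutor`` on its own default, which in Python 3.8 through 3.12 is documented as ``min(32, os.cpu_count() + 4)``.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Optional


def pick_max_workers(jobs: List[Dict]) -> Optional[int]:
    """Cap workers at the CPU count if any job uses a browser, else keep the library default."""
    if any(job.get("use_browser") for job in jobs):
        return os.cpu_count() or 1  # os.cpu_count() may return None
    return None  # None lets ThreadPoolExecutor pick its default


# Hypothetical job list (an assumption): one browser job, one plain URL job.
jobs = [
    {"url": "https://example.com", "use_browser": True},
    {"url": "https://example.org"},
]
max_workers = pick_max_workers(jobs)

with ThreadPoolExecutor(max_workers=max_workers) as executor:
    # Stand-in for real retrieval: each "job" just returns its URL.
    results = list(executor.map(lambda job: job["url"], jobs))
print(max_workers, results)
```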
Fixed
- Specifying ``chromium_revision`` had no effect (bug introduced in version 3.1.0)
- Improved the text of the error message when ``jobs.yaml`` has a mistake in the job parameters
Internals
- Removed the dependency on the ``minidb`` package; the code now uses Python's built-in ``sqlite3`` directly, without
  an additional layer, allowing for better control and increased functionality
- Database is now smaller due to data compression with `msgpack <https://msgpack.org/index.html>`__
- An old-schema database is automatically detected and the last snapshot for each job is migrated to the new one,
  preserving the old database file for manual deletion
- No longer backing up the database to ``*.bak`` (introduced in version 3.0.0) now that it can be rolled back
- New command line argument ``--database-engine`` allows selecting the engine and accepts ``sqlite3`` (default),
  ``minidb`` (legacy compatibility, requires the package by the same name) and ``textfiles`` (creates a text file of
  the latest snapshot for each job)
- When running in Python 3.7 or higher, jobs with ``use_browser: true`` (i.e. using Pyppeteer) are a bit more reliable
  as they are now launched using ``asyncio.run()``, and therefore Python takes care of managing the asyncio event loop,
  finalizing asynchronous generators, and closing the threadpool, tasks that previously were handled by custom code
- 11 percentage point increase in code testing coverage, now also testing jobs that retrieve content from the internet
  and (for Python 3.7 and up) use Pyppeteer
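
To illustrate the ``asyncio.run()`` entry above, the sketch below contrasts manual event loop handling with ``asyncio.run()``. It is a simplified illustration only: ``fetch`` is a hypothetical stand-in for a browser job, and the manual variant is a generic example of loop management, not the custom code that was replaced.

```python
import asyncio


async def fetch(name: str) -> str:
    """Hypothetical stand-in for a browser job; a real job would drive Pyppeteer here."""
    await asyncio.sleep(0)  # yield control once, as real I/O would
    return f"content of {name}"


# Manual style: the caller creates the loop, finalizes async generators and closes it.
loop = asyncio.new_event_loop()
try:
    manual = loop.run_until_complete(fetch("job-1"))
    loop.run_until_complete(loop.shutdown_asyncgens())
finally:
    loop.close()

# asyncio.run() (Python 3.7+): loop creation and cleanup are handled by Python.
managed = asyncio.run(fetch("job-1"))
print(manual == managed)
```

Both calls return the same result; the difference is only in who is responsible for the cleanup steps.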
Known issues
- ``url`` jobs with ``use_browser: true`` (i.e. using Pyppeteer) will at times display the below error message in stdout
  (terminal console). This does not affect webchanges as all data is downloaded, and hopefully it will be fixed in the
  future (see `Pyppeteer issue #225 <https://github.com/pyppeteer/pyppeteer/issues/225>`__)::

    future: <Future finished exception=NetworkError('Protocol error Target.sendMessageToTarget: Target closed.')>
    pyppeteer.errors.NetworkError: Protocol error Target.sendMessageToTarget: Target closed.
    Future exception was never retrieved