Releases: mborsetti/webchanges

v3.7.1

27 Jun 10:28

⚠ Breaking Changes

  • Removed Python 3.6 support to simplify code. Older Python versions are supported for 3 years after being obsoleted by
    a new major release; as Python 3.7 was released on 27 June 2018, the last date of Python 3.6 support was 26 June 2021
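
The date arithmetic behind this policy can be checked with a short sketch. The helper name last_supported_day is hypothetical and is not part of webchanges; it only reproduces the 3-year rule stated above.

```python
from datetime import date, timedelta


def last_supported_day(successor_release: date, years: int = 3) -> date:
    """Last day an older Python version is supported under the 3-year rule.

    Hypothetical helper for illustration only. A successor released on
    29 February is not handled in this sketch.
    """
    # Support ends the day before the N-year anniversary of the release
    # that obsoleted the older version.
    window_end = successor_release.replace(year=successor_release.year + years)
    return window_end - timedelta(days=1)


# Python 3.7 was released on 27 June 2018.
print(last_supported_day(date(2018, 6, 27)))  # 2021-06-26
```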

Changed

  • Improved telegram reporter now uses MarkdownV2 and preserves most formatting of HTML sites processed by the
    html2text filter, e.g. clickable links, bolding, underlining, italics and strikethrough

Added

  • New filter execute to filter the data using an executable without invoking the shell (as shellpipe does),
    thereby avoiding the additional security risks that a shell exposes
  • New sub-directive silent for telegram reporter to receive a notification with no sound (true/false) (default:
    false)
  • GitHub Issues templates for bug reports and feature requests
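
To illustrate why skipping the shell matters for the execute filter, here is a minimal sketch using Python's subprocess module. The function run_executable_filter is a hypothetical name and is not the webchanges implementation; the point is only that an argument list is passed to the program as-is, so shell metacharacters are never interpreted.

```python
import subprocess
import sys
from typing import List


def run_executable_filter(args: List[str], data: str) -> str:
    """Pipe data through an executable without a shell (hypothetical sketch)."""
    # With a list of arguments and the default shell=False, no shell parses
    # the command, so characters like ';' or '$(...)' have no special meaning.
    result = subprocess.run(
        args, input=data, capture_output=True, text=True, check=True
    )
    return result.stdout


# Filter the data through a small Python one-liner that upper-cases stdin.
upper = run_executable_filter(
    [sys.executable, '-c', 'import sys; sys.stdout.write(sys.stdin.read().upper())'],
    'hello',
)

# Shell syntax in an argument reaches the program as a literal string.
literal = run_executable_filter(
    [sys.executable, '-c', 'import sys; sys.stdout.write(sys.argv[1])',
     '$(echo injected); ls'],
    '',
)
```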

Fixed

  • Job headers stored in the configuration file (config.yaml) are now merged correctly and case-insensitively
    with those present in the job (in jobs.yaml). A header in the job replaces a header by the same name if already
    present in the configuration file; otherwise it is added to the ones present in the configuration file.
  • Fixed TypeError: expected string or bytes-like object error in cookiejar (called by requests module) caused by
    some cookies being read from the jobs YAML file in other formats
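
The header merging rule described above can be sketched as follows. This is an illustration under stated assumptions, not the code used by webchanges: the function name merge_headers is hypothetical, and plain dictionaries stand in for the parsed config.yaml and jobs.yaml headers.

```python
from typing import Dict


def merge_headers(config_headers: Dict[str, str],
                  job_headers: Dict[str, str]) -> Dict[str, str]:
    """Merge headers case-insensitively; job headers win (hypothetical sketch)."""
    merged: Dict[str, str] = {}
    # Maps the lower-cased header name to the spelling currently in `merged`.
    canonical: Dict[str, str] = {}
    for source in (config_headers, job_headers):
        for name, value in source.items():
            key = name.lower()
            if key in canonical:
                # Same header under any capitalization: drop the earlier one.
                del merged[canonical[key]]
            canonical[key] = name
            merged[name] = value
    return merged


merged = merge_headers(
    {'User-Agent': 'from-config', 'Accept': 'text/html'},
    {'user-agent': 'from-job', 'X-Extra': '1'},
)
```

Here the job's user-agent replaces the configuration's User-Agent, while Accept is kept and X-Extra is added.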

Internals

  • Strengthened security with bandit <https://pypi.org/project/bandit/>__ to catch common security issues
  • Standardized code formatting with black <https://pypi.org/project/black/>__
  • Improved pre-commit speed by using local libraries when practical
  • More improvements to type hinting (moving towards testing with mypy <https://pypi.org/project/mypy/>__)
  • Removed module jobs_browser.py (needed only for Python 3.6)

v3.7.0

27 Jun 06:20

⚠ Breaking Changes

  • Removed Python 3.6 support to simplify code. Older Python versions are supported for 3 years after being obsoleted by
    a new major release; as Python 3.7 was released on 27 June 2018, the last date of Python 3.6 support was 26 June 2021

Changed

  • Improved telegram reporter now uses MarkdownV2 and preserves most formatting of HTML sites processed by the
    html2text filter, e.g. clickable links, bolding, underlining, italics and strikethrough

Added

  • New filter execute to filter the data using an executable without invoking the shell (as shellpipe does),
    thereby avoiding the additional security risks that a shell exposes
  • New sub-directive silent for telegram reporter to receive a notification with no sound (true/false) (default:
    false)
  • GitHub Issues templates for bug reports and feature requests

Fixed

  • Job headers stored in the configuration file (config.yaml) are now merged correctly and case-insensitively
    with those present in the job (in jobs.yaml). A header in the job replaces a header by the same name if already
    present in the configuration file; otherwise it is added to the ones present in the configuration file.
  • Fixed TypeError: expected string or bytes-like object error in cookiejar (called by requests module) caused by
    some cookies being read from the jobs YAML file in other formats
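
As a rough illustration of the cookie fix, assume the "other formats" are non-string values: YAML loads an unquoted value such as 12345 as an integer. Converting names and values to strings before handing them to the HTTP library avoids the type mismatch. The helper normalize_cookies is hypothetical and not taken from webchanges.

```python
from typing import Any, Dict


def normalize_cookies(cookies: Dict[Any, Any]) -> Dict[str, str]:
    """Coerce cookie names and values to str (hypothetical sketch)."""
    # Assumption: values arrive as whatever type the YAML parser produced.
    return {str(name): str(value) for name, value in cookies.items()}


cookies = normalize_cookies({'session': 12345, 'theme': 'dark'})
```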

Internals

  • Strengthened security with bandit <https://pypi.org/project/bandit/>__ to catch common security issues
  • Standardized code formatting with black <https://pypi.org/project/black/>__
  • Improved pre-commit speed by using local libraries when practical
  • More improvements to type hinting (moving towards testing with mypy <https://pypi.org/project/mypy/>__)
  • Removed module jobs_browser.py (needed only for Python 3.6)

v3.6.1

28 May 09:31

Reminder

Older Python versions are supported for 3 years after being obsoleted by a new major release. As Python 3.7 was
released on 27 June 2018, the codebase will be streamlined by removing support for Python 3.6 on or after 27 June 2021.

Added

  • Clearer results messages for --delete-snapshot command line argument

Fixed

  • First run would fail when creating new config.yaml file. Thanks to David <https://github.com/notDavid>__ in
    issue #10 <https://github.com/mborsetti/webchanges/issues/10>__.
  • Use same duration precision in all reports

v3.6.0

14 May 13:19

Added

  • Run a subset of jobs by adding their index number(s) as command line arguments. For example, run webchanges 2 3 to
    only run jobs #2 and #3 of your jobs list. Run webchanges --list to find the job numbers. Suggested by Dan Brown <https://github.com/dbro>__ upstream here <https://github.com/thp/urlwatch/pull/641>__. API is experimental and
    may change in the near future.
  • Support for ftp:// URLs to download a file from an ftp server
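
Selecting a subset of jobs by index number, as in webchanges 2 3, amounts to mapping 1-based indices onto the loaded job list. The sketch below is hypothetical (select_jobs is not a webchanges function, and strings stand in for job objects); rejecting out-of-range numbers is an assumption made here, not documented behavior.

```python
from typing import List, Sequence


def select_jobs(jobs: Sequence[str], indices: Sequence[int]) -> List[str]:
    """Return the jobs matching the 1-based indices (hypothetical sketch)."""
    if not indices:
        # No index given on the command line: run everything.
        return list(jobs)
    selected = []
    for index in indices:
        if not 1 <= index <= len(jobs):
            raise ValueError(f'Job index {index} is not in range 1..{len(jobs)}')
        selected.append(jobs[index - 1])
    return selected


all_jobs = ['news', 'prices', 'weather', 'status']
subset = select_jobs(all_jobs, [2, 3])  # jobs #2 and #3
```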

Fixed

  • Sequential job numbering (skip numbering empty jobs). Suggested by Markus Weimar <https://github.com/Markus00000>__ in issue #9 <https://github.com/mborsetti/webchanges/issues/9>__.
  • Readthedocs.io failed to build autodoc API documentation
  • Error processing jobs with URL/URIs starting with file:///

Internals

  • Improvements of errors and DeprecationWarnings during the processing of job directives and their inclusion in tests
  • Additional testing adding 3 percentage points of coverage to 75%
  • Temporary database being written during a run is now held in memory first (handled by SQLite3) (speed improvement)
  • Updated algorithm that assigns a job to a subclass based on directives found
  • Migrated to using the pathlib <https://docs.python.org/3/library/pathlib.html>__ standard library

Known issues

  • url jobs with use_browser: true (i.e. using Pyppeteer) will at times display the below error message in
    stdout (terminal console). This does not affect webchanges as all data is downloaded, and hopefully it will be fixed
    in the future (see Pyppeteer issue #225 <https://github.com/pyppeteer/pyppeteer/issues/225>__):

    future: <Future finished exception=NetworkError('Protocol error Target.sendMessageToTarget: Target closed.')>
    pyppeteer.errors.NetworkError: Protocol error Target.sendMessageToTarget: Target closed.
    Future exception was never retrieved

v3.5.1

06 May 01:58

Fixed

  • Crash in RuntimeError: dictionary changed size during iteration with custom headers; updated testing scenarios
  • Autodoc not building API documentation

v3.5.0

04 May 14:48

Added

  • New sub-directives to the strip filter:

    • chars: Set of characters to be removed (default: whitespace)
    • side: One-sided removal, either left (leading characters) or right (trailing characters)
    • splitlines: Whether to apply the filter on each line of text (true/false) (default: false, i.e. apply to
      the entire data)
  • --delete-snapshot command line argument: Removes the latest saved snapshot of a job from the database; useful
    if a change in a website (e.g. layout) requires modifying filters as invalid snapshot can be deleted and webchanges
    rerun to create a truthful diff

  • --log-level command line argument to control the amount of logging displayed by the -v argument

  • ignore_connection_errors, ignore_timeout_errors, ignore_too_many_redirects and ignore_http_error_codes
    directives now work with url jobs having use_browser: true (i.e. using Pyppeteer)
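
The three strip sub-directives listed above (chars, side and splitlines) combine roughly as in this sketch, which relies on Python's str.strip family. The function strip_filter is a hypothetical stand-in and not the filter's actual code.

```python
from typing import Optional


def strip_filter(data: str, chars: Optional[str] = None,
                 side: Optional[str] = None, splitlines: bool = False) -> str:
    """Approximate the strip filter sub-directives (hypothetical sketch)."""

    def strip_one(text: str) -> str:
        # chars=None means whitespace, matching the documented default.
        if side == 'left':
            return text.lstrip(chars)
        if side == 'right':
            return text.rstrip(chars)
        return text.strip(chars)

    if splitlines:
        # Apply the stripping to each line rather than to the whole data.
        return '\n'.join(strip_one(line) for line in data.splitlines())
    return strip_one(data)


whole = strip_filter('  a \n b  ')                      # 'a \n b'
per_line = strip_filter('  a \n b  ', splitlines=True)  # 'a\nb'
left_only = strip_filter('xxhixx', chars='x', side='left')  # 'hixx'
```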

Changed

  • Diff-filter additions_only will no longer report additions that consist exclusively of added empty lines
    (issue #6 <https://github.com/mborsetti/webchanges/issues/6>, contributed by Fedora7 <https://github.com/Fedora7>)
  • Diff-filter deletions_only will no longer report deletions that consist exclusively of deleted empty lines
  • The job's index number is included in error messages for clarity
  • --smtp-password now checks that the credentials work with the SMTP server (i.e. logs in)
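
The new additions_only behavior (suppressing a report when the only additions are empty lines) can be approximated with the standard difflib module. This is a sketch under assumptions: additions_only here is a hypothetical function, and the real diff filter may compute and format additions differently.

```python
import difflib
from typing import List


def additions_only(old: str, new: str) -> List[str]:
    """Return added lines, or nothing if they are all empty (hypothetical sketch)."""
    # ndiff prefixes added lines with '+ '.
    added = [
        line[2:]
        for line in difflib.ndiff(old.splitlines(), new.splitlines())
        if line.startswith('+ ')
    ]
    if not any(line.strip() for line in added):
        # Only blank lines were added: nothing worth reporting.
        return []
    return added


blank_only = additions_only('a\nb', 'a\n\nb')  # []
real_change = additions_only('a', 'a\nb')      # ['b']
```

The deletions_only case is symmetric, looking at lines prefixed with '- ' instead.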

Fixed

  • First run after install was not creating new files correctly (inherited from urlwatch); now webchanges creates
    the default directory, config and/or jobs files if not found when running (issue #8 <https://github.com/mborsetti/webchanges/issues/8>, contributed by rtfgvb01 <https://github.com/rtfgvb01>)
  • --test-diff command line argument was showing historical diffs in wrong order; now showing most recent first
  • An error is now raised when a url job with use_browser: true returns no data due to an HTTP error (e.g.
    proxy_authentication_required)
  • Jobs were included in email subject line even if there was nothing to report after filtering with additions_only
    or deletions_only
  • hexdump filter now correctly formats lines with fewer than 16 bytes
  • sha1sum and hexdump filters now accept data that is bytes (not just text)
  • An error is now raised when a legacy minidb database is found but cannot be converted because the minidb
    package is not installed
  • Removed extra unneeded file from being installed
  • Wrong ETag was being captured when a URL redirection took place

Internals

  • Pyppeteer jobs (url jobs using use_browser: true) now capture and save the ETag
  • Snapshot timestamps are more accurate (reflect when the job was launched)
  • Each job now has a run-specific unique index_number, which is assigned sequentially when loading jobs, to use in
    errors and logs for clarity
  • Improvements in the function chunking text into numbered lines, which is used by certain reporters (e.g. Telegram)
  • More tests, increasing code coverage by an additional 7 percentage points to 72% (although keyring testing had to be
    dropped due to issues with GitHub Actions)
  • Additional cleanup of code and documentation

Known issues

  • url jobs with use_browser: true (i.e. using Pyppeteer) will at times display the below error message in
    stdout (terminal console). This does not affect webchanges as all data is downloaded, and hopefully it will be fixed
    in the future (see Pyppeteer issue #225 <https://github.com/pyppeteer/pyppeteer/issues/225>__):

    future: <Future finished exception=NetworkError('Protocol error Target.sendMessageToTarget: Target closed.')>
    pyppeteer.errors.NetworkError: Protocol error Target.sendMessageToTarget: Target closed.
    Future exception was never retrieved

v3.4.1

17 Apr 06:45

Internals

  • Temporary database (sqlite3 database engine) is copied to permanent one exclusively using SQL code instead of
    partially using a Python loop
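
Copying a temporary database into the permanent one purely in SQL can be sketched with ATTACH DATABASE and INSERT ... SELECT. The table layout, the run_temp schema name and the function names below are all hypothetical; they are not the schema webchanges uses.

```python
import sqlite3

# Hypothetical table layout, for illustration only.
SCHEMA = '(job_id TEXT, timestamp REAL, data TEXT)'


def open_database(path: str = ':memory:') -> sqlite3.Connection:
    """Open the permanent database and attach an in-memory temporary one."""
    conn = sqlite3.connect(path)
    conn.execute(f'CREATE TABLE IF NOT EXISTS main.snapshots {SCHEMA}')
    conn.execute("ATTACH DATABASE ':memory:' AS run_temp")
    conn.execute(f'CREATE TABLE run_temp.snapshots {SCHEMA}')
    return conn


def save_snapshot(conn: sqlite3.Connection, job_id: str,
                  timestamp: float, data: str) -> None:
    """During the run, new snapshots go to the temporary database only."""
    conn.execute('INSERT INTO run_temp.snapshots VALUES (?, ?, ?)',
                 (job_id, timestamp, data))


def copy_to_permanent(conn: sqlite3.Connection) -> int:
    """Move all temporary rows to the permanent table with SQL alone."""
    with conn:  # one transaction: commit on success, roll back on error
        cursor = conn.execute(
            'INSERT INTO main.snapshots SELECT * FROM run_temp.snapshots')
        copied = cursor.rowcount
        conn.execute('DELETE FROM run_temp.snapshots')
    return copied
```

Letting the database engine do the copy in a single statement avoids fetching rows into Python and writing them back one by one.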

Known issues

  • url jobs with use_browser: true (i.e. using Pyppeteer) will at times display the below error message in
    stdout (terminal console). This does not affect webchanges as all data is downloaded, and hopefully it will be fixed
    in the future (see Pyppeteer issue #225 <https://github.com/pyppeteer/pyppeteer/issues/225>__):

    future: <Future finished exception=NetworkError('Protocol error Target.sendMessageToTarget: Target closed.')>
    pyppeteer.errors.NetworkError: Protocol error Target.sendMessageToTarget: Target closed.
    Future exception was never retrieved

v3.4.0

13 Apr 04:03

⚠ Breaking Changes

  • Fixed the database growing without bound. Fix only works when running in Python 3.7 or higher and using
    the new, default, sqlite3 database engine. In this scenario only the latest 4 snapshots are kept, and older ones
    are purged after every run; the number is selectable with the new --max-snapshots command line argument. To keep
    the existing grow-to-infinity behavior, run webchanges with --max-snapshots 0.
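
The purge described above (keep the latest N snapshots per job, with 0 meaning keep everything) could look roughly like this in SQLite. The snapshots table, its columns and the function name are hypothetical, and the sketch assumes timestamps are unique within a job.

```python
import sqlite3


def purge_old_snapshots(conn: sqlite3.Connection, max_snapshots: int = 4) -> int:
    """Delete all but the newest max_snapshots rows per job (hypothetical sketch)."""
    if max_snapshots <= 0:
        # 0 keeps the previous behavior: nothing is ever purged.
        return 0
    with conn:
        cursor = conn.execute(
            """
            DELETE FROM snapshots WHERE rowid IN (
                SELECT old.rowid FROM snapshots AS old
                WHERE (
                    SELECT COUNT(*) FROM snapshots AS newer
                    WHERE newer.job_id = old.job_id
                      AND newer.timestamp > old.timestamp
                ) >= ?
            )
            """,
            (max_snapshots,),
        )
    return cursor.rowcount


conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE snapshots (job_id TEXT, timestamp REAL, data TEXT)')
conn.executemany(
    'INSERT INTO snapshots VALUES (?, ?, ?)',
    [('a', float(t), f'v{t}') for t in range(1, 7)] + [('b', 1.0, 'x'), ('b', 2.0, 'y')],
)
unlimited = purge_old_snapshots(conn, 0)  # nothing deleted
deleted = purge_old_snapshots(conn)       # removes the two oldest rows of job 'a'
```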

Added

  • --max-snapshots command line argument sets the number of snapshots to keep stored in the database; defaults to
    4. If set to 0 an unlimited number of snapshots will be kept. Only applies to Python 3.7 or higher and only works if
    the default sqlite3 database is being used.
  • no_redirects job directive (for url jobs) to disable GET/OPTIONS/POST/PUT/PATCH/DELETE/HEAD redirection
    (true/false). Suggested by snowman <https://github.com/snowman>__ upstream here <https://github.com/thp/urlwatch/issues/635>__.
  • Reporter prowl for the Prowl <https://prowlapp.com>__ push notification client for iOS (only). Contributed
    by nitz <https://github.com/nitz>__ upstream in PR 633 <https://github.com/thp/urlwatch/pull/633>__.
  • Filter jq to parse, transform, and extract ASCII JSON data. Contributed by robgmills <https://github.com/robgmills>__ upstream in PR 626 <https://github.com/thp/urlwatch/pull/626>__.
  • Filter pretty-xml as an alternative to format-xml (backwards-compatible with urlwatch 2.23)
  • Alert user when the jobs file contains unrecognized directives (e.g. typo)

Changed

  • Job name is truncated to 60 characters when derived from the title of a page (no directive name is found in a
    url job)
  • --test-diff command line argument displays all saved snapshots (no longer limited to 10)

Fixed

  • Diff (change) data is no longer lost if webchanges is interrupted mid-execution or encounters an error in reporting:
    the permanent database is updated only at the very end (after reports are dispatched)
  • use_browser: false was not being interpreted correctly
  • Jobs file (e.g. jobs.yaml) is now loaded only once per run

Internals

  • Database sqlite3 engine now saves new snapshots to a temporary database, which is copied over to the permanent one
    at execution end (i.e. database.close())
  • Upgraded SMTP email message internals to use Python's email.message.EmailMessage <https://docs.python.org/3/library/email.message.html#email.message.EmailMessage>__ instead of email.mime
    (obsolete)
  • Pre-commit documentation linting using doc8
  • Added logging to sqlite3 database engine
  • Additional testing increasing overall code coverage by an additional 4 percentage points to 65%
  • Renamed legacy module browser.py to jobs_browser.py for clarity
  • Renamed class JobsYaml to YamlJobsStorage for consistency and clarity

Known issues

  • url jobs with use_browser: true (i.e. using Pyppeteer) will at times display the below error message in stdout
    (terminal console). This does not affect webchanges as all data is downloaded, and hopefully it will be fixed in the
    future (see Pyppeteer issue #225 <https://github.com/pyppeteer/pyppeteer/issues/225>__):

    future: <Future finished exception=NetworkError('Protocol error Target.sendMessageToTarget: Target closed.')>
    pyppeteer.errors.NetworkError: Protocol error Target.sendMessageToTarget: Target closed.
    Future exception was never retrieved

v3.2.6

26 Mar 13:44

Changed

  • Tweaked colors (esp. green) of HTML reporter to work with Dark Mode
  • Restored API documentation using Sphinx's autodoc (removed in 3.2.4 as it was not building correctly)

Internals

  • Replaced custom atomic_rename function with built-in os.replace() <https://docs.python.org/3/library/os.html#os.replace>__ (new in Python 3.3) that does the same thing
  • Added type hinting to the entire code
  • Added new tests, increasing coverage to 57%
  • GitHub Actions CI now runs faster as it's set to cache required packages from prior runs
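
For context on os.replace(), here is a minimal sketch of the usual write-then-rename pattern it enables. The function atomic_write is a hypothetical example, not the removed atomic_rename function or any other webchanges code.

```python
import os
import tempfile


def atomic_write(path: str, text: str) -> None:
    """Write text to path so readers never see a partial file (hypothetical sketch)."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file in the same directory so that the final
    # rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, 'w', encoding='utf-8') as handle:
            handle.write(text)
        # os.replace overwrites an existing destination on all platforms.
        os.replace(tmp_path, path)
    except BaseException:
        os.unlink(tmp_path)
        raise
```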

Known issues

  • Discovered that upstream (legacy) urlwatch 2.22 code has the database growing to infinity; run webchanges --clean-cache periodically to discard old snapshots until this is addressed in a future release

  • url jobs with use_browser: true (i.e. using Pyppeteer) will at times display the below error message in stdout
    (terminal console). This does not affect webchanges as all data is downloaded, and hopefully it will be fixed in the
    future (see Pyppeteer issue #225 <https://github.com/pyppeteer/pyppeteer/issues/225>__):

    future: <Future finished exception=NetworkError('Protocol error Target.sendMessageToTarget: Target closed.')>
    pyppeteer.errors.NetworkError: Protocol error Target.sendMessageToTarget: Target closed.
    Future exception was never retrieved

v3.2.4

08 Mar 05:29

Added

  • Job directive note: adds a freetext note appearing in the report after the job header
  • Job directive wait_for_navigation for URL jobs with use_browser: true (i.e. using Pyppeteer): wait for
    navigation to reach a URL starting with the specified one before extracting content. Useful when the URL redirects
    elsewhere before displaying content you're interested in and Pyppeteer would capture the intermediate page.
  • Command line switch --rollback-cache TIMESTAMP: rollback the snapshot database to a previous time, useful when
    you miss notifications; see here <https://webchanges.readthedocs.io/en/stable/cli.html#rollback-cache>__
  • Command line switch --cache-engine ENGINE: specify minidb to continue using the database structure used
    in prior versions and urlwatch 2. Default sqlite3 creates a smaller database due to data compression with
    msgpack <https://msgpack.org/index.html>__; migration from old minidb database is done automatically and the old
    database preserved for manual deletion
  • Job directive block_elements for URL jobs with use_browser: true (i.e. using Pyppeteer) (⚠ ignored in Python
    < 3.7) (experimental feature): specify resource types <https://developer.mozilla.org/en-US/docs/Mozilla/Add-ons/WebExtensions/API/webRequest/ResourceType>__ (elements) to
    skip requesting (downloading) in order to speed up retrieval of the content; only resource types supported by Chromium <https://developer.chrome.com/docs/extensions/reference/webRequest/#type-ResourceType>__ are allowed
    (typical list includes stylesheet, font, image, and media). ⚠ On certain sites it seems to totally
    freeze execution; test before use.

Changed

  • A new, more efficient indexed database is used and only the most recent saved snapshot is migrated the first time you
    run this version. This has no effect on the ordinary use of the program other than reducing the number of historical
    results from --test-diff until more snapshots are captured. To continue using the legacy database format, launch
    with --database-engine minidb and ensure that the package minidb is installed.
  • If any jobs have use_browser: true (i.e. are using Pyppeteer), the maximum number of concurrent threads is set to
    the number of available CPUs instead of the default <https://docs.python.org/3/library/concurrent.futures.html#concurrent.futures.ThreadPoolExecutor>__ to avoid
    instability due to Pyppeteer's high usage of CPU
  • Default configuration now specifies the use of Chromium revisions equivalent to Chrome 89.0.4389.72 827102
    for URL jobs with use_browser: true (i.e. using Pyppeteer) to increase stability. Note: if you already have a
    configuration file and want to upgrade to this version, see here <https://webchanges.readthedocs.io/en/stable/advanced.html#using-a-chromium-revision-matching-a-google-chrome-chromium-release>__.
    The Chromium revisions used now are 'linux': 843831, 'win64': 843846, 'win32': 843832, and 'macos': 843846.
  • Temporarily removed code autodoc from the documentation as it wasn't building correctly
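
The thread-limit change for browser jobs can be pictured with the sketch below. The job dictionaries and the helper names are hypothetical; only ThreadPoolExecutor and os.cpu_count() are real standard-library calls.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Optional


def max_workers_for(jobs: List[Dict]) -> Optional[int]:
    """Cap threads at the CPU count if any job uses a browser (hypothetical sketch)."""
    if any(job.get('use_browser') for job in jobs):
        return os.cpu_count() or 1
    # None lets ThreadPoolExecutor apply its own default.
    return None


def run_all(jobs: List[Dict], worker: Callable[[Dict], str]) -> List[str]:
    """Run every job through worker using the chosen pool size."""
    with ThreadPoolExecutor(max_workers=max_workers_for(jobs)) as executor:
        return list(executor.map(worker, jobs))


jobs = [{'url': 'https://example.com/a'},
        {'url': 'https://example.com/b', 'use_browser': True}]
results = run_all(jobs, lambda job: job['url'])
```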

Fixed

  • Specifying chromium_revision had no effect (bug introduced in version 3.1.0)
  • Improved the text of the error message when jobs.yaml has a mistake in the job parameters

Internals

  • Removed dependency on minidb package; now directly using Python's built-in sqlite3 without an additional
    layer, allowing for better control and increased functionality
  • Database is now smaller due to data compression with msgpack <https://msgpack.org/index.html>__
  • An old schema database is automatically detected and the last snapshot for each job will be migrated to the new one,
    preserving the old database file for manual deletion
  • No longer backing up database to *.bak (introduced in version 3.0.0) now that it can be rolled back
  • New command line argument --database-engine allows selecting engine and accepts sqlite3 (default),
    minidb (legacy compatibility, requires package by the same name) and textfiles (creates a text file of the
    latest snapshot for each job)
  • When running in Python 3.7 or higher, jobs with use_browser: true (i.e. using Pyppeteer) are a bit more reliable
    as they are now launched using asyncio.run(), and therefore Python takes care of managing the asyncio event loop,
    finalizing asynchronous generators, and closing the threadpool, tasks that previously were handled by custom code
  • 11 percentage point increase in code testing coverage, now also testing jobs that retrieve content from the internet
    and (for Python 3.7 and up) use Pyppeteer
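
As an illustration of the asyncio.run() point above, the sketch below runs a coroutine per job from worker threads and lets asyncio.run() create and close each event loop. The coroutine fetch_with_browser is a hypothetical placeholder that performs no real browser work.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor


async def fetch_with_browser(url: str) -> str:
    """Hypothetical stand-in for a Pyppeteer page retrieval."""
    await asyncio.sleep(0)  # placeholder for the actual asynchronous work
    return f'content of {url}'


def run_browser_job(url: str) -> str:
    # asyncio.run() creates a new event loop, runs the coroutine, finalizes
    # asynchronous generators and closes the loop when done.
    return asyncio.run(fetch_with_browser(url))


with ThreadPoolExecutor(max_workers=2) as executor:
    results = list(executor.map(
        run_browser_job,
        ['https://example.com/a', 'https://example.com/b'],
    ))
```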

Known issues

  • url jobs with use_browser: true (i.e. using Pyppeteer) will at times display the below error message in stdout
    (terminal console). This does not affect webchanges as all data is downloaded, and hopefully it will be fixed in the
    future (see Pyppeteer issue #225 <https://github.com/pyppeteer/pyppeteer/issues/225>__):

    future: <Future finished exception=NetworkError('Protocol error Target.sendMessageToTarget: Target closed.')>
    pyppeteer.errors.NetworkError: Protocol error Target.sendMessageToTarget: Target closed.
    Future exception was never retrieved