
Releases: mborsetti/webchanges

v3.17.2

11 Dec 17:44

Fixed

  • Fixed an exception raised during error handling when the requests library is not installed (reported by
    yubiuser (https://github.com/yubiuser) in #66 (https://github.com/mborsetti/webchanges/issues/66)).
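For illustration only, here is a minimal Python sketch of the general pattern behind this kind of fix. When an optional library may be absent, the error handler must not reference its exception classes unconditionally. The function name describe_error is hypothetical, and this is not webchanges' actual code.

```python
# Sketch only (hypothetical helper, not webchanges' code): error handling that
# still works when the optional requests library is not installed.
try:
    import requests  # optional dependency; may be missing
except ImportError:
    requests = None  # remember that it is absent instead of failing at import time


def describe_error(exc: Exception) -> str:
    """Return a short, human-readable description of exc."""
    # Check requests is not None first: evaluating requests.exceptions while
    # requests is None would raise AttributeError inside the error handler.
    if requests is not None and isinstance(exc, requests.exceptions.RequestException):
        return f'HTTP client error: {exc}'
    return f'{type(exc).__name__}: {exc}'


if __name__ == '__main__':
    print(describe_error(ValueError('bad value')))  # ValueError: bad value
```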

v3.17.1

10 Dec 20:19

Fixed

  • Removed a dependency on the requests library that had inadvertently been left behind (reported by
    yubiuser (https://github.com/yubiuser) in #65 (https://github.com/mborsetti/webchanges/issues/65)).

v3.17

10 Dec 04:23

Added

  • You can now specify a reporter name after the command line argument --errors to send the output to that
    reporter. For example, to be notified by email of any jobs that result in an error or that, after filtering,
    return no data (indicating they may no longer be monitoring resources as expected), run webchanges --errors email.
  • You can now suppress the footer in an html report using the new footer: false sub-directive in
    config.yaml (the same sub-directive already exists for the text and markdown reports).
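Here is a minimal Python sketch of the selection rule described above: jobs that errored, or whose data is empty after filtering. The JobResult class and the jobs_needing_attention function are hypothetical. Treating an empty string as "no data" is an assumption. In webchanges itself, both the selection and the hand-off to the chosen reporter happen internally.

```python
# Sketch only: JobResult and jobs_needing_attention are hypothetical names.
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class JobResult:
    """Simplified outcome of running one job."""

    name: str
    error: Optional[str] = None  # error message, if the job failed
    filtered_data: str = ''  # data left after all filters were applied


def jobs_needing_attention(results: list[JobResult]) -> list[JobResult]:
    """Keep jobs that errored or returned no data (assumed: empty string)."""
    return [r for r in results if r.error is not None or not r.filtered_data]


if __name__ == '__main__':
    results = [
        JobResult('ok', filtered_data='Price: 10'),
        JobResult('broken', error='404 Not Found'),
        JobResult('stale selector', filtered_data=''),
    ]
    print([r.name for r in jobs_needing_attention(results)])  # ['broken', 'stale selector']
```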

Internal

  • Fixed a regression in the default User-Agent header for url jobs with the use_browser: true directive.

v3.16

07 Dec 18:23

Added

  • The HTTP/2 network protocol (the one used by major browsers) is now used in url jobs. This allows the
    monitoring of certain websites that block requests made with older protocols like HTTP/1.1. This is implemented by
    using the HTTPX and h2 HTTP client libraries instead of the requests library used previously.

    Notes:

    • HTTPX handles data served by sites with a misconfigured encoding slightly differently. If you newly
      encounter instances where extended characters are rendered incorrectly, try adding encoding: ISO-8859-1 to that job.
    • To revert to the use of the requests HTTP client library, use the new job sub-directive http_client: requests (in individual jobs or in the configuration file for all url jobs) and install requests by
      running pip install --upgrade webchanges[requests].
    • If the system is misconfigured and the HTTPX HTTP client library is not found, an attempt to use the
      requests one will be made. This behaviour is transitional and will be removed in the future.
    • HTTP/2 is theoretically faster than HTTP/1.1 and preliminary testing confirmed this.
  • New pypdf filter to convert PDF to text without having to separately install OS dependencies. If you're
    using pdf2text (and its OS dependencies), I suggest you switch to pypdf as it's much faster; however, note
    that the raw and physical sub-directives are not supported. Install the required library by running pip install --upgrade webchanges[pypdf].

  • New absolute_links filter to convert relative links in HTML <a> tags to absolute ones. This filter is not
    needed if you are already using the beautify or html2text filters. Requested by pawelpbm in issue #62.

  • New {jobs_files} substitution for the subject of the email reporter. It is replaced by the name(s) of the jobs
    file(s) in parentheses when they differ from the default jobs.yaml, with any jobs- prefix removed from the name.
    To use, replace the subject line for your reporter(s) in config.yaml with e.g. [webchanges] {count} changes{jobs_files}: {jobs}.

  • html reports now have a configurable title to set the HTML document title, defaulting to
    [webchanges] {count} changes{jobs_files}: {jobs}.

  • Added reference to a Docker implementation to the documentation (contributed by yubiuser in #64).
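A Python sketch of how a {jobs_files}-style value could be derived and substituted into the subject template above. The helper jobs_files_label is hypothetical. The notes do not specify several details, so the following are all assumptions: the file extension is dropped, multiple names are joined with ", ", and a leading space is included so the value can follow {count} changes directly.

```python
# Sketch only: jobs_files_label is a hypothetical helper. Dropping the file
# extension, joining with ', ' and adding a leading space are assumptions.
from __future__ import annotations

from pathlib import Path

DEFAULT_JOBS_FILE = 'jobs.yaml'


def jobs_files_label(jobs_files: list[str]) -> str:
    """Return ' (name, ...)' for non-default jobs files, or '' otherwise."""
    names = []
    for f in jobs_files:
        path = Path(f)
        if path.name == DEFAULT_JOBS_FILE:
            continue  # the default jobs file is not mentioned in the subject
        stem = path.stem
        if stem.startswith('jobs-'):
            stem = stem[len('jobs-'):]  # remove the 'jobs-' prefix
        names.append(stem)
    return f" ({', '.join(names)})" if names else ''


if __name__ == '__main__':
    subject = '[webchanges] {count} changes{jobs_files}: {jobs}'
    label = jobs_files_label(['/home/me/jobs-weekly.yaml'])
    print(subject.format(count=2, jobs_files=label, jobs='Site A, Site B'))
    # [webchanges] 2 changes (weekly): Site A, Site B
```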

Changed

  • url jobs will use the HTTPX library instead of requests if it's installed, since it uses the HTTP/2 network
    protocol (when the h2 library is also installed) as browsers do. To revert to the use of requests even if
    HTTPX is installed on the system, add http_client: requests to the relevant jobs or make it a default by
    editing the configuration file to add the sub-directive http_client: requests for url jobs under
    job_defaults.
  • The beautify filter now converts relative links to absolute ones; use the new absolute_links: false
    sub-directive to disable this.
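The relative-to-absolute conversion performed by absolute_links and beautify can be illustrated with Python's standard urllib.parse.urljoin. The sketch below is a simplified, regex-based stand-in that rewrites only the href attributes of <a> tags. The function absolutize_links is hypothetical, and a real filter would parse the HTML rather than pattern-match it.

```python
# Sketch only: absolutize_links is hypothetical and uses a regular expression
# for brevity; a real filter would use an HTML parser.
import re
from urllib.parse import urljoin

# href="..." or href='...' inside an <a ...> start tag
_A_HREF = re.compile(r'(<a\b[^>]*?\shref\s*=\s*)(["\'])(.*?)\2', re.IGNORECASE | re.DOTALL)


def absolutize_links(html: str, base_url: str) -> str:
    """Resolve every <a href> value in html against base_url."""

    def _replace(match: re.Match) -> str:
        prefix, quote, href = match.groups()
        # urljoin returns already-absolute URLs unchanged
        return f'{prefix}{quote}{urljoin(base_url, href)}{quote}'

    return _A_HREF.sub(_replace, html)


if __name__ == '__main__':
    page = '<p><a href="/about">About</a> <a class="nav" href=\'page2\'>Next</a></p>'
    print(absolutize_links(page, 'https://example.com/news/item'))
```

Because urljoin leaves already-absolute URLs untouched, applying it to every href is safe even on pages that mix relative and absolute links.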

Internal

  • Removed transitional support for beautifulsoup <4.11 library (i.e. older than 7 April 2022) for the beautify
    filter.
  • Removed dependency on the requests library and its own dependency on the urllib3 library.
  • Code cleanup, including removing support for Python 3.8.

v3.15

26 Oct 02:13

Added

  • Support for Python 3.12.
  • data_as_json job directive for url jobs to indicate that data entered as a dict should be serialized as JSON
    instead of urlencoded and that, if missing, the Content-Type header should be set to application/json
    instead of application/x-www-form-urlencoded.
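A minimal Python sketch of the two encodings that data_as_json chooses between, using only the standard library (json.dumps and urllib.parse.urlencode). The function encode_body is hypothetical. Checking for an existing Content-Type header case-insensitively, as done here, is an assumption about the behaviour.

```python
# Sketch only: encode_body is hypothetical; it mirrors the behaviour described
# for data_as_json, not webchanges' actual implementation.
from __future__ import annotations

import json
from urllib.parse import urlencode


def encode_body(data: dict, headers: dict[str, str], data_as_json: bool) -> tuple[str, dict[str, str]]:
    """Return (body, headers); Content-Type is only added if not already set."""
    headers = dict(headers)  # don't mutate the caller's dict
    has_content_type = any(k.lower() == 'content-type' for k in headers)
    if data_as_json:
        body = json.dumps(data)
        default_type = 'application/json'
    else:
        body = urlencode(data)
        default_type = 'application/x-www-form-urlencoded'
    if not has_content_type:
        headers['Content-Type'] = default_type
    return body, headers


if __name__ == '__main__':
    print(encode_body({'q': 'a b', 'n': 1}, {}, data_as_json=False))
    print(encode_body({'q': 'a b', 'n': 1}, {}, data_as_json=True))
```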

Changed

  • Improved error handling and documentation on the need for an external install when using parser: html5lib with the
    bs4 method of the html2text filter, and added html5lib as an optional dependency keyword (thanks to the report by
    101Dude (https://github.com/101Dude) in #59 (https://github.com/mborsetti/webchanges/issues/59)).

Removed

  • Support for Python 3.8. A reminder that older Python versions are supported for 3 years after being obsoleted by a
    new major release (i.e. about 4 years since their original release).

Internals

  • Upgraded build environment to use the build frontend and pyproject.toml, eliminating setup.py.
  • Migrated the configuration of all tools that support it to pyproject.toml.
  • Increased the default timeout for url jobs with use_browser: true (i.e. using Playwright) to 120 seconds.

v3.14

01 Sep 17:04

Notice

Support for Python 3.8 will be removed on or about 5 October 2023. A reminder that older Python versions are
supported for 3 years after being obsoleted by a new major release (i.e. about 4 years since their original release).

Added

  • When running in verbose (-v) mode, if a url job with use_browser: true fails with a Playwright error,
    capture and save in the temporary folder a screenshot, a full page image, and the HTML contents of the page at the
    moment of the error (see log file for filenames).

v3.13

28 Aug 21:56

Notice

Support for Python 3.8 will be removed on or about 5 October 2023. A reminder that older Python versions are
supported for 3 years after being obsoleted by a new major release (i.e. about 4 years since their original release).

Added

  • Reports have a new separate configuration option to split reports into one-per-job.

  • url jobs without use_browser have a new retries directive to specify the number of times to retry a
    job that errors before giving up. Using retries: 1 or higher will often solve the ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer')) error received from a misconfigured server at the first
    connection.

  • remove_duplicates filter has a new adjacent sub-directive to de-duplicate non-adjacent lines or items.

  • css and xpath have a new sort subfilter to sort matched elements lexicographically.

  • Command line arguments:

    • New --footnote to add a custom footnote to reports.
    • New --change-location to keep job history when the url or command changes.
    • --gc-database and --clean-database now have optional argument RETAIN-LIMIT to allow increasing
      the number of retained snapshots from the default of 1.
    • New --detailed-versions to display detailed version and system information, including the versions of
      dependencies and, in certain Linux distributions (e.g. Debian), of system libraries. It also reports available
      memory and disk space.
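The two new filter options above can be illustrated with a short Python sketch. It assumes that adjacent: true (taken here as the default) removes only consecutive repeats, while adjacent: false removes repeats anywhere. It also assumes that "lexicographically" means plain string ordering. Both function names are hypothetical.

```python
# Sketch only: remove_duplicates and sort_matches are hypothetical names; the
# adjacent default of True is an assumption.
from __future__ import annotations

from itertools import groupby


def remove_duplicates(lines: list[str], adjacent: bool = True) -> list[str]:
    """Drop repeated lines.

    adjacent=True collapses only runs of identical consecutive lines;
    adjacent=False keeps just the first occurrence of each line anywhere.
    """
    if adjacent:
        return [key for key, _ in groupby(lines)]
    seen: set[str] = set()
    result = []
    for line in lines:
        if line not in seen:
            seen.add(line)
            result.append(line)
    return result


def sort_matches(elements: list[str]) -> list[str]:
    """Sort matched elements lexicographically (plain str ordering)."""
    return sorted(elements)


if __name__ == '__main__':
    lines = ['a', 'a', 'b', 'a']
    print(remove_duplicates(lines))  # ['a', 'b', 'a']
    print(remove_duplicates(lines, adjacent=False))  # ['a', 'b']
```

Plain string ordering is case-sensitive, so sort_matches(['b', 'B', 'a']) returns ['B', 'a', 'b'].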

Changed

  • command jobs now have improved error reporting which includes the error text from the failed command.
  • --rollback-database now confirms the date (in ISO-8601 format) to roll back the database to and, if
    webchanges is being run in interactive mode, asks the user for positive confirmation before proceeding
    with the irreversible deletion.
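A Python sketch of the confirmation flow described above. It assumes the rollback point is given as a Unix timestamp and shown in UTC, and it treats "interactive mode" as stdin being a terminal. confirm_rollback and its parameters are hypothetical; the ask parameter exists only so the prompt can be replaced in tests.

```python
# Sketch only: confirm_rollback is hypothetical; UTC display and the isatty()
# check for interactive mode are assumptions.
from __future__ import annotations

import sys
from datetime import datetime, timezone
from typing import Callable, Optional


def confirm_rollback(timestamp: float, interactive: Optional[bool] = None,
                     ask: Callable[[str], str] = input) -> bool:
    """Show the rollback point in ISO-8601; if interactive, require a 'y' answer."""
    when = datetime.fromtimestamp(timestamp, tz=timezone.utc).isoformat()
    print(f'Rolling back database to {when}')
    if interactive is None:
        interactive = sys.stdin.isatty()  # only prompt when a user can answer
    if not interactive:
        return True
    answer = ask('This deletion cannot be undone. Proceed? [y/N] ')
    return answer.strip().lower() in ('y', 'yes')
```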

Internals

  • Added bandit (https://github.com/PyCQA/bandit) testing to improve the security of code.
  • headers are now turned into strings before being passed to Playwright (addresses the error
    playwright._impl._api_types.Error: extraHTTPHeaders[13].value: expected string, got number).
  • Excluded tests from being recognized as a package during build (contributed by Max (https://github.com/aragon999) in #54 (https://github.com/mborsetti/webchanges/pull/54)).
  • Refactored and cleaned up some tests.
  • Initial testing with Python 3.12.0-rc1, but a reported bug in typing.TypeVar prevents the pyee dependency
    of playwright from loading, causing a failure. Awaiting a fix in Python 3.12.0-rc2 to retry.
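The header fix above amounts to coercing every value to a string. The error message shows that Playwright rejects numeric header values, and a YAML value such as DNT: 1 is parsed as an integer. The Python sketch below shows that conversion; stringify_headers is a hypothetical name, not webchanges' code.

```python
# Sketch only: stringify_headers is a hypothetical helper.
from __future__ import annotations


def stringify_headers(headers: dict[str, object]) -> dict[str, str]:
    """Return a copy of headers with every name and value converted to str."""
    return {str(name): str(value) for name, value in headers.items()}


if __name__ == '__main__':
    print(stringify_headers({'DNT': 1, 'Accept': 'text/html'}))  # {'DNT': '1', 'Accept': 'text/html'}
```

With Playwright's Python API, the result could then be passed as browser.new_context(extra_http_headers=stringify_headers(headers)).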

v3.12

19 Nov 00:59

Added

  • Support for Python 3.11. Please note that the dependency lxml may fail to install on Windows due to
    this bug (https://bugs.launchpad.net/lxml/+bug/1977998) and that therefore, for now, webchanges can only be
    run in Python 3.10 on Windows.

Removed

  • Support for Python 3.7. As a reminder, older Python versions are supported for 3 years after being obsoleted by a new
    major release; support for Python 3.8 will be removed on or about 5 October 2023.

Fixed

  • Job sorting for reports is now case-insensitive.
  • Documentation on how to anonymously monitor GitHub releases (due to changes in GitHub) (contributed by
    Luis Aranguren (https://github.com/mercurytoxic) upstream (https://github.com/thp/urlwatch/issues/723)).
  • Handling of the method sub-directive of the html2text filter (reported by kongomondo (https://github.com/kongomondo)
    upstream (https://github.com/thp/urlwatch/issues/588)).
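Case-insensitive sorting of the kind described above can be sketched in Python by using str.casefold as the sort key. The sketch assumes jobs are ordered by a display string such as their name. The secondary key (the original string) is an arbitrary choice made here to keep the order deterministic when two names differ only in case. sort_job_names is a hypothetical name.

```python
# Sketch only: sort_job_names is hypothetical; the tie-breaking rule is an
# arbitrary choice for deterministic output.
from __future__ import annotations


def sort_job_names(names: list[str]) -> list[str]:
    """Sort names ignoring case; names equal ignoring case keep str order."""
    return sorted(names, key=lambda n: (n.casefold(), n))


if __name__ == '__main__':
    print(sort_job_names(['beta', 'Alpha', 'alpha', 'Gamma']))  # ['Alpha', 'alpha', 'beta', 'Gamma']
```

For comparison, a plain sorted() call on the same list gives ['Alpha', 'Gamma', 'alpha', 'beta'], because uppercase letters sort before lowercase ones.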

v3.11

25 Sep 12:44

Notice

Support for Python 3.7 will be removed on or about 22 October 2022 as older Python versions are supported for 3
years after being obsoleted by a new major release.

Added

  • The new no_conditional_request directive for url jobs turns off conditional requests for those extremely rare
    websites that don't handle them (e.g. Google Flights).
  • The database engine and the maximum number of changed snapshots saved are now set through the configuration
    file; the command line arguments --database-engine and --max-snapshots override these settings. See the
    documentation for more information. Suggested by jprokos (https://github.com/jprokos) in #43 (https://github.com/mborsetti/webchanges/issues/43).
  • New configuration setting empty-diff within the display configuration, for backwards compatibility only:
    use the additions_only job directive instead to achieve the same result. Reported by
    bbeevvoo (https://github.com/bbeevvoo) in #47 (https://github.com/mborsetti/webchanges/issues/47).
  • Aliased the command line arguments --gc-cache with --gc-database, --clean-cache with --clean-database
    and --rollback-cache with --rollback-database for clarity.
  • The configuration file (e.g. conf.yaml) can now contain keys starting with a _ (underscore) for remarks (they
    are ignored).
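Conditional requests work by sending validators saved from an earlier response: an ETag in If-None-Match, or a date in If-Modified-Since. The server can then answer 304 Not Modified instead of resending the page. The Python sketch below illustrates only that generic mechanism and the effect of switching it off. The function name is hypothetical, as is the assumption that the previous ETag and fetch time are stored with the snapshot.

```python
# Sketch only: conditional_headers is hypothetical; it shows generic HTTP
# conditional-request headers, not webchanges' actual code.
from __future__ import annotations

from email.utils import formatdate
from typing import Optional


def conditional_headers(etag: Optional[str], last_modified: Optional[float],
                        no_conditional_request: bool = False) -> dict[str, str]:
    """Return If-None-Match / If-Modified-Since headers for a repeat fetch."""
    if no_conditional_request:
        return {}  # always ask for the full response
    headers = {}
    if etag:
        headers['If-None-Match'] = etag
    if last_modified is not None:
        # HTTP dates use the IMF-fixdate format and are always in GMT
        headers['If-Modified-Since'] = formatdate(last_modified, usegmt=True)
    return headers


if __name__ == '__main__':
    print(conditional_headers('"abc123"', 0))
```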

Changed

  • Reports are now sorted alphabetically and therefore you can use the name directive to affect the order by which
    your jobs are displayed in reports.
  • Implemented measures for url jobs using use_browser: true to avoid being detected: webchanges now passes all
    the headless Chrome detection tests found here (https://intoli.com/blog/not-possible-to-block-chrome-headless/chrome-headless-test.html).
    Brought to my attention by amammad (https://github.com/amammad)
    in #45 (https://github.com/mborsetti/webchanges/issues/45).
  • Running webchanges --test (without specifying a JOB) will now check the hooks file (if any) for syntax errors in
    addition to the config and jobs files. Error reporting has also been improved.
  • The text returned by the server is no longer shown when a 404 - Not Found HTTP status code is returned, for
    all url jobs (previously only for jobs with use_browser: true).

Fixed

  • Bug in command line arguments --config and --hooks. Contributed by
    Klaus Sperner (https://github.com/klaus-tux) in PR #46 (https://github.com/mborsetti/webchanges/pull/46).
  • Job directive compared_versions now works as documented and testing has been added to the test suite. Reported by
    jprokos (https://github.com/jprokos) in #43 (https://github.com/mborsetti/webchanges/issues/43).
  • The output of command line argument --test-diff now takes into consideration compared_versions.
  • Markdown containing code in a link text now converts correctly in HTML reports.

Internals

  • The job kind of shell has been renamed command to better reflect what it does and the way it's described
    in the documentation, but shell is still recognized for backward compatibility.
  • Readthedocs build upgraded to Python 3.10

v3.10.3

11 Jul 22:49

Added

  • URL jobs with use_browser: true that receive an error HTTP status code from the server will now include the text
    returned by the website in the error message (e.g. "Rate exceeded.", "upstream request timeout", etc.), except for
    HTTP status code 404 - Not Found.

Changed

  • The command line argument --jobs used to specify a jobs file will now accept a glob pattern
    (https://en.wikipedia.org/wiki/Glob_(programming)), e.g. wildcards, to specify multiple files. If more than one
    file matches the pattern, their contents will be concatenated before a job list is built. Useful e.g. if you have
    multiple jobs files that run on different schedules and you want to clean the snapshot database of URLs/commands no
    longer monitored ("garbage collect") using --gc-cache.
  • The command line argument --list will now list the full path of the jobs file(s).
  • Traceback information for Python Exceptions is suppressed by default. Use the command line argument --verbose
    (or -v) to display it.
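The --jobs behaviour described above can be approximated in Python with the standard glob module. This is a sketch under assumptions: matching files are read in sorted order and joined with a YAML document separator (---). webchanges' actual ordering and joining may differ, and load_jobs_text is a hypothetical name.

```python
# Sketch only: load_jobs_text is hypothetical; sorted order and the '---'
# separator between files are assumptions.
from __future__ import annotations

import glob


def load_jobs_text(pattern: str) -> tuple[list[str], str]:
    """Return (matched file paths, concatenated YAML text of those files)."""
    paths = sorted(glob.glob(pattern))
    if not paths:
        raise FileNotFoundError(f'No jobs file matches {pattern!r}')
    chunks = []
    for path in paths:
        with open(path, encoding='utf-8') as f:
            chunks.append(f.read().strip())
    # a YAML document separator keeps jobs from different files apart
    return paths, '\n---\n'.join(chunks)
```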

Fixed

  • Fixed the "Unicode strings with encoding declaration are not supported." error in the xpath filter using
    method: xml under certain conditions (MacOS only). Reported by jprokos (https://github.com/jprokos) in #42 (https://github.com/mborsetti/webchanges/issues/42).

Internals

  • The source distribution is now available on PyPI to support certain packagers like fpm.
  • Improved handling and reporting of Playwright browser errors (for URL jobs with use_browser: true).