Releases: mborsetti/webchanges
v3.17.2
Fixed
- Exception in error handling when ``requests`` is not installed (reported by
  `yubiuser <https://github.com/yubiuser>`__ in `#66 <https://github.com/mborsetti/webchanges/issues/66>`__).
v3.17.1
Fixed
- Removed a dependency on the ``requests`` library that was inadvertently left behind (reported by
  `yubiuser <https://github.com/yubiuser>`__ in `#65 <https://github.com/mborsetti/webchanges/issues/65>`__).
v3.17
Added
- You can now specify a reporter name after the command line argument ``--errors`` to send the output to the reporter
  specified. For example, to be notified by email of any jobs that result in an error or that, after filtering,
  return no data (indicating they may no longer be monitoring resources as expected), run
  ``webchanges --errors email``.
- You can now suppress the ``footer`` in an ``html`` report using the new ``footer: false`` sub-directive in
  ``config.yaml`` (same as the one already existing with ``text`` and ``markdown``).
Internal
- Fixed a regression on the default ``User-Agent`` header for ``url`` jobs with the ``use_browser: true`` directive.
v3.16
Added
- The HTTP/2 network protocol (the same used by major browsers) is now used in ``url`` jobs. This allows the
  monitoring of certain websites that block requests made with older protocols like HTTP/1.1. This is implemented by
  using the ``HTTPX`` and ``h2`` HTTP client libraries instead of the ``requests`` one used previously.

  Notes:

  - Handling of data served by sites whose encoding is misconfigured is done slightly differently by ``HTTPX``, and if
    you newly encounter instances where extended characters are rendered as ``�`` try adding
    ``encoding: ISO-8859-1`` to that job.
  - To revert to the use of the ``requests`` HTTP client library, use the new job sub-directive
    ``http_client: requests`` (in individual jobs or in the configuration file for all ``url`` jobs) and install
    ``requests`` by running ``pip install --upgrade webchanges[requests]``.
  - If the system is misconfigured and the ``HTTPX`` HTTP client library is not found, an attempt to use the
    ``requests`` one will be made. This behaviour is transitional and will be removed in the future.
  - HTTP/2 is theoretically faster than HTTP/1.1 and preliminary testing confirmed this.

- New ``pypdf`` filter to convert pdf to text without having to separately install OS dependencies. If you're
  using ``pdf2text`` (and its OS dependencies), I suggest you switch to ``pypdf`` as it's much faster; however do note
  that the ``raw`` and ``physical`` sub-directives are not supported. Install the required library by running
  ``pip install --upgrade webchanges[pypdf]``.
- New ``absolute_links`` filter to convert relative links in HTML ``<a>`` tags to absolute ones. This filter is not
  needed if you are already using the ``beautify`` or ``html2text`` filters. Requested by pawelpbm in issue #62.
- New ``{jobs_files}`` substitution for the ``subject`` of the ``email`` reporter. This will be replaced by the
  name of the jobs file(s) other than the default ``jobs.yaml`` in parentheses, with a prefix of ``jobs-`` in the
  name removed. To use, replace the ``subject`` line for your reporter(s) in ``config.yaml`` with e.g.
  ``[webchanges] {count} changes{jobs_files}: {jobs}``.
- ``html`` reports now have a configurable ``title`` to set the HTML document title, defaulting to
  ``[webchanges] {count} changes{jobs_files}: {jobs}``.
- Added reference to a Docker implementation to the documentation (contributed by yubiuser in #64).
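
The ``{jobs_files}`` substitution described above can be pictured with a short Python sketch. This is only an
illustration of the documented behaviour, not the code used by webchanges: the helper name ``jobs_files_suffix``, the
leading space before the parentheses, the removal of the file extension and the sample values are all assumptions.

```python
from pathlib import Path


def jobs_files_suffix(jobs_files, default="jobs.yaml"):
    """Hypothetical helper: build the text substituted for {jobs_files}."""
    names = []
    for file in jobs_files:
        path = Path(file)
        if path.name == default:
            continue  # the default jobs file is not mentioned in the subject
        # Assumption: the extension is dropped, then a leading "jobs-" is removed,
        # so "jobs-daily.yaml" is shown as "daily".
        names.append(path.stem.removeprefix("jobs-"))
    # Assumption: a leading space separates the suffix from "changes".
    return f" ({', '.join(names)})" if names else ""


subject_template = "[webchanges] {count} changes{jobs_files}: {jobs}"
subject = subject_template.format(
    count=2,
    jobs_files=jobs_files_suffix(["jobs-daily.yaml"]),
    jobs="Example A, Example B",  # made-up job names
)
print(subject)
```

With the default ``jobs.yaml`` the helper returns an empty string, so the subject reads the same as before the
substitution was introduced.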
Changed
- ``url`` jobs will use the ``HTTPX`` library instead of ``requests`` if it's installed since it uses the HTTP/2
  network protocol (when the ``h2`` library is also installed) as browsers do. To revert to the use of ``requests``
  even if ``HTTPX`` is installed on the system, add ``http_client: requests`` to the relevant jobs or make it a default
  by editing the configuration file to add the sub-directive ``http_client: requests`` for ``url`` jobs under
  ``job_defaults``.
- The ``beautify`` filter converts relative links to absolute ones; use the new ``absolute_links: false``
  sub-directive to disable.
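
To illustrate what converting relative links to absolute ones means, here is a minimal sketch using only the Python
standard library. It is not the filter's implementation; the function name ``absolutize`` and the example URLs are
made up for the illustration.

```python
from urllib.parse import urljoin


def absolutize(hrefs, base_url):
    """Resolve each href against the URL of the page it was found on."""
    return [urljoin(base_url, href) for href in hrefs]


base = "https://example.com/docs/index.html"  # hypothetical page URL
links = absolutize(
    ["page2.html", "/about", "../img/logo.png", "https://other.example/x"],
    base,
)
for link in links:
    print(link)
```

Links that are already absolute pass through unchanged, so applying the conversion twice gives the same result.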
Internal
- Removed transitional support for ``beautifulsoup <4.11`` library (i.e. older than 7 April 2022) for the ``beautify``
  filter.
- Removed dependency on the ``requests`` library and its own dependency on the ``urllib3`` library.
- Code cleanup, including removing support for Python 3.8.
v3.15
Added
- Support for Python 3.12.
- ``data_as_json`` job directive for ``url`` jobs to indicate that ``data`` entered as a dict should be
  serialized as JSON instead of urlencoded and, if missing, the header ``Content-Type`` set to ``application/json``
  instead of ``application/x-www-form-urlencoded``.
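
The difference between the two serializations can be sketched as follows. This is an illustration under stated
assumptions, not the code in webchanges: ``prepare_body`` is a hypothetical helper, the payload is made up, and header
names are matched case-sensitively for simplicity.

```python
import json
from urllib.parse import urlencode


def prepare_body(data, data_as_json=False, headers=None):
    """Hypothetical helper: serialize a dict and set a default Content-Type."""
    headers = dict(headers or {})
    if data_as_json:
        body = json.dumps(data)
        default_type = "application/json"
    else:
        body = urlencode(data)
        default_type = "application/x-www-form-urlencoded"
    # Only set the header if the job did not already provide one.
    headers.setdefault("Content-Type", default_type)
    return body, headers


payload = {"q": "a b", "n": 1}
json_body, json_headers = prepare_body(payload, data_as_json=True)
form_body, form_headers = prepare_body(payload)
print(json_body, json_headers)
print(form_body, form_headers)
```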
Changed
- Improved error handling and documentation on the need of an external install when using ``parser: html5lib`` with
  the ``bs4`` method of the ``html2text`` filter and added ``html5lib`` as an optional dependency keyword (thanks to
  `101Dude <https://github.com/101Dude>`__'s report in `#59 <https://github.com/mborsetti/webchanges/issues/59>`__).
Removed
- Support for Python 3.8. A reminder that older Python versions are supported for 3 years after being obsoleted by a
new major release (i.e. about 4 years since their original release).
Internals
- Upgraded build environment to use the ``build`` frontend and ``pyproject.toml``, eliminating ``setup.py``.
- Migrated to ``pyproject.toml`` the configuration of all tools that support it.
- Increased the default ``timeout`` for ``url`` jobs with ``use_browser: true`` (i.e. using Playwright) to 120 seconds.
v3.14
Notice
Support for Python 3.8 will be removed on or about 5 October 2023. A reminder that older Python versions are
supported for 3 years after being obsoleted by a new major release (i.e. about 4 years since their original release).
Added
- When running in verbose (``-v``) mode, if a ``url`` job with ``use_browser: true`` fails with a Playwright error,
  capture and save in the temporary folder a screenshot, a full page image, and the HTML contents of the page at the
  moment of the error (see log file for filenames).
v3.13
Notice
Support for Python 3.8 will be removed on or about 5 October 2023. A reminder that older Python versions are
supported for 3 years after being obsoleted by a new major release (i.e. about 4 years since their original release).
Added
- Reports have a new ``separate`` configuration option to split reports into one-per-job.
- ``url`` jobs without ``use_browser`` have a new ``retries`` directive to specify the number of times to retry a
  job that errors before giving up. Using ``retries: 1`` or higher will often solve the
  ``('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))`` error received from a
  misconfigured server at the first connection.
- ``remove_duplicates`` filter has a new ``adjacent`` sub-directive to de-duplicate non-adjacent lines or items.
- ``css`` and ``xpath`` have a new ``sort`` subfilter to sort matched elements lexicographically.
- Command line arguments:

  - New ``--footnote`` to add a custom footnote to reports.
  - New ``--change-location`` to keep job history when the ``url`` or ``command`` changes.
  - ``--gc-database`` and ``--clean-database`` now have optional argument ``RETAIN-LIMIT`` to allow increasing
    the number of retained snapshots from the default of 1.
  - New ``--detailed-versions`` to display detailed version and system information, inclusive of the versions of
    dependencies and, in certain Linux distributions (e.g. Debian), of system libraries. It also reports available
    memory and disk space.
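
The difference between de-duplicating adjacent and non-adjacent lines can be shown with a small sketch. This is not
the filter's code; the function below is hypothetical, and mapping its ``adjacent`` flag to the sub-directive of the
same name (with ``False`` meaning that non-adjacent duplicates are removed too) is an assumption.

```python
from itertools import groupby


def remove_duplicates(lines, adjacent=True):
    """Hypothetical helper showing the two de-duplication modes."""
    if adjacent:
        # Collapse only runs of identical consecutive lines.
        return [line for line, _ in groupby(lines)]
    # Keep the first occurrence of each line anywhere, preserving order.
    return list(dict.fromkeys(lines))


lines = ["a", "a", "b", "a", "b"]
only_adjacent = remove_duplicates(lines)
everywhere = remove_duplicates(lines, adjacent=False)
print(only_adjacent)
print(everywhere)
```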
Changed
- ``command`` jobs now have improved error reporting which includes the error text from the failed command.
- ``--rollback-database`` now confirms the date (in ISO-8601 format) to roll back the database to and, if
  webchanges is being run in interactive mode, the user will be asked for positive confirmation before proceeding
  with the irreversible deletion.
Internals
- Added `bandit <https://github.com/PyCQA/bandit>`__ testing to improve the security of code.
- ``headers`` are now turned into strings before being passed to Playwright (addresses the error
  ``playwright._impl._api_types.Error: extraHTTPHeaders[13].value: expected string, got number``).
- Exclude tests from being recognized as package during build (contributed by
  `Max <https://github.com/aragon999>`__ in `#54 <https://github.com/mborsetti/webchanges/pull/54>`__).
- Refactored and cleaned up some tests.
- Initial testing with Python 3.12.0-rc1, but a reported bug in ``typing.TypeVar`` prevents the ``pyee`` dependency
  of ``playwright`` from loading, causing a failure. Awaiting a fix in Python 3.12.0-rc2 to retry.
v3.12
Added
- Support for Python 3.11. Please note that the dependency ``lxml`` may fail to install on Windows due to
  `this <https://bugs.launchpad.net/lxml/+bug/1977998>`__ bug and that therefore for now webchanges can only be
  run in Python 3.10 on Windows.
Removed
- Support for Python 3.7. As a reminder, older Python versions are supported for 3 years after being obsoleted by a new
major release; support for Python 3.8 will be removed on or about 5 October 2023.
Fixed
- Job sorting for reports is now case-insensitive.
- Documentation on how to anonymously monitor GitHub releases (due to changes in GitHub) (contributed by
  `Luis Aranguren <https://github.com/mercurytoxic>`__
  `upstream <https://github.com/thp/urlwatch/issues/723>`__).
- Handling of ``method`` subfilter for filter ``html2text`` (reported by
  `kongomondo <https://github.com/kongomondo>`__ `upstream <https://github.com/thp/urlwatch/issues/588>`__).
v3.11
Notice
Support for Python 3.7 will be removed on or about 22 October 2022 as older Python versions are supported for 3
years after being obsoleted by a new major release.
Added
- The new ``no_conditional_request`` directive for ``url`` jobs turns off conditional requests for those extremely
  rare websites that don't handle them (e.g. Google Flights).
- The database engine and the maximum number of changed snapshots saved are now set through the configuration
  file, and the command line arguments ``--database-engine`` and ``--max-snapshots`` are used to override such
  settings. See documentation for more information. Suggested by `jprokos <https://github.com/jprokos>`__ in
  `#43 <https://github.com/mborsetti/webchanges/issues/43>`__.
- New configuration setting ``empty-diff`` within the ``display`` configuration for backwards compatibility only:
  use the ``additions_only`` job directive instead to achieve the same result. Reported by
  `bbeevvoo <https://github.com/bbeevvoo>`__ in `#47 <https://github.com/mborsetti/webchanges/issues/47>`__.
- Aliased the command line arguments ``--gc-cache`` with ``--gc-database``, ``--clean-cache`` with
  ``--clean-database`` and ``--rollback-cache`` with ``--rollback-database`` for clarity.
- The configuration file (e.g. ``conf.yaml``) can now contain keys starting with a ``_`` (underscore) for remarks
  (they are ignored).
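
Ignoring keys that start with an underscore could look roughly like the sketch below, which works on an already
parsed configuration. This is not how webchanges necessarily does it: the function ``strip_remarks``, the choice to
apply the rule at every nesting level and the sample configuration are assumptions made for the illustration.

```python
def strip_remarks(node):
    """Hypothetical helper: drop every key that starts with an underscore."""
    if isinstance(node, dict):
        return {
            key: strip_remarks(value)
            for key, value in node.items()
            if not str(key).startswith("_")
        }
    if isinstance(node, list):
        return [strip_remarks(item) for item in node]
    return node


# Made-up configuration, as it might look after the YAML file is parsed.
config = {
    "_note": "remarks like this one are skipped",
    "display": {"new": True, "_why": "show newly added jobs"},
}
cleaned = strip_remarks(config)
print(cleaned)
```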
Changed
- Reports are now sorted alphabetically and therefore you can use the ``name`` directive to affect the order in which
  your jobs are displayed in reports.
- Implemented measures for ``url`` jobs using ``use_browser: true`` to avoid being detected: webchanges now passes all
  the headless Chrome detection tests
  `here <https://intoli.com/blog/not-possible-to-block-chrome-headless/chrome-headless-test.html>`__.
  Brought to my attention by `amammad <https://github.com/amammad>`__ in
  `#45 <https://github.com/mborsetti/webchanges/issues/45>`__.
- Running ``webchanges --test`` (without specifying a JOB) will now check the hooks file (if any) for syntax errors in
  addition to the config and jobs file. Error reporting has also been improved.
- The text returned by the server is no longer shown when a 404 - Not Found HTTP status code is returned, for all
  ``url`` jobs (previously only for jobs with ``use_browser: true``).
Fixed
- Bug in command line arguments ``--config`` and ``--hooks``. Contributed by
  `Klaus Sperner <https://github.com/klaus-tux>`__ in PR `#46 <https://github.com/mborsetti/webchanges/pull/46>`__.
- Job directive ``compared_versions`` now works as documented and testing has been added to the test suite. Reported
  by `jprokos <https://github.com/jprokos>`__ in `#43 <https://github.com/mborsetti/webchanges/issues/43>`__.
- The output of command line argument ``--test-diff`` now takes into consideration ``compared_versions``.
- Markdown containing code in a link text now converts correctly in HTML reports.
Internals
- The job ``kind`` of ``shell`` has been renamed ``command`` to better reflect what it does and the way it's described
  in the documentation, but ``shell`` is still recognized for backward compatibility.
- Readthedocs build upgraded to Python 3.10.
v3.10.3
Added
- URL jobs with ``use_browser: true`` that receive an error HTTP status code from the server will now include the text
  returned by the website in the error message (e.g. "Rate exceeded.", "upstream request timeout", etc.), except for
  HTTP status code 404 - Not Found.
Changed
- The command line argument ``--jobs`` used to specify a jobs file will now accept a
  `glob pattern <https://en.wikipedia.org/wiki/Glob_(programming)>`__, e.g. wildcards, to specify multiple files. If
  more than one file matches the pattern, their contents will be concatenated before a job list is built. Useful e.g.
  if you have multiple jobs files that run on different schedules and you want to clean the snapshot database of
  URLs/commands no longer monitored ("garbage collect") using ``--gc-cache``.
- The command line argument ``--list`` will now list the full path of the jobs file(s).
- Traceback information for Python Exceptions is suppressed by default. Use the command line argument ``--verbose``
  (or ``-v``) to display it.
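
The glob matching and concatenation described in the list above can be sketched with the Python standard library.
This is an illustration, not the webchanges implementation: ``read_jobs_files``, the sorted order, the ``---``
separator placed between files and the sample file names and contents are all assumptions.

```python
import glob
import tempfile
from pathlib import Path


def read_jobs_files(pattern):
    """Hypothetical helper: concatenate every file matching a glob pattern."""
    paths = sorted(glob.glob(pattern))
    texts = [Path(path).read_text(encoding="utf-8") for path in paths]
    # Assumption: a YAML document separator keeps the jobs of each file apart.
    return paths, "---\n".join(texts)


with tempfile.TemporaryDirectory() as tmp:
    # Made-up files: two jobs files and one file that must not match.
    Path(tmp, "jobs-daily.yaml").write_text("url: https://example.com/a\n", encoding="utf-8")
    Path(tmp, "jobs-weekly.yaml").write_text("url: https://example.com/b\n", encoding="utf-8")
    Path(tmp, "config.yaml").write_text("ignored: true\n", encoding="utf-8")
    paths, combined = read_jobs_files(str(Path(tmp, "jobs-*.yaml")))

print([Path(path).name for path in paths])
print(combined)
```

On the command line, remember to quote the pattern so that the shell passes it through unexpanded.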
Fixed
- Fixed ``Unicode strings with encoding declaration are not supported.`` error in the ``xpath`` filter using
  ``method: xml`` under certain conditions (MacOS only). Reported by `jprokos <https://github.com/jprokos>`__ in
  `#42 <https://github.com/mborsetti/webchanges/issues/42>`__.
Internals
- The source distribution is now available on PyPI to support certain packagers like ``fpm``.
- Improved handling and reporting of Playwright browser errors (for URL jobs with ``use_browser: true``).