2.12 release notes #6226

220 changes: 209 additions & 11 deletions docs/news.rst
@@ -3,6 +3,198 @@
Release notes
=============

.. _release-2.12.0:

Scrapy 2.12.0 (unreleased)
--------------------------

Highlights:

- Added :class:`~scrapy.http.JsonResponse`

- Added ``items_per_minute`` and ``responses_per_minute`` stats

- Added component getters to :class:`scrapy.crawler.Crawler`

Deprecation removals
~~~~~~~~~~~~~~~~~~~~

- Removed the ``scrapy.utils.request.request_fingerprint`` function,
deprecated in Scrapy 2.7.0.
(:issue:`6212`, :issue:`6213`)

- Removed support for the ``"2.6"`` value of the
  ``REQUEST_FINGERPRINTER_IMPLEMENTATION`` setting, deprecated in Scrapy
  2.7.0.
  (:issue:`6212`, :issue:`6213`)

- :class:`~scrapy.dupefilters.RFPDupeFilter` subclasses must now support the
  ``fingerprinter`` parameter, introduced in Scrapy 2.7.0, in their
  ``__init__`` method.
  (:issue:`6102`, :issue:`6113`)

- Removed the ``scrapy.downloadermiddlewares.decompression`` module,
deprecated in Scrapy 2.7.0.
(:issue:`6100`, :issue:`6113`)

- Removed the :func:`scrapy.utils.response.response_httprepr` function,
deprecated in Scrapy 2.6.0.
(:issue:`6111`, :issue:`6116`)

- Spiders with spider-level HTTP authentication, i.e. with the ``http_user``
or ``http_pass`` attributes, must now define ``http_auth_domain`` as well,
which was introduced in Scrapy 2.5.1.
(:issue:`6103`, :issue:`6113`)

- :ref:`Media pipelines <topics-media-pipeline>` methods ``file_path``,
``file_downloaded``, ``get_images``, ``image_downloaded``,
``media_downloaded``, ``media_to_download``, and ``thumb_path`` must now
support an ``item`` parameter, added in Scrapy 2.4.0.
(:issue:`6107`, :issue:`6113`)

- The ``__init__`` and ``from_crawler`` methods of :ref:`feed storage backend
classes <topics-feed-storage>` must now support the keyword-only
``feed_options`` parameter, introduced in Scrapy 2.4.0.
(:issue:`6105`, :issue:`6113`)

- Removed the ``scrapy.loader.common`` and ``scrapy.loader.processors``
modules, deprecated in Scrapy 2.3.0.
(:issue:`6106`, :issue:`6113`)

- Removed the ``scrapy.utils.misc.extract_regex`` function, deprecated in
Scrapy 2.3.0.
(:issue:`6106`, :issue:`6113`)

- Removed the ``scrapy.http.JSONRequest`` class, replaced with
``JsonRequest`` in Scrapy 1.8.0.
(:issue:`6110`, :issue:`6113`)

- ``scrapy.utils.log.logformatter_adapter`` no longer supports missing
``args``, ``level``, or ``msg`` parameters, and no longer supports a
``format`` parameter, all scenarios that were deprecated in Scrapy 1.0.0.
(:issue:`6109`, :issue:`6116`)

- A custom class assigned to the :setting:`SPIDER_LOADER_CLASS` setting that
does not implement the :class:`~scrapy.interfaces.ISpiderLoader` interface
will now raise a :exc:`zope.interface.verify.DoesNotImplement` exception at
run time. Non-compliant classes have been triggering a deprecation warning
since Scrapy 1.0.0.
(:issue:`6101`, :issue:`6113`)

Deprecations
~~~~~~~~~~~~

- The ``REQUEST_FINGERPRINTER_IMPLEMENTATION`` setting is now deprecated.
(:issue:`6212`, :issue:`6213`)

- The :ref:`Reppy <reppy-parser>`-based ``robots.txt`` parser,
``scrapy.robotstxt.ReppyRobotParser``, is now deprecated.
(:issue:`5230`, :issue:`6099`)

- The ``scrapy.utils.misc.create_instance`` function is now deprecated. Use
  one of its replacements instead, which provide a cleaner signature:
  :func:`scrapy.utils.misc.build_from_crawler` or
  :func:`scrapy.utils.misc.build_from_settings`.
  (:issue:`5523`, :issue:`5884`, :issue:`6162`, :issue:`6169`)
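For illustration, the replacement functions follow the usual Scrapy
component-construction convention: prefer a ``from_crawler`` classmethod when
the class defines one, otherwise call the constructor directly. A simplified,
framework-free sketch of that dispatch (an assumption for illustration, not
the exact Scrapy implementation):

```python
def build_from_crawler_sketch(objcls, crawler, *args, **kwargs):
    """Build a component instance, preferring the from_crawler
    classmethod when the class defines one (simplified sketch)."""
    if hasattr(objcls, "from_crawler"):
        return objcls.from_crawler(crawler, *args, **kwargs)
    return objcls(*args, **kwargs)
```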

New features
~~~~~~~~~~~~

- Added a new :class:`~scrapy.http.Response` subclass,
:class:`~scrapy.http.JsonResponse`, for responses with a `JSON MIME type
<https://mimesniff.spec.whatwg.org/#json-mime-type>`_.
(:issue:`6069`, :issue:`6171`, :issue:`6174`)
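The linked specification defines a JSON MIME type as ``application/json``,
``text/json``, or any MIME type whose subtype ends in ``+json``. A minimal
plain-Python sketch of that rule (illustrative only, not Scrapy's actual
detection code):

```python
def is_json_mime_type(mime_type: str) -> bool:
    """Return True for a JSON MIME type per the rule in
    https://mimesniff.spec.whatwg.org/#json-mime-type (sketch)."""
    # Strip any parameters such as "; charset=utf-8".
    essence = mime_type.split(";")[0].strip().lower()
    if essence in ("application/json", "text/json"):
        return True
    # Any MIME type whose subtype ends in "+json" also qualifies.
    return "/" in essence and essence.split("/")[1].endswith("+json")
```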

- The :class:`~scrapy.extensions.logstats.LogStats` extension now adds
``items_per_minute`` and ``responses_per_minute`` to the :ref:`stats
<topics-stats>` when the spider closes.
(:issue:`4110`, :issue:`4111`)
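Assuming the straightforward definition (an average over the whole run, which
is an assumption here rather than documented behaviour), such a per-minute
figure can be derived from a counter and the elapsed time:

```python
def per_minute(count: int, elapsed_seconds: float) -> float:
    """Average events per minute over a run, e.g. a figure like
    items_per_minute from an item count and elapsed time (sketch)."""
    if elapsed_seconds <= 0:
        return 0.0
    return count * 60.0 / elapsed_seconds
```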

- Added component getters to :class:`~scrapy.crawler.Crawler`:
:meth:`~scrapy.crawler.Crawler.get_addon`,
:meth:`~scrapy.crawler.Crawler.get_downloader_middleware`,
:meth:`~scrapy.crawler.Crawler.get_extension`,
:meth:`~scrapy.crawler.Crawler.get_item_pipeline`,
:meth:`~scrapy.crawler.Crawler.get_spider_middleware`.
(:issue:`6181`)
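Conceptually, each getter looks up the enabled component instance that matches
a given class. A framework-free sketch of that kind of lookup (the real
methods work against the crawler's actual middleware and extension managers,
which this sketch does not model):

```python
def get_component(components, cls):
    """Return the first component that is an instance of cls,
    or None if no matching component is enabled (sketch)."""
    for component in components:
        if isinstance(component, cls):
            return component
    return None
```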

Improvements
~~~~~~~~~~~~

- Extended the list of file extensions that
:class:`LinkExtractor <scrapy.linkextractors.lxmlhtml.LxmlLinkExtractor>`
ignores by default.
(:issue:`6074`, :issue:`6125`)

Bug fixes
~~~~~~~~~

- Assigning an empty string to the :setting:`JOBDIR` setting no longer
triggers the initialization of the disk queue.
(:issue:`6121`, :issue:`6124`)

- ``media_to_download`` errors in :ref:`media pipelines
<topics-media-pipeline>` are now logged.
(:issue:`5067`, :issue:`5068`)

- When using the :command:`parse` command, callbacks specified on the command
line no longer see their signature stripped.
(:issue:`6182`)

Documentation
~~~~~~~~~~~~~

- :ref:`Documented how to create a blank request <faq-blank-request>`.
(:issue:`6203`, :issue:`6208`)

- Other documentation improvements and fixes.
(:issue:`6094`,
:issue:`6177`,
:issue:`6200`,
:issue:`6207`,
:issue:`6216`)

Quality assurance
~~~~~~~~~~~~~~~~~

- Added ``py.typed``, in line with `PEP 561
<https://peps.python.org/pep-0561/>`_.
(:issue:`6058`, :issue:`6059`)

- Completed type hints for :class:`~scrapy.http.Request`,
:class:`~scrapy.http.Response`, :class:`~scrapy.http.headers.Headers`,
:ref:`spider middlewares <topics-spider-middleware>`, :ref:`downloader
middlewares <topics-downloader-middleware>`, and more.
(:issue:`5989`,
:issue:`6097`,
:issue:`6127`,
:issue:`6129`,
:issue:`6130`,
:issue:`6133`,
:issue:`6191`)

- CI and test improvements and fixes.
(:issue:`5454`,
:issue:`5997`,
:issue:`6078`,
:issue:`6084`,
:issue:`6087`,
:issue:`6132`,
:issue:`6153`,
:issue:`6154`,
:issue:`6201`)

- Code cleanups.
(:issue:`6196`,
:issue:`6197`,
:issue:`6198`,
:issue:`6199`)

Other
~~~~~

- Issue tracker improvements. (:issue:`6066`)


.. _release-2.11.1:

Scrapy 2.11.1 (2024-02-14)
@@ -60,17 +252,26 @@ Modified requirements
- The Twisted dependency is no longer restricted to < 23.8.0. (:issue:`6024`,
:issue:`6064`, :issue:`6142`)

Deprecations
~~~~~~~~~~~~

- Subclasses of
:class:`~scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware`
must now support the ``crawler`` keyword-only parameter in their
``__init__`` method.

Bug fixes
~~~~~~~~~

- The OS signal handling code was refactored to no longer use private Twisted
  functions. (:issue:`6024`, :issue:`6030`, :issue:`6064`, :issue:`6112`)

Documentation
~~~~~~~~~~~~~

- Improved documentation for :class:`~scrapy.crawler.Crawler` initialization
  changes made in the 2.11.0 release. (:issue:`6057`, :issue:`6076`,
  :issue:`6080`, :issue:`6147`)

- Extended documentation for :attr:`Request.meta <scrapy.http.Request.meta>`.
(:issue:`5565`)
@@ -83,7 +284,7 @@ Documentation

- Added a missing note about backward-incompatible changes in
:class:`~scrapy.exporters.PythonItemExporter` to the 2.11.0 release notes.
  (:issue:`6060`, :issue:`6062`, :issue:`6081`)

- Added a missing note about removing the deprecated
``scrapy.utils.boto.is_botocore()`` function to the 2.8.0 release notes.
@@ -161,9 +362,6 @@ Deprecation removals
1.0.0, use :attr:`CrawlerRunner.spider_loader
<scrapy.crawler.CrawlerRunner.spider_loader>` instead. (:issue:`6010`)

Deprecations
~~~~~~~~~~~~

@@ -2248,9 +2446,9 @@ New features
from protocol 2 to protocol 4, improving serialization capabilities and
performance (:issue:`4135`, :issue:`4541`)

* The ``scrapy.utils.misc.create_instance`` function now raises a
  :exc:`TypeError` exception if the resulting instance is ``None``
  (:issue:`4528`, :issue:`4532`)

.. _itemadapter: https://github.com/scrapy/itemadapter

@@ -2773,8 +2971,8 @@ Bug fixes
(:issue:`4123`)

* Fixed a typo in the message of the :exc:`ValueError` exception raised when
  the ``scrapy.utils.misc.create_instance`` function gets both ``settings``
  and ``crawler`` set to ``None`` (:issue:`4128`)


Documentation
7 changes: 2 additions & 5 deletions docs/topics/addons.rst
@@ -158,6 +158,7 @@ Use a fallback component:
.. code-block:: python

from scrapy.core.downloader.handlers.http import HTTPDownloadHandler
from scrapy.utils.misc import build_from_crawler


FALLBACK_SETTING = "MY_FALLBACK_DOWNLOAD_HANDLER"
@@ -168,11 +169,7 @@

def __init__(self, settings, crawler):
dhcls = load_object(settings.get(FALLBACK_SETTING))
        self._fallback_handler = build_from_crawler(dhcls, crawler)

def download_request(self, request, spider):
if request.meta.get("my_params"):
4 changes: 3 additions & 1 deletion docs/topics/api.rst
@@ -26,7 +26,9 @@ contains a dictionary of all available extensions and their order similar to
how you :ref:`configure the downloader middlewares
<topics-downloader-middleware-setting>`.

.. autoclass:: Crawler
:members: get_addon, get_downloader_middleware, get_extension,
get_item_pipeline, get_spider_middleware

The Crawler object must be instantiated with a
:class:`scrapy.Spider` subclass and a
19 changes: 17 additions & 2 deletions docs/topics/components.rst
@@ -4,8 +4,9 @@
Components
==========

A Scrapy component is any class whose objects are built using
:func:`~scrapy.utils.misc.build_from_crawler` or
:func:`~scrapy.utils.misc.build_from_settings`.

That includes the classes that you may assign to the following settings:

@@ -84,3 +85,17 @@ If your requirement is a minimum Scrapy version, you may use
f"method of spider middlewares as an asynchronous "
f"generator."
)

API reference
=============

The following functions can be used to create an instance of a component class:

.. autofunction:: scrapy.utils.misc.build_from_crawler

.. autofunction:: scrapy.utils.misc.build_from_settings

The following function can also be useful when implementing a component, for
example to log the import path of the component class when reporting problems:

.. autofunction:: scrapy.utils.python.global_object_name