Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

To rollover the crawler logs by day end through settings #4465

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

rindhane
Copy link

@rindhane rindhane commented Mar 31, 2020

Details of modification are :

  • Modified _get_handler method to return TimedRotatingFileHandler
    if LOG_ROTATION variable is set to True in settings.
  • This enhancement is for the issue: Crawler logs are cut by day Crawler logs are cut by day #3628
  • This modification is considered from this reference
  • The input variables of TimedRotatingFileHandler have to provide a dict() to settings parameter "LOG_ROTATION_VALUE"

An example for settings which need to be provided in settings.py are

LOG_ROTATION = True
LOG_ROTATION_VALUE = {
    when = "midnight",
    interval = 1,
    backupCount = 0,
    delay= False,
    utc= False,
    atTime=None,
}

Changes to be committed:
modified: scrapy/utils/log.py

Fixes #3628

	- Modified _get_handler method to return TimedRotatingFileHandler
	  if LOG_ROTATION variable is set to True in settings.
	- This enhancement is regards to issue : Crawler logs are cut by day scrapy#3628

 Changes to be committed:
	modified:   scrapy/utils/log.py
Copy link
Member

@Gallaecio Gallaecio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should use snake_case for the dictionary keys, and we can probably move the reading/evaluation of some settings later in code, but the changes look fine to me. I think you can proceed with tests and documentation.

 - Information about settings details for log rotation included in the documentation
	modified:   docs/topics/logging.rst
	modified:   docs/topics/settings.rst
 - Modify _get_handler method to return TimedRotatingFileHandler
 - To use snake_case for variables related to log roatation in settings.
	modified:   scrapy/utils/log.py
 - Tests to confirm attributes and log rotation functionality.
	new file:   tests/test_log_rotate.py
@rindhane rindhane marked this pull request as ready for review April 18, 2020 20:40
@rindhane
Copy link
Author

rindhane commented Apr 18, 2020

As per feedback made the following changes:

  • Information about settings details for log rotation included in the documentation
    modified: docs/topics/logging.rst
    modified: docs/topics/settings.rst
  • To use snake_case for variables related to log rotation in settings and accordingly modified _get_handler method to return TimedRotatingFileHandler
    modified: scrapy/utils/log.py
  • Tests to confirm attributes and log rotation functionality.
    new file: tests/test_log_rotate.py

Copy link
Member

@Gallaecio Gallaecio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general lines, I wonder if this is the right approach. What if we want to later support size-based log rotation?

Instead of implementing support for a specific log rotation approach, maybe what we need is to look into making it possible for users to write custom log file handling logic, and provide some built-in implementations for users to choose from. Something in line with https://docs.scrapy.org/en/latest/topics/settings.html#std:setting-SPIDER_LOADER_CLASS.

Comment on lines +22 to +23
#vscode temporary files
.vscode
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is probably best suited for your global gitignore instead of Scrapy’s.

@@ -150,6 +150,7 @@ These settings can be used to configure the logging:
* :setting:`LOG_DATEFORMAT`
* :setting:`LOG_STDOUT`
* :setting:`LOG_SHORT_NAMES`
* :setting:`LOG_FILE_ROTATE`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should probably include the other new settings here.

I also wonder if it would make sense to sort these alphabetical, the list is rather large now.

@@ -172,6 +173,11 @@ If :setting:`LOG_SHORT_NAMES` is set, then the logs will not display the Scrapy
component that prints the log. It is unset by default, hence logs contain the
Scrapy component responsible for that log output.

To have rollover of the logs, :setting:`LOG_FILE_ROTATE` have to be set ``TRUE``. `TimedRotatingFileHandler <https://docs.python.org/3/library/logging.handlers.html#timedrotatingfilehandler>`_ is set as the file handler for the logger.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
To have rollover of the logs, :setting:`LOG_FILE_ROTATE` have to be set ``TRUE``. `TimedRotatingFileHandler <https://docs.python.org/3/library/logging.handlers.html#timedrotatingfilehandler>`_ is set as the file handler for the logger.
To enable time-based log rotation, set the :setting:`LOG_FILE_ROTATE` setting to ``True``.

TimedRotatingFileHandler feels like an implementation detail that users do not need to know at this point.

@@ -172,6 +173,11 @@ If :setting:`LOG_SHORT_NAMES` is set, then the logs will not display the Scrapy
component that prints the log. It is unset by default, hence logs contain the
Scrapy component responsible for that log output.

To have rollover of the logs, :setting:`LOG_FILE_ROTATE` have to be set ``TRUE``. `TimedRotatingFileHandler <https://docs.python.org/3/library/logging.handlers.html#timedrotatingfilehandler>`_ is set as the file handler for the logger.
Following settings :setting:`LOG_FILE_ROTATE_WHEN`, :setting:`LOG_FILE_ROTATE_INTERVAL`, :setting:`LOG_FILE_ROTATE_BACKUP_COUNT`, :setting:`LOG_FILE_ROTATE_DELAY`, :setting:`LOG_FILE_ROTATE_UTC` & :setting:`LOG_FILE_ROTATE_AT_TIME` can be additionally provided to change the default parameter of the ``TimeRotatingFileHandler``.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be a list. It can be introduced from the paragraph above, with something like “Use the following settings to customize time-based log rotation:”.

@Gallaecio
Copy link
Member

Looking at #3628, I see that what was proposed was to cover in the documentation how to implement this, without the need to modify the Scrapy code base at all.

@rindhane
Copy link
Author

Looking at #3628, I see that what was proposed was to cover in the documentation how to implement this, without the need to modify the Scrapy code base at all.

@Gallaecio, this last comment has put me into a muddle. As, except the last one, other comments point to improve the previous implementation while the last comment directs to abandon everything and just document this ....

from logging.handlers import TimedRotatingFileHandler
from scrapy.utils.log import configure_logging

logHandler = TimedRotatingFileHandler('crawl.log', when='midnight', interval=1)
configure_logging(install_root_handler=False)
logging.basicConfig(
    format='%(asctime)s [%(name)s] %(levelname)s: %(message)s',
    level=logging.INFO,
    handlers=[logHandler]) 

....into the documentation.

Will you please elaborate again on which path to consider to implement this feature?

@Gallaecio
Copy link
Member

Yes, I started reviewing your implementation, and only after that, when re-reading the original issue report, I realized that @kmike had tagged it as a documentation improvement issue.

If the suggested example code does the job, adding it to the documentation may be better in the long term than increasing the complexity of the Scrapy code base.

That said, if you really wish to implement this as a built-in feature, I’m up for reviewing your pull request. But I think we need to make it flexible enough that people won’t take the suggested example code route as soon as they need something a bit different.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Crawler logs are cut by day
2 participants