
Feature Request | 功能需求 #4

Closed
tuchief opened this issue Oct 19, 2018 · 35 comments
Labels
feature request Request for new features

Comments

tuchief commented Oct 19, 2018

1. Timed tasks
2. Online packaging and deployment
3. Log monitoring and alarms

tuchief changed the title from 希望支持一下功能 to 希望支持以下功能 (fixing a typo; "hope the following features are supported") Oct 19, 2018
tuchief changed the title from 希望支持以下功能 to I hope to support the following features Oct 19, 2018
my8100 self-assigned this Oct 19, 2018
my8100 (Owner) commented Oct 19, 2018

@tuchief

2. Online packaging and deployment

Have you tried the Projects > Deploy page?

tuchief (Author) commented Oct 19, 2018

I understand what you mean, but that approach requires manually packaging the project into an egg and then uploading it. Couldn't it package the source project into an egg and upload it automatically?

my8100 (Owner) commented Oct 19, 2018

I understand what you mean, but that approach requires manually packaging the project into an egg and then uploading it. Couldn't it package the source project into an egg and upload it automatically?

Ok, I would try to figure out a better way to eggify local projects.
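The idea can be sketched roughly: a Python egg is essentially a zip archive of the project sources, so a local Scrapy project can be zipped in memory and then uploaded to Scrapyd's addversion.json endpoint. This is a minimal illustration, not ScrapydWeb's actual implementation; the directory layout handling is simplified.

```python
import io
import os
import zipfile

def eggify(project_dir: str) -> bytes:
    """Zip the contents of project_dir into an in-memory egg-style archive."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as egg:
        for root, _dirs, files in os.walk(project_dir):
            for name in files:
                if name.endswith((".pyc", ".egg")):
                    continue  # skip build artifacts
                path = os.path.join(root, name)
                egg.write(path, os.path.relpath(path, project_dir))
    return buf.getvalue()
```

The resulting bytes could then be POSTed to Scrapyd's addversion.json along with `project` and `version` fields to deploy in one step.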

tuchief (Author) commented Oct 19, 2018

You can refer to https://github.com/Gerapy/Gerapy

my8100 (Owner) commented Oct 24, 2018

v0.9.9: Add auto eggifying

tuchief (Author) commented Oct 26, 2018

Wow, that was fast! Looking forward to timed tasks as well.

wymen2018 commented

1. When setting a timed task, allow selecting multiple crawlers, so that after choosing the time point, several crawlers start in each time period. Imagine a project with 100 crawlers; this feature would really help with convenience and management.
2. As above, since there are a hundred crawlers, could you add custom labels? That way I could filter by label and view the running status and timed tasks of a specific category of crawlers.
Thanks!

my8100 (Owner) commented Nov 7, 2018

When setting a timed task, allow selecting multiple crawlers, so that after choosing the time point, several crawlers start in each time period. Imagine a project with 100 crawlers; this feature would really help with convenience and management.

You mean there are 100 spiders in a project and you want to schedule some of them to run periodically?

As above, since there are a hundred crawlers, could you add custom labels? That way I could filter by label and view the running status and timed tasks of a specific category of crawlers.

What about labeling some related jobs with a specific jobid?
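Since Scrapyd's schedule.json endpoint accepts a caller-supplied jobid, one way to group related jobs today is to encode a label into the jobid. A small sketch; the project, spider, and label names are placeholders:

```python
# Scrapyd's schedule.json accepts a user-supplied "jobid" parameter, so
# related runs can share a recognizable prefix that acts as a label.
def build_schedule_payload(project: str, spider: str, label: str) -> dict:
    """Build the form data for a schedule.json request, tagging the job
    by encoding a label into its jobid."""
    return {
        "project": project,
        "spider": spider,
        "jobid": f"{label}_{spider}",  # e.g. "news_spider1" groups news jobs
    }

payload = build_schedule_payload("myproject", "spider1", "news")
# POST this payload to http://127.0.0.1:6800/schedule.json to start the job.
```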

wymen2018 commented

You mean there are 100 spiders in a project and you want to schedule some of them to run periodically?

No. I mean selecting multiple spiders at once to set a single timed task.

What about labeling some related jobs with a specific jobid?

Labels would be better, because they make it easier to visualize jobs according to my own classification.

my8100 (Owner) commented Nov 7, 2018

OK, I will take that into account when implementing this feature. Thanks for your advice!

my8100 (Owner) commented Nov 12, 2018

v1.0.0rc1: Add Email Notice, with multi-triggers provided, including:

  • ON_JOB_RUNNING_INTERVAL

  • ON_JOB_FINISHED

  • When the threshold for a specific kind of log entry is reached: ['CRITICAL', 'ERROR', 'WARNING', 'REDIRECT', 'RETRY', 'IGNORE']; meanwhile, you can ask ScrapydWeb to stop/forcestop the current job automatically.

Get it via the pip install scrapydweb==1.0.0rc1 command.
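A sketch of what such trigger settings might look like in the ScrapydWeb settings file. The option names below are modeled on the release note above and may differ from the actual file; check your scrapydweb_settings file for the exact names.

```python
# Hypothetical settings sketch; option names modeled on the release note,
# not copied from ScrapydWeb's settings file.
ON_JOB_RUNNING_INTERVAL = 3600   # send a notice every hour while a job runs
ON_JOB_FINISHED = True           # send a notice when a job finishes

# Trigger on log thresholds, optionally stopping the job automatically.
LOG_CRITICAL_THRESHOLD = 3       # fire after 3 CRITICAL log entries
LOG_CRITICAL_TRIGGER_STOP = True # and ask Scrapyd to stop the current job
```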

Email sample: [screenshot]

LWsmile commented Nov 27, 2018

ERROR in utils: !!!!! ConnectionError HTTPConnectionPool(host='127.0.0.1', port=6800): Max retries exceeded with url: /jobs (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x7f6111cae9b0>: Failed to establish a new connection: [Errno 111] Connection refused',))

my8100 (Owner) commented Nov 27, 2018

ERROR in utils: !!!!! ConnectionError HTTPConnectionPool(host='127.0.0.1', port=6800): Max retries exceeded with url: /jobs (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x7f6111cae9b0>: Failed to establish a new connection: [Errno 111] Connection refused',))

Please open a new issue with the details.

JerryChenn07 commented

This project is pretty good. If you add timed tasks, it will be an even more complete project.

Come on!

lidonghe commented

timed task +1

my8100 (Owner) commented Mar 12, 2019

v1.2.0: Support Timer Tasks to schedule a spider run periodically
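At its core, a timer task amounts to firing Scrapyd's schedule.json endpoint on a schedule (ScrapydWeb itself uses APScheduler for this). The stdlib-only sketch below is an approximation, not ScrapydWeb's code; the project/spider names and server URL are placeholders.

```python
import sched
import time
from urllib import parse, request

SCRAPYD = "http://127.0.0.1:6800"  # Scrapyd's default address

def schedule_spider(project: str, spider: str) -> None:
    """POST to Scrapyd's schedule.json to start one spider run."""
    data = parse.urlencode({"project": project, "spider": spider}).encode()
    request.urlopen(f"{SCRAPYD}/schedule.json", data=data)

def run_every(scheduler: sched.scheduler, interval: float, action, *args) -> None:
    """Run the action, then re-arm the timer for a simple periodic task."""
    action(*args)
    scheduler.enter(interval, 1, run_every, (scheduler, interval, action) + args)

# scheduler = sched.scheduler(time.time, time.sleep)
# scheduler.enter(0, 1, run_every, (scheduler, 3600, schedule_spider, "myproject", "myspider"))
# scheduler.run()  # uncomment to start "myspider" every hour
```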

my8100 closed this as completed Mar 12, 2019
my8100 added the "feature request" label Mar 12, 2019
lidonghe commented

When adding a timer task, I have to choose a version, but I actually only want to use the latest version, so that if the project is updated I don't need to update my task accordingly.

my8100 (Owner) commented Mar 21, 2019

OK, it will be fixed in a future release.
For the time being, go to the Jobs page and click either the multinode button or the Start button as a workaround.

my8100 (Owner) commented Mar 21, 2019

Or modify the code below to vm.versions = ['default: the latest version'].concat(obj.versions);

vm.versions = obj.versions;

lidonghe commented

Or modify the code below to vm.versions = ['default: the latest version'].concat(obj.versions);
scrapydweb/scrapydweb/templates/scrapydweb/schedule.html

Line 826 in 560e998

                         vm.versions = obj.versions;

It works, thanks

my8100 (Owner) commented Mar 22, 2019

The modification has been committed.

my8100 pinned this issue Mar 23, 2019
my8100 changed the title from I hope to support the following features to Feature Request | 功能需求 Mar 23, 2019
my8100 removed their assignment May 16, 2019
heave-Rother commented

For log categorization, can the same spiders distributed across different Scrapyd servers be aggregated? @my8100

my8100 (Owner) commented Jul 24, 2019

@heave-Rother
For the time being, you can switch to a specific page of the neighboring node
with the help of Node Scroller and Node Skipping.

If that cannot satisfy your need, could you draw a picture to show me your idea?

heave-Rother commented

@my8100 OK:
[screenshot]

my8100 (Owner) commented Jul 24, 2019

@heave-Rother

  1. What would you use the checkboxes in the drop-down list for? Did you notice the checkboxes for nodes on the Servers page?
  2. Stats aggregation will be followed up in PR #72 (Support aggregating stats of the same job from selected Scrapyd servers).

heave-Rother commented

@my8100
Yes, I want to choose which servers to include in the statistics.

my8100 (Owner) commented Jul 24, 2019

@heave-Rother
I see. Thanks for your suggestion.

seozed commented Aug 17, 2019

Please add a Docker image.

my8100 (Owner) commented Aug 20, 2019

Please add a Docker image.

@seozed
Check out the docker image created by @luzihang123.

my8100/logparser#15 (comment)

devxiaosong commented

Thanks to the author; this is the best crawler-cluster management platform I have found. A few requests:
1. Add a description to each node, for easier reference.
2. Send alert messages via SMS.
3. How can distributed crawlers based on scrapy-redis be configured and started?

my8100 (Owner) commented Dec 9, 2019

@devxiaosong
Replied in #107.

Tobeyforce commented

Please add a short tutorial on how to switch from the Flask development server to a production server with HTTPS enabled using Let's Encrypt. It would be much appreciated.

my8100 (Owner) commented Aug 21, 2020

Please add a short tutorial on how to switch from the Flask development server to a production server with HTTPS enabled using Let's Encrypt. It would be much appreciated.

Please try it out and share your result.

logger.info("For running Flask in production, check out http://flask.pocoo.org/docs/1.0/deploying/")

############################## ScrapydWeb #####################################
# The default is False, set it to True and add both CERTIFICATE_FILEPATH and PRIVATEKEY_FILEPATH
# to run ScrapydWeb in HTTPS mode.
# Note that this feature is not fully tested, please leave your comment here if ScrapydWeb
# raises any exception at startup: https://github.com/my8100/scrapydweb/issues/18
ENABLE_HTTPS = False
# e.g. '/home/username/cert.pem'
CERTIFICATE_FILEPATH = ''
# e.g. '/home/username/cert.key'
PRIVATEKEY_FILEPATH = ''
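For reference, what ENABLE_HTTPS amounts to underneath is loading the certificate/key pair into a TLS context for the web server. A generic sketch using only the standard library, not ScrapydWeb's actual startup code; the paths are placeholders.

```python
import ssl

def make_ssl_context(cert_path: str, key_path: str) -> ssl.SSLContext:
    """Build the TLS context a WSGI server would use to serve HTTPS."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.load_cert_chain(certfile=cert_path, keyfile=key_path)
    return ctx

# A Flask app can then be started with this context, e.g.:
# app.run(host="0.0.0.0", port=443, ssl_context=make_ssl_context(
#     "/home/username/cert.pem", "/home/username/cert.key"))
```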

Jasonjk3 commented

Can Scrapyd nodes be added/removed dynamically on the page?

my8100 (Owner) commented Apr 27, 2021

Can Scrapyd nodes be added/removed dynamically on the page?

Editing Scrapyd servers via GUI is not supported.
