Adding support for issue #182 #190

Open
wants to merge 6 commits into
base: dev

@knirbhay knirbhay commented Jul 2, 2018

Adding support for:

1. Custom headers and cookies with the initial request
2. Shared cookies middleware to share cookies between crawl nodes

Linked Issue #182

… 2.Shared cookies middleware to share cookies between crawl nodes
if 'cookie' in item and item['cookie'] is not None:
    if isinstance(item['cookie'], dict):

This code was not getting executed because the Kafka scraper_schema.json forces the cookie to be a string. So I added a cookies attribute to scraper_schema. Not sure if it makes sense.
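To illustrate the two shapes being discussed, here is a minimal sketch of accepting the cookie either as a dict or as a string. The helper name normalize_cookies is hypothetical and not part of this PR, and the sketch assumes Python 3 (http.cookies) and a string of the form "name=value; name2=value2".

```python
from http.cookies import SimpleCookie


def normalize_cookies(item):
    """Return the cookies carried by a crawl request item as a plain dict.

    Hypothetical helper for illustration only; not code from this PR.
    """
    cookie = item.get('cookie')
    if cookie is None:
        return {}
    if isinstance(cookie, dict):
        # Already in dict form: return a shallow copy.
        return dict(cookie)
    # Assumed string form, e.g. "device_id=1; app_token=guid".
    parsed = SimpleCookie()
    parsed.load(cookie)
    return {name: morsel.value for name, morsel in parsed.items()}
```

With something like this, both the string form allowed by the existing schema and the dict form would end up as the same structure before the request is built.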

knirbhay commented Jul 2, 2018

A custom feed will look like this:

python kafka_monitor.py feed '{
    "url": "http://dmoztools.net",
    "appid": "testapp",
    "crawlid": "ABC123",
    "spiderid": "myspiderid",
    "headers": {
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8",
        "Accept-Encoding": "gzip, deflate",
        "X-Requested-With": "dmoztools.net",
        "User-Agent": "My Custom User Agent"
    },
    "cookies": {
        "device_id": "1",
        "app_token": "guid"
    }
}'
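Since the JSON has to reach kafka_monitor.py as a single shell argument, it may be easier to build the command programmatically. The sketch below is only an illustration: build_feed_command is a hypothetical helper, and it assumes the "headers" and "cookies" keys proposed in this PR.

```python
import json
import shlex


def build_feed_command(url, appid, crawlid, spiderid, headers=None, cookies=None):
    """Build a shell-safe kafka_monitor.py feed command (hypothetical helper)."""
    request = {
        "url": url,
        "appid": appid,
        "crawlid": crawlid,
        "spiderid": spiderid,
    }
    # Only include the optional keys proposed in this PR when they are set.
    if headers:
        request["headers"] = headers
    if cookies:
        request["cookies"] = cookies
    # shlex.quote keeps the JSON intact as one argument in a POSIX shell.
    return "python kafka_monitor.py feed " + shlex.quote(json.dumps(request))


if __name__ == "__main__":
    print(build_feed_command(
        "http://dmoztools.net", "testapp", "ABC123", "myspiderid",
        headers={"User-Agent": "My Custom User Agent"},
        cookies={"device_id": "1", "app_token": "guid"},
    ))
```

Quoting the payload this way avoids the shell splitting the JSON on spaces or stripping the double quotes.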

coveralls commented Jul 2, 2018


Coverage decreased (-0.4%) to 69.99% when pulling 1cd7940 on knirbhay:dev into 2c2075a on istresearch:dev.

knirbhay commented Jul 2, 2018

Due to the shared cookie middleware, coverage has decreased by 0.4%. @madisonb, do you think this can be managed? I improved coverage by a few percent in distributed_scheduler.
