no output #264

Answered by eliasdabbas
caroheymes asked this question in Q&A
Jan 31, 2023 · 1 comment · 2 replies

Just ran the code. It seems they block everything through their robots.txt. If you want, you can tell the crawler to ignore robots.txt with the ROBOTSTXT_OBEY setting (it's up to you to make sure you comply with their rules and accept any consequences).
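If you want to see for yourself how a "block everything" robots.txt behaves, here is a minimal sketch using Python's standard-library `urllib.robotparser`. The robots.txt body and the helper name `is_allowed` are made up for illustration; this is not the actual file served by the site in question, and it is not how advertools checks robots.txt internally.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt body (an assumption for illustration):
# it disallows every path for every crawler.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def is_allowed(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if `user_agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# With "Disallow: /" every URL on the host is off limits to a polite crawler.
print(is_allowed(SAMPLE_ROBOTS_TXT, "https://example.com/search/?q=test"))
```

A crawler that obeys robots.txt would skip every URL for which a check like this fails, which is one way to end up with an empty output file.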

It's also a good idea to always save the logs of your crawl, so you can check what happened if there are any issues:

import advertools as adv

# Ignore robots.txt and write the crawl log to a file for later inspection
adv.crawl('https://example.com', 'output.jl',
          custom_settings={'ROBOTSTXT_OBEY': False, 'LOG_FILE': 'mycrawl.log'})
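Once you have a log file, a quick way to check whether robots.txt was the reason for an empty output is to scan the log for blocked requests. The sketch below assumes the log contains lines with the text "Forbidden by robots.txt", which is what Scrapy's robots.txt middleware usually writes at DEBUG level; the exact wording may differ across versions, so treat the marker as an assumption and adjust it to what you see in your own log. The helper name `blocked_lines` is made up for this example.

```python
from pathlib import Path

# Assumed marker text (verify against your own log file).
MARKER = "Forbidden by robots.txt"


def blocked_lines(log_path, marker=MARKER):
    """Return the log lines that contain `marker`; empty list if the file is missing."""
    path = Path(log_path)
    if not path.exists():
        return []
    with path.open(encoding="utf-8", errors="replace") as f:
        return [line.rstrip("\n") for line in f if marker in line]


# 'mycrawl.log' is the LOG_FILE name used in the crawl settings above.
for line in blocked_lines("mycrawl.log"):
    print(line)
```

If this prints one line per requested URL, the crawl was most likely stopped by robots.txt rather than by a bug in your code.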

Using your URL:

adv.crawl(
    'https://api-adresse.data.gouv.fr/search/?q=12+oulevard+de+reuilly+75012+paris',
    output_file='data_gouv_fr_result.jl',
    custom_settings={
         'CONCURRENT_ITEMS': 50,…
