Allow config parameter in the gnews.get_full_article() #44

Open
sohaibrahman64 opened this issue Sep 21, 2022 · 1 comment

I am using the GNews get_full_article() function to extract the top_image from an article. However, when I run it on my production server, it throws the error below:

ERROR: Article download() failed with HTTPSConnectionPool(host='indianexpress.com', port=443): Max retries exceeded with url: /article/idea-exchange/gautam-gambhir-idea-exchange-first-challenge-mcd-polls-change-narrative-bjp-doesnt-do-anything-8158944/ (Caused by ProxyError('Cannot connect to proxy.', OSError('Tunnel connection failed: 403 Forbidden'))) on URL https://indianexpress.com/article/idea-exchange/gautam-gambhir-idea-exchange-first-challenge-mcd-polls-change-narrative-bjp-doesnt-do-anything-8158944/

I searched through Google and ended up with this solution:

from newspaper import Article, Config

# Present a browser-like user agent so the site does not reject the request.
user_agent = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:78.0) Gecko/20100101 Firefox/78.0'
config = Config()
config.browser_user_agent = user_agent

url = "https://www.chicagotribune.com/nation-world/ct-florida-school-shooter-nikolas-cruz-20180217-story.html"

# Pass the config to Article so that download() uses the custom user agent.
page = Article(url, config=config)
page.download()
page.parse()
print(page.text)

As the code above shows, I need to set a user agent and assign it to config.browser_user_agent so that my server does not get blocked by the site. However, gnews.get_full_article() does not let me pass a config parameter. Is there any way to specify this parameter? Am I missing something?
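Until get_full_article() accepts a config, one possible workaround is to skip it and build the newspaper Article directly from the URL that GNews returns. The sketch below is only an illustration under assumptions: fetch_full_article, its timeout default, and the article_cls/config_cls parameters are hypothetical names that are not part of GNews or newspaper; they exist so the helper can be tried with stand-in classes and no network access.

```python
def fetch_full_article(url, user_agent, timeout=10,
                       article_cls=None, config_cls=None):
    """Download and parse `url` with a custom user agent.

    Hypothetical helper, not part of GNews. `article_cls` and `config_cls`
    default to newspaper's Article and Config classes.
    """
    if article_cls is None or config_cls is None:
        # Imported lazily so the helper also works with stand-in classes.
        from newspaper import Article, Config
        article_cls = article_cls or Article
        config_cls = config_cls or Config

    config = config_cls()
    config.browser_user_agent = user_agent  # same setting as in the snippet above
    config.request_timeout = timeout        # seconds; the default here is an assumption

    article = article_cls(url, config=config)
    article.download()
    article.parse()
    return article
```

Assuming each item returned by GNews().get_news(...) carries a 'url' key, calling fetch_full_article(item['url'], user_agent) should return a parsed article whose top_image attribute can be read as before.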

ranahaani added this to To do in GNews on Oct 24, 2022
ranahaani (Owner) commented Oct 31, 2022

A ticket has been created.
https://github.com/ranahaani/GNews/projects/2#card-86371640
