
BeautifulSoup Not Scraping Tags #35

Open
sandersrc opened this issue Aug 12, 2022 · 1 comment

@sandersrc

I have been following your BeautifulSoup tutorial (https://www.youtube.com/watch?v=87Gx3U0BDlo). When I run the code, I get an AttributeError saying that a 'NoneType' object has no attribute 'attrs'. Please take a quick look at my problem below; any push in the right direction would be much appreciated.

CODE

import requests
from bs4 import BeautifulSoup

result = requests.get("https://www.whitehouse.gov/briefings-statements/")
src = result.content
soup = BeautifulSoup(src, 'lxml')

urls = []
for h2_tag in soup.find_all('h2'):
    a_tag = h2_tag.find('a')
    urls.append(a_tag.attrs['href'])

print(urls)
#####################

RESULT:
Traceback (most recent call last):
  File "C:\Users\Rafiki\PycharmProjects\HelloWorld\WHExample.py", line 22, in <module>
    urls.append(a_tag.attrs['href'])
AttributeError: 'NoneType' object has no attribute 'attrs'

Process finished with exit code 1
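A minimal sketch of why this error can appear: `find('a')` returns `None` when an `<h2>` has no `<a>` inside it, and `None` has no `.attrs`. The markup below is hypothetical (loosely based on the class names printed further down), and it uses Python's built-in `html.parser` so it runs without lxml; skipping headings without a link is one possible fix, not necessarily the tutorial's.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the first <h2> has no nested <a>, like a dialog title.
html = """
<h2 id="dialog-title">Menu</h2>
<h2 class="news-item__title-container">
  <a class="news-item__title" href="/briefing-room/example-1/">Example 1</a>
</h2>
"""

soup = BeautifulSoup(html, "html.parser")

urls = []
for h2_tag in soup.find_all("h2"):
    a_tag = h2_tag.find("a")  # returns None when this <h2> has no <a> descendant
    if a_tag is None:
        continue  # skip headings that do not contain a link
    urls.append(a_tag["href"])

print(urls)  # ['/briefing-room/example-1/']
```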

I changed the code to the following to see what the h2 tags contained, and found that they did not include the nested information I was looking for.

CODE

import requests
from bs4 import BeautifulSoup

result = requests.get("https://www.whitehouse.gov/briefing-room/")
src = result.content
soup = BeautifulSoup(src, 'lxml')

for h2_tag in soup.find_all('h2'):
    print(h2_tag.attrs)
##############

RESULT:
{'id': 'dialog2Title'}
{'class': ['news-item__title-container']}
{'class': ['news-item__title-container']}
{'class': ['news-item__title-container']}
{'class': ['news-item__title-container']}
{'class': ['news-item__title-container']}
{'class': ['news-item__title-container']}
{'class': ['news-item__title-container']}
{'class': ['news-item__title-container']}
{'class': ['news-item__title-container']}
{'class': ['news-item__title-container']}
{'class': ['h4alt', 'form-headline']}

Process finished with exit code 0
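One thing worth checking in the output above: `.attrs` only holds the attributes written on the `<h2>` tag itself, so nested elements such as an `<a>` never show up there. The sketch below uses hypothetical markup for a single news item (the `news-item__title` class and href are assumptions, not taken from the live page) to contrast `.attrs` with searching inside the tag.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for one news item on the live page.
html = """
<h2 class="news-item__title-container">
  <a class="news-item__title" href="/briefing-room/example/">Example headline</a>
</h2>
"""

soup = BeautifulSoup(html, "html.parser")
h2_tag = soup.find("h2")

# .attrs holds only the attributes written on the <h2> tag itself.
print(h2_tag.attrs)  # {'class': ['news-item__title-container']}

# The nested <a> is a descendant, not an attribute; search inside the tag to reach it.
a_tag = h2_tag.find("a")
print(a_tag["href"])  # /briefing-room/example/
print(a_tag.get_text(strip=True))  # Example headline

# prettify() prints the full nested markup, which helps when exploring a page.
print(h2_tag.prettify())
```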

Please help me understand how to drill down through tags to find the information within.

Thank you,
Ryan
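For drilling down through nested tags, a sketch of two common Beautiful Soup approaches follows: chaining `find_all`/`find` with filters, and a single CSS selector via `select`. The markup is hypothetical; the `news-item__title-container` class comes from the output above, but the inner `<a>` structure and hrefs are assumptions, so the real page's structure should be confirmed (for example with `prettify()`) before relying on these selectors.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; the structure on the real page may differ.
html = """
<h2 id="dialog-title">Menu</h2>
<h2 class="news-item__title-container">
  <a class="news-item__title" href="/briefing-room/one/">One</a>
</h2>
<h2 class="news-item__title-container">
  <a class="news-item__title" href="/briefing-room/two/">Two</a>
</h2>
"""

soup = BeautifulSoup(html, "html.parser")

# Option 1: filter <h2> tags by class, then keep only nested <a> tags that carry an href.
urls = []
for h2_tag in soup.find_all("h2", class_="news-item__title-container"):
    a_tag = h2_tag.find("a", href=True)
    if a_tag is not None:
        urls.append(a_tag["href"])

# Option 2: a CSS selector expresses the same path in one call.
urls_css = [a["href"] for a in soup.select("h2.news-item__title-container a[href]")]

print(urls)      # ['/briefing-room/one/', '/briefing-room/two/']
print(urls_css)  # ['/briefing-room/one/', '/briefing-room/two/']
```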

@ritik047

Yeah, facing the same issue here!
@vprusso please resolve
