Replies: 4 comments
This comment was marked as off-topic.
-
It seems that the HTML structure of the webpage may have changed, or that the selector no longer matches. I would suggest inspecting the webpage directly to confirm that the h2 elements are present and how they are structured. Also make sure the page content is actually being fetched correctly.
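To make that check concrete, here is a minimal sketch that reports what the server actually returned: the status code, the size of the HTML, and the h2 elements found in it. The names `H2Collector`, `summarize`, and `check_url` are made up for this illustration. The parsing part uses only the standard library, so a result of zero headings cannot be blamed on the BeautifulSoup setup.

```python
from html.parser import HTMLParser


class H2Collector(HTMLParser):
    """Collects the text of every <h2> element using only the standard library."""

    def __init__(self):
        super().__init__()
        self._depth = 0      # greater than 0 while inside an <h2>
        self._buffer = []    # text fragments of the <h2> currently being read
        self.headings = []   # finished heading texts

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self._depth += 1

    def handle_endtag(self, tag):
        if tag == "h2" and self._depth:
            self._depth -= 1
            if self._depth == 0:
                self.headings.append("".join(self._buffer).strip())
                self._buffer = []

    def handle_data(self, data):
        if self._depth:
            self._buffer.append(data)


def summarize(status_code, html):
    """Return a small report on what the server actually sent back."""
    collector = H2Collector()
    collector.feed(html)
    collector.close()
    return {
        "status": status_code,
        "length": len(html),
        "h2_count": len(collector.headings),
        "h2_texts": collector.headings,
    }


def check_url(url):
    """Fetch a page and summarize it (needs the third-party requests package)."""
    import requests  # imported lazily so summarize() works without it installed

    resp = requests.get(url, timeout=30)
    return summarize(resp.status_code, resp.text)
```

Calling `check_url(url)` with the page URL gives a quick diagnosis. A status of 200 with an `h2_count` of 0 suggests the server sent a different page than the one the browser shows, while a non-200 status points to the request itself being rejected.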
-
Hi @anasadiek,

I see you initially had an issue when trying to scrape the webpage for h2 elements. Your updated code, which includes headers, resolves this issue by mimicking a real browser request. This makes the website return the full content, including the h2 elements.

Here is a breakdown of the working code:

import requests
from bs4 import BeautifulSoup
headers = {
'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7',
'accept-language': 'en-US,en;q=0.9',
'cache-control': 'max-age=0',
'cookie': '_ga=GA1.1.270690289.1716612177; _hjSession_2811765=eyJpZCI6IjYyMDAzMDBhLTY0YzUtNDlhNC1hOTJjLTU5ZjViMDdjOTc5NyIsImMiOjE3MTY2MTIxNzc3NDEsInMiOjAsInIiOjAsInNiIjowLCJzciI6MCwic2UiOjAsImZzIjoxLCJzcCI6MH0=; _clck=z3tdu%7C2%7Cfm2%7C0%7C1606; mp_f65e85d232fcb7d93f8de265b9818087_mixpanel=%7B%22distinct_id%22%3A%20%2218fae0f199dff8-0a5a80dd24dfc8-4c657b58-4b9600-18fae0f199e2497%22%2C%22%24device_id%22%3A%20%2218fae0f199dff8-0a5a80dd24dfc8-4c657b58-4b9600-18fae0f199e2497%22%2C%22%24initial_referrer%22%3A%20%22%24direct%22%2C%22%24initial_referring_domain%22%3A%20%22%24direct%22%7D; _ga_E9ENXX0G37=GS1.1.1716612176.1.1.1716612364.60.0.0; _hjSessionUser_2811765=eyJpZCI6Ijc3ZjJjNzkyLWU3ZTYtNTdjZS1hNGI2LTgxMTUwN2MwMDE5YyIsImNyZWF0ZWQiOjE3MTY2MTIxNzc3NDAsImV4aXN0aW5nIjp0cnVlfQ==; _clsk=136vdcb%7C1716612366520%7C2%7C1%7Cw.clarity.ms%2Fcollect; cto_bundle=smnOM193dHRtNEFmd0YlMkZvbjR5YWRVSndtcUZNQ3Z4WW5GSnd2YjNGU3clMkZUNXZiY1BmSUtpaFhYWFRoZjRGcHRLako1Zm9pQ0pWOWVORVd6dmpMQ2EzdG80YUFkVG52ZEJYeGpUcFFGOWpCZ2luaUxvTWxOcFlkNjJSakVUViUyRmNHanVCYw',
'if-none-match': 'W/"3346f-Yi3+sAbg6NVfMK5sAr3o3rcoEsM"',
'priority': 'u=0, i',
'sec-ch-ua': '"Not/A)Brand";v="8", "Chromium";v="126", "Microsoft Edge";v="126"',
'sec-ch-ua-mobile': '?0',
'sec-ch-ua-platform': '"Windows"',
'sec-fetch-dest': 'document',
'sec-fetch-mode': 'navigate',
'sec-fetch-site': 'none',
'sec-fetch-user': '?1',
'upgrade-insecure-requests': '1',
'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/126.0.0.0 Safari/537.36 Edg/126.0.0.0',
}
url = 'https://wuzzuf.net/jobs/p/j5IuFAHi1HH0-Sales-Manager-Automotive-spare-parts--Assiut-Assiut-Egypt?o=1&l=sp&t=sj&a=sales%20manager|search-v3|navbl&s=33737584'
resp = requests.get(url, headers=headers)
soup = BeautifulSoup(resp.text, 'html.parser')
h2_tags = soup.find_all('h2')
for tag in h2_tags:
    print(tag.text.strip())

Key point: by including these headers, the server is more likely to treat your request as a legitimate one from a web browser and return the correct page content.

Hope this helps!

Best regards
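As a follow-up, it may be worth trying a reduced header set. The sketch below keeps only a few browser-like headers and leaves out the `cookie` and `if-none-match` values, which belong to one particular browser session and cached copy. A conditional header such as `if-none-match` can also cause the server to answer with 304 Not Modified and an empty body. The names `BROWSER_HEADERS` and `fetch_h2_texts` are made up for this example, the `accept` value is a shortened form of the one above, and whether this smaller set is enough for this particular site has not been tested here.

```python
# A reduced set of browser-like headers (an assumption, not a verified minimum).
# The user-agent string is the one from the working code above.
BROWSER_HEADERS = {
    "accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "accept-language": "en-US,en;q=0.9",
    "user-agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/126.0.0.0 Safari/537.36 Edg/126.0.0.0"
    ),
}


def fetch_h2_texts(url, headers=None):
    """Fetch url with browser-like headers and return the text of each <h2>."""
    # Third-party packages, imported lazily so the header dict can be
    # inspected without them installed.
    import requests
    from bs4 import BeautifulSoup

    resp = requests.get(url, headers=headers or BROWSER_HEADERS, timeout=30)
    resp.raise_for_status()  # fail loudly on 4xx/5xx instead of parsing an error page
    soup = BeautifulSoup(resp.text, "html.parser")
    return [tag.get_text(strip=True) for tag in soup.find_all("h2")]
```

If this returns an empty list, adding headers back one at a time is a simple way to find out which ones the server actually checks.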
-
Hi there,

The issue you're encountering likely stems from the fact that the content on the webpage you're trying to scrape is dynamically loaded via JavaScript. In that case, the elements are not present in the HTML that a plain HTTP request returns, so the parser has nothing to find.

Let me know if this helps or if you need further assistance!

Best regards
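If the content really is rendered by JavaScript, one common option is to load the page in a headless browser and read the headings after rendering. The sketch below assumes Selenium 4 with Chrome, which is an assumption for illustration and not something confirmed in this thread. `clean_texts` and `fetch_rendered_h2` are hypothetical helper names.

```python
def clean_texts(raw_texts):
    """Strip whitespace and drop empty or missing entries from heading texts."""
    return [t.strip() for t in raw_texts if t and t.strip()]


def fetch_rendered_h2(url, wait_seconds=15):
    """Load url in headless Chrome and return the h2 texts after rendering."""
    # Selenium is a third-party package and needs a local Chrome install;
    # imported lazily so clean_texts() can be used on its own.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = Options()
    options.add_argument("--headless=new")  # run Chrome without a visible window
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Wait until at least one <h2> exists in the rendered DOM.
        WebDriverWait(driver, wait_seconds).until(
            EC.presence_of_element_located((By.TAG_NAME, "h2"))
        )
        elements = driver.find_elements(By.TAG_NAME, "h2")
        return clean_texts(el.text for el in elements)
    finally:
        driver.quit()  # always close the browser, even if the wait times out
```

A browser is much heavier than a plain HTTP request, so this is usually worth reaching for only after confirming that the headings are missing from the raw HTML.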
-
Select Topic Area
Question
Body
I have a problem when scraping the following web page:
https://wuzzuf.net/jobs/p/j5IuFAHi1HH0-Sales-Manager-Automotive-spare-parts--Assiut-Assiut-Egypt?o=1&l=sp&t=sj&a=sales%20manager|search-v3|navbl&s=33737584
I use the following code to get the h2 text
but I get this result