Skip to content

A Jupyter Notebook that scrapes news articles from a website, storing them in a Pandas DataFrame. It uses BeautifulSoup and requests for web scraping. Install required dependencies and follow the code cells to extract news efficiently.

Notifications You must be signed in to change notification settings

AbdelrhmanSror/Fetch-News-with-Web-Scraping

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

4 Commits
 
 
 
 

Repository files navigation

Fetch-News-with-Web-Scraping Readme

This Jupyter Notebook provides a script for web scraping news articles from a website and storing them in a Pandas DataFrame. It uses the BeautifulSoup library for web scraping and the requests library for downloading web pages.

Installation

Before running the notebook, you need to install the required dependencies. You can do this by executing the following commands in a code cell:

!pip install html5lib
!pip install bs4

Usage

  1. Import the necessary libraries by executing the code cell:
from bs4 import BeautifulSoup
import requests
import pandas as pd
  1. Set up the configuration by defining the types of news and the base URL in the code cell:
types = {
    "politics": "سياسة",
    "ebusiness": "اقتصاد",
    "culture": "ثقافة",
    "sport": "رياضة",
    "arts": "فن",
    "tech": "تكنولوجيا",
    "turath": "تراث",
    "midan": "ميدان"
}

base_url = "https://1-a1072.azureedge.net"
  1. Download the HTML of the news pages by executing the code cell:
for key in types:
    downlaod_html_news(key)
  1. Extract the news articles for each type and append them to a Pandas DataFrame by executing the code cell:
def extract_news_with_type_and_append_to_dataframe(_type):
    # ...
    return data

data = pd.DataFrame(columns=["Type", "Header", "Body", "Date", "URL"])

for key in types:
    news_data = extract_news_with_type_and_append_to_dataframe(key)
    data = pd.concat([data, news_data], ignore_index=True)
  1. View the extracted news articles by executing the code cell:
data

About

A Jupyter Notebook that scrapes news articles from a website, storing them in a Pandas DataFrame. It uses BeautifulSoup and requests for web scraping. Install required dependencies and follow the code cells to extract news efficiently.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published