This project focuses on data extraction of Persian news articles from the Fars News website. The extracted data can be utilized for various purposes, particularly in the field of artificial intelligence research at the Telecommunications Research Center.

parvvaresh/Iranian-news-dataset

Iranian news dataset

Description:

This project focuses on data extraction of Persian news articles from the Fars News website. The extracted data can be utilized for various purposes, particularly in the field of artificial intelligence research at the Telecommunications Research Center.

Table of Contents:

Introduction

Requirements

Installation

Usage

Data Structure

License

Contact Information

Introduction: This project aims to collect Persian news data from the Fars News website. The extracted data can be used for research and analysis in the field of artificial intelligence, specifically within the Telecommunications Research Center. The availability of a large corpus of Persian news articles can facilitate tasks such as text classification, sentiment analysis, topic modeling, and information retrieval.

Requirements: To execute this project, the following requirements must be met:

Python (version 3.6 or higher)
Beautiful Soup library (for web scraping)
Requests library (for making HTTP requests)
Pandas library (for data manipulation and analysis)

Installation: To install the necessary libraries, execute the following command in your command-line interface:

    pip install beautifulsoup4 requests pandas

Usage: To utilize this project, follow the steps below:

a. Clone or download the project repository from GitHub.
b. Open the project directory in your preferred Python IDE or text editor.
c. Run the Python script data_extraction.py.
d. The script will initiate web scraping of the Fars News website and extract the required data.
e. The extracted data will be saved in a CSV file named news_data.csv.

Note: Ensure that you have an active internet connection to enable web scraping.
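The extraction step described in the usage steps can be illustrated with a minimal sketch. This is not the contents of data_extraction.py, which is not reproduced here. The HTML markup, the CSS selectors, the parse_articles function, and the example.com base URL are all hypothetical placeholders; the real Fars News page structure would need to be inspected and the selectors adjusted accordingly. The sketch parses an inline HTML sample so that it runs without network access.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Hypothetical markup for illustration only; the real page layout will differ.
SAMPLE_HTML = """
<ul>
  <li class="news-item">
    <a class="title" href="/news/1">Sample headline</a>
    <time>2023-01-01</time>
    <p class="summary">Short excerpt.</p>
  </li>
  <li class="news-item">
    <a class="title" href="/news/2">Second headline</a>
    <time>2023-01-02</time>
    <p class="summary">Another excerpt.</p>
  </li>
</ul>
"""


def parse_articles(html, base_url):
    """Turn a listing page into rows with Title, Date, Summary, and Link."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    # The selectors below are assumptions matching SAMPLE_HTML, not the real site.
    for item in soup.select("li.news-item"):
        link = item.select_one("a.title")
        if link is None:
            continue  # skip entries without a headline link
        date = item.select_one("time")
        summary = item.select_one("p.summary")
        rows.append({
            "Title": link.get_text(strip=True),
            "Date": date.get_text(strip=True) if date else "",
            "Summary": summary.get_text(strip=True) if summary else "",
            # Resolve relative links against the page URL.
            "Link": urljoin(base_url, link.get("href", "")),
        })
    return rows


rows = parse_articles(SAMPLE_HTML, "https://example.com")
for row in rows:
    print(row["Title"], row["Link"])
```

In a live run, the HTML would presumably come from requests.get(url, timeout=10).text, and the resulting rows could be written out with pandas.DataFrame(rows).to_csv("news_data.csv", index=False).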

Data Structure: The extracted data is stored in a CSV file named news_data.csv. The file contains the following columns:

Title: The title of the news article.
Date: The publication date of the news article.
Summary: A brief summary or excerpt from the news article.
Link: The URL of the news article.

The structured data can be further utilized for analysis or imported into other tools for processing.
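To make the column layout concrete, here is a small sketch that writes and reads rows in this four-column format. It uses Python's standard csv module rather than Pandas so that it runs without extra dependencies. The write_rows and read_rows helpers and the sample row are illustrative assumptions, not part of the project's code, and the date format shown is a guess since the source does not specify one.

```python
import csv
import io

# Column order as described in the Data Structure section.
COLUMNS = ["Title", "Date", "Summary", "Link"]


def write_rows(rows, handle):
    """Write article rows with a header line to an open text handle."""
    writer = csv.DictWriter(handle, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)


def read_rows(handle):
    """Read article rows back as a list of dictionaries keyed by column name."""
    return list(csv.DictReader(handle))


# Illustrative row; the Persian title checks that non-ASCII text round-trips.
sample = [{
    "Title": "نمونه خبر",
    "Date": "2023-01-01",
    "Summary": "A summary, with a comma.",
    "Link": "https://example.com/news/1",
}]

# An in-memory buffer stands in for news_data.csv here.
buffer = io.StringIO(newline="")
write_rows(sample, buffer)
buffer.seek(0)
loaded = read_rows(buffer)
print(loaded[0]["Title"])
```

When working with the real file, opening it with open("news_data.csv", "w", newline="", encoding="utf-8") keeps Persian text intact, and pandas.read_csv("news_data.csv") should load the same columns into a DataFrame.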

License: This project is released under the MIT License. You are free to use, modify, and distribute the code for both commercial and non-commercial purposes, provided the original copyright and license notice is retained. The authors also ask that you credit them and provide a link to the project repository.

Contact Information: For any inquiries, suggestions, or issues regarding this project, please contact us at parvvaresh@gmail.com. We appreciate your feedback and contributions to the project.

ITRC

The Iran Telecommunication Research Center (ITRC) is a prominent institution dedicated to telecommunications research and innovation. Located in Iran, ITRC plays a pivotal role in advancing the telecommunications sector within the country and contributing to the global telecommunications landscape.

ITRC focuses on a wide array of research areas, including but not limited to telecommunications network development, satellite communications, mobile technologies, and cybersecurity. It collaborates with both domestic and international partners to stay at the forefront of technology and ensure that Iran's telecommunications infrastructure is robust and competitive.

With a team of skilled researchers, state-of-the-art laboratories, and a commitment to knowledge sharing, ITRC is a significant driver of progress in the field of telecommunications in Iran. It plays a crucial role in enhancing connectivity, promoting technological advancements, and addressing the evolving needs of a digitally connected society.

By conducting cutting-edge research and fostering collaborations, ITRC contributes to the growth and innovation of the telecommunications industry in Iran, making it an essential hub for advancing communication technologies in the region.
