Aggregation Microservices

1. News Aggregation : Data Scraping

This application tracks a news based on the keyword we're inputting. It has two modes while tracking a particular keyword. It fetches articles based on the time filters passed and the keywords passed to fetch news. The news for a particular entity is scraped based on the alias table. After aggregating, this service dumps the raw file into google storage.

historic: fetches the past one year of news articles for the particular entity
daily: once we start monitoring a company, we fetch and store the news daily for that entity

Databases

Right now, noel the news aggregator monitors two open databases.

GDELT: We can fetch news from GDELT on multiple languages, regions and we can get upto 250 articles per request. If we time that request properly, we can get all of it.
Google News: Google News doesn't provide any official APIs. All we have is access to the RSS links where we monitor multiple regions for a particular keyword using URLS. To load a new database, write a class and put it in workers/sources. The class should have the following format:

class Source:
    def __init__(self):
        pass

    def history(self):
        pass

    def daily(self):
        pass

    def write_data(self):
        pass

Subscribes to: news_aggregator Publishes to: index_articles

2. News Indexing

News indexing listens to the the index_articles subscription and normalizes the format and inputs the rows into our Cloud SQL PostgreSQL database.

Subscribes to: index_articles Publishes to: None

3. Body Extraction

Microservice to extract body from the new data sources.

Subscribes to: body_extractor Publishes to: None

Name		Name	Last commit message	Last commit date
Latest commit History 68 Commits
body_extraction		body_extraction
feed_generation		feed_generation
indexing		indexing
insertion		insertion
news_aggregator		news_aggregator
.dockerignore		.dockerignore
.gcloudignore		.gcloudignore
.gitignore		.gitignore
README.md		README.md
docker-compose.yml		docker-compose.yml
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

body_extraction

body_extraction

feed_generation

feed_generation

indexing

indexing

insertion

insertion

news_aggregator

news_aggregator

.dockerignore

.dockerignore

.gcloudignore

.gcloudignore

.gitignore

.gitignore

README.md

README.md

docker-compose.yml

docker-compose.yml

requirements.txt

requirements.txt

Repository files navigation

Aggregation Microservices

1. News Aggregation : Data Scraping

Databases

2. News Indexing

3. Body Extraction

About

Releases

Packages

Languages

guyandtheworld/fetch-and-index-global-news

Folders and files

Latest commit

History

Repository files navigation

Aggregation Microservices

1. News Aggregation : Data Scraping

Databases

2. News Indexing

3. Body Extraction

About

Resources

Stars

Watchers

Forks

Languages