New type of harvesting peers #1522

singhpratyush · 2017-09-07T08:30:46Z

Issue type: Feature request/Discussion

Problem

The loklak server requires a lot of resources to run properly. There are two main reasons for this:

Elasticsearch
Java Threads

Because of this, we commit a lot of resources to peers that are only meant to collect more data.

Proposed Solution

The idea is to have a loklak wok-like project that can be deployed to the cloud with very low resource utilization.

It would also provide a basic search feature without any ES index (direct scraping), at a similar endpoint with similar parameters (whichever are applicable): /api/search.json.
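As a rough illustration of the index-free search idea (not part of the original proposal), the sketch below answers a /api/search.json request by calling a scraper directly. The `scrape` stub, the `q` and `count` parameter names, and the `statuses` response key are assumptions made for this example.

```python
import json
from urllib.parse import urlparse, parse_qs


def scrape(query, count):
    """Hypothetical stand-in for a live scraper; returns placeholder messages."""
    return [{"text": f"{query} result {i}"} for i in range(count)]


def handle_request(url, scraper=scrape):
    """Answer a search request by scraping directly, with no ES index involved."""
    parsed = urlparse(url)
    if parsed.path != "/api/search.json":
        return 404, json.dumps({"error": "not found"})
    params = parse_qs(parsed.query)
    query = params.get("q", [""])[0]            # assumed parameter name
    count = int(params.get("count", ["10"])[0])  # assumed parameter name
    # Every request triggers a fresh scrape; nothing is stored or indexed.
    return 200, json.dumps({"statuses": scraper(query, count)})


status, body = handle_request("/api/search.json?q=fossasia&count=2")
```

Since nothing is cached, each search costs one scrape, which is the trade-off for not running an index on the peer.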

Advantages

Such a peer will have the following advantages over other wok peers:

Nothing else to do: Such peers will only be used to scrape data and push it to the backend, and nothing else. So, these peers will continuously collect data without any interruptions.
Reliable resources: The resources will not be shared with other things on the peer. Moreover, high-speed Internet would be available to these peers, as they would generally be hosted on the cloud.

Deploying such peers in place of loklak server peers would be more beneficial if one only wishes to collect data.
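A minimal sketch of the "scrape and push, nothing else" loop described above might look like the following. The `scraper` and `pusher` callables and the `batch_size` parameter are hypothetical; in a real peer the pusher would send each batch to the backend over HTTP.

```python
def harvest(queries, scraper, pusher, batch_size=100):
    """Scrape each query and push results in batches; nothing is kept locally."""
    pushed = 0
    batch = []
    for query in queries:
        batch.extend(scraper(query))
        # Flush full batches as soon as they are available.
        while len(batch) >= batch_size:
            pusher(batch[:batch_size])
            pushed += batch_size
            batch = batch[batch_size:]
    # Flush whatever is left after the last query.
    if batch:
        pusher(batch)
        pushed += len(batch)
    return pushed


# Usage with in-memory stand-ins for the scraper and the backend.
received = []
total = harvest(
    ["a", "b"],
    scraper=lambda q: [{"text": f"{q}{i}"} for i in range(3)],
    pusher=received.append,
    batch_size=4,
)
```

Because the loop holds at most one partial batch in memory, its footprint stays small regardless of how long it runs.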


AnshulMalik · 2017-09-07T08:37:56Z

Are you saying that instead of one big application, we divide the app into independent microservices?

If yes, that would be really good, since it would be much easier to maintain them separately, with separation of concerns and easy deployment.

vibhcool · 2017-09-07T09:11:53Z

Or we can just add options to disable pushing data to the backend, disable writing to the index, and disable the dump, then set it as a peer of the main loklak server and make some changes during installation. It would then act the same way.
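One way to picture this configuration-only approach is a set of switches on the existing server. This is a sketch under assumptions: the flag names below are invented for illustration and are not actual loklak configuration keys.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PeerConfig:
    # Hypothetical flag names, one per feature mentioned in the comment.
    index_enabled: bool = True
    dump_enabled: bool = True
    backend_push_enabled: bool = True


# A peer with all three features switched off.
HARVEST_ONLY = PeerConfig(
    index_enabled=False, dump_enabled=False, backend_push_enabled=False
)


def active_sinks(config):
    """Return the names of the storage paths a peer with this config would use."""
    sinks = []
    if config.index_enabled:
        sinks.append("index")
    if config.dump_enabled:
        sinks.append("dump")
    if config.backend_push_enabled:
        sinks.append("backend")
    return sinks
```

The code base stays the same for every peer; only the configuration object differs between a full server and a lightweight one.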

AnshulMalik · 2017-09-07T09:24:37Z

That would work too, @vibhcool. But going for the above solution would provide more benefits, and even if we disable some features, we would eventually have to move after some time, because the complexity will keep growing. So I recommend what @singhpratyush proposes.

mariobehling · 2017-09-07T10:08:26Z

@singhpratyush @vibhcool @AnshulMalik Yes, we had discussed something like that some time ago, but did not have the resources to follow up.

The pro I see with simply disabling features for peers in order to achieve the desired outcome is that we maintain the same loklak server and only the configuration is different.

Alternatively a solution with microservices sounds very good too. How would the setup be in that case? What components would we need? Would we split the loklak server?

What path do you guys prefer?

yukiisbored · 2017-09-07T12:50:54Z

We can break Loklak up into a bunch of little replaceable services: mainly the harvester, the collector, and the search indexer.

yukiisbored · 2017-09-07T12:51:39Z

The harvester collects tweets and pushes them to the collector. The collector is in charge of keeping the tweets and may also use other methods of gathering tweets (something like P2P would be nice).

yukiisbored · 2017-09-07T12:51:56Z

And the search indexer can handle all of the Elasticsearch work.
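The three-service split could be sketched roughly as below. The class names follow the roles named in the comments, but every method name, the `id` field, and the in-memory storage are assumptions for illustration; real services would communicate over the network.

```python
class Collector:
    """Keeps harvested tweets until another service asks for them."""

    def __init__(self):
        self._tweets = {}

    def receive(self, tweets):
        for tweet in tweets:
            self._tweets[tweet["id"]] = tweet  # de-duplicate by id

    def drain(self):
        """Hand over everything collected so far and forget it."""
        out = list(self._tweets.values())
        self._tweets.clear()
        return out


class Harvester:
    """Scrapes tweets and pushes them to a collector; stores nothing itself."""

    def __init__(self, scraper, collector):
        self.scraper = scraper
        self.collector = collector

    def run_once(self, query):
        self.collector.receive(self.scraper(query))


class SearchIndexer:
    """Stand-in for the service that would own all Elasticsearch work."""

    def __init__(self):
        self.indexed = []

    def index_from(self, collector):
        self.indexed.extend(collector.drain())
```

In this sketch the harvester and collector never reference the indexer, so they can run without one.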

singhpratyush · 2017-09-08T05:04:12Z

My point here is to have a type of harvesting peer that doesn't use an ES index and hence takes up fewer resources.

The motivation behind microservices is reducing the complexity of the project.

But if the intended microservices are standalone, i.e. the harvester/server/collector can run without the indexer, then such a setup would be nice.

Otherwise, I was just referring to something like a loklak-wok-cloud project.

AnshulMalik · 2017-09-09T09:58:53Z

We can take the harvester (scraper) out of loklak, so that we can have any number of lightweight scrapers.

An additional entity, the collector mentioned by @yukiisbored, can be used to communicate with harvesters and peers and to populate Elasticsearch.

For Elasticsearch, I think we already have an option in the config to use another cluster.

Now loklak's job gets reduced to serving API requests.

mariobehling added Hacktoberfest parent issue labels Oct 20, 2017

mariobehling removed the Hacktoberfest label Jul 22, 2018
