New type of harvesting peers #1522

Open
singhpratyush opened this issue Sep 7, 2017 · 9 comments

Comments

@singhpratyush
Member

singhpratyush commented Sep 7, 2017

  • Issue type: Feature request/Discussion

Problem

The loklak server requires a lot of resources to run properly. There are two main reasons for this:

  • Elasticsearch
  • Java Threads

As a result, we commit a lot of resources to peers that are only meant to collect more data.

Proposed Solution

The idea is to have a loklak wok-like project that can be deployed to the cloud with very low resource utilization.

It would also provide a basic search feature, without any ES index (direct scraping), at a similar endpoint with similar parameters (wherever applicable): /api/search.json.
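A minimal sketch of what such an index-free search endpoint could look like, written in Python purely for illustration. The `scrape` function is a hypothetical placeholder for the real scraper, and the `q` and `count` parameters and the `search_metadata`/`statuses` response shape are assumptions about what a compatible `/api/search.json` would accept and return, not something specified in this issue.

```python
import json
from urllib.parse import parse_qs, urlparse


def scrape(query, count):
    """Hypothetical placeholder for the direct scraper (no index involved)."""
    return [{"id_str": str(i), "text": f"result {i} for {query}"} for i in range(count)]


def handle_search(path, scraper=scrape):
    """Answer a search request by scraping on demand; returns (status, body)."""
    parsed = urlparse(path)
    if parsed.path != "/api/search.json":
        return 404, {"error": "unknown endpoint"}
    params = parse_qs(parsed.query)
    query = params.get("q", [""])[0].strip()
    if not query:
        return 400, {"error": "missing parameter q"}
    try:
        # Assumed parameter name; falls back to 10 results when absent.
        count = max(1, int(params.get("count", ["10"])[0]))
    except ValueError:
        return 400, {"error": "count must be an integer"}
    statuses = scraper(query, count)
    body = {
        "search_metadata": {"query": query, "count": len(statuses)},
        "statuses": statuses,
    }
    return 200, body


if __name__ == "__main__":
    status, body = handle_search("/api/search.json?q=fossasia&count=2")
    print(status, json.dumps(body, indent=2))
```

Because every request is served by scraping directly, nothing is stored on the peer; pushing the scraped results to the backend would be a separate step that is not shown here.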

Advantages

Such a peer would have the following advantages over other wok peers:

  • Nothing else to do: Such peers would only scrape data and push it to the backend. They would therefore collect data continuously, without interruptions.
  • Reliable resources: The peer's resources would not be shared with other workloads. Moreover, high-speed Internet access would be available, as such peers would generally be hosted in the cloud.

Deploying such peers in place of loklak server peers would be more beneficial for anyone who just wants to collect data.

@AnshulMalik
Member

Are you saying that instead of one big application, we divide the app into independent microservices?

If yes, that would be really good, since it would be much easier to maintain them separately, with separation of concerns and easy deployment.

@vibhcool
Member

vibhcool commented Sep 7, 2017

Or we can just add options to disable pushing data to the backend, disable writing to the index, and disable the dump, then set the instance as a peer of the main loklak server and make some changes during installation. It would then behave the same way.
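As a rough illustration of this configuration-only approach, the sketch below reads a properties-style file and works out which outputs stay enabled. The key names (`peer.index.enabled`, `peer.dump.enabled`, `peer.push.enabled`) are hypothetical and chosen only for this example; they are not existing loklak server settings.

```python
# Hypothetical switches, one per output that could be disabled on a light peer.
SINK_KEYS = {
    "index": "peer.index.enabled",
    "dump": "peer.dump.enabled",
    "push": "peer.push.enabled",
}

# Example profile for a lightweight peer with all three outputs turned off.
LIGHT_PEER = """
# hypothetical light-peer profile
peer.index.enabled = false
peer.dump.enabled = false
peer.push.enabled = false
"""


def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def active_sinks(props):
    """Return the outputs that remain enabled; anything unset defaults to on."""
    return [
        name
        for name, key in SINK_KEYS.items()
        if props.get(key, "true").lower() == "true"
    ]


if __name__ == "__main__":
    print(active_sinks(parse_properties(LIGHT_PEER)))
```

Defaulting every switch to enabled means an unmodified configuration keeps the current full-server behaviour, and only peers that opt in become lightweight.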

@AnshulMalik
Member

That would work too, @vibhcool. But the above solution would provide more benefits, and even if we disable some features now, we will eventually have to move to it anyway,
because the complexity will keep growing. So I recommend what @singhpratyush proposes.

@mariobehling
Member

@singhpratyush @vibhcool @AnshulMalik Yes, we had discussed something like that some time ago, but did not have the resources to follow up.

The pro I see with simply disabling features for peers in order to achieve the desired outcome is that we maintain the same loklak server and only the configuration is different.

Alternatively, a solution with microservices sounds very good too. How would the setup look in that case? What components would we need? Would we split the loklak server?

What path do you guys prefer?

@yukiisbored
Member

We can break Loklak up into a bunch of little replaceable services: mainly the harvester, the collector, and the search indexer.

@yukiisbored
Member

The harvester collects tweets and pushes them to the collector. The collector is in charge of keeping the tweets and may also use other methods of gathering tweets (something like P2P would be nice).

@yukiisbored
Member

And the search indexer can handle all of the Elasticsearch work.
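The three-service split described in the comments above could be sketched roughly as follows. All class and method names are hypothetical, the in-memory dictionary stands in for whatever storage a collector would use, and the tiny inverted index is only a placeholder for Elasticsearch; the indexer is made optional here as one possible design, not as an agreed one.

```python
class SearchIndexer:
    """Placeholder for the Elasticsearch-backed service: a tiny inverted index."""

    def __init__(self):
        self._terms = {}

    def index(self, tweet):
        for word in tweet["text"].lower().split():
            self._terms.setdefault(word, set()).add(tweet["id"])

    def search(self, term):
        return sorted(self._terms.get(term.lower(), set()))


class Collector:
    """Keeps tweets, drops duplicates, and forwards new ones to an optional indexer."""

    def __init__(self, indexer=None):
        self._store = {}
        self._indexer = indexer

    def accept(self, tweets):
        new = 0
        for tweet in tweets:
            if tweet["id"] in self._store:
                continue  # already collected from this or another harvester
            self._store[tweet["id"]] = tweet
            new += 1
            if self._indexer is not None:
                self._indexer.index(tweet)
        return new


class Harvester:
    """Scrapes tweets for a query and pushes them to the collector."""

    def __init__(self, scrape, collector):
        self._scrape = scrape  # hypothetical callable: query -> list of tweet dicts
        self._collector = collector

    def run_once(self, query):
        return self._collector.accept(self._scrape(query))


if __name__ == "__main__":
    indexer = SearchIndexer()
    harvester = Harvester(
        lambda q: [{"id": "1", "text": f"hello {q}"}], Collector(indexer)
    )
    print(harvester.run_once("loklak"), indexer.search("hello"))
```

In a real deployment each class would be a separate process talking over the network; keeping the harvester free of any index dependency is what would make it cheap to run in large numbers.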

@singhpratyush
Member Author

My point here is to have a type of harvesting peer that doesn't use an ES index and hence takes up fewer resources.

The motivation behind microservices is reducing the complexity of the project.

But if the intended microservices are standalone, i.e. the harvester/server/collector can run without the indexer, then such a setup would be nice.

Otherwise, I was just referring to something like a loklak-wok-cloud project.

@AnshulMalik
Member

We can take the harvester (scraper) out of loklak, so that we can have any number of lightweight scrapers.

An additional entity, the collector mentioned by @yukiisbored, can be used to communicate with harvesters and peers and to populate Elasticsearch.

For Elasticsearch, I think we already have an option in the config to use another cluster.

Now loklak's job is reduced to serving API requests.


5 participants