Persistently store scraped tweets #23

laurin · 2022-02-27T17:25:19Z

As discussed in #16, the current storage of scraped tweets is not optimal: newly scraped tweets are simply appended to the existing tweets.txt file, creating a lot of duplicates.
Integrating a database is probably not necessary at this point. We could store the scraped tweets with their IDs in a JSON file and only add new ones on each run of the application.
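
A minimal sketch of what that could look like, assuming each scraped tweet is a dict with an `id` field. The function names and the tweet shape are hypothetical, not taken from the current code:

```python
import json
from pathlib import Path


def load_tweets(path):
    """Return the stored tweets as a dict keyed by tweet ID (empty if no file yet)."""
    path = Path(path)
    if not path.exists():
        return {}
    with path.open(encoding="utf-8") as f:
        return json.load(f)


def merge_tweets(stored, scraped):
    """Add scraped tweets whose ID is not stored yet; return how many were added."""
    added = 0
    for tweet in scraped:
        key = str(tweet["id"])  # JSON object keys are always strings
        if key not in stored:
            stored[key] = tweet
            added += 1
    return added


def save_tweets(path, stored):
    """Write the whole store back to disk as one JSON object."""
    with Path(path).open("w", encoding="utf-8") as f:
        json.dump(stored, f, ensure_ascii=False, indent=2)
```

Keying the file by ID makes the duplicate check a dict lookup instead of a scan over all stored tweets.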


laurin · 2022-02-27T17:29:02Z

We should also store the time each tweet was created, and either discard tweets after a certain time or allow the user to select a time range. The latter would probably require the map to be generated client-side.

kinshukdua · 2022-02-27T18:20:35Z

I agree a JSON file is probably the best option. I don't think we should generate things client-side, because that might add unnecessary lag, especially in places where the internet might be very slow because of the current circumstances. I want to serve static HTML to keep load times as low as possible. Let's just keep the tweet discard time as a server-side parameter.
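
A rough sketch of the server-side discard step, assuming the store is a dict keyed by tweet ID and each tweet carries a `created_at` ISO 8601 timestamp with a UTC offset. `MAX_TWEET_AGE`, `discard_old_tweets`, and the field name are hypothetical:

```python
from datetime import datetime, timedelta, timezone

# Hypothetical server-side parameter: how long a tweet stays on the map.
MAX_TWEET_AGE = timedelta(hours=24)


def discard_old_tweets(stored, now=None, max_age=MAX_TWEET_AGE):
    """Return only the tweets created within max_age of now."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - max_age
    return {
        tweet_id: tweet
        for tweet_id, tweet in stored.items()
        # Assumes timestamps such as "2022-02-27T17:25:19+00:00".
        if datetime.fromisoformat(tweet["created_at"]) >= cutoff
    }
```

This would run once per scrape, before the static HTML is generated, so nothing has to be filtered in the browser.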

Krishna-Sivakumar · 2022-02-27T19:23:29Z

We can consider SQLite here too, since it's simple and file-based. It sounds like we're performing some conditional manipulation, and this will help us cut down on time complexity.
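
For comparison, a small sketch of the SQLite variant using Python's built-in `sqlite3` module. The table layout and function names are assumptions for illustration; the primary key on the tweet ID is what rejects duplicates:

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) the tweet database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tweets ("
        " id TEXT PRIMARY KEY,"
        " created_at TEXT NOT NULL,"
        " text TEXT NOT NULL)"
    )
    return conn


def count_tweets(conn):
    return conn.execute("SELECT COUNT(*) FROM tweets").fetchone()[0]


def add_tweets(conn, scraped):
    """Insert scraped tweets, skipping IDs already stored; return the number of new rows."""
    before = count_tweets(conn)
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR IGNORE INTO tweets (id, created_at, text) VALUES (?, ?, ?)",
            [(str(t["id"]), t["created_at"], t["text"]) for t in scraped],
        )
    return count_tweets(conn) - before
```

The default `":memory:"` path is only for trying this out; a real deployment would pass a file path.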

Krishna-Sivakumar · 2022-02-27T19:49:54Z

@DomiiBunn mentioned firebase, which would work here.

DomiiBunn · 2022-02-27T20:36:59Z

@DomiiBunn mentioned firebase, which would work here.

It depends on the complexity you're looking for. Firebase is a nice balance between file storage (JSON files, SQLite, etc.) and standalone databases: it's almost as flexible, and it handles security, hosting, and high availability. At the usage we'd be expecting, it should be fully free, as long as DB reads are cached, that is.

kinshukdua · 2022-02-28T05:55:11Z

The reason I'm a little hesitant about firebase is that it adds another step for developers looking to reproduce the repo and contribute. The simpler the project, the easier it is to contribute (as long as it doesn't impact performance or features).

DomiiBunn · 2022-02-28T10:53:01Z

Use a config file and specify

useDatabaseCache: false

That way, a larger deployment can turn caching on where it's worth it, and a personal deployment still works fine without the added complexity.
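
Something along these lines on the Python side, as a sketch only. It assumes a JSON config file (the `config.json` name, the helper names, and the backend labels are hypothetical) and treats a missing file or key as "no database cache":

```python
import json
from pathlib import Path

# Default used when the config file or the key is missing.
DEFAULT_CONFIG = {"useDatabaseCache": False}


def load_config(path="config.json"):
    """Read the config file if present and fill in defaults for missing keys."""
    config = dict(DEFAULT_CONFIG)
    path = Path(path)
    if path.exists():
        with path.open(encoding="utf-8") as f:
            config.update(json.load(f))
    return config


def storage_backend(config):
    """Pick the storage backend name based on the useDatabaseCache flag."""
    return "database" if config["useDatabaseCache"] else "json-file"
```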

DomiiBunn · 2022-02-28T11:59:57Z

Or use redis, but idk how painful it is to implement with python.

And I think it would be a bit of overkill.

sahal-mulki · 2022-02-28T15:18:40Z

I am working on a fix for duplicate tweets.

Krishna-Sivakumar · 2022-03-01T08:11:27Z

Let's just go with a json file.

DomiiBunn · 2022-03-01T09:04:32Z

Sounds good to me

sahal-mulki · 2022-03-01T13:51:04Z

Nvm, I failed miserably at it.

DomiiBunn · 2022-03-01T14:52:48Z

I'd love to help but python ain't my cup of tea

sahal-mulki · 2022-03-02T14:53:28Z

Sure-a-mundo

DomiiBunn added this to the Beta 0.2.0 milestone Feb 28, 2022

DomiiBunn added the enhancement (New feature or request) label Feb 28, 2022
