Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make background DB cleaner more complete. #2281

Closed
wants to merge 3 commits into from
Closed

Make background DB cleaner more complete. #2281

wants to merge 3 commits into from

Conversation

BrownSlaughter
Copy link
Contributor

@BrownSlaughter BrownSlaughter commented Aug 29, 2017

add removal of gym and gym details over 24 hours old to clean db loop

Description

This will remove the gym and its deatils from the database if it has not been scanned in 24 hours

Motivation and Context

If you no longer scan an area or a gym is removed by niantic the old gyms stay on the map and in the db but never updated.

Seb: Fixes #2278.

How Has This Been Tested?

Had old gym info in db, the db cleaner loop removed them

Screenshots (if appropriate):

Types of changes

  • [x ] Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • [x ] My code follows the code style of this project.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.

add removal of gym and gym details over 24 hours old
@friscoMad
Copy link
Contributor

Please also add pokestops as they also move around or get deleted some times.

@elincognito
Copy link
Contributor

It would also be nice a wh notification when a gym or pokestop is removed.

@BrownSlaughter
Copy link
Contributor Author

BrownSlaughter commented Aug 29, 2017

@friscoMad pokestop dont have a last_scanned so removal this way is not possible

@BrownSlaughter
Copy link
Contributor Author

@elincognito that is not the scope of this PR

Copy link
Contributor

@tomballgithub tomballgithub left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see one problem. Datetime.now is local time but the times stored in the database for last_scanned are UTC. So in my case this compare is only looking 17 hours back instead of 24.

Try datetime.utcnow?

@tomballgithub
Copy link
Contributor

But the feature works and I saw old gyms removed at first database scrub

@friscoMad
Copy link
Contributor

Ok some new thoughts pokestops have the last_updated field but the problem is that stops are not constantly updating like gyms so deleting based on this field would likely lead to false positives, they are only updated when a lure appears.

Anyway we could delete them anyway and let the map add them again, due to the long range for gyms/stops I don't see it a real problem but I would set the delta to at least 1 week to avoid doing this every day, anyway that would lead to having a lot of missing stops for some minutes just 1 week after the map was first started.

And on another issue, raids, gym members and gym pokes should also be cleaned with gym/gym details

@tomballgithub
Copy link
Contributor

tomballgithub commented Sep 3, 2017 via email

@sebastienvercammen
Copy link
Member

sebastienvercammen commented Sep 3, 2017

Temporarily on hold while we discuss some SQL/SQLite things on Discord that are related to this PR.

@tomballgithub
Copy link
Contributor

Also, when removing a gym we should also remove the pokemon assigned to those gyms. I am seeing orphaned rows in gymmember and gympokemon.

@andreacassani
Copy link

I think there should be at least an option to enable/disable gym-details removal.
I for example run gym-info only once for every new area to retrieve gym names. This would require a new gym-info scan every 24 hours.

@BrownSlaughter
Copy link
Contributor Author

@acassani91 gym names are from GMO not gym_info you will get those if you scan within 1000m of a gym. This PR is to remove gym details that dont get scanned because you no long scan the same area or a gym is removed.

@friscoMad
Copy link
Contributor

friscoMad commented Sep 18, 2017 via email

Copy link
Member

@sebastienvercammen sebastienvercammen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Raids, gym members and gym Pokémon should also be removed on cleanup.
  • Disabled/removed pokéstops should also be cleared.
  • Cleanup shouldn't be based on data being over a day old, it should be based on: 1. either the data is over a week old, and 2. we have multiple rows of data with the same ID (= everywhere we use INSERTs only, e.g. gym details), which means we can keep the last inserted and drop the rest.
  • Timestamps should be compared in UTC, not local times.
  • Tables missing timestamp data are easily fixed by adding a timestamp field with ON UPDATE CURRENT_TIMESTAMP, which doesn't even require changes to the scanner. Ideally all tables that need these "last scanned" timestamps get one with ON UPDATE CURRENT_TIMESTAMP so we can easily track our own timestamps rather than times reported by the API or the local Python instance.

If we have a scenario where data is being inserted without being cleared first (e.g. when scanning a gym, insert all gym members w/o cleaning out the old ones first), the data clearing query should be added right before the INSERTs rather than in the DB cleaner, because the DB cleaner is for cleaning inactive/background/stale data - not data we're actively touching with the scanning.

@sebastienvercammen sebastienvercammen changed the title add removal of gym and gym details to clean db loop Make background DB cleaner more complete. Sep 21, 2017
@sebastienvercammen sebastienvercammen added this to the 4.1.0 milestone Sep 21, 2017
@friscoMad
Copy link
Contributor

I think the only issue is with pokestops as we are not inserting them continously as we do with everything else, we would need to change the code to upsert everything instead of querying and only inserting the new ones. But I think it would be better to change the code to remove the missing ones instead of updating all and them removing the one that are not three anymore in a thread. That way we will avoid a lot of DB writes for no reason.

@tomballgithub
Copy link
Contributor

There is a tool out there called Worldpoole that queries the RM database to generate interesting reports using the data. I just noticed that this shows historical data from the database and so if we prune it after 7 days it would impact that tool and the people that use it. This PR has moved from just hiding gyms that are no longer in the game towards more database pruning. Need to ensure we consider all aspects of those decisions.

Here is a snippet from the trainer report:
image

@sebastienvercammen
Copy link
Member

sebastienvercammen commented Sep 22, 2017

@tomballgithub Solved with #2305. If someone doesn't want data to be cleaned, they shouldn't enable it.

Those who want more fine-grained control should look into setting up automated backups or MySQL triggers or a few options to control data removal (e.g. age of data).

This lets RM take care of its own needs (removing old data to reduce/limit performance impact and deadlocks) while the host has other options if they want to keep all old data.

@sebastienvercammen sebastienvercammen removed this from the 4.1.0 milestone Sep 30, 2017
@sebastienvercammen sebastienvercammen added this to the 4.1.1 milestone Sep 30, 2017
@KartulUdus
Copy link
Contributor

my problem with this:

Users have asked me to have a static map of the gyms available in the country.
I did a hex scan to find them and deleted the trainers|pokemon inside to have static points showing unassigned gyms.

would it be a horrible idea to have some kind of config switch to not delete old gyms if user opts for it?

@pogo-excalibur
Copy link
Contributor

pogo-excalibur commented Oct 10, 2017

Can a static map not be e.g. an image file?

@fosJoddie
Copy link
Contributor

I also have a much larger, static area with stops and gyms. It's about 2 counties, 50 000 square kilometers. A simple static picture wont really do it justice. Not to mention people who wanna geek out and see gym names.

I accept that when pokestops go become gyms, a cleaner needs to take care and remove such things (same when things get removed). But I'm scared to loose my "static" info that I scanned once and forgot.

@tomballgithub
Copy link
Contributor

Some of these last comments are interesting. Maybe the solution would be to have configurable timeframes for the pruning of each item (although I realize there is opposition to having more configuration options)

@friscoMad
Copy link
Contributor

friscoMad commented Oct 15, 2017

I don't see the issue here if you want static maps just create another DB with them and do whatever you want with them.
There is no reason reason to mix static and dynamic data in the same db, just enable clean in the dynamic instance and we are all fine.

If someone wants to do things like update gyms/stops once a months and then disable gym scanning/stop scanning during regular work while using the same db for everything, then just the solution is to not enable db-clean and do the cleanup by hand, trigger, cron job or whatever, he should be able to do so.

Adding a lot of new config options just for fringe use cases makes no sense.

Copy link
Contributor

@neskk neskk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems to be working as intended ;)

@fosJoddie
Copy link
Contributor

@friscoMad I don't agree with you that making a separate RM web instance and database to have static pokestop and gym info is trivial. ("just make a copy of the db").

That said, I can agree with you that there's maybe little to no practical use for having a bunch of more options just for some fringe use case like myself and @KartulUdus are discussing.

I am giving this PR the thumbs up, after a rebase.

@sebastienvercammen
Copy link
Member

This PR is being closed in favor of #2383 but the discussion is still relevant. #2383 adds a few options for better control of data removal (including an option to disable certain cleaners).

sebastienvercammen pushed a commit that referenced this pull request Mar 4, 2018
* add removal of gym and gym details

* Improved cleanup of old HashKeys.

* Fixes to status page (outdated worker status)
professorcotter added a commit to professorcotter/RocketMap that referenced this pull request Mar 5, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Only query Gyms that have been scanned recently (and are still active).
10 participants