This repository has been archived by the owner on Nov 8, 2018. It is now read-only.

add timestamp for extracts #209

Open
kkowalsky opened this issue Aug 2, 2016 · 28 comments

Comments

@kkowalsky
Member

We should continue showing the date of our weekly extracts on the website. I can anticipate some users wanting something more concrete than the vague "once a week".

@binx
Contributor

binx commented Aug 3, 2016

@migurski where can we get this info?

@binx binx assigned migurski and unassigned binx Aug 3, 2016
@rmglennon
Member

When we do this, also including the timestamp of the data itself should make it clearer whether changes you have made in OSM will be present in the download.

The old site showed only the date we created the files, which was confusing when no weekly planet file update had been released and yet the extracts had a new date.

@migurski
Contributor

migurski commented Aug 3, 2016

I don't know that this information exists anywhere at the moment; would require info from @heffergm I think.

@migurski migurski removed their assignment Aug 3, 2016
@souperneon
Member

@sleepylemur - I believe Grant is out. Are you able to help us find where this date would live in our system?

@sleepylemur
Member

As Rhonda mentioned, we have the time when the last batch of extracts finished:
https://s3.amazonaws.com/metro-extracts.mapzen.com/LastUpdatedAt. It's not being done currently, but it's possible to have the planet timestamp logged as well. I'd prefer to let Grant take care of that once he gets back.

One complication is that while the extracts are being generated, it's hard to tell whether a specific extract is this week's or last week's version, but that's probably something we could just gloss over.

@souperneon
Member

That's why LastUpdatedAt is enough detail: not all extracts get regenerated every week, but the ones that don't are still checked for new changes (as far as I understand).

@binx @migurski can we try to show this date for starters - https://s3.amazonaws.com/metro-extracts.mapzen.com/LastUpdatedAt

@migurski
Contributor

migurski commented Aug 5, 2016

Looks good, is that a good canonical URL to use, or might it be available someplace better?

@sleepylemur
Member

@migurski That url is a good one to use.

@migurski
Contributor

When might we expect to see that URL updated?

@heffergm
Contributor

heffergm commented Aug 22, 2016

That URL no longer exists now that we've switched to processing the fixed extract list as part of ODES. It's also largely irrelevant, given the fact that every object uploaded to S3 contains a timestamp. Is there a reason we're not just using those?
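
For readers unfamiliar with S3 timestamps: S3 reports an object's upload time in the HTTP `Last-Modified` response header. Below is a minimal Python sketch of turning such a header value into a plain-language date. The function name and the sample header value are hypothetical, and the result only says when a file was uploaded, not when its data left OSM.

```python
from datetime import timezone
from email.utils import parsedate_to_datetime


def describe_last_modified(header_value: str) -> str:
    """Turn an HTTP Last-Modified header value into a short UTC date string."""
    # parsedate_to_datetime understands the date format used in HTTP headers.
    uploaded = parsedate_to_datetime(header_value).astimezone(timezone.utc)
    return uploaded.strftime("Uploaded %d %B %Y at %H:%M UTC")


# Hypothetical header value, as a HEAD request against an extract might return.
print(describe_last_modified("Mon, 22 Aug 2016 17:09:00 GMT"))
```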

@migurski
Contributor

Our users are curious about the freshness of the data, and many of them won't know how to interpret S3 timestamps. We'd like a way to reference the point when the data came from OSM.

@heffergm
Contributor

heffergm commented Aug 22, 2016

That timestamp (LastUpdatedAt) never indicated when the data came from OSM. It was only intended to indicate when the data was last processed on our end.

With the current system, we process the cities.json extract list once a day, and the planet file that we use to cut the extracts is also updated daily. So generally, the extracts are cut from data that is ~24 hours old.

If there's now a requirement that we provide an OSM date relevant to the planet with each upload, I can look into doing something that will work with both types of jobs (odes and the bulk processed list).

@migurski
Contributor

Do we create a new planet file from a regularly-updated database? They're normally weekly when pulled from planet.openstreetmap.org. If we get stuff every day and we know this, then we can just put a "fresh daily" message on the site. If there's a chance that it may be as old as a week due to cyclical planet file updates, then we should do something more sophisticated.

@heffergm
Contributor

heffergm commented Aug 22, 2016

In this implementation, the planet is downloaded on initial system setup (essentially from a local mirror) then updated to current with diffs (osmupdate) before being put into production. A cron job then runs daily to apply diffs to bring it up to date regularly.
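
The daily update step described above could be sketched roughly as follows. This is an illustration under assumptions, not the actual job: the file paths, function names, and cron schedule are hypothetical, and the only osmupdate behavior relied on is its basic `osmupdate OLD_FILE NEW_FILE` form.

```python
import subprocess
from pathlib import Path


def osmupdate_command(current: Path, updated: Path) -> list[str]:
    """Build the osmupdate call that applies pending diffs to `current`."""
    # Basic osmupdate form: read the old planet, write the updated one.
    return ["osmupdate", str(current), str(updated)]


def refresh_planet(current: Path, updated: Path) -> None:
    """Apply diffs, then swap the updated file into place."""
    # check=True raises if osmupdate exits with an error.
    subprocess.run(osmupdate_command(current, updated), check=True)
    # Replace the old planet only after osmupdate succeeded.
    updated.replace(current)


# A daily cron entry (hypothetical schedule and script path) might look like:
#   0 3 * * * /usr/bin/python3 /opt/extracts/refresh_planet.py
```

Writing to a separate file and swapping it in afterwards means a failed run leaves the previous planet untouched.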

@migurski
Contributor

So, would you say it's safe for us to say "this data is refreshed from live OSM once daily" in all cases? That should be plenty of freshness message for our visitors. Exciting that we're doing it this frequently; it used to be weekly + weekly.

@heffergm
Contributor

Well, we only used to cut extracts once a week, but the data was essentially as fresh as however long the extract run took, since we were pulling a planet and applying diffs as part of the process.

In any case, I think wording to the effect that the data used to create any given extract should be at most ~24 hours old is correct.
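
If a data timestamp were recorded (the thread says it is not at present), the "at most ~24 hours old" claim could be checked before showing a freshness message. The sketch below assumes such a timestamp exists; the function name, the slack allowance, and the message wording are all hypothetical.

```python
from datetime import datetime, timedelta, timezone

MAX_AGE = timedelta(hours=24)  # the "~24 hours" figure from the thread
SLACK = timedelta(hours=6)     # hypothetical allowance for job run time


def freshness_message(data_time: datetime, now: datetime) -> str:
    """Return a short message describing how old the source data is."""
    age = now - data_time
    if age <= MAX_AGE + SLACK:
        return "Data refreshed from OSM within the last day"
    # Fall back to an explicit age so a stalled update job is visible.
    return f"Data is about {age.days} day(s) old"


now = datetime(2016, 8, 23, 12, 0, tzinfo=timezone.utc)
print(freshness_message(now - timedelta(hours=20), now))
print(freshness_message(now - timedelta(days=3), now))
```

A fallback like this would surface a stalled planet update instead of continuing to advertise daily freshness.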

@heffergm
Contributor

Coincidentally, I've discovered a bug related to planet updates, so we're a bit further out of date. Resolving now, and opened https://github.com/mapzen/operations-engineering/issues/361.

@migurski
Contributor

K, I’m going to assign @binx on this issue, and it’s now just a front-end copy change.

@migurski migurski assigned binx and unassigned migurski Aug 22, 2016
@souperneon
Member

Just to clarify @heffergm @migurski
The "popular" (pre-generated) extracts are also ~24 hours old? I understand the custom ones are.

@heffergm
Contributor

Correct.

Il Lun 22 Ago 2016, 5:09 PM Ekta Daryanani notifications@github.com ha
scritto:

  • Grant

@souperneon
Member

fantastic! also, @heffergm Italian email?

@souperneon
Member

How about "Fresh data daily!" Sounds like a news item or a baked goods store 😉

@louh
Contributor

louh commented Aug 23, 2016

Day-old data! Half off!

@kkowalsky
Member Author

Fresh data served daily, from server farm to data table...

@migurski
Contributor

migurski commented Aug 23, 2016

That will go down in history as Ingrid’s Greatest Pun.

@souperneon
Member

I was in the room when she came up with that ;) But @kkowalsky I like that. Can we use it @migurski?

@kkowalsky
Member Author

@souperneon @migurski: the wording exists in the original Metro Extracts blog announcement and might have been in the old documentation...

@migurski
Contributor

Yes we totally should use it.


8 participants