Feed schema: Add metadata #4
I think this is a good proposal, but I am still thinking about the best way to implement it. First, it is important to note that these pieces of information do not all have the same scope:
General developer information (name, email, URL) could be implemented as profile fields. The other developer information is specific to each parser. It could, but need not, be delivered by the feed; as the author of multiple parsers, I would prefer to include it in the feed. The parser version is very technical metadata and should be included in the feed. As the parser already has the source URL, it is best to put this information in the feed as well. Canteen accessibility is tricky: both another attribute in the developer admin interface and a feed attribute are reasonable. If we work on meta information, I would also think about a way to provide the current canteen metadata (address, name, ...) via the feed. Maybe not by directly overwriting the old data, but at least in a semi-automatic way, with a notification to the developer if the information has changed. In addition, I think we should start adding the parser as a separate entity to OpenMensa (a canteen is provided by a parser). This makes multiple OpenMensa workflows easier. Lastly, we have to decide which information is displayed to the user. The parser version is of no interest to other users, is it? The developer information should be displayed to the user, but we should get that user's approval (especially for the name and email). |
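The semi-automatic update described above (no direct overwrite, but a notification when the feed's canteen metadata differs from what is stored) could look roughly like the following sketch. It assumes metadata is handled as plain string dictionaries; the name `metadata_changes` is hypothetical and not part of OpenMensa.

```python
from typing import Dict, Optional, Tuple

# Hypothetical result type: field name -> (stored value, value from the feed).
Change = Tuple[Optional[str], str]


def metadata_changes(stored: Dict[str, str], from_feed: Dict[str, str]) -> Dict[str, Change]:
    """Report canteen metadata that differs between the database and the feed.

    Nothing is overwritten; the caller can notify the developer and let them
    confirm. Empty feed values are skipped because all metadata is optional.
    """
    changes: Dict[str, Change] = {}
    for key, new_value in from_feed.items():
        old_value = stored.get(key)
        if new_value and new_value != old_value:
            changes[key] = (old_value, new_value)
    return changes


stored = {"name": "Mensa Griebnitzsee", "address": "August-Bebel-Str. 89, 14482 Potsdam"}
from_feed = {"name": "Mensa Griebnitzsee", "address": "August-Bebel-Str. 89, 14482 Potsdam", "city": "Potsdam"}
print(metadata_changes(stored, from_feed))  # {'city': (None, 'Potsdam')}
```

Returning the differences instead of applying them keeps the decision with the developer, which matches the "notification if the information has changed" idea.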
I would also like to think about a push API that allows "parsers" to push data to OpenMensa. In the long run this would free us from having to implement fetch strategies, update pulls, etc. "Parsers" could send data of any kind (meals, status information, meta information, etc.) using, e.g., the public HTTP API. |
I am strongly against dropping the pull API. The pull approach makes it very easy to write parsers (a lot of the logic is implemented by OpenMensa), and this should remain our goal. We could think about an optional push API to push metadata or trigger a new data fetch, but at the moment I do not really see a real advantage or any parser that would use this. |
I'm not in favor of dropping it now (or soon); I'm only in favor of thinking about adding a push API. Otherwise I agree with you about the information handling. I would additionally highlight the problem of how to display "restricted" canteens on the page. Should they be included in listings for apps by default? |
Your point "On the long run this would free us from needing to implement fetch strategies and update pulls etc" is only correct if we drop the pull API. And I am against this, even in the long run. |
Then consider it unsaid. Do we want to open separate issues for the different fields? |
I'm generally in favor of accumulating more metadata, especially when it is useful to users, e.g. for more direct error reporting to the developer. (A push API sounds nice, by the way.) |
I have spent the last couple of days thinking about the best implementation. My proposal is as follows: We extend the data model and separate parser information from canteen information. Currently I do not think that we need a separate table for the parser itself (e.g. the Potsdam parser, not a specific instance), but I will keep this in mind. This way we can create canteens for which users ask for a parser (with a state like "ask" or something like that). And later we can support multiple parsers per canteen (e.g. a fallback/alternative parser).
Add a metadata URL for parsers:
Add states for many tables:
What to do with the parser version:
Those are my thoughts for now. Any questions or other proposals, @jgraichen, @kaifabian? Kai, you implemented basic address extraction in your parser, didn't you? |
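A minimal sketch of the separation proposed above, assuming one record per parser instance attached to a canteen and an ordered list so later entries can act as fallbacks. All names (`Canteen`, `Source`, `CanteenState` and its values) are hypothetical illustrations, not OpenMensa's actual models; per the proposal there is no separate table for a parser in general.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class CanteenState(Enum):
    # Hypothetical states; the proposal only says canteens need a state,
    # e.g. for canteens that users have asked a parser for.
    WANTED = "wanted"
    ACTIVE = "active"
    ARCHIVED = "archived"


@dataclass
class Source:
    """One parser instance delivering data for one canteen (hypothetical name)."""
    parser_name: str
    meta_url: Optional[str] = None        # URL of the parser's metadata feed
    parser_version: Optional[str] = None  # still undecided in the proposal


@dataclass
class Canteen:
    name: str
    state: CanteenState = CanteenState.WANTED
    # Ordered: the first source is the primary parser, later ones are fallbacks.
    sources: List[Source] = field(default_factory=list)

    def primary_source(self) -> Optional[Source]:
        return self.sources[0] if self.sources else None


canteen = Canteen("Mensa Griebnitzsee")    # requested, no parser yet
canteen.sources.append(Source("potsdam"))  # hypothetical parser name
canteen.state = CanteenState.ACTIVE
```

Keeping the canteen independent of its sources is what allows a canteen to exist before any parser is written and to gain alternative parsers later.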
In any case, we do need to extend the feed schema (v2). Kai and I discussed how to extend the feed and propose the following.

Example for Potsdam Griebnitzsee:

<openmensa>
<canteen>
<name>Mensa Griebnitzsee</name>
<address>August-Bebel-Str. 89, 14482 Potsdam</address>
<city>Potsdam</city>
<contact type="phone">(0331) 977 3749/3748</contact>
<location latitude="52.3935353446923" longitude="13.1278145313263" />
<accessibility>privileged</accessibility>
<feed name="today">
<!-- cron like schedule information -->
<schedule dayOfMonth="*" dayOfWeek="*" hour="8-14" retry="30m 1" />
<url>http://kaifabian.de/om/potsdam/griebnitzsee.xml?today</url>
<source>http://www.studentenwerk-potsdam.de/mensa-griebnitzsee.html</source>
</feed>
<feed name="full">
<schedule dayOfMonth="*" dayOfWeek="1" hour="8" retry="1h 5 1d" />
<url>http://kaifabian.de/om/potsdam/griebnitzsee.xml</url>
<source>http://www.studentenwerk-potsdam.de/speiseplan/</source>
</feed>
<!-- day attributes -->
</canteen>
</openmensa>

Example for Ulf:

<openmensa>
<version>93.3</version>
<canteen>
<name>Ulf's Café (HPI Cafeteria)</name>
<address>Prof.-Dr.-Helmert-Str. 2-3, 14482 Potsdam</address>
<contact type="phone">(0331) 5509-380</contact>
<city>Potsdam</city>
<location latitude="52.3932931010875" longitude="13.131183385849" />
<accessibility>public</accessibility>
<!-- day attributes -->
</canteen>
</openmensa>

What do you think about it? |
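To make the proposed elements concrete, here is a sketch of how a consumer might read them from a feed like the two examples, using Python's standard `xml.etree.ElementTree`. It assumes the un-namespaced element names shown in the examples; a real feed that declares an XML namespace would need namespace-qualified lookups. The function name `parse_canteen_metadata` is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Any, Dict


def parse_canteen_metadata(xml_text: str) -> Dict[str, Any]:
    """Collect the proposed metadata; absent elements become None (or empty)."""
    root = ET.fromstring(xml_text)
    canteen = root.find("canteen")
    if canteen is None:
        raise ValueError("feed contains no <canteen> element")
    location = canteen.find("location")
    return {
        "parser_version": root.findtext("version"),
        "name": canteen.findtext("name"),
        "address": canteen.findtext("address"),
        "city": canteen.findtext("city"),
        "phones": [c.text for c in canteen.findall("contact") if c.get("type") == "phone"],
        # assumes both coordinates are present whenever <location> is
        "location": None if location is None
        else (float(location.get("latitude")), float(location.get("longitude"))),
        "accessibility": canteen.findtext("accessibility"),
        # feed name -> URL; the name identifies each feed of the canteen
        "feeds": {f.get("name"): f.findtext("url") for f in canteen.findall("feed")},
    }


ULF = """<openmensa>
  <version>93.3</version>
  <canteen>
    <name>Ulf's Café (HPI Cafeteria)</name>
    <address>Prof.-Dr.-Helmert-Str. 2-3, 14482 Potsdam</address>
    <contact type="phone">(0331) 5509-380</contact>
    <city>Potsdam</city>
    <location latitude="52.3932931010875" longitude="13.131183385849" />
    <accessibility>public</accessibility>
  </canteen>
</openmensa>"""

meta = parse_canteen_metadata(ULF)
print(meta["name"], meta["location"], meta["accessibility"])
```

Because every element is optional, missing values map to `None` rather than raising, so feeds without metadata keep working.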
@mswart, do you plan to have any requirements regarding the formatting of the meta information delivered in feeds, such as the address or phone number? Or is it completely free text? There are dozens of ways humans format phone numbers, but only one I would like to accept (and should we prepend 0049 for Germany, yes or no?)...
I would recommend enforcing E.123, maybe even limited to the international format only. |
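To illustrate what enforcing E.123 international notation ("+", country code, then space-separated digit groups, instead of a "0049" prefix) would mean for a parser developer, here is a rough sketch. The converter assumes one hypothetical input shape, a German number with a parenthesised area code such as "(0331) 5509-380", and the country code 49; the function names are illustrative only.

```python
import re

# E.123 international notation: "+", country code, then digit groups
# separated by single spaces, e.g. "+49 331 5509380".
E123_INTERNATIONAL = re.compile(r"^\+\d{1,3}(?: \d+)+$")

# Hypothetical input shape: "(0331) 5509-380" with a parenthesised area code
# that starts with the German trunk prefix 0.
GERMAN_NATIONAL = re.compile(r"^\((0\d+)\)\s*([\d\s-]+)$")


def is_e123_international(number: str) -> bool:
    return E123_INTERNATIONAL.match(number) is not None


def german_to_e123(number: str, country_code: str = "49") -> str:
    """Convert a German national number to E.123 international notation."""
    m = GERMAN_NATIONAL.match(number.strip())
    if m is None:
        raise ValueError(f"unsupported phone number format: {number!r}")
    area_code = m.group(1)[1:]  # drop the trunk prefix 0
    subscriber = re.sub(r"[\s-]", "", m.group(2))
    return f"+{country_code} {area_code} {subscriber}"


print(german_to_e123("(0331) 5509-380"))  # +49 331 5509380
```

Note that a value like "(0331) 977 3749/3748" from the Griebnitzsee example does not fit this shape and raises `ValueError`; every canteen-specific format would need its own handling.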
I'm not sure about some attribute vs tag usage. Why I'm also unsure about the wording of Personally I do not like the attribute scheme for |
If we want to support canteens in other countries it may also be good to have |
Do we want to make this an extension to v2 or call it v3 when it's done? We have to deal with missing data; any ideas? |
format restrictions: We have thought about restrictions on address and contact, but what is the point? This data is only displayed to the user. If we require a specific format, all parser developers have to parse and reformat the telephone number from whatever format the canteen uses. I would rather have a telephone number in some (maybe only human-readable) format than no telephone number at all. The address field currently has no restriction either. So I would not enforce a restriction, but recommend one or more formats in the documentation.

accessibility: I am open to a different word than accessibility, but if I implement something like this, I would support at least 3 different states: restricted, public and privileged. And I would prefer to have all metadata as child elements within the canteen, not some of it as attributes of the canteen tag.

country tag: At some point we should probably add a country flag, but I am not sure whether it is needed now: the website is only in German.

feed name: The name identifier is, on the one hand, a description for the developer, but more importantly it is an identifier for merging new feed data into the current database. The idea was to allow the developer to define how many feeds they provide: maybe a today feed hourly, the current week daily and the future only once a week. The main point is that a crash while parsing a later day/week should not influence the parsing of the current data.

schedule-tag: For the schedule, I prefer an XML representation that a human can easily understand by simply reading it, and the cron format is not that intuitive. In addition, we thought about not supporting the minute field. Can you give an example of what you mean by the "ISO 8601 time or period formats"? The retry attribute lists a time interval (maybe only in seconds without a suffix) and a retry limit. So you can say: retry 5 times in 5-hour intervals, but afterwards only daily.
v2 or v3: All changes are extensions, so there is no need to create a v3; it would only confuse developers a little more. I do not see any real problem with missing data. We have to convert the current parsers, but that's all. If we get no metadata, we do not change anything. No problem. |
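A sketch of how the proposed `<schedule>` attributes could be evaluated, assuming cron semantics for each field ("*", a single value, a range "A-B" or a comma list) and cron weekday numbering (0 = Sunday, 1 = Monday). Since the minute field is dropped, matching is done per hour. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from datetime import datetime


def field_matches(spec: str, value: int) -> bool:
    """Match a cron-like field: "*", "N", "A-B" or a comma list of those."""
    for part in spec.split(","):
        part = part.strip()
        if part == "*":
            return True
        if "-" in part:
            low, high = (int(x) for x in part.split("-", 1))
            if low <= value <= high:
                return True
        elif int(part) == value:
            return True
    return False


def schedule_matches(schedule: ET.Element, when: datetime) -> bool:
    """True if a regular fetch is due in the hour of `when` (no minute field)."""
    # Assumption: dayOfWeek uses cron numbering, 0 = Sunday ... 6 = Saturday.
    cron_weekday = (when.weekday() + 1) % 7
    return (
        field_matches(schedule.get("dayOfMonth", "*"), when.day)
        and field_matches(schedule.get("dayOfWeek", "*"), cron_weekday)
        and field_matches(schedule.get("hour", "*"), when.hour)
    )


today = ET.fromstring('<schedule dayOfMonth="*" dayOfWeek="*" hour="8-14" retry="30m 1" />')
full = ET.fromstring('<schedule dayOfMonth="*" dayOfWeek="1" hour="8" retry="1h 5 1d" />')
monday_8am = datetime(2015, 3, 30, 8)  # 2015-03-30 was a Monday
print(schedule_matches(today, monday_8am), schedule_matches(full, monday_8am))
```

Splitting the cron fields into named attributes keeps each one easy to read and to constrain in an XML Schema, while the matching logic stays the same as for a single cron string.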
On 2015-03-28 12:42, Malte Swart wrote:
Still the question why not just
Can you elaborate what "restricted, public and privileged" means?
So, the main point is that the attribute text itself is only for the
I personally cannot say I understand the attribute above "intuitive" and
ISO 8601 specifies formats not only for dates and times but also for [1] https://en.wikipedia.org/wiki/ISO_8601#Durations |
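For comparison with the short "30m"/"1h"/"1d" suffixes, the retry intervals in ISO 8601 duration notation would be "PT30M", "PT1H" and "P1D". The sketch below parses only the fixed-length subset (days, hours, minutes, seconds); years and months are deliberately left out because their length varies. The function name is hypothetical.

```python
import re
from datetime import timedelta

# Subset of ISO 8601 durations with fixed lengths: days, hours, minutes,
# seconds (years and months are left out because their length varies).
DURATION = re.compile(
    r"^P(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+)S)?)?$"
)


def parse_iso8601_duration(text: str) -> timedelta:
    m = DURATION.match(text)
    # Reject "P", "PT" and "P1DT": at least one component is required and
    # the time designator "T" must be followed by a time component.
    if m is None or text == "P" or text.endswith("T"):
        raise ValueError(f"unsupported ISO 8601 duration: {text!r}")
    parts = {unit: int(value) for unit, value in m.groupdict().items() if value}
    return timedelta(**parts)


# The intervals from the retry examples, written as ISO 8601 durations:
for text in ("PT30M", "PT1H", "P1D"):
    print(text, parse_iso8601_duration(text))
```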
On Saturday 28 March 2015 05:06:45 Jan Graichen wrote:
Because Kai and I both used the contact version. But I have no problem with
restricted: only for limited group of people
It is in no way a merge strategy. It is an identifier. Too many feed tags from Of course it is wise to choose descriptive names (from the developer's point of
Of course there is no need. But we also name the canteen tag canteen and not c
Of course I would love to use a standard, but only if it is applicable. The Therefore I asked for an example, because I cannot see how to use the stand |
For example, "repeat 5 times at every hour starting 8:00 UTC" could be coded as
I'm not sure if I understood the What's the meaning of
The attribute So it's like Different thing. How is the scheduling interpreted? As "will not run before the given time", but possibly some time after it? |
format restrictions: Allowing a parser developer to simply output the raw contact information is a good point; I didn't think of that. My general attitude was to have metadata that is as clear as possible, since I imagined that a parser developer would have to extract and hardcode that information manually... @mswart, is it plausible that this extraction can be automated, given that, as you argued, e.g. the telephone number format is so unpredictable?

accessibility: I wonder whether it is actually useful to have the "privileged" state, since that information is to some extent already present in the price information (only on a per-meal basis, of course). And yes, a better name for this feature is necessary...

schedule-tag: I agree that cron is standard enough to be used, and a more verbose structure is always nice, since it makes it easier to apply XML Schema constraints right in the definition (like a plausible hour range or '*', in addition to documentation); a combined format string like ISO 8601 is hard to decode at this level. I'm in favor of more verbosity since the overhead is negligible. Also, I know cron but don't know the ISO 8601 details (yet).

v2 or v3: Right, so we add it to v2. |
@cmur2 All of this metadata is optional. You can still edit it directly for the canteen online. So the only point in serving it via the metadata feed is if you can extract it automatically.

@jgraichen I get your idea, but I still do not see how to express scheduled runs that differ from day to day (e.g. only on Mondays...). If we needed to add an additional dayOfWeek attribute anyway, I would prefer to use the cron-like syntax directly.
The idea is that the feed is first retried hourly. After 5 unsuccessful retries, OpenMensa should only retry once a day, as it is likely a permanent error. No retry limit is given for the last interval, so it retries until the next regular fetch time. I think two interpretations are reasonable. Yours: fetch at 13:00 on the following days (waiting one day after the last unsuccessful try of the "1h 5" interval). It would also be possible to wait until the next complete 1d period from the original start time, i.e. fetch at 8:00 on the following days. Which one to choose is not so important. Although I like the second one in this case, I think the first one is more intuitive and also easier to implement, so I would go with that one.

fetch times: Yes, the idea is that the specified times are the earliest times. OpenMensa tries to fetch directly at this time, but depending on, e.g., the other fetch tasks, it could be later. This is more or less how it works today: OpenMensa fetches all required canteens at the full hour, so depending on the workload a canteen might only be fetched at 8:25 instead of 8:00. |
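The first interpretation above (each retry waits its interval after the previous unsuccessful try, and an interval without a limit repeats until the next regular fetch) can be sketched as follows. The sketch assumes every interval carries one of the suffixes s, m, h or d, so that a bare number is always a retry limit; `parse_retry` and `retry_times` are hypothetical names.

```python
import re
from datetime import datetime, timedelta
from typing import List, Optional, Tuple

UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def parse_retry(spec: str) -> List[Tuple[timedelta, Optional[int]]]:
    """Parse e.g. "1h 5 1d" into [(1 hour, 5), (1 day, None)]."""
    tokens = spec.split()
    stages: List[Tuple[timedelta, Optional[int]]] = []
    i = 0
    while i < len(tokens):
        m = re.fullmatch(r"(\d+)([smhd])", tokens[i])
        if m is None:
            raise ValueError(f"expected an interval like '30m', got {tokens[i]!r}")
        interval = timedelta(seconds=int(m.group(1)) * UNIT_SECONDS[m.group(2)])
        limit: Optional[int] = None
        if i + 1 < len(tokens) and tokens[i + 1].isdigit():
            limit = int(tokens[i + 1])  # bare number: retry limit for this interval
            i += 1
        stages.append((interval, limit))
        i += 1
    return stages


def retry_times(failed_at: datetime, spec: str, next_regular: datetime) -> List[datetime]:
    """Retry times after a failed fetch; each one waits after the previous try."""
    times: List[datetime] = []
    current = failed_at
    for interval, limit in parse_retry(spec):
        count = 0
        while limit is None or count < limit:
            current += interval
            if current >= next_regular:  # the regular fetch takes over
                return times
            times.append(current)
            count += 1
    return times


# Weekly "full" feed failing on Monday at 8:00, next regular fetch a week later.
times = retry_times(datetime(2015, 3, 30, 8), "1h 5 1d", datetime(2015, 4, 6, 8))
print(times[:6])
```

With "1h 5 1d" this yields retries at 9:00 to 13:00 on the first day, then daily at 13:00 until the next regular fetch.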
I created a PR (openmensa/doc.openmensa.org#9) for the required feed changes. Please check whether it matches the version we discussed. |
As discussed in #4 restructure the parser and feed model. This is also a preparation to support metadata extracted from feeds.
I think we missed one important piece of metadata: opening times! It's a bit tricky because we have a few questions to answer. First question: how do we want to treat canteens with a lunch menu and a dinner menu? Currently they are two canteens, and at the moment I recommend keeping it this way. That way it is no problem to favorite only the lunch menu but not the dinner menu. Second question: are opening times normal metadata that is specified centrally per canteen, and/or can it be specified per date within the normal feed? Third question: do we specify opening times, menu times or both? At least the general opening times should be central metadata, e.g.:

<openingTime monday="11-14" tuesday="11-14" wednesday="11-14" thursday="11-14" friday="11-14" saturday="11-13" sunday="" />
<!-- or -->
<times type="opening">
<weekday name="monday">11-14</weekday>
<weekday name="tuesday">11-14</weekday>
<weekday name="wednesday">11-14</weekday>
<weekday name="thursday">11-14</weekday>
<weekday name="friday">11-14</weekday>
<weekday name="saturday">11-13</weekday>
<weekday name="sunday"></weekday>
</times>

With the @kaifabian @jgraichen opinions? I am very unsure what the best way is, or whether to postpone some of the decisions (e.g. no opening time override for now). |
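Both proposed notations carry the same information, which the following sketch illustrates by reading either one into the same weekday mapping. It assumes opening times are given as full-hour ranges like "11-14" and that an empty value means closed; `parse_opening_times` is a hypothetical name.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional, Tuple

# (opening hour, closing hour) in full hours; None means closed that day.
Hours = Optional[Tuple[int, int]]
WEEKDAYS = ("monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday")


def parse_range(text: Optional[str]) -> Hours:
    text = (text or "").strip()
    if not text:
        return None
    start, end = text.split("-", 1)
    return int(start), int(end)


def parse_opening_times(element: ET.Element) -> Dict[str, Hours]:
    """Accept either proposed variant and return the same weekday mapping."""
    if element.tag == "openingTime":  # variant 1: one attribute per weekday
        return {day: parse_range(element.get(day)) for day in WEEKDAYS}
    if element.tag == "times":  # variant 2: one <weekday> element per day
        given = {w.get("name"): parse_range(w.text) for w in element.findall("weekday")}
        return {day: given.get(day) for day in WEEKDAYS}
    raise ValueError(f"unexpected element <{element.tag}>")


variant1 = ET.fromstring(
    '<openingTime monday="11-14" tuesday="11-14" wednesday="11-14" thursday="11-14"'
    ' friday="11-14" saturday="11-13" sunday="" />'
)
variant2 = ET.fromstring(
    '<times type="opening">'
    + "".join(f'<weekday name="{d}">11-14</weekday>' for d in WEEKDAYS[:5])
    + '<weekday name="saturday">11-13</weekday><weekday name="sunday"></weekday>'
    + "</times>"
)
print(parse_opening_times(variant1) == parse_opening_times(variant2))
```

The element variant is more verbose but leaves room for a `type` other than "opening" (e.g. menu times), which the attribute variant cannot express without a second element name.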
Concerning the opening times, I would recommend an XML element such as <open>8-14</open> as a sub-element of a day. Admittedly, this makes feeds contain even more redundant information, but it would also allow feed providers to specify deviations from the usual opening schedule. The user is most likely interested in exactly that information (not when the canteen is usually open, but whether it is open on a given date). Another point in favor of this proposal is that it fits the style we already set with the <closed /> element. |
I like both ideas. Having a global I cannot say how hard it would be for parsers and developers, but it looks good. |
Maybe I'm lacking imagination, but I don't think that special opening times are that common. I would stick with the global declaration. |
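The two approaches discussed above can coexist: a global weekly declaration plus an optional per-day `<open>` element. The sketch below resolves the hours for one `<day>` in that order: `<closed />` wins, then a day-specific `<open>A-B</open>` (only a proposal here), otherwise the global times. It assumes each `<day>` carries its date in a `date` attribute in ISO format; `opening_hours_for` is a hypothetical name.

```python
import xml.etree.ElementTree as ET
from datetime import date
from typing import Dict, Optional, Tuple

Hours = Optional[Tuple[int, int]]  # (open, close) in full hours; None means closed
WEEKDAYS = ("monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday")


def opening_hours_for(day: ET.Element, regular: Dict[str, Hours]) -> Hours:
    """Resolve the opening hours of one <day> element."""
    if day.find("closed") is not None:
        return None
    override = (day.findtext("open") or "").strip()
    if override:  # proposed per-day deviation, e.g. <open>8-14</open>
        start, end = override.split("-", 1)
        return int(start), int(end)
    weekday = WEEKDAYS[date.fromisoformat(day.get("date", "")).weekday()]
    return regular.get(weekday)


# Global opening times as in the proposal: Mon-Fri 11-14, Sat 11-13, Sun closed.
regular: Dict[str, Hours] = {d: (11, 14) for d in WEEKDAYS[:5]}
regular.update(saturday=(11, 13), sunday=None)

for xml_day in (
    '<day date="2015-03-30" />',                       # Monday, global times apply
    '<day date="2015-03-30"><open>8-14</open></day>',  # deviation for this date
    '<day date="2015-03-30"><closed /></day>',         # closed on this date
):
    print(xml_day, opening_hours_for(ET.fromstring(xml_day), regular))
```

Feeds that never emit `<open>` lose nothing: every day falls back to the global declaration.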
The current state of implementation is as follows:

* Rename error_report to feedback
* Extend developer information with public name, email and info URL
* Add a maintainer-wanted flag to parsers
* New parser info box with developer information and, if wanted, a maintainer request
@cyroxx All proposed metadata is implemented/standardized in feed v2.1 (availability, source URL, parser version, information about the developer), but currently only the developer information is displayed. |
As a developer of a canteen parser, I would find it useful to include some (optional) metadata about the canteen and the parser itself.