
Can't parse multiple elements of an <entry> with the same name #435

Open
wsanders opened this issue Mar 28, 2024 · 2 comments

Comments


wsanders commented Mar 28, 2024

NOAA publishes an ATOM feed of their weather alerts (example):
noaa-sample-event.txt

"Encapsulated" in this feed is data in a spec called CAP (Common Alerting Protocol), which is XML-based. The CAP elements in the Atom XML show up in a particular way, with a namespace prefix and colon in the tag:
<cap:event>Wind Advisory</cap:event>
At the entry level, feedparser handles these in a predictable way:
{'id': ....
....
'cap_event': 'Wind Advisory',
.... etc

However, the NOAA entries include multiple instances of a "cap:parameter" item:
<cap:parameter>
<valueName>somename</valueName>
<value>somevalue</value>
</cap:parameter>
<cap:parameter>
<valueName>someothername</valueName>
<value>someothervalue</value>
</cap:parameter>

feedparser's output includes only the last cap:parameter's valueName and value, followed by an empty cap_parameter:
{'id':........
'valuename': 'eventEndingTime',
'value': '2024-03-30T12:00:00+00:00',
'cap_geocode': '',
'cap_parameter': ''}

I don't know much about ATOM, so I don't know if this is a real issue or if the NOAA ATOM is nonstandard in some way.

I would expect output in something like the CAP format JSON:
"parameters": {
"somename": [
"somevalue"
],
"someothername": [
"someothervalue", "possibly a list etc"
],
etc

The workaround is to extract the URL of the CAP data that is part of the ATOM feed. Fetching that URL gives useful JSON, but it's not in ATOM format.
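
Another possible workaround is to read the repeated elements from the raw XML rather than through feedparser. The following is only a minimal sketch using the standard library's xml.etree.ElementTree. The CAP namespace URI, the inline sample feed, and the helper names local_name and cap_parameters are assumptions made for illustration; they are not part of feedparser, and the namespace should be checked against the actual NOAA feed.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

ATOM_NS = "http://www.w3.org/2005/Atom"
# Assumption: the feed declares the CAP 1.2 namespace; adjust to match the real feed.
CAP_NS = "urn:oasis:names:tc:emergency:cap:1.2"

# Hypothetical sample modelled on the snippet in this issue.
SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:cap="urn:oasis:names:tc:emergency:cap:1.2">
  <entry>
    <id>example-1</id>
    <cap:event>Wind Advisory</cap:event>
    <cap:parameter>
      <valueName>somename</valueName>
      <value>somevalue</value>
    </cap:parameter>
    <cap:parameter>
      <valueName>someothername</valueName>
      <value>someothervalue</value>
    </cap:parameter>
  </entry>
</feed>"""


def local_name(tag):
    """Drop the '{namespace}' part of an ElementTree tag, if present."""
    return tag.rsplit("}", 1)[-1]


def cap_parameters(entry):
    """Collect every cap:parameter in an entry as {valueName: [value, ...]}."""
    params = defaultdict(list)
    for param in entry.iter(f"{{{CAP_NS}}}parameter"):
        # Match children by local name so it does not matter which
        # namespace the unprefixed valueName/value elements inherit.
        fields = {local_name(child.tag): (child.text or "").strip() for child in param}
        if "valueName" in fields:
            params[fields["valueName"]].append(fields.get("value", ""))
    return dict(params)


root = ET.fromstring(SAMPLE)
for entry in root.iter(f"{{{ATOM_NS}}}entry"):
    print(cap_parameters(entry))
```

Values are kept in lists so that a valueName appearing more than once in the same entry is not overwritten, which is the behaviour this issue asks for.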


pishposhmcgee commented May 22, 2024

I am also seeing this issue with transcripts in podcast feeds. The Podcastindex specification allows multiple transcript entries, one for each transcript format. Feedparser seems to load each entry and overwrite any existing one, so only the last entry is presented.
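
Until this is handled in feedparser, the repeated transcript tags can be read from the raw XML. This is a rough sketch with the standard library only; the sample feed, its example.com URLs, and the helper name transcripts are hypothetical, and the namespace URI is assumed to be the one the Podcastindex namespace declares.

```python
import xml.etree.ElementTree as ET

# Assumption: the feed declares the Podcastindex namespace under this URI.
PODCAST_NS = "https://podcastindex.org/namespace/1.0"

# Hypothetical feed with two transcript formats for one episode.
SAMPLE_RSS = """<rss version="2.0"
     xmlns:podcast="https://podcastindex.org/namespace/1.0">
  <channel>
    <item>
      <title>Episode 1</title>
      <podcast:transcript url="https://example.com/ep1.vtt" type="text/vtt"/>
      <podcast:transcript url="https://example.com/ep1.srt" type="application/x-subrip"/>
    </item>
  </channel>
</rss>"""


def transcripts(item):
    """Return the attributes of every podcast:transcript child, in document order."""
    return [dict(el.attrib) for el in item.findall(f"{{{PODCAST_NS}}}transcript")]


channel = ET.fromstring(SAMPLE_RSS).find("channel")
for item in channel.findall("item"):
    print(item.findtext("title"), transcripts(item))
```

Returning a list keeps every transcript entry instead of only the last one.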


pishposhmcgee commented May 22, 2024

With more searching, this also seems to be related to #297 and #301.
