Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Meta tags below <item> tags missing #262

Open
Kezzo opened this issue Jul 12, 2018 · 3 comments
Open

Meta tags below <item> tags missing #262

Kezzo opened this issue Jul 12, 2018 · 3 comments
Assignees
Labels

Comments

@Kezzo
Copy link

Kezzo commented Jul 12, 2018

FeedParser version: 2.2.9
Node version: 8.10
Problematic feed link: https://feed.pippa.io/public/shows/clublife-by-tiesto

The above link feed includes meta tag below all item tags (shortened):

<?xml-stylesheet type="text/xsl" href="//feed.pippa.io/feed/rss.xslt" ?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:media="http://search.yahoo.com/mrss/">
    <channel>
        <ttl>60</ttl>
        <generator>pippa.io</generator>
        <title>
            <![CDATA[CLUBLIFE]]>
        </title>
        <link>http://www.tiesto.com</link>
        <atom:link href="//feed.pippa.io/public/shows/clublife-by-tiesto" rel="self" type="application/rss+xml"/>
        <language>en</language>
        <copyright>Tiësto</copyright>
        <itunes:keywords>
            <![CDATA[tiesto, clublife, dance, edm, dj]]>
        </itunes:keywords>
        <itunes:author>Tiësto</itunes:author>
        ...
        <item>
           ...
        </item>
        <itunes:category text="Music"/>
    </channel>
</rss>

This is how we parse the feed:

const GetFeed = async function(feedUrl) {
  return new Promise((resolve, reject) => {
    const feedParserInstance = new feedparser();

    feedParserInstance.on('error', function(error) {
      return reject(error);
    });

    const feed = {
      meta: null,
      episodes: []
    };

    feedParserInstance.on('readable', function() {
      const context = this; // `this` is `feedparser`, which is a stream
      const episode = context.read();

      if (episode != null) {
        feed.episodes.push(episode);
      } else {
        feed.meta = this.meta;
        context.destroy();
        return resolve(feed);
      }
    });

    request.get(feedUrl)
    .on('error', (error) => {
      return reject(error);
    })
    .pipe(feedParserInstance);
  });
};

I expected that all tags which are not the tag or are in an tag are included in meta object. This seems to work for all tags above the tags, but doesn't seem to work when a non-item tag is stored below the tags. Because of this the non-item tag (in this case <itunes:category text="Music"/>) can't be retrieved in any way I'm aware of.

@danmactough danmactough self-assigned this Jul 12, 2018
@danmactough
Copy link
Owner

Thanks for reporting this @Kezzo. It may be a day or so before I can investigate this.

I would expect that when the parsing is done (on the done event), this.meta should contain any of the trailing metadata about the channel.

@danmactough
Copy link
Owner

Here's the only solution I think we can provide:
https://github.com/danmactough/node-feedparser/tree/parse-trailing-meta

The main deficiency with this ☝️solution (and it's really not solvable without destroying one of the main benefits of having a streaming interface) is that if you are handling the items in a streaming fashion, then you will have already processed all the items before than trailing metadata gets processed. So, you'll miss it unless you do some additional processing after the done event.

A related deficiency is that the meta that gets emitted could mutate on after you've handled it.

Another related deficiency is that if channel metadata is interspersed in between items and you and handling items in a streaming fashion, the items would have different meta values as you handle them.

But the good news is that with the way you're handling the feed (buffering all the items until you've finished parsing all of them), the meta values should be exactly as you expect.

@Kezzo
Copy link
Author

Kezzo commented Jul 13, 2018

Hey @danmactough , thanks a lot for looking into this problem!
I agree with your assumption that the meta data should be considered complete until either all items are parsed or the done event was invoked.

I looked at the solution you created and think this is a valid fix for the problem and will fulfill my assumption.

The deficiencies you mentioned could indeed be considered problematic. Possible solutions are:

  1. Only provide the meta information after all items were parsed or the done event was invoked to never allow access to potentially incomplete meta data. This is obviously detrimental to the streaming-behaviour and usage of the package.
  2. Explicitly document the possiblity of the mutation of the meta object so users are aware of it and can handle it more carefully. This is dangerous since many people might not read that part and create implementations based on false assumptions.
  3. Add a trailing meta data object that includes all meta fields parsed after the first item. With it users can consciously opt-in to handle additional meta data found during the item parsing, but the original meta field will not be changed the moment the first item field was found. This would also reduce issue with users updating to a newer version, since it's basically a pure add-on and will not change the existing functionality. Additonally a function can be made accessible that returns the combined current meta data from that point in the stream.

Do you think this is something that should be merged into master when a good solution was found, or would you like to keep it in a branch?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants