
an angled bracket in title #165

Open
piptan opened this issue Apr 1, 2016 · 2 comments

piptan commented Apr 1, 2016

Hi,

If I pass the following feed to the library:

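`
<rss version="2.0">
<channel>
  <title>W3Schools Home Page</title>
  <link>http://www.w3schools.com</link>
  <description>Free web building tutorials</description>
  <item>
    <title>RSS <<<Tutorial>>></title>
    <link>http://www.w3schools.com/xml/xml_rss.asp</link>
    <description>New RSS tutorial on W3Schools</description>
  </item>
</channel>
</rss>
`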

The parsed output is:

{
  title: 'RSS >>',
  description: 'New RSS tutorial on W3Schools',
  summary: 'New RSS tutorial on W3Schools',
  date: null,
  pubdate: null,
  pubDate: null,
  link: 'http://www.w3schools.com/xml/xml_rss.asp',
  guid: 'http://www.w3schools.com/xml/xml_rss.asp',
  author: null,
  comments: null,
  origlink: null,
  image: {},
  source: {},
  categories: [],
  enclosures: [],
  'rss:@': {},
  'rss:title': { '@': {}, '#': 'RSS <<<Tutorial>>>' },
  'rss:link': { '@': {}, '#': 'http://www.w3schools.com/xml/xml_rss.asp' },
  'rss:description': { '@': {}, '#': 'New RSS tutorial on W3Schools' },
}

Please note that title contains the wrong text ('RSS >>'), while rss:title has the correct content ('RSS <<<Tutorial>>>').
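
One plausible explanation for the truncated title, not confirmed anywhere in this thread, is that the normalized title goes through a simple HTML tag-stripping step that treats `<<<Tutorial>` as a tag. The sketch below reproduces the symptom with a hypothetical `naiveStripTags` function (an illustration only, not feedparser's actual code) and shows the workaround of reading the raw element's text node instead.

```javascript
// Hypothetical stand-in for an HTML-stripping step; NOT feedparser's real code.
function naiveStripTags(text) {
  // Remove anything that looks like <...>, as a simple tag stripper would.
  return text.replace(/<[^>]*>/g, '');
}

const rawTitle = 'RSS <<<Tutorial>>>';
const stripped = naiveStripTags(rawTitle);
console.log(stripped); // 'RSS >>' (the same text as the reported title)

// Workaround using only the fields shown in the output above:
// prefer the raw element's text node when it is present.
const reportedItem = {
  title: 'RSS >>',
  'rss:title': { '@': {}, '#': 'RSS <<<Tutorial>>>' },
};
const safeTitle = reportedItem['rss:title']
  ? reportedItem['rss:title']['#']
  : reportedItem.title;
console.log(safeTitle);
```

The regular expression matches from the first `<` up to the first `>`, so `<<<Tutorial>` is removed as a single "tag" and only `>>` survives.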

danmactough added the bug label Apr 1, 2016
danmactough self-assigned this Dec 9, 2017
danmactough added a commit that referenced this issue Dec 11, 2017
Added option `strip_html` to restore old behavior.

Resolves #165, #243
danmactough added a commit that referenced this issue Jul 15, 2018
Added option `strip_html` to restore old behavior.

Resolves #165, #243
@theasteve

@danmactough is there an option I can pass when calling feedparser that removes the { '@': {}, '#': value } wrapper and returns just the value?
So instead of 'rss:link': { '@': {}, '#': 'http://www.w3schools.com/xml/xml_rss.asp' }, I would get 'rss:link': 'http://www.w3schools.com/xml/xml_rss.asp'?

@danmactough
Owner

@theasteve 'rss:link' is a "raw" element, meaning it isn't normalized and retains all the information in the original XML. As a result, we need to retain both the attributes (the @) and the text node (the #).

But generally, the item's link property will have the value you want.
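
For anyone who still wants plain values from the raw elements, the unwrapping can be done after parsing. The sketch below is a minimal illustration, assuming items shaped like the output earlier in this thread; `rawText` and `flattenRawElements` are hypothetical helper names, not part of feedparser's API.

```javascript
// Hypothetical helpers; NOT part of feedparser's API.
// Return the text node ('#') of a raw element, or the value unchanged.
function rawText(node) {
  if (node !== null && typeof node === 'object' && '#' in node) {
    return node['#'];
  }
  return node;
}

// Build a shallow copy of an item with every raw element unwrapped.
function flattenRawElements(item) {
  const out = {};
  for (const [key, value] of Object.entries(item)) {
    out[key] = rawText(value);
  }
  return out;
}

// Sample data copied from the parsed output in this issue.
const parsedItem = {
  link: 'http://www.w3schools.com/xml/xml_rss.asp',
  'rss:@': {},
  'rss:link': { '@': {}, '#': 'http://www.w3schools.com/xml/xml_rss.asp' },
};

const flat = flattenRawElements(parsedItem);
console.log(flat['rss:link']); // the URL string, without the wrapper
```

Note that unwrapping this way discards the attributes stored under '@', which is the information the raw elements exist to preserve.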
