IMHO, sending it online and back seems wholly unnecessary.
Would it be possible to integrate something like this?
It's a script that needs the axios, jsdom and @mozilla/readability npm modules as dependencies, takes a site URL as an argument, and prints the extracted HTML.
const { Readability } = require('@mozilla/readability');
const { JSDOM } = require('jsdom');

// Check that a URL was provided as a command-line argument
if (process.argv.length < 3) {
  console.error('Please provide a URL as a command line argument.');
  process.exit(1);
}

const url = process.argv[2];

(async () => {
  try {
    // Dynamically import axios (it is published as an ES module)
    const { default: axios } = await import('axios');

    // Fetch the HTML content from the given URL
    const response = await axios.get(url);

    // Create a JSDOM instance; passing the URL lets relative links resolve correctly
    const doc = new JSDOM(response.data, { url });

    // Use Readability to parse the document and extract the article content
    const reader = new Readability(doc.window.document);
    const article = reader.parse();

    // parse() returns null when no article could be extracted
    if (!article) {
      console.error('Could not extract an article from the page.');
      process.exit(1);
    }

    // Print the extracted article
    console.log('Title:', article.title);
    console.log('Content:', article.content);
  } catch (error) {
    console.error('Error fetching the URL:', error.message);
    process.exit(1);
  }
})();
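For reference, a script like this could be run roughly as follows. The filename extract.js is illustrative, not part of the original post, and the URL is just a placeholder:

```shell
# Install the dependencies next to the script (module names from the post above)
npm install axios jsdom @mozilla/readability

# Extract the article from a page; prints the title and extracted HTML
node extract.js https://example.com/some-article
```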
and load it when an article is clicked, instead of trying to extract all URLs unnecessarily? There's already node-based adblock implemented, from what I've seen. Parsing everything would be fine too; would it be enough to just put it into post-processing?
This is something else; it's not a "reader mode". It extracts the article content without opening the whole page, even if the RSS feed only contains a headline or part of the article.
I use rss mainly to avoid opening the full webpage.
Brief description of the feature request
This is a follow-up to #399. Since this script stopped working https://github.com/martinrotter/rssguard/blob/master/resources/scripts/scrapers/scrape-full-articles.py (the site it uses for extraction now returns a 404), I was experimenting with a different solution.