How to scrape using html files if the site did not declare any "class" #74

Open
jhnferraris opened this issue Sep 1, 2017 · 0 comments

Comments

@jhnferraris

Hello,

I'm trying to brush up on my JavaScript skills and would like to try out this neat scraper. I have a static website, http://www.phivolcs.dost.gov.ph/html/update_SOEPD/EQLatest.html, and I'm trying to scrape its 2017 table.

Unlike the Hacker News site, my target site doesn't have any CSS classes that I can use to target the text to scrape.

Example:
[Screenshot taken 2017-09-01 at 3:36:49 PM]

For starters, I tried it this way:

var express = require('express');
var scraperjs = require('scraperjs');

var router = express.Router();

router.get('/bulletin', function(request, response, next) {
    scraperjs.StaticScraper.create('http://www.phivolcs.dost.gov.ph/html/update_SOEPD/EQLatest.html')
        .scrape(function($) {
            // No classes to target, so select by element path
            // (similar to the inspector in a Scrapinghub service).
            return $("html > body > div > table > tbody > tr > td").map(function() {
                console.log($(this));
                return $(this).text();
            }).get();
        })
        .then(function(news) {
            response.send(news);
        })
        .catch(function(err) {
            // Forward scraping failures to the Express error handler.
            next(err);
        });
});

But I can't get any data from the static page. How can I achieve this?
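
One possible cause, not confirmed for this page: browsers add a `<tbody>` element to the DOM for tables whose markup omits it, so a path read from a browser inspector can contain a `tbody` step that is absent from the raw HTML a static scraper parses. In that case a strict child chain such as `table > tbody > tr` matches nothing. The sketch below shows one way to relax such a path. `dropImplicitTags` is a hypothetical helper written for illustration; it is not part of scraperjs or cheerio, and it only handles bare tag names joined by `>`.

```javascript
// Hypothetical helper (not a scraperjs or cheerio API): removes the given
// bare tag names from a child-combinator selector chain, e.g. one copied
// from a browser inspector.
function dropImplicitTags(selector, implicitTags) {
  // Assumption: only 'tbody' is treated as browser-inserted by default.
  var skip = implicitTags || ['tbody'];
  return selector
    .split('>')
    .map(function (part) { return part.trim(); })
    .filter(function (part) {
      // Drop empty segments and any segment that is exactly a skipped tag.
      return part.length > 0 && skip.indexOf(part.toLowerCase()) === -1;
    })
    .join(' > ');
}

var copied = 'html > body > div > table > tbody > tr > td';
var relaxed = dropImplicitTags(copied);
console.log(relaxed);
```

The relaxed string could then be passed to `$()` in place of the original path. A looser alternative is a descendant selector such as `table tr td`, which does not depend on the exact nesting at all.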

Thanks for the assist!
