Scraply

Scraply is a very simple HTML scraping tool: if you know CSS and jQuery, you can use it. Scraply aims to be simple and tiny, and it can also be used as a component in a larger system, something like this use-case.

Overview

You can use scraply within your stack via the CLI or HTTP.

# here is the CLI usage

# extracting the title and the description from the scraply GitHub repo page
$ scraply extract \
    -u "https://github.com/alash3al/scraply" \
    -x title='$("title").text()' \
    -x description='$("meta[name=description]").attr("content")'

# same thing, but with a custom user agent
$ scraply extract \
    -u "https://github.com/alash3al/scraply" \
    -ua "OptionalCustomUserAgent" \
    -x title='$("title").text()' \
    -x description='$("meta[name=description]").attr("content")'

# same thing, but asking scraply to return the response body for debugging purposes
$ scraply extract \
    --return-body \
    -u "https://github.com/alash3al/scraply" \
    -x title='$("title").text()' \
    -x description='$("meta[name=description]").attr("content")'
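If you call the CLI from a script, the flags shown above can be assembled programmatically. The following is a minimal Python sketch, not part of scraply itself: the helper names `build_extract_command` and `run_extract` are hypothetical, it assumes the `scraply` binary is on your `PATH`, and it only uses the flags that appear in the examples above (`-u`, `-x`, `-ua`, `--return-body`). The output format is not assumed, so the raw stdout is returned as-is.

```python
import shlex
import subprocess


def build_extract_command(url, extractors, user_agent=None, return_body=False):
    """Build the argv list for `scraply extract` from the flags shown above."""
    cmd = ["scraply", "extract", "-u", url]
    if user_agent is not None:
        cmd += ["-ua", user_agent]
    if return_body:
        cmd.append("--return-body")
    # Each extractor becomes one `-x name=expression` pair.
    for name, expression in extractors.items():
        cmd += ["-x", f"{name}={expression}"]
    return cmd


def run_extract(url, extractors, **options):
    """Run scraply (assumed to be on PATH) and return its raw stdout."""
    cmd = build_extract_command(url, extractors, **options)
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    cmd = build_extract_command(
        "https://github.com/alash3al/scraply",
        {"title": '$("title").text()'},
    )
    # Print a copy-pasteable shell command instead of running it.
    print(shlex.join(cmd))
```

Passing the arguments as a list avoids shell quoting problems with the `$("...")` expressions, which is why the command is not built as a single string.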

For HTTP usage, run the HTTP server, then use any HTTP client to interact with it.

# running the http server
# by default it listens on address ":8010", which is equivalent to "0.0.0.0:8010"
# for more information execute `$ scraply help`
$ scraply serve

# then, in another shell, execute the following curl command
$ curl http://localhost:8010/extract \
    -H "Content-Type: application/json" \
    -s \
    -d '{"url": "https://github.com/alash3al/scraply", "extractors": {"title": "$(\"title\").text()"}, "return_body": false, "user_agent": "CustomUserAgent"}'
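The same request can be sent from any HTTP client. Below is a minimal Python sketch using only the standard library. It mirrors the JSON fields from the curl example (`url`, `extractors`, `return_body`, `user_agent`) and the default address `localhost:8010`. The function names are hypothetical, the sketch assumes `user_agent` may be omitted from the payload, and it makes no assumption about the response format, so the body is returned as plain text.

```python
import json
import urllib.request


def build_extract_request(url, extractors, user_agent=None, return_body=False,
                          endpoint="http://localhost:8010/extract"):
    """Build a POST request mirroring the JSON fields of the curl example."""
    payload = {"url": url, "extractors": extractors, "return_body": return_body}
    if user_agent is not None:
        # Assumption: the server accepts a payload without "user_agent".
        payload["user_agent"] = user_agent
    # json.dumps escapes the double quotes inside extractor expressions,
    # so there is no need to write \" by hand as in the curl command.
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract(url, extractors, **options):
    """Send the request to a running `scraply serve` and return the raw body."""
    request = build_extract_request(url, extractors, **options)
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read().decode("utf-8")
```

For example, `extract("https://github.com/alash3al/scraply", {"title": '$("title").text()'})` would issue the same call as the curl command, provided `scraply serve` is running locally.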

For debugging, there is an interactive shell:

$ scraply shell -u https://github.com/alash3al/scraply
➜ (scraply) > $("title").text()
GitHub - alash3al/scraply: Scraply a simple dom scraper to fetch information from any html based website and convert that info to JSON APIs

➜ (scraply) > request.url
https://github.com/alash3al/scraply

➜ (scraply) > response.status_code
200

➜ (scraply) > response.url
https://github.com/alash3al/scraply

➜ (scraply) > response.body
<html>.....

Download ?

You can go to the releases page and pick the latest version, or you can run it with Docker:

$ docker run --rm -it ghcr.io/alash3al/scraply scraply help

Contribution ?

You can certainly contribute. How?

Clone the repo
Create your fix/feature branch
Create a pull request

Nothing else, enjoy!

About

I'm Mohamed Al Ashaal, a software engineer :)
