shabane/href

href

This program finds all the links on a site that match a specific regex pattern.

what it will do

Any site contains many URLs, and you may need the files behind them. This program finds every <a> tag on the page and lists the href of each tag. You can use a regex to pick out only the link(s) you want.

The URLs found usually contain some distinguishing characters (a file extension, for example), so the regex pattern is tried against each URL in turn. If it matches, the URL is printed; if not, the program moves on to the next URL in the list.

If you do not give a pattern, the program prints every link on the site. The default pattern is: .*

Finally, matching is case-insensitive.
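
The behaviour described above can be sketched with the Python standard library alone. This is a minimal illustration, not the actual code of href.py: the names `AnchorCollector` and `find_links` are hypothetical, and the use of `re.search` (rather than `re.match`) is an assumption.

```python
import re
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Collect the href attribute of every <a> tag, in document order."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                self.hrefs.append(value)


def find_links(html, pattern=".*"):
    """Return the hrefs in `html` that match `pattern`, ignoring case.

    Hypothetical helper; the default pattern mirrors the documented `.*`.
    """
    collector = AnchorCollector()
    collector.feed(html)
    regex = re.compile(pattern, re.IGNORECASE)
    return [href for href in collector.hrefs if regex.search(href)]


page = '<a href="/a/Song.MP3">one</a> <a href="/about">two</a>'
print(find_links(page, ".*mp3.*"))  # ['/a/Song.MP3']
print(find_links(page))             # ['/a/Song.MP3', '/about']
```

Because the pattern is compiled with `re.IGNORECASE`, `.*mp3.*` also matches a link that ends in `.MP3`.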

how to use

usage:

    --url 'the url of site'

    --pattern 'Regex pattern'

    --load-headers 'path to a header file'

    href.py --url 'URL' --pattern 'RegexPattern' --load-headers ./headers

example:

    href.py --url 'https://guitarmusic.ir/hayedeh-songs/' --pattern '.*mp3.*'
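
For readers who want to see how such a command line could be declared, here is a hedged `argparse` sketch of the documented switches and their short forms. It is not taken from href.py: the function name `build_parser`, the help texts, and making `--url` required are all assumptions.

```python
import argparse


def build_parser():
    """Build a parser for the documented switches (hypothetical sketch)."""
    parser = argparse.ArgumentParser(
        prog="href.py",
        description="List the links of a site that match a regex pattern.",
    )
    # Assumption: the URL is mandatory; the README does not say so explicitly.
    parser.add_argument("-u", "--url", required=True, help="the URL of the site")
    # The README documents '.*' as the default pattern.
    parser.add_argument("-p", "--pattern", default=".*", help="regex pattern")
    parser.add_argument("-l", "--load-headers", metavar="FILE",
                        help="path to a header file")
    return parser


args = build_parser().parse_args(
    ["-u", "https://guitarmusic.ir/hayedeh-songs/", "-p", ".*mp3.*"]
)
print(args.url, args.pattern, args.load_headers)
```

`argparse` adds `-h` / `--help` automatically and stores `--load-headers` under the attribute name `load_headers`.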


note:

  1. every switch has a short form

--help: -h

--url: -u

--pattern: -p

--load-headers: -l

  2. use a pipe

Sometimes you will need to pipe or redirect the program's output.

Some sites repeat their links, for example to preview a video or a song before you download it. You can pipe the output through sort and then the uniq command to remove the duplicates (uniq on its own only drops adjacent repeated lines).

To save the links in a text file, redirect the output to a file: href.py -u "URL" -p "pattern" > links.txt

  3. run easily

To run the program without changing to the source directory or writing the full path each time, you can link it to ~/.local/bin/href. Run this command from the source directory: ln -s "$(pwd)/href.py" ~/.local/bin/href

Do not forget to make it executable: chmod +x href.py

  4. headers

If you get a 4xx or 5xx HTTP status code, try sending HTTP headers with the request. You can use your own header file or the default header file in the directory, with this switch: --load-headers [file]
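
As an illustration of what sending headers from a file might involve, the sketch below parses a header file and attaches the result to a request. The file format is an assumption (one `Name: value` pair per line, with blank lines and `#` comments skipped); the README does not specify the format href.py expects, and `parse_headers` and `fetch` are hypothetical names.

```python
import urllib.request


def parse_headers(text):
    """Parse 'Name: value' lines into a dict (assumed file format)."""
    headers = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and lines without a separator.
        if not line or line.startswith("#") or ":" not in line:
            continue
        name, value = line.split(":", 1)
        headers[name.strip()] = value.strip()
    return headers


def fetch(url, headers):
    """Fetch `url` with the given request headers and return the body text."""
    request = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


sample = "User-Agent: Mozilla/5.0\nAccept-Language: en-US,en;q=0.9\n"
print(parse_headers(sample))
```

A browser-like `User-Agent` is often what turns a 403 response into a normal page, which is why it appears in the sample.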
