This repository follows my progress through the book "Web Scraping with Python" 📖🐍

GoranTopic/Web-Scrapping-with-Python

Web Scraping using Python

These are the practice scripts I used to practice web scraping.

REGEX Cheatsheet

| Character | Example | Definition |
| --- | --- | --- |
| `*` | `a*b*` | Matches the preceding character 0 or more times |
| `+` | `a+b+` | Matches the preceding character 1 or more times |
| `[ ]` | `[a-z]` | Matches any character from a to z |
| `[^ ]` | `[^a-z]` | Matches any character that is not from a to z |
| `()` | `(ab)` | A grouped subexpression; groups are evaluated first |
| `\|` | `(foo\|foot)s` | Alternation: matches one expression or the other |
| `{m,n}` | `a{2,3}` | Matches the preceding character m to n times |
| `.` | `b.d` | Matches any single character |
| `^` | `^a` | Anchors the expression at the beginning of the string |
| `\` | `\^` | The escape character; `\^` matches a literal `^` |
| `$` | `[A-Z]*$` | Often placed at the end of the expression; matches the end of the string |
| `?!` | `^((?![A-Z]).)*$` | Matches a string that does not contain the given pattern (here, no uppercase letter) |
| `?` | `(swimming )? pool` | Makes the preceding expression optional |
| `??` | `(swimming )?? pool` | Lazy version of `?`: prefers to skip the optional expression |
| `(?=)` | `A(?=B)` | Positive lookahead: matches an A followed by a B, as in AB, ABC |
| `(?!)` | `A(?!B)` | Negative lookahead: matches an A that B does *not* follow |
| `(?<=)` | `(?<=B)A` | Positive lookbehind: matches an A that B precedes |
| `(?<!)` | `(?<!B)A` | Negative lookbehind: matches an A that B does not precede |
| `(?>)` | `(?>foo\|foot)s` | Atomic group: once one alternative matches, the remaining alternatives are thrown away and the engine does not backtrack into the group |
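
A few rows of the cheatsheet can be checked with Python's built-in `re` module. This is only a sketch: the sample strings and variable names are made up for illustration, and the atomic group line is guarded because `re` accepts `(?>...)` only from Python 3.11 onward.

```python
import re
import sys

# Quantifiers: * allows zero repetitions, + requires at least one.
star_on_empty = re.fullmatch(r"a*b*", "") is not None   # True
plus_on_b = re.fullmatch(r"a+b+", "b") is not None       # False

# Character class and its negation.
lowercase = re.findall(r"[a-z]", "aB3c")       # ['a', 'c']
not_lowercase = re.findall(r"[^a-z]", "aB3c")  # ['B', '3']

# {m,n}: between 2 and 3 repetitions of "a".
runs = re.findall(r"a{2,3}", "a aa aaaa")      # ['aa', 'aaa']

# . matches any single character; \ escapes a special character.
dots = re.findall(r"b.d", "bad bed bd")        # ['bad', 'bed']
carets = re.findall(r"\^", "2^3")              # ['^']

# ? is greedy (takes the optional part), ?? is lazy (skips it if it can).
greedy = re.match(r"(swimming )?", "swimming pool").group()   # 'swimming '
lazy = re.match(r"(swimming )??", "swimming pool").group()    # ''

# "Does not contain": every position must not start an uppercase letter.
no_upper = re.fullmatch(r"^((?![A-Z]).)*$", "hello world") is not None  # True
has_upper = re.fullmatch(r"^((?![A-Z]).)*$", "hello World") is not None  # False

# Lookahead and lookbehind, reported as match start positions.
ahead = [m.start() for m in re.finditer(r"A(?=B)", "AB AC")]       # [0]
not_ahead = [m.start() for m in re.finditer(r"A(?!B)", "AB AC")]   # [3]
behind = [m.start() for m in re.finditer(r"(?<=B)A", "BA CA")]     # [1]
not_behind = [m.start() for m in re.finditer(r"(?<!B)A", "BA CA")] # [4]

# Plain alternation can backtrack from "foo" to "foot"; an atomic group cannot.
alternation = re.fullmatch(r"(foo|foot)s", "foots") is not None    # True
atomic = None  # stays None on Python versions older than 3.11
if sys.version_info >= (3, 11):
    atomic = re.fullmatch(r"(?>foo|foot)s", "foots") is not None   # False
```

The last two lines show why the atomic group matters: `(foo|foot)s` matches "foots" by falling back to the second alternative, while `(?>foo|foot)s` commits to "foo" and then fails on the "t".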

### BeautifulSoup4

It is a Python library used for scraping websites.

It may need to be installed first. I used `pip-3.6 install beautifulsoup4`.

The BeautifulSoup library creates a data structure out of the HTML document, enabling the user to manipulate HTML tags as data objects. This is very useful if one is looking to traverse links.

One can create a BeautifulSoup object by passing the HTML document and a parser.
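As a rough illustration of treating tags as objects and traversing links, the sketch below parses a small HTML string and collects the `href` of every anchor. The HTML snippet and the variable names are invented for this example, and it assumes `beautifulsoup4` is installed.

```python
from bs4 import BeautifulSoup

# Hypothetical page content, used only for illustration.
html_doc = """
<html><body>
  <h1>Example page</h1>
  <a href="/page1">First</a>
  <a href="/page2">Second</a>
  <a>No href here</a>
</body></html>
"""

soup = BeautifulSoup(html_doc, "html.parser")

# Tags are reachable as attributes of the soup object.
title = soup.h1.get_text()

# find_all returns Tag objects; href=True keeps only anchors that have an href,
# and tag attributes are read like dictionary keys.
links = [a["href"] for a in soup.find_all("a", href=True)]

print(title)
print(links)
```

Filtering with `href=True` avoids a `KeyError` on anchors that carry no `href` attribute.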

soup = BeautifulSoup(html_doc, 'html.parser')

One can see the HTML page with:

print(soup.prettify())
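Put together, the two steps might look like the following sketch. The one-line `html_doc` is a stand-in for a real page, and `html.parser` is the parser that ships with Python, so no extra parser package is assumed.

```python
from bs4 import BeautifulSoup

# Stand-in document; a real script would load the HTML from a file or a request.
html_doc = "<html><body><p>Hello</p></body></html>"

soup = BeautifulSoup(html_doc, "html.parser")

# prettify() returns the parsed tree as an indented string, one tag per line.
pretty = soup.prettify()
print(pretty)
```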
