Web Scraping Project (domain.com.au)

Welcome to the Domain.com.au (Web Scraping Project) repository! This project demonstrates web scraping techniques to extract data from the domain.com.au website, a platform for real estate listings in Australia. The scraped data can be used for various purposes, such as analysis, research, or data visualization, within the bounds of the website's terms of use.

Features

Utilizes Node.js and libraries like Crawlee and npm packages such as puppeteer-extra, puppeteer-extra-plugin-stealth, csv-writer.
Scrapes property listings, prices, details, and other information, take a look at csv-headers files for more scraped information.

Run Locally

Follow these steps to set up and run the scraping script locally:

Clone this repository: git clone https://github.com/anujbarochia/crawlee-puppeteer-domain.com.au
Navigate to the project directory: cd crawlee-puppeteer-domain.com.au
Install the required Node.js packages: npm install
Create .env file in your project directory, you will need to enter 3 variables that are being used in this project (if you wish you can change the way program uses it & apply your own logic)

PROXY =""
MANUAL_CAPTCHA = "0"
CRAWLEE_CHROME_EXECUTABLE_PATH = ""

PROXY => This variable takes takes a url from where you are accessing your pool of new proxy address when the website blocks your IP, crawlee will automatically take new proxy address from this pool
MANUAL_CAPTCHA => Refer to index.js and have a look at the code for better understanding on how it is being used.
CRAWLEE_CHROME_EXECUTABLE_PATH => By default our process would use chromium as a browser but here we are explicitly defining Chrome to be used as the browser, change the path address as per the location in your system, to find executable path type `chrome://version` in the search bar of chrome-in there you will be able to see a variable defined as Executable Path.

Run the scraping script: node index.js
Enjoy and feel free to make changes as per your requirement.

Contributing

Contributions and suggestions are encouraged! If you encounter issues, have enhancement ideas, or wish to contribute, feel free to open an issue or submit a pull request.

Warning

Please exercise caution and adhere to domain.com.au's terms of use and scraping guidelines when using this script.

Disclaimer: This project is intended for educational and personal use only. The author and contributors are not responsible for any misuse of the scraping code or the data collected.

Note: Web scraping activities should always be carried out responsibly, respecting the website's terms of use and applicable laws and regulations.

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
routes		routes
.gitignore		.gitignore
README.md		README.md
csv-headers.js		csv-headers.js
data.csv		data.csv
index.js		index.js
package-lock.json		package-lock.json
package.json		package.json

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

routes

routes

.gitignore

.gitignore

README.md

README.md

csv-headers.js

csv-headers.js

data.csv

data.csv

index.js

index.js

package-lock.json

package-lock.json

package.json

package.json

Repository files navigation

Web Scraping Project (domain.com.au)

Features

Run Locally

Contributing

Warning

About

Releases

Packages

Languages

anujbarochia/crawlee-puppeteer-domain.com.au

Folders and files

Latest commit

History

Repository files navigation

Web Scraping Project (domain.com.au)

Features

Run Locally

Contributing

Warning

About

Topics

Resources

Stars

Watchers

Forks

Languages