Skip to content

Scraping Hemnet.se website using scrapy and storing it to the MongoDB database.

Notifications You must be signed in to change notification settings

skaty5678/hemnet_mongo

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

1 Commit
 
 
 
 
 
 
 
 

Repository files navigation

Hemnet Spider

Description

The Hemnet Spider is a web crawler built using Scrapy, a Python-based web scraping framework. The spider crawls the hemnet.se website to extract information about villas for sale in Sweden. The extracted information includes the street name, price, and various attributes such as number of rooms, living area, and plot size. The extracted information is then saved to a Mongo database named 'hemnet' and to the collection 'properties'.

Usage

To use the Hemnet Spider, follow these steps:

  1. Install Scrapy by running the following command in your terminal: pip install scrapy.

  2. Clone this repository to your local machine: git clone https://github.com/skaty5678/hemnet_scrapy.git.

  3. Navigate to the hemnet directory: cd hemnet.

  4. Run the spider using the following command: scrapy crawl hemnet.

  5. Wait for the spider to finish crawling. The scraped data will be saved to a results.json file in the hemnet-spider directory.

Customization

To customize the spider to crawl a different location or type of property, modify the start_urls variable in the HemnetSpider class. For example, to crawl villas for sale in a different location, change the location_ids parameter in the URL:

start_urls = ['https://www.hemnet.se/bostader?location_ids%5B%5D=123456&item_types%5B%5D=villa']

To crawl a different type of property, change the item_types parameter in the URL:

start_urls = ['https://www.hemnet.se/bostader?location_ids%5B%5D=474361&item_types%5B%5D=radhus']

About

Scraping Hemnet.se website using scrapy and storing it to the MongoDB database.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages