404notfound-3/ig-profile-scraper

ig-profile-scraper


Fetch and save real-time data anonymously from any Instagram profile without using the official API.

Table of Contents

  1. Prerequisites
  2. Installation and Setup
  3. Features
  4. License

Prerequisites

Before you continue, ensure you have met the following requirements.

  1. You are using a Linux or Windows machine.
  2. You have installed the latest versions of Python, Firefox and Geckodriver.
  3. You have installed the latest version of Tor, and it is running and listening on SOCKSPort 9050.
  4. You have installed xvfb (Linux only).
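
Before running the scraper, it can help to confirm that something is actually listening on the Tor SOCKS port. The following is a minimal sketch using only the standard library; `is_port_open` is a hypothetical helper and is not part of this project. It only checks that a TCP connection to 127.0.0.1:9050 succeeds, not that the listener is really Tor.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


if __name__ == "__main__":
    # 9050 is the SOCKSPort named in the prerequisites above.
    if is_port_open("127.0.0.1", 9050):
        print("Tor SOCKS port 9050 is reachable.")
    else:
        print("Nothing is listening on 127.0.0.1:9050; is Tor running?")
```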

Installation and Setup

Detailed step-by-step installation instructions for both Windows and Linux are available here.

  • Clone or download this project, then run the command below in the project directory.

    pip install -r requirements.txt
    
  • Open config.py in your favourite text editor and make the following changes:

    • Set the timezone for your country or state.

      TIMEZONE = timezone("Asia/Kolkata")
      
    • Add your temporary Instagram login IDs to the ids dictionary.

      ids = {
          "<USERNAME_OR_EMAIL_HERE>" : "<PASSWORD_HERE>",
          "<USERNAME_OR_EMAIL_HERE>" : "<PASSWORD_HERE>"
      }
      
    • Add the usernames of the profiles you want to scrape to the usernames list.

      usernames = ["<USERNAME1>", "<USERNAME2>"]
      
    • Add your Slack webhook URL to get notified about errors and exceptions while running this scraper.

      slack = Slack(url = "<<ADD_YOUR_SLACK_WEBHOOK_URL_HERE>>")
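
For readers unfamiliar with Slack incoming webhooks: a webhook accepts an HTTP POST with a JSON body containing a `text` field. The sketch below shows that request using only the standard library. It is an illustration of the mechanism, not the `Slack` class used in config.py; `build_slack_request` and `notify` are hypothetical names.

```python
import json
import urllib.request


def build_slack_request(webhook_url: str, text: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for a Slack incoming webhook."""
    payload = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def notify(webhook_url: str, text: str, timeout: float = 10.0) -> int:
    """Send the message and return the HTTP status code."""
    request = build_slack_request(webhook_url, text)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Calling `notify("<<ADD_YOUR_SLACK_WEBHOOK_URL_HERE>>", "scraper.py raised an exception")` with a real webhook URL would post that message to the channel the webhook is bound to.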
      

Congratulations! You are ready to go. Now run scraper.py. Ping me if you run into any errors.

Features

  1. Profile Scraping

    • Full name and biography (both UTF-8 encoded)
    • Followers and following counts
    • Number of public posts and owned media
    • Whether the account is private, business, verified, has a channel, or joined recently
    • Profile page ID
    • Connected FB page
    • External URL
  2. Saves data to a unique CSV file in the output folder.

  3. Checks for an existing CSV file and creates a new one if it doesn't exist.

  4. Random sleep time (to create a little randomness).

  5. Automatic login and logout (to switch IDs every 8 hours).

  6. Automatic browser screenshots in ss_log/browser folder.

  7. Slack webhook integration for error notifications.

  8. Tor connectivity and public IP check.
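
Features 2 to 4 can be sketched roughly as follows. This is an illustration under assumptions, not the code in scraper.py: the column names, the one-file-per-username layout, and the helper names `append_row` and `random_sleep` are all hypothetical.

```python
import csv
import random
import time
from pathlib import Path

# Hypothetical column names; the project's real CSV layout may differ.
FIELDNAMES = ["timestamp", "username", "followers", "following", "posts"]


def append_row(output_dir: Path, username: str, row: dict) -> Path:
    """Append one row to output_dir/<username>.csv, creating it with a header if needed."""
    output_dir.mkdir(parents=True, exist_ok=True)
    csv_path = output_dir / f"{username}.csv"
    is_new_file = not csv_path.exists()
    with csv_path.open("a", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDNAMES)
        if is_new_file:
            # Only a freshly created file gets the header row.
            writer.writeheader()
        writer.writerow(row)
    return csv_path


def random_sleep(min_seconds: float, max_seconds: float) -> float:
    """Sleep for a uniformly random duration and return the delay used."""
    delay = random.uniform(min_seconds, max_seconds)
    time.sleep(delay)
    return delay
```

Opening the file in append mode and writing the header only when the file is new keeps earlier rows intact across runs, while the random delay avoids requests at perfectly regular intervals.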

License

The project license can be found here.

MIT © Rahul Meena