GitHub - luminati-io/proxy-csharp

This project demonstrates setting up proxy servers in C# for web scraping with Visual Studio and .NET 7, leveraging HtmlAgilityPack for HTML parsing. Proxies protect your digital identity during web scraping by using their IP address, circumventing IP bans and geoblocking.

Prerequisites

Configuring a Local Proxy

Download and install mitmproxy.
Launch mitmproxy with the mitmproxy command.

Web Scraping Setup

ProxyHttpClient - is designed to configure an HttpClient instance to route requests via a specified proxy server.

ProxyRotator - manages a list of proxies, providing a method to select a proxy for each web request randomly. This randomization is key to reducing the risk of detection and potential IP bans during web scraping.

When the isLocal is set to True, it takes the local proxy from mitmproxy. If it’s set to False, it takes the public IPs of the proxies.

ProxyChecker - is used to validate the list of proxy servers. When you use the GetWorkingProxies method with a list of proxy URLs, it checks each proxy’s status asynchronously via the CheckProxy method, collecting operational proxies in a workingProxies list. Within CheckProxy, you establish an HttpClient with the proxy URL, make a test request to http://www.google.com, and record progress safely using a semaphore.

The IsProxyWorking method confirms the proxy’s functionality by examining the response status code, returning true for operational proxies. This class aids in identifying working proxies from a given list.

WebScraper - encapsulates web scraping functionality. When you call the ScrapeData method, you provide it with a ProxyRotator instance and a target URL. Inside this method, you use an HttpClient to make an asynchronous GET request to the URL, retrieve the HTML content, and parse it using the HtmlAgilityPack library. Then you employ XPath queries to locate and extract links and corresponding titles from specific HTML elements. If any article links are found, you print their titles and absolute URLs; otherwise, you print a message indicating no links were found.

Using Bright Data Proxy

Sign up for Bright Data and create a residential proxy.
Update appsettings.json file with your credentials in WebScrapeBrightdata project

Running Your Application

Use dotnet build and dotnet run -- --url https://www.wikipedia.org/ to compile and execute the application, and program will show extracted Wikipedia article titles and links.

Bright Data’s proxy services facilitates anonymous and efficient web scraping in C#, providing a scalable solution for bypassing IP bans. This tutorial offers a foundation for integrating proxy servers into your web scraping projects, following best practices to ensure data collection reliability.

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
WebScrapApp		WebScrapApp
WebScrapBrightData		WebScrapBrightData
.DS_Store		.DS_Store
.gitattributes		.gitattributes
.gitignore		.gitignore
ProxyServerBrightData.sln		ProxyServerBrightData.sln
readme.md		readme.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

WebScrapApp

WebScrapApp

WebScrapBrightData

WebScrapBrightData

.DS_Store

.DS_Store

.gitattributes

.gitattributes

.gitignore

.gitignore

ProxyServerBrightData.sln

ProxyServerBrightData.sln

readme.md

readme.md

Repository files navigation

Prerequisites

Configuring a Local Proxy

Web Scraping Setup

Using Bright Data Proxy

Running Your Application

About

Releases

Packages

Languages

luminati-io/proxy-csharp

Folders and files

Latest commit

History

Repository files navigation

Prerequisites

Configuring a Local Proxy

Web Scraping Setup

Using Bright Data Proxy

Running Your Application

About

Resources

Stars

Watchers

Forks

Languages