Add support for Public IPv6 IPs Pool with major cloud providers like AWS, GCP, Azure, etc. #218

Open
RajatThukral-Draup opened this issue Jan 30, 2024 · 16 comments

Comments

@RajatThukral-Draup

Hello,

Thank you for the fantastic work on this project.

Given that major cloud providers are now charging substantially for public IPv4 addresses, it would be highly beneficial to incorporate IPv6 Pool support into this project. This would entail modifications to the provider API SDK code and the sections where requests are actually initiated.

@RajatThukral-Draup
Author

Adding references:
aws-blog
news

@fabienvauchelles, I would greatly appreciate your insights on this matter.

@fabienvauchelles
Owner

Hi,
There are two separate features here:

  • allowing IPv6
  • allowing many outbound IP addresses per VM, for cloud providers that support it.

Can you detail the use cases? Most users only need IPv4.

@packet-sent

Most IPv6 proxy providers won't give you a direct IPv6 endpoint. Instead, they use some sort of 6to4 method: you connect to the proxy via IPv4 and get an outgoing IPv6 address from the provider.

So I don't see a need to add full IPv6 support yet, but I guess it can be done in the future as Cloud Services are moving away from free IPv4 addresses.

@RajatThukral-Draup
Author

To address the growing costs associated with public IPv4 IP addresses, as cloud providers have started charging for them, we are in the process of transitioning to IPv6 for our AWS cloud infrastructure. During local testing, we deactivated outbound IPv4 traffic on ports 80 and 443 for our Scrapoxy instances. However, we've encountered an issue where requests are halting during the TLS handshake phase. This problem arises despite configuring our proxy agent and master to utilize IPv6 addresses as hostnames. Below are the logs from a test run:

curl -v --proxy http://localhost:8888 --proxy-user **user***:******* https://www.google.com
*   Trying [::1]:8888...
* Connected to localhost (::1) port 8888
* CONNECT tunnel: HTTP/1.1 negotiated
* allocate connect buffer
* Proxy auth using Basic with user '******'
* Establish HTTP proxy tunnel to www.google.com:443
> CONNECT www.google.com:443 HTTP/1.1
> Host: www.google.com:443
> Proxy-Authorization: Basic ****
> User-Agent: curl/8.4.0
> Proxy-Connection: Keep-Alive
>
< HTTP/1.1 200 Connection established
<
* CONNECT phase completed
* CONNECT tunnel established, response 200
* ALPN: curl offers h2,http/1.1
* (304) (OUT), TLS handshake, Client hello (1):
*  CAfile: /etc/ssl/cert.pem
*  CApath: none

Re-enabling outbound IPv4 traffic on the aforementioned ports resolves the issue, allowing requests to complete successfully.

We would appreciate any insights or suggestions on troubleshooting this issue further.

@fabienvauchelles
Owner

fabienvauchelles commented Feb 8, 2024

Hi,
Can you detail your environment @RajatThukral-Draup ?

  • Scrapoxy version: X.X.X
  • Is it a custom version? Yes/No
  • Which deployment method do you use? (Docker/Docker Compose/Kubernetes/NPM/Other)
  • Which OS does Scrapoxy run on? (I assume Linux)
  • Which kind of storage do you use? (file?)

@RajatThukral-Draup
Author

RajatThukral-Draup commented Feb 9, 2024

Please find the details below -

  1. Scrapoxy version:
    3.1.1
  2. Is it a custom version?
    Yes. We have updated the code on top of version 3.1.1 to use IPv6 addresses.
  3. Which deployment method do you use? (Docker/Docker Compose/ Kubernetes/NPM/Other)
    Using pm2
  4. Which is the OS for Scrapoxy ?
    Linux

Here are the modifications we've implemented:

  • In the AWS EC2 module that handles instance descriptions and registrations within the manager, we now also capture the IPv6 address from the AWS SDK API response and add it to the instance details.
  • In the Scrapoxy master, when initiating HTTP requests through the proxy settings, we updated the proxy configuration to use IPv6 addresses instead of IPv4 addresses.
  • We changed the alive check and the ping heartbeat check to use the private IPv4 addresses of the Scrapoxy workers.
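
For readers following along, the first change could look roughly like the sketch below. This is not the code from our fork: `toInstanceDetails` and its output fields are hypothetical names, and the input is assumed to follow the shape of an EC2 DescribeInstances response (`Reservations[].Instances[].NetworkInterfaces[].Ipv6Addresses[].Ipv6Address`).

```javascript
// Sketch only: collect IPv6 addresses from a DescribeInstances-style response.
// `toInstanceDetails` and the returned field names are hypothetical.
function toInstanceDetails(response) {
    const details = [];
    for (const reservation of response.Reservations || []) {
        for (const instance of reservation.Instances || []) {
            // An instance can have several network interfaces,
            // and each interface can carry several IPv6 addresses.
            const ipv6 = [];
            for (const nic of instance.NetworkInterfaces || []) {
                for (const entry of nic.Ipv6Addresses || []) {
                    if (entry.Ipv6Address) {
                        ipv6.push(entry.Ipv6Address);
                    }
                }
            }
            details.push({
                id: instance.InstanceId,
                privateIpv4: instance.PrivateIpAddress || null,
                publicIpv4: instance.PublicIpAddress || null,
                ipv6: ipv6,
            });
        }
    }
    return details;
}
```

Keeping the private IPv4 address next to the IPv6 list matches the split described above: IPv6 for proxied traffic, private IPv4 for alive checks and heartbeats.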

Following these adjustments, the master now reaches the Scrapoxy workers over IPv6. However, although we verified that IPv6 was set in the proxyOptions, requests to google.com still went to Google's public IPv4 address, even though both the worker and the target have IPv6 addresses.

We logged in to one of our Scrapoxy workers and analyzed the packet exchanges with tcpdump. The traffic from the worker to the target still went through the IPv4 interface, even though the request from the master to the worker was confirmed to travel over the IPv6 interface.

@fabienvauchelles We greatly appreciate your assistance with this issue, as we're struggling to find a solution. Addressing this is a top priority for us, especially since the cost of AWS IPv4 addresses is significantly impacting our budget.

@fabienvauchelles
Owner

Hi @RajatThukral-Draup ,
Thanks for your answer.

As I understand it, you have written a lot of custom code on top of 3.1.1. I need to understand it in order to integrate this properly into V4.

Can you share the custom code with me? (can be a private repository).

Thanks.

@RajatThukral-Draup
Author

Hi @fabienvauchelles

Certainly! I've shared the link to our custom Scrapoxy code repository below.

An invitation link has also been sent to you. Here's the link: https://github.com/Draup/scrapoxy.

Incorporating this feature into Scrapoxy version 3.1.1 would greatly benefit us, as upgrading to Scrapoxy version 4 is expected to take us a considerable amount of time.

@RajatThukral-Draup
Author

Hey @fabienvauchelles

Just wanted to check in and see if you got a chance to peek at the Scrapoxy code repo link I sent over. Here it is again just in case: https://github.com/Draup/scrapoxy.

We're really keen on getting that feature rolled into Scrapoxy v3.1.1 since jumping to v4 is a bit of a stretch for us right now.

Let me know your thoughts, or if there's anything you're wondering about it. Hope to catch up soon!

Thanks

@fabienvauchelles
Owner

Hi @RajatThukral-Draup ,

I fetched the repository, thanks. I need to explore it now.

Give me some time to explore it and understand how it can be integrated cleanly. I will have some further questions for you so that I can write the requirements correctly.

@RajatThukral-Draup
Author

Hey @fabienvauchelles

Awesome, glad to hear you've got the repo! Take all the time you need to dive into it. I'm here to help answer any questions or clarify anything that might help you in understanding how we can best integrate this feature.

Just hit me up whenever you're ready or need some info. Looking forward to your insights and the questions you'll have!

Thanks

@fabienvauchelles
Owner

fabienvauchelles commented Feb 15, 2024

Hi @RajatThukral-Draup ,

I've reviewed the code, and it is excellent work! I truly appreciate the enhancements you've made to version 3, particularly regarding spot instances, Prometheus integration, and the introduction of new metrics.

I have some initial inquiries:

  • How do you go about building the image and ensuring AWS employs IPv6? (I couldn't find any references to IPv6 during instance creation)
  • Have you made any updates to the proxy.js file (located at tools/install/proxy.js)? If so, would you mind sharing the code with me?
  • Can you confirm whether you utilize a subnet to prevent IPs from being publicly accessible on the internet?
  • How many instances/proxies do you use on AWS? By region?
  • What's the purpose behind the "multi-region" settings?
  • Additionally, I'm interested in integrating the newly added metrics. Could you highlight which ones you consider most important and provide insight into how you utilize them?

@RajatThukral-Draup
Author

Hi @fabienvauchelles

How do you go about building the image and ensuring AWS employs IPv6? (I couldn't find any references to IPv6 during instance creation)

  • Yes, we've configured our instances to automatically assign an IPv6 address upon creation.

Have you made any updates to the proxy.js file (located at tools/install/proxy.js)? If so, would you mind sharing the code with me?

Can you confirm whether you utilize a subnet to prevent IPs from being publicly accessible on the internet?

  • We haven't implemented subnet-based restrictions. However, port 3128 on the Scrapoxy workers is only reachable from the Scrapoxy master.

How many instances/proxies do you use on AWS? By region?

  • Currently, we manage around 150-200 instances across four distinct AWS regions.

What's the purpose behind the "multi-region" settings?

  • The strategy aims to enhance request success rates and circumvent regional limitations, creating a diversified proxy pool.

Additionally, I'm interested in integrating the newly added metrics. Could you highlight which ones you consider most important and provide insight into how you utilize them?

  • Implementing key metrics has significantly enhanced our system's transparency.
  • Key metrics:
    Throughput - monitoring the rate of requests per minute.
    Latency - measuring the overall request delay.
    Current IP count - tracking the number of IPs available at any moment.
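
To make the three metrics concrete, here is a minimal in-memory sketch. It is not the Prometheus integration from our fork; `ProxyMetrics` and all of its method and field names are hypothetical, and the one-minute window is an assumption.

```javascript
// Sketch only: in-memory tracking of throughput, latency and IP count.
// All names here are hypothetical.
class ProxyMetrics {
    constructor() {
        this.samples = []; // one {at, durationMs} entry per finished request
        this.ipCount = 0;
    }

    // Record one finished request and how long it took.
    recordRequest(durationMs, at = Date.now()) {
        this.samples.push({at, durationMs});
    }

    // Record how many proxy IPs are currently available.
    setIpCount(count) {
        this.ipCount = count;
    }

    // Summarize the requests seen during the last 60 seconds.
    snapshot(now = Date.now()) {
        const windowStart = now - 60000;
        this.samples = this.samples.filter((s) => s.at > windowStart);
        const total = this.samples.reduce((sum, s) => sum + s.durationMs, 0);
        return {
            requestsPerMinute: this.samples.length,
            averageLatencyMs: this.samples.length ? total / this.samples.length : 0,
            currentIpCount: this.ipCount,
        };
    }
}
```

A real deployment would more likely export these values as a counter, a histogram and a gauge to a metrics backend instead of keeping raw samples in memory.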

Thanks

@RajatThukral-Draup
Author

Hi @fabienvauchelles

I would appreciate hearing from you on this matter.

We've upgraded our code to support IPv6, yet our requests continue to default to IPv4. Is there a preference for IPv4 over IPv6 in the Node.js library? Any advice or insights you could provide on this issue would be very helpful.

Additionally, when executing a curl request to google.com from the same system, it appears to utilize an IPv6 address. I'm uncertain about the precise cause of this behavior and how to resolve it.

Our Node version: v18.18.0

One reference on this:
https://stackoverflow.com/questions/76844182/node-js-prefers-ipv4-over-ipv6
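
One way to rule out address-family selection during DNS resolution is to force family 6 at lookup time. The sketch below is only an assumption about where the IPv4 preference might come from, not a confirmed diagnosis; `lookupIPv6Only` is a hypothetical helper built on the standard `dns.lookup` call.

```javascript
const dns = require('dns');

// Sketch only: a drop-in replacement for dns.lookup that asks for IPv6 results.
// `lookupIPv6Only` is a hypothetical name.
function lookupIPv6Only(hostname, options, callback) {
    if (typeof options === 'function') {
        // Called as lookup(hostname, callback).
        callback = options;
        options = {};
    } else if (typeof options === 'number') {
        // dns.lookup also accepts a bare family number as its options.
        options = {family: options};
    }
    // Keep the caller's other options (hints, all, ...) but override the family.
    dns.lookup(hostname, Object.assign({}, options, {family: 6}), callback);
}

// Possible usage when opening the outgoing socket:
// net.connect({host: 'www.google.com', port: 443, lookup: lookupIPv6Only});
```

If a host has no AAAA record, this lookup fails instead of silently falling back to IPv4, which makes the behavior easier to observe in logs.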

Thanks

@fabienvauchelles
Owner

fabienvauchelles commented Feb 28, 2024

Hi,
If you have an IPv6 network interface on the VPC, you can force Node.js to use that specific interface.

In proxy.js (Scrapoxy V3 => https://github.com/fabienvauchelles/scrapoxy/blob/scrapoxy3/tools/install/proxy.js#L28C22-L28C29), add the localAddress option to the connect call and set it to the IPv6 address of the network interface (check documentation here).

To get the list of network interfaces, you can call require('os').networkInterfaces() and filter on IPv6.
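
A minimal sketch of these two steps, assuming the outgoing socket is opened with net.connect. `listGlobalIPv6` and `connectOverIPv6` are hypothetical helper names and are not part of proxy.js.

```javascript
const net = require('net');
const os = require('os');

// Sketch only: list the non-internal, non-link-local IPv6 addresses of this host.
function listGlobalIPv6(interfaces = os.networkInterfaces()) {
    const addresses = [];
    for (const entries of Object.values(interfaces)) {
        for (const entry of entries || []) {
            // `family` is normally the string 'IPv6'; a few Node.js 18
            // releases reported the number 6 instead, so accept both.
            const isIPv6 = entry.family === 'IPv6' || entry.family === 6;
            // Link-local addresses (fe80::/10) cannot reach the internet.
            const isLinkLocal = /^fe[89ab][0-9a-f]:/i.test(entry.address);
            if (isIPv6 && !entry.internal && !isLinkLocal) {
                addresses.push(entry.address);
            }
        }
    }
    return addresses;
}

// Sketch only: open the outgoing socket bound to an IPv6 source address.
function connectOverIPv6(host, port, localAddress = listGlobalIPv6()[0]) {
    if (!localAddress) {
        throw new Error('no usable IPv6 address found');
    }
    // localAddress picks the source interface; family: 6 asks for an IPv6 target address.
    return net.connect({host, port, localAddress, family: 6});
}
```

Passing `family: 6` together with `localAddress` avoids the case where the socket is bound to an IPv6 source while the target hostname still resolves to an IPv4 address.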

Can you keep me updated on whether this change works?

@votiakov

votiakov commented May 7, 2024

Hi @RajatThukral-Draup,

Any luck with your transition to IPv6? Is your version available anywhere? I can't access https://github.com/Draup/scrapoxy anymore.

Best regards
