automate and scrap rotating domains of the disposable providers #450
Comments
Is there already a list of disposable e-mail provider web pages that could serve as input for such automation?
@haumacher I'd start with some of the bigger providers like
But each would be different and unique to scrape/parse, so I wouldn't spend much time on any specific one; just try to find the low-hanging fruit first.
Another one is https://www.disposablemail.com/
Should the script open a headless browser, use a regex to search the page for new e-mail domains, add any new domain to the list, and then commit the diff as a new PR?
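A minimal sketch of the regex step asked about above, assuming the page text has already been fetched (by a headless browser or otherwise) and that the provider shows addresses in the usual `user@domain` form. The pattern and the names `DOMAIN_RE`, `extract_domains`, and `new_domains` are hypothetical, not part of this repository.

```python
import re

# Hypothetical pattern: captures the domain part of addresses like "abc123@example.com".
# Assumes plain ASCII host names; internationalized domains are not handled.
DOMAIN_RE = re.compile(
    r"@((?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z]{2,})",
    re.IGNORECASE,
)


def extract_domains(page_text):
    """Return the set of lower-cased domains found after an '@' in page_text."""
    return {match.lower() for match in DOMAIN_RE.findall(page_text)}


def new_domains(page_text, blocklist):
    """Return the sorted domains found in page_text that are not in blocklist yet."""
    known = {entry.strip().lower() for entry in blocklist if entry.strip()}
    return sorted(extract_domains(page_text) - known)


if __name__ == "__main__":
    sample = "Your inbox: foo@Mailinator.com or bar@sub.example.org"
    print(new_domains(sample, ["mailinator.com"]))
```

Only the domains returned by `new_domains` would need to go into the diff for a PR; committing and opening the PR is left out of this sketch.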
@icyavocado I think this would be a huge effort and only possible for dumb fake-mail providers. "Professional" ones have strong protection against automated querying, e.g. CAPTCHAs or even headless browser detection. Here are some more: https://www.spamgourmet.com
Agreed. From what I'm seeing, this may be a high-effort, low-reward scenario unless there is a way around the bot checks. Unless we can show with certainty that requests from GitHub to these domains won't get blocked, an easy way to carry out this change seems unlikely. I wonder if we should write each of these scrapers like a Cypress/Selenium test.
I agree that some are protected well enough against a headless browser checking for domains. However, I believe there is still some low-hanging fruit to be found. E.g. edit: the example above is actually likely static, but e.g.
I can look into this and try to create a basic test that might work for most situations. The aim is to make a simple version that shows the idea can work with minimal effort.
Here is my proposal for the script, using puppeteer. Here is the proposed change: icyavocado@384f7a8
TLDR: this script reads from the
This is just the first step of the task. As we continue our discussion about how to implement this correctly, I'll be working on the automation aspect.
P/S: I was able to get the automation to work. Here is a run of the automation using the script above to find new domains and then create a new branch: https://github.com/icyavocado/disposable-email-domains/actions/runs/8142541958/job/22252322009
Here are some potential challenges we might face, along with possible solutions. Your insights and suggestions are welcome:
Some disposable services have fairly stable URLs with lists, so we could implement a CI check that automatically merges the domains in such a list into our blocklist or opens PRs. Providers that do not offer such a resource can still be scraped in CI with a headless browser or similar.
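The merge step of such a CI check could look roughly like the sketch below. It assumes both the blocklist and the provider's list are plain text with one domain per line, and that the blocklist is kept lower-cased and sorted; fetching the provider's list and opening the PR are left out. The names `normalize` and `merge_blocklist` are hypothetical.

```python
def normalize(line):
    """Strip whitespace and lower-case a line; return '' for blanks and '#' comments."""
    domain = line.strip().lower()
    if not domain or domain.startswith("#"):
        return ""
    return domain


def merge_blocklist(current_lines, upstream_lines):
    """Merge an upstream domain list into the current blocklist.

    Returns (merged, added): the full sorted, de-duplicated list, and the
    sorted domains that were not in the blocklist before.
    """
    current = {d for d in map(normalize, current_lines) if d}
    upstream = {d for d in map(normalize, upstream_lines) if d}
    added = sorted(upstream - current)
    merged = sorted(current | upstream)
    return merged, added


if __name__ == "__main__":
    merged, added = merge_blocklist(
        ["b.com\n", "a.com\n"],
        ["A.com", "c.net", "", "# comment"],
    )
    print(merged)
    print(added)
```

A CI job would only need to open a PR when `added` is non-empty, which keeps scheduled runs from producing empty diffs.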