[REGEX ISSUE] False Positives while email enumeration #5

Closed
devanshbatham opened this issue Mar 4, 2020 · 4 comments
Labels
help wanted Extra attention is needed

Comments

devanshbatham (Owner) commented Mar 4, 2020

In some cases this tool produces false positives when extracting emails.

What is the issue?
My current regex matches "test@2x.png", "webflate@webflate.com-", and "caramel@jupiter.500" as valid emails, but they aren't. I created a blacklist pattern, but of course it isn't possible to blacklist every possible false positive.

Steps to reproduce
python3 archivefuzz.py webflow.com

Email output:

dan@dantaylorphotography.com-60
partners@webflow.com
5c318ff6ff1c07e3a59ec00a__MG_7298-2@0.800
5b4558bdcff5812844878348_spacious@barcino2.500
5d9208a00984c562e3955dd6_@avg.surfer-
5b4558bdcff5812844878348_spacious@barcino2.800
5d918b2dd12b5bf8c6a6c03e_@pablo.aer-tri
5ab26694a8ed98f0f004194d_saroglia-k4RE-U11011184430715PLI-102576@LaStampa.it

My current regex and blacklist pattern are:

"Email": [
            "([a-zA-Z0-9+._-]+@[a-zA-Z0-9._-]+\.[a-zA-Z0-9_-]{2,7})",
            "(-p-|mp4|webm|JPG|pdf|html|jpg|jpeg|png|gif|bmp|svg|1x|2x|3x|4x|5x|6x|7x|9x|10x|11x|12x|13x|14x|15x)"
            ],
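To make the failure easy to reproduce without running the whole tool, here is a minimal sketch that applies the two patterns above to the examples from this issue. It assumes the blacklist is applied with `re.search` against each candidate match; the helper name `extract_emails` is hypothetical and not taken from archivefuzz.py.

```python
import re

# Patterns copied from the "Email" entry above.
EMAIL_RE = re.compile(r"([a-zA-Z0-9+._-]+@[a-zA-Z0-9._-]+\.[a-zA-Z0-9_-]{2,7})")
BLACKLIST_RE = re.compile(
    r"(-p-|mp4|webm|JPG|pdf|html|jpg|jpeg|png|gif|bmp|svg|"
    r"1x|2x|3x|4x|5x|6x|7x|9x|10x|11x|12x|13x|14x|15x)"
)


def extract_emails(text):
    """Hypothetical helper: find candidates, then drop blacklisted ones.

    Assumption: the blacklist is matched anywhere inside the candidate.
    """
    candidates = EMAIL_RE.findall(text)
    return [c for c in candidates if not BLACKLIST_RE.search(c)]


sample = (
    "test@2x.png webflate@webflate.com- "
    "caramel@jupiter.500 partners@webflow.com"
)
survivors = extract_emails(sample)
print(survivors)
```

Under that assumption, the blacklist removes "test@2x.png" but lets "webflate@webflate.com-" and "caramel@jupiter.500" through, because the last character class of the regex accepts digits, `_`, and `-` in the TLD position.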

I need to improve my regex. I've tried different flavors but still can't remove all the false positives.

Help needed!

Thanks

devanshbatham added the help wanted label Mar 4, 2020
devanshbatham pinned this issue Mar 4, 2020
NullPxl commented Mar 4, 2020

I would use a whitelist rather than a blacklist for domains. There is a list of all valid TLDs here: http://data.iana.org/TLD/tlds-alpha-by-domain.txt
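A rough sketch of what the whitelist check could look like. The IANA file is one TLD per line in upper case, with a `#` comment line at the top; to keep the example offline, `IANA_SAMPLE` below is a tiny hand-written stand-in for the downloaded file, and the function names `parse_tld_list` and `has_known_tld` are hypothetical.

```python
# Stand-in for the contents of tlds-alpha-by-domain.txt (assumption: the real
# file would be downloaded once and cached; only a few entries are shown here).
IANA_SAMPLE = """\
# header comment line, skipped by the parser
COM
IO
IT
NET
ORG
"""


def parse_tld_list(text):
    """Return the set of upper-case TLDs, ignoring blank and comment lines."""
    return {
        line.strip().upper()
        for line in text.splitlines()
        if line.strip() and not line.startswith("#")
    }


def has_known_tld(candidate, tlds):
    """True if the text after the last dot of the domain is a listed TLD."""
    domain = candidate.rsplit("@", 1)[-1]
    tld = domain.rsplit(".", 1)[-1]
    return tld.upper() in tlds


VALID_TLDS = parse_tld_list(IANA_SAMPLE)

# Candidates taken from the email output reported in this issue.
candidates = [
    "dan@dantaylorphotography.com-60",
    "partners@webflow.com",
    "5b4558bdcff5812844878348_spacious@barcino2.500",
    "5d918b2dd12b5bf8c6a6c03e_@pablo.aer-tri",
    "5ab26694a8ed98f0f004194d_saroglia-k4RE-U11011184430715PLI-102576@LaStampa.it",
]
kept = [c for c in candidates if has_known_tld(c, VALID_TLDS)]
print(kept)
```

With this sample list, the candidates ending in "com-60", "500", and "aer-tri" are dropped. The long "...@LaStampa.it" entry is still kept, so a TLD whitelist alone says nothing about whether the local part is a real mailbox.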

@devanshbatham
Owner Author

After thinking about this, it isn't possible to blacklist or whitelist all the invalid words and TLDs.

@rohitcoder

There is a Python library called validate_email that offers three levels of email validation, including asking a valid SMTP server whether the email address is valid (without sending an email).

It can be time-consuming, but anyone who wants to can use it. It also checks for MX records.

You can also build a dynamic blacklist (for future use) of every domain that has no MX record.
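The dynamic blacklist idea could be sketched roughly as follows. The class `MxDomainFilter` and the `has_mx` callable are hypothetical: `has_mx` stands for whatever MX lookup is used (a DNS query or a library call), and it is replaced by a fake here so the example runs without network access.

```python
class MxDomainFilter:
    """Remember which domains lacked an MX record so each is looked up once."""

    def __init__(self, has_mx):
        # has_mx: callable taking a domain string and returning True/False.
        # Assumption: in real use this would perform the (slow) DNS lookup.
        self._has_mx = has_mx
        self.blacklist = set()    # domains with no MX record
        self._known_good = set()  # domains already confirmed to have one

    def accept(self, email):
        domain = email.rsplit("@", 1)[-1].lower()
        if domain in self.blacklist:
            return False
        if domain in self._known_good:
            return True
        if self._has_mx(domain):
            self._known_good.add(domain)
            return True
        self.blacklist.add(domain)
        return False


# Fake lookup for demonstration only: records every call it receives.
lookups = []

def fake_has_mx(domain):
    lookups.append(domain)
    return domain == "webflow.com"


mx_filter = MxDomainFilter(fake_has_mx)
emails = [
    "partners@webflow.com",
    "caramel@jupiter.500",
    "other@jupiter.500",   # hypothetical second address on the same domain
    "info@webflow.com",    # hypothetical second address on the same domain
]
results = [mx_filter.accept(e) for e in emails]
print(results, lookups, mx_filter.blacklist)
```

Because results are cached per domain, the cost of the lookup is paid once per distinct domain rather than once per candidate, which may matter when time is the constraint.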

@devanshbatham
Owner Author

@rohitcoder I can't use that because time is the constraint, so I'm thinking of creating a whitelist. Currently working on it.

Thanks
