Recurrence by email address / username #9

Open
dennisbmoore opened this issue Jul 8, 2020 · 6 comments
Comments

@dennisbmoore

Did you capture usernames / email addresses in your data set? Can you determine uniqueness or lack thereof by email addresses? For example, what fraction of the passwords associated with a specific username (email address if relevant) are unique, and how does that vary with the number of duplicates of the username (i.e., reuse of passwords vs # of times the username is matched in the data set). Thanks!
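The metric asked about here could be sketched roughly as follows. This is only an illustration, assuming the data is available as (username, password) pairs; the function name `unique_fraction_by_occurrences` and the sample addresses are hypothetical and not taken from the project.

```python
from collections import defaultdict


def unique_fraction_by_occurrences(records):
    """Map occurrence count -> mean fraction of distinct passwords per username.

    `records` is an iterable of (username, password) pairs. The name and the
    input shape are assumptions made for this sketch.
    """
    passwords_by_user = defaultdict(list)
    for username, password in records:
        passwords_by_user[username].append(password)

    # Collect each username's unique-password fraction under its occurrence count.
    fractions = defaultdict(list)
    for passwords in passwords_by_user.values():
        n = len(passwords)
        fractions[n].append(len(set(passwords)) / n)

    # Average the fractions of all usernames that share an occurrence count.
    return {n: sum(vals) / len(vals) for n, vals in sorted(fractions.items())}


# Made-up sample data, for illustration only.
sample = [
    ("alice@example.com", "hunter2"),
    ("alice@example.com", "hunter2"),
    ("bob@example.com", "a1"),
    ("bob@example.com", "b2"),
    ("carol@example.com", "x"),
]
result = unique_fraction_by_occurrences(sample)
```

A value close to 1.0 for a given occurrence count would suggest little password reuse among usernames seen that many times; a value close to `1/n` would suggest heavy reuse.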

@ignis-sec
Owner

Hello!

Yes, I did capture username/email tuples in my data.

It is a great idea. However, a large-scale analysis on both username and password is extremely time consuming, because it requires a join operation on 1 billion rows.
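One way the cost concern is sometimes handled is to avoid an in-memory join and stream over records that have already been sorted by username (for example by an external sort). The sketch below is a swapped-in technique under that assumption, not what was done for this data set; `stream_user_groups` is a hypothetical name.

```python
from itertools import groupby
from operator import itemgetter


def stream_user_groups(sorted_records):
    """Yield (username, occurrences, distinct_passwords) per username.

    Assumes `sorted_records` yields (username, password) pairs already sorted
    by username, so only one username's passwords are held in memory at a time.
    """
    for username, group in groupby(sorted_records, key=itemgetter(0)):
        passwords = [password for _, password in group]
        yield username, len(passwords), len(set(passwords))


# Made-up sample; sorting in memory here stands in for an external sort.
rows = sorted([("b", "2"), ("a", "1"), ("a", "1"), ("b", "3")])
summary = list(stream_user_groups(rows))
```

The sort itself is still expensive at a billion rows, so this only moves the cost rather than removing it.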

But it is not as impactful as you might think.

  • On average, each email was found 1.889 times.
  • 196,250,369 emails were found only once.
  • A few email addresses are responsible for raising the average. mail.ru@hotmail.com was the most common email address, found 90549 times, along with gmail.com@hotmail.com (85k times), password@gmail.com (38k times), info@yahoo.com (31k times) and so on.
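Figures like the ones above (mean occurrences per email, number of emails seen once, most frequent addresses) can be computed with a simple counter. This is a minimal sketch, assuming the emails fit in memory as an iterable of strings; `email_occurrence_stats` is a hypothetical helper, not code from the project.

```python
from collections import Counter


def email_occurrence_stats(emails, top=3):
    """Return (mean occurrences per email, emails seen once, most common emails)."""
    counts = Counter(emails)
    if not counts:
        return 0.0, 0, []
    mean = sum(counts.values()) / len(counts)
    seen_once = sum(1 for c in counts.values() if c == 1)
    return mean, seen_once, counts.most_common(top)


# Made-up sample data, for illustration only.
emails = ["a@x.com", "a@x.com", "a@x.com", "b@x.com", "c@x.com"]
mean, seen_once, most_common = email_occurrence_stats(emails, top=1)
```

A handful of very frequent addresses pulls the mean up even when most addresses occur once, which is the effect described in the list above.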

So, I've decided not to process that metric, because it would be too computationally heavy for minimal impact.

If you disagree, please feel free to write so!

Cheers!

@dennisbmoore
Author

Interesting. For the emails used many thousands of times, I wonder if those should be blacklisted (along with any accounts created using those as secondary accounts) - probably fraud related.

What if you limited it to, say, accounts which appeared within a smaller range of occurrences, for example 10 to 500 times? This could substantially reduce the computational cost and would seem to still provide important information about reuse of passwords.
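The suggested restriction could look something like this. It is a rough sketch, assuming (username, password) pairs and the 10 to 500 range mentioned above as defaults; `filter_by_occurrences` is a hypothetical name.

```python
from collections import Counter


def filter_by_occurrences(records, low=10, high=500):
    """Keep only records whose username occurs between `low` and `high` times."""
    records = list(records)
    # First pass: count how often each username appears.
    counts = Counter(username for username, _ in records)
    # Second pass: keep records whose username falls inside the range.
    return [(u, p) for u, p in records if low <= counts[u] <= high]


# Made-up sample, using a smaller range so the effect is visible.
sample = [("a", "1"), ("a", "2"), ("b", "3"), ("c", "4"), ("c", "5"), ("c", "6"), ("c", "7")]
kept = filter_by_occurrences(sample, low=2, high=3)
```

The counting pass still has to touch every row, but any later per-username analysis would then run on a much smaller subset.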

Thanks for doing the important work you do!

@ignis-sec
Owner

I've filtered accounts which have appeared more than once in a dump (just because I don't think a regular user can register with the same email more than once on a website).

If there were 25 (username, password) tuples with the same username and password in a single dump, they were only counted as 1.

This had two possible outcomes: either the accounts repeating 90k times also shared a password and so were not counted 90k times, or they had random passwords and did not influence the most common passwords list.
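The per-dump deduplication described above might be sketched like this. It assumes each dump is available as a list of (username, password) tuples keyed by a dump name; the function name and the input layout are hypothetical, and the actual processing pipeline may differ.

```python
from collections import Counter


def count_passwords(dumps):
    """Count passwords across dumps, counting identical tuples once per dump.

    `dumps` maps a dump name to an iterable of (username, password) tuples.
    """
    counts = Counter()
    for records in dumps.values():
        # set() collapses repeated (username, password) tuples within one dump.
        for _, password in set(records):
            counts[password] += 1
    return counts


# Made-up sample: one tuple repeated 25 times inside a single dump.
dumps = {
    "dump_a": [("spam@x.com", "pw")] * 25 + [("u1", "pw")],
    "dump_b": [("spam@x.com", "pw")],
}
counts = count_passwords(dumps)
```

Here the 25 repeated tuples in `dump_a` contribute a single count, so a heavily duplicated account cannot inflate a password's frequency within one dump.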

Interesting point though: these spam accounts appear in all kinds of lists, and they have very natural-looking passwords, so I don't think these accounts skewed the statistics other than the most common passwords list either.
[image]

@ignis-sec
Owner

Hmmmmmmmmmmm, interesting breakthrough: I checked some of the more unique-looking passwords used by the mail.ru@hotmail account.

[image]
[image]

I'm pretty certain the people trying to sell these leaks bloated the number of credentials inside by duplicating accounts and replacing their usernames with this junk.

@ignis-sec ignis-sec reopened this Jul 8, 2020
@ignis-sec
Owner

I've been frantically checking passwords from the mystery lists. I was really excited that there might be something to explain them, but it looks like only a fraction of these passwords come from these spam accounts.

@Malikiscute

I need the commands for this. How do I search for passwords?

3 participants