Localized Wordlist #703

agyss · 2022-03-04T10:09:12Z

Localized wordlists would add a lot of educational value. The most common German passwords are surely quite different from the Chinese ones, the English ones, and so on.

The best starting point I found was:
https://github.com/scipag/password-list/tree/main/countries

I would be willing to create the lists on my own if someone can provide access to suitable raw data. The only requirement for the raw data is some kind of link between the password and the user's native language. Things I came up with are:

IP-based localization (from logs, maybe)
Language settings
Information about the user's location (country only, no personal or detailed information, of course; e.g. from profile settings)

I would be willing to join the data if it's spread across multiple files, tables, ...

Any ideas about how to handle this? I am aware of the problem with PII data.

ItsIgnacioPortal · 2022-05-03T05:47:21Z

Ok, so this has a couple of challenges:

Getting the data breach databases
Separating the users by geographical location
Actually processing the data

I personally have a copy of Collection #1, Collection #2 to #5, and the ANTIPUBLIC breaches, totaling:

2.7 billion total records
1.2 billion unique e-mail address and password combinations
773 million unique e-mail addresses
21 million unique, plaintext passwords

So that's the first issue taken care of.

But the second one is a bit trickier. More often than not, data breaches are just combos of email:password; no IP address included. I think we might be able to classify the combos by matching the emails to the X most common names for each country in the world.
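A minimal sketch of that classification idea, assuming combo lines of the form email:password and a per-country set of common names. The `NAMES_BY_COUNTRY` table, its contents, and both function names are hypothetical placeholders for illustration, not real data or an agreed design:

```python
import re
from collections import Counter

# Hypothetical, tiny per-country name sets. Real sets would hold the
# X most common names for each country.
NAMES_BY_COUNTRY = {
    "DE": {"hans", "juergen", "schmidt"},
    "ES": {"ignacio", "jose", "garcia"},
}


def parse_combo(line):
    """Split an 'email:password' line; return None if it is malformed."""
    # Split on the first colon only, so passwords may contain colons.
    email, sep, password = line.rstrip("\n").partition(":")
    if not sep or "@" not in email:
        return None
    return email.lower(), password


def guess_country(email):
    """Return the country whose name set matches the most tokens, or None."""
    local_part = email.lower().split("@", 1)[0]
    # Break the local part on anything that is not a letter (dots, digits, ...).
    tokens = [t for t in re.split(r"[^a-z]+", local_part) if t]
    votes = Counter()
    for country, names in NAMES_BY_COUNTRY.items():
        votes[country] = sum(1 for t in tokens if t in names)
    country, hits = votes.most_common(1)[0]
    return country if hits > 0 else None
```

Names shared between several countries would produce ties, so a real run would need a tie-breaking rule or would have to discard ambiguous addresses.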

The third issue is a tough one for me. My machine isn't beefy enough to go through 1.2 billion unique combos in a single lifetime 😂. @agyss do you think your machine is up to the task?

agyss · 2022-05-22T08:43:15Z

I looked into the second issue. By combining these two datasets, we should get a good base for matching:
https://web.archive.org/web/20200414235453/ftp://ftp.heise.de/pub/ct/listings/0717-182.zip
and
https://en.wikipedia.org/wiki/Category%3aLists_of_popular_names

Furthermore, I would filter the email addresses and only keep the ones following the pattern firstname.lastname@... or lastname.firstname@... (allowing numbers in and after the names for higher coverage).

I will preprocess the wordlists to build hash sets of all possible firstname.lastname and lastname.firstname combinations.
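The preprocessing and filtering steps could look roughly like the sketch below. It assumes one hash set per country and treats "numbers in and after the names" as digits that can simply be stripped before the lookup. The function names and the sample names are hypothetical:

```python
import re
from itertools import product


def build_name_pairs(first_names, last_names):
    """Precompute every firstname.lastname and lastname.firstname key."""
    pairs = set()
    for first, last in product(first_names, last_names):
        pairs.add(f"{first}.{last}")
        pairs.add(f"{last}.{first}")
    return pairs


def normalized_key(email):
    """Return 'name.name' for addresses matching the pattern, else None."""
    local_part = email.lower().split("@", 1)[0]
    # Drop digits inside and after the names, e.g. 'hans.schmidt82'.
    stripped = re.sub(r"\d", "", local_part)
    if re.fullmatch(r"[a-z]+\.[a-z]+", stripped):
        return stripped
    return None


def matches_country(email, name_pairs):
    """True if the address follows the pattern and is in the country's set."""
    key = normalized_key(email)
    return key is not None and key in name_pairs


# Example with a hypothetical one-name German list.
german_pairs = build_name_pairs(["hans"], ["schmidt"])
print(matches_country("Hans.Schmidt82@example.com", german_pairs))
```

The full cross product holds two entries per first name and last name pair, so for long name lists it may be cheaper to keep the first names and last names in two separate sets and check each half of the key instead.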

Long story short: I will find a way, and I do have the computing power to process the data.

g0tmi1k · 2022-08-02T09:46:04Z

Feel free to open up a pull request with it!

DeveloperOl · 2022-08-17T13:01:32Z

I have some wordlists with tons of common and uncommon language-specific words, names, etc. (just words, not common passwords) for many popular languages. Those could be used for your educational purposes in combination with hashcat rules ;) I can make a pull request if this is of interest. I would create a folder like /Passwords/localized if that is the right place for it. @g0tmi1k

ItsIgnacioPortal · 2022-09-09T07:25:47Z

> I have some wordlists with tons of common and uncommon language-specific words, names, etc. (just words, not common passwords) for many popular languages. Those could be used for your educational purposes in combination with hashcat rules ;) I can make a pull request if this is of interest. I would create a folder like /Passwords/localized if that is the right place for it. @g0tmi1k

That would be very useful for fulfilling this issue :). Please make a pull request @DeveloperOl

DeveloperOl · 2022-09-12T10:39:29Z

Okay, I will do some cleanup and then make a PR for German, French, Spanish, Polish and Swedish, maybe one by one because generating and cleaning up these lists is a pain.
Should lists over 100 MB be compressed or not, @ItsIgnacioPortal?

ItsIgnacioPortal · 2022-09-12T11:30:06Z

> Should lists over 100 MB be compressed or not, @ItsIgnacioPortal?

No, it's fine. Though, I don't know up to what point such a crazy-long list would be useful. Could you limit each list to one hundred thousand lines?

DeveloperOl · 2022-09-12T13:15:13Z

I created these lists by crawling localized web pages because I realized there were words missing from the existing lists here. I found that some password hashes were not cracked because of that, but would have been with a complete list (that's how I found this issue).
I could limit the wordlist by usage frequency; however, that would mean many valid but uncommon words drop out.
I can add top-x lists and the full list in a PR, and you can cherry-pick, if that's okay for you @ItsIgnacioPortal. I just need to do some fine-tuning and recrawl with word counts.
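A minimal sketch of how a frequency-limited top-x list and the full list could both be derived from crawled text. The tokenizing regex and the function names are assumptions for illustration, not the crawler actually used for these lists:

```python
import re
from collections import Counter

# Runs of Unicode letters, so words like "Straße" or "Äpfel" stay intact.
WORD = re.compile(r"[^\W\d_]+")


def count_words(texts):
    """Count lowercase word occurrences across an iterable of page texts."""
    counts = Counter()
    for text in texts:
        counts.update(word.lower() for word in WORD.findall(text))
    return counts


def top_words(counts, limit):
    """Return the `limit` most frequent words, most common first."""
    return [word for word, _ in counts.most_common(limit)]


# Example with two hypothetical crawled snippets.
counts = count_words(["Haus Straße Haus", "haus Äpfel"])
top_list = top_words(counts, 2)   # frequency-limited list
full_list = sorted(counts)        # complete list, alphabetical
```

Calling `top_words(counts, 100_000)` would give the one-hundred-thousand-line cut requested earlier in the thread, while `full_list` keeps the uncommon words.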

ItsIgnacioPortal · 2022-09-12T15:22:53Z

> I created these lists by crawling localized web pages because I realized there were words missing from the existing lists here. I found that some password hashes were not cracked because of that, but would have been with a complete list (that's how I found this issue). I could limit the wordlist by usage frequency; however, that would mean many valid but uncommon words drop out. I can add top-x lists and the full list in a PR, and you can cherry-pick, if that's okay for you @ItsIgnacioPortal. I just need to do some fine-tuning and recrawl with word counts.

Alright! I'm awaiting your PR

g0tmi1k added enhancement Enhancement proposal Status: Proposal labels Apr 26, 2022

DeveloperOl mentioned this issue Jun 28, 2023

add localized wikipedia wordlists (Relates to #703) #886

Merged

g0tmi1k added a commit that referenced this issue Nov 23, 2023

Merge pull request #886 from DeveloperOl/master

4820f44

add localized wikipedia wordlists (Relates to #703) Source: https://github.com/DeveloperOl/wikipediator_v2
