
High Number of Junk Lines #1

Open
berzerk0 opened this issue May 28, 2017 · 1 comment

Comments

@berzerk0 (Owner)

BEWGor makes a ton of lines, maybe too many.

In stark contrast to the Probable-Wordlists, the dictionaries created by BEWGor contain many lines that just don't seem to be of good quality.

What kind of junk?

BEWGor goes through given dates, creates variations and extracts specifics.
If you fed it today's date, 28052017, with a maximum permutation length of 2, the lines produced would include the following:

2805, 285, 2017, 28517, 52817, 5282017 - These are legitimate, quality variations.
2852805, 201717, 528285 - These are NOT quality variations.
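To make the junk concrete, here is a minimal sketch of how blindly concatenating date variations produces those lines. The list of variations and the max-length-2 permutation loop are illustrative, not BEWGor's actual implementation:

```python
from itertools import permutations

# Illustrative variations extracted from the date 28/05/2017
# (day, month, year fragments, and combined forms):
variations = ["28", "5", "05", "2017", "17",
              "2805", "285", "28517", "52817", "5282017"]

# Concatenating every ordered pair (max permutation length of 2)
# mixes formats of the SAME date into a single line:
lines = set(variations)
for a, b in permutations(variations, 2):
    lines.add(a + b)

print("2852805" in lines)  # True: "285" + "2805", two formats of one date
print("201717" in lines)   # True: "2017" + "17", the year twice
```

Every junk line above is a mechanical byproduct of treating all variations as independent tokens, even though they all encode the same underlying fact.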

If someone is going to include a date in their password, they might do it in a number of different formats (5/28, 28/5, 05/28, 28/05, 28/05/2017, 28/05/17...), but it is highly unlikely they would include more than one format in the same password!

Now, I predict it would be RARE to have this kind of redundancy, but ultimately it is POSSIBLE.
Here we get to the age-old balance of security - there are always more steps you could take, but how many of them are practical? How many of the steps become overkill, not worth the trouble?

What can be done about the junk? Isn't this problem going to get worse?

As the detail increases, and more specific details are added about the Subjects, the permutations are going to grow exponentially and simply get out of hand. As a result, I will need to refine this process to do things like weed out alternative formats of redundant information.
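One possible way to weed out same-fact redundancy, sketched here purely as an idea: tag every variation with the fact it was derived from, and skip any pair that shares a tag. The tag names and data model below are hypothetical, not BEWGor's actual structures:

```python
from itertools import permutations

# Hypothetical: each variation carries a tag naming its source fact.
tagged = [("2805", "birthday"), ("285", "birthday"), ("2017", "birthday"),
          ("password", "base_word")]

lines = set()
for (a, tag_a), (b, tag_b) in permutations(tagged, 2):
    if tag_a == tag_b:
        # Both variations encode the same underlying fact,
        # e.g. "285" + "2805" - skip the redundant combination.
        continue
    lines.add(a + b)
```

This keeps a single permutation function rather than per-format special cases; the cost is carrying provenance metadata alongside every string.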

So far the ideas I have had would require intensely specific creation of password formats, which has plenty of room for design holes. Instead of one implementation of a permutation function, I may end up having a gigantic bundle of nested for loops with conditional exclusions and re-writing of strings that would eat up all the RAM.

For example, I'd need to have a section that uses 'Initials + Birthday(no year),' then 'Birthday(no year) + Initials' then 'Birthday(with year) + Initials,' ...but for every. single. kind. of. in.for.ma.tion.
Nightmarish. CUPP, the program that inspired this one, may have limited the amount of information prompted for exactly this reason.
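One way to avoid the gigantic bundle of nested loops would be to express those "Initials + Birthday" rules as data rather than code: a list of templates applied to a dictionary of facts. This is only a sketch of the idea; the field names and facts are made up for illustration:

```python
from itertools import product

# Hypothetical facts about a Subject (field names are illustrative):
facts = {
    "initials": ["jd", "JD"],
    "bday_no_year": ["2805", "285"],
    "bday_with_year": ["28052017"],
}

# Each template is an ordered sequence of fact fields to concatenate.
templates = [
    ("initials", "bday_no_year"),
    ("bday_no_year", "initials"),
    ("bday_with_year", "initials"),
]

lines = set()
for template in templates:
    for parts in product(*(facts[field] for field in template)):
        lines.add("".join(parts))
```

Adding a new combination then means appending one tuple to `templates`, not writing another loop, and a template never pairs two formats of the same fact unless you explicitly list one that does.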

The answer here might be some kind of machine learning: some way for the program to recognize that a given string contains redundant information. Unfortunately, I predict this is far above my head at this time.

But all is not lost, I will keep brainstorming and hunting down ways to slim the output down.
Additionally, BEWGor exists on the World's Largest Collaborative Software platform, so I have access to an excellently helpful community. Beyond my own pursuits, any outside suggestions on how to slim down the wordlist without sacrificing too much fidelity would be much appreciated!

Who is asking these questions?

Is posting my own issue like retweeting myself? I mean, I am asking these questions to myself and then answering them. It's a real rhetorical device, right?

TLDR - BEWGor has junk lines. Some of them contain redundant information. I'm trying to put a stop to that - suggestions are appreciated.

@glozanoa

Hello @berzerk0 .
I think splitting the BEWGor.py script into smaller scripts would make your repository more maintainable. I have added a PR to do that.
