Universal-ish parser and detector #103

Open
M-Gonzalo opened this issue Dec 30, 2019 · 0 comments

Comments

M-Gonzalo commented Dec 30, 2019

By its nature, precomp relies heavily on parsers and file-type recognition code.

Writing that code is tedious, as every parser needs to be manually written and tuned.
It is also very prone to errors and mismatches, both because each standard has different implementations and because the streams are often only a part of the whole file, so the program is flying blind.

I believe that robust and accurate universal type detection code is not only possible but, using the method described here, probably easier to implement than the current system.

The proposed solution correctly assigns file types based only on file fragments of size 1024, with an accuracy of 98.3%.
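To make the idea of fragment-based detection concrete, here is a minimal sketch in Python. It is not the linked method and will not reproduce its accuracy. It swaps in a much simpler technique, a byte-frequency histogram with a nearest-centroid classifier, purely to illustrate classifying a fragment without any header or container context. The names `byte_histogram`, `train_centroids` and `classify` are hypothetical, and the fragment size is assumed to be in bytes.

```python
import math
from collections import Counter
from typing import Dict, List

# Fragment length mentioned in the issue; the unit (bytes) is an assumption.
FRAGMENT_SIZE = 1024


def byte_histogram(fragment: bytes) -> List[float]:
    """Return the normalized frequency of each of the 256 byte values."""
    counts = Counter(fragment)
    total = len(fragment) or 1  # avoid division by zero on empty input
    return [counts.get(value, 0) / total for value in range(256)]


def train_centroids(samples: Dict[str, List[bytes]]) -> Dict[str, List[float]]:
    """Average the histograms of the labeled fragments of each type."""
    centroids = {}
    for label, fragments in samples.items():
        histograms = [byte_histogram(f) for f in fragments]
        # zip(*histograms) walks the 256 bins column by column.
        centroids[label] = [sum(col) / len(histograms) for col in zip(*histograms)]
    return centroids


def classify(fragment: bytes, centroids: Dict[str, List[float]]) -> str:
    """Return the label whose centroid is closest in Euclidean distance."""
    histogram = byte_histogram(fragment)
    return min(centroids, key=lambda label: math.dist(histogram, centroids[label]))
```

A caller would build `centroids` once from labeled fragments of each type of interest, then call `classify` on any `FRAGMENT_SIZE` slice of an unknown stream. A real detector would need far richer features than a plain histogram to separate, for example, different compressed formats from one another.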

The parsers currently used by precomp are very good in their own right, yet there are a number of future applications where precomp will need more and more detection code, including some open issues:

#6 #20 #26 #44 and #86

There are other applications for quick and correct type detection, such as the use of dictionary preprocessors and/or custom compressors for text, exe preprocessing, mm preprocessing, and maybe even fast detection of header-less deflate streams, currently done in "brute mode".
There's also the proposed extract switch and stream grouping to improve compression.

So it would probably make sense to tackle this before addressing any of the other issues...
