
Inline bits of pdf crate for better performance #36

Open: wants to merge 1 commit into main

@jorpic commented Apr 22, 2024

Hello, and thank you for this project; it helped me A LOT recently.

I needed to recover passwords for multiple PDF documents, so performance was really important. It turns out that sharing a single pdf::file::Storage between attempts within a thread, instead of creating a new one for every attempt, gives a noticeable speedup.
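A minimal sketch of the "build once per thread, reuse for every attempt" pattern, using only the standard library. ParsedDoc, parse, try_password, crack and the PARSES counter are hypothetical names for illustration; ParsedDoc stands in for pdf::file::Storage and does not reproduce the pdf crate's API. The counter shows that the expensive setup runs once per worker thread rather than once per candidate password.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

// Counts how many times the expensive setup runs (illustration only).
static PARSES: AtomicUsize = AtomicUsize::new(0);

// Hypothetical stand-in for pdf::file::Storage: costly to build, cheap to reuse.
struct ParsedDoc {
    secret: String,
}

impl ParsedDoc {
    // Hypothetical constructor; the real code would parse the PDF bytes here.
    fn parse(bytes: &[u8]) -> ParsedDoc {
        PARSES.fetch_add(1, Ordering::SeqCst);
        ParsedDoc { secret: String::from_utf8_lossy(bytes).into_owned() }
    }

    // Hypothetical password check that reuses the already-built state.
    fn try_password(&self, candidate: &str) -> bool {
        self.secret == candidate
    }
}

// Split candidates across threads; each thread builds its ParsedDoc once,
// then tries every candidate in its chunk against that same instance.
fn crack(bytes: &[u8], candidates: &[String], threads: usize) -> Option<String> {
    let threads = threads.max(1);
    let chunk = ((candidates.len() + threads - 1) / threads).max(1);
    thread::scope(|s| {
        let handles: Vec<_> = candidates
            .chunks(chunk)
            .map(|part| {
                s.spawn(move || {
                    let doc = ParsedDoc::parse(bytes); // once per thread, not per attempt
                    part.iter().find(|c| doc.try_password(c.as_str())).cloned()
                })
            })
            .collect();
        // thread::scope joins any threads not joined here before returning.
        handles.into_iter().find_map(|h| h.join().unwrap())
    })
}

fn main() {
    let candidates: Vec<String> = (1..=1000).map(|n| n.to_string()).collect();
    let found = crack(b"742", &candidates, 4);
    println!("found {:?} after {} parses", found, PARSES.load(Ordering::SeqCst));
}
```

The per-thread state never crosses threads, so no locking is needed; the trade-off is one setup cost per worker, which is negligible next to a million attempts.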

Cracking one of the example files on my machine (this branch vs. main):

  • single thread: 6113 vs 2269 attempts/s (about 2.7x)
  • four threads: 14315 vs 6095 attempts/s (about 2.3x)

Single thread

user@zen ~/t/pdfrip (inline-pdf-storage)> cargo run --release -- -n 1 -f examples/default-query-1.pdf range 1 1000000
 2024-04-22T17:23:26.430Z INFO  engine > Starting password cracking job...
  [00:02:43] [████████████████████████████████████████]  999999/999999  100% 6113/s ETA: 0s
 2024-04-22T17:26:10.043Z INFO  cli_interface > Failed to crack file...
user@zen ~/t/pdfrip (main) [1]> cargo run --release -- -n 1 -f examples/default-query-1.pdf range 1 1000000
 2024-04-22T17:36:11.308Z INFO  engine > Starting password cracking job...
  [00:07:20] [████████████████████████████████████████]  999999/999999  100% 2269/s ETA: 0s
 2024-04-22T17:43:32.091Z INFO  cli_interface > Failed to crack file...

Four threads

user@zen ~/t/pdfrip (inline-pdf-storage) [1]> cargo run --release -- -n 4 -f examples/default-query-1.pdf range 1 1000000
 2024-04-22T17:28:08.219Z INFO  engine > Starting password cracking job...
  [00:01:09] [████████████████████████████████████████]  999999/999999  100% 14315/s ETA: 0s
 2024-04-22T17:29:18.090Z INFO  cli_interface > Failed to crack file...
user@zen ~/t/pdfrip (main) [1]> cargo run --release -- -n 4 -f examples/default-query-1.pdf range 1 1000000
 2024-04-22T17:32:44.111Z INFO  engine > Starting password cracking job...
  [00:02:44] [████████████████████████████████████████]  999999/999999  100% 6095/s ETA: 0s
 2024-04-22T17:35:28.198Z INFO  cli_interface > Failed to crack file...
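A quick consistency check on the numbers above (plain arithmetic, not part of the PR): dividing the 999,999 attempts by each reported rate reproduces the elapsed time shown in the corresponding progress bar, and gives the speedups quoted in the summary. The helper names are hypothetical.

```rust
// Total attempts in each run (range 1..1000000, as reported by the progress bars).
const ATTEMPTS: u64 = 999_999;

// Whole seconds needed for ATTEMPTS at a given rate, truncated to full seconds.
fn elapsed_secs(rate: u64) -> u64 {
    ATTEMPTS / rate
}

// Format seconds as HH:MM:SS, the same layout the progress bars use.
fn hms(secs: u64) -> String {
    format!("{:02}:{:02}:{:02}", secs / 3600, secs % 3600 / 60, secs % 60)
}

fn main() {
    // Attempt rates (attempts/s) copied from the logs above.
    let runs = [
        ("inline-pdf-storage, -n 1", 6113),
        ("main, -n 1", 2269),
        ("inline-pdf-storage, -n 4", 14315),
        ("main, -n 4", 6095),
    ];
    for (label, rate) in runs {
        println!("{label}: {rate}/s -> {}", hms(elapsed_secs(rate)));
    }
    // Speedup of the branch over main at each thread count.
    println!("speedup -n 1: {:.2}x", 6113.0 / 2269.0);
    println!("speedup -n 4: {:.2}x", 14315.0 / 6095.0);
}
```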

@jorpic changed the title from "Inlile bits of pdf crate for better performance" to "Inline bits of pdf crate for better performance" on Apr 23, 2024

match res {
    Ok(storage) => Ok(Self(storage)),
    Err(err) => Err(anyhow!(err).context("Failed to init cracker")),
}
@jorpic (author):

Not sure whether it is worth bringing in anyhow here, or whether it would be better to create a custom error type with thiserror.
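For comparison, a minimal sketch of what the thiserror route could look like, written out by hand so it compiles with the standard library alone. ParseFailure (a stand-in for pdf::PdfError), CrackerError and init are hypothetical names. thiserror's #[derive(Error)] with an #[error("...")] message and a #[source] or #[from] field would generate roughly the Display and source() impls spelled out here; as with anyhow's .context(...), the top-level message is "Failed to init cracker" and the original error stays reachable as the cause.

```rust
use std::error::Error;
use std::fmt;

// Hypothetical stand-in for pdf::PdfError (the real type is not used here).
#[derive(Debug)]
struct ParseFailure(String);

impl fmt::Display for ParseFailure {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "parse failure: {}", self.0)
    }
}

impl Error for ParseFailure {}

// Hypothetical crate-level error type.
#[derive(Debug)]
enum CrackerError {
    Init(ParseFailure),
}

impl fmt::Display for CrackerError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            CrackerError::Init(_) => write!(f, "Failed to init cracker"),
        }
    }
}

impl Error for CrackerError {
    // Expose the underlying error so callers can walk the error chain.
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        match self {
            CrackerError::Init(e) => Some(e),
        }
    }
}

// Hypothetical init mirroring the `match res { ... }` snippet above.
fn init(res: Result<Vec<u8>, ParseFailure>) -> Result<Vec<u8>, CrackerError> {
    res.map_err(CrackerError::Init)
}

fn main() {
    let err = init(Err(ParseFailure("bad xref".into()))).unwrap_err();
    println!("{err}");
    if let Some(src) = err.source() {
        println!("caused by: {src}");
    }
}
```

A dedicated enum lets library callers match on the failure kind, whereas anyhow is usually the lighter choice when errors are only reported to the user.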

use pdf::PdfError;
use pdf::any::AnySync;
use pdf::file::{Cache, Storage};
use pdf::object::{ParseOptions, PlainRef};

#[derive(Clone)]
pub struct PDFCracker(Vec<u8>);
@jorpic (author):

After this change, PDFCracker is just a wrapper around Vec<u8> with no additional functionality. It might be better to rename it to PDFFile or something similar.
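A minimal sketch of the suggested rename, assuming the type keeps holding only the raw file bytes. PDFFile comes from the comment; the as_bytes accessor and the From<Vec<u8>> impl are hypothetical additions, not part of the PR.

```rust
// Hypothetical rename of `PDFCracker(Vec<u8>)`: the type only carries the raw
// PDF bytes, so a file-oriented name describes it more accurately.
#[derive(Clone, Debug, PartialEq)]
pub struct PDFFile(Vec<u8>);

impl PDFFile {
    // Hypothetical accessor: borrow the raw bytes without exposing the field.
    pub fn as_bytes(&self) -> &[u8] {
        &self.0
    }
}

// Hypothetical conversion, e.g. from the result of std::fs::read.
impl From<Vec<u8>> for PDFFile {
    fn from(bytes: Vec<u8>) -> Self {
        PDFFile(bytes)
    }
}

fn main() {
    let file = PDFFile::from(b"%PDF-1.7".to_vec());
    // Each worker thread could receive its own clone of the bytes.
    let worker_copy = file.clone();
    println!(
        "{} bytes, starts with %PDF: {}",
        worker_copy.as_bytes().len(),
        file.as_bytes().starts_with(b"%PDF")
    );
}
```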
