Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Abysmal performance for hash algorithms in dart:crypto #5

Open
DartBot opened this issue Jun 5, 2015 · 13 comments
Open

Abysmal performance for hash algorithms in dart:crypto #5

DartBot opened this issue Jun 5, 2015 · 13 comments
Labels
P2 A bug or feature request we're likely to work on type-bug Incorrect behavior (everything from a crash to more subtle misbehavior)

Comments

@DartBot
Copy link

DartBot commented Jun 5, 2015

Originally opened as dart-lang/sdk#4611

This issue was originally filed by dha...@google.com


Test code:
new File('recordroyale.ogg').readAsBytes().then((buf) {
  var md5 = new MD5();
  md5.update(buf);
  print(md5.digest());
});

where 'recordroyale.ogg' is about six megabytes. This takes 22 seconds on my machine. Shelling out to md5sum(1) takes under 0.02 seconds. I understand that Dart won't ever be as fast as C, but with this disparity, the hash algorithms should be implemented in C[++] rather than Dart.

I had a vague suspicion that this was slow due to unnecessary allocations, but the allocations are mostly small and consumed quickly. Refactoring to eliminate unnecessary allocations resulted in no change.

My use case involved checksumming a byte array that existed only in memory at the relevant location; it is far less convenient to write data to a temporary file, shell out to md5sum(1), then delete that file.

@DartBot DartBot added type-bug Incorrect behavior (everything from a crash to more subtle misbehavior) Priority-Medium labels Jun 5, 2015
@DartBot
Copy link
Author

DartBot commented Jun 5, 2015

<img src="https://avatars.githubusercontent.com/u/5858078?v=3" align="left" width="48" height="48"hspace="10"> Comment by larsbak


As a starting point we should add a benchmark to trace performance of MD5:

#import("dart:io");
#import("dart:crypto");

main() {
  new File('data').readAsBytes().then((buf) {
    var md5 = new MD5();
    md5.update(buf);
    print(md5.digest());
  });
}


Set owner to @madsager.
Added Area-Library label.

@DartBot
Copy link
Author

DartBot commented Jun 5, 2015

<img src="https://avatars.githubusercontent.com/u/2909286?v=3" align="left" width="48" height="48"hspace="10"> Comment by madsager


There is a lot that can be done to improve performance here. The data is being copied around way too much. Also, the code overflows smi range and (at least it used to be the case that) a lot of the medium-size integer operations ended up in runtime calls.


cc @iposva-google.
cc @sgmitrovic.

@DartBot
Copy link
Author

DartBot commented Jun 5, 2015

<img src="https://avatars.githubusercontent.com/u/2909286?v=3" align="left" width="48" height="48"hspace="10"> Comment by madsager


Added Triaged label.

@DartBot
Copy link
Author

DartBot commented Jun 5, 2015

<img src="https://avatars.githubusercontent.com/u/5449880?v=3" align="left" width="48" height="48"hspace="10"> Comment by iposva-google


Mads, let's work together at creating a repeatable test case. I don't think file reading is really needed here.

@DartBot
Copy link
Author

DartBot commented Jun 5, 2015

<img src="https://avatars.githubusercontent.com/u/4865287?v=3" align="left" width="48" height="48"hspace="10"> Comment by lrhn


Removed Area-Library label.
Added Area-Pkg, Library-Crypto labels.

@DartBot
Copy link
Author

DartBot commented Jun 5, 2015

<img src="https://avatars.githubusercontent.com/u/5449880?v=3" align="left" width="48" height="48"hspace="10"> Comment by iposva-google


I wrote a little test just now and it looks like we can close this bug. Abysmal is certainly not what we see anymore:

import "dart:io";
import "package:crypto/crypto.dart";

readFileFully(var path) {
  return new File(path).readAsBytesSync();
}

main() {
  var binary = new Options().executable;
  print("Using $binary");
  for (int i = 0; i < 3; i++) {
    var stopwatch = new Stopwatch()..start();
    var bytes = readFileFully(binary);
    stopwatch.stop();
    print("Read ${bytes.length} bytes in ${stopwatch.elapsedMilliseconds} ms.");

    // Now compute the MD5 hash on the read bytes.
    stopwatch = new Stopwatch()..start();
    var md5 = new MD5();
    md5.add(bytes);
    var hash = md5.close();
    stopwatch.stop();
    print("Hashed ${bytes.length} bytes into ${hash.length} bytes in "
          "${stopwatch.elapsedMilliseconds} ms.");
    var sb = new StringBuffer();
    for (int j = 0; j < hash.length; j++) {
      var val = hash[j].toRadixString(16);
      sb.write(val.length == 1 ? "0$val" : val);
    }
    print("Hash: $sb");
  }
}

Gives this set of results on my MacBook:

dalbe[runtime] ./xcodebuild/ReleaseX64/dart --package-root=./xcodebuild/ReleaseIA32/packages/ ~/dart/Bug4611.dart
Using ./xcodebuild/ReleaseX64/dart
Read 9875196 bytes in 66 ms.
Hashed 9875196 bytes into 16 bytes in 810 ms.
Hash: e5f175b851495b54b393327ac934583a
Read 9875196 bytes in 18 ms.
Hashed 9875196 bytes into 16 bytes in 802 ms.
Hash: e5f175b851495b54b393327ac934583a
Read 9875196 bytes in 33 ms.
Hashed 9875196 bytes into 16 bytes in 816 ms.
Hash: e5f175b851495b54b393327ac934583a

dalbe[runtime] time md5 ./xcodebuild/ReleaseX64/dart
MD5 (./xcodebuild/ReleaseX64/dart) = e5f175b851495b54b393327ac934583a
0.026u 0.007s 0:00.02 100.0% 0+0k 0+1io 0pf+0w

This means we calculate the MD5 hash for a 9.41MB file in 810ms, which is significantly better than the reported 6MB in 22 seconds.


Added AssumedStale label.

@DartBot
Copy link
Author

DartBot commented Jun 5, 2015

<img src="https://avatars.githubusercontent.com/u/17034?v=3" align="left" width="48" height="48"hspace="10"> Comment by kevmoo


Removed Library-Crypto label.
Added Pkg-Crypto label.

@andreleblanc-wf
Copy link

This may be a slightly contrived example, but I was recently comparing dart performance to python performance on a set of 'programming challenges' and am still seeing VERY weak performance from dart's MD5 hashing. the following code computes about 10 million hashes, and takes roughly 18-20 seconds on my macbook, whereas the equivalent python code (using hashlib) runs in under 5 second.

import "dart:io";
import "package:crypto/crypto.dart";

main(List<String> args) {
  MD5 md5;
  String text;
  String privKey = "yzbqklnj";
  String digest = "";
  List<int> bytes;
  int counter = 0;
  while (!digest.startsWith("000000")) {
    counter ++;
    text = privKey + counter.toString();
    md5 = new MD5();
    md5.add(text.codeUnits);
    bytes = md5.close();
    digest = CryptoUtils.bytesToHex(bytes);
  }
  print(counter);
}

of course hashlib is implemented in C, so the speed is to be expected, but it also clearly demonstrates that dart is nowhere near the same ballpark without implementing this stuff in C.

@sethladd
Copy link
Contributor

sethladd commented Dec 4, 2015

Thanks for the report. Would you mind providing the version of Dart that you used? Did you have checked mode turned on? And, what version of crypto?

@andreleblanc-wf
Copy link

I actually just updated my comment, seems there was an issue with my initial python run, and the real number is closer to 5 seconds vs. dart at 20 seconds. this is using dart 1.13.0. The numbers are a little more reasonable now, but it's still a significant performance hit.

@sethladd
Copy link
Contributor

sethladd commented Dec 7, 2015

Thanks for the additional comments. I'm not sure this issue should be closed. Reopening.

cc @sgmitrovic if he's curious.

@scheglov
Copy link

I see that *_MD5Sink.updateHash allocates millions on _Mint objects, which causes GC, and probably slow performance. I compute MD5 of a fairly small list of bytes, and MD5 computation takes almost 50% of total time in my app. Why isn't it inlined to use unboxed values?!

@cachapa
Copy link

cachapa commented Dec 15, 2019

Is this issue still being worked on?
I'm still experiencing ~10x slower performance on MD5 compared to the native shell command line for large files (in my example, 121MB).

On my laptop (Ubuntu 19.10):

$ dart2native dart_md5.dart -o dart_md5
$ time dart_md5 video.mpg
54cd56fa1fcf144b21d48357e4e694a3

real	0m1,498s
user	0m1,584s
sys	0m0,038s

$ time md5sum video.mpg
54cd56fa1fcf144b21d48357e4e694a3  video.mpg

real	0m0,191s
user	0m0,187s
sys	0m0,004s

On my Raspberry Pi (raspbian):

$ dart2native dart_md5.dart -o dart_md5
$ time dart_md5 video.mpg
54cd56fa1fcf144b21d48357e4e694a3

real	0m7.923s
user	0m7.972s
sys	0m0.181s

$ time md5sum video.mpg
54cd56fa1fcf144b21d48357e4e694a3  video.mpg

real	0m0.646s
user	0m0.514s
sys	0m0.133s

Here's the relevant part of my code:

  static Future<String> _hash(File file) async =>
      (await md5.bind(file.openRead()).first).toString();

Dart VM version: 2.7.0

@devoncarew devoncarew added P2 A bug or feature request we're likely to work on and removed Priority-Medium labels Aug 7, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
P2 A bug or feature request we're likely to work on type-bug Incorrect behavior (everything from a crash to more subtle misbehavior)
Projects
None yet
Development

No branches or pull requests

7 participants