Artifacts introduced due to piping chunks encryption #505

sylido · 2017-10-16T16:45:24Z

When uploading a file and setting streams/chunks to dynamic or just chunks to anything smaller than the size of the file getting uploaded - artifacts are introduced into the file. During download they are quite apparent, if it's an image file that is - binary data.

On the client side we use the pipe function to encrypt the data before it reaches the server. Seems like every chunk gets encrypted and then somehow the chunks get merged when storing the file on the server. The file is then decrypted and encrypted once again. When the file is then downloaded the artifacts are present - i.e. missing pixel data, wrong pixel data.

If the chunk size is set to a number bigger than the size of the file, the piping function encrypts only once and it seems to work just fine in that case. The problem is that having huge chunks pretty much freezes the browser while it's getting uploaded - which can take minutes.

Am I doing something wrong or is there an issue with binary data getting preserved correctly when being chunked ?

ostrio:files@1.8.2
Meteor@1.5.1
Seems like a Client issue

dr-dimitru · 2017-10-16T21:08:37Z

Hello @sylido ,

Needs to mentioned, what .pipe() wasn't used a lot, and you're one of the first developers to use it. So, there is might be a bug in it, or not. Let's start with some more info:

How do you decrypt the file? Do you decrypt the whole received file or chunk by chunk as it was sent from the client?
Does your encrypting function returns data as base64 string?
Could you show some used code for encryption and decryption? What library is used for it?

sylido · 2017-10-17T00:45:58Z

Hi @dr-dimitru thanks for responding so fast.

The file is decrypted on the server side in the onAfterUpload callback of the Files collection where the file gets uploaded. I am decrypting the whole file and not the chunks, maybe that's the problem since they seem to get encrypted just like that during the upload.
The encryption function does return a base64 string.
I'll list some of the code used underneath this. I use RSA for the client side encryption and subsequently AES for the server side encryption.

Client side code in order

Files.insert({
  file      : files[uid],
  streams   : "dynamic",
  chunkSize : "dynamic",
  meta      : {
    path       : directory,
    org        : cid,
    owner      : ownerId,
    uploadedAt : moment().valueOf(),
    createdAt  : moment().valueOf()
  }
}, false)
.pipe(encryptFile);

encryptFile : function(data) {
  if (Meteor.isClient) {
    var nrsa        = require("node-rsa"),
        cUserPubKey = _.get(Meteor.user(), "publicKey"),
        key         = new nrsa();

    key.importKey(cUserPubKey);
    
    if (data) {
      data = key.encrypt(data, "base64", "base64");
    }
  }
  
  return data;
}

Server side code as follows

onAfterUpload : function(fileRef) {
  if (Meteor.isServer) {
    var fileData  = fs.readFileSync(fileRef.path, { encoding : "base64" }),
        decrypted = '',
        encrypted = '',
        user      = Meteor.users.findOne({ _id : _.get(fileRef, "userId") }),
        privKey   = _.get(user, "privateKey"),
        salt      = _.get(user, "salt"),
        pubKey    = _.get(user, "publicKey"),
        univ      = _.get(user, "key"),
        key       = '',
        fs        = Npm.require('fs-extra'),
        aes       = Npm.require('crypto-js/aes'),
        sha3      = Npm.require('crypto-js/sha3'),
        cryp      = Npm.require('crypto-js/x64-core'),
        nrsa      = Npm.require("node-rsa");

    privKey = aes.decrypt(privKey, sha3(someKey + salt).toString()).toString(cryp.enc.Utf8);

    key = new nrsa();
    key.importKey(privKey);
    key.importKey(pubKey);

    decrypted = key.decrypt(fileData, "base64", "base64");
    univ      = sha3(key.decrypt(univ, "utf8").toString()).toString(); // get universal key
    encrypted = aes.encrypt(decrypted, univ).toString();
    fs.removeSync(fileRef.path);
    fs.writeFileSync(fileRef.meta.path + verName, encrypted, { encoding : "utf8" });

    Files.update({ _id : fileRef._id }, {
      $set : {
        name                     : verName,
        path                     : fileRef.meta.path + verName,
        "versions.original.path" : fileRef.meta.path + verName
      }
    });
  }
}

interceptDownload : function(http, fileRef) {
  if (Meteor.isServer) {
    var usr          = Meteor.users.findOne({ _id : http.userId }),
        privKey      = _.get(usr, "pk"),
        salt         = _.get(usr, "salt"),
        pubKey       = _.get(usr, "public"),
        univ         = _.get(usr, "key"),
        fileContents = '',
        key          = '',
        buf          = '';

    fileContents = fs.readFileSync(fileRef.path, { encoding : "utf8" });

    privKey = aes.decrypt(privKey, sha3(someKey + salt).toString()).toString(cryp.enc.Utf8);

    key = new nrsa();
    key.importKey(privKey);
    key.importKey(pubKey);

    univ         = sha3(key.decrypt(univ, "utf8").toString()).toString();
    fileContents = aes.decrypt(fileContents, univ).toString(cryp.enc.Utf8);
    fileContents = cryp.enc.Base64.parse(fileContents).toString(cryp.enc.Base64);
    buf          = new Buffer(fileContents, 'base64');

    http.response.writeHead(200, { "Content-Disposition" : "attachment; filename=" + fileRef.name });
    http.response.end(buf);

    return true;

  }
  return false;
}

dr-dimitru · 2017-10-17T02:11:33Z

@sylido thank you for update. So you should reverse encoding logic:

Set constraint chunkSize
Replace fs.readFileSync with reading file by chunks with exact size of chunkSize
Read file as base64 and decode it chunk by chunk into single Buffer
Final Buffer will be your decoded file

sylido · 2017-10-17T02:22:05Z

Thanks @dr-dimitru, just a clarification here, do you want me to set the chunk size manually, instead of dynamic - i.e. 1024 * 256 for a 256kb chunk size. Then using that chunk size, read the chunks from the HDD and decode each one of them, then merge the decrypted contents and encrypt the result once. Afterwards during download it should get decrypted only once.

dr-dimitru · 2017-10-17T03:45:08Z

do you want me to set the chunk size manually, instead of dynamic - i.e. 1024 * 256 for a 256kb chunk size.

Right.

read the chunks from the HDD and decode each one of them, then merge the decrypted contents and encrypt the result once

Yes.

Ping me if you will have any further issues.
Once you make it work - do not hesitate to share your experience with our community.
End-to-end encryption was discussed here, but wasn't finally implemented by anyone.

dr-dimitru · 2017-10-19T07:35:20Z

Hello @sylido ,

Any progress on this one?

sylido · 2017-10-19T15:48:57Z

Hi @dr-dimitru, I tried implementing it the way you suggested, but I'm encountering an issue where the bytes size for the chunk size when getting the data balloons to a different size after getting encrypted. Subsequently I think that causes the decryption on the server size to read less data than needed, trying to figure out how to match the chunk size before decryption to the chunk size after encryption.

I was wondering if there is a way to group up the chunks through the piping function into one and then do encryption on the whole thing ?

I'll update here on any progress I make.

dr-dimitru · 2017-10-19T16:35:27Z

I'm afraid, it will be easier to encrypt the whole file, reading it with FileReader, then upload its result as base64 string

WDYT?

sylido · 2017-10-20T22:56:09Z

I continued trying to get .pipe() to work, but it seems to have a bug, at least with encryption. The chunksizes, even if they match on the client/server seem to contain different data. I tried using 1 stream to see if it somehow asynchronously adds chunks out of order, but that didn't seem to be the case.

To get the proper bytelength that needs to be read on the server I used a buffer.byteLength(<encoded string>, "bae64") on the client side. Then used that value for the chunks that are read by fs.read() (in my case fs.readSync()). Again it seems to read the right amount of data in this case, but just contains bogus chunk data.

So I switched to what you suggested, reading all the file contents with FileReader and using a WebWorker to do the actual encryption. This was done to unblock the browser's UI as it's completely blocked by the encryption - which could take upwards of 10 seconds depending on the file size.

Afterwards I gave the encrypted contents to the Files.insert() I'm doing and everything seems to get transferred just fine even in chunks. I don't think I even need to use base64 as the encoded version of the file could easily just be encoded in Utf8. This can save you from specifying isBase64 : true and fileName on the client side.

dr-dimitru · 2017-10-21T00:40:58Z

@sylido thank you for update.

What your final conclusion?
Are you happy with current solution?

sylido · 2017-10-21T01:06:38Z

There's a bug in the pipe() function when handling chunks, not sure how to debug. Open to suggestions if you don't have time to look into it.
Yeah it seems like this is the way to go, since pipe() and chunks wouldn't have made the encryption part faster or unblock the UI.

Thanks a lot @dr-dimitru. Feel free to close the issue or I will if you think it's too niche of a case to spend time debugging, just let me know !

dr-dimitru · 2017-10-21T01:38:59Z

Yeah it seems like this is the way to go, since pipe() and chunks wouldn't have made the encryption part faster or unblock the UI.

Yes, .pipe() is available only at main thread, so there no way to accomplish encryption without blocking UI.

There's a bug in the pipe() function when handling chunks, not sure how to debug. Open to suggestions if you don't have time to look into it.

My thoughts:

Here I've said what encryption should be reversed for each received chunk on a server, idk how you have implemented this, and idk if it's even possible now with current API hooks, probably there is should be receiving .pipe() for incoming data
There is a reason why data comes as base64 into the .pipe() and caused by deprecation of readAsBinaryString() while readAsArrayBuffer() will be too expensive preparing it for transfer, so the readAsDataURL() most useful in our case and acts in the same way in all browsers. There is might be an issue as we read data by chunks, and base64 uses = for padding, replacing empty bits of data at the end of the string, not sure if it's related, but may give a clue
I'm glad you've found a way to make it work, it's efficient and non-blocking, I don't think there is a better way to accomplish this task. The only disadvantage is holding the encrypted file in a memory. So there is might be a place for implementing asynchronous analog of writableStream API for insert() method, where you can push chunks upon availability.

sylido · 2017-10-25T09:37:26Z

I did implement it in the way you described, I could list the server side code used to read the chunks. Obviously when encrypting data we change the size of the chunk that the pipe() function returns, so I tried to account for that on the server side, but that didn't help either.
Maybe something related to the padding, but the extra data I mentioned when reading/writing chunks doesn't seem like a bunch of =...=, so kind of stumped by what base64 does. An example of the behavior I was getting is - After Encoding Chunk(AEC) -> abacbcb=, what the pipe function returns. Then when reading a chunk of the same size as the AEC -> Encoded File Chunk(EFC) -> abacxyzxzz=. So you see that there is a part that is correct, then part of the original AEC chunk is missing and is replaced by completely different data. Maybe that's the padding data that gets removed and the next chunk replaces it.
Currently not a problem as most people have enough memory to handle it. I guess eventually might have to move to writing the chunks to disk as you suggested if the files get out of hand.

dr-dimitru · 2017-10-25T18:54:33Z

Hello @sylido ,

Thank for sharing your thoughts on it.
Sounds like you've found complete solution for your needs, question about .pipe() is still open (I may move it to separate ticket).

For now marked as [DOCUMENTATION] I'll try to extract useful info we have got during discussion into usable article about file encryption.

dr-dimitru added .insert() (upload) .pipe() labels Oct 16, 2017

dr-dimitru added the help wanted label Oct 21, 2017

dr-dimitru added the Documentation label Oct 25, 2017

sylido mentioned this issue Dec 12, 2017

Session expired #536

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Artifacts introduced due to piping chunks encryption #505

Artifacts introduced due to piping chunks encryption #505

sylido commented Oct 16, 2017

dr-dimitru commented Oct 16, 2017

sylido commented Oct 17, 2017 •

edited

dr-dimitru commented Oct 17, 2017

sylido commented Oct 17, 2017

dr-dimitru commented Oct 17, 2017

dr-dimitru commented Oct 19, 2017

sylido commented Oct 19, 2017

dr-dimitru commented Oct 19, 2017

sylido commented Oct 20, 2017 •

edited

dr-dimitru commented Oct 21, 2017

sylido commented Oct 21, 2017

dr-dimitru commented Oct 21, 2017

sylido commented Oct 25, 2017

dr-dimitru commented Oct 25, 2017

Artifacts introduced due to piping chunks encryption #505

Artifacts introduced due to piping chunks encryption #505

Comments

sylido commented Oct 16, 2017

dr-dimitru commented Oct 16, 2017

sylido commented Oct 17, 2017 • edited

dr-dimitru commented Oct 17, 2017

sylido commented Oct 17, 2017

dr-dimitru commented Oct 17, 2017

dr-dimitru commented Oct 19, 2017

sylido commented Oct 19, 2017

dr-dimitru commented Oct 19, 2017

sylido commented Oct 20, 2017 • edited

dr-dimitru commented Oct 21, 2017

sylido commented Oct 21, 2017

dr-dimitru commented Oct 21, 2017

sylido commented Oct 25, 2017

dr-dimitru commented Oct 25, 2017

sylido commented Oct 17, 2017 •

edited

sylido commented Oct 20, 2017 •

edited