Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Artifacts introduced due to piping chunks encryption #505

Open
sylido opened this issue Oct 16, 2017 · 14 comments
Open

Artifacts introduced due to piping chunks encryption #505

sylido opened this issue Oct 16, 2017 · 14 comments

Comments

@sylido
Copy link

sylido commented Oct 16, 2017

When uploading a file and setting streams/chunks to dynamic or just chunks to anything smaller than the size of the file getting uploaded - artifacts are introduced into the file. During download they are quite apparent, if it's an image file that is - binary data.

On the client side we use the pipe function to encrypt the data before it reaches the server. Seems like every chunk gets encrypted and then somehow the chunks get merged when storing the file on the server. The file is then decrypted and encrypted once again. When the file is then downloaded the artifacts are present - i.e. missing pixel data, wrong pixel data.

If the chunk size is set to a number bigger than the size of the file, the piping function encrypts only once and it seems to work just fine in that case. The problem is that having huge chunks pretty much freezes the browser while it's getting uploaded - which can take minutes.

Am I doing something wrong or is there an issue with binary data getting preserved correctly when being chunked ?

ostrio:files@1.8.2
Meteor@1.5.1
Seems like a Client issue

@dr-dimitru
Copy link
Member

Hello @sylido ,

Needs to mentioned, what .pipe() wasn't used a lot, and you're one of the first developers to use it. So, there is might be a bug in it, or not. Let's start with some more info:

  1. How do you decrypt the file? Do you decrypt the whole received file or chunk by chunk as it was sent from the client?
  2. Does your encrypting function returns data as base64 string?
  3. Could you show some used code for encryption and decryption? What library is used for it?

@sylido
Copy link
Author

sylido commented Oct 17, 2017

Hi @dr-dimitru thanks for responding so fast.

  1. The file is decrypted on the server side in the onAfterUpload callback of the Files collection where the file gets uploaded. I am decrypting the whole file and not the chunks, maybe that's the problem since they seem to get encrypted just like that during the upload.

  2. The encryption function does return a base64 string.

  3. I'll list some of the code used underneath this. I use RSA for the client side encryption and subsequently AES for the server side encryption.

Client side code in order

Files.insert({
  file      : files[uid],
  streams   : "dynamic",
  chunkSize : "dynamic",
  meta      : {
    path       : directory,
    org        : cid,
    owner      : ownerId,
    uploadedAt : moment().valueOf(),
    createdAt  : moment().valueOf()
  }
}, false)
.pipe(encryptFile);

encryptFile : function(data) {
  if (Meteor.isClient) {
    var nrsa        = require("node-rsa"),
        cUserPubKey = _.get(Meteor.user(), "publicKey"),
        key         = new nrsa();

    key.importKey(cUserPubKey);
    
    if (data) {
      data = key.encrypt(data, "base64", "base64");
    }
  }
  
  return data;
}

Server side code as follows

onAfterUpload : function(fileRef) {
  if (Meteor.isServer) {
    var fileData  = fs.readFileSync(fileRef.path, { encoding : "base64" }),
        decrypted = '',
        encrypted = '',
        user      = Meteor.users.findOne({ _id : _.get(fileRef, "userId") }),
        privKey   = _.get(user, "privateKey"),
        salt      = _.get(user, "salt"),
        pubKey    = _.get(user, "publicKey"),
        univ      = _.get(user, "key"),
        key       = '',
        fs        = Npm.require('fs-extra'),
        aes       = Npm.require('crypto-js/aes'),
        sha3      = Npm.require('crypto-js/sha3'),
        cryp      = Npm.require('crypto-js/x64-core'),
        nrsa      = Npm.require("node-rsa");

    privKey = aes.decrypt(privKey, sha3(someKey + salt).toString()).toString(cryp.enc.Utf8);

    key = new nrsa();
    key.importKey(privKey);
    key.importKey(pubKey);

    decrypted = key.decrypt(fileData, "base64", "base64");
    univ      = sha3(key.decrypt(univ, "utf8").toString()).toString(); // get universal key
    encrypted = aes.encrypt(decrypted, univ).toString();
    fs.removeSync(fileRef.path);
    fs.writeFileSync(fileRef.meta.path + verName, encrypted, { encoding : "utf8" });

    Files.update({ _id : fileRef._id }, {
      $set : {
        name                     : verName,
        path                     : fileRef.meta.path + verName,
        "versions.original.path" : fileRef.meta.path + verName
      }
    });
  }
}
interceptDownload : function(http, fileRef) {
  if (Meteor.isServer) {
    var usr          = Meteor.users.findOne({ _id : http.userId }),
        privKey      = _.get(usr, "pk"),
        salt         = _.get(usr, "salt"),
        pubKey       = _.get(usr, "public"),
        univ         = _.get(usr, "key"),
        fileContents = '',
        key          = '',
        buf          = '';

    fileContents = fs.readFileSync(fileRef.path, { encoding : "utf8" });

    privKey = aes.decrypt(privKey, sha3(someKey + salt).toString()).toString(cryp.enc.Utf8);

    key = new nrsa();
    key.importKey(privKey);
    key.importKey(pubKey);

    univ         = sha3(key.decrypt(univ, "utf8").toString()).toString();
    fileContents = aes.decrypt(fileContents, univ).toString(cryp.enc.Utf8);
    fileContents = cryp.enc.Base64.parse(fileContents).toString(cryp.enc.Base64);
    buf          = new Buffer(fileContents, 'base64');

    http.response.writeHead(200, { "Content-Disposition" : "attachment; filename=" + fileRef.name });
    http.response.end(buf);

    return true;

  }
  return false;
}

@dr-dimitru
Copy link
Member

@sylido thank you for update. So you should reverse encoding logic:

  1. Set constraint chunkSize
  2. Replace fs.readFileSync with reading file by chunks with exact size of chunkSize
  3. Read file as base64 and decode it chunk by chunk into single Buffer
  4. Final Buffer will be your decoded file

@sylido
Copy link
Author

sylido commented Oct 17, 2017

Thanks @dr-dimitru, just a clarification here, do you want me to set the chunk size manually, instead of dynamic - i.e. 1024 * 256 for a 256kb chunk size. Then using that chunk size, read the chunks from the HDD and decode each one of them, then merge the decrypted contents and encrypt the result once. Afterwards during download it should get decrypted only once.

@dr-dimitru
Copy link
Member

do you want me to set the chunk size manually, instead of dynamic - i.e. 1024 * 256 for a 256kb chunk size.

Right.

read the chunks from the HDD and decode each one of them, then merge the decrypted contents and encrypt the result once

Yes.

Ping me if you will have any further issues.
Once you make it work - do not hesitate to share your experience with our community.
End-to-end encryption was discussed here, but wasn't finally implemented by anyone.

@dr-dimitru
Copy link
Member

Hello @sylido ,

Any progress on this one?

@sylido
Copy link
Author

sylido commented Oct 19, 2017

Hi @dr-dimitru, I tried implementing it the way you suggested, but I'm encountering an issue where the bytes size for the chunk size when getting the data balloons to a different size after getting encrypted. Subsequently I think that causes the decryption on the server size to read less data than needed, trying to figure out how to match the chunk size before decryption to the chunk size after encryption.

I was wondering if there is a way to group up the chunks through the piping function into one and then do encryption on the whole thing ?

I'll update here on any progress I make.

@dr-dimitru
Copy link
Member

I'm afraid, it will be easier to encrypt the whole file, reading it with FileReader, then upload its result as base64 string

WDYT?

@sylido
Copy link
Author

sylido commented Oct 20, 2017

I continued trying to get .pipe() to work, but it seems to have a bug, at least with encryption. The chunksizes, even if they match on the client/server seem to contain different data. I tried using 1 stream to see if it somehow asynchronously adds chunks out of order, but that didn't seem to be the case.

To get the proper bytelength that needs to be read on the server I used a buffer.byteLength(<encoded string>, "bae64") on the client side. Then used that value for the chunks that are read by fs.read() (in my case fs.readSync()). Again it seems to read the right amount of data in this case, but just contains bogus chunk data.

So I switched to what you suggested, reading all the file contents with FileReader and using a WebWorker to do the actual encryption. This was done to unblock the browser's UI as it's completely blocked by the encryption - which could take upwards of 10 seconds depending on the file size.

Afterwards I gave the encrypted contents to the Files.insert() I'm doing and everything seems to get transferred just fine even in chunks. I don't think I even need to use base64 as the encoded version of the file could easily just be encoded in Utf8. This can save you from specifying isBase64 : true and fileName on the client side.

@dr-dimitru
Copy link
Member

@sylido thank you for update.

  1. What your final conclusion?
  2. Are you happy with current solution?

@sylido
Copy link
Author

sylido commented Oct 21, 2017

  1. There's a bug in the pipe() function when handling chunks, not sure how to debug. Open to suggestions if you don't have time to look into it.

  2. Yeah it seems like this is the way to go, since pipe() and chunks wouldn't have made the encryption part faster or unblock the UI.

Thanks a lot @dr-dimitru. Feel free to close the issue or I will if you think it's too niche of a case to spend time debugging, just let me know !

@dr-dimitru
Copy link
Member

Yeah it seems like this is the way to go, since pipe() and chunks wouldn't have made the encryption part faster or unblock the UI.

Yes, .pipe() is available only at main thread, so there no way to accomplish encryption without blocking UI.

There's a bug in the pipe() function when handling chunks, not sure how to debug. Open to suggestions if you don't have time to look into it.

My thoughts:

  1. Here I've said what encryption should be reversed for each received chunk on a server, idk how you have implemented this, and idk if it's even possible now with current API hooks, probably there is should be receiving .pipe() for incoming data
  2. There is a reason why data comes as base64 into the .pipe() and caused by deprecation of readAsBinaryString() while readAsArrayBuffer() will be too expensive preparing it for transfer, so the readAsDataURL() most useful in our case and acts in the same way in all browsers. There is might be an issue as we read data by chunks, and base64 uses = for padding, replacing empty bits of data at the end of the string, not sure if it's related, but may give a clue
  3. I'm glad you've found a way to make it work, it's efficient and non-blocking, I don't think there is a better way to accomplish this task. The only disadvantage is holding the encrypted file in a memory. So there is might be a place for implementing asynchronous analog of writableStream API for insert() method, where you can push chunks upon availability.

@sylido
Copy link
Author

sylido commented Oct 25, 2017

  1. I did implement it in the way you described, I could list the server side code used to read the chunks. Obviously when encrypting data we change the size of the chunk that the pipe() function returns, so I tried to account for that on the server side, but that didn't help either.

  2. Maybe something related to the padding, but the extra data I mentioned when reading/writing chunks doesn't seem like a bunch of =...=, so kind of stumped by what base64 does. An example of the behavior I was getting is - After Encoding Chunk(AEC) -> abacbcb=, what the pipe function returns. Then when reading a chunk of the same size as the AEC -> Encoded File Chunk(EFC) -> abacxyzxzz=. So you see that there is a part that is correct, then part of the original AEC chunk is missing and is replaced by completely different data. Maybe that's the padding data that gets removed and the next chunk replaces it.

  3. Currently not a problem as most people have enough memory to handle it. I guess eventually might have to move to writing the chunks to disk as you suggested if the files get out of hand.

@dr-dimitru
Copy link
Member

Hello @sylido ,

Thank for sharing your thoughts on it.
Sounds like you've found complete solution for your needs, question about .pipe() is still open (I may move it to separate ticket).

For now marked as [DOCUMENTATION] I'll try to extract useful info we have got during discussion into usable article about file encryption.

@sylido sylido mentioned this issue Dec 12, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants