Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added support for busboy configuration parameters; 'defCharset', 'defParamCharset', 'highWaterMark' and 'fileHwm' #1102

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

RopoMen
Copy link

@RopoMen RopoMen commented Jun 2, 2022

Hi,

I added support for these busboy configuration paramters. Reason for this is that I need ' defParamCharset' override from latin1 to utf-8. We have busboy 1.6.0 in our project and busboy has changed filename parsing from utf-8 to latin1 and it broke our application. Our Multer version is 1.4.5-lts.1 so these changes should be release on the lts side as well.

I would not like to add iconv-lite package to our application, I would prefer Multer to allow to set these properties.

I tried to upgrade busboy to 1.6.0 in Multer, but it broke too many unit tests (30+). I changed test/unicode.js to use filename åäöèøßð.dat and with busboy 1.6.0 result will be aÌaÌoÌeÌøÃð.dat and it can be fixed by adding busboy configuration defParamCharset = 'utf-8'

Fixes:
#320
#846

`limits` | Limits of the uploaded data. **Defaults:** check [busboy's page](https://github.com/mscdex/busboy#exports)
`preservePath` | Keep the full path of files instead of just the base name. **Default:** check [busboy's page](https://github.com/mscdex/busboy#exports)
`charset` | Character set to use when one isn't defined. Sets busboy's option `defCharset`. **Default:** `utf-8`. List of [available charsets](https://github.com/mscdex/busboy/blob/master/lib/utils.js#L384)
`paramCharset` | For multipart forms, the default character set to use for values of part header parameters (e.g. filename) that are not extended parameters (that contain an explicit charset). Sets busboy's option `defParamCharset`. **Default:** `utf-8`. List of [available charsets](https://github.com/mscdex/busboy/blob/master/lib/utils.js#L384).
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't the default charset be latin1 if the existing default is latin1 ?

You are trying to override latin1 to be utf-8 correct ( in your case ) ?

If the existing default is latin1, and you are changing the default to utf-8 then this would be a breaking change I believe

Copy link
Author

@RopoMen RopoMen Jun 3, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are trying to override latin1 to be utf-8 correct ( in your case ) ?

Yes, and the reason for that is that Busboy has parsed filenames as utf8 before this commit 3 months ago mscdex/busboy@d7e4e2d

My opinion is that Multer should set its own defaults to ensure consistent behaviour and not rely on busboys defaults, because those may change without notice. This could be dealt with fixed version pinnings in package.json, but that may not be the best option.

Also preservePath should be set to false by Multer to ensure consistent behaviour. Although with that parameter I don't think that Busboy would change it to be true. Reasoning for this is that it depend on the client whether or not it sends the path of the file or not. And most of the clients don't send it.

And for the parameter handling; I didn't want to use same parameter names as Busboy because I see that Multer could someday relay on some other solution than Busboy and for example highwatermark params were named inconsistently in Busboy.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Busboy currently is not interested if filename field is said to be utf-8, it still uses latin1 decoder at least in Busboy 1.6.0.

Here Busboy takes filename from multipart form. It does not matter whether or not filename has encoding set like this filename*=utf-8 it still uses same decoder, that is defined here and if defParamCharset is not given, decoder will be latin1

And because of this knowledge, I think that this unit test in lts branch is broken https://github.com/expressjs/multer/blob/lts/test/unicode.js#L37-L44 and proper fix would be configuring defParamCharset: 'utf8' into Busboy, either Multer provides configuration option for that charset or hardcodes one into make-middleware.js

We also tested yesterday that Chrome is not adding encoding like this filename*=utf-8 into form. (we are using FormData)

I just checked and Busboy 1.4.0 has worked ok with utf-8 filenames, but after that the breaking commit is added
Screenshot 2022-06-03 at 7 29 31

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps I should comment this Unicode unit test again https://github.com/expressjs/multer/blob/lts/test/unicode.js#L37-L44

Because Busboy is handling filename*=utf-8 in a same way as filename= AND because Busboy >=1.4.0 parsed filenames always as utf-8; that test passed always. With Busboy 1.5.0+ that test should fail.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested, it didn't fail. So I'm missing now something here. 🤔

We had Multer 1.4.2 that installed Busboy 0.2.14. Filename test-åäöèøßð.pdf is uploaded from Chrome 100
Screenshot 2022-06-03 at 9 45 23

Filename looks like that and Multer returns req.file.originalname as test-åäöèøßð.pdf

Because of this npm audit issue GHSA-wm7h-9275-46v2 we updated Multer to 1.4.5-lts.1 that installs Busboy 1.6.0

Now, browser sends file as it did previosly test-åäöèøßð.pdf, but Multer req.file.originalname is test-aÌaÌoÌeÌøÃð.pdf. This can be fixed by setting defParamCharset = utf-8 into Busboy.

@RopoMen
Copy link
Author

RopoMen commented Jun 6, 2022

@zacharytyhacz can you give any estimate how quickly this could be merged, cherry-picked into lts and making new LTS release? If ETA is 1-3 days I can wait, otherwise I need to add iconv-lite and fix this temporarily.

@zacharytyhacz
Copy link

Hey @RopoMen - I am not a maintainer or anything, I was just looking at this PR

@RopoMen
Copy link
Author

RopoMen commented Jun 7, 2022

@LinusU ping.

@RopoMen
Copy link
Author

RopoMen commented Jun 9, 2022

Also issue #1082 is related to this.

@RSS1102
Copy link

RSS1102 commented Jun 10, 2022

I also encountered the problem of "Chinese garbled code" today. Thank you. I got the solution - v1.4.5-lts 1.thanks.

@RopoMen
Copy link
Author

RopoMen commented Jun 13, 2022

I now use this temporary fix ( https://github.com/expressjs/multer#multeropts )

fileFilter: (req, file, cb) => {
  file.originalname = Buffer.from(file.originalname, 'latin1').toString(
    'utf8'
  )
  cb(null, true)
},

@ghost
Copy link

ghost commented Jul 7, 2022

I ❤️ u @RopoMen - been struggling with this issue the whole day, ur comment fixed it. Ty so much.

@marckornberger
Copy link

@RopoMen After an entire day of eternal pain, you are truly a hero! Thanks so much for posting your solution! ❤️

@adrianmxb
Copy link

It is a good idea to make this configurable.
Maybe someone can also add a notice to the recent release or something as the update from 1.44 to 1.45 is breaking stuff.

@luoxzhg
Copy link

luoxzhg commented Aug 15, 2022

original issue

a better solution

        if (!/[^\u0000-\u00ff]/.test(file.originalname)) {
            file.originalname = Buffer.from(file.originalname, 'latin1').toString('utf8')
        }

@bf
Copy link

bf commented Aug 31, 2022

What is blocking a merge of this PR?

Currently, multer in default config us unusable for file uploads with non-ASCII characters in the filename!

@luojiong
Copy link

This problem still exists until October 22 @RopoMen but thanks I solved this problem with the method you gave me。my version is 1.4.7

@Flo0806
Copy link

Flo0806 commented Oct 25, 2022

Hello!

Thx @RopoMen for your solution. But one problem i have: If I convert the originalname with your code and after i wanna download a file with this name like

...
res.set("Content-Disposition", "attachment; filename="+convertedName);
res.set("Access-Control-Expose-Headers", "Content-Disposition");
...

(I convert the name inside fileFilter and later i download this file with a NodeJS Backend using ExpressJS)

I got the follow Error:

TypeError [ERR_INVALID_CHAR]: Invalid character in header content ["Content-Disposition"]
    at ServerResponse.setHeader (node:_http_outgoing:606:3)
........
........
    at processTicksAndRejections (node:internal/process/task_queues:96:5) {
  code: 'ERR_INVALID_CHAR'

If i look at the byte code (Buffer.from(file.originalname, 'latin1')) And convert the same name by hand like Buffer.from("FileNameToConvertÄ.pdf", 'utf8') it looks different, too.

Any solution for this problem? I use Content-Disposition to get the current filename and its important for my app (a DMS Drive App).

---------- UPDATE ----------
For all with the same problem, here is a solution:

Express (were fileNames[0].name is your filename from a database as example)

 res.setHeader(
        "Content-Disposition",
        `attachment; filename*=UTF-8''${encodeURI(fileNames[0].name)}`
      );

If the route is a GET a browser will convert the filename back. If you handle it with axios or httpClient (Angular) you need to use decodeURI() to convert the name back. And all works as well!
---------- UPDATE ----------

Thx and Greetings, Florian

@RopoMen
Copy link
Author

RopoMen commented Oct 26, 2022

@Flo0806 check this npm package https://www.npmjs.com/package/content-disposition

@jhpung jhpung mentioned this pull request Jan 11, 2023
@jhpung
Copy link

jhpung commented Jan 13, 2023

I published a multer-utf8 package on npm that read files as utf8 charset by default.

https://www.npmjs.com/package/multer-utf8

It may help you until this pr is approved.

@mkesper
Copy link

mkesper commented Mar 25, 2024

How someone can think that latin1 is still preferrable to utf-8 is beyond me. Please merge a fix for this so we can use non-ASCII chars in filenames without workarounds.

@Doc999tor
Copy link

This PR #1210 is backward compatible to the current multer behavior

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet