Issue with UTF-8 characters in filename #1104

Open
CleyFaye opened this issue Jun 9, 2022 · 16 comments · May be fixed by #1210

Comments

@CleyFaye

CleyFaye commented Jun 9, 2022

Hi,

I recently found that something changed in the handling of filenames containing UTF-8 characters: they now seem to be passed as-is, which was not the case before.

After investigating a bit I could reproduce the issue with the minimal code in https://github.com/CleyFaye/test-multer

I found that the browser side just passes the name as-is in the "filename" part of the header.
I've seen another issue related to using "filename*", but there are two problems with that: the browser's FormData does not use it, and RFC 7578 actually says it should not be used.

What would be the proper way to handle this? Obviously it is possible, server side, to convert the content of originalname by putting each character into an array as a byte and then interpreting that array as a UTF-8 string (it does work), but since I never had this issue with older versions, I suspect something changed in the way multer handles this.
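
The manual conversion described above could look roughly like the sketch below. `fixLatin1Filename` is a hypothetical helper, not part of multer or busboy. It assumes the name was decoded as latin1 from bytes that were actually UTF-8, and it leaves the string alone when that assumption clearly does not hold.

```javascript
// Hypothetical helper (not a multer API): re-interpret a latin1-decoded
// filename as UTF-8, but only when that is plausibly correct.
function fixLatin1Filename(name) {
  // A code unit above 0xFF cannot come from a latin1 decode, so the name
  // was not misdecoded that way; return it unchanged.
  for (let i = 0; i < name.length; i++) {
    if (name.charCodeAt(i) > 0xff) return name;
  }
  // Put every character into an array as a single byte.
  const bytes = Uint8Array.from(name, (ch) => ch.charCodeAt(0));
  try {
    // fatal: true makes decode() throw on byte sequences that are not UTF-8.
    return new TextDecoder('utf-8', { fatal: true }).decode(bytes);
  } catch {
    return name; // not valid UTF-8, keep the original string
  }
}
```

A name that is genuinely latin1 but also happens to form a valid UTF-8 byte sequence would still be re-decoded, so this is a heuristic rather than a guarantee.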

@CleyFaye
Author

CleyFaye commented Jun 9, 2022

The small test provided returned the expected filename with multer@1.4.4, and changed with multer@1.4.4-lts.1.

@dvantage

Same problem after updating from 1.4.4 to 1.4.5-lts.1.

@dvantage

dvantage commented Jun 11, 2022

Multer has nothing to do with it, Busboy has changed something.
mscdex/busboy#20

This solved my problem:

file.originalname = Buffer.from(file.originalname, 'latin1').toString('utf8')
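
For readers wondering why that one-liner works, here is a small round trip using only Node's built-in `Buffer`. The file name is an illustrative assumption; the point is that latin1 maps every byte to exactly one character, so the misdecoding loses nothing and can be reversed.

```javascript
// Illustrative name, written with escapes: "résumé.txt".
const sent = 'r\u00e9sum\u00e9.txt';

// The browser puts the UTF-8 bytes of the name on the wire.
const wireBytes = Buffer.from(sent, 'utf8');

// Decoding those bytes as latin1 turns each byte into one character,
// so every "é" (bytes C3 A9) shows up as the two characters "Ã©".
const misdecoded = wireBytes.toString('latin1');

// Converting back to bytes with latin1 restores the original bytes,
// which can then be decoded as UTF-8.
const repaired = Buffer.from(misdecoded, 'latin1').toString('utf8');

console.log(misdecoded); // rÃ©sumÃ©.txt
console.log(repaired === sent); // true
```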

@CleyFaye
Author

Multer does have something to do with this, since its behavior definitely changed in an arguably incompatible way in what looks like a patch release.

What to do about it, however, I'm not sure. Either way would be fine (interpreting the name as UTF-8 to stay consistent with previous behavior, or passing the raw string to avoid making assumptions about encoding), but I believe this kind of change in a patch release is troublesome for users.

@ghost

ghost commented Jun 14, 2022

Multer has nothing to do with it, Busboy has changed something. mscdex/busboy#20

This solved my problem:

file.originalname = Buffer.from(file.originalname, 'latin1').toString('utf8')

God bless you.

I've managed to make a bodge in my app

    const fileName = Buffer.from(el.originalname, 'latin1').toString('utf8');

because in my case invalid £$ file.txt was becoming invalid Â£$ file.txt.
Ideally this gets fixed once busboy fixes it on their end.
Thanks a lot.

BobbyWibowo added a commit to BobbyWibowo/lolisafe that referenced this issue Jul 3, 2022
@sominlee74

sominlee74 commented Jul 27, 2022

Hi, I faced the same issue with filenames in Korean. I found that the issue is related to "busboy", specifically its "defParamCharset" config property. The default value of that property is 'latin1', which means parameters such as a non-Latin filename from a form input are misdecoded on the Node.js side without proper configuration. However, "multer" gives us no option to change busboy's config properties.

I hope line 28 in '/lib/make-middleware.js' will be changed to something like:
busboy = Busboy({ headers: req.headers, limits: limits, preservePath: preservePath, defParamCharset: 'utf8' })

At the very least, some way to configure busboy through the multer module is needed.

@bf

bf commented Aug 31, 2022

This issue is still relevant. Multer should not deviate from the utf-8 default. A multer option should be created so that we can set busboy's defParamCharset.

@jhpung

jhpung commented Jan 13, 2023

I published a multer-utf8 package on npm that reads files with the utf8 charset by default.

https://www.npmjs.com/package/multer-utf8

@lujijiang

The problem still exists; please fix it soon.

@LinusU
Member

LinusU commented Apr 7, 2023

Just to clarify, in Multer 1.4.4 the name was parsed as utf-8, and in Multer 1.4.5-lts.1 it's parsed as latin1?

In that case it seems straightforward to add defParamCharset: 'utf8' so that the new version behaves the same as the previous one...

@Doc999tor

  1. Tried both defParamCharset and defCharset; neither has any effect:
multer({
	storage,
	defParamCharset: 'utf8',
	defCharset: 'utf8',
})
  2. As far as I can see, only selected options are passed from the config to busboy:
    https://github.com/expressjs/multer/blob/25794553989a674f4998b32a061dfc9287b23188/index.js#LL11C1-L23C2
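
Since those options are not forwarded, one possible stopgap is to apply the latin1-to-UTF-8 workaround from earlier in this thread in a small middleware that runs right after multer. This is only a sketch: `fixMulterFilenames` and `decodeName` are hypothetical names, and it assumes every `originalname` was decoded as latin1 from UTF-8 bytes, so it should be removed if the names ever arrive correctly decoded.

```javascript
// Hypothetical helper: undo the latin1 decode and re-read the bytes as UTF-8.
function decodeName(name) {
  return Buffer.from(name, 'latin1').toString('utf8');
}

// Hypothetical Express-style middleware, meant to be placed after multer.
function fixMulterFilenames(req, res, next) {
  const files = [];
  if (req.file) files.push(req.file); // set by upload.single()
  if (Array.isArray(req.files)) {
    files.push(...req.files); // set by upload.array() or upload.any()
  } else if (req.files) {
    // upload.fields() yields an object mapping field names to arrays of files.
    for (const list of Object.values(req.files)) files.push(...list);
  }
  for (const file of files) {
    file.originalname = decodeName(file.originalname);
  }
  next();
}
```

It would be mounted after the upload handler in the route's middleware chain, for example `upload.single('file'), fixMulterFilenames`, where `upload` is an existing multer instance and the field name is illustrative.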

@ngovanduy0908

@CleyFaye @dvantage thank you very much

@TiuBen

TiuBen commented Aug 25, 2023

Why does Postman get it right by default?

@starnayuta

@TiuBen

Postman uses “filename*”, so filename problems do not occur.
But browsers do not use it.

see #1104 (comment)

I've seen another issue related to using "filename*", but there are two problems with that: the browser's FormData does not use it
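
For context, the `filename*` form carries the charset explicitly, followed by a percent-encoded value, so the receiver does not have to guess the encoding. Below is a minimal sketch of decoding such a value. `decodeExtValue` is a hypothetical helper, not busboy's actual parser, and it only handles the UTF-8 charset.

```javascript
// Hypothetical helper: decode an extended parameter value of the form
//   charset'language'percent-encoded-text
// Returns null when the value is malformed or uses a charset other than UTF-8.
function decodeExtValue(extValue) {
  const match = /^([^']*)'([^']*)'(.*)$/.exec(extValue);
  if (!match) return null;
  const charset = match[1];
  const encoded = match[3];
  if (charset.toLowerCase() !== 'utf-8') return null;
  try {
    // decodeURIComponent interprets the percent-encoded bytes as UTF-8.
    return decodeURIComponent(encoded);
  } catch {
    return null; // truncated or invalid percent-encoding
  }
}

console.log(decodeExtValue("UTF-8''%E2%82%AC%20rates")); // € rates
```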

@TiuBen

TiuBen commented Aug 26, 2023

Where can I use filename*?

jrafaaael added a commit to jrafaaael/thing-assistant that referenced this issue Oct 10, 2023
Cohee1207 added a commit to SillyTavern/SillyTavern that referenced this issue Apr 27, 2024
@stouch

stouch commented May 1, 2024

Is this solved? I still have this issue, and Buffer.from(file.originalname, 'latin1').toString('utf8') solves it...
