Disable accumulating files in memory while using attachFieldsToBody #385

Open · 2 tasks done
MolotovCherry opened this issue Aug 30, 2022 · 18 comments
Labels: feature request (New feature to be added), semver-minor (Issue or PR that should land as semver minor)

Comments

@MolotovCherry

MolotovCherry commented Aug 30, 2022

Prerequisites

  • I have written a descriptive issue title
  • I have searched existing issues to ensure the feature has not already been requested

🚀 Feature Proposal

Allow a flag in options to disable accumulation of files in memory, and let the caller handle the files themselves

Currently, if you want to use attachFieldsToBody, you're forced to either let the library accumulate files in memory or provide an onFile function, which also forces you to process the file (whether by streaming or accumulating).

I'd like to use the attachFieldsToBody feature, but I don't want any of my files accumulated or processed before the API handler runs, because I need to handle them separately later and stream them.

Motivation

attachFieldsToBody is a useful feature, and I'd rather not have to re-implement it just to disable the accumulation.

Example

fastify.register(require('@fastify/multipart'), { attachFieldsToBody: 'keyValues', disableAccumulation: true })
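For context, here is a sketch of how a handler might look under the proposed flag (disableAccumulation is hypothetical and not implemented; the behavior shown is the one this issue asks for):

fastify.register(require('@fastify/multipart'), {
  attachFieldsToBody: 'keyValues',
  disableAccumulation: true // hypothetical: leave file parts unconsumed
})

fastify.post('/upload', {
  // non-file fields would pass through fastify schema validation as usual
  schema: { body: { type: 'object', properties: { title: { type: 'string' } } } }
}, async function (request, reply) {
  const { title } = request.body
  // the file would remain an unconsumed stream for the handler to pipe itself
})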

@mcollina
Member

How would you implement this feature?

@MolotovCherry
Author

MolotovCherry commented Aug 30, 2022

How would you implement this feature?

I think something like this is simple enough.
fastify.register(require('@fastify/multipart'), { attachFieldsToBody: 'keyValues', disableAccumulation: true })

@mcollina
Member

Thanks for reporting! Would you like to send a Pull Request to address this issue? Remember to add unit tests.

@mcollina added the semver-minor label Aug 30, 2022
@Eomm
Member

Eomm commented Sep 4, 2022

Since this usage does not accumulate:

fastify.register(require('@fastify/multipart'), { attachFieldsToBody: true, onFile })

I would support this new usage instead of adding a new option:

fastify.register(require('@fastify/multipart'), { attachFieldsToBody: 'keyValues', onFile })

@MolotovCherry
Author

MolotovCherry commented Sep 4, 2022

Since this usage does not accumulate:

fastify.register(require('@fastify/multipart'), { attachFieldsToBody: true, onFile })

I would support this new usage instead of adding a new option:

fastify.register(require('@fastify/multipart'), { attachFieldsToBody: 'keyValues', onFile })

We need to decide how to process files by stream inside the API method itself. From what I understand, this usage requires the files to be processed before the request even reaches the API method.

Isn't that the case? Or did I miss something, and it's possible to let the API method handle the file as a stream even though an onFile callback is specified?

@Uzlopak
Contributor

Uzlopak commented Sep 4, 2022

Well, you have to store the data somewhere. There is no rewind for these file streams, so you have to store them somewhere if you want to still be able to process them later.

@MolotovCherry
Author

MolotovCherry commented Sep 4, 2022

Well, you have to store the data somewhere. There is no rewind for these file streams, so you have to store them somewhere if you want to still be able to process them later.

This was my motivation for creating the issue: you can't use attachFieldsToBody and streams in the API method at the same time. (This is why I went with a disableAccumulation option.)

Currently the only way to use streams like this is with attachFieldsToBody off.
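
For reference, the stream-based approach with attachFieldsToBody off looks roughly like this (a minimal sketch built on the plugin's documented request.file() API):

const fs = require('node:fs')
const { pipeline } = require('node:stream/promises')

fastify.register(require('@fastify/multipart')) // no attachFieldsToBody

fastify.post('/upload', async function (request, reply) {
  const part = await request.file() // first file part; part.file is the raw stream
  await pipeline(part.file, fs.createWriteStream(part.filename))
  // non-file fields are only reachable via part.fields here, not request.body
  return { uploaded: part.filename }
})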

@Uzlopak
Contributor

Uzlopak commented Sep 4, 2022

I am really confused. What do you want to achieve? You want to attach the fields to the body, but on the other hand you want to stream the file content. OK, so maybe you don't want the files to get attached to the body as strings? Is that what you want?

@MolotovCherry
Author

MolotovCherry commented Sep 4, 2022

I am really confused. What do you want to achieve? You want to attach the fields to the body, but on the other hand you want to stream the file content. OK, so maybe you don't want the files to get attached to the body as strings? Is that what you want?

Yeah, that's right. I'd like to have the fields attached to the body, but I don't want the files handled at all. I want to stream the files myself directly in the API method (not before the request reaches it).

Because I can't do both, I've had to re-implement attachFieldsToBody, but it would be much cleaner to let the library do this (after all, the option exists in the first place). That way I can use fastify schema validation on the API method, rather than having to validate manually inside the API method itself (which results in messier code).

@Uzlopak
Contributor

Uzlopak commented Sep 4, 2022

Well, imho there is no way to do this. You cannot "stop" the stream; you have to store it somewhere.
Imagine you have a form with three fields, one of them the file. Now you upload it, and first you get the file, and only after it do you get the fields. As long as you don't handle the file stream, you will not get the fields. So you have to process the file stream.

The only solution I know is to store the file in memory, or to store it in Redis.

redis/ioredis#1376

With a RedisWritable, you could store the uploaded file into Redis in onFile, and then in the handler you could read it back with a RedisReadable.
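
A minimal sketch of such a writable, assuming ioredis (RedisWritable/RedisReadable from the linked issue are a proposal, not a shipped API; this hand-rolled stand-in uses the standard Redis APPEND command):

const { Writable } = require('node:stream')
const Redis = require('ioredis')

// Hypothetical RedisWritable: appends every chunk to a Redis key
class RedisWritable extends Writable {
  constructor (redis, key) {
    super()
    this.redis = redis
    this.key = key
  }

  _write (chunk, _encoding, callback) {
    // APPEND is a standard Redis command exposed by ioredis
    this.redis.append(this.key, chunk).then(() => callback(), callback)
  }
}

// e.g. in onFile: await pipeline(part.file, new RedisWritable(new Redis(), `upload:${part.filename}`))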

@MolotovCherry
Author

Imagine you have a form with three fields, one of them the file. Now you upload it, and first you get the file, and only after it do you get the fields. As long as you don't handle the file stream, you will not get the fields. So you have to process the file stream.

Yes, in that case you would have to handle the file first, but as long as you order the fields first and make the file field the very last one, this should be fine.

@Uzlopak
Contributor

Uzlopak commented Sep 4, 2022

Well I think it will not be possible to implement it.

@v4dkou

v4dkou commented Sep 6, 2022

For the time being, I disabled attachFieldsToBody and now attach fields in a loop like this:

const fs = require('node:fs')
const util = require('node:util')
const { pipeline } = require('node:stream')
const pump = util.promisify(pipeline) // as in other examples from the docs

const body = {}
for await (const part of request.parts()) {
  if (part.file) {
    // drain the file stream to disk so the parser can move on to the next part
    await pump(part.file, fs.createWriteStream(part.filename))
  } else {
    body[part.fieldname] = part // TODO: Array handling
  }
}

What might go wrong if the async generator function .parts() assigned fields like this internally?

Of course, request.body would only be properly available after request.parts() has been iterated over, but:
a) that seems intuitive to me
b) that could be mentioned in the docs alongside the other quirks, like attachFieldsToBody: 'keyValues' converting all files with buffer.toString() by default
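
To illustrate that quirk (a sketch with a hypothetical myFile upload field, based on the behavior described above):

fastify.register(require('@fastify/multipart'), { attachFieldsToBody: 'keyValues' })

fastify.post('/upload', async function (request) {
  // each file was fully buffered and attached as a string via buffer.toString()
  request.body.myFile // file contents as a string, not a stream
  request.body.title  // ordinary non-file field value
})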

@liudichen

Maybe we can customize the onFile option: process the file stream, store the files on local disk, and then attach the local paths of the files to the body instead of the files themselves.

@Uzlopak
Contributor

Uzlopak commented Mar 17, 2023

To be honest... it is the implementer's duty to handle the streams, and tbh there are so many ready-to-use stream implementations... An implementer just needs to pipe the file streams correctly.
I am strongly against storing data on the hard disk by default. I can already see people complaining that their hard disk is full because we did not implement logic to clean up the files after they were processed. Then the next person comes and asks why we don't stream into Redis or Mongo GridFS, or, or, or...

@liudichen

liudichen commented Mar 19, 2023

To be honest... it is the implementer's duty to handle the streams, and tbh there are so many ready-to-use stream implementations... An implementer just needs to pipe the file streams correctly. I am strongly against storing data on the hard disk by default. I can already see people complaining that their hard disk is full because we did not implement logic to clean up the files after they were processed. Then the next person comes and asks why we don't stream into Redis or Mongo GridFS, or, or, or...

Yes, we can do this just by passing a custom onFile that saves to local disk:

const fs = require('node:fs')
const os = require('node:os')
const path = require('node:path')

const onFile = (field, file, filename, encoding, mimetype, body) => {
  const fileData = []
  const lastFile = body[field][body[field].length - 1]
  file.on('data', data => { if (!lastFile.limit) { fileData.push(data) } })
  file.on('limit', () => { lastFile.limit = true })
  file.on('end', () => {
    if (!lastFile.limit) {
      const buffer = Buffer.concat(fileData)
      lastFile.size = buffer.length
      const dir = path.join(os.tmpdir(), 'fastify-multipart')
      if (!fs.existsSync(dir)) { fs.mkdirSync(dir) }
      // toID() is a user-supplied unique-ID helper
      const filepath = path.join(dir, toID() + path.extname(filename))
      try {
        fs.writeFileSync(filepath, buffer)
        lastFile.filepath = filepath
      } catch (error) {
        lastFile.error = 'error when writing to disk'
      }
      delete lastFile.data
    } else {
      delete lastFile.data
    }
  })
}
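
For reference, this onFile is then passed at registration via the plugin's documented options:

fastify.register(require('@fastify/multipart'), { attachFieldsToBody: true, onFile })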

However, I suggest that the onFile configuration item could accept several additional parameters, such as the registration options, or an array parameter collecting all files (called requestFiles, for example), through which we could easily retrieve the files or clean them up (or maybe the cleanup logic could be added to the cleanRequestFiles function), to enhance the customizability of the onFile configuration item.

@felicienfrancois

I have the exact same issue.

Here is what I would like to achieve:

fastify.post("/publish", function (request, reply) {
  const body = request.body

  body.mainImageFile.file
    .pipe(imageResizer(1000, 1000)) // imageResizer/cloudStorage: my own transform and sink streams
    .pipe(cloudStorage("some/path/" + body.mainImageFile.filename))
  body.otherImageFile.file
    .pipe(imageResizer(500, 500))
    .pipe(cloudStorage("some/path/" + body.otherImageFile.filename))

  // use other fields as strings
  body.title       // title as string
  body.description // description as string
  // ...
})

I even think it should be the default behavior. Fields are named in the request form data, so I expect to retrieve all fields (file and non-file) by their name in the body.

The attachFieldsToBody flag has probably been created for this purpose.
It processes non-file fields as expected.
JSON files are loaded and parsed, which is OK behavior in my opinion, as it's the most common usage of such input.
Binary files are loaded into memory (in a buffer) and set as the value (losing the contentType and filename metadata).

I'm not sure that fits the usual need. I think binary files should be usable as streams, and the metadata should be available in the value.

So I tried to do it with an onFile function:

// does not work
onFile: function(part) {
  part.value = part;
}

But this does not work, as onFile must consume the stream in order to reach the next field (or the end): the API method is only called once all files have been consumed.

@maximelafarie

maximelafarie commented Nov 16, 2023

I'm in the same situation as you @felicienfrancois.
That said, I don't think attaching them to the body changes anything about the general need to retrieve only the non-file fields in JSON format in a more convenient way.

I think a "helper" function would be a good thing, allowing us to do some variable destructuring.
E.g.:

const { fields, filename, mimetype } = await request.file(); // Actual behavior
const { myNonFileField, myOtherField } = fields.getFields(); // would be nice (but may return files)

// const { myNonFileField, myOtherField } = fields.getFields({ includeFiles: false }); // would be even nicer

Since the actual fields value (from await request.file()) looks like:

fields: <ref *2> {
    file: [Circular *1],
    myExampleField: {
      type: 'field',
      fieldname: 'myExampleField',
      mimetype: 'text/plain',
      encoding: '7bit',
      value: 'myValue',
      fieldnameTruncated: false,
      valueTruncated: false,
      fields: [Circular *2]
    }
  }

The getFields() helper could just filter the fields by their mimetype, or even by whether the key is different from file.
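
A minimal sketch of such a helper (hypothetical, not part of @fastify/multipart), filtering on the type property visible in the dump above:

// Hypothetical getFields(): keep only non-file parts and return their values
function getFields (fields, { includeFiles = false } = {}) {
  const result = {}
  for (const [name, part] of Object.entries(fields)) {
    if (part.type === 'field') {
      result[name] = part.value
    } else if (includeFiles) {
      result[name] = part
    }
  }
  return result
}

// const { myExampleField } = getFields(fields)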

I don't know what @mcollina and @Eomm think about that.

@gurgunday added the feature request label Jan 28, 2024