Store file locations #839

Open
trevorgerhardt opened this issue Nov 18, 2022 · 0 comments
Labels
cleanup t1 Time level 1: think days

Comments

@trevorgerhardt
Member

The code base is littered with code that generates file locations from metadata. That code is necessary while uploading or creating files, but once a file exists in our system we should no longer need to "create" its filename and path, only retrieve it.

For example,

  • In aggregation areas we have a method that generates the S3 Path.
  • In opportunity datasets we have methods that generate the storage location.
  • For regional analyses, results and locations are generated on demand.
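The scattered pattern might be sketched like this (hypothetical names, assuming each model independently derives a `FileStorageKey`-style value from its own metadata; these are not the actual methods):

```typescript
// Illustrative only: two models each rebuild a storage key from metadata.
type FileStorageKey = { bucket: string; path: string }

// e.g. aggregation areas derive an S3 path from region and mask ids
function aggregationAreaKey(regionId: string, maskId: string): FileStorageKey {
  return { bucket: "analysis", path: `${regionId}/mask/${maskId}.grid` }
}

// ...while opportunity datasets do nearly the same thing with separate logic
function opportunityDatasetKey(regionId: string, datasetId: string): FileStorageKey {
  return { bucket: "grids", path: `${regionId}/${datasetId}.grid` }
}
```

Each copy of this logic must stay in sync with how files were originally written, which is the fragility this issue is about.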

Each instance of generating the storage location is not problematic in and of itself, but these instances add up across the code base. Improving this would require a substantial migration of both the database and the stored files, but I believe it would be well worth it.

Storing file locations

I see two different options for storing the locations:

  1. We can store the paths directly on the models, in a common format that aligns with our "File Storage" implementation.
  2. We create a file collection in the database with an entry for each file.

We've discussed the second option and have partially implemented it with data sources, but data sources attempt to do too much. I think extracting a shared "file" collection would be very beneficial. We could model it like:

```ts
type FileItem = {
  _id: UUID
  name: string

  // Parameters to generate a `FileStorageKey` from:
  bucket: string
  path: string

  // Auth
  accessGroup: string
  createdBy: string

  // Metadata
  bytes: number // File size, in bytes
  isGzipped: boolean
  type: string // MIME Type
}
```

All other types that have a file would reference it by its `_id`. Opportunity datasets and aggregation areas, for example, would gain a `fileItemId` field.
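As a sketch of how that reference could resolve back to a storage key (the `Map` stands in for the database collection, and `storageKeyFor` is a hypothetical helper, not existing code):

```typescript
// Simplified FileItem from the proposal above (string id instead of UUID).
type FileItem = { _id: string; bucket: string; path: string }

// A model now carries only a reference to its file.
type OpportunityDataset = { _id: string; name: string; fileItemId: string }

// In-memory stand-in for the proposed "file" collection.
const files = new Map<string, FileItem>()

// Retrieve, never re-derive, the storage location for a dataset.
function storageKeyFor(dataset: OpportunityDataset): { bucket: string; path: string } {
  const file = files.get(dataset.fileItemId)
  if (!file) throw new Error(`Missing FileItem ${dataset.fileItemId}`)
  return { bucket: file.bucket, path: file.path }
}
```

The key shift: path-building logic runs once at upload time, and every later read is a lookup.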

We would also be able to look up all the files for a specific access group and calculate the total size of that group's uploaded data.
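With every file in one collection, that calculation becomes a filter-and-sum over the `bytes` and `accessGroup` fields proposed above (the array here stands in for the collection; in a real database this would be a single aggregation query):

```typescript
// Minimal FileItem slice needed for the storage-size calculation.
type FileItem = { accessGroup: string; bytes: number }

// Total bytes uploaded by one access group.
function storageUsed(files: FileItem[], accessGroup: string): number {
  return files
    .filter((f) => f.accessGroup === accessGroup)
    .reduce((total, f) => total + f.bytes, 0)
}
```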

There are certain files this would not apply for, like Taui sites, which pre-generate thousands of files.

@trevorgerhardt trevorgerhardt added cleanup t1 Time level 1: think days labels Nov 18, 2022