Link to external file content from metadata #50

Open
matheus23 opened this issue Nov 14, 2022 · 6 comments

Comments

@matheus23
Member

matheus23 commented Nov 14, 2022

Goal: Solve issues raised in discussions #18 and #49.
The use case is a file system that supports more than a single byte array as a file's data, for example storing thumbnails of images.

On the public file system, this can be supported by adding CIDs that point to additional (UnixFS) data into the metadata. On the private file system, however, this doesn't work as expected: the CID isn't "discoverable", so it needs to be added to the private forest to be syncable. It also needs to be encrypted.

Thus the need for linking to external encrypted bytes from the metadata.

This would essentially mean supporting the same type that `.content` supports, but within metadata.
On the private side, metadata could then include entries that look like this:

```typescript
type ExternalContent = {
  key: Key
  blockSize: Uint64 // in bytes, at max 262132
  blockCount: Uint64
}
```
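For readers who want something executable, here is a minimal TypeScript sketch of the `ExternalContent` shape with a sanity check. It assumes `Key` is a raw `Uint8Array` and that the `Uint64` fields fit in JavaScript safe integers; `validateExternalContent` is a hypothetical helper, not part of any WNFS API. It enforces the 262132-byte limit from the type above and returns `blockSize * blockCount`, which is an upper bound on the linked content's length if (as assumed here) only the last block may be shorter.

```typescript
// Assumption: the spec's `Key` is a raw byte array and `Uint64` fits in a JS safe integer.
type Key = Uint8Array;

interface ExternalContent {
  key: Key;
  blockSize: number;  // bytes per block, at most MAX_BLOCK_SIZE
  blockCount: number; // number of blocks holding the linked content
}

// Upper bound on blockSize, taken from the comment in the type definition.
const MAX_BLOCK_SIZE = 262132;

// Hypothetical helper: rejects malformed entries and returns an upper bound
// on the number of bytes the linked content can hold.
function validateExternalContent(c: ExternalContent): number {
  if (!Number.isSafeInteger(c.blockSize) || c.blockSize <= 0) {
    throw new Error("blockSize must be a positive integer");
  }
  if (c.blockSize > MAX_BLOCK_SIZE) {
    throw new Error(`blockSize ${c.blockSize} exceeds ${MAX_BLOCK_SIZE}`);
  }
  if (!Number.isSafeInteger(c.blockCount) || c.blockCount < 0) {
    throw new Error("blockCount must be a non-negative integer");
  }
  return c.blockSize * c.blockCount;
}
```

For instance, an entry with `blockSize: 262132` and `blockCount: 3` passes and yields a bound of 786396 bytes, while a `blockSize` of 262133 is rejected.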

For example, this could be a file's metadata (here represented as JSON) with a link to some thumbnail:

```json
{
  "createdAt": 9999999,
  "modifiedAt": 9999999,
  "thumbnail": {
    "key": "<bytes>",
    "blockSize": 262132,
    "blockCount": 3
  }
}
```
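Because any metadata entry may hold such a link, a reader needs some way to recognize them. One possible approach, sketched here as an assumption rather than anything the spec defines, is a structural check: an entry counts as a link if it is an object with `key`, `blockSize` and `blockCount` fields. `isExternalContent` and `findExternalLinks` are hypothetical names, and the key stays the opaque string placeholder used in the JSON example.

```typescript
// Shape of an external-content link as it appears in the JSON example
// (the key is kept as an opaque string placeholder).
interface ExternalContentJson {
  key: string;
  blockSize: number;
  blockCount: number;
}

// Hypothetical structural check; a real encoding might tag links explicitly instead.
function isExternalContent(value: unknown): value is ExternalContentJson {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.key === "string" &&
    typeof v.blockSize === "number" &&
    typeof v.blockCount === "number"
  );
}

// Returns the metadata keys whose values look like external-content links.
function findExternalLinks(metadata: Record<string, unknown>): string[] {
  return Object.keys(metadata).filter((k) => isExternalContent(metadata[k]));
}

const example: Record<string, unknown> = JSON.parse(`{
  "createdAt": 9999999,
  "modifiedAt": 9999999,
  "thumbnail": { "key": "<bytes>", "blockSize": 262132, "blockCount": 3 }
}`);

console.log(findExternalLinks(example)); // [ 'thumbnail' ]
```

A purely structural check is ambiguous if some unrelated metadata object happens to carry the same three fields; an explicit type tag in the encoding would avoid that.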

We should also specify that implementations support operations like `read_metadata_linked(file, metadataKey: string): Bytes`.
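A rough sketch of what `read_metadata_linked` could do for a private file follows. It assumes a hypothetical `BlockSource` interface that, given the content key and a block index, returns that block's decrypted bytes; how blocks are actually named in the private forest and decrypted is deliberately left out. `readMetadataLinked`, `PrivateFile` and the explicit `blocks` parameter are illustrative only.

```typescript
// Assumed in-memory form of an external-content link (key as raw bytes).
interface ExternalContent {
  key: Uint8Array;
  blockSize: number;
  blockCount: number;
}

// Hypothetical: returns the decrypted bytes of block `index` of the content
// encrypted under `key`. Forest lookup and decryption are out of scope here.
interface BlockSource {
  getBlock(key: Uint8Array, index: number): Uint8Array;
}

// Minimal file model: metadata values may be plain values or external links.
interface PrivateFile {
  metadata: Map<string, unknown>;
}

function isExternalContent(v: unknown): v is ExternalContent {
  if (typeof v !== "object" || v === null) return false;
  const c = v as Partial<ExternalContent>;
  return c.key instanceof Uint8Array && typeof c.blockSize === "number" && typeof c.blockCount === "number";
}

// Sketch of read_metadata_linked(file, metadataKey): Bytes.
function readMetadataLinked(file: PrivateFile, metadataKey: string, blocks: BlockSource): Uint8Array {
  const entry = file.metadata.get(metadataKey);
  if (!isExternalContent(entry)) {
    throw new Error(`metadata entry "${metadataKey}" is not an external-content link`);
  }
  const parts: Uint8Array[] = [];
  let total = 0;
  for (let i = 0; i < entry.blockCount; i++) {
    const block = blocks.getBlock(entry.key, i);
    if (block.length > entry.blockSize) {
      throw new Error(`block ${i} is larger than blockSize`);
    }
    parts.push(block);
    total += block.length;
  }
  // Concatenate the blocks in index order into one byte array.
  const out = new Uint8Array(total);
  let offset = 0;
  for (const part of parts) {
    out.set(part, offset);
    offset += part.length;
  }
  return out;
}
```

Passing the block source explicitly keeps the sketch testable with an in-memory map; a real implementation would presumably get it from the private forest the file belongs to.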

@matheus23
Member Author

matheus23 commented Dec 1, 2022

One remaining question to resolve: what should happen if a process that doesn't know about the mechanics behind that metadata writes to the file?

An example:

  1. I add a picture on my Capyloon device. It creates a file with a thumbnail stored in the metadata.
  2. I open that picture in, e.g., some kind of web-drive viewer and change the image. The web-drive editor doesn't know about thumbnail metadata.

What should happen? Should the new revision inherit all metadata from the previous revision or should the new revision not have metadata associated?


Perhaps it makes sense to have two types of metadata:

  1. Metadata on the specific version of a file. Every new revision resets this metadata. Let's call this "revision metadata".
  2. Metadata on the file name/identity. New revisions inherit this metadata from old revisions. Let's just call this "file metadata".

Updating a file's content clears revision metadata but keeps file metadata.
Updating file or revision metadata creates a new revision with the updated metadata. If only the metadata changes, even the revision metadata is inherited!

These things would/could be file metadata:

  • Tags
  • MIME type (?)
  • creation time

These would/could be revision metadata:

  • modification time
  • associated thumbnail
  • person/process that created this revision

@jessmartin

@matheus23 and I discussed this, but just to capture it here: I would suggest always copying all metadata and not introducing any distinctions. Leave it up to the file writer to properly account for metadata that should be cleared or updated.

My argument is that the writer always has to know about the metadata in order to handle it correctly. If the web-drive editor doesn't know about thumbnail metadata, a thumbnail for the new version will still be missing, which is just a different kind of wrong.

To handle the case described above with thumbnails, the file's metadata could have a thumbnail_for_cid field that contains the CID of the image that the thumbnail was created for. That way, you could look at the field to know whether the thumbnail is up to date with the current version of the image.
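This proposal can be sketched as follows: every write copies all metadata, and a reader compares `thumbnail_for_cid` with the CID of the current content to detect a stale thumbnail. The sketch assumes CIDs are represented as plain strings and that metadata is a flat key/value record; `writeContent` and `isThumbnailFresh` are hypothetical helpers.

```typescript
type Metadata = Record<string, unknown>;

interface Revision {
  contentCid: string; // CID of this revision's content (plain string for illustration)
  metadata: Metadata;
}

// Every write copies all metadata; only fields the writer knows about change.
function writeContent(prev: Revision, contentCid: string, changes: Metadata = {}): Revision {
  return { contentCid, metadata: { ...prev.metadata, ...changes } };
}

// A thumbnail is current only if it was generated from this exact content.
function isThumbnailFresh(rev: Revision): boolean {
  return "thumbnail" in rev.metadata && rev.metadata["thumbnail_for_cid"] === rev.contentCid;
}
```

An editor unaware of thumbnails still copies the old thumbnail forward, but any reader can now tell that it belongs to a previous version and regenerate it.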

@fabricedesre

> To handle the case described above with thumbnails, the file's metadata could have a thumbnail_for_cid field that contains the CID of the image that the thumbnail was created for. That way, you could look at the field to know whether the thumbnail is up to date with the current version of the image.

Do you mean here that the "thumbnail_for_cid" would be a hardcoded field in the metadata, or is it possible to add whatever is needed? In Capyloon the thumbnail case is just one example of the generic concept of "variants" for a resource.

@matheus23
Member Author

> Do you mean here that the "thumbnail_for_cid" would be a hardcoded field in the metadata, or is it possible to add whatever is needed? In Capyloon the thumbnail case is just one example of the generic concept of "variants" for a resource.

Yeah, I'm pretty sure "thumbnail" was just an illustrative example from Jess.

We're exploring designs for the case when a process that doesn't understand file variants makes a write. Are file variants transferred from old revision to new revision? If so, how would we detect that they're outdated? (One option may be for them to refer to the CID they were generated from, if that's how it worked.)
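The "refer to the CID they were generated from" option can be generalized from thumbnails to arbitrary named variants. In this hypothetical sketch, each variant records a `generatedFrom` CID (a plain string here), variants are carried over unchanged on every write, and `staleVariants` reports the ones that no longer match the current content.

```typescript
// Hypothetical variant record: the variant's payload (e.g. an external-content
// link) plus the CID of the default representation it was generated from.
interface Variant {
  data: unknown;
  generatedFrom: string; // CID in plain string form, for illustration
}

type Variants = Record<string, Variant>;

// Variants are carried over to the new revision unchanged; this reports the
// ones whose source CID no longer matches the current content.
function staleVariants(variants: Variants, currentCid: string): string[] {
  return Object.keys(variants).filter((name) => variants[name].generatedFrom !== currentCid);
}

const variants: Variants = {
  thumbnail: { data: "<link>", generatedFrom: "cid-v1" },
  vcard: { data: "<link>", generatedFrom: "cid-v2" },
};
console.log(staleVariants(variants, "cid-v2")); // [ 'thumbnail' ]
```

This check is coarse: any change to the content changes its CID, so every variant derived from it looks stale, even one that doesn't depend on the part that changed. Finer-grained tracking would need more information than a single source CID.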

@fabricedesre

I think variants need to be transferred to the new revision in general. You're right that in some cases this will lead to out-of-sync variants, but not always:

  • modifying an image -> out-of-sync thumbnail.
  • modifying a contact's phone number -> no need to update the picture variant (but the vCard one would be out of date!).

In Capyloon VFS, I'm finalizing work to automatically run "transformers" whenever the default representation of a resource is modified, which addresses these use cases without involving the clients. Do you want to bring compute to wnfs? 😃
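Capyloon's actual transformer mechanism isn't described in this thread, so the following is only a hypothetical sketch of the idea: a registry maps variant names to functions that derive the variant from the default representation, and a content change regenerates every registered variant while inheriting the rest unchanged. `TransformerRegistry`, `register` and `onContentChanged` are invented names.

```typescript
// Hypothetical transformer: derives one variant from the default representation.
type Transformer = (content: Uint8Array) => Uint8Array;

interface Resource {
  content: Uint8Array;               // default representation
  variants: Map<string, Uint8Array>; // e.g. "thumbnail", "vcard"
}

class TransformerRegistry {
  private readonly transformers = new Map<string, Transformer>();

  register(variant: string, transformer: Transformer): void {
    this.transformers.set(variant, transformer);
  }

  // Called when the default representation changes. Variants are inherited
  // from the previous revision, then every variant that has a registered
  // transformer is regenerated from the new content; the rest are kept as-is.
  onContentChanged(prev: Resource, content: Uint8Array): Resource {
    const variants = new Map(prev.variants);
    this.transformers.forEach((transformer, name) => {
      variants.set(name, transformer(content));
    });
    return { content, variants };
  }
}
```

Regenerating eagerly means clients never see a stale variant, at the cost of running every registered transformer on each write, even when, as in the contact example above, a given variant would not have changed.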

@matheus23
Member Author

> Do you want to bring compute to wnfs? 😃

I know you're joking but... maybe? :P
IPVM + IPFS could be a thing :D
Just not today.
