Support detection of Placeholder files #355

Open
HEIC-to-JPEG-Dev opened this issue Dec 14, 2023 · 5 comments

Comments

@HEIC-to-JPEG-Dev

When getting metadata from photos, everything is fine until it hits a Placeholder file (OneDrive/iCloud, etc.).

For those files, the metadata cannot be extracted without downloading the file from the cloud, which is a very slow process, requires an internet connection, and defeats the disk-saving purpose of those files.

The information is available from the file, just not by opening it. For example, the Windows property system will have a subset of properties (depending on the sync provider), which can also be read using the system indexer.

Is there any potential for this being implemented? I understand that it would be a Windows only feature.

@drewnoakes
Owner

I'm not familiar with these placeholder files. Are they still real files that can be opened and parsed? If so, we'd be open to discussing further and might ultimately accept a PR.

@HEIC-to-JPEG-Dev
Author

They are still files, but sparse files: they are under 1 KB in size, regardless of how big the real file is. The real file is kept in a sync provider's cloud service (iCloud, OneDrive, Google, etc.).

If you try to open the file, the Windows kernel instructs the sync provider to download the file and make it fully available.

So when I run MetadataExtractor against one of these files, it technically works, because your code doesn't see what goes on behind the scenes. But it leads to every file being downloaded, which defeats the whole point of these types of files.

What placeholder-aware code should do is identify that the file is a sparse file, ask the Windows property system which properties are available (think EXIF data), and then request those properties. This is very fast.

Sync providers typically fill the 1 KB sparse file with a thumbnail, common properties for images, video, music, etc., and other information.
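
To make the detection step above concrete, here is a minimal sketch of how a caller might check a file's attributes before handing it to a metadata reader. It is not part of MetadataExtractor. The function names are hypothetical, and the choice of flags is an assumption: it treats the Windows "offline" and "recall" attribute bits as the sign of a cloud placeholder, and it deliberately does not rely on the sparse flag alone, since ordinary local files can also be sparse. Reading attributes through `os.stat` should not touch the file's content, but whether that avoids a download for every sync provider is worth verifying.

```python
import os

# Windows file attribute flags (numeric values from the Win32 headers).
FILE_ATTRIBUTE_SPARSE_FILE = 0x00000200
FILE_ATTRIBUTE_OFFLINE = 0x00001000
FILE_ATTRIBUTE_RECALL_ON_OPEN = 0x00040000
FILE_ATTRIBUTE_RECALL_ON_DATA_ACCESS = 0x00400000

# Assumption: any of these bits means that reading the content
# may cause the sync provider to download the full file.
RECALL_MASK = (
    FILE_ATTRIBUTE_OFFLINE
    | FILE_ATTRIBUTE_RECALL_ON_OPEN
    | FILE_ATTRIBUTE_RECALL_ON_DATA_ACCESS
)


def looks_like_placeholder(attributes: int) -> bool:
    """Return True if the attribute bits suggest a cloud placeholder."""
    return bool(attributes & RECALL_MASK)


def should_skip_content_read(path: str) -> bool:
    """Check a path's attributes without opening the file's data.

    st_file_attributes only exists on Windows; on other platforms
    this falls back to 0, so nothing is reported as a placeholder.
    """
    attributes = getattr(os.stat(path), "st_file_attributes", 0)
    return looks_like_placeholder(attributes)
```

A caller could use `should_skip_content_read(path)` to skip such files, or to fall back to the Windows property system, instead of opening them. Querying the property system itself is not shown here.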

@drewnoakes
Owner

> If you try to open the file, the Windows kernel instructs the sync provider to download the file and make it fully available.

How would MetadataExtractor read the file if the kernel's going to transparently intercept the file system request and do the download?

I would be concerned that any fix here would be platform-specific.

@HEIC-to-JPEG-Dev
Author

Apple has the same concept (full versions are stored in iCloud), and on Windows both Apple and Microsoft do this.
The information you're "allowed" to get is part of the placeholder file format. Go beyond that, and it will download the file.

I'm sure it's implemented differently on Windows and Mac, but anything that gets the metadata from multiple files will hit issues.

@drewnoakes
Owner

It would help if you could find some analysis or documentation about the file formats.
