Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

provide a better error when trying to read a zip file without a manifest #264

Open
ctb opened this issue Mar 5, 2024 · 4 comments · May be fixed by #281
Open

provide a better error when trying to read a zip file without a manifest #264

ctb opened this issue Mar 5, 2024 · 4 comments · May be fixed by #281

Comments

@ctb
Copy link
Collaborator

ctb commented Mar 5, 2024

When we run:

scripts manysearch zip-with-manifest.zip zip-with-manifest.zip -o foo.csv

it works fine; but with

sourmash scripts manysearch zip-without-manifest.zip zip-without-manifest.zip -o foo.csv

we get

Reading query(s) from: 'zip-without-manifest.zip'
Sketch loading error: No such file or directory (os error 2)
WARNING: could not load sketches from path 'PK'
No valid signatures found in query pathlist 'zip-without-manifest.zip'
WARNING: 1 query paths failed to load. See error messages above.
No query signatures loaded, exiting.
Reading search(s) from: 'zip-without-manifest.zip'
Sketch loading error: No such file or directory (os error 2)
WARNING: could not load sketches from path 'PK'
No valid signatures found in search pathlist 'zip-without-manifest.zip'
WARNING: 1 search paths failed to load. See error messages above.
No search signatures loaded, exiting.
DONE. Processed 0 search sigs
...manysearch is done! results in 'foo.csv'

It's fine that it doesn't work but it'd be nice to provide a better error message, like "this is a zip file without a manifest, you suxor."

Related issues:

@ctb
Copy link
Collaborator Author

ctb commented Mar 19, 2024

So, this seems a little weird - in utils.rs,

pub fn collection_from_zipfile(sigpath: &Path, report_type: &ReportType) -> Result<Collection> {
    match Collection::from_zipfile(sigpath) {
        Ok(collection) => Ok(collection),
        Err(_) => bail!("failed to load {} zipfile: '{}'", report_type, sigpath\
),
    }
}

and then in sourmash collection.rs,

    pub fn from_zipfile<P: AsRef<Path>>(zipfile: P) -> Result<Self> {
        let storage = ZipStorage::from_file(zipfile)?;
        // Load manifest from standard location in zipstorage                   
        let manifest = Manifest::from_reader(storage.load("SOURMASH-MANIFEST.csv")?.as_slice())?;
        Ok(Self {
            manifest,
            storage: InnerStorage::new(storage),
        })
    }

and it would seem to me that storage.load should fail if SOURMASH-MANIFEST.csv doesn't exist in the zip file, and the error should be propogated.

@ctb
Copy link
Collaborator Author

ctb commented Mar 19, 2024

Figured it out - load_collection is trapping and ignoring the error from collection_from_zipfile.

My guess is that if we encounter an unloadable zipfile we should error-exit - there's no other collection type that should end in .zip - but will think on't.

@bluegenes
Copy link
Contributor

My guess is that if we encounter an unloadable zipfile we should error-exit - there's no other collection type that should end in .zip - but will think on't.

Agreed.

this comment is perhaps better here, since I suggested the same solution: #280 (comment)

So the reason errors are not automatically propagated up is because of way I changed sequential loading to allow manifests (in load_collection). Previously, if the extension was 'zip', we tried loading as a zipfile. If not, we tried as sig first and then fell back to pathlists. Since manifests and pathlists are so similar, I was having trouble integrating manifests into the same strategy.

Now, we try each loading function (zip > manifest > signature > pathlist). If the file can be loaded, we will load, even if that collection is empty. If we encounter an error, we track it and report the final error. I thought this was working well for reporting the right errors, but apparently not when the zip fails?

I'm definitely open to better sequential loading / error propagation strategies.

A simple zipfile fix would be to only use the zip loading for '.zip' and report errors directly. For the rest, I'm not sure how to better manage loading functions without enforcing file extensions or otherwise specifying which type of input we have (and therefore which loading methods we should try).

@ctb
Copy link
Collaborator Author

ctb commented Mar 22, 2024

Right, we run into the same problem in sourmash of course - load_save.py. There I think we catch only ValueError as "this is likely NOT the right database, so try next" and/or detect None being returned as the same thing, while other exceptions are raised and reported.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants