Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inconsistent behavior in Files.exists() #859

Closed
lbergelson opened this issue Mar 11, 2022 · 8 comments · Fixed by #980 · May be fixed by #863
Closed

Inconsistent behavior in Files.exists() #859

lbergelson opened this issue Mar 11, 2022 · 8 comments · Fixed by #980 · May be fixed by #863
Assignees
Labels
api: storage Issues related to the googleapis/java-storage-nio API. priority: p2 Moderately-important priority. Fix may not be included in next release. type: question Request for information or clarification. Not an issue.

Comments

@lbergelson
Copy link
Contributor

Files.exists(gsPath) will sometimes throw an exception when a Path uses a bucket that doesn't exist.
Otherwise it returns true if the bucket doesn't exist. It seems like it should return false.

The truthyness depends on the psuedoDirectory settings.

Environment details

  1. gcloud nio
  2. OS type and version: all
  3. Java version: all
  4. version(s): current

Steps to reproduce

  1. usePseudoDirectories = true
  2. Files.exists() on a gs: path with a non existent bucket and no trailing /
  3. com.google.cloud.storage.StorageException: The specified bucket does not exist.

Code example

It crashes only when there isn't a trailing / because of an inconsistency in the way the pseudo directories are handled.

This explodes:
    Files.exists(Paths.get(new URI("gs://bucketthatdoesntexist/thing")));

These confusingly return true:
    Files.exists(Paths.get(new URI("gs://bucketthatdoesntexist/thing/")));
    Files.exists(Paths.get(new URI("gs://bucketthatdoesntexist)));

Stack trace

com.google.cloud.storage.StorageException: The specified bucket does not exist.

	at com.google.cloud.storage.spi.v1.HttpStorageRpc.translate(HttpStorageRpc.java:233)
	at com.google.cloud.storage.spi.v1.HttpStorageRpc.list(HttpStorageRpc.java:376)
	at com.google.cloud.storage.StorageImpl.lambda$listBlobs$11(StorageImpl.java:391)
	at com.google.api.gax.retrying.DirectRetryingExecutor.submit(DirectRetryingExecutor.java:105)
	at com.google.cloud.RetryHelper.run(RetryHelper.java:76)
	at com.google.cloud.RetryHelper.runWithRetries(RetryHelper.java:50)
	at com.google.cloud.storage.Retrying.run(Retrying.java:51)
	at com.google.cloud.storage.StorageImpl.listBlobs(StorageImpl.java:388)
	at com.google.cloud.storage.StorageImpl.list(StorageImpl.java:359)
	at com.google.cloud.storage.contrib.nio.CloudStoragePath.seemsLikeADirectoryAndUsePseudoDirectories(CloudStoragePath.java:126)
	at com.google.cloud.storage.contrib.nio.CloudStorageFileSystemProvider.checkAccess(CloudStorageFileSystemProvider.java:743)
	at java.nio.file.Files.exists(Files.java:2385)
	at com.google.cloud.storage.contrib.nio.it.ITGcsNio.assertNoCrashWhenBucketDoesntExist2(ITGcsNio.java:356)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
	at com.google.cloud.testing.junit4.MultipleAttemptsRule$1.evaluate(MultipleAttemptsRule.java:94)
	at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
	at org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
	at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
	at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
	at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
	at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
	at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
	at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
	at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:69)
	at com.intellij.rt.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:33)
	at com.intellij.rt.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:235)
	at com.intellij.rt.junit.JUnitStarter.main(JUnitStarter.java:54)
Caused by: com.google.api.client.googleapis.json.GoogleJsonResponseException: 404 Not Found

Any additional information below

Following these steps guarantees the quickest resolution possible.

Thanks!

@product-auto-label product-auto-label bot added the api: storage Issues related to the googleapis/java-storage-nio API. label Mar 11, 2022
@yoshi-automation yoshi-automation added the triage me I really want to be triaged. label Mar 12, 2022
@cojenco cojenco added status: investigating The issue is under investigation, which is determined to be non-trivial. and removed triage me I really want to be triaged. labels Mar 14, 2022
@sydney-munro
Copy link
Contributor

I found this note in the docs

Cloud Storage uses a flat namespace and therefore doesn't support real directories. So this library supports what's known as "pseudo-directories". Any path that includes a trailing slash, will be considered a directory. It will always be assumed to exist, without performing any I/O. This allows you to do path manipulation in the same manner as you would with the normal UNIX file system implementation. You can disable this feature with [CloudStorageConfiguration.usePseudoDirectories()](https://www.mvndoc.com/c/com.google.cloud/gcloud-java-nio/com/google/cloud/storage/contrib/nio/CloudStorageConfiguration.html#usePseudoDirectories()).

https://www.mvndoc.com/c/com.google.cloud/gcloud-java-nio/index.html

This seems to be the reasoning behind why
Files.exists(Paths.get(new URI("gs://bucketthatdoesntexist/thing/"))); returns true.

@lbergelson
Copy link
Contributor Author

@sydney-munro Ah, thanks for diving for that! From that description it's not clear though that just "gs://bucketthatdoesntexist should also return true. And it's definitely not the case that it should crash.

@lbergelson
Copy link
Contributor Author

It sounds like it would be a change of behavior, but I would expect Files.exist() to return false on any file in a bucket that doesn't exist, (or the bucket itself), even if it ends in a /. I guess the analogy I have in my head is asking the local file system if a file exists on a drive that isn't there.

@sydney-munro
Copy link
Contributor

I agree, it has to do with when we probe to actually see if we can perform I/O, but yes ultimately we should be returning some sort of error such as a FileSystemNotFoundException.

Give me until tomorrow to sort out a path forward for handling this more appropriately.

@lbergelson
Copy link
Contributor Author

Thank you for looking into it! I think we would definitely appreciate it returning false instead of an exception in the case the bucket doesn't exist. I don't think the file not existing should be an exceptional case when checking if the file exists.

Other cases like "can't determine if the file exists because you lack permissions" are obviously a different case.

@sydney-munro
Copy link
Contributor

sydney-munro commented Mar 17, 2022

Thats understandable. Our concern is deviating from what would be standard behavior. For example if a file does not exist
a standard thought would be that they are able to create this file, which is not true if the bucket does not exist.

Whereas we believe FileSystemNotFoundException is more appropriate.
See documentation for Paths.get

@sydney-munro sydney-munro removed the status: investigating The issue is under investigation, which is determined to be non-trivial. label May 3, 2022
@yoshi-automation yoshi-automation added triage me I really want to be triaged. 🚨 This issue needs some love. labels May 3, 2022
@ddelgrosso1 ddelgrosso1 added priority: p2 Moderately-important priority. Fix may not be included in next release. type: question Request for information or clarification. Not an issue. and removed 🚨 This issue needs some love. triage me I really want to be triaged. labels May 4, 2022
@sydney-munro
Copy link
Contributor

sydney-munro commented May 5, 2022

Apologies on the delay of getting back to this. So I have gone through the code and thought about the options.
What I have come up with so far is:
These return FileSystemNotFoundException:

  • Files.exists(Paths.get(new URI("gs://bucketthatdoesntexist/thing")));
  • Files.exists(Paths.get(new URI("gs://bucketthatdoesntexist)));

This returns true:

  • Files.exists(Paths.get(new URI("gs://bucketthatdoesntexist/thing/")));

This is mainly to stay consistent with my initial comment however i do find that confusing. I would much prefer to return FileSystemNotFoundException for all of these cases

@sydney-munro
Copy link
Contributor

sydney-munro commented May 10, 2022

lbergelson apologies on the back and forth here. I spoke with the team and you also mentioned some similar concerns on my approach of using CloudStorageFileSystemProvider.getPath to check the existence of a bucket regarding lifecycle and the possibility of changing state while the program runs placing us in the same position as before with an unintended storage exception.

Given this we will not be reworking the behavior and rather manage wrapping the storage exception in something more appropriate.

We will continue to have these return true, and update the documentation to be more clear.

Files.exists(Paths.get(new URI("gs://bucketthatdoesntexist/thing/")));
Files.exists(Paths.get(new URI("gs://bucketthatdoesntexist)));

This case will be handled to throw an appropriate error.

Files.exists(Paths.get(new URI("gs://bucketthatdoesntexist/thing")));

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: storage Issues related to the googleapis/java-storage-nio API. priority: p2 Moderately-important priority. Fix may not be included in next release. type: question Request for information or clarification. Not an issue.
Projects
None yet
5 participants