Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: auto content-type on blob creation #338

Merged
merged 8 commits into from Jul 15, 2020
Merged

Conversation

dmitry-fa
Copy link
Contributor

Fixes #47

Proposed fix of the auto detection of the content type based on the blob name, like other clients do (php, nodejs, go).

The class javax.activation.MimetypesFileTypeMap from JDK is used to detect the content type. Using this class with Java 11+ requires the dependency on Java Activation Framework.

@googlebot googlebot added the cla: yes This human has signed the Contributor License Agreement. label May 26, 2020
@codecov
Copy link

codecov bot commented May 26, 2020

Codecov Report

Merging #338 into master will decrease coverage by 0.06%.
The diff coverage is 28.57%.

Impacted file tree graph

@@             Coverage Diff              @@
##             master     #338      +/-   ##
============================================
- Coverage     63.27%   63.20%   -0.07%     
- Complexity      609      619      +10     
============================================
  Files            32       32              
  Lines          5113     5123      +10     
  Branches        487      488       +1     
============================================
+ Hits           3235     3238       +3     
- Misses         1713     1720       +7     
  Partials        165      165              
Impacted Files Coverage Δ Complexity Δ
...om/google/cloud/storage/spi/v1/HttpStorageRpc.java 1.60% <11.11%> (+0.09%) 2.00 <1.00> (+1.00)
...rc/main/java/com/google/cloud/storage/Storage.java 80.54% <33.33%> (-0.36%) 0.00 <0.00> (ø)
...va/com/google/cloud/storage/spi/v1/StorageRpc.java 65.33% <100.00%> (+0.46%) 0.00 <0.00> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 12bd62c...736ed2e. Read the comment docs.

@@ -81,6 +81,7 @@
import java.util.LinkedList;
import java.util.List;
import java.util.Map;
import javax.activation.MimetypesFileTypeMap;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use com.google.common.net.MediaType so you don't introduce another dependency, and ping the guava team to get this out of beta

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

com.google.common.net.MediaType doesn't provide mapping extension to the meida type. Its parse method recognizes strings like "text/plain".

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about Files.probeContentType(Path)?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or URLConnection

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both Files.probeContentType(Path) and URLConnection work! I will use it instead. Thanks for the hint.

@@ -88,6 +88,11 @@
<groupId>org.threeten</groupId>
<artifactId>threetenbp</artifactId>
</dependency>
<dependency>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn't this included in java 6 and later?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it works with jdk7,8. This dependency is required for java9+

return contentType;
}
return firstNonNull(
FILE_NAME_MAP.getContentTypeFor(object.getName().toLowerCase()),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

toLowerCase always should have a locale

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

case "create":
return storage.create(blobInfo);
case "writer":
{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why the braces here?

String customType = "custom/type";
BlobId blobId = BlobId.of(BUCKET, names[0]);
BlobInfo blobInfo = BlobInfo.newBuilder(blobId).setContentType(customType).build();
blob = createBlob(method, blobInfo);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't reuse local variable. There are two separate blob objects here that coincidentally share a variable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had a reason to reuse the variable but after rewriting the test that reason is not actual anymore. Thanks for catching that.

private void testAutoContentType(String method) throws IOException {
String[] names = {"file1.txt", "dir with spaces/Pic.Jpg", "no_extension"};
String[] types = {"text/plain", "image/jpeg", "application/octet-stream"};
Blob blob = null;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can push this into the for loop

Copy link
Member

@frankyn frankyn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @dmitry-fa, thanks for addressing the request. Have two requests.

@@ -372,6 +375,16 @@ public StorageObject create(
}
}

private static String detectContentType(StorageObject object) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be an optional feature that's disabled by default.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking about making this feature optional, but I didn't find an appropriate place to store this setting. It could be a property of a project/client/bucket/object.

I only found: https://cloud.google.com/storage/docs/metadata#content-type
which states: If the Content-Type is not specified by the uploader and cannot be determined, it is set to application/octet-stream or application/x-www-form-urlencoded, depending on how you uploaded the object.

If I read this correctly, the Storage should attempt to determine the content type if not explicitly set, so this feature should be on by default.

BTW, right now the content type of blobs created with Storage.create() methods is null, but should be application/octet-stream.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Storage doesn't provide contentType detection in the API.

If contentType is set at creation then application/octet-stream is used.

Please make this optional otherwise it's a behavior change not originally intended on uploads with the Java client.

}

private void testAutoContentType(String method) throws IOException {
String[] names = {"file1.txt", "dir with spaces/Pic.Jpg", "no_extension"};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a way to infer the type based on bytes rather than file extension? That's how Golang does it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not always possible. In case of resumeable upload a blob is created first and the content is available later:

writer = storage.writer(blobInfo); // creating a blob, detecting content-type, no contect
writer.write(content);
writer.close();              

I did an experiment:
I created a file PDF.txt with a pdf content. Then I uploaded it as PDF.jpg with various clients and checked the resulting content type. I got the following:

gsutil: text/plain
NodeJS: image/jpeg
Go: application/pdf

To achieve consistent behavior across all the clients it should be implemented on the server-side at the moment when the upload is finished.

Until this feature is implemented I suggest type detecting based on file extension, as it has been done recently in nodejs (googleapis/nodejs-storage#1190)

@@ -372,6 +375,16 @@ public StorageObject create(
}
}

private static String detectContentType(StorageObject object) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Storage doesn't provide contentType detection in the API.

If contentType is set at creation then application/octet-stream is used.

Please make this optional otherwise it's a behavior change not originally intended on uploads with the Java client.

@dmitry-fa
Copy link
Contributor Author

@frankyn,

For the time being, objects created with Storage.create() method have content-type set to null if not explicitly set, not application/octet-stream.

Is it possible to store custom properties on the project level?
I mean something like: storage.setProperty('java.contentTypeDetection', 'extension')

@frankyn
Copy link
Member

frankyn commented Jun 30, 2020

If contentType is set at creation then application/octet-stream is used.

Whoops I meant, If contentType is not set at creation then application/octet-stream is used by the service.

Please introduce this as a new option in the request instead of a client level option for now.

@dmitry-fa
Copy link
Contributor Author

Not sure if it will be convenient to specify 'I want to detect the content type' each time when a new object is created. The content type could be provided explicitly instead.

Okay, I'll think about that. If the user specifies explicitly that he wants to detect the content type, he can also select a strategy for that: 'by extension', 'by content' or 'by both'.

@frankyn frankyn merged commit 66d1eb7 into googleapis:master Jul 15, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla: yes This human has signed the Contributor License Agreement.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Feature suggestion: auto content-type on blob creation
4 participants