metadata option in mc #2636

Closed
lavvy opened this issue Dec 31, 2018 · 15 comments

Comments

@lavvy

lavvy commented Dec 31, 2018

Expected behavior

./mc cp myobj play/bucket/myobj --metadata KeyName1=string,KeyName2=string

Actual behavior

Error.
Every other SDK has a way to add custom metadata, but mc does not.

@harshavardhana
Member

mc is not an SDK - you may use an SDK to do this right now.

@lavvy
Author

lavvy commented Dec 31, 2018

Thanks for the quick response. Yes, we are using an SDK for it now, but mc gives us so much portability between applications. Thanks though, because I thought it was intentionally omitted where it mattered most.
By the way, the aws cli has it covered, but for some unknown reason it is not as fast as mc.
Is there any timeline for it? @harshavardhana

@harshavardhana
Member

Thanks for the quick response. Yes, we are using an SDK for it now, but mc gives us so much portability between applications. Thanks though, because I thought it was intentionally omitted where it mattered most.
By the way, the aws cli has it covered, but for some unknown reason it is not as fast as mc.
Is there any timeline for it?

There is no easy way to do it, @lavvy; we would like to limit mc's command-line options. mc is a large-scale tool that copies millions of files. Whether this option applies to one file or to all files is complicated to reason about.

This can lead to unexpected metadata accumulating on files which were not meant to have it. aws cli implements a lot more features which we do not implement because they simply don't make sense for us; mc is built with a different idea in mind: to keep things simple and not be the equivalent of what an SDK can do. Our SDKs are as portable as mc, which is written in a portable language, so I see no reason why you would lose portability in your application, unless you are making platform-specific assumptions.

@lavvy
Author

lavvy commented Dec 31, 2018 via email

@harshavardhana
Member

mc running on the command line is portable for us in that you can use a single command string across all apps. Secondly, you can easily create a bash script to perform quick jobs. I am aware mc has to be simple enough, but there are scenarios where some kinds of files will require at least some unique metadata. It doesn't have to apply in all cases, but with minio now becoming a go-to dump for any kind of data, these special cases do arise.

We will discuss this and let you know.

@lavvy
Author

lavvy commented Dec 31, 2018

I have read a lot of your advice in many metadata-related issues where you discouraged heavy use of it. I really could not decipher whether it is because of technical or ethical reasons, but I am assuming it is either or both.
In my opinion it would be better if you could include it in mc too, because, as you know, one can easily use the Go SDK or a Python script to achieve it.
## Current behaviour
So in the end the developer will have more scripts to manage.
Every dev trying to implement it will end up with different ethics, which will clutter or fragment things more.
Uncontrolled use of metadata will still find its way into the servers.
## Expected behaviour
More homogeneity
More predictable outcomes
Controlled ethics of how metadata is used in the system.

@harshavardhana
Member

I have read a lot of your advice in many metadata-related issues where you discouraged heavy use of it. I really could not decipher whether it is because of technical or ethical reasons, but I am assuming it is either or both.

It is discouraged because metadata should ideally live in your application database; it is simply not worth storing it in object storage. You can't search using these metadata values or query object storage in any meaningful way, because there is no S3 API for this. This metadata is retrieved using a rudimentary mechanism, either HEAD or GET, i.e. it is sent back as HTTP headers. It is not even available in ListObjects, nor is there a query param to list only the relevant metadata contents, etc.

I don't see how applications make use of this cleanly. If I were writing an application using object storage, I would only be interested in keeping my data safe on storage, nothing more. Any additional layers are application constructs. It may be useful in a very niche scenario; that is why my point was that it doesn't need to be a first-class option in mc.

While on the Minio server we have moved to support all of them for compatibility reasons, I don't see how, even today, applications use it cleanly other than in some niche scenarios.
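
To illustrate the "rudimentary mechanism" described above: S3 returns user metadata as HTTP response headers carrying the `x-amz-meta-` prefix, so a client has to pick them out of the HEAD or GET response itself. The following is a minimal sketch, assuming the response headers are already available as a plain dictionary; the `user_metadata` helper and the sample header values are hypothetical and not taken from mc or any SDK.

```python
from __future__ import annotations

# S3 convention: user-defined metadata travels in headers with this prefix.
USER_META_PREFIX = "x-amz-meta-"


def user_metadata(response_headers: dict[str, str]) -> dict[str, str]:
    """Return only the user metadata from a HEAD/GET response's headers.

    Hypothetical helper: header names are matched case-insensitively and
    the prefix is stripped from the returned keys.
    """
    prefix_len = len(USER_META_PREFIX)
    return {
        name[prefix_len:].lower(): value
        for name, value in response_headers.items()
        if name.lower().startswith(USER_META_PREFIX)
    }


# Headers shaped like a HEAD response; the values are made up.
head_response = {
    "Content-Type": "application/octet-stream",
    "Content-Length": "1024",
    "X-Amz-Meta-Keyname1": "string",
    "x-amz-meta-keyname2": "string",
}

meta = user_metadata(head_response)
print(meta)
```

Note that this needs one request per object; nothing here helps with finding objects by metadata value, which is the limitation being argued above.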

@lavvy
Author

lavvy commented Dec 31, 2018 via email

@harshavardhana
Member

harshavardhana commented Dec 31, 2018

For instance you want to save a custom file that needs a custom application to open it?

You can open an object regardless of what user metadata it has. It is not even a conditional where you can say: open this object only if it matches this user metadata. It is simply an opaque value which you can use to signify some proprietary meaning. Without your application, this value is not meaningful to any other application. For applications such as mc or aws cli your custom metadata is meaningless: it can neither be interpreted nor be used for any valuable reasoning. This is what I meant by queryability and discovery, which are not possible using this metadata because the underlying protocol, S3, is not a query API. For example, if your user metadata is stored in elasticsearch, you can make intelligent decisions.

One, of course, needs a database in almost all serious applications. If you are saying that you perform ListObjects on a bucket to find the object, then it is surely a badly designed application. Object storage can never be a replacement for your database. Listing on a database is always faster than on object storage.

User metadata in the S3 protocol is not meant to be an extensible implementation, so using it to make intelligent decisions is not possible. These values, in fact, are not even respected in IAM policies or bucket policies. From all the applications and use cases we have seen, it is all about saving a few extra opaque values along with the object, which can be saved in your database as well and queried with more added benefits.

Another issue with user metadata is portability: applications need to know how to interpret these HTTP headers specially. They are not standard HTTP headers and not part of any RFC, so they don't carry any meaning outside a given application or even a protocol, i.e. S3 (Azure metadata is completely different in naming convention, etc.). The names can be vague and unintuitive; there is no guarantee or restriction on what the meaning can be, as there is no standard set of headers published.

So while we may implement adding metadata in mc (it's a minimal change), IMO I am just trying to argue about the fundamental nature of using user metadata. I can't see what this gives your application that cannot be done outside in a way that is more portable, provides more comprehensive meaning, provides richer discoverability, etc.
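
The alternative argued for above, keeping these values in the application's own database, could look roughly like this. It is only a sketch using Python's built-in `sqlite3`; the `object_meta` table, its columns, the `find_objects` helper and the sample rows are all assumptions made for illustration, not anything prescribed in this thread.

```python
import sqlite3

# In-memory database standing in for the application's own metadata store.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE object_meta (
        bucket     TEXT NOT NULL,
        object_key TEXT NOT NULL,
        name       TEXT NOT NULL,
        value      TEXT NOT NULL,
        PRIMARY KEY (bucket, object_key, name)
    )
    """
)
# An index on (name, value) makes "find objects by metadata" a cheap lookup,
# which S3 itself offers no API for.
conn.execute("CREATE INDEX idx_meta_name_value ON object_meta (name, value)")

# Made-up rows: one row per (object, metadata key).
rows = [
    ("bucket", "myobj", "project", "alpha"),
    ("bucket", "myobj", "owner", "team-a"),
    ("bucket", "report.pdf", "project", "alpha"),
    ("bucket", "notes.txt", "project", "beta"),
]
conn.executemany("INSERT INTO object_meta VALUES (?, ?, ?, ?)", rows)


def find_objects(db: sqlite3.Connection, name: str, value: str) -> list:
    """Return (bucket, object_key) pairs whose metadata `name` equals `value`."""
    cursor = db.execute(
        "SELECT bucket, object_key FROM object_meta "
        "WHERE name = ? AND value = ? ORDER BY bucket, object_key",
        (name, value),
    )
    return cursor.fetchall()


matches = find_objects(conn, "project", "alpha")
print(matches)
```

The object store then only has to hold the bytes, and the lookup by metadata never touches ListObjects.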

@kannappanr
Collaborator

@lavvy We talked internally and decided that we will implement this feature like this:
./mc cp myobj play/bucket/myobj --attr KeyName1=string,KeyName2=string

We will still have to iron out details, such as whether, when copying a whole folder/bucket to another bucket, the metadata should be applied to all the objects under that bucket, etc.
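
For readers wondering what a value like `KeyName1=string,KeyName2=string` would turn into on the wire, here is a minimal sketch. It assumes a naive comma/equals split and the standard S3 `x-amz-meta-` header prefix; `parse_attr` and `to_s3_headers` are hypothetical helpers and do not reflect how mc actually parses `--attr`.

```python
from __future__ import annotations


def parse_attr(spec: str) -> dict[str, str]:
    """Parse 'Key1=val1,Key2=val2' into a dict.

    Hypothetical helper: splits on commas, then on the first '=' of each
    pair, so values may contain '=' but not ','.
    """
    metadata: dict[str, str] = {}
    for pair in spec.split(","):
        if not pair.strip():
            continue  # tolerate empty segments such as a trailing comma
        key, sep, value = pair.partition("=")
        if not sep or not key.strip():
            raise ValueError(f"expected key=value, got {pair!r}")
        metadata[key.strip()] = value
    return metadata


def to_s3_headers(metadata: dict[str, str]) -> dict[str, str]:
    """Prefix each key with x-amz-meta-, the S3 convention for user metadata."""
    return {f"x-amz-meta-{key.lower()}": value for key, value in metadata.items()}


headers = to_s3_headers(parse_attr("KeyName1=string,KeyName2=string"))
print(headers)
```

The open question raised above stays open in this sketch: it says nothing about whether the same headers should be attached to every object when a whole folder or bucket is copied.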

@lavvy
Author

lavvy commented Jan 12, 2019

So nice. We appreciate you guys. 👍

@adarrra
Contributor

adarrra commented Jun 13, 2019

Is the --attr flag on some roadmap? Are you still planning to implement it?

@kannappanr
Collaborator

It is already there.

@adarrra
Contributor

adarrra commented Jun 13, 2019

Maybe update the docs?
#2793

@lock

lock bot commented Jun 24, 2020

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot locked as resolved and limited conversation to collaborators Jun 24, 2020