Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Per document access control #4673

Open
wants to merge 62 commits into
base: main
Choose a base branch
from
Open

Per document access control #4673

wants to merge 62 commits into from

Conversation

janl
Copy link
Member

@janl janl commented Jul 12, 2023

first draft implementation of apache/couchdb-documentation#424 (which is itself a little out of date, but it paints the big picture).

@janl janl marked this pull request as draft July 12, 2023 15:11
@janl janl force-pushed the rebase/access-2023 branch 2 times, most recently from 3b15aae to c51a1eb Compare July 13, 2023 13:21
@janl janl force-pushed the rebase/access-2023 branch 3 times, most recently from 9e26c53 to 5ec411a Compare July 28, 2023 09:21
@janl janl marked this pull request as ready for review July 28, 2023 12:40
@janl janl changed the title [WIP, but please review] per doc access 2023 Per document access control Jul 28, 2023
Comment on lines 72 to 86
% {ok, Resp} = fabric:query_view(Db, Options, DDoc, ViewName,
% fun view_cb/2, VAcc, Args),
% {ok, Resp#vacc.resp}.
% % TODO: This might just be a debugging leftover, we might be able
% % to undo this by just returning {ok, Resp#vacc.resp}
% % However, this *might* be here because we need to handle
% % errors here now, because access might tell us to.
% case fabric:query_view(Db, Options, DDoc, ViewName,
% fun view_cb/2, VAcc, Args) of
% {ok, Resp} ->
% {ok, Resp#vacc.resp};
% {error, Error} ->
% throw(Error)
% end.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this be removed, or is more work needed here?

Copy link
Contributor

@jaydoane jaydoane left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I pointed out some compiler warnings that I saw when poking around.

I also noticed a few TODO and other comments that I'm guessing need to still be addressed or removed?

Currently this PR has no description, and I wonder if you could add some high level documentation of how this works, or maybe link to an architectural document?

src/couch/src/couch_db.erl Outdated Show resolved Hide resolved
src/couch/src/couch_db.erl Outdated Show resolved Hide resolved
src/couch/src/couch_db.erl Outdated Show resolved Hide resolved
src/couch/src/couch_db.erl Outdated Show resolved Hide resolved
src/couch/src/couch_db.erl Outdated Show resolved Hide resolved
src/couch/src/couch_db.erl Outdated Show resolved Hide resolved
@janl
Copy link
Member Author

janl commented Aug 7, 2023

Currently this PR has no description, and I wonder if you could add some high level documentation of how this works, or maybe link to an architectural document?

sorry for not making this explicit, but if you follow the superseded PRs backwards, you should find more commentary that I didn’t think was necessary to repeat here. The main design is outlined here: apache/couchdb-documentation#424

@janl
Copy link
Member Author

janl commented Aug 7, 2023

I also noticed a few TODO and other comments that I'm guessing need to still be addressed or removed?

thanks and yes, these are all minor or already resolved, but not cleaned up yet. None of these should be major changes. I wanted to get a review on the general shape before getting everything into perfect shape.

Also thanks for the compiler warnings pass, I’ll get to those.

@janl
Copy link
Member Author

janl commented Aug 7, 2023

latest push fixes compiler warnings noted by Jay

rel/overlay/etc/default.ini Outdated Show resolved Hide resolved
@@ -1034,6 +1034,8 @@ error_info({bad_request, Error, Reason}) ->
{400, couch_util:to_binary(Error), couch_util:to_binary(Reason)};
error_info({query_parse_error, Reason}) ->
{400, <<"query_parse_error">>, Reason};
error_info(access) ->
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is too generic for me, per_doc_access_denied or something similarly descriptive please.

src/chttpd/src/chttpd_db.erl Outdated Show resolved Hide resolved
src/chttpd/src/chttpd_db.erl Outdated Show resolved Hide resolved
src/chttpd/src/chttpd_db.erl Outdated Show resolved Hide resolved
src/couch/src/couch_db_updater.erl Outdated Show resolved Hide resolved
src/couch/src/couch_db_updater.erl Outdated Show resolved Hide resolved
Roles = couch_util:get_value(<<"roles">>, UserProps, []),
case lists:member(<<"_admin">>, Roles) of
true -> Roles;
_ -> Roles ++ [<<"_users">>]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[<<"_users">> | Roles]

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we’ve been over. this before, I want to explicitly append the new role to the array so any tests and code that expect their existing roles in the beginning just keep working and you OK’d that reasoning on an older PR :)

@@ -199,7 +206,7 @@ proxy_auth_user(Req) ->
Roles =
case header_value(Req, XHeaderRoles) of
undefined -> [];
Else -> re:split(Else, "\\s*,\\s*", [trim, {return, binary}])
Else -> re:split(Else, "\\s*,\\s*", [trim, {return, binary}]) ++ [<<"_users">>]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[<<"_users">> | re:split ...]

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -248,7 +255,7 @@ jwt_authentication_handler(Req) ->
Req#httpd{
user_ctx = #user_ctx{
name = User,
roles = Roles
roles = Roles ++ [<<"_users">>]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[<<"_users">> | Roles]

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

{Body} ->
Access = couch_util:get_value(<<"_access">>, Body),
apply_open_options(Db, {ok, Doc#doc{access = Access}}, Options);
_Else ->
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will do a full review tomorrow but I am uncomfortable with a giant _Else here. what else would we get except a body? undefined or nil? let's be specific in couch_db.erl.

@arturog
Copy link

arturog commented Sep 7, 2023

I am coming back to couchdb updating an old application that has already implemented the above (per-document access, with linux-like permissions like owner/read/write access, even per-field protection on the doc). I personally don't like this feature and wouldn't like to see it merged:

  • This kind of security requirement is almost always specific to the application. In my case, for example, this solution is not fit for purpose.
  • Most of the time a front-end application server is better suited to handle security requirements, which might be complex.
  • I believe CouchDB should stick to being a database. We should refine the current codebase instead.
  • If we need to implement this kind of functionality (like in my case), we are better off providing a plugin/hook system where can enter the validate and read lifecycle of the document. Devs are free to implement what they want and we could provide a working example. (This is what I did). Providing a _custom_<field> on documents for devs to freely use would be great.
  • Are we going to filter the views? Are we going to filter the document counts on views? What about leaking information on views? Some views need some info, some others not? Will I sometimes use per-user view, sometimes the global views? And what about searching with Lucene? I feel we should not touch our indices.
  • Replicating documents into your database where the _access field changes, are we doing that? Will devs want a "master" where the _access field is allowed to change, and slaves where you won't? Is the ultimate purpose of this PR to segregate documents within your own DB? Then why not create per-user DBs?

I would be in favour of document lifecycle plugins/hooks, where we can implement this feature and many others.

@rnewson
Copy link
Member

rnewson commented Sep 7, 2023

@arturog

I think secondary indexes would need to be updated to support this for it to have the desired effect (restricting a user's visibility of a database to a subset). I'm sure we've not done that work for dreyfus/clouseau (and I definitely haven't for nouveau).

I share your concern that this implementation might not meet everyone's needs, but that's not necessarily a blocker. It just needs to meet a significant number of people's needs.

I don't find "I believe CouchDB should stick to being a database." compelling. Other databases have richer permissions and access controls that we currently do, they're not off-topic.

The PR changes a number of internal records which prevents a smooth upgrade of existing CouchDB clusters and is blocked from merging until that is fixed imo, so there's time to think.

We currently have validate_doc_update where you can implement any kind of write control logic, but no equivalent for reads (or view queries besides the deprecated _list option). If Javascript were evaluated more efficiently (c.f, the quickjs embedding idea) we might add that.

@arturog
Copy link

arturog commented Sep 7, 2023

We currently have validate_doc_update where you can implement any kind of write control logic, but no equivalent for reads (or view queries besides the deprecated _list option). If Javascript were evaluated more efficiently (c.f, the quickjs embedding idea) we might add that.

We implemented a fork of CouchDB with exactly that (a validate_doc_read), with code written in Erlang. We also added our own validate_doc_write in Erlang before the JS one kicks in. If extended, I believe it can provide not only this feature, but much more. Also, the issue of leaking out data in indices (specially to the lucene index) lead me to limit indexing to only public fields -- so the field-access was also born. Indices in CouchDB were kept untouched, and though all views had the full set of _ids, on document access (or requesting the full document via views), you would get <unauthorized>. Our implementation was not meant to present a "subset" of documents, but instead control access to the document.

Perhaps the requirement to present a subset of the database needs to be revisited? If this requirement were to be dropped, things might become easier.

@janl
Copy link
Member Author

janl commented Sep 12, 2023

@arturog thank you for your comments, practical experience with variants of the same idea here are invaluable.

This kind of security requirement is almost always specific to the application. In my case, for example, this solution is not fit for purpose.
Most of the time a front-end application server is better suited to handle security requirements, which might be complex.

I’m 100% on board with you here. That’s why this is designed as an opt-in feature. If you have a system on top of CouchSB that works for you, that will just continue to work.

To avoid confusion, let me restate the design goal of this PR: “make the db-per-user pattern obsolete”. It is specifically not “support arbitrary ACLs inside a database”.

With that out of the way, the scope of the potential work for a complete solution here is very big. In accordance with common wisdom: to built a complex system that works, one first has to build a simple system that works. The current PR reflects the simplest system that I could think of to get the minimal functionality working that satisfies the design goal.

To this end, per-user design docs (and all associated indexes of any form, JS, Mango, Search/Nouveau) are not supported. A future version of this PR however can use the new by-access-seq index to build per-user indexes, including reduces that are guaranteed to not leak any data. Global views are already supported but are limited to database admins.

Perhaps the requirement to present a subset of the database needs to be revisited? If this requirement were to be dropped, things might become easier.

Maybe you want a different feature? You are stating the explicit design goal of this PR that should be dropped.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants