Access control in Scylla
For the official Scylla documentation, see https://docs.scylladb.com/operating-scylla/security/rbac_usecase/
Note: This page is a work-in-progress and will change extensively as information is corrected and plans clarified.
This page is an overview of access control in Scylla from a user and development perspective. It describes the current method of access control, which is based on users and permissions, and is also the home-page for the project to add role-based access control. Role-based access control, when merged, will deprecate the previous user-based system.
The concepts of authentication and authorization are very different, and both are necessary for access control.
Authentication is the process of verifying the validity of a form of identification. Users can authenticate themselves with something they know (like a password), something they own (like a key), or something they are (like their fingerprint).
On the other hand, authorization is the process of granting rights to resources based on an access policy. For example, Joe is authorized to read from the transactions table, but Philip is not.
Today, access control in Scylla is centered around users.
To enable the access control system, set authenticator to PasswordAuthenticator and authorizer to CassandraAuthorizer in scylla.yaml.
This will create the necessary metadata in the cluster, as well as the cassandra superuser. Subsequent users will need to log in with their credentials. The cassandra user's password is "cassandra".
$ cqlsh -u cassandra -p cassandra
In the implementation, authentication and authorization are handled by specializations of the abstract auth::authenticator and auth::authorizer interfaces. The built-in implementations, PasswordAuthenticator and CassandraAuthorizer respectively, use regular Scylla tables to store all necessary metadata. Looking at these tables is instructive for understanding the implementation.
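To make the division of responsibilities concrete, the following C++ sketch shows two heavily simplified interfaces with in-memory stand-ins. The names authenticator and authorizer mirror the classes mentioned above, but every method signature here is an assumption for illustration: Scylla's real interfaces are asynchronous (they return Seastar futures) and differ in detail, and the in_memory_* classes are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <set>
#include <string>
#include <utility>

// Hypothetical, simplified stand-ins for auth::authenticator and
// auth::authorizer. The real Scylla interfaces are asynchronous.
using permission_set = std::set<std::string>;

struct authenticator {
    virtual ~authenticator() = default;
    // Returns the authenticated user name, or nothing on failure.
    virtual std::optional<std::string> authenticate(const std::string& user,
                                                    const std::string& password) const = 0;
};

struct authorizer {
    virtual ~authorizer() = default;
    // Returns the permissions `user` holds on `resource` (empty if none).
    virtual permission_set authorize(const std::string& user,
                                     const std::string& resource) const = 0;
};

// In-memory stand-in. A real implementation compares salted hashes,
// never plaintext passwords.
struct in_memory_authenticator : authenticator {
    std::map<std::string, std::string> passwords;
    std::optional<std::string> authenticate(const std::string& user,
                                            const std::string& password) const override {
        auto it = passwords.find(user);
        if (it != passwords.end() && it->second == password) {
            return user;
        }
        return std::nullopt;
    }
};

// In-memory stand-in keyed by (user, resource), like system_auth.permissions.
struct in_memory_authorizer : authorizer {
    std::map<std::pair<std::string, std::string>, permission_set> grants;
    permission_set authorize(const std::string& user,
                             const std::string& resource) const override {
        auto it = grants.find({user, resource});
        return it == grants.end() ? permission_set{} : it->second;
    }
};
```

Keeping the two interfaces separate is what allows the authenticator and the authorizer to be chosen independently in scylla.yaml.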
The following tables will then be created in the system_auth keyspace:
CREATE TABLE system_auth.users (
name text PRIMARY KEY,
super boolean
);
This is a record of all registered users, and marks users as superusers. Superusers have access to all resources.
CREATE TABLE system_auth.credentials (
username text PRIMARY KEY,
options map<text, text>,
salted_hash text
);
This table is queried when authenticating users. Passwords are cryptographically hashed with a salt.
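The salted_hash values shown later on this page follow the modular crypt format, $id$salt$hash, where the $6$ prefix identifies SHA-512-based crypt. The hypothetical helper below (not taken from Scylla) splits such a string into its three fields. It only handles this plain three-field form; schemes that add further fields, such as a rounds= parameter or bcrypt's cost field, are out of scope.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical helper: the fields of a "$id$salt$hash" string.
struct crypt_parts {
    std::string id;    // hashing scheme, e.g. "6" for SHA-512-crypt
    std::string salt;  // may be empty
    std::string hash;  // encoded digest
};

// Splits a three-field modular-crypt-format string; returns nothing if
// the input does not have that shape.
std::optional<crypt_parts> parse_crypt(const std::string& s) {
    if (s.empty() || s[0] != '$') return std::nullopt;  // must start with '$'
    auto second = s.find('$', 1);                       // end of the id field
    if (second == std::string::npos) return std::nullopt;
    auto third = s.find('$', second + 1);               // end of the salt field
    if (third == std::string::npos) return std::nullopt;
    return crypt_parts{s.substr(1, second - 1),
                       s.substr(second + 1, third - second - 1),
                       s.substr(third + 1)};
}
```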
CREATE TABLE system_auth.permissions (
username text,
resource text,
permissions set<text>,
PRIMARY KEY (username, resource)
);
This table stores the permissions that each user has on a per-resource basis. It is queried during authorization. Access is opt-in rather than opt-out: unless a user is explicitly granted access to a resource, that resource is not accessible to them. Permissions are granted per type of operation; the available permissions are READ, WRITE, CREATE, ALTER, DROP, SELECT, MODIFY, and AUTHORIZE.
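The opt-in rule can be modelled with one bit per operation: an empty set grants nothing, and a check succeeds only if the required bit was explicitly set. The enum below mirrors the permission names listed above, but permission, permission_set and is_allowed are hypothetical types for illustration, not Scylla's own.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <initializer_list>

// Hypothetical enum mirroring the permission names listed above.
enum class permission : std::size_t {
    READ, WRITE, CREATE, ALTER, DROP, SELECT, MODIFY, AUTHORIZE,
    count  // number of permissions; not a real permission
};

// A compact permission set: one bit per permission, all clear by default.
class permission_set {
public:
    permission_set() = default;  // empty set: nothing is granted
    permission_set(std::initializer_list<permission> ps) {
        for (permission p : ps) grant(p);
    }
    void grant(permission p) { bits_.set(static_cast<std::size_t>(p)); }
    void revoke(permission p) { bits_.reset(static_cast<std::size_t>(p)); }
    bool contains(permission p) const { return bits_.test(static_cast<std::size_t>(p)); }
    bool empty() const { return bits_.none(); }

private:
    std::bitset<static_cast<std::size_t>(permission::count)> bits_;
};

// Opt-in access: a request is allowed only if the permission was granted.
inline bool is_allowed(const permission_set& granted, permission needed) {
    return granted.contains(needed);
}
```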
Resources can be named keyspaces or tables.
As an example, consider two users: asmith and bgrant. There exists a keyspace called events and a table events.ingest.
CREATE USER asmith WITH PASSWORD 'asmith';
CREATE USER bgrant WITH PASSWORD 'bgrant';
asmith has full access to the entire events keyspace, and bgrant has SELECT permissions on the events.ingest table.
GRANT ALL PERMISSIONS ON KEYSPACE events TO asmith;
GRANT SELECT PERMISSION ON TABLE events.ingest TO bgrant;
The tables in system_auth now contain the following entries:
SELECT * FROM system_auth.users;
name | super
-----------+-------
cassandra | True
bgrant | False
asmith | False
SELECT * FROM system_auth.credentials;
username | options | salted_hash
-----------+---------+--------------------------------------------------------------------------------------------
cassandra | null | $6$$zeOx/OsV0BxfTona8nJznxkSX2kurTyb9k50aYIslcz45LJHdbHwyQwY0vTMxFW6L7MZ3D3HppgFAKdWc9.zp/
bgrant | null | $6$$5lSUNclqh7HuUusJKncVaJPVgOrh41MUKVF9gORq04saI36/r9wE9XPY7Usn5waKAATGDlutvYrXiEk/5WcFO1
asmith | null | $6$$Qcp1.ugjQ3zUrp.ooSlzpf5ZYtXBrcVSuN1bte2.CLwMXkm3I/8tdhSAtJOCLO3s5SWzPGLTueE/aqFuubFyL1
SELECT * FROM system_auth.permissions;
username | resource | permissions
----------+--------------------+--------------------------------------------------------------
bgrant | data/events/ingest | {'SELECT'}
asmith | data/events | {'ALTER', 'AUTHORIZE', 'CREATE', 'DROP', 'MODIFY', 'SELECT'}
Permissions are inherited down the resource hierarchy. Since asmith has full access to the events keyspace, he also has full access to events.ingest.
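One way to model this inheritance, assuming resource names of the data/keyspace/table form shown in the permissions table above, is to walk from a resource up through its parents and take the union of everything granted along the way. The helpers resource_chain and effective_permissions are hypothetical; the sketch illustrates the behaviour described here, not Scylla's actual lookup code.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

using permission_set = std::set<std::string>;
// (user, resource name) -> granted permissions, like system_auth.permissions.
using grant_table = std::map<std::pair<std::string, std::string>, permission_set>;

// "data/events/ingest" -> {"data/events/ingest", "data/events", "data"}
std::vector<std::string> resource_chain(std::string name) {
    std::vector<std::string> chain;
    while (true) {
        chain.push_back(name);
        auto slash = name.rfind('/');
        if (slash == std::string::npos) break;
        name.erase(slash);  // drop the last path component
    }
    return chain;
}

// Effective permissions: the union of grants on the resource and on all
// of its ancestors, so a table inherits grants made on its keyspace.
permission_set effective_permissions(const grant_table& grants,
                                     const std::string& user,
                                     const std::string& resource) {
    permission_set result;
    for (const auto& r : resource_chain(resource)) {
        auto it = grants.find({user, r});
        if (it != grants.end()) result.insert(it->second.begin(), it->second.end());
    }
    return result;
}
```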
The implementation of most CQL statements has a check_access() function (for example, select_statement::check_access() in cql3/statements/select_statement.cc), which calls service::client_state::has_column_family_access(). In turn, service::client_state::has_column_family_access() invokes auth::auth::get_permission(), which queries the permission set of a particular user and resource.
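The call chain above can be sketched as follows. The types are simplified, hypothetical stand-ins for select_statement and service::client_state: the permission lookup is injected as a plain function, the resource name format is assumed, and a denied check throws. Scylla's real code is asynchronous and its signatures differ.

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>
#include <string>

// Hypothetical permission lookup: (user, resource, permission) -> allowed?
using permission_lookup =
    std::function<bool(const std::string&, const std::string&, const std::string&)>;

struct unauthorized_error : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Simplified client state: who is logged in, and how to ask whether
// they hold a permission.
struct client_state {
    std::string user;
    permission_lookup lookup;

    // Throws unless `user` holds `perm` on the table ks.cf.
    void has_column_family_access(const std::string& ks, const std::string& cf,
                                  const std::string& perm) const {
        std::string resource = "data/" + ks + "/" + cf;
        if (!lookup(user, resource, perm)) {
            throw unauthorized_error(user + " has no " + perm + " permission on " + resource);
        }
    }
};

// Simplified statement: before executing, it checks access.
struct select_statement {
    std::string keyspace, table;
    void check_access(const client_state& state) const {
        state.has_column_family_access(keyspace, table, "SELECT");
    }
};
```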
Doing a table read every time a query accesses a resource would be slow. Thus, a permissions cache runs on each shard, holding a local copy of (user, resource) -> permission_set pairs. This cache, auth::auth::permission_cache, is an instance of a Seastar sharded service: a feature of the Seastar framework that automatically distributes a service across all logical cores of a machine.
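The benefit of one cache per shard is that each core only ever touches its own copy, so lookups need no locks; the trade-off is that entries are duplicated across shards and each shard warms its cache separately. The sketch below simulates this on a single thread by indexing caches with an explicit shard number. sharded_permission_cache is hypothetical; in Seastar, each shard runs on its own core and code would use the local instance of the sharded service instead of passing a shard index.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

using permission_set = std::set<std::string>;
using cache_key = std::pair<std::string, std::string>;  // (user, resource)

// One independent cache per shard; no shard reads another shard's map.
class sharded_permission_cache {
public:
    explicit sharded_permission_cache(std::size_t shards) : per_shard_(shards) {}

    // Returns the cached set for this shard, or nullptr on a miss.
    const permission_set* get(std::size_t shard, const cache_key& key) const {
        const auto& local = per_shard_.at(shard);
        auto it = local.find(key);
        return it == local.end() ? nullptr : &it->second;
    }

    // Stores an entry in this shard's cache only.
    void put(std::size_t shard, const cache_key& key, permission_set perms) {
        per_shard_.at(shard)[key] = std::move(perms);
    }

private:
    std::vector<std::map<cache_key, permission_set>> per_shard_;
};
```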
The cache, internally a specialization of utils::loading_cache, updates its contents based on elapsed time and capacity. The maximum number of entries is controlled by the permissions_cache_max_entries configuration variable. Entries remain in the cache for permissions_validity_in_ms milliseconds, and the cache is repopulated via auth::authorizer::authorize() every permissions_update_interval_in_ms milliseconds. CassandraAuthorizer's implementation of authorize() simply queries the system_auth.permissions table, with consistency level (CL) LOCAL_ONE, for each (user, resource) entry in the cache.
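The interaction of the three configuration variables can be sketched with a small synchronous cache. loading_permission_cache is hypothetical: max_entries and validity stand in for permissions_cache_max_entries and permissions_validity_in_ms, refresh_all() stands in for the periodic repopulation every permissions_update_interval_in_ms, and the loader stands in for authorize(). The current time is passed in explicitly so the behaviour is deterministic; utils::loading_cache itself works asynchronously, and its eviction policy may differ from the oldest-entry-first eviction used here.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <map>
#include <set>
#include <string>
#include <utility>

using permission_set = std::set<std::string>;
using cache_key = std::pair<std::string, std::string>;  // (user, resource)
using ms = std::chrono::milliseconds;

// Hypothetical synchronous loading cache with expiry and a size bound.
class loading_permission_cache {
public:
    using loader = std::function<permission_set(const cache_key&)>;

    loading_permission_cache(std::size_t max_entries, ms validity, loader load)
        : max_entries_(max_entries), validity_(validity), load_(std::move(load)) {}

    // Returns cached permissions, loading them on a miss or after expiry.
    permission_set get(const cache_key& key, ms now) {
        auto it = entries_.find(key);
        if (it != entries_.end() && now - it->second.loaded_at >= validity_) {
            entries_.erase(it);  // expired: drop it and reload below
            it = entries_.end();
        }
        if (it == entries_.end()) {
            if (entries_.size() >= max_entries_) evict_oldest();
            it = entries_.emplace(key, entry{load_(key), now}).first;
        }
        return it->second.perms;
    }

    // Reloads every cached entry; a timer would call this every
    // permissions_update_interval_in_ms.
    void refresh_all(ms now) {
        for (auto& [key, e] : entries_) e = entry{load_(key), now};
    }

    std::size_t size() const { return entries_.size(); }

private:
    struct entry {
        permission_set perms;
        ms loaded_at;
    };

    // Removes the entry that was loaded longest ago.
    void evict_oldest() {
        auto oldest = entries_.begin();
        for (auto it = entries_.begin(); it != entries_.end(); ++it)
            if (it->second.loaded_at < oldest->second.loaded_at) oldest = it;
        if (oldest != entries_.end()) entries_.erase(oldest);
    }

    std::size_t max_entries_;
    ms validity_;
    loader load_;
    std::map<cache_key, entry> entries_;
};
```

For example, with a validity of 1000 ms, an entry loaded at 0 ms is served from the cache at 500 ms but reloaded at 1500 ms.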
The primary problem of access control based only on users is that managing permissions for each individual user in a complex environment can become unwieldy. For example, if all data analysts at an organization should have SELECT access on the same ten tables, then ensuring that new users have been granted all appropriate permissions is error-prone and tedious.
One idea to resolve this tedium is to create "umbrella" users with shared identities. For example, all analysts would log in as the "analyst" user, and changes to the access rights for analysts would only need to be made once. Unfortunately, this solution violates an important principle of information security called non-repudiation: roughly, that the origin of data can be reliably traced to a particular identity. In other words, if hundreds of users share the same user identity, then tracing modifications or queries of data to a single "real" user is extremely challenging. Furthermore, if fine-grained changes to permissions need to be made for a particular analyst, then either all analysts will have to have the same change applied, or a new user will need to be created with all the same access permissions plus the additional change. This is exactly the tedium we were trying to avoid. Finally, if an employee leaves the organisation, then the password will need to be changed and all analysts will be affected.
The solution to this complexity is access control based on roles: somewhat analogous to user groups in operating systems like Linux, roles are permission sets that can be inherited by other roles.
Considering the previous example, a role "analyst" would be granted SELECT permissions on all relevant tables. When a new analyst is hired at the company, they are granted the "analyst" role and thus inherit all relevant permissions. If a particular employee needs additional permissions, they can be granted further roles as necessary. Since the employee still logs in to the system using their unique identity, their particular interaction with the system can be traced reliably.
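Role inheritance amounts to a graph walk: a role's effective permissions on a resource are its own grants plus those of every role granted to it, directly or indirectly. The sketch below is a hypothetical in-memory model; the role names alice, bob and carol and the resource data/sales/orders used to exercise it are made up for illustration. The visited set keeps the walk finite even if roles were granted to each other in a cycle.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

using permission_set = std::set<std::string>;

// Hypothetical role model: direct grants per resource, plus granted roles.
struct role {
    std::map<std::string, permission_set> permissions;  // resource -> perms
    std::vector<std::string> granted_roles;             // roles inherited from
};

using role_map = std::map<std::string, role>;

// Adds to `out` the permissions on `resource` held by `name` and by every
// role it inherits. `visited` ensures each role is processed once.
void collect(const role_map& roles, const std::string& name,
             const std::string& resource, std::set<std::string>& visited,
             permission_set& out) {
    if (!visited.insert(name).second) return;  // already processed
    auto it = roles.find(name);
    if (it == roles.end()) return;  // unknown role grants nothing
    auto p = it->second.permissions.find(resource);
    if (p != it->second.permissions.end()) out.insert(p->second.begin(), p->second.end());
    for (const auto& parent : it->second.granted_roles)
        collect(roles, parent, resource, visited, out);
}

permission_set effective_permissions(const role_map& roles, const std::string& name,
                                     const std::string& resource) {
    std::set<std::string> visited;
    permission_set out;
    collect(roles, name, resource, visited, out);
    return out;
}
```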
We can use Cassandra to demonstrate the functionality of roles with a small example. Set role_manager to CassandraRoleManager in cassandra.yaml and restart the service.
As before, the "cassandra" superuser role is created automatically. A role with login capability is analogous to the previous "user" concept.
The following tables now exist in the system_auth keyspace:
CREATE TABLE system_auth.roles (
role text PRIMARY KEY,
can_login boolean,
is_superuser boolean,
member_of set<text>,
salted_hash text
);
Analogous to the old system_auth.users table, this table is managed by CassandraRoleManager and queried by PasswordAuthenticator (which effectively couples them).
CREATE TABLE system_auth.role_members (
role text,
member text,
PRIMARY KEY (role, member)
);
Also managed by CassandraRoleManager, this table tracks, for each role, the roles that have been granted it (its members).
CREATE TABLE system_auth.role_permissions (
role text,
resource text,
permissions set<text>,
PRIMARY KEY (role, resource)
);
Analogous to the old system_auth.permissions table, this table is managed by CassandraAuthorizer and tracks the permissions that have been granted to each role on a per-resource basis.
As an example, consider a "lord" role with full permissions on the property keyspace, which includes the fields table. The "peon" role can only SELECT from property.fields. "robert" and "julia" are login roles that are granted the "peon" role.
CREATE ROLE lord;
GRANT ALL PERMISSIONS on KEYSPACE property TO lord;
CREATE ROLE peon;
GRANT SELECT PERMISSION on TABLE property.fields TO peon;
CREATE ROLE robert WITH LOGIN = true AND PASSWORD = 'robert';
CREATE ROLE julia WITH LOGIN = true AND PASSWORD = 'julia';
GRANT peon TO robert;
GRANT peon TO julia;
Internally, the tables are populated as follows (omitting the "cassandra" superuser):
SELECT * FROM system_auth.roles;
role | can_login | is_superuser | member_of | salted_hash
-----------+-----------+--------------+-----------+--------------------------------------------------------------
julia | True | False | {'peon'} | $2a$10$k0MmGeHw6sD6u/PdD9/1xeWezsHHUcBsEZfgoBc3KPdBfGz8szhVC
lord | False | False | null | null
peon | False | False | null | null
robert | True | False | {'peon'} | $2a$10$iL1paUoNiZkIrnsQb9QMQ.2SP/HcFp8Cbx51RJgOR4H1Ey6h4gNC.
SELECT * FROM system_auth.role_members;
role | member
------+--------
peon | julia
peon | robert
SELECT * FROM system_auth.role_permissions;
role | resource | permissions
-----------+----------------------+--------------------------------------------------------------
lord | data/property | {'ALTER', 'AUTHORIZE', 'CREATE', 'DROP', 'MODIFY', 'SELECT'}
peon | data/property/fields | {'SELECT'}
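The member_of column of system_auth.roles and the system_auth.role_members table hold the same relationships indexed in opposite directions: member_of answers "which roles has julia been granted?", while role_members answers "who has been granted peon?". A plausible motivation for storing both is that dropping a role requires finding all of its members efficiently, but that is an assumption. The hypothetical helper below derives the second index from the first, reproducing the role_members rows above from the member_of column.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// member_of: role -> roles it has been granted (like system_auth.roles.member_of)
using member_of_index = std::map<std::string, std::set<std::string>>;
// members: role -> roles that have been granted it (like system_auth.role_members)
using members_index = std::map<std::string, std::set<std::string>>;

// Builds the inverse index: each (member, granted role) pair in member_of
// becomes a (role, member) entry.
members_index invert(const member_of_index& member_of) {
    members_index members;
    for (const auto& [member, granted] : member_of)
        for (const auto& r : granted) members[r].insert(member);
    return members;
}
```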
Relevant commits
- Introduce role-based access control
- Add new role management permissions
- Automatically grant permissions to creators of new objects and roles
- Make '=' optional in CREATE/ALTER role statements
- Make custom role options accessible from IRoleManager
- Make syntax for role options consistent with other statements
- User/role permissions for UDFs
- DropRoleStatement should only check super-user status for existing roles
- Allow roles cache to be invalidated
- Make role-based statements backwards compatible with user-based syntax
- Preserve case properly for quoted user/role names
- Better handle invalid system roles table
Other references
This section will be regularly updated as the plan solidifies and with relevant issue numbers.
- #2929: Refactor the auth module away from global state, in both its use and implementation.
- #2987: Add a preliminary role manager (unused) with a simple security model.
- #3027: Generalize auth::data_resource to other kinds of resources.
- #2988: Add fine-grained role permissions and update the security model.
- #3216: Creators of resources are granted all permissions on them.
- #3217: Refine CQL syntax.
- TBD
- #1941: Done!