Access control in Scylla

Tzach Livyatan edited this page Sep 11, 2018 · 11 revisions

For the formal Scylla documentation, see https://docs.scylladb.com/operating-scylla/security/rbac_usecase/

Note: This page is a work-in-progress and will change extensively as information is corrected and plans clarified.

This page is an overview of access control in Scylla from a user and development perspective. It describes the current method of access control, which is based on users and permissions, and is also the home page for the project to add role-based access control. Role-based access control, when merged, will supersede the existing user-based system.

Authentication and authorization

The concepts of authentication and authorization are very different, and both are necessary for access control.

Authentication is the process of verifying the validity of a form of identification. Users can authenticate themselves with something they know (like a password), something they own (like a key), or something they are (like their fingerprint).

On the other hand, authorization is the process of granting rights to resources based on an access policy. For example, Joe is authorized to read from the transactions table, but Philip is not.

User-based access control

Today, access control in Scylla is centered around users.

CQL perspective

To enable the access control system, set authenticator to PasswordAuthenticator and authorizer to CassandraAuthorizer in scylla.yaml.

This will create the necessary metadata in the cluster, as well as the cassandra user. From then on, users must log in with their credentials. The cassandra user's password is "cassandra".

$ cqlsh -u cassandra -p cassandra

In the implementation, authentication and authorization are handled by specializations of the abstract auth::authenticator and auth::authorizer interfaces. The default implementations, PasswordAuthenticator and CassandraAuthorizer respectively, use regular Scylla tables to store all necessary metadata. Looking at these tables is instructive for understanding the implementation.

The following tables will then be created in the system_auth keyspace:

CREATE TABLE system_auth.users (
  name text PRIMARY KEY,
  super boolean
);

This table records all registered users and whether each one is a superuser. Superusers have access to all resources.

CREATE TABLE system_auth.credentials (
  username text PRIMARY KEY,
  options map<text, text>,
  salted_hash text
);

This table is queried when authenticating users. Passwords are cryptographically hashed with salt.
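To make the idea of a salted hash concrete, here is a minimal Python sketch. It is not Scylla's scheme: the `$6$` prefix of the stored hashes suggests a SHA-512 crypt(3)-style format, whereas this sketch swaps in PBKDF2-HMAC-SHA512 from the Python standard library. The function names and the iteration count are illustrative assumptions.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

ITERATIONS = 100_000  # illustrative work factor, not a Scylla setting


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, digest); a fresh random salt is drawn when none is given."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha512", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Re-hash the candidate with the stored salt and compare in constant time."""
    _, candidate = hash_password(password, salt)
    return hmac.compare_digest(candidate, expected)
```

Because every password gets its own random salt, two users with the same password end up with different stored digests, which defeats precomputed lookup tables.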

CREATE TABLE system_auth.permissions (
  username text,
  resource text,
  permissions set<text>,
  PRIMARY KEY (username, resource)
);

This table stores the permissions that each user has on a per-resource basis. It is queried during authorization. Access is opt-in rather than opt-out: a resource is not accessible to a user unless access has been explicitly granted. Permissions are granular and correspond to different types of operations: CREATE, ALTER, DROP, SELECT, MODIFY, and AUTHORIZE. Resources can be named keyspaces or tables.

As an example, consider two users: asmith and bgrant. There exists a keyspace called events and a table events.ingest.

CREATE USER asmith WITH PASSWORD 'asmith';
CREATE USER bgrant WITH PASSWORD 'bgrant';

asmith has full access to the entirety of the events keyspace, and bgrant has SELECT permission on the events.ingest table.

GRANT ALL PERMISSIONS ON KEYSPACE events TO asmith;
GRANT SELECT PERMISSION ON TABLE events.ingest TO bgrant;

The tables in system_auth now contain the following entries:

SELECT * FROM system_auth.users;

 name      | super
-----------+-------
 cassandra |  True
    bgrant | False
    asmith | False

SELECT * FROM system_auth.credentials;

 username  | options | salted_hash
-----------+---------+--------------------------------------------------------------------------------------------
 cassandra |    null | $6$$zeOx/OsV0BxfTona8nJznxkSX2kurTyb9k50aYIslcz45LJHdbHwyQwY0vTMxFW6L7MZ3D3HppgFAKdWc9.zp/
    bgrant |    null | $6$$5lSUNclqh7HuUusJKncVaJPVgOrh41MUKVF9gORq04saI36/r9wE9XPY7Usn5waKAATGDlutvYrXiEk/5WcFO1
    asmith |    null | $6$$Qcp1.ugjQ3zUrp.ooSlzpf5ZYtXBrcVSuN1bte2.CLwMXkm3I/8tdhSAtJOCLO3s5SWzPGLTueE/aqFuubFyL1

SELECT * FROM system_auth.permissions;

 username | resource           | permissions
----------+--------------------+--------------------------------------------------------------
   bgrant | data/events/ingest |                                                   {'SELECT'}
   asmith |        data/events | {'ALTER', 'AUTHORIZE', 'CREATE', 'DROP', 'MODIFY', 'SELECT'}

Permissions are transitive down the resource hierarchy: a permission granted on a keyspace also applies to every table in it. Since asmith has full access to the events keyspace, asmith also has full access to events.ingest.
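The opt-in and transitive behaviour can be sketched in a few lines of Python. This is an illustration only, not the Scylla implementation: the function names are hypothetical, and the dictionary simply mirrors the rows of system_auth.permissions shown above. A check walks from the requested resource up through its ancestors and collects every grant it finds.

```python
from typing import Dict, Iterator, Set, Tuple

# (username, resource) -> granted permissions, mirroring system_auth.permissions.
PermissionTable = Dict[Tuple[str, str], Set[str]]

permissions: PermissionTable = {
    ("bgrant", "data/events/ingest"): {"SELECT"},
    ("asmith", "data/events"): {"ALTER", "AUTHORIZE", "CREATE", "DROP", "MODIFY", "SELECT"},
}


def resource_chain(resource: str) -> Iterator[str]:
    """Yield the resource, then each ancestor: data/events/ingest, data/events, data."""
    parts = resource.split("/")
    for end in range(len(parts), 0, -1):
        yield "/".join(parts[:end])


def effective_permissions(table: PermissionTable, username: str, resource: str) -> Set[str]:
    """Union of the grants on the resource and on all of its ancestors."""
    granted: Set[str] = set()
    for name in resource_chain(resource):
        granted |= table.get((username, name), set())
    return granted


def is_authorized(table: PermissionTable, username: str, resource: str, permission: str) -> bool:
    """Default deny: only an explicit grant somewhere in the chain allows access."""
    return permission in effective_permissions(table, username, resource)
```

With the example data, `is_authorized(permissions, "asmith", "data/events/ingest", "MODIFY")` holds because of the keyspace-level grant, while bgrant has nothing beyond SELECT on the one table.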

More on the Implementation

The implementations of most CQL statements have a check_access() function (for example, select_statement::check_access() in cql3/statements/select_statement.cc). In turn, service::client_state::has_column_family_access() invokes auth::auth::get_permission(), which queries the permission set of a particular user and resource.

Reading the permissions table on every query against any resource would be slow. Thus, a permissions cache runs per shard and holds a local copy of the (user, resource) -> permission_set mappings. This cache, auth::auth::permission_cache, is an instance of a Seastar sharded service: a feature of the Seastar framework that automatically distributes a service across all logical cores of a machine.

The cache, internally a specialization of utils::loading_cache, updates its contents based on elapsed time and capacity. The maximum number of entries is controlled by the permissions_cache_max_entries configuration variable. Entries remain in the cache for permissions_validity_in_ms milliseconds, and the cache is repopulated via auth::authorizer::authorize() every permissions_update_interval_in_ms milliseconds. The default authorizer's implementation of authorize() simply queries the system_auth.permissions table, with consistency level (CL) LOCAL_ONE, for all (user, resource) entries in the cache.
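The general shape of such a cache can be sketched in Python. This is a simplified illustration, not a port of utils::loading_cache: the class name, the loader callback, and the injectable clock are all hypothetical, and the sketch reloads an expired entry lazily on access instead of refreshing entries periodically in the background.

```python
import time
from collections import OrderedDict
from typing import Callable, FrozenSet, Iterable, Tuple

Key = Tuple[str, str]  # (user, resource)


class PermissionsCache:
    """Bounded cache of permission sets with a fixed validity period."""

    def __init__(self, loader: Callable[[str, str], Iterable[str]],
                 max_entries: int, validity_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._loader = loader            # stands in for the authorizer's table query
        self._max_entries = max_entries  # plays the role of permissions_cache_max_entries
        self._validity_s = validity_s    # plays the role of permissions_validity_in_ms
        self._clock = clock
        self._entries: "OrderedDict[Key, Tuple[float, FrozenSet[str]]]" = OrderedDict()

    def get(self, user: str, resource: str) -> FrozenSet[str]:
        key = (user, resource)
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._validity_s:
            return entry[1]  # still valid: no table read needed
        perms = frozenset(self._loader(user, resource))
        self._entries[key] = (now, perms)
        self._entries.move_to_end(key)
        while len(self._entries) > self._max_entries:
            self._entries.popitem(last=False)  # evict the least recently loaded entry
        return perms
```

One such instance per shard would avoid cross-core synchronization, at the cost of each shard loading its own copy of an entry.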

Limitations of user-based access control

The primary problem with access control based only on users is that managing permissions for each individual user in a complex environment can become unwieldy. For example, if all data analysts at an organization should have SELECT access on the same ten tables, then ensuring that each new user has been granted all appropriate permissions is error-prone and tedious.

One idea to resolve this tedium is to create "umbrella" users with shared identities. For example, all analysts would log in as the "analyst" user, and changes to the access rights for analysts would only need to be made once. Unfortunately, this solution violates an important principle of information security called non-repudiation: roughly, that the origin of data can be reliably traced to a particular identity. In other words, if hundreds of users share the same user identity, then tracing modifications or queries of data to a single "real" user is extremely challenging. Furthermore, if fine-grained changes to permissions need to be made for a particular analyst, then either all analysts will have to have the same change applied, or a new user will need to be created with all the same access permissions and the additional change. This tedium is what we were trying to avoid. Finally, if an employee leaves the organization, then the password will need to be changed and all analysts will be affected.

The solution to this complexity is access control based on roles: somewhat analogous to user groups in operating systems like Linux, roles are permission sets that can be inherited by other roles.

Considering the previous example, a role "analyst" would be granted SELECT permissions to all relevant tables. When a new analyst is hired at the company, they are granted the "analyst" role and thus they inherit all relevant permissions. If a particular employee needs additional permissions, they can be granted roles as necessary. Since the employee still logs in to the system using their unique identity, their particular interaction with the system can be traced reliably.

Role-based access control

CQL perspective

We can use Cassandra to demonstrate the functionality of roles with a small example. Set role_manager to CassandraRoleManager in cassandra.yaml and restart the service.

As before, a "cassandra" superuser role is created automatically. A role with login capability is analogous to the previous "user" concept.

The following tables now exist in the system_auth keyspace:

CREATE TABLE system_auth.roles (
  role text PRIMARY KEY,
  can_login boolean,
  is_superuser boolean,
  member_of set<text>,
  salted_hash text
);

Analogous to the old system_auth.users table, this table is managed by CassandraRoleManager and queried by PasswordAuthenticator (which effectively couples them).

CREATE TABLE system_auth.role_members (
  role text,
  member text,
  PRIMARY KEY (role, member)
);

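Also managed by CassandraRoleManager, this table tracks the members of each role, that is, the roles to which a given role has been granted. It is the inverse of the member_of column in system_auth.roles.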

CREATE TABLE system_auth.role_permissions (
  role text,
  resource text,
  permissions set<text>,
  PRIMARY KEY (role, resource)
);

Analogous to the old system_auth.permissions, this table is managed by CassandraAuthorizer and tracks the permissions that have been granted to each role on a per-resource basis.

As an example, consider a "lord" role with full permissions to the fields table in the property keyspace. The "peon" role can only SELECT on fields. "robert" and "julia" are login roles that are granted the "peon" role.

CREATE ROLE lord;
GRANT ALL PERMISSIONS ON KEYSPACE property TO lord;
CREATE ROLE peon;
GRANT SELECT PERMISSION ON TABLE property.fields TO peon;
CREATE ROLE robert WITH LOGIN = true AND PASSWORD = 'robert';
CREATE ROLE julia WITH LOGIN = true AND PASSWORD = 'julia';
GRANT peon TO robert;
GRANT peon TO julia;

Internally, the tables are populated as follows (omitting the "cassandra" superuser):

SELECT * FROM system_auth.roles;

 role      | can_login | is_superuser | member_of | salted_hash
-----------+-----------+--------------+-----------+--------------------------------------------------------------
     julia |      True |        False |  {'peon'} | $2a$10$k0MmGeHw6sD6u/PdD9/1xeWezsHHUcBsEZfgoBc3KPdBfGz8szhVC
      lord |     False |        False |      null |                                                         null
      peon |     False |        False |      null |                                                         null
    robert |      True |        False |  {'peon'} | $2a$10$iL1paUoNiZkIrnsQb9QMQ.2SP/HcFp8Cbx51RJgOR4H1Ey6h4gNC.

SELECT * FROM system_auth.role_members;

 role | member
------+--------
 peon |  julia
 peon | robert

SELECT * FROM system_auth.role_permissions;

 role      | resource             | permissions
-----------+----------------------+--------------------------------------------------------------
      lord |        data/property | {'ALTER', 'AUTHORIZE', 'CREATE', 'DROP', 'MODIFY', 'SELECT'}
      peon | data/property/fields |                                                   {'SELECT'}
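How a login role picks up permissions through the roles granted to it can be sketched as a small graph walk in Python. This is an illustration under stated assumptions, not Cassandra's or Scylla's code: the function names are hypothetical, the dictionaries mirror the member_of column and the role_permissions rows shown above, and for brevity the lookup matches the resource name exactly instead of also walking up to the enclosing keyspace.

```python
from typing import Dict, Set, Tuple

ALL = {"ALTER", "AUTHORIZE", "CREATE", "DROP", "MODIFY", "SELECT"}

# role -> roles granted to it, mirroring the member_of column of system_auth.roles.
member_of: Dict[str, Set[str]] = {
    "julia": {"peon"},
    "robert": {"peon"},
    "lord": set(),
    "peon": set(),
}

# (role, resource) -> permissions, mirroring system_auth.role_permissions.
role_permissions: Dict[Tuple[str, str], Set[str]] = {
    ("lord", "data/property"): set(ALL),
    ("peon", "data/property/fields"): {"SELECT"},
}


def inherited_roles(role: str, member_of: Dict[str, Set[str]]) -> Set[str]:
    """Return the role itself plus every role it inherits, directly or indirectly."""
    seen: Set[str] = set()
    stack = [role]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # already visited; also guards against cyclic grants
        seen.add(current)
        stack.extend(member_of.get(current, ()))
    return seen


def permissions_on(role: str, resource: str,
                   member_of: Dict[str, Set[str]],
                   role_permissions: Dict[Tuple[str, str], Set[str]]) -> Set[str]:
    """Union of the permissions granted on the resource to the role or any inherited role."""
    granted: Set[str] = set()
    for r in inherited_roles(role, member_of):
        granted |= role_permissions.get((r, resource), set())
    return granted
```

With the example data, julia and robert each hold only SELECT on data/property/fields, through peon. Granting them more would be a single additional GRANT of another role, with no per-user permission rows to maintain.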

Implementation and feature references in Cassandra

  1. Relevant commits

  2. Other references

Project milestones

This section will be regularly updated as the plan solidifies and with relevant issue numbers.

  • #2929: Refactor the auth module away from global state, in both its use and implementation.
  • #2987: Add a preliminary role manager (unused) with a simple security model.
  • #3027: Generalize auth::data_resource to other kinds of resources.
  • #2988: Add fine-grained role permissions and update the security model.
  • #3216: Creators of resources are granted all permissions on them.
  • #3217: Refine CQL syntax
  • TBD
  • #1941: Done!