Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support regex processing of DB role names in collector #194

Open
joehorsnell opened this issue Aug 13, 2021 · 3 comments
Open

Support regex processing of DB role names in collector #194

joehorsnell opened this issue Aug 13, 2021 · 3 comments

Comments

@joehorsnell
Copy link
Contributor

Hi @lfittl,

As per our call earlier this week (cc @robharrop), I'm opening an issue for your comment/approval on our proposed collector change, to allow processing/transformation of DB role names in the collector, before sending to the pganalyze server.

Our specific use case is to handle Vault Dynamic Secrets, as the dynamically generated suffixes that Vault adds to the DB roles it creates cause pganalyze to treat statements that we would like to be grouped together as separate, so there is effectively no useful history for each statement. The proposed solution (using regex) supports our requirement, but I believe would also be generic enough to be useful for other contexts, even just literal string replacements.

Detail

The DB role names generated by Vault Dynamic Secrets have two dynamically generated suffixes that we would like to trim and ignore.

For example, for role v-aws-some-service-3tce39Yn1P2lV7RSQnMN-1628677987, we want to ignore the two suffix groups, and just send v-aws-some-service to pganalyze.

The proposed solution is to add some collector config options that accept a Sed-like search/replace pattern, eg.

echo "v-aws-some-service-3tce39Yn1P2lV7RSQnMN-1628677987" | gsed --regexp-extended 's/-[[:alnum:]]+-[[:digit:]]+$//'
v-aws-some-service

It seems the built-in Go regex package supports the required regex syntax.

If you are OK in principle with this idea, we can look at fleshing out a bit more detail on this, eg a simple design (exactly what config options to provide and how), so please let us know. Also could you please confirm whether there's any server-side change that would be required to support this, or if it could be handled completely within the collector?

Thanks,

Joe.

@msakrejda
Copy link
Contributor

Hi Joe,

As Lukas mentioned offline, this makes sense in general. In some ways this is similar to #62 . And it would not require any server-side changes assuming that the "folding" happens consistently everywhere we reference roles/users in the collector.

In terms of changes this would require, we (mostly) do not track object ownership, so the main places that would have to change are:

Does the "canonical" role (without the suffixes) actually exist in your system? If so (or if that's something you can create without too much trouble to accommodate this functionality), I think that makes things easier, since pg_stat_statements deals with actual user_ids. If we can fold these other roles to a real role, that avoids having to manage synthetic ids (and avoid collisions with real role ids).

In terms of code changes, the pg_stat_statements case will either require the match to be pushed down into the query and a join with pg_roles (to map the pg_stat_statements userid to the canonical role), or to add a separate id mapping step after the main query if the role processing regexp is set. I think either approach can work. If we go for the join, we could use a left join and coalesce to the existing pg_stat_statements.userid to short-circuit the join if the regexp is not set.

The log parsing case is probably simpler: the log parsing code in general is a little gnarly due to different prefix handling, but the user name could be massaged in the one place I linked for all of them, I think.

I have not looked at the other role associations (the third bullet point), but I think this should be similar to what we would need to do for statements handling.

Does this make sense?

/cc @seanlinsley if you have any thoughts

@joehorsnell
Copy link
Contributor Author

Hi @uhoh-itsmaciek, thanks for your response and additional info on where the changes would be needed.

Does the "canonical" role (without the suffixes) actually exist in your system?

That's a good question. As things stand, no - the default username template used by Vault for dynamic roles truncates the role to 8 chars in the generated role, although that should be pretty simple for us to change - we'll investigate.

Do you think it makes sense to do an initial refactoring of the collector to introduce a type to represent the role, rather than just a string, so that operations on it (eg to sanitise it to the canonical role) can be centralised? I'm not too familiar with the collector code to know whether that would be too big a task and/or worth it.

Thanks.

@msakrejda
Copy link
Contributor

Regarding the refactoring, my gut feeling is that if we go with the separate mapping step, that's probably easier as a shared helper function rather than a first-class type for roles. It could take a database connection and a list of role OIDs, and return the normalized OID. I'm certainly open to other options if you see a better approach.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants