Roadmap consideration: rely on database layer for transaction, replication, and durability #270
\o Hello @hixichen, thanks for the issue! This came at a good time, as we are considering a GA roadmap as a community; we welcome any additional thoughts you might have in that direction (perhaps in a separate issue?). This reply also got rather long, so apologies in advance. :-)

This has definitely been on my mind a lot lately. There are three sets of issues that I had in mind when removing all of the storage backends (see #64):
(I could definitely expand more on each of these, but I'm trying to keep it brief!)

The issue with Spanner is that it is a cloud-only offering and thus unattractive to on-prem, airgapped users. It is also a wholly proprietary offering, so by the rationale of #64, it'd best be served as an external plugin, not part of the core offering (OSI-licensed integrations only!). FoundationDB is open source, yes, but from a cursory search there's no vendor ecosystem of managed offerings. Who do I pay when my FoundationDB instance goes sideways?

IMO, if we were to choose an external database, we'd want one that is OSI-licensed (so we can have native first-class support for it, and FoundationDB is!)... but also one that has a broader vendor ecosystem, so that it is accessible to all categories of potential users: people running it at home, smaller businesses that don't view infrastructure as a critical cost center and will want to rely on vendor support, and large businesses that view careful infrastructure management as a critical requirement and will want experts in the technology on staff.

(As an aside, we don't have existing external plugin support for storage backends, but once we have a few more storage-related core features, I'd be amenable to adding it. What I don't want is for the community to again be burdened by attempting to support both the supported storage modes and an arbitrary external storage plugin ecosystem too early, before the final interfaces are ready.)

Many of the existing storage backends stagnated because upstream did not maintain them. The only maintained ones were Consul (now non-OSI) and Raft. We'd have to take on a lot of work to update, modernize, and performance test any other backends, so selection should be done carefully, IMO.

I wholeheartedly agree about the lack of transactions (Write/Update Consistency above).
Note, though, that while "raft" is nominally the name of the storage backend, it derives this from its consensus algorithm: its actual backing database is bbolt. I think we can argue either way about some of the production requirements and whether bbolt meets them. :-) Arguably it is simpler (than most fully-fledged databases) in certain aspects, so perhaps... :-) Notably, we already have transactions for critical core storage. The present shortcomings are thus:
Part of this is due to shortcomings of the interface itself: we can only send a series of predetermined operations! This does not conform with widespread programmer expectations, which align with something that provides all operations within the context of a transaction (e.g., Go's transaction model, where beginning a transaction returns another transactional handle). However, BoltDB does provide semantics like that! So the initial gap is about design, more so than fundamental shortcomings of the system. :-) Here's where an RFC would come in, once the problem space is understood better.

Digression on clustering. You might already be aware of this, but for the benefit of the rest of the community... Upstream has three clustering modes:
These last two are Vault Enterprise only. Assuming a good choice of database, OpenBao can thus achieve fairly good availability; HA mode ensures that at least as well as Perf clusters would, and DR clusters are (for this discussion) merely an extension of HA mode for Perf clusters. The requirements are relatively modest too: there's only one active node, so a multi-writer scenario cannot occur, and the existing (removed) storage backends ensured this even on large databases that supported it. However, Perf Secondary clusters impose a lot of infrastructure problems, as each is its own cluster with some shared data syncing. Instead, it'd make more sense for OpenBao to expand HA mode to allow reading on standby nodes. This gives the read scaling of Perf Secondaries without the additional overhead and is a net-better version. Allowing multiple writers would likely be too much work to be worthwhile: it would require that all plugins and storage operations use transactions, and it would require a substantial update to the entire plugin ecosystem (plugins cache data and can assume data will not change under them, unless they're the active node and thus performed the operation themselves to invalidate that cache). All this to say, I don't think improving clustering modes necessarily changes our database/storage requirements any. But, w.r.t.:
While there is certainly a differentiated offering, as no doubt required by an open core model (and IMO, HashiCorp making money off of Vault via Enterprise was certainly a good thing on the whole, even if I personally had wished they had stayed with an OSI-approved license), I don't think this is quite as true as we'd think. It is only true from an Enterprise support perspective, I'd posit. In particular, HA in Vault Community supports arbitrary storage backends as long as they provide their own HA mode. This is not Raft in many places, and it is completely transparent to Vault/OpenBao (e.g., pointing at a shared Postgres instance with replication, or some other natively distributed DB). Raft is just a convenient way to build this with their choice of backing datastore, bbolt. However, since we have no enterprise licensing concerns, taking the simplest approach to allowing scaling (the natural extension to HA mode that upstream would not consider, so as to preserve the open core model) IMO makes the best sense. Would you mind elaborating on this point:
Does data isolation necessarily come from multiple databases (which seems technologically expensive to maintain under a single instance of the app)? Or can it come from layered seal+barrier mechanisms per tenant, writing into the same database? IMHO, if tenants require strict database isolation, it becomes too much of an operational challenge (for a community-supported open source project) to build a single platform to do so. Instead, it'd be easier to run multiple instances of the software for those customers, and for the rest, use crypto-level separation rather than requiring multiple parallel databases to manage.

All this to say... what is your view on Postgres? If I were to suggest an alternative to your suggestions, I'd suggest Postgres. It is a widely adopted, widely supported, widely understood database. It is a boring choice, in the way that light, neutral-colored walls are standard. There are many vendors (cloud or traditional) that offer paid support for Postgres. There are variations upon it for various scenarios (Percona comes to mind as one, the managed Postgres or Postgres-compatible offerings of the clouds as another). It has one of the largest communities of open database users (besides perhaps MariaDB and SQLite), and it is very widely deployed and distributed, which makes infrastructure setup easy. And it won't restrict our future ability to ship in Linux distributions in any way. This is the direction I was leaning, but my personal roadmap was something closer to:
Curious to hear your thoughts!
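To make the Postgres suggestion above concrete, here is a minimal sketch of what the core of a Postgres physical backend could look like: a single key/value table holding barrier-encrypted entries, with an idempotent upsert for writes. The table name, column names, and `PostgresBackend` type are illustrative assumptions, not the old `physical/postgresql` backend's actual schema; in practice you'd also register a driver such as `github.com/lib/pq`.

```go
package main

import (
	"context"
	"database/sql"
	"fmt"
	// In practice, also import a driver, e.g. _ "github.com/lib/pq".
)

// Hypothetical schema: one row per storage path; values are opaque because
// the barrier encrypts them before they ever reach the database.
const createTable = `
CREATE TABLE IF NOT EXISTS openbao_kv_store (
    path  TEXT PRIMARY KEY,
    value BYTEA NOT NULL
);`

// Upsert so Put is idempotent; Postgres's transactional guarantees then come
// for free when several such statements run inside one BEGIN/COMMIT.
const putQuery = `
INSERT INTO openbao_kv_store (path, value) VALUES ($1, $2)
ON CONFLICT (path) DO UPDATE SET value = EXCLUDED.value;`

const getQuery = `SELECT value FROM openbao_kv_store WHERE path = $1;`

// PostgresBackend is a hypothetical name for this sketch.
type PostgresBackend struct{ db *sql.DB }

func (b *PostgresBackend) Put(ctx context.Context, path string, value []byte) error {
	_, err := b.db.ExecContext(ctx, putQuery, path, value)
	return err
}

func (b *PostgresBackend) Get(ctx context.Context, path string) ([]byte, error) {
	var value []byte
	err := b.db.QueryRowContext(ctx, getQuery, path).Scan(&value)
	if err == sql.ErrNoRows {
		return nil, nil // an absent key is not an error in the physical API
	}
	return value, err
}

func main() {
	// Wiring only; with a live database you would do:
	//   db, err := sql.Open("postgres", "postgres://user:pass@host/db")
	//   b := &PostgresBackend{db: db}
	fmt.Println(len(createTable) > 0 && len(putQuery) > 0 && len(getQuery) > 0)
}
```

The upsert means replication and crash recovery are entirely the database's problem, which is exactly the division of labor the issue asks for.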
Thank you for your comprehensive response. It has given me a deeper understanding of Vault's decision-making regarding its technological direction. I might have been a bit hasty in my initial judgments. Essentially, the primary aim seems to be to delve into the problem space and clarify it.

To rephrase: I now grasp the high availability (HA) motivation and the concept of abstracting the key-value (KV) interface for physical storage. From my perspective, I see it as follows: the simplicity and generic interface of the KV model restrict transactional capabilities, since a single Vault write operation necessitates multiple write calls. However, I do concur with your analysis of Spanner and FoundationDB. Their limitations could restrict the choices available to many users, especially considering the varied environments in which they operate. That said, if we can prioritize features that drive adoption and allow for easy updates, modernization, and performance tuning, it would be beneficial. In this context, PostgreSQL emerges as a strong candidate for the backend.

To clarify my thought: in an ideal scenario, users would deploy PostgreSQL as a single instance for local development and as a clustered setup for production, with the ability to replicate across multiple regions or zones as needed. Vault, functioning more like a client, would be deployed flexibly but primarily handle read operations, directing write operations to the primary node of PostgreSQL.
\o Hello again, @hixichen!
Just to clarify: a single HTTP write request to OpenBao could (depending on the plugin's code) result in multiple write operations to the underlying storage. Today this isn't transactional, but it could and should be. I think the simplicity will likely remain, transactions aside. Adding transactions does admittedly complicate the interface, but we'll still probably keep the same core operations and not add, say, relational data/queries, as this would be hard to support in a storage backend-agnostic manner.
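The transaction model discussed in this thread (Begin returns a handle exposing the same operations plus Commit/Rollback, as in Go's `database/sql`) can be sketched with a toy in-memory backend. All names here are hypothetical, not OpenBao's actual physical interface; the point is just that several writes become visible atomically:

```go
package main

import (
	"errors"
	"fmt"
)

// Storage and Tx are hypothetical interfaces mirroring the database/sql
// pattern: Begin returns another handle scoped to the transaction.
type Storage interface {
	Get(key string) (string, error)
	Put(key, value string) error
	Begin() (Tx, error)
}

type Tx interface {
	Get(key string) (string, error)
	Put(key, value string) error
	Commit() error
	Rollback() error
}

// memStorage is a toy in-memory backend standing in for bbolt/Postgres.
type memStorage struct{ data map[string]string }

var _ Storage = (*memStorage)(nil) // compile-time interface check

func newMemStorage() *memStorage { return &memStorage{data: map[string]string{}} }

func (s *memStorage) Get(key string) (string, error) {
	v, ok := s.data[key]
	if !ok {
		return "", errors.New("not found")
	}
	return v, nil
}

func (s *memStorage) Put(key, value string) error {
	s.data[key] = value
	return nil
}

// Begin buffers writes; nothing is visible to other readers until Commit.
func (s *memStorage) Begin() (Tx, error) {
	return &memTx{s: s, pending: map[string]string{}}, nil
}

type memTx struct {
	s       *memStorage
	pending map[string]string
	done    bool
}

func (t *memTx) Get(key string) (string, error) {
	if v, ok := t.pending[key]; ok { // read-your-writes inside the tx
		return v, nil
	}
	return t.s.Get(key)
}

func (t *memTx) Put(key, value string) error {
	t.pending[key] = value
	return nil
}

func (t *memTx) Commit() error {
	if t.done {
		return errors.New("transaction already finished")
	}
	for k, v := range t.pending { // all writes land together
		t.s.data[k] = v
	}
	t.done = true
	return nil
}

func (t *memTx) Rollback() error {
	t.done = true
	t.pending = nil
	return nil
}

func main() {
	s := newMemStorage()
	tx, _ := s.Begin()
	tx.Put("a", "1")
	tx.Put("b", "2")
	if _, err := s.Get("a"); err == nil {
		panic("write visible before commit")
	}
	tx.Commit()
	v, _ := s.Get("a")
	fmt.Println(v) // prints 1
}
```

A plugin performing one logical operation as several `Put` calls would then do them through one `Tx`, so a crash mid-request never leaves half-written state, mirroring what the real backing database (bbolt or Postgres) already guarantees natively.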
I cannot speak to the founding of upstream, nor do I wish to. However, I can make some general observations...

OpenBao maintains application-level encryption of its backend storage. Which is to say, the backend itself (whether Raft or Postgres or what have you) is not aware of the decryption keys and only ever sees encrypted data. This makes the threat model easier: it is localized to compromise of OpenBao itself. Compared to, say, database-naive row-level encryption or full-disk encryption of the database host, this limits your ability to have meaningful data structures in a relational database, and thus K/V looks like the least-common denominator... unless you build complex relational queries on top of that, in OpenBao (because it is the only one with the encryption keys). (In short, with this encryption model, a K/V interface is very attractive, and hence a technology pairing like Raft+bbolt is rather attractive.)

However! I will say, upstream's Vault is more like this in the community edition offering. I guess HashiCorp's Vault Enterprise doesn't allow Postgres as a backend for Performance Secondaries, but Vault Community's HA mode does essentially rely on the database's replication. (With a slight asterisk: HA mode has only a single active node, and other standby nodes simply forward requests; they don't even handle read operations like Performance Standby nodes do.) We, in OpenBao, removed Postgres and everything else, thus deviating from that client-like model (for those improvements I mentioned above -- ListPages already landed). And thanks to this discussion, I've finally gotten around to writing up the RFC on transactions that I had been thinking about.

But I think it is risky to consider multi-writer, even when backed by a capable backend (like Postgres). Plugin authoring is already hard, and many plugins are inherently stateful. E.g., should node A revoke the credentials? Or should node B? Which one will be doing CRL rebuilding? How do they communicate about this or other things (currently there is no external node-to-node, plugin-specific communication mechanism, other than storage)? &c. By having a single-writer setup, it becomes the default that one node will handle these operations and the others will service the rest (which are fewer, but no less important! -- PKI cert issuance without storage, or Transit encryption operations, &c.).

I like talking about how OpenBao lacks a cross-plugin communication mechanism. However, it also lacks a cross-node communication mechanism, outside of storage. There's no gRPC mechanism between instances of the same plugin running on different nodes; there's no discovery of other nodes at the plugin level (so even if you had embedded gRPC in a plugin, you wouldn't know where to connect to; and if you had each node write to storage, you're not sure whether that's out of date, whether a node is temporarily down and will come back up, or whether it is permanently lost); &c. In short, I think a multi-writer scheme would require active coordination between nodes, which OpenBao isn't necessarily suited to solve in the medium term. Long term, perhaps, anything is possible. :-)

All this to say: for transactions, OpenBao will definitely be a client of the underlying storage technology. For data integrity, it will again be a client of the underlying storage technology. But the top-level application cannot easily be made multi-writer without, IMHO, substantial work. Looking forward, I think my immediate next goal (after transaction support lands) is trying to make the existing HA mode multi-reader. Once we have that, I think we'll be in a good place to start re-introducing other storage backends, if that's the community's desire, in a limited, maintainable fashion. Your help modernizing the Postgres storage backend, then, would be much appreciated, if you're so willing! :-)
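The application-level ("barrier") encryption described earlier in this comment can be illustrated with a small sketch: the application encrypts each entry before handing it to any storage backend, so the backend only ever sees ciphertext. This uses stdlib AES-GCM as a stand-in; it is not OpenBao's actual barrier implementation, and the function names are hypothetical:

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"errors"
	"fmt"
)

// sealEntry encrypts a storage entry before it reaches the backend.
// The nonce is prepended so unsealEntry can recover it. Illustrative only.
func sealEntry(key, plaintext []byte) ([]byte, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	return gcm.Seal(nonce, nonce, plaintext, nil), nil
}

// unsealEntry decrypts an entry read back from the backend.
func unsealEntry(key, sealed []byte) ([]byte, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		return nil, err
	}
	if len(sealed) < gcm.NonceSize() {
		return nil, errors.New("sealed entry too short")
	}
	nonce, ct := sealed[:gcm.NonceSize()], sealed[gcm.NonceSize():]
	return gcm.Open(nil, nonce, ct, nil)
}

func main() {
	key := make([]byte, 32) // AES-256; in reality derived via the unseal process
	rand.Read(key)
	sealed, _ := sealEntry(key, []byte("secret/foo -> bar"))
	// Only `sealed` would ever be handed to Raft/bbolt, Postgres, etc.
	plain, _ := unsealEntry(key, sealed)
	fmt.Println(string(plain)) // prints secret/foo -> bar
}
```

This is why the threat model stays localized to OpenBao itself, and why a plain K/V interface is the natural fit: the database cannot index or query fields it can never decrypt.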
Can also consider YugabyteDB, as it is wire compatible with Postgres and is open source and cloud native, with an option for a paid managed DB.
@alberk8 if it is wire compatible, I assume there would be no work to do? You can find the old backend here: https://github.com/openbao/openbao/blob/before-plugin-removal/physical/postgresql/postgresql.go
Disclaimer: this statement is my own and does not represent my company.
As someone who has operated Vault for a few years in a production environment, I feel that Vault's design has become trapped in a cocoon of its own making.
In modeling a generic product with an API-controller-DB framework, Vault places itself partially within the DB's responsibilities.
This includes investing significantly in features like Raft/replication and transactions, which, in my opinion, add an unnecessary burden.
Taking a quick look at the production database requirements for secrets, keys, and certificates, it's clear that there are numerous databases on the market that already support these features, like Spanner and FoundationDB.
Vault's built-in Raft, and its approach of locking together its controller layer and database, hinder the underlying database's ability to perform cross-region replication for reads. This seems to be a strategy to push for a business license that includes cross-region replication.
I hold high hopes for this project, especially because it is truly open source. I wish it would rely more on databases for durability, reliability, and replication, rather than on Vault itself. I suggest pushing Vault towards functioning mainly as a controller layer, where each node can handle reads and all nodes can write, assuming the database supports transactions. This could eliminate the need for a leader election model and the fake high availability (why do you even need standbys?) that doesn't necessarily contribute to the system's robustness.