FAQ
Permazen is a persistence layer that sits between your Java application and some other, underlying key/value database. The underlying database is responsible for providing transactions and durably storing information. Permazen provides all of the remaining features you expect from a "database" and more, including indexes, a command line tool, auto-generated Vaadin GUI, etc.
With this design Permazen can make persistence simple, natural, and completely type safe for a Java application, without sacrificing scalability or practical convenience.
Almost every database in existence is, at its heart, just some form of key/value store. Permazen lets the database do what it’s really good at - storing key/value pairs - and takes over from there with the goal of providing an optimal experience for Java programmers.
Having said that, Permazen also provides several key/value store implementations.
Permazen has these layers (from top to bottom):
- The Java model (or Permazen) layer
- The core API layer
- The key/value store API layer
At the bottom layer is a simple byte[] array key/value database. Transactions are supported at this layer and several implementations are included; see io.permazen.kv and sub-packages.
On top of that sits the core API layer, which provides a rigorous database abstraction on top of the key/value store. It supports simple fields of any atomic Java type, as well as list, set, and map complex fields, tightly controlled schema versioning, simple and composite indexes, and lifecycle and change notifications. It is not Java-specific or explicitly object-oriented. The core API is provided via the Database class.
The Java model layer is a Java-centric, type safe, object-oriented persistence layer for Java applications. It sits on top of the core API layer and provides a fully type-safe, Java-centric view of a core API database. All data access is through user-supplied Java model classes. Database types and fields, as well as listener methods, are all inferred from a simple set of Java annotations. This layer also provides automatic incremental JSR 303 validation. The Permazen class represents an instance of the top layer.
This top layer is what Java programmers normally deal with. Its main job is mapping an object-oriented, Java-centric presentation onto the simpler structure/field world of the core API. In turn, the core API relies on the key/value store API to provide a basic sorted, transactional key/value store.
Key/Value Store API
The key/value store API is very simple. Keys and values are arbitrary byte[] arrays, and keys are sorted lexicographically (with unsigned byte values). A key/value store supports transactions.
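As a rough illustration of what "sorted lexicographically with unsigned byte values" means, here is a minimal plain-Java sketch. It does not use any Permazen classes; the class name UnsignedKeyOrderDemo is made up for this example, and it assumes Java 9 or later for Arrays.compareUnsigned().

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.TreeMap;

// Hypothetical illustration only; this is not a Permazen key/value store implementation.
final class UnsignedKeyOrderDemo {

    // Build a sorted map whose byte[] keys compare lexicographically as unsigned bytes
    static TreeMap<byte[], String> newSortedStore() {
        final Comparator<byte[]> unsignedOrder = Arrays::compareUnsigned;
        return new TreeMap<>(unsignedOrder);
    }

    public static void main(String[] args) {
        final TreeMap<byte[], String> store = newSortedStore();
        store.put(new byte[] { (byte)0x80 }, "high");           // 0x80 is 128 unsigned, but -128 as a signed Java byte
        store.put(new byte[] { 0x7f }, "low");
        store.put(new byte[] { 0x7f, 0x00 }, "low-extended");   // a longer key sorts after its own prefix

        // Iteration follows unsigned lexicographic key order
        store.values().forEach(System.out::println);
    }
}
```

With a signed comparison the 0x80 key would sort first; with the unsigned ordering it sorts last, which is what keeps keys sharing a common prefix contiguous.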
Core API Layer
The core API layer provides basic types, and concepts roughly analogous to tables, columns, and rows. The core API calls these concepts object type, field, and object, respectively. However, this analogy is loose and there are some subtle but important differences. For example, two different object types can contain the same field. This allows you, for example, to index a field across multiple object types - even if the types are not in the same type hierarchy. For example, you can index a field corresponding to a bean property declared by a Java interface, and then query the index for objects having any type that implements the interface.
All core API fields have a strictly well-defined type, sort ordering, and serialized encoding as byte[] values (see FieldType). Along with "atomic" field types, the core API includes support for a few special field types, including reference types (i.e., "pointers"), identifier-list "enum" types, and lock-free counter types. The core API also supports user-defined types.
In addition to the aforementioned simple field types, the core API layer also provides support for complex field types: List, Set, and Map.
Indexes on both simple and complex fields are supported, and composite indexes on multiple simple fields are supported.
The set of all object types and their fields defines a schema. With certain restrictions, the core API allows multiple different schemas to exist at the same time in the same database; each schema has a unique integer version. As a consequence, all objects in the database are versioned. An object type may have different fields in different schema versions.
All core API layer stored types (objects, fields, indexes, etc.) are identified by an integer storage ID, not by name. This allows names to change at a higher level without affecting the core API schema structure.
Although written in Java, there’s nothing inherently Java-specific about the core API layer. The "objects" in the core API layer are just data structures: there is no explicit notion of class, inheritance, or methods (it is the job of the Permazen layer to perform that mapping).
Java Model Layer
The Java (or Permazen) layer sits on top of the core API layer. It provides the developer-friendly, Java-centric view of the core API layer. You normally only need to deal with the Permazen layer.
At the Java layer, the "schema" is implicitly defined by your Java model classes, which are identified by the @PermazenType annotation. To restate that: your set of Java model classes is your Permazen schema; there is no separate schema "configuration" required. Under the covers of course, the Permazen layer generates an appropriate core API schema from your model classes and provides this to the core API layer.
The Java layer also does any necessary translation of core API values. For example, in the core API layer a "reference" is described by a 64-bit object identifier (see ObjId), whereas in the Java layer a reference is a Java model object.
First, it allows complete flexibility in your Java model classes, while still providing well-defined semantics, strict type safety, and easy version migration, even in the face of arbitrary code refactoring (no small feat).
Secondly, sometimes you want to inspect or modify data directly, without any "object orientedness", i.e., without the possibility of any Java model class methods being invoked as listeners or whatever. The core API lets you do this.
The Permazen command line interface (CLI) utility also supports this notion: it can run in either core API mode or Java mode (aka. "Permazen" mode).
Permazen supports the following simple types out of the box:
- Primitive types
- Primitive wrapper types
- References to Java model classes (or any wider type)
- Enum types
- Arrays of any simple type up to 255 dimensions (passed by value)
- java.lang.String
- java.util.Date
- java.util.UUID
- java.io.File
- java.util.regex.Pattern
- java.time.*
Yes, by writing a class that subclasses FieldType, annotating it with @JFieldType, and putting it on the classpath.
An easy way to create a custom type for any type that can be encoded as a String is to pass an appropriate Converter<T, String> to a new instance of StringEncodedType. Database equality and sort order then derive from the string representation.
In the core API layer, Enum values are represented by EnumValue objects which serialize (usually) into a single byte. In the Permazen layer, they are represented by instances of the appropriate Enum Java model class.
At the core layer, two enum types are considered equivalent if and only if they have the same (ordered) identifier list. This means you can move an Enum model class to a different package without requiring a schema change. However, if you add or change an Enum value, that forces a schema change, because the field’s type has effectively changed.
By default, when a field’s type changes during a schema change, the field is reset to its default value (which is null for non-primitive types). However, you have the option of telling Permazen to automatically map the old Enum value to the new Enum type if its identifier still exists.
This is part of a more general mechanism for automatic conversion of field values when a field’s type changes during a schema migration; see @PermazenField.upgradeConversion() for details. In short, your options are: reset the field or try to automatically convert it.
Or, for complete control, provide an @OnVersionChange method to map between the old and new field values. Permazen will supply the old Enum field values as EnumValue objects, which are just an int, String pair.
Here’s an example showing an original model class:
    // Schema version #1
    @PermazenType
    public abstract class Vehicle {

        public enum Color {
            RED,
            LIGHT_GREEN,
            DARK_GREEN,
            BLUE
        }

        public abstract Color getColor();
        public abstract void setColor(Color color);
    }
and a new model class with the renamed field and schema "fixup":
    // Schema version #2
    @PermazenType
    public abstract class Vehicle {

        public enum Color {
            RED,
            GREEN,      // was LIGHT_GREEN or DARK_GREEN
            BLUE
        }

        public abstract Color getColor();
        public abstract void setColor(Color color);

        // Map the old enum identifier onto the new Color type
        @OnVersionChange
        private void update(Map<String, Object> prev) {
            String colorName = ((EnumValue)prev.get("color")).getName();
            if (colorName.endsWith("_GREEN"))
                colorName = "GREEN";
            this.setColor(Color.valueOf(colorName));
        }
    }
Unlike with JPA, because Permazen takes care to not mix incompatible types, it’s not possible to read an Enum value that doesn’t exist from the database, even after schema changes, and you have total control of whether and how fields are converted during a schema change.
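The two conversion strategies described above (match by identifier if it still exists, or apply a custom fixup) can be approximated in plain Java as follows. This is only a sketch of the idea and does not call any Permazen API; EnumUpgradeDemo, convertByName() and fixup() are hypothetical names, and the two enums mirror the Color values from the example.

```java
import java.util.Optional;

// Plain-Java sketch; EnumUpgradeDemo, convertByName() and fixup() are hypothetical, not Permazen API.
final class EnumUpgradeDemo {

    enum OldColor { RED, LIGHT_GREEN, DARK_GREEN, BLUE }
    enum NewColor { RED, GREEN, BLUE }

    // Automatic strategy: keep the value only if the same identifier still exists
    static Optional<NewColor> convertByName(String oldName) {
        for (NewColor color : NewColor.values()) {
            if (color.name().equals(oldName))
                return Optional.of(color);
        }
        return Optional.empty();        // no match: the field would end up reset to null
    }

    // Custom strategy: fold both old greens into GREEN, otherwise fall back to name matching
    static NewColor fixup(String oldName) {
        if (oldName.endsWith("_GREEN"))
            return NewColor.GREEN;
        return convertByName(oldName).orElse(null);
    }

    public static void main(String[] args) {
        for (OldColor old : OldColor.values())
            System.out.println(old + ": by name " + convertByName(old.name()) + ", with fixup " + fixup(old.name()));
    }
}
```

Name matching alone loses both green values, which is why the example model class supplies its own fixup method.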
Lists, Sets, and Maps.
The element, key, and value can have any simple type. In the case of primitive types, null values will be disallowed.
Sets actually implement NavigableSet, and Maps actually implement NavigableMap. Lists have performance characteristics similar to ArrayList.
No.
For a few operations such as creating a new instance and querying an index, you invoke methods on the current PermazenTransaction.
Everything else can be normal Java, and all access methods can be either instance or static methods in your Java model classes.
Let’s take a simple example Java model with Account and User model classes. We have these requirements:
- Every user must have an account
- Usernames must be unique
- We must be able to efficiently find users by username
- We must be able to efficiently find all users associated with an account
Here’s what those classes might look like, including all the "DAO" methods you would need:
    @PermazenType
    public abstract class User implements PermazenObject {

        // Fields

        // Get this user's username
        @PermazenField(indexed = true, unique = true)
        @NotNull
        public abstract String getUsername();
        public abstract void setUsername(String username);

        // Get this user's account
        @NotNull
        public abstract Account getAccount();
        public abstract void setAccount(Account account);

        // "DAO" methods

        // Create new user
        public static User create() {
            return PermazenTransaction.getCurrent().create(User.class);
        }

        // Find user by username (target type first, then field name, then value type)
        public static User getByUsername(String username) {
            final NavigableSet<User> users = PermazenTransaction.getCurrent().queryIndex(
              User.class, "username", String.class).asMap().get(username);
            return users != null ? users.first() : null;
        }
    }

    @PermazenType
    public abstract class Account implements PermazenObject {

        // Fields

        // Get the name of this account
        @NotNull
        public abstract String getName();
        public abstract void setName(String name);

        // "DAO" methods

        // Create new account
        public static Account create() {
            return PermazenTransaction.getCurrent().create(Account.class);
        }

        // Get all users associated with this account
        public NavigableSet<User> getUsers() {
            final NavigableSet<User> users = this.getTransaction().queryIndex(
              User.class, "account", Account.class).asMap().get(this);
            return users != null ? users : NavigableSets.<User>empty();
        }

        // Get all accounts
        public static NavigableSet<Account> getAll() {
            return PermazenTransaction.getCurrent().getAll(Account.class);
        }
    }
Congratulations, you’re done! You’ve just configured an entire Java application persistence layer.
You write them yourself in Java.
Yes and no.
Permazen believes that having everything done in maintainable Java code is worth the trade-off of having to write a few helper methods. Code is only written once, but it’s maintained forever.
Also, and perhaps more importantly, Permazen makes it impossible to write a poorly performing query unless you explicitly write it that way yourself.
For example, in SQL a query like SELECT * FROM USER WHERE LOWER(USERNAME) = 'fred' will require examining every row of the USER table even if the USERNAME column is indexed, because of the use of LOWER() in the WHERE clause.
The problem is that it’s not obvious that this query is going to be slow just by looking at it. Of course this is just a simple example; in the real world, the performance cost of a query can be much harder to spot.
In Permazen, to implement that query, you’d have to write a loop that iterates over every User in the database. This makes the performance reality obvious.
The more "correct" thing to do would be to add a new private field that contained the lowercase version of the user’s name, somehow always keep it up to date, and then index that field.
Permazen makes this easy using the @OnChange annotation. The example below shows how you would have case-insensitive but unique usernames:
    @PermazenType
    public abstract class User implements PermazenObject {

        // Fields

        // Get this user's username - contains a mix of upper & lower case
        @NotNull
        public abstract String getUsername();
        public abstract void setUsername(String username);

        // Derived fields - these are not public

        // Get this user's lower case username - automatically kept in sync
        @PermazenField(indexed = true, unique = true)
        protected abstract String getLowercaseUsername();
        protected abstract void setLowercaseUsername(String username);

        // Keep the derived field up to date whenever the username changes
        @OnChange("username")
        private void onUsernameChange(SimpleFieldChange<User, String> change) {
            final String username = change.getNewValue();
            this.setLowercaseUsername(username != null ? username.toLowerCase() : null);
        }

        // "DAO" methods

        // Find user by (case-insensitive) username
        public static User getByUsername(String username) {
            final String lowername = username.toLowerCase();
            final NavigableSet<User> users = PermazenTransaction.getCurrent().queryIndex(
              User.class, "lowercaseUsername", String.class).asMap().get(lowername);
            return users != null ? users.first() : null;
        }

        // Get all users ordered by (case-insensitive) username
        public static Stream<User> getAllSortedByUsername() {
            return PermazenTransaction.getCurrent().queryIndex(
              User.class, "lowercaseUsername", String.class).asMap()
              .values().stream().flatMap(NavigableSet::stream);
        }
    }
Now you’ve got a fast query by lowercase username, and all the details are contained in one place and hidden from other classes.
Instead of thinking in terms dictated by the database technology, Permazen lets you think in more natural terms of sets, specifically NavigableSet, which provides efficient range queries, reverse ordering, etc.
Permazen also provides efficient union, intersection, and difference implementations (see NavigableSets). These operations provide the functionality of database joins.
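To make the "think in sets" idea concrete, here is a small sketch using only java.util collections. It is not Permazen code: the intersection() helper below copies its input, whereas the NavigableSets utilities mentioned above are Permazen's own implementations, and the class and variable names are invented for this example.

```java
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

// Illustration with plain java.util collections; this does not use Permazen's NavigableSets class.
final class NavigableSetQueryDemo {

    // Intersection of two sorted sets, playing the role of a join on a shared key
    static <E> NavigableSet<E> intersection(NavigableSet<E> a, NavigableSet<E> b) {
        final NavigableSet<E> result = new TreeSet<>(a);    // copy, so the inputs are left untouched
        result.retainAll(b);
        return result;
    }

    public static void main(String[] args) {
        final NavigableSet<String> accountUsers = new TreeSet<>(List.of("alice", "carol", "dave"));
        final NavigableSet<String> activeUsers = new TreeSet<>(List.of("bob", "carol", "dave", "erin"));

        System.out.println(intersection(accountUsers, activeUsers));     // users in both sets
        System.out.println(activeUsers.subSet("b", true, "d", false));   // range query: "b" <= name < "d"
        System.out.println(activeUsers.descendingSet());                 // reverse ordering
    }
}
```

In a Permazen application the input sets would typically come from index queries or getAll(), so the same set operations express what a SQL join or WHERE clause would.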
Using PermazenTransaction.queryIndex().
Index queries are type safe and are parameterized by the Java types you are interested in.
These Java types can be arbitrarily wide or narrow.
The key/value store must support data access via the KVStore interface:
- Efficiently get, put, and remove keys
- Efficiently find the next higher or lower key
- Support transactions; see KVDatabase for details.
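The first two requirements can be pictured with a tiny in-memory store built on a sorted map. This is a sketch under stated assumptions: TinySortedStore and its method names are hypothetical and are not the actual KVStore method signatures, and transactions are left out entirely.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.TreeMap;

// Hypothetical in-memory store; method names are illustrative, not the real KVStore interface.
final class TinySortedStore {

    private static final Comparator<byte[]> ORDER = Arrays::compareUnsigned;   // requires Java 9+
    private final TreeMap<byte[], byte[]> data = new TreeMap<>(ORDER);

    // Basic operations: get, put, and remove by exact key
    byte[] get(byte[] key) {
        return this.data.get(key);
    }

    void put(byte[] key, byte[] value) {
        this.data.put(key.clone(), value.clone());      // defensive copies
    }

    void remove(byte[] key) {
        this.data.remove(key);
    }

    // Next higher key: the smallest key >= minKey, or null if there is none
    byte[] nextAtLeast(byte[] minKey) {
        return this.data.ceilingKey(minKey);
    }

    // Next lower key: the largest key < maxKey, or null if there is none
    byte[] lastBelow(byte[] maxKey) {
        return this.data.lowerKey(maxKey);
    }
}
```

A store that only hashes its keys can implement get, put, and remove but not the two neighbor lookups, and those are what make range scans and index queries efficient.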
See Key Value Stores for a list.
Several popular NoSQL databases are not compatible because of one or more of the following:
- Keys are not sorted (only hashed)
- Keys have limited length (e.g., at most 64 or 128 bits)
Preferred but not required. The philosophy behind Permazen holds that simplicity promotes solid, reliable, maintainable code. In particular, if the code is too complicated, it becomes infeasible for developers to prove to themselves that the code is fully correct, and of course if the developers can’t ensure the code is fully correct, it won’t magically become fully correct by itself. Stated another way, "complexity kills".
A persistence technology that doesn’t provide consistent, ACID-compliant transactions can be too difficult for programmers to reason about. In addition, recently there has been a change in the traditional belief that you can’t have both ACID compliance and scalability: Google Cloud Spanner and FoundationDB are proving this assumption wrong.
In any case, you are welcome to use any key/value store you want to; you just need to make sure you understand how it affects your program logic. In particular, Permazen uses the key/value store to hold both primary object information and secondary (derived) index information. So, for example, if transaction mutations are not applied atomically, it’s possible an index could return results that are inconsistent with the fields that it indexes.
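The index inconsistency risk can be shown with a toy model. Everything here is an assumption made for illustration: String keys stand in for byte[] keys, the "obj/..." and "idx/..." key layout is invented (it is not Permazen's actual layout), and AtomicIndexDemo is a hypothetical class.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;
import java.util.function.Consumer;

// Toy model; the key layout and all names are hypothetical, not Permazen's real storage format.
final class AtomicIndexDemo {

    TreeMap<String, String> store = new TreeMap<>();

    // A username change touches three keys: the primary record and two index entries
    List<Consumer<TreeMap<String, String>>> renameUser(String id, String oldName, String newName) {
        final List<Consumer<TreeMap<String, String>>> mutations = new ArrayList<>();
        mutations.add(kv -> kv.put("obj/" + id + "/username", newName));
        mutations.add(kv -> kv.remove("idx/username/" + oldName + "/" + id));
        mutations.add(kv -> kv.put("idx/username/" + newName + "/" + id, ""));
        return mutations;
    }

    // All-or-nothing commit: apply every mutation to a copy, then swap the copy in
    void commit(List<Consumer<TreeMap<String, String>>> mutations) {
        final TreeMap<String, String> copy = new TreeMap<>(this.store);
        mutations.forEach(mutation -> mutation.accept(copy));
        this.store = copy;
    }

    // True if the index contains an entry matching the stored username (or there is no such user)
    boolean indexConsistent(String id) {
        final String name = this.store.get("obj/" + id + "/username");
        return name == null || this.store.containsKey("idx/username/" + name + "/" + id);
    }
}
```

If only the first mutation reaches the store, indexConsistent() returns false: the record says one thing and the index another. Applying the whole list through commit() avoids that window.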
See LAYOUT.txt for a basic overview.
Object IDs are 64 bits (8 bytes), with a prefix that indicates the object type.
Simple field values are encoded as self-delimiting byte[] arrays. Because they are self-delimiting, any two simple values and/or object IDs can be concatenated. Integral values are stored using an encoding that requires only one byte for small values (-118 through 119), two bytes for larger values, etc.
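To show why self-delimiting values can simply be concatenated, here is a deliberately simplified toy encoding. It is not Permazen's actual format: it borrows only the one-byte range of -118 through 119 from the text above, while the marker byte and the fixed 8-byte fallback are invented for this sketch (the real encoding grows one byte at a time).

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Toy self-delimiting encoding. NOT Permazen's actual format; the byte layout is made up.
final class SelfDelimitingDemo {

    private static final int MARKER = 0xff;     // hypothetical marker meaning "8 more bytes follow"

    // One byte for -118..119, otherwise the marker byte plus 8 big-endian bytes
    static void encode(ByteArrayOutputStream out, long value) {
        if (value >= -118 && value <= 119) {
            out.write((int)(value + 127));      // yields 0x09..0xf6, never the marker
        } else {
            out.write(MARKER);
            out.write(ByteBuffer.allocate(Long.BYTES).putLong(value).array(), 0, Long.BYTES);
        }
    }

    // Decode every value from a concatenation; no separators or length table are needed
    static List<Long> decodeAll(byte[] data) {
        final ByteBuffer buf = ByteBuffer.wrap(data);
        final List<Long> values = new ArrayList<>();
        while (buf.hasRemaining()) {
            final int first = buf.get() & 0xff;
            values.add(first == MARKER ? buf.getLong() : (long)(first - 127));
        }
        return values;
    }
}
```

Because the first byte of each value tells the decoder how many bytes belong to it, several values can be written back to back to form a composite key and still be parsed unambiguously.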
Use the PermazenFactory class to configure your Java model classes and your underlying key/value database, and you’re good to go.
See the Spring package for an example of configuring Permazen in a Spring application.
No. Permazen is designed to avoid any "whole database" operations that might limit scalability.
Schema changes are applied on demand, on a per-object basis, as objects are accessed during normal operation.
What happens if my Java model classes change? Won’t that break the mapping to the core API objects and fields?
The short answer is: Permazen always guarantees Java type safety and correct encoding/decoding of objects, even in the face of arbitrary Java model class refactoring.
Permazen allows arbitrary code refactoring at the Java model layer, but if the generated core API schema changes in a structurally incompatible way, then a new schema version is required. Normally schema version numbers are auto-generated based on the generated core API schema, so this happens automatically.
If you prefer, you can define schema version numbers manually. In that case you’ll need to specify a new schema version number yourself, and if you try to use an incompatible schema without changing the schema version number, you’ll get a SchemaMismatchException when trying to open a new transaction.
When you run code with a new schema version for the first time, Permazen records the schema in the database. From that point onward, Permazen will not allow the use of any other, incompatible schema with that same version number.
What happens to objects created by an older schema version after an upgrade to a newer schema version?
After a schema change, your new code will create objects with the new schema version. Objects created by your old code will continue to exist in the database unchanged.
What happens when a new version of my code tries to read an object created by an old version of my code?
When your new code first encounters an object with an older version number, the object will be automatically upgraded to the new schema version. Newly added fields and fields whose types have changed will be initialized to their default values, and removed fields will be deleted.
If that’s good enough for you, you don’t need to do anything else.
For simple fields whose type has changed (e.g., from int to long), you can configure whether they are automatically converted (default) or reset to their default values; see @PermazenField.upgradeConversion().
However, Permazen also gives you an opportunity to perform arbitrary schema change "fixup" logic if necessary, by invoking any @OnVersionChange methods on the object. All of the fields in the old version of the object (including fields that were removed) are made available to this method.
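The upgrade-on-first-access behavior described above can be modeled in a few lines of plain Java. This is a toy sketch, not how Permazen is implemented: LazyUpgradeDemo, StoredObject, and the field names (nickname, email, displayName) are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of on-demand upgrades; all class, method, and field names are hypothetical.
final class LazyUpgradeDemo {

    static final int CURRENT_VERSION = 2;

    static final class StoredObject {
        int version;
        final Map<String, Object> fields = new HashMap<>();

        StoredObject(int version) {
            this.version = version;
        }
    }

    // Called before any field access: upgrade only if the object's version differs
    static void upgradeIfNeeded(StoredObject obj) {
        if (obj.version == CURRENT_VERSION)
            return;
        final Map<String, Object> old = new HashMap<>(obj.fields);  // old values, including removed fields
        obj.fields.remove("nickname");          // field removed in version 2
        obj.fields.put("email", null);          // field added in version 2 starts at its default
        obj.version = CURRENT_VERSION;
        onVersionChange(obj, old);              // optional "fixup" hook
    }

    // Fixup hook: carry the removed nickname over into the new displayName field
    static void onVersionChange(StoredObject obj, Map<String, Object> old) {
        obj.fields.put("displayName", old.get("nickname"));
    }

    // Every read goes through the upgrade check first
    static Object read(StoredObject obj, String field) {
        upgradeIfNeeded(obj);
        return obj.fields.get(field);
    }
}
```

Objects that are never touched are never rewritten, so no whole-database migration pass is needed; the cost of the upgrade is paid per object, the first time it is read or written.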
What happens when an old version of my code tries to read an object created by a new version of my code?
Same thing. Permazen doesn’t really care about the schema version numbers themselves; they are simply unique identifiers. So "upgrades" and "downgrades" are handled exactly the same way.
If you will have multiple versions of your code writing to the same database, then both versions will need to know how to handle an object version change from the other version. In this situation a phased upgrade process is recommended:
- Upgrade nodes to understand both the old and new schema versions, but disable newer functionality until all nodes are upgraded
- Once all nodes are upgraded, start using the new schema and associated new functionality
- (Optional) Force upgrade all remaining database objects, e.g., use CLI command:
  eval all().forEach(PermazenObject::migrateSchema)
- (Optional) Garbage collect the old schema version from your database meta-data, e.g., use CLI command:
  delete-schema-version 3
- (Optional) Remove support for the old schema version in your code
This process allows for rolling schema upgrades across multiple nodes with no downtime.
The core API layer records all of the schemas ever used in a database (until you garbage collect them) in the database meta-data, so it always knows how to decode any object.
It’s not possible to garbage collect a schema version until no more objects exist with that version.
Objects created by older schema versions whose model class no longer exists are still accessible, but they will have type UntypedPermazenObject. If needed, you can access their fields using the field introspection methods of the PermazenTransaction class.
Typically, however, deleting a Java model class means you don’t need or want the data anymore.
You can encounter UntypedPermazenObject instances in the following two situations:
- As the value of a removed field in an @OnVersionChange schema update callback method when:
  - The older version contained the model class; and
  - The newer version does not
- In index query results, when:
  - The older version contained the indexed field in a model class that was removed; and
  - UntypedPermazenObject is assignable to the Java type requested by the query (e.g., you request all objects in the index of type Object).
Note that type safety is still preserved in all situations.
What if a new schema changes an object reference to have a narrower Java type? Won’t then older versions of the class violate type safety?
No, because during a schema upgrade Permazen automatically eliminates any references that would no longer be valid due to narrowing Java types.
Of course, you have an opportunity to do something with the old, invalid references in your @OnVersionChange method.
Permazen requires a limited amount of consistency between schema versions. Specifically, a field cannot have two different types between schema versions and also be indexed in both schema versions. This restriction is required because indexes can index objects from any schema version, and mixing types would result in an ambiguous encoding of values in the index.
Under your control, Permazen can optionally perform some automatic conversions (e.g., from float to String, Enum values with the same identifier, etc.) for you. See @PermazenField.upgradeConversion().
If you need more control, you can do arbitrary conversions in an @OnVersionChange method.
How would I handle a schema change that splits a class Vehicle into Car and Truck? Or that does the reverse?
These types of schema changes are tricky for any Java persistence framework. For example, there’s no way to avoid visiting every Vehicle at some point to decide whether it needs to be a Car or a Truck.
The easiest way to handle this scenario is to upgrade in two steps. In the first phase, all three classes exist (in the obvious inheritance arrangement), and your code knows how to handle all three. During this phase, a custom background upgrade thread iterates through every instance, deciding what to do with it, updating or replacing it as necessary. In the second phase, all objects have been transitioned to the new classes, so the old class(es) are no longer needed and can be removed.
What if my schema change requires replacing instances of one class with instances of a different class? How do I update incoming references?
In Permazen all reference fields are indexed, so you can simply query the index for each reference field that refers to the instance you are replacing, and then update those references.
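The "query the reference index, then update" approach can be sketched with an ordinary reverse map. This is a simplified stand-in for a single reference field, not Permazen code: object IDs are plain longs and ReferenceIndexDemo with its fields and methods is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy reverse index for one reference field; IDs are plain longs and all names are hypothetical.
final class ReferenceIndexDemo {

    final Map<Long, Long> ownerOf = new HashMap<>();            // referrer ID -> referenced ID
    final Map<Long, Set<Long>> referrers = new HashMap<>();     // index: referenced ID -> referrer IDs

    // Set the reference and keep the index in sync
    void setOwner(long referrer, long target) {
        final Long previous = this.ownerOf.put(referrer, target);
        if (previous != null)
            this.referrers.get(previous).remove(referrer);
        this.referrers.computeIfAbsent(target, key -> new TreeSet<>()).add(referrer);
    }

    // Redirect every reference to oldTarget so that it points to newTarget instead
    void replaceTarget(long oldTarget, long newTarget) {
        final Set<Long> incoming = this.referrers.getOrDefault(oldTarget, Set.of());
        for (long referrer : new TreeSet<>(incoming))       // iterate a copy: setOwner() modifies the index
            this.setOwner(referrer, newTarget);
    }
}
```

The index lookup finds the incoming references directly, so the cost of replacing an instance is proportional to the number of objects that refer to it rather than to the size of the database.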
What if model class A contains a reference to model class B, and then a schema change deletes class B?
Then the Java type of the reference in class A will also have to change, otherwise your code won’t compile, or schema generation will fail because class B isn’t a model class.
Objects are upgraded automatically the first time your code attempts to read or write a field in the object.
Yes, PermazenObject.migrateSchema() upgrades an object to the current schema. From the CLI, you can upgrade every object by invoking eval all().forEach(PermazenObject::migrateSchema).
Fields are explicitly typed; each type has an associated FieldType implementation.
You can use the CLI command delete-schema-version to remove a recorded schema version from the database. This operation will fail if any objects with that version still exist - you must upgrade (or delete) them first, e.g., using the CLI command eval all().forEach(PermazenObject::migrateSchema).
For simplicity, it is recommended to always upgrade objects after a schema change, so your @OnVersionChange methods only have to deal with one version change at a time.
Permazen keeps an internal index on object versions. Therefore, it’s easy to query for which objects of which types have which versions.
For example, in the CLI to find how many objects of type Vehicle have version four, you could say eval (all(Vehicle) & queryVersion().get(4)).stream().count().
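As a mental model of the object version index, and of why a schema version can only be deleted once no objects use it, consider this plain-Java sketch. It is not Permazen's implementation; VersionIndexDemo and its methods are hypothetical, and object IDs are reduced to plain longs.

```java
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Toy model of a version index; class and method names are hypothetical, not Permazen API.
final class VersionIndexDemo {

    final TreeMap<Integer, Set<Long>> byVersion = new TreeMap<>();  // schema version -> object IDs
    final Set<Integer> recordedSchemas = new TreeSet<>();           // schema versions known to the database

    // Record (or re-record after an upgrade) the schema version of an object
    void record(long id, int version) {
        this.byVersion.values().forEach(ids -> ids.remove(id));
        this.byVersion.computeIfAbsent(version, key -> new TreeSet<>()).add(id);
        this.recordedSchemas.add(version);
    }

    // How many objects currently have the given schema version
    long count(int version) {
        return this.byVersion.getOrDefault(version, Set.of()).size();
    }

    // Refuse to delete a schema version while any object still has it
    boolean deleteSchemaVersion(int version) {
        if (this.count(version) > 0)
            return false;
        return this.recordedSchemas.remove(version);
    }
}
```

Counting the objects at a given version is a single index lookup, and the same lookup is what tells you whether an old schema version is safe to garbage collect.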
I changed my model classes and now new transactions are failing with SchemaMismatchException… now what do I do?
Avoid this problem by configuring your schema version as -1 to have a random version auto-generated for you based on hashing the schema. Don’t forget to add @OnVersionChange methods as necessary to handle any required schema change fixups.
The schema version number can be provided explicitly when you configure a Permazen instance, or auto-generated based on hashing your schema (by setting the version to -1). In the latter case, you don’t have to do anything.
However, you need to give Permazen permission to record a new schema version in the database; this is just an extra safety check.