JPA_Comparison

Archie L. Cobbs edited this page Jan 11, 2024 · 18 revisions

Feature Comparison

Here's a quick comparison between Permazen and JPA, the most common Java persistence technology:

| Feature/Issue | JPA | Permazen |
| --- | --- | --- |
| Maturity level | Mature; since 2006 | Mature; since 2014 |
| Underlying database | SQL only | Any key/value store (including SQL) |
| Query language | JPQL, SQL, Criteria | None; just use regular Java |
| Compile-time type safety | Only with Criteria | Always |
| Configuration | Java Annotations + XML | Java Annotations |
| Number of annotation classes | 89 | 12 |
| Simple data types | SQL data types | Primitives, Date, etc. + any user-definable |
| Array types | byte[] only | Any type including multi-dimensional |
| Indexable data types | SQL data types | Any type |
| Composite indexes | Supported | Supported |
| Collection types | Sets, lists, maps | Sets, lists, maps |
| Lockless counter type | Not supported | Supported (if key/value does) |
| Detached Transactions | Partial/implicit | Supported |
| Offline data is queryable | Not supported | Supported |
| Efficient mutable snapshots | Not supported | Supported |
| Uniqueness constraint exclusions | No | Supported |
| Transactional validation | No | Supported |
| Forward cascades | Supported | Supported |
| Inverse cascades | Not Supported | Supported |
| "Clone" cascades | Not Supported | Supported |
| Object graph serialization | Not supported | Supported |
| Query indexes in snapshots | Not supported | Supported |
| Field change notifications | Simple types only | Supported |
| Notification via reference path | Not supported | Supported |
| Notification with old/new value | Not supported | Supported |
| Slow query debug | Difficult | Easy |
| Reference inversion | Implicit; via query | Supported |
| Versioned objects | Not supported | Supported |
| Version updates provide old/new fields | Not supported | Supported |
| Rolling online schema changes | Not supported | Supported |
| Spring integration | Supported | Supported |
| Delete actions | Supported via DDL | Supported |
| Watch for DB changes | Not supported | Supported |
| XML import/export objects | Not supported | Supported |
| Command-line client | Only via SQL database | Supported |
| CLI parses Java expressions | Not supported | Supported |
| Vaadin GUI auto-generator | Not supported | Supported |

Problems & Solutions

If your application is written in Java, and you need to persist data onto disk, then there are certain issues and challenges that will always be present no matter what persistence layer you use. These issues are inherent in the problem that all of the persistence layers are trying to solve.

So it's instructive to compare persistence layers by looking at these inherent problems and how the various layers address them.

Note that JPA does a great job solving the problem it's designed to solve: allow Java object-oriented access to an underlying SQL database. However, what's not always obvious until you are deep in the weeds trying to debug some hairy problem are the sacrifices JPA makes because it is based on SQL. When you free a persistence layer from that requirement, as Permazen is, the experience at the Java level becomes not only simpler but more capable at the same time.

Problem #1: Transactions

Transactions allow developers to reason about how their code functions, and most persistence layers support some notion of transactions. From a Java programming point of view, however, other than knowing they exist, there is often minimal need to interact with transactions directly.

Typically, transaction management is handled by a separate layer (e.g., Spring) and transactions are exposed to the application through an implicit association of a transaction with the current thread. Commit errors convert into exceptions, and conversely exceptions within a transaction (usually) result in an automatic transaction rollback.

This is an area where both JPA and Permazen take the same approach; there is little overall difference between the two persistence layers with respect to transaction handling.

Permazen does provide a single, uniform exception type for retryable exceptions, whereas with JPA you get whatever the underlying JDBC driver happens to throw.
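A single retryable exception type makes a generic retry loop straightforward to write. The following is a minimal sketch in plain Java, not Permazen code: `RetryableTxException` and `withRetries()` are hypothetical names invented for illustration, and the "transaction" is simulated by a lambda.

```java
import java.util.function.Supplier;

/** Hypothetical stand-in for a persistence layer's single "please retry" exception type. */
class RetryableTxException extends RuntimeException {
    RetryableTxException(String message) {
        super(message);
    }
}

final class RetrySketch {

    /** Runs the work, retrying up to maxAttempts times when it fails with the retryable type. */
    static <T> T withRetries(int maxAttempts, Supplier<T> work) {
        if (maxAttempts < 1)
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        RetryableTxException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return work.get();          // in real code: open transaction, do work, commit
            } catch (RetryableTxException e) {
                last = e;                   // conflict or transient failure: try again
            }
        }
        throw last;                         // out of attempts; rethrow the last failure
    }

    public static void main(String[] args) {
        int[] calls = {0};
        String result = withRetries(3, () -> {
            if (++calls[0] < 3)
                throw new RetryableTxException("simulated conflict");
            return "committed";
        });
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

Any other exception type propagates immediately, which is the point of having one well-defined retryable type: the loop does not need to know anything about the underlying database driver.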

Problem #2: Data Access

Once a transaction is open, there is the basic inherent problem of how do you actually access the data in the database?

Both JPA and Permazen provide access through plain Java objects (POJO's) that are either obtained from the transaction itself through some kind of query, or re-attached from a previous transaction. However, there are some important differences as well.

Transaction Data Access

With both JPA and Permazen, the Java objects (POJO's) that represent objects in the database are created anew by each new transaction, even for the same database identity. This is a logical approach, especially in light of the fact that a database identity that exists in one transaction may not exist in another. If you want to "re-use" a POJO in a new transaction, you have to ask the new transaction to find the corresponding POJO for that database identity. As a side effect, the transaction verifies that the object has not been deleted.

In any case, during a transaction a POJO always draws its state from the transaction with which it is associated.

Post-Transaction Data Access

Typically transactions need to be short-lived to reduce contention. That means your application is going to copy some data out of the transaction into memory for use later after the transaction closes. So the next inherent problem is how do you copy that data out, and how do you access it once the transaction closes?

For Java applications, it's very convenient if this post-transaction access can use the same POJO-based Java API as normal transaction access. Indeed, both JPA and Permazen have this feature, but take different approaches.

It's worth noting that whether using JPA or Permazen, you have to know ahead of time to read information into memory during the transaction if you want to access it after the transaction ends.

With JPA, POJO's retain some of their state after a transaction ends, becoming "detached" objects. However, exactly which state is retained is not always clear to the programmer, as it depends for example on whether queries were performed with lazy loading or not and which relationships the application accessed during the transaction. In other words, the JPA POJO's just represent a cache, and whatever happens to be in the cache when the transaction ends is what's available.

There are several downsides with this approach. First, JPA's cache is always separate from, and in addition to, whatever cache is provided by the database itself. This means that if the database is also caching data in memory during a transaction, that cache is completely redundant and wasting memory.

Another downside is that the set of data you need to access after a transaction ends is not necessarily equal to the set of data that you happened to access and bring into the cache during the transaction (though the former should always be a subset of the latter). However, JPA equates these notions, causing potentially wasted memory. For example, if during a transaction you access an object and iterate through some large collection property (perhaps to calculate some aggregate value), that collection will be cached in memory after the transaction as long as you keep a reference to the object, even if all you actually needed after the transaction was a few non-collection properties of the object.

A third downside to the JPA approach is that the post-transaction state available from detached JPA POJO's is impossible to query against. In other words, there is no way to perform the normal JPA queries (SQL, JPQL, Criteria) outside of a transaction. You may have loaded every user into memory, but once the transaction ends there's no way to query those users by username.

In the Permazen approach, POJO's don't retain state when their transaction closes, and trying to access them results in an exception. In fact, Permazen doesn't cache any data: Permazen POJO's represent a direct view of the data in the database as seen through the open transaction. If for some reason the caching already provided by the transaction itself is not sufficient, this can be easily (and more appropriately) handled in a separate key/value caching layer.

Permazen does however provide explicit post-transaction data access support, using "detached" transactions. These are lightweight, in-memory transactions that start out completely empty. Any data that you need to access after the transaction closes you "copy out" of the real transaction into a detached transaction. Permazen provides methods to make this operation easy, for anything from a single object to an arbitrary graph of objects (with potentially circular references) that you define. This way, the data available in the detached transaction (and taking up memory) is always exactly what you specified that you needed, no more no less.

Detached transactions have all of the functionality of normal transactions; you can query indexes, get notified of changes, etc. Querying for data in a detached transaction works exactly the same way as querying for data in a "real" transaction. The only thing you can't do with a detached transaction is commit() it. Like regular transactions, detached transactions generate their own distinct POJO's.

If necessary you can create multiple detached transactions, and they will persist for as long as you hold a reference to them. The support for copying objects between a "real" transaction and detached transaction works in general between any two transactions, whether real or detached.

Finally, you can create detached transactions on top of raw key/value stores, effectively creating an ad hoc in-memory database. This allows for clean serialization solutions. For example, one machine could populate a detached transaction with a graph of objects and send the serialized key/value store to a remote machine, which then creates its own detached transaction of the same data, from which it can perform efficient index queries, etc.

Problem #3: Relationships and Collections

All databases must support some notion of relationships, or "pointers", between database records. Pointers in some form are required to represent one-to-one, one-to-many, many-to-one, and many-to-many relationships. For example, if a database stores readings for water meters, there must be some pointer that links a meter reading to the associated water meter, and in this case the relationship is many-to-one, because a single meter can have multiple readings.

JPA and Permazen represent relationships and collections to the programmer in a similar way, though with important differences. Both use normal Java object references to represent relationships, and normal Java Set, List, and Map interfaces to represent collections. These references and collection classes are meant to be Java reflections of the underlying database reality.

However, the JPA representation, as a reflection of the underlying database reality, is both less precise and less accurate than Permazen's.

Settable Collection Fields

JPA allows a programmer to set a collection field to null, e.g., meter.setReadings(null), but this makes no sense - even though the relationship can sometimes be empty, it always exists. In addition, in newly created objects the programmer must remember to initialize the field (whereas in objects returned by queries, JPA will populate the field automatically).

These problems simply don't exist with Permazen, because collection fields have only getter methods, are declared as abstract methods, and Permazen guarantees that the returned value is never null.

The "inverse" problem

For collection relationships between entities, JPA defines a notion of the "forward" direction and the "inverse" direction. For example, you might be able to access both parent.getChildren() and child.getParent(). In this case child -> parent is the forward direction and parent -> child is the "inverse" direction. What this really means is that the parent -> child relationship represented by parent.getChildren() is a phantom that does not necessarily represent reality. In the underlying database, there is only one pointer - from child to parent, but JPA presents two versions of it.

Only one can be right, which raises the question: what happens when you say both parent1.getChildren().add(child) and child.setParent(parent2) where parent1 != parent2? (Answer: parent2 becomes the parent.) This creates an opportunity for hard-to-find bugs when application code somewhere forgets to update both sides of the relationship.
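The disagreement is easy to reproduce with nothing but plain Java objects. The sketch below involves no JPA at all; it only models the in-memory state of an entity pair that keeps the relationship in two independent fields, and the `Parent` and `Child` classes are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class TwoSidedRelationshipSketch {

    static final class Parent {
        final String name;
        final List<Child> children = new ArrayList<>();   // the "inverse" side
        Parent(String name) {
            this.name = name;
        }
    }

    static final class Child {
        Parent parent;                                    // the "forward" side
    }

    public static void main(String[] args) {
        Parent parent1 = new Parent("parent1");
        Parent parent2 = new Parent("parent2");
        Child child = new Child();

        parent1.children.add(child);   // updates only the inverse side
        child.parent = parent2;        // updates only the forward side

        // The two in-memory views now disagree about who the parent is.
        System.out.println(parent1.children.contains(child));   // true
        System.out.println(child.parent == parent1);            // false
    }
}
```

Nothing in the object model prevents this state; keeping the two fields consistent is entirely up to the application code.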

Permazen solves this problem by not creating two representations of the same relationship in the first place. With Permazen only the child -> parent database relationship (the "real" one) is defined. To access the "inverse" direction from parent -> child, you query the index associated with that relationship:

  // class Child

    public abstract Parent getParent();
    public abstract void setParent(Parent parent);

  // class Parent

    public NavigableSet<Child> getChildren() {
        final NavigableSet<Child> kids = this.getTransaction().queryIndex(
          Child.class, "parent", Parent.class).asMap().get(this);
        return kids != null ? kids : NavigableSets.empty();
    }

What gets returned from parent.getChildren() is not a copy or snapshot of the relationship, it is a real-time view into the relationship. Because child.getParent() and parent.getChildren() are accessing the same data, it's not possible for them to get out of sync. Invoking child.setParent(parent1) always means that parent1.getChildren().contains(child) becomes immediately true.
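To illustrate the idea of a single stored relationship plus a derived index, here is a toy model in plain Java. It is not how Permazen is implemented; the class `DerivedInverseSketch` and the use of strings as object identities are assumptions made to keep the example self-contained.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.NavigableSet;
import java.util.Objects;
import java.util.TreeSet;

final class DerivedInverseSketch {

    // Single source of truth: child -> parent.
    private final Map<String, String> parentOf = new HashMap<>();

    // Index derived from parentOf: parent -> sorted children.
    private final Map<String, NavigableSet<String>> childrenOf = new HashMap<>();

    /** The only mutator; it updates the field and its index together. */
    void setParent(String child, String parent) {
        Objects.requireNonNull(child);
        Objects.requireNonNull(parent);
        String old = parentOf.put(child, parent);
        if (old != null)
            childrenOf.get(old).remove(child);
        childrenOf.computeIfAbsent(parent, k -> new TreeSet<>()).add(child);
    }

    String getParent(String child) {
        return parentOf.get(child);
    }

    /** Read-only live view: later setParent() calls are visible through it. */
    NavigableSet<String> getChildren(String parent) {
        Objects.requireNonNull(parent);
        return Collections.unmodifiableNavigableSet(
            childrenOf.computeIfAbsent(parent, k -> new TreeSet<>()));
    }

    public static void main(String[] args) {
        DerivedInverseSketch db = new DerivedInverseSketch();
        NavigableSet<String> kidsOfAlice = db.getChildren("alice");   // empty for now

        db.setParent("carol", "alice");
        System.out.println(kidsOfAlice.contains("carol"));   // true

        db.setParent("carol", "bob");
        System.out.println(kidsOfAlice.contains("carol"));   // false
    }
}
```

Because the inverse direction can only be read, never written, there is no second copy of the relationship for application code to forget to update.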

Collections and Sorting

JPA sets and maps normally implement the Set and Map interfaces. You can improve this to SortedSet and SortedMap with the right magic annotations and caveats, such as having to ensure your Java sort order matches your database sort order.

However, the semantics that databases actually provide are those of the more powerful NavigableSet and NavigableMap interfaces. For example, these allow you to view the collection in reverse order.

Permazen sets and maps implement NavigableSet and NavigableMap, and the Permazen type system guarantees that the Java ordering (which is defined by the type) always matches the database ordering.
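For readers less familiar with these interfaces, the short sketch below shows what NavigableSet adds on top of Set, using the JDK's own TreeSet rather than a Permazen collection. The sample values are arbitrary.

```java
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

final class NavigableViewSketch {

    static NavigableSet<Integer> sample() {
        return new TreeSet<>(List.of(10, 20, 30, 40, 50));
    }

    public static void main(String[] args) {
        NavigableSet<Integer> readings = sample();

        // Reverse-order view of the same data (no copy is made).
        System.out.println(readings.descendingSet());              // [50, 40, 30, 20, 10]

        // Range view: elements in [20, 40).
        System.out.println(readings.subSet(20, true, 40, false));  // [20, 30]

        // Nearest-neighbor lookups.
        System.out.println(readings.floor(35));                    // 30
        System.out.println(readings.higher(30));                   // 40
    }
}
```

These operations map naturally onto range scans over sorted keys, which is why the interfaces are a good fit for data that is stored in sorted order.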

Problem #4: Indexes

Fundamental to any database is support for indexing. Indexing is simply the automated creation of derived information that makes it efficient to perform certain queries that would otherwise be too slow. More precisely, whereas objects represent a mapping from object ID to property values, an index is a (sorted) mapping from property value(s) back to object ID.

The Java language itself does not have any built-in notion of an "index". In regular Java, if for example you want to be able to efficiently search for Person objects by name, you will have to programmatically create and maintain an index on the name property yourself using TreeMap or whatever. In other words, you have to homebrew your own index. Since indexing is core to the function of databases, this leaves the question of how indexes should be exposed to Java.
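A hand-rolled index of this kind might look like the following sketch. The `Person` class, its numeric `id`, and the `HomebrewIndexSketch` wrapper are hypothetical; the point is only to show the bookkeeping that every write path has to remember.

```java
import java.util.Collections;
import java.util.NavigableMap;
import java.util.NavigableSet;
import java.util.TreeMap;
import java.util.TreeSet;

final class HomebrewIndexSketch {

    static final class Person {
        final long id;
        private String name;
        Person(long id, String name) {
            this.id = id;
            this.name = name;
        }
        String getName() {
            return name;
        }
    }

    // The hand-maintained index: name -> sorted IDs of the persons having that name.
    private final NavigableMap<String, NavigableSet<Long>> byName = new TreeMap<>();

    void add(Person p) {
        byName.computeIfAbsent(p.name, k -> new TreeSet<>()).add(p.id);
    }

    /** Every rename must also fix the index, or lookups silently go stale. */
    void rename(Person p, String newName) {
        NavigableSet<Long> ids = byName.get(p.name);
        if (ids != null) {
            ids.remove(p.id);
            if (ids.isEmpty())
                byName.remove(p.name);
        }
        p.name = newName;
        add(p);
    }

    NavigableSet<Long> findByName(String name) {
        return byName.getOrDefault(name, Collections.emptyNavigableSet());
    }

    public static void main(String[] args) {
        HomebrewIndexSketch index = new HomebrewIndexSketch();
        Person ann = new Person(1, "Ann");
        Person bob = new Person(2, "Bob");
        index.add(ann);
        index.add(bob);
        index.rename(bob, "Ann");
        System.out.println(index.findByName("Ann"));   // [1, 2]
        System.out.println(index.findByName("Bob"));   // []
    }
}
```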

JPA does not directly provide access to indexes; neither does SQL itself. Instead, it leaves it up to you to ensure that whatever queries you are performing are going to be efficient. This of course depends not only on the queries themselves and what indexes are defined on the database, but also on how JPA translates your query into SQL, and how the database maps that SQL into a query plan. The many layers involved can make this simple determination surprisingly difficult.

In contrast, Permazen provides a direct Java-level view into your defined indexes using the NavigableMap and Index interfaces. As a result, Permazen doesn't have or need a "query language". All queries are performed using normal Java, and their efficiency (or lack thereof) will be obvious when looking at the code.

In addition, there is less need for composite (multi-field) indexes. In Permazen, every index resolves to a NavigableSet of objects, and these sets can always be efficiently unioned and intersected using the methods in the NavigableSets utility class. For example:

  // Person fields

    public abstract String getLastName();
    public abstract void setLastName(String lastName);

    public abstract String getFirstName();
    public abstract void setFirstName(String firstName);

  // Person index queries

    public static NavigableMap<String, NavigableSet<Person>> queryLastName() {
        return PermazenTransaction.getCurrent().queryIndex(Person.class, "lastName", String.class).asMap();
    }

    public static NavigableMap<String, NavigableSet<Person>> queryFirstName() {
        return PermazenTransaction.getCurrent().queryIndex(Person.class, "firstName", String.class).asMap();
    }

    public static NavigableSet<Person> getByLastAndFirstName(String last, String first) {
        final NavigableSet<Person> byLast = Person.queryLastName().get(last);
        final NavigableSet<Person> byFirst = Person.queryFirstName().get(first);
        if (byLast == null || byFirst == null)
            return NavigableSets.empty();
        return NavigableSets.intersection(byLast, byFirst);  // efficient intersection
    }

Composite indexes are only needed when you require efficient queries to objects sorted on multiple keys.
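What "sorted on multiple keys" buys you can be seen in a plain Java analogy: a sorted set ordered by (last name, first name, id), from which a single range view returns everyone with a given last name, already ordered by first name. This is only an illustration using JDK collections; the `Entry` class and `withLastName()` helper are hypothetical and say nothing about how Permazen encodes composite indexes.

```java
import java.util.Comparator;
import java.util.NavigableSet;
import java.util.TreeSet;

final class CompositeIndexSketch {

    /** One index entry: the two indexed field values plus the object's ID. */
    static final class Entry {
        final String last;
        final String first;
        final long id;
        Entry(String last, String first, long id) {
            this.last = last;
            this.first = first;
            this.id = id;
        }
        @Override
        public String toString() {
            return last + "," + first + "#" + id;
        }
    }

    // Sort by last name, then first name, then ID as a tie-breaker.
    static final Comparator<Entry> ORDER = Comparator
        .comparing((Entry e) -> e.last)
        .thenComparing(e -> e.first)
        .thenComparingLong(e -> e.id);

    /** All entries with the given last name, already sorted by first name. */
    static NavigableSet<Entry> withLastName(NavigableSet<Entry> index, String last) {
        Entry lo = new Entry(last, "", Long.MIN_VALUE);
        // last + "\0" is the smallest string strictly greater than last
        Entry hi = new Entry(last + "\0", "", Long.MIN_VALUE);
        return index.subSet(lo, true, hi, false);
    }

    static NavigableSet<Entry> sample() {
        NavigableSet<Entry> index = new TreeSet<>(ORDER);
        index.add(new Entry("Smith", "Zoe", 1));
        index.add(new Entry("Smith", "Adam", 2));
        index.add(new Entry("Jones", "Bob", 3));
        index.add(new Entry("Smithe", "Al", 4));
        return index;
    }

    public static void main(String[] args) {
        System.out.println(withLastName(sample(), "Smith"));   // [Smith,Adam#2, Smith,Zoe#1]
    }
}
```

Intersecting two single-field indexes answers "who matches both values?", but it cannot produce this kind of ordered range scan across both fields, which is where a composite index earns its keep.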

Problem #5: Database Types

The first inherent problem with persisting Java objects in any database is the potential for mismatch between Java types and the database's supported types. For example, you can map a java.util.Date property to a database TIMESTAMP column, but those types are not the same; e.g., MySQL does not support TIMESTAMP values prior to 1970-01-01 00:00:01. You can map your object's String properties to VARCHAR() columns, but in MySQL the total length of those strings will be limited to 65,535 bytes. This kind of type mismatch creates lots of opportunities for subtle bugs.

Another problem is that databases usually restrict indexing to their supported types. If your application creates custom types that don't map to supported types, you can't index them. For example, suppose you have a custom type for software versions with values such as 7.5alpha and 10.3, and you want those two versions to sort in that order. With SQL, neither a numeric nor a character column type will work. You can define your own custom binary type, but then you lose visibility into that type at the SQL level.
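In Java, such an ordering is simple to express as a Comparator. The sketch below is one possible definition, written for illustration only: the rule that a bare number sorts after the same number with a suffix (so 7.5alpha comes before 7.5) is an assumption, not something the text above specifies.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class VersionOrderSketch {

    /** Compares dotted versions such as "7.5alpha" and "10.3" component by component. */
    static final Comparator<String> VERSION_ORDER = (a, b) -> {
        String[] x = a.split("\\.");
        String[] y = b.split("\\.");
        for (int i = 0; i < Math.max(x.length, y.length); i++) {
            String p = i < x.length ? x[i] : "0";   // missing components count as 0
            String q = i < y.length ? y[i] : "0";
            int diff = Integer.compare(number(p), number(q));
            if (diff == 0)
                diff = compareQualifiers(p.substring(digits(p)), q.substring(digits(q)));
            if (diff != 0)
                return diff;
        }
        return 0;
    };

    /** Returns the number of leading ASCII digits in the component. */
    static int digits(String part) {
        int end = 0;
        while (end < part.length() && part.charAt(end) >= '0' && part.charAt(end) <= '9')
            end++;
        return end;
    }

    /** Returns the numeric prefix of the component, or 0 if there is none. */
    static int number(String part) {
        int end = digits(part);
        return end == 0 ? 0 : Integer.parseInt(part.substring(0, end));
    }

    /** Assumption: no qualifier (a release) sorts after any qualifier such as "alpha". */
    static int compareQualifiers(String s, String t) {
        if (s.isEmpty() || t.isEmpty())
            return Boolean.compare(s.isEmpty(), t.isEmpty());
        return s.compareTo(t);
    }

    public static void main(String[] args) {
        List<String> versions = new ArrayList<>(List.of("10.3", "7.5alpha", "7.5", "7.10"));
        versions.sort(VERSION_ORDER);
        System.out.println(versions);                           // [7.5alpha, 7.5, 7.10, 10.3]
        System.out.println("7.5alpha".compareTo("10.3") > 0);   // true: plain string order is wrong
    }
}
```

A character column sorts "10.3" before "7.5alpha" and a numeric column cannot hold "alpha" at all, which is the mismatch described above.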

JPA has no solution to the type mismatch problem because it only works with SQL databases and maps Java properties directly to SQL columns.

Permazen supports Java types precisely and sorts them naturally. It requires that all types have a well defined range of values, sort order, and equality semantics that behave identically both for Java instances and database-encoded values. Permazen allows you to define your own custom types; these are first-class types that may be indexed, form collections, etc.

Problem #6: Notification of Database Changes

It is often the case that you want your application to be automatically notified when certain database fields are modified. For example, you might have a console displaying information read from the database. You want the console to update whenever the information changes, but you don't want the console to have to continually poll the database. Instead, you just want to be notified only when there is a change to the specific data you are displaying.

There are really two different versions of this problem, the first being a special case of the second:

  1. You want to be notified about changes originating from within the same application instance
  2. You want to be notified about changes originating from any application instance that is connected to the database

JPA only provides a (partial) solution to the first version of this problem, using annotations like @PrePersist and @PreUpdate. This mechanism exists entirely internal to JPA, and only the application doing the modification gets notified.

While SQL databases support triggers for notification, sadly JPA provides no mechanism to propagate trigger notifications back up the stack to Java code.

Permazen provides a simple solution to the more general version of this problem based on support from the underlying key/value store. The key/value database KVTransaction API includes a watchKey() method, which returns a Future that fires when the specified key is modified by a committed transaction.
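The shape of such an API can be sketched with a toy in-memory store. This is not Permazen's implementation and it simplifies heavily: keys are strings rather than byte arrays, every "commit" is a single put, and the watch fires on any put to the key, whether or not the value actually changed. The `WatchKeySketch` class and its `commitPut()` method are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Future;

final class WatchKeySketch {

    private final Map<String, String> data = new HashMap<>();
    private final Map<String, List<CompletableFuture<Void>>> watches = new HashMap<>();

    /** Returns a future that completes the next time the key is written by a commit. */
    synchronized Future<Void> watchKey(String key) {
        CompletableFuture<Void> future = new CompletableFuture<>();
        watches.computeIfAbsent(key, k -> new ArrayList<>()).add(future);
        return future;
    }

    /** Applies a "committed" write and fires any one-shot watches on that key. */
    void commitPut(String key, String value) {
        List<CompletableFuture<Void>> fired;
        synchronized (this) {
            data.put(key, value);
            fired = watches.remove(key);
        }
        if (fired != null)
            fired.forEach(f -> f.complete(null));   // notify outside the lock
    }

    public static void main(String[] args) {
        WatchKeySketch db = new WatchKeySketch();
        Future<Void> watch = db.watchKey("meter-17");

        db.commitPut("meter-42", "3.5");
        System.out.println(watch.isDone());   // false: a different key changed

        db.commitPut("meter-17", "9.1");
        System.out.println(watch.isDone());   // true: the watched key changed
    }
}
```

A watcher that wants continuous notification simply re-registers after each firing, typically re-reading the value in a new transaction before doing so.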

Due to underlying limitations, not all key/value database implementations support watchKey(), but several do, including FoundationKVDatabase, RocksDBKVDatabase, LevelDBKVDatabase, and the RaftKVDatabase distributed key/value database.

The PermazenTransaction class provides a getKey() method that returns the key associated with any field in any object. So it's easy to get notified of a change in one object field. It's also possible to monitor an arbitrary subset of the database, by watching a "sentinel" field that gets modified by @OnChange notifications.

Problem #7: Database Schema and Upgrades

A more challenging inherent problem with database persistence, no matter what language the application is written in, is that application code and the structure of the data it persists changes and evolves over time. The challenge is how to safely manage changes to the structure of data when the application is upgraded.

Both JPA and Permazen are "schemaful" persistence layers. However, JPA provides no support for object versioning or schema migration, so with JPA you are on your own here. In other words, for this particular inherent problem, JPA provides no help whatsoever. Unfortunately, developers often only become aware of the complexity inherent in this problem after they have deployed code into production and are starting to contemplate how they are going to safely and reliably upgrade existing customers. The problem is exacerbated when the application is deployed across a cluster of servers and there is a requirement for zero downtime during the upgrade: then you have a situation where different servers will be running different versions of the code, with correspondingly different expectations of data structure, yet are talking to a "schemaful" database that can only have one schema at a time.

Some database technologies claim to solve this problem by being "schemaless". Of course, there is no such thing as "schemaless" data, there is only the question of which software layer enforces the schema. A database not having a schema is a "solution" to the above problem only in the sense that it now makes it entirely impossible for the database to help you solve a problem that "schemaful" databases in theory could, but in practice don't, solve.

Instead of going backward into schemaless anarchy, the more correct fix for the overall situation is to add explicit support for object versioning and schema migration into the database. That is, put the schema migration support at the same layer that defines and enforces the schema, and make everything rigorous.

Permazen does exactly this, and it does it (as always) in a Java-centric way. All objects are versioned, and you are allowed to change your object hierarchy any way you want. When a new application version encounters an object with an older schema version, the object is notified via any @OnVersionChange-annotated methods with all of the information it needs to migrate the object. The rest of the application need not be aware that any schema change has occurred. Objects are also indexed by version, which facilitates schema migration by, for example, making it easy to write an object version migration background thread.
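The general pattern of upgrading an object on first access, with the old field values handed to a migration callback, can be modeled in a few lines of plain Java. Everything in this sketch is hypothetical (the `Stored` class, the version numbers, and the `onVersionChange()` signature); it mimics the idea only and does not use or reproduce Permazen's actual API.

```java
import java.util.HashMap;
import java.util.Map;

final class MigrationSketch {

    static final int CURRENT_VERSION = 2;

    /** A stored object: a schema version plus its field values. */
    static final class Stored {
        int version;
        Map<String, Object> fields;
        Stored(int version, Map<String, Object> fields) {
            this.version = version;
            this.fields = new HashMap<>(fields);
        }
    }

    /** Upgrades the object on first access by newer code; a no-op if already current. */
    static Stored access(Stored obj) {
        if (obj.version < CURRENT_VERSION) {
            Map<String, Object> oldValues = obj.fields;
            obj.fields = new HashMap<>();
            onVersionChange(obj, obj.version, CURRENT_VERSION, oldValues);
            obj.version = CURRENT_VERSION;
        }
        return obj;
    }

    /** Migration callback: version 1 had "name"; version 2 splits it in two. */
    static void onVersionChange(Stored obj, int oldVersion, int newVersion,
            Map<String, Object> oldValues) {
        if (oldVersion == 1 && newVersion == 2) {
            String[] parts = ((String) oldValues.get("name")).split(" ", 2);
            obj.fields.put("firstName", parts[0]);
            obj.fields.put("lastName", parts.length > 1 ? parts[1] : "");
        }
    }

    public static void main(String[] args) {
        Stored person = access(new Stored(1, Map.of("name", "Ada Lovelace")));
        System.out.println(person.version);                   // 2
        System.out.println(person.fields.get("firstName"));   // Ada
        System.out.println(person.fields.get("lastName"));    // Lovelace
    }
}
```

Because the upgrade happens per object at access time, old and new objects can coexist in the same database while a migration is in progress.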

Problem #8: Command Line Access

The next inherent problem with Java and persistence is that Java is not the most convenient tool for various administration tasks relating to databases. Some kind of command line tool is needed that allows ad hoc data manipulation, queries, dump/restore, etc.

JPA's solution to this problem is an appropriate one, which is to rely on the underlying SQL database providing sufficient command line tools. However, this approach has a major drawback: these tools know nothing about Java. You've left the Java world. To take a trivial example, you couldn't easily write an SQL statement that finds all Person objects whose name has a certain Java hashCode(). For a more general example, you may have equipped your POJO's with lots of utility methods (e.g., calculateLastPayment()) that are entirely off-limits from the command line.

In any case, for Permazen relying on database vendor tools is especially unattractive because unlike with JPA, the underlying database technology (i.e., key/value store) is untyped. So using the database vendor tools would mean you are trying to make sense of unintelligible binary blobs.

Instead, Permazen provides its own command line interface (CLI) utility. Permazen's CLI provides the usual maintenance capabilities, such as schema management, XML file import/export, etc., and also includes a jshell subcommand that fires up the JDK's JShell. The JShell "snippets" are executed locally and within a Permazen transaction context:

$ java -jar permazen-cli.jar --memory

Welcome to Permazen.

Permazen> info
  CLI Mode: PERMAZEN
  Database: Memory database
  Access Mode: Read/Write
  Verbose Mode: false
  Schema ID: Schema_109aa9de51c4e83805d25d49d12f7b41
  Schema Model: Empty
  New Schema Allowed: No
  Validation Mode: AUTOMATIC
Permazen> jshell

Welcome to Permazen JShell. Your Permazen CLI session is available via "session".

|  Welcome to JShell -- Version 17.0.9
|  For an introduction type: /help intro

jshell> io.permazen.PermazenTransaction.getCurrent();
$1 ==> io.permazen.PermazenTransaction@58cb053d

jshell> /exit
|  Goodbye
Permazen> quit

You can extend the functionality of the command line tool with your own Java-defined commands and functions, even further aligning the CLI utility with your Java application. Conversely, all of the CLI functionality can be included in your own application if desired. See permazen-cliapp for example code.