Why Permazen Was Created

Archie L. Cobbs edited this page Jan 9, 2024 · 2 revisions

Paper

Prefer academic papers? See Permazen: Language-Driven Persistence for Java.

Java & Databases: A Love/Hate Relationship

Databases were created long before Java. They serve their function well.

Java also serves its function well, as an object-oriented programming language and runtime.

However, using databases to persist Java object data has never been easy.

For example, there is the well-known object/relational impedance mismatch problem.

E.g., if a parent gets a new child, am I supposed to invoke

child.setParent(parent)

or

parent.getChildren().add(child)

...or both?
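
To make the ambiguity concrete, here is a minimal plain-Java sketch of a bidirectional relationship maintained by hand. No JPA is involved, and the `Person` class and its accessors are hypothetical names chosen for illustration. Updating only one side leaves the in-memory object graph inconsistent:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model class for illustration; not taken from any real API
class Person {
    private final String name;
    private Person parent;
    private final List<Person> children = new ArrayList<>();

    Person(String name) { this.name = name; }

    String getName() { return name; }
    Person getParent() { return parent; }
    void setParent(Person parent) { this.parent = parent; }
    List<Person> getChildren() { return children; }
}

class BidirectionalDemo {
    public static void main(String[] args) {
        Person parent = new Person("Alice");
        Person child = new Person("Bob");

        // Update only the child's side of the relationship
        child.setParent(parent);

        // The two views of the same relationship now disagree
        System.out.println(child.getParent() == parent);           // true
        System.out.println(parent.getChildren().contains(child));  // false
    }
}
```

Nothing in the language ties the two fields together, so keeping them consistent is left to the caller or to whatever conventions the persistence layer imposes.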

And why are some fields of an object available after the transaction closes but not others? How can I know which ones?

This is really a language problem. Java is a great language for expressing algorithms using object-oriented data, and SQL is a great language for querying and updating tabular data.

Solutions like JPA attempt to use Java as a language for querying and updating tabular data, which in some ways gives you the worst of both worlds. For example:

  • Databases have a limited range of column types that don't exactly match normal and/or custom Java types.
  • There is a lack of visibility into how Java queries are translated into actual database queries:
    • You can't tell if a database query will be fast or slow just by looking at it; you have to know how your ORM layer converts it into SQL, which columns are indexed, how smart the query optimizer is and what query plan it will use, and whether the database keeps a "secret" index for any aggregate functions.
    • You can't tell by looking at a database query whether it will introduce locking problems (e.g., inter-transaction deadlocks caused by reversed locking order). To know the order in which tables and rows will be accessed, you must (a) deduce how your ORM layer converts the query into SQL, and (b) examine the database's query plan for that SQL.
  • Database queries just look ugly nested inside Java code, even with recent improvements such as JPA criteria queries. There's no way to make them look "natural".
  • ORM layers often have murky areas of unspecified or unpredictable behavior, largely due to their inherent complexity. Instead, there should be no ambiguity about what is going to happen in any particular situation.
  • ORM layers attempt to hide the complexity of converting everything into SQL, but they do so incompletely and imperfectly. The result is a layer of abstraction that still requires you to expend mental energy understanding and paying attention to lower-level details.
  • There are many subtle violations of Java type safety, e.g., when handling Enum types.

Neither Java nor SQL is a natural language for expressing the concepts relating to persistence of Java object data, which is really its own unique domain with unique issues:

  • Persistence necessarily implies serialization, something Java traditionally doesn't do very well
  • Serialization implies a requirement for versioning of the serialized data, in order to support code changes over time
  • Many applications may share one database, so multiple versions of a Java application, each with different ideas about serialization, can be accessing the same serialized data at the same time
  • Indexes on the serialized data are required for efficiency, but the Java language has no built-in notion of an "index" on object data
  • It must be possible to flexibly query the indexed data directly without reifying into Java objects

Traditional Java persistence layers like JPA provide awkward and not strictly typesafe answers, and only to some of these issues; developers are expected to "roll their own" solution to the rest. Instead, a Java-centric persistence layer should make these issues straightforward to address, in a Java-centric way, and using easily understood, pre-defined patterns where appropriate.

Permazen Goals

The goal of Permazen is to take what's good about databases, and what's good about programming in Java, and bring the two closer together.

At its core, any database is just a bunch of functionality wrapped around a sorted key/value technology of some kind. That key/value layer can efficiently find, add and remove keys, and iterate over keys in order. On top of this are added indexing, data types, table and column structures, foreign keys, joins, and SQL. In other words, SQL is really just NoSQL with a bunch of extra structure.
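
As a rough illustration of that layering, the sketch below builds a secondary index on top of nothing but a sorted map. It uses `java.util.TreeMap` as a stand-in for the key/value store. The `TinyStore` class and the key layout (`obj/<id>` and `idx/age/<age>/<id>`) are assumptions made for this example, not Permazen's actual encoding.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical sketch: a secondary index layered on a sorted key/value store
class TinyStore {
    // Stand-in for the database: keys are kept in sorted order
    private final NavigableMap<String, String> kv = new TreeMap<>();

    // Store an object's age and maintain the matching index entry
    void putAge(long id, int age) {
        String objKey = "obj/" + id;
        String old = kv.get(objKey);
        if (old != null)
            kv.remove(indexKey(Integer.parseInt(old), id));  // drop stale index entry
        kv.put(objKey, Integer.toString(age));
        kv.put(indexKey(age, id), "");
    }

    // Find all ids with the given age using a range scan over index keys
    List<Long> findByAge(int age) {
        String prefix = String.format(Locale.ROOT, "idx/age/%010d/", age);
        // '0' is the character right after '/', so this is an exclusive upper bound
        String end = prefix.substring(0, prefix.length() - 1) + '0';
        List<Long> ids = new ArrayList<>();
        for (String key : kv.subMap(prefix, true, end, false).keySet())
            ids.add(Long.parseLong(key.substring(prefix.length())));
        return ids;
    }

    // Zero-padded so string order matches numeric order (non-negative values only)
    private static String indexKey(int age, long id) {
        return String.format(Locale.ROOT, "idx/age/%010d/%019d", age, id);
    }
}
```

The only operations used on the underlying map are get, put, remove, and an ordered range scan, which is the small surface a key/value store has to provide for everything else to be built above it.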

However, SQL was never optimized for Java programmers, and in fact often serves to create additional obstacles.

Permazen's attitude toward databases is: "You focus on the key/value part, and I'll do the rest".

Some goals and features of Permazen's Java-centric persistence model:

  • Be easy to understand for Java programmers
    • Everything should be configured using annotations
    • Everything should be done using normal Java objects
    • Type safety is paramount
  • Be capable of scaling to large data sets and multiple nodes
    • Permazen should be able to run on almost any database
      • SQL
      • NoSQL
    • The database need only implement a simple key/value store API
  • Support arbitrary user-defined types
    • There should be no distinction between built-in types and user types
    • User types should be indexable just like built-in types
  • Provide first class, built-in abstractions for Sets, Lists, and Maps
    • You can build anything out of these three collection types
  • Provide first class object references with strong referential integrity
    • Referential integrity implies reference fields are always indexed
    • Since they are always indexed, expose that "invert reference" capability to the application
    • Configurable "on delete" and delete cascade behavior
  • Make it easy to monitor for changes through arbitrary reference paths
    • Notify my @OnChange method when "parent.friend.age" changes
    • Maintain any custom indexes (derived information) in one place
  • Provide incremental validation support
    • Automatically run JSR 303 validation, but only on what's changed
    • Allow me to run pending validations at any time
  • Support painless "on-line" schema changes without downtime
    • Track object versions automatically
    • Invoke my @OnVersionChange method with old & new field values
  • Support object lifecycle notifications
    • @OnCreate
    • @OnDelete
  • Support copying object state into and out of transactions
    • "Detached" transactions retain a portion of transaction state indefinitely in-memory
    • Let the user decide what is retained and what is not
    • Allow normal index queries on in-memory transaction data
  • Support a Java-centric command line interface (CLI)
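
The reasoning behind the "invert reference" goal above can be sketched in a few lines of plain Java: enforcing referential integrity already requires knowing who refers to a given object, so the same index can answer that question for the application. The `ReferenceIndex` class and its method names are hypothetical and are not Permazen's API; object ids are modeled as simple `long` values.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.NavigableSet;
import java.util.TreeSet;

// Hypothetical sketch of an always-maintained reference index
class ReferenceIndex {
    // referrer id -> target id (the forward reference field)
    private final Map<Long, Long> forward = new HashMap<>();
    // target id -> ids of all objects referring to it (the index)
    private final Map<Long, NavigableSet<Long>> inverse = new HashMap<>();

    // Set (or clear, with null) the reference held by 'referrer'
    void setReference(long referrer, Long target) {
        Long old = (target == null) ? forward.remove(referrer) : forward.put(referrer, target);
        if (old != null) {
            NavigableSet<Long> set = inverse.get(old);
            set.remove(referrer);
            if (set.isEmpty())
                inverse.remove(old);
        }
        if (target != null)
            inverse.computeIfAbsent(target, k -> new TreeSet<>()).add(referrer);
    }

    // "Invert" the reference: which objects refer to this target?
    NavigableSet<Long> referrersOf(long target) {
        NavigableSet<Long> set = inverse.get(target);
        return set != null
            ? Collections.unmodifiableNavigableSet(set)
            : Collections.emptyNavigableSet();
    }

    // Integrity check a store could run before deleting 'target'
    boolean isReferenced(long target) {
        return inverse.containsKey(target);
    }
}
```

A delete operation could consult `isReferenced()` to reject the deletion, or walk `referrersOf()` to clear or cascade, which is one way to read the configurable "on delete" behavior listed above.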