Skip to content

Presto Development Guidelines

Steve Burnett edited this page Jan 29, 2024 · 40 revisions

Philosophy

  • Presto is not a library. The exposed surface area is the SQL language, client protocol, JDBC driver, and SPI. The internals of Presto can and often do change drastically between releases.
  • SQL language features are forever. Anything added to the SQL interface should be considered permanent. For example, any newly added function is likely to be used by many users across thousands of stored queries. This means that removing a feature or changing the semantics is extremely difficult. While adding a new function may seem trivial, the naming and semantics, including all edge cases, must be considered carefully.
  • Presto follows standard SQL (ISO/IEC 9075). If a proposed feature exists in the standard, we will follow the standard. If not, we carefully consider how to implement it without conflicting with a future version of the standard. When this is not clear, we often err on the side of not adding the feature.
  • Correctness is of utmost importance. Users depend on Presto returning correct results.
  • Stability and maintainability of the code over the long term is generally more important than adding features in the short term. While Presto is complex software and already has many features, we have much more work left to do. Keeping the architecture and code clean allows us to continue iterating and run the system successfully in production. Compromising this may allow us to move faster in the short term, but it will hurt us in the long term.
  • Do things the simple way when possible. Only make it complicated when benchmarks or other data show that it is necessary. Complex does not mean unreadable. It is even more important to focus on readability and maintainability for complex code.

Required Reading

  • Presto follows modern Java best practices as described in the excellent book Effective Java. Every developer should read this book. Note that when the book says "use X judiciously", that generally means do not use it in Presto. (Feel free to skip the section on Java serialization, as this is not used in Presto.)
  • There are many things to consider when writing code that is readable and maintainable. Rather than attempting to list and explain all of them here with the rationale, we will defer to the book Clean Code.
  • Many parts of Presto, especially in the execution engine, are highly concurrent. We strongly recommend reading Java Concurrency in Practice (JCIP).

Development

  • The Presto code base is designed to be developed in a powerful, modern Java IDE such as IntelliJ IDEA. For example, Presto uses JCIP annotations such as @GuardedBy which will be flagged by the IDE inspections when synchronization is used inconsistently.
  • In accordance with Effective Java and JCIP, favor immutable data structures where possible. In particular, prefer Guava immutable collections over mutable collections.
  • Prefer utility classes in Guava over Apache Commons (which we generally do not use in Presto). Prefer built-in Java utility functions over equivalent Guava utilities.
  • Use ListenableFuture in preference to CompletableFuture. The former has a much simpler API and is less error prone than the latter. We previously used CompletableFuture everywhere after updating to Java 8, then switched back to ListenableFuture after encountering many bugs caused by the non-obvious design choices in the CompletableFuture and CompletionStage APIs.
  • Avoid the log-and-throw anti-pattern. Exceptions can be handled by logging, or they can be re-thrown, but not both. Violating this rule results in the same error being logged multiple times in subtly different ways, which clutters the logs and makes troubleshooting harder.
  • When re-throwing an exception, include the original exception as the cause.
  • Document things that are surprising or unexpected, or that have non-obvious design choices. See Clean Code chapter 4.
  • Dependency versions should be declared in the dependencyManagement section in the root POM. This assures consistent versioning across modules. Many common dependencies are declared in the Airbase parent base POM.
  • In addition to Guava, there many helpful libraries in Airlift that should be used where possible. Some common ones: Slice, Duration, DataSize, Threads, MoreFutures, JsonCodec
  • Consider AssertJ when writing new tests, as the resulting code is often easier to read. In particular, assertThatThrownBy() is a great way to test code that throws exceptions (in places where @Test(expectedExceptions=...) does not work).
  • See also the code style recommendations.

Formatting

  • Format code using the formatter rules. Note that the formatter does not cover everything. Follow the code style of the surrounding code. It should be uniform and look like it was written by one person. See Clean Code chapter 5.
  • Parameter lists are often too long to be readable on a single line. When wrapping parameters, put each one on a separate line. Otherwise, keep them all on the same line. This applies to both declarations and call sites.
  • Avoid abbreviations or acronyms in names. These generally make code harder to read as they are both technical domain specific and language specific. For example, use executorService or simply executor rather than execSvc. There are a few exceptions to this, such as stats vs statistics. Use the most concise name that represents the concept. See Clean Code chapter 2.
  • Common acronyms in names should be capitalized as if they are words, rather writing them in uppercase. For example, write HttpClient or SqlParser, not HTTPClient or SQLParser. This improves readability, especially when multiple acronyms are used together. For example, consider HttpSqlDriver vs HTTPSQLDriver.
  • Dependencies in POMs are ordered with compile (default) scope at the top and test scope at the end. For Presto plugins, the SPI dependencies go in the middle. The SPI and test sections are delineated with a standard comment (see existing usage for examples).
  • When using streams, put the stream() call on the same line as the stream source, rather than putting it on the next line. Seeing it on a line by itself does not improve readability and wastes vertical space. The rest of the stream methods should be on separate lines.
  • Use static imports for methods when the method name is self descriptive and does not need to be qualified by the class name. For example, use randomUUID() rather than UUID.randomUUID(), as the UUID. prefix is entirely redundant and thus makes the code harder to read.
Clone this wiki locally