Presto Virtual Meetup June 05 2020

We pushed the May 2020 meetup out to June because of the Memorial Day long weekend. Two speakers presented at this meetup, and here are the links to the talks they presented.

Agenda:

11:00 am - 11:05 am - Welcome to the meetup

11:05 am - 11:30 am - Extending Presto at LinkedIn with a Smart Catalog Layer (LinkedIn) slides

11:30 am - 11:55 am - Common Sub Expression Optimization (Facebook) slides

11:55 am - 12:00 pm - Wrap up

Details:

Talk #1: Extending Presto at LinkedIn with a Smart Catalog Layer (Walaa Eldin Moustafa, Staff Software Engineer at LinkedIn)

In this talk, Walaa describes how LinkedIn extended its Presto Hive Catalog with a smart logical abstraction layer that can reason about logical views with UDFs by using two core components, Coral and Transport UDFs. Coral is a view virtualization library, powered by Apache Calcite, that represents views using their logical query plans. Walaa shows how LinkedIn leverages Coral abstractions to decouple the view expression language from the execution engine, and hence to execute non-Presto-SQL views inside Presto and achieve on-the-fly query rewrites for data governance and query optimization. He also describes Transport UDFs, a framework for defining user-defined functions once and automatically translating them to native UDF versions for multiple engines such as Presto, Spark, and Hive, or for data formats such as Avro. Both Coral and Transport UDFs are open-source projects. Learn more about them at https://github.com/linkedin/coral and https://github.com/linkedin/transport.

Talk #2: Common Sub Expression Optimization (Rongrong Zhong, Software Engineer at Facebook)

In complex analytics queries, we often see repeated expressions, for example parsing the same JSON column but extracting different fields, or elaborate CASE statements whose branches share some predicates and differ in others. Previously, Presto would compute the same expression as many times as it appeared in the query. With common sub expression optimization, we evaluate the same expression only once within the same project operator or filter operator. In our workload, we've seen 3x improvements on certain queries with expensive common sub expressions like JSON_PARSE. Microbenchmarks also show a consistent ~10% performance improvement for simple common sub expressions like x + y. In this talk, we will talk about how this is implemented.
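To make the idea concrete, here is a minimal Python sketch of sharing a common sub expression across the projections of a single row. It is an illustration only and not Presto's implementation, which the talk covers. The tuple-based expression encoding, the FUNCTIONS table, and the evaluate and project helpers are all hypothetical names invented for this example, and the sketch assumes every expression is deterministic.

```python
import json

# Hypothetical expression encoding (not Presto's): nested tuples, so that
# structurally identical sub expressions compare and hash equal.
#   ("col", name) | ("lit", value) | ("call", func_name, arg_expr, ...)
FUNCTIONS = {
    "json_parse": json.loads,
    "get_field": lambda obj, key: obj[key],
    "add": lambda a, b: a + b,
}


def evaluate(expr, row, cache, counts):
    """Evaluate expr against row. Passing cache=None disables sharing."""
    if cache is not None and expr in cache:
        return cache[expr]  # common sub expression: reuse the earlier result
    kind = expr[0]
    if kind == "col":
        value = row[expr[1]]
    elif kind == "lit":
        value = expr[1]
    elif kind == "call":
        name = expr[1]
        args = [evaluate(arg, row, cache, counts) for arg in expr[2:]]
        counts[name] = counts.get(name, 0) + 1  # record each real call
        value = FUNCTIONS[name](*args)
    else:
        raise ValueError(f"unknown expression kind: {kind!r}")
    if cache is not None:
        cache[expr] = value
    return value


def project(projections, row, share_common=True):
    """Evaluate all projections for one row; return (values, call counts)."""
    counts = {}
    cache = {} if share_common else None  # the cache lives for one row only
    values = [evaluate(expr, row, cache, counts) for expr in projections]
    return values, counts


# Two projections that parse the same JSON column but extract different fields.
parsed = ("call", "json_parse", ("col", "payload"))
projections = [
    ("call", "get_field", parsed, ("lit", "a")),
    ("call", "get_field", parsed, ("lit", "b")),
]
row = {"payload": '{"a": 1, "b": 2}'}

shared_values, shared_counts = project(projections, row)
naive_values, naive_counts = project(projections, row, share_common=False)
print(shared_counts["json_parse"], naive_counts["json_parse"])  # prints "1 2"
```

Both runs return the same values, but the shared run calls json_parse once per row instead of once per projection. The cache is scoped to a single row and a single projection list, which loosely mirrors the "within the same project operator or filter operator" scope described above.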

Please check out our meetup group at https://www.meetup.com/prestodb