Apache Pinot Release 0.12.0

@xiangfu0 xiangfu0 released this 19 Jan 19:45

What's Changed

Major updates

  • Force commit consuming segments by @sajjad-moradi in #9197
  • add a freshness based consumption status checker by @jadami10 in #9244
  • Add metrics to track controller segment download and upload requests in progress by @gviedma in #9258
  • Adding endpoint to download local log files for each component by @xiangfu0 in #9259
  • [Feature] Add an option to search input files recursively in ingestion job. The default is set to true to be backward compatible. by @61yao in #9265
  • add query cancel APIs on controller backed by those on brokers by @klsince in #9276
  • Add Spark Job Launcher tool by @KKcorps in #9288
  • Enable Consistent Data Push for Standalone Segment Push Job Runners by @yuanbenson in #9295
  • Allow server to directly return the final aggregation result by @Jackie-Jiang in #9304
  • TierBasedSegmentDirectoryLoader to keep segments in multi-datadir by @klsince in #9306
  • Adaptive Server Selection by @vvivekiyer in #9311
  • [Feature] Support IsDistinctFrom and IsNotDistinctFrom by @61yao in #9312
  • Allow ingestion of errored records with incorrect datatype by @KKcorps in #9320
  • Allow setting custom time boundary for hybrid table queries by @saurabhd336 in #9356
  • skip late cron job with max allowed delay by @klsince in #9372
  • Do not allow implicit cast for BOOLEAN and TIMESTAMP by @Jackie-Jiang in #9385
  • Add missing properties in CSV plugin by @KKcorps in #9399
  • set MDC so that one can route minion task logs to separate files cleanly by @klsince in #9400
  • Add a new API to fix segment date time in metadata by @KKcorps in #9413
  • Update get bytes to return raw bytes of string and support getBytesMV by @61yao in #9441
  • Exposing consumer's record lag in /consumingSegmentsInfo by @navina in #9515
  • Do not create dictionary for high-cardinality columns by @KKcorps in #9527
  • get task runtime configs tracked in Helix by @klsince in #9540
  • Add more options to json index by @Jackie-Jiang in #9543
  • add SegmentTierAssigner and refine restful APIs to get segment tier info by @klsince in #9598
  • Add segment level debug API by @saurabhd336 in #9609
  • Add record availability lag for Kafka connector by @navina in #9621
  • notify servers that need to move segments to new tiers via SegmentReloadMessage by @klsince in #9624
  • Allow to configure multi-datadirs as instance configs and a Quickstart example about them by @klsince in #9705
  • Customize stopword for Lucene Index by @jasperjiaguo in #9708
  • Add memory optimized dimension table by @KKcorps in #9802
  • ADLS file system upgrade by @xiangfu0 in #9855
  • Added Delete Schema/Table pinot admin commands by @bagipriyank in #9857
  • Adding new ADLSPinotFS auth type: DEFAULT by @xiangfu0 in #9860
  • Add rate limit to Kinesis requests by @KKcorps in #9863
  • Adding configs for zk client timeout by @xiangfu0 in #9975

Other features/changes

  • Show most recent scheduling errors by @satishwaghela in #9161
  • Do not use aggregation result for distinct query in IntermediateResultsBlock by @Jackie-Jiang in #9262
  • Emit metrics for ratio of actual consumption rate to rate limit in realtime tables by @sajjad-moradi in #9201
  • add metrics entry offlineTableCount by @walterddr in #9270
  • refine query cancel resp msg by @klsince in #9242
  • add @ManualAuthorization annotation for non-standard endpoints by @apucher in #9252
  • Optimize ser/de to avoid using output stream by @Jackie-Jiang in #9278
  • Add Support for Covariance Function by @SabrinaZhaozyf in #9236
  • Throw an exception when MV columns are present in the order-by expression list in selection order-by only queries by @somandal in #9078
  • Improve server query cancellation and timeout checking during execution by @jasperjiaguo in #9286
  • Add capabilities to ingest from another stream without disabling the realtime table by @sajjad-moradi in #9289
  • Add minMaxInvalid flag to avoid unnecessary needPreprocess by @npawar in #9238
  • Add array cardinality function by @walterddr in #9300
  • Add support for custom null values in CSV record reader by @KKcorps in #9318
  • Infer parquet reader type based on file metadata by @saurabhd336 in #9294
  • Include fmpp plugin module inside the src assembly file by @xiangfu0 in #9321
  • Add Support for Cast Function on MV Columns by @SabrinaZhaozyf in #9296
  • [Feature] Not Operator Transformation by @61yao in #9330
  • Handle null string in CSV decoder by @KKcorps in #9340
  • [Feature] Not scalar function by @61yao in #9338
  • Add support for EXTRACT syntax and converts it to appropriate Pinot expression by @tanmesh in #9184
  • Add support for Auth in controller requests in java query client by @KKcorps in #9230
  • delete all related minion task metadata when deleting a table by @zhtaoxiang in #9339
  • BloomFilterRule should only recommend for supported column type by @yuanbenson in #9364
  • Support all the types in ParquetNativeRecordReader by @xiangfu0 in #9352
  • Improve segment name check in metadata push by @zhtaoxiang in #9359
  • Allow expression transformer to continue on error by @xiangfu0 in #9376
  • Enhance and filter predicate evaluation efficiency by @jasperjiaguo in #9336
  • Deprecate instanceId Config For Broker/Minion Specific Configs by @ankitsultana in #9308
  • Optimize combine operator to fully utilize threads by @Jackie-Jiang in #9387
  • Terminate the query after plan generation if timeout by @jasperjiaguo in #9386
  • [Feature] Support Coalesce for Column Names by @61yao in #9327
  • Disable logging for interrupted exceptions in kinesis by @KKcorps in #9405
  • Benchmark thread cpu time by @jasperjiaguo in #9408
  • Use ISODateTimeFormat as default for SIMPLE_DATE_FORMAT by @KKcorps in #9378
  • Extract the common logic for upsert metadata manager by @Jackie-Jiang in #9435
  • Make minion task metadata manager methods more generic by @saurabhd336 in #9436
  • Always pass clientId to kafka's consumer properties by @navina in #9444
  • Refine IndexHandler methods a bit to make them reentrant by @klsince in #9440
  • use MinionEventObserver to track finer grained task progress status on worker by @klsince in #9432
  • Allow spaces in input file paths by @KKcorps in #9426
  • Add support for gracefully handling the errors while transformations by @KKcorps in #9377
  • Cache Deleted Segment Names in Server to Avoid SegmentMissingError by @ankitsultana in #9423
  • Handle Invalid timestamps by @KKcorps in #9355
  • refine minion worker event observer to track finer grained progress for tasks by @klsince in #9449
  • spark-connector should use v2/brokers endpoint by @itschrispeck in #9451
  • Remove netty server query support from presto-pinot-driver to remove pinot-core and pinot-segment-local dependencies by @xiangfu0 in #9455
  • Adaptive Server Selection: Address pending review comments by @vvivekiyer in #9462
  • track progress from within segment processor framework by @klsince in #9457
  • Decouple ser/de from DataTable by @Jackie-Jiang in #9468
  • collect file info like mtime, length while listing files for free by @klsince in #9466
  • Extract record keys, headers and metadata from Stream sources by @navina in #9224
  • [pinot-spark-connector] Bump spark connector max inbound message size by @cbalci in #9475
  • refine the minion task progress api a bit by @klsince in #9482
  • add parsing for AT TIME ZONE by @agavra in #9477
  • Eliminate explosion of metrics due to gapfill queries by @elonazoulay in #9490
  • ForwardIndexHandler: Change compressionType during segmentReload by @vvivekiyer in #9454
  • Introduce Segment AssignmentStrategy Interface by @GSharayu in #9309
  • Add query interruption flag check to broker groupby reduction by @jasperjiaguo in #9499
  • adding optional client payload by @walterddr in #9465
  • [feature] distinct from scalar functions by @61yao in #9486
  • Check data table version on server only for null handling by @Jackie-Jiang in #9508
  • Add docId and column name to segment read exception by @KKcorps in #9512
  • Sort scanning based operators by cardinality in AndDocIdSet evaluation by @jasperjiaguo in #9420
  • Do not fail CI when codecov upload fails by @Jackie-Jiang in #9522
  • [Upsert] persist validDocsIndex snapshot for Pinot upsert optimization by @deemoliu in #9062
  • broker filter by @dongxiaoman in #9391
  • [feature] coalesce scalar by @61yao in #9487
  • [GHA] add cache timeout by @walterddr in #9524
  • Optimize PinotHelixResourceManager.hasTable() by @Jackie-Jiang in #9526
  • Include exception when upsert metadata manager cannot be created by @Jackie-Jiang in #9532
  • allow to config task expire time by @klsince in #9530
  • expose task finish time via debug API by @klsince in #9534
  • Remove the wrong warning log in KafkaPartitionLevelConsumer by @Jackie-Jiang in #9536
  • starting http server for minion worker conditionally by @klsince in #9542
  • Make StreamMessage generic and a bug fix by @vvivekiyer in #9544
  • Improve primary key serialization performance by @KKcorps in #9538
  • [Upsert] Skip removing upsert metadata when shutting down the server by @Jackie-Jiang in #9551
  • add array element at function by @walterddr in #9554
  • Handle the case when enableNullHandling is true and an aggregation function is used w/ a column that has an empty null bitmap by @nizarhejazi in #9566
  • Support segment storage format without forward index by @somandal in #9333
  • Adding SegmentNameGenerator type inference if not explicitly set in config by @timsants in #9550
  • add version information to JMX metrics & component logs by @agavra in #9578
  • remove unused RecordTransform/RecordFilter classes by @agavra in #9607
  • Support rewriting forward index upon changing compression type for existing raw MV column by @vvivekiyer in #9510
  • Support Avro's Fixed data type by @sajjad-moradi in #9642
  • [feature] [kubernetes] add loadBalancerSourceRanges to service-external.yaml for controller and broker by @jameskelleher in #9494
  • Limit up to 10 unavailable segments to be printed in the query exception by @Jackie-Jiang in #9617
  • remove more unused filter code by @agavra in #9620
  • Do not cache record reader in segment by @Jackie-Jiang in #9604
  • make first part of user agent header configurable by @rino-kadijk in #9471
  • optimize order by sorted ASC, unsorted and order by DESC cases by @gortiz in #8979
  • Enhance cluster config update API to handle non-string values properly by @Jackie-Jiang in #9635
  • Reverts recommender REST API back to PUT (reverts PR #9326) by @yuanbenson in #9638
  • Remove invalid pruner names from server config by @Jackie-Jiang in #9646
  • Using usageHelp instead of deprecated help in picocli commands by @navina in #9608
  • Handle unique query id on server by @Jackie-Jiang in #9648
  • stateless group marker missing several by @walterddr in #9673
  • Support reloading consuming segment using force commit by @Jackie-Jiang in #9640
  • Improve star-tree to use star-node when the predicate matches all the non-star nodes by @Jackie-Jiang in #9667
  • add FetchPlanner interface to decide what column index to prefetch by @klsince in #9668
  • Improve star-tree traversal using ArrayDeque by @Jackie-Jiang in #9688
  • Handle errors in combine operator by @Jackie-Jiang in #9689
  • return different error code if old version is not on master by @SabrinaZhaozyf in #9686
  • Support creating dictionary at runtime for an existing column by @vvivekiyer in #9678
  • check mutable segment explicitly instead of checking existence of indexDir by @klsince in #9718
  • Remove leftover file before downloading segmentTar by @npawar in #9719
  • add index key and size map to segment metadata by @walterddr in #9712
  • Use ideal state as source of truth for segment existence by @Jackie-Jiang in #9735
  • Close Filesystem on exit with Minion Tasks by @KKcorps in #9681
  • render the tables list even as the table sizes are loading by @jadami10 in #9741
  • Add Support for IP Address Function by @SabrinaZhaozyf in #9501
  • bubble up error messages from broker by @agavra in #9754
  • Add support to disable the forward index for existing columns by @somandal in #9740
  • show table metadata info in aggregate index size form by @walterddr in #9733
  • Preprocess immutable segments from REALTIME table conditionally when loading them by @klsince in #9772
  • revert default timeout nano change in QueryConfig by @agavra in #9790
  • AdaptiveServerSelection: Update stats for servers that have not responded by @vvivekiyer in #9801
  • Add null value index for default column by @KKcorps in #9777
  • [MergeRollupTask] include partition info into segment name by @zhtaoxiang in #9815
  • Adding a consumer lag as metric via a periodic task in controller by @navina in #9800
  • Deserialize Hyperloglog objects more optimally by @priyen in #9749
  • Download offline segments from peers by @wirybeaver in #9710
  • Thread Level Usage Accounting and Query Killing on Server by @jasperjiaguo in #9727
  • Add max merger and min mergers for partial upsert by @deemoliu in #9665
  • #9518 added pinot helm 0.2.6 with secure version pinot 0.11.0 by @bagipriyank in #9519
  • Combine the read access for replication config by @snleee in #9849
  • add v1 ingress in helm chart by @jhisse in #9862
  • Optimize AdaptiveServerSelection for replicaGroup based routing by @vvivekiyer in #9803
  • Do not sort the instances in InstancePartitions by @Jackie-Jiang in #9866
  • Merge new columns in existing record with default merge strategy by @navina in #9851
  • Support disabling dictionary at runtime for an existing column by @vvivekiyer in #9868
  • support BOOL_AND and BOOL_OR aggregate functions by @agavra in #9848
  • Use Pulsar AdminClient to delete unused subscriptions by @navina in #9859
  • add table sort function for table size by @jadami10 in #9844
  • In Kafka consumer, seek offset only when needed by @Jackie-Jiang in #9896
  • fallback if no broker found for the specified table name by @klsince in #9914
  • Allow liveness check during server shutting down by @Jackie-Jiang in #9915
  • Allow segment upload via Metadata in MergeRollup Minion task by @KKcorps in #9825
  • Add back the Helix workaround for missing IS change by @Jackie-Jiang in #9921
  • Allow uploading realtime segments via CLI by @KKcorps in #9861
  • Add capability to update and delete table config via CLI by @KKcorps in #9852
  • default to TAR if push mode is not set by @klsince in #9935
  • load startree index via segment reader interface by @klsince in #9828
  • Allow collections for MV transform functions by @saurabhd336 in #9908
  • Construct new IndexLoadingConfig when loading completed realtime segments by @vvivekiyer in #9938
  • Make GET /tableConfigs backwards compatible in case schema does not match raw table name by @timsants in #9922
  • feat: add compressed file support for ORCRecordReader by @etolbakov in #9884
  • Add Variance and Standard Deviation Aggregation Functions by @snleee in #9910
  • enable MergeRollupTask on realtime tables by @zhtaoxiang in #9890
  • Update cardinality when converting raw column to dict based by @vvivekiyer in #9875
  • Add back auth token for UploadSegmentCommand by @timsants in #9960
  • Improving gz support for avro record readers by @snleee in #9951
  • Default column handling of noForwardIndex and regeneration of forward index on reload path by @somandal in #9810
  • [Feature] Support coalesce literal by @61yao in #9958
  • Ability to initialize S3PinotFs with serverSideEncryption properties when passing client directly by @npawar in #9988
  • handle pending minion tasks properly when getting the task progress status by @klsince in #9911
  • allow gauge stored in metric registry to be updated by @zhtaoxiang in #9961
  • support case-insensitive query options in SET syntax by @agavra in #9912
  • pin versions-maven-plugin to 2.13.0 by @jadami10 in #9993
  • Pulsar Connection handler should not spin up a consumer / reader by @navina in #9893
  • Handle in-memory segment metadata for index checking by @Jackie-Jiang in #10017
  • Support the cross-account access using IAM role for S3 PinotFS by @snleee in #10009
  • report minion task metadata last update time as metric by @zhtaoxiang in #9954
  • support SKEWNESS and KURTOSIS aggregates by @agavra in #10021
  • emit minion task generation time and error metrics by @zhtaoxiang in #10026
  • Use the same default time value for all replicas by @Jackie-Jiang in #10029
  • Reduce the number of segments to wait for convergence when rebalancing by @saurabhd336 in #10028

UI Update & Improvement

  • Allow hiding query console tab based on cluster config (#9261)
  • Allow hiding pinot broker swagger UI by config (#9343)
  • Add UI to show fine-grained minion task progress (#9488)
  • Add UI to track segment reload progress (#9521)
  • Show minion task runtime config details in UI (#9652)
  • Redefine the segment status (#9699)
  • Show an option to reload the segments during edit schema (#9762)
  • Load schema UI async (#9781)
  • Fix blank screen when redirect to unknown app route (#9888)

Multi-Stage Query Engine

New join semantics support

New SQL semantics support

Performance enhancement

  • Thread safe query planning (#9344)
  • Partial query execution and round robin scheduling (#9753)
  • Improve data table serde (#9731)

Library version upgrade

  • Upgrade h3 lib from 3.7.2 to 4.0.0 to lower glibc requirement (#9335)
  • Upgrade ZK version to 3.6.3 (#9612)
  • Upgrade snakeyaml from 1.30 to 1.33 (#9464)
  • Upgrade RoaringBitmap from 0.9.28 to 0.9.35 (#9730)
  • Upgrade spotless-maven-plugin from 2.9.0 to 2.28.0 (#9877)
  • Upgrade decode-uri-component from 0.2.0 to 0.2.2 (#9941)

BugFixes