columnStats as a write option requires statistics for all indexed columns during an append #222

Jiaweihu08 · 2023-10-26T13:36:54Z

What went wrong?

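This is an inconvenience rather than a bug per se: when providing columnStats during an append, stats for ALL indexed columns must be present. Providing stats for only a subset of them makes the write fail with an IllegalArgumentException (see the stack trace below).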
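Until this is addressed, one workaround is to always build the columnStats JSON so that it covers every indexed column. The sketch below assumes the "<col>_min" / "<col>_max" key naming seen in the example and stack trace; ColumnStatsJson and build are hypothetical names, not part of the qbeast-spark API.

```scala
// Hypothetical helper (not part of qbeast-spark): builds a columnStats JSON
// string with "<col>_min" / "<col>_max" entries for EVERY indexed column,
// failing fast on the client side if any indexed column has no bounds.
object ColumnStatsJson {
  // Wraps a key in double quotes for the JSON output.
  private def quote(s: String): String = "\"" + s + "\""

  def build(columnsToIndex: Seq[String], bounds: Map[String, (Double, Double)]): String = {
    // Detect indexed columns without bounds before the write even starts.
    val missing = columnsToIndex.filterNot(bounds.contains)
    require(missing.isEmpty, s"Missing stats for indexed columns: ${missing.mkString(", ")}")
    columnsToIndex
      .flatMap { c =>
        val (lo, hi) = bounds(c)
        Seq(quote(c + "_min") + ": " + lo, quote(c + "_max") + ": " + hi)
      }
      .mkString("{", ", ", "}")
  }
}
```

The resulting string can be passed as the value of .option("columnStats", ...) on the append. The bounds supplied for column b must be real values for the data being written, not placeholders.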

How to reproduce?

1. Code that triggered the bug, or steps to reproduce:

// Index data with columns 'a' and 'b'
df1
  .write
  .format("qbeast")
  .option("columnsToIndex", "a,b")
  .save(targetPath)

// Provide stats only for column 'a' when appending
df2
  .write
  .mode("append") // append to the existing qbeast table at targetPath
  .format("qbeast")
  .option("columnsToIndex", "a,b")
  .option("columnStats", """{"a_min": 1, "a_max": 2}""")
  .save(targetPath)

2. Branch and commit id:

main, f066acf

3. Spark version:

3.4.1

4. Hadoop version:

3.3.4

5. How are you running Spark?

Locally

6. Stack trace:

java.lang.IllegalArgumentException: b_min does not exist. Available: a_max, a_min
  at org.apache.spark.sql.types.StructType.$anonfun$fieldIndex$1(StructType.scala:313)
  at scala.collection.immutable.Map$Map2.getOrElse(Map.scala:236)
  at org.apache.spark.sql.types.StructType.fieldIndex(StructType.scala:312)
  at org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema.fieldIndex(rows.scala:187)
  at org.apache.spark.sql.Row.getAs(Row.scala:373)
  at org.apache.spark.sql.Row.getAs$(Row.scala:373)
  at org.apache.spark.sql.catalyst.expressions.GenericRow.getAs(rows.scala:166)
  at io.qbeast.spark.table.IndexedTableImpl.$anonfun$isNewRevision$2(IndexedTable.scala:159)
  at io.qbeast.core.transform.LinearTransformer.makeTransformation(LinearTransformer.scala:43)
  at io.qbeast.spark.table.IndexedTableImpl.$anonfun$isNewRevision$1(IndexedTable.scala:159)
  at scala.collection.immutable.List.map(List.scala:297)
  at io.qbeast.spark.table.IndexedTableImpl.isNewRevision(IndexedTable.scala:158)
  at io.qbeast.spark.table.IndexedTableImpl.save(IndexedTable.scala:205)
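The trace shows the failure comes from Row.getAs looking up b_min in a stats row whose schema only contains a_min and a_max. One possible direction for a fix, sketched here under assumptions, is to fall back to stats computed from the data for any indexed column the user did not cover. StatsResolution, Bounds and resolve are hypothetical names; this is not how IndexedTableImpl.isNewRevision is implemented.

```scala
// Hypothetical sketch (not qbeast-spark code): resolve (min, max) for each
// indexed column, preferring user-supplied columnStats entries and falling
// back to stats computed from the data, instead of failing on a missing key.
object StatsResolution {
  final case class Bounds(min: Double, max: Double)

  def resolve(
      indexedColumns: Seq[String],
      userStats: Map[String, Double],    // assumed: parsed columnStats, e.g. "a_min" -> 1.0
      computedStats: Map[String, Bounds] // assumed: min/max computed from the data being written
  ): Map[String, Bounds] =
    indexedColumns.map { c =>
      // Use the user's bounds only when both <col>_min and <col>_max are present.
      val fromUser = for {
        lo <- userStats.get(s"${c}_min")
        hi <- userStats.get(s"${c}_max")
      } yield Bounds(lo, hi)
      // Otherwise fall back to the computed stats (assumed to cover every indexed column).
      c -> fromUser.getOrElse(computedStats(c))
    }.toMap
}
```

Requiring both the min and the max before trusting user input avoids mixing a user-supplied bound with a computed one for the same column; whether a real fix should also widen user bounds to cover the incoming data is left open here.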

osopardo1 · 2023-10-26T13:46:27Z

Wuuuu, we really need to work on this Revision flow... Opening an issue to redefine the steps.

osopardo1 · 2023-10-26T14:05:03Z

Related issue: #223

Jiaweihu08 added the bug Something isn't working label Oct 26, 2023

Jiaweihu08 self-assigned this Oct 26, 2023