columnStats as a write option requires statistics for all indexed columns during an append #222

Open
Jiaweihu08 opened this issue Oct 26, 2023 · 2 comments
Assignees: Jiaweihu08
Labels: bug (Something isn't working)

Comments

@Jiaweihu08
Member

What went wrong?

This is an inconvenience rather than a bug per se: if columnStats is provided during an append, it must contain stats for ALL indexed columns. Supplying stats for only a subset of the indexed columns fails with an IllegalArgumentException.
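
A minimal sketch of a pre-flight check that makes this constraint explicit before writing. It assumes the stats keys follow the `<column>_min` / `<column>_max` naming seen in the error message and that `columnsToIndex` holds plain, comma-separated column names. `missingStatKeys` is a hypothetical helper for illustration, not part of qbeast-spark.

```scala
object ColumnStatsCheck {
  // Hypothetical helper (not a qbeast-spark API): lists the stats keys an
  // append would look up that are absent from the provided columnStats keys.
  // Assumes keys are named <column>_min and <column>_max.
  def missingStatKeys(columnsToIndex: String, providedKeys: Set[String]): Seq[String] =
    columnsToIndex
      .split(",")
      .map(_.trim)
      .filter(_.nonEmpty)
      .flatMap(c => Seq(s"${c}_min", s"${c}_max"))
      .filterNot(providedKeys.contains)
      .toSeq
}
```

For the case in this issue, `ColumnStatsCheck.missingStatKeys("a,b", Set("a_min", "a_max"))` returns `b_min` and `b_max`, the first of which is the key named in the exception.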

How to reproduce?

Steps to reproduce the problem:

1. Code that triggered the bug, or steps to reproduce:

// Index data with columns 'a' and 'b'
df1
  .write
  .format("qbeast")
  .option("columnsToIndex", "a,b")
  .save(targetPath)

// Provide stats only for column 'a' when appending
df2
  .write
  .mode("append")
  .format("qbeast")
  .option("columnsToIndex", "a,b")
  .option("columnStats", """{"a_min": 1, "a_max": 2}""")
  .save(targetPath)
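
Until the behavior changes, one possible workaround is to always pass stats for every indexed column. The sketch below builds such a JSON string from per-column bounds. `ColumnStatsJson.build` is a hypothetical helper, not a qbeast-spark API, and the bounds used for column 'b' in the usage note are made-up illustrative values. It assumes integer-valued columns and the `<column>_min` / `<column>_max` key naming.

```scala
object ColumnStatsJson {
  // Hypothetical helper (not a qbeast-spark API): renders a columnStats JSON
  // string with a <column>_min and <column>_max entry for every column given.
  def build(bounds: Seq[(String, Long, Long)]): String =
    bounds
      .flatMap { case (col, lo, hi) =>
        Seq(s""""${col}_min": $lo""", s""""${col}_max": $hi""")
      }
      .mkString("{", ", ", "}")
}
```

For example, `ColumnStatsJson.build(Seq(("a", 1L, 2L), ("b", 10L, 20L)))` yields `{"a_min": 1, "a_max": 2, "b_min": 10, "b_max": 20}`, which could then be passed as the value of `.option("columnStats", ...)`.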

2. Branch and commit id:

main, f066acf

3. Spark version:

3.4.1

4. Hadoop version:

3.3.4

5. How are you running Spark?

Locally

6. Stack trace:

java.lang.IllegalArgumentException: b_min does not exist. Available: a_max, a_min
  at org.apache.spark.sql.types.StructType.$anonfun$fieldIndex$1(StructType.scala:313)
  at scala.collection.immutable.Map$Map2.getOrElse(Map.scala:236)
  at org.apache.spark.sql.types.StructType.fieldIndex(StructType.scala:312)
  at org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema.fieldIndex(rows.scala:187)
  at org.apache.spark.sql.Row.getAs(Row.scala:373)
  at org.apache.spark.sql.Row.getAs$(Row.scala:373)
  at org.apache.spark.sql.catalyst.expressions.GenericRow.getAs(rows.scala:166)
  at io.qbeast.spark.table.IndexedTableImpl.$anonfun$isNewRevision$2(IndexedTable.scala:159)
  at io.qbeast.core.transform.LinearTransformer.makeTransformation(LinearTransformer.scala:43)
  at io.qbeast.spark.table.IndexedTableImpl.$anonfun$isNewRevision$1(IndexedTable.scala:159)
  at scala.collection.immutable.List.map(List.scala:297)
  at io.qbeast.spark.table.IndexedTableImpl.isNewRevision(IndexedTable.scala:158)
  at io.qbeast.spark.table.IndexedTableImpl.save(IndexedTable.scala:205)

Jiaweihu08 added the bug (Something isn't working) label Oct 26, 2023
Jiaweihu08 self-assigned this Oct 26, 2023

@osopardo1
Member

Wuuuu, we really need to work on this Revision flow... Opening an issue to redefine the steps.

@osopardo1
Member

Related issue: #223
