
Unable to fetch data for a specific partition key when partition key is defined with more than 4 columns. #149

Open
parekuti opened this issue Jan 8, 2018 · 5 comments


@parekuti
Contributor

parekuti commented Jan 8, 2018

dse-spark1.4.8 branch

For example:

```sql
select * from loadtest.fdb_partition_test_chunks where "partition"=0x016101620163016401650166 and version=0;
```

This query returns data when I run it directly against C*. But when I translate it into a Spark SQL query through the Thrift server, no data is returned:

```sql
select * from fdb_loadtest_partitiontest where field1='a' and field2='b' and field3='c' and field4='d'
and field5='e' and field6='f'
```
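Side note (my own hedged reading, not stated in this thread): the hex key above looks like each text component encoded as a one-byte length followed by its UTF-8 bytes (`0x01 'a'`, `0x01 'b'`, ...). A minimal Scala sketch of decoding under that assumed layout:

```scala
// Hypothetical sketch: decode a binary partition key, assuming each
// component is stored as a 1-byte length followed by that many UTF-8 bytes.
// 0x016101620163016401650166 -> components "a".."f"
def decodeKey(bytes: Array[Byte]): List[String] = {
  var i = 0
  val out = scala.collection.mutable.ListBuffer[String]()
  while (i < bytes.length) {
    val len = bytes(i) & 0xff                       // unsigned length byte
    out += new String(bytes, i + 1, len, "UTF-8")   // component payload
    i += 1 + len
  }
  out.toList
}

// Parse the hex literal from the CQL query into bytes:
val key = "016101620163016401650166"
  .grouped(2).map(Integer.parseInt(_, 16).toByte).toArray

println(decodeKey(key))  // List(a, b, c, d, e, f)
```

If the layout holds, this matches the six single-character filter values in the Spark SQL query above.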

@velvia
Member

velvia commented Jan 8, 2018 via email

@parekuti
Contributor Author

parekuti commented Jan 9, 2018

Yes, it's Spark SQL. I can see in the logs that the filters are pushed down.

@parekuti
Contributor Author

parekuti commented Jan 9, 2018

Another thing I noticed: when querying a table whose partition key has 5 columns and one of those columns is of timestamp type, I get a ClassCastException. This does not happen for a table with a 4-column partition key.

```sql
select * from reading_5keys where corp_cd='X' and cli_no='X' and spin_asset_id=X
and reading_date='2017-05-23 00:00:00' and reading_day_slot=6
```

```
ERROR o.a.s.s.h.t.SparkExecuteStatementOperation - Error executing query:
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 20.0 failed 4 times, most recent failure: Lost task 0.3 in stage 20.0 (TID 62, 127.0.0.1): java.lang.ClassCastException: java.lang.String cannot be cast to java.sql.Timestamp
at filodb.core.SingleKeyTypes$TimestampKeyType$.toBytes(KeyType.scala:208)
```
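For context, a hedged sketch (not the actual FiloDB code): the stack trace suggests the pushed-down filter value reaches `TimestampKeyType.toBytes` still as a `String` and is cast directly to `java.sql.Timestamp`. Parsing the string instead of casting would avoid the exception:

```scala
import java.sql.Timestamp

// Hypothetical illustration of the failure mode: the filter value arrives
// as a String, but the key type expects a java.sql.Timestamp.
val raw: Any = "2017-05-23 00:00:00"

// What the failing path effectively does -- throws ClassCastException:
// val ts = raw.asInstanceOf[Timestamp]

// A defensive conversion accepts both representations:
val ts: Timestamp = raw match {
  case t: Timestamp => t
  case s: String    => Timestamp.valueOf(s)  // expects "yyyy-mm-dd hh:mm:ss"
  case other        => sys.error(s"Unsupported timestamp value: $other")
}

println(ts)  // 2017-05-23 00:00:00.0
```

Whether the right fix is to convert in `toBytes` or to ensure Spark hands over a `Timestamp` earlier in the pushdown path is a design choice for the maintainers.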

@velvia
Member

velvia commented Jan 10, 2018 via email

@parekuti
Contributor Author

parekuti commented Jan 10, 2018

It's an issue with the order of the filters we send to the partition scan. For example, with a partition key defined in the order corp_cd, cli_no, vendor_code, spin_asset_id, reading_day_slot:

FA,6210,NWF,3453352,07 -> incoming filter order
6210,FA,NWF,3453352,07 -> output of the parseFilters function

This wrong order is preserved and passed to the scan method, which then generates an incorrect hex key and fails to fetch the data. We need to fix the ordering so the partition can be read correctly.

See the logs below for more details.


```
[2018-01-10 10:11:10,417] INFO filodb.spark.FiloRelation$ - Incoming filters = List(EqualTo(corp_cd,FA), EqualTo(cli_no,6210), EqualTo(vendor_code,NWF), EqualTo(reading_day_slot,6), EqualTo(spin_asset_id,5555006))
[2018-01-10 10:11:10,420] INFO filodb.spark.FiloRelation$ - Incoming filters collect = List((corp_cd,EqualTo(corp_cd,FA)), (cli_no,EqualTo(cli_no,6210)), (vendor_code,EqualTo(vendor_code,NWF)), (reading_day_slot,EqualTo(reading_day_slot,6)), (spin_asset_id,EqualTo(spin_asset_id,5555006)))
[2018-01-10 10:11:10,429] INFO filodb.spark.FiloRelation - Incoming filters order after parsing: Map(cli_no -> List(EqualTo(cli_no,6210)), corp_cd -> List(EqualTo(corp_cd,FA)), vendor_code -> List(EqualTo(vendor_code,NWF)), reading_day_slot -> List(EqualTo(reading_day_slot,6)), spin_asset_id -> List(EqualTo(spin_asset_id,5555006)))
[2018-01-10 10:11:10,440] INFO filodb.spark.FiloRelation$ - Pushing down partition column cli_no, filters List(EqualTo(cli_no,6210))
[2018-01-10 10:11:10,440] INFO filodb.spark.FiloRelation$ - Pushing down partition column corp_cd, filters List(EqualTo(corp_cd,FA))
[2018-01-10 10:11:10,440] INFO filodb.spark.FiloRelation$ - Pushing down partition column vendor_code, filters List(EqualTo(vendor_code,NWF))
[2018-01-10 10:11:10,440] INFO filodb.spark.FiloRelation$ - Pushing down partition column reading_day_slot, filters List(EqualTo(reading_day_slot,6))
[2018-01-10 10:11:10,440] INFO filodb.spark.FiloRelation$ - Pushing down partition column spin_asset_id, filters List(EqualTo(spin_asset_id,5555006))
[2018-01-10 10:11:10,450] INFO filodb.spark.FiloRelation$ - Push down partition predicates: List(Set(6210), Set(FA), Set(NWF), Set(6), Set(5555006))
```
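One plausible explanation (my own assumption, not confirmed in this thread): Scala's immutable `Map` preserves insertion order only up to four entries (the specialized `Map1`..`Map4` classes); with five or more it becomes a `HashMap`, whose iteration order is hash-based. That would match the "4 partition-key columns work, 5 don't" symptom exactly. Whatever the cause, re-sorting the parsed filters by the schema's declared partition-key order would make the scan key deterministic. A minimal sketch, with column names taken from the logs:

```scala
// Hypothetical sketch: restore the declared partition-key column order
// after parseFilters has collected the predicates into a Map (whose
// iteration order is not guaranteed once it has 5+ entries).
val partitionKeyOrder =
  Seq("corp_cd", "cli_no", "vendor_code", "spin_asset_id", "reading_day_slot")

val parsed: Map[String, Set[String]] = Map(
  "cli_no"           -> Set("6210"),
  "corp_cd"          -> Set("FA"),
  "vendor_code"      -> Set("NWF"),
  "reading_day_slot" -> Set("6"),
  "spin_asset_id"    -> Set("5555006")
)

// Emit predicates in schema order, not map-iteration order:
val ordered = partitionKeyOrder.flatMap(parsed.get)

println(ordered)  // List(Set(FA), Set(6210), Set(NWF), Set(5555006), Set(6))
```

The last line shows the predicate order the scan key generation presumably needs, versus the hash-ordered `List(Set(6210), Set(FA), ...)` visible in the final log line above.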

