Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Choose correct dynamic type when filters are applied #454

Open
aangelisc opened this issue Aug 25, 2022 · 3 comments
Open

Choose correct dynamic type when filters are applied #454

aangelisc opened this issue Aug 25, 2022 · 3 comments
Labels
datasource/ADX type/bug Something isn't working

Comments

@aangelisc
Copy link
Contributor

See grafana/support-escalations#3478. When a filter is applied on a column with a dynamic type the schema used is for the full column. This can lead to the incorrect type being chosen when the data is cast in a summarize statement.

Expected behaviour:

The appropriate type should be chosen based on the remaining values after a filter is applied to the column.

@aangelisc aangelisc added type/bug Something isn't working datasource/ADX labels Aug 25, 2022
@yaelleC
Copy link
Contributor

yaelleC commented Sep 9, 2022

We might need UX input there to decide on solution for the issue

@andresmgot
Copy link
Contributor

Let me elaborate in the issue so we can decide on specific solutions.

The root of the issue is that a dynamic column needs to be parsed as a specific type so you cannot do:

| where DynamicCol > 1

Instead it should be casted:

| where tolong(DynamicCol) > 1

To know which specific types a dynamic column can have, we are getting the schema using the buildschema function from ADX. The schema is obtained once, and it ignores the current query. The final query executed in getDynamicSchema is:

MyTable
 | take 50000
 | where isnotnull(DynamicCol)
 | project DynamicCol
 | summarize buildschema(DynamicCol)

This returns the list of all the different types from the last 50000 rows of the table. The list can contain any type (e.g. [long, double, string]. When all the types are number types, we automatically choose double as the type of column since it's the one with more precission. If the type can be either a number or another thing (string, boolean...) we cannot make that assumption since converting a string to a double would return an incorrect value. In that case, we just select the first type reported from the list.

Give that, there are several things we can do to mitigate the issue:

  1. (effort/small) Change the condition in which we select the first type and always use double if it's an option. This would unblock https://github.com/grafana/support-escalations/issues/3478 and it's not necessarily more wrong than the current choice. Also note that doing todouble("foo") does not error, it simply return an empty value.
  2. (effort/medium) Update the stored schema based on the query filters. This would narrow down the scope of the schema resolution query, potentially filtering out spurious values. Note that this is not a final solution since it can still be the case that uncompatible types may be returned. The final query would look something like:
MyTable
| where $__timeFilter(TimeCol)
| where Col1 == 'value1'
| where Col2 == 'value2'
| take 50000
| where isnotnull(DynamicCol)
| project DynamicCol
| summarize buildschema(DynamicCol)
  1. (effort/medium) In the case that there still multiple types returned, render another selector that allows the user to select a type rather than doing that automatically. This may require input from UX to make it as friendly as possible. This requires the changes in Revamp query builder #391.

We can probably do 1. to unblock the user and keeps working towards 3. Thoughts?

@yaelleC
Copy link
Contributor

yaelleC commented Sep 19, 2022

#1 was done in 4.1.6 (PR: #471)

I'll move this issue back to the backlog for someone to pick up and implement 2 and 3 🙂

@yaelleC yaelleC removed their assignment Nov 18, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
datasource/ADX type/bug Something isn't working
Projects
Status: Backlog
Development

No branches or pull requests

3 participants