Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Referencing a Decimal in Percentage Metrics causes division by zero error (table charts) #28170

Open
3 tasks done
aaron-manning opened this issue Apr 22, 2024 · 0 comments
Open
3 tasks done

Comments

@aaron-manning
Copy link

Bug description

There appears to be a bug in SuperSet in the way it generates the output used by 'Percentage Metrics' in Table charts when a Postgres 'Decimal' datatype is part of the Metric.

SuperSet: Version: 3.1.2, 3.1.0
Postgres: 16.1

What causes the bug

  1. A Metric is defined that references a Decimal column in its definition (curiously, this happens even when that data will never be actually used)
  2. There is at least one other % metric column output in the chart that ends up being null for each row

How to reproduce the bug

Create the following dataset (note that there seems to be no easy way to directly create a virtual dataset without first creating a physical one and modifying it)
image

SELECT
    *
FROM
    (
        VALUES
            (1, 10::decimal),
            (2, 20::decimal),
            (3, 30::decimal)
    ) AS t (id, val);

Create the following chart
image

% Metric 1
image

NULL

% Metric 2
image

SUM(CASE WHEN true THEN 1 ELSE val END)

The resulting error:

DB engine Error
division by zero

This may be triggered by:
Issue 1011 - Superset encountered an unexpected error. 

The query generated (as output via 'View Query' UI feature)

SELECT NULL AS "NULL",
       SUM(CASE
               WHEN true THEN 1
               ELSE val
           END) AS "SUM(CASE WHEN true THEN 1 ELSE val END)"
FROM
  (SELECT *
   FROM (
         VALUES (1,
                 10::decimal), (2,
                                20::decimal), (3,
                                               30::decimal)) AS t (id, val)) AS virtual_table
LIMIT 10000;

The query when run in SQL lab has no issues
image

Note the following

  1. Metric 1 and 2 can be re-ordered with no consequence.
  2. Metric 1 can be any query, but MUST output NULL for each row in the data (after filters) to demonstrate this bug.
  3. Metric 2 MUST reference a Decimal datatype column in the dataset to demonstrate this bug. Curiously it does not actually need to ever output this value, I imagine this is a key aspect of the bug.
  4. Whether the column referenced in Metric 2 is nullable, has null rows, or negative rows or has rows with the value 0, has no consequence.
  5. Whether the chart has other columns output (Metric, % Metric or other), or has different settings configured, appears to have no consequence.
  6. The SQL output in 'View Query' always executes without issue when run in the SQL lab. The problem is clearly limited to something related to % Metrics that lives outside of the generated queries displayed in the UI.
  7. This issue isn't limited to virtual datasets, the same happens with a regular physical dataset (querying over a view)

Expected Outcome

The chart should output the following, without error:
image
Note that to get this to output, the query used in % Metric 2 was changed to

SUM(CASE WHEN true THEN 1 ELSE NULL END)

(aka the reference to a Decimal column was removed)

Workaround

Any Decimal data types that end up used in % Metrics can be casted to Double to avoid this error.

SUM(CASE WHEN true THEN 1 ELSE val::DOUBLE PRECISION END)

Screenshots/recordings

No response

Superset version

3.1.2

Python version

Not applicable

Node version

Not applicable

Browser

Not applicable

Additional context

division by zero
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/flask/app.py", line 1823, in full_dispatch_request
    rv = self.dispatch_request()
  File "/usr/local/lib/python3.10/site-packages/flask/app.py", line 1799, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)
  File "/usr/local/lib/python3.10/site-packages/flask_appbuilder/security/decorators.py", line 95, in wraps
    return f(self, *args, **kwargs)
  File "/app/superset/views/base_api.py", line 127, in wraps
    raise ex
  File "/app/superset/views/base_api.py", line 121, in wraps
    duration, response = time_function(f, self, *args, **kwargs)
  File "/app/superset/utils/core.py", line 1463, in time_function
    response = func(*args, **kwargs)
  File "/app/superset/utils/log.py", line 255, in wrapper
    value = f(*args, **kwargs)
  File "/app/superset/charts/data/api.py", line 256, in data
    return self._get_data_response(
  File "/app/superset/charts/data/api.py", line 412, in _get_data_response
    result = command.run(force_cached=force_cached)
  File "/app/superset/commands/chart/data/get_data_command.py", line 45, in run
    payload = self._query_context.get_payload(
  File "/app/superset/common/query_context.py", line 100, in get_payload
    return self._processor.get_payload(cache_query_context, force_cached)
  File "/app/superset/common/query_context_processor.py", line 587, in get_payload
    query_results = [
  File "/app/superset/common/query_context_processor.py", line 588, in <listcomp>
    get_query_results(
  File "/app/superset/common/query_actions.py", line 225, in get_query_results
    return result_func(query_context, query_obj, force_cached)
  File "/app/superset/common/query_actions.py", line 103, in _get_full
    payload = query_context.get_df_payload(query_obj, force_cached=force_cached)
  File "/app/superset/common/query_context.py", line 121, in get_df_payload
    return self._processor.get_df_payload(
  File "/app/superset/common/query_context_processor.py", line 156, in get_df_payload
    query_result = self.get_query_result(query_obj)
  File "/app/superset/common/query_context_processor.py", line 256, in get_query_result
    df = query_object.exec_post_processing(df)
  File "/app/superset/common/query_object.py", line 445, in exec_post_processing
    df = getattr(pandas_postprocessing, operation)(df, **options)
  File "/app/superset/utils/pandas_postprocessing/utils.py", line 129, in wrapped
    return func(df, **options)
  File "/app/superset/utils/pandas_postprocessing/contribution.py", line 74, in contribution
    numeric_df = numeric_df / numeric_df.values.sum(axis=axis, keepdims=True)
  File "/usr/local/lib/python3.10/site-packages/pandas/core/ops/common.py", line 81, in new_method
    return method(self, other)
  File "/usr/local/lib/python3.10/site-packages/pandas/core/arraylike.py", line 210, in __truediv__
    return self._arith_method(other, operator.truediv)
  File "/usr/local/lib/python3.10/site-packages/pandas/core/frame.py", line 7457, in _arith_method
    new_data = self._dispatch_frame_op(other, op, axis=axis)
  File "/usr/local/lib/python3.10/site-packages/pandas/core/frame.py", line 7496, in _dispatch_frame_op
    bm = self._mgr.operate_blockwise(
  File "/usr/local/lib/python3.10/site-packages/pandas/core/internals/managers.py", line 1545, in operate_blockwise
    return operate_blockwise(self, other, array_op)
  File "/usr/local/lib/python3.10/site-packages/pandas/core/internals/ops.py", line 63, in operate_blockwise
    res_values = array_op(lvals, rvals)
  File "/usr/local/lib/python3.10/site-packages/pandas/core/ops/array_ops.py", line 232, in arithmetic_op
    res_values = _na_arithmetic_op(left, right, op)  # type: ignore[arg-type]
  File "/usr/local/lib/python3.10/site-packages/pandas/core/ops/array_ops.py", line 171, in _na_arithmetic_op
    result = func(left, right)
  File "/usr/local/lib/python3.10/site-packages/pandas/core/computation/expressions.py", line 239, in evaluate
    return _evaluate(op, op_str, a, b)  # type: ignore[misc]
  File "/usr/local/lib/python3.10/site-packages/pandas/core/computation/expressions.py", line 128, in _evaluate_numexpr
    result = _evaluate_standard(op, op_str, a, b)
  File "/usr/local/lib/python3.10/site-packages/pandas/core/computation/expressions.py", line 70, in _evaluate_standard
    return op(a, b)

Checklist

  • I have searched Superset docs and Slack and didn't find a solution to my problem.
  • I have searched the GitHub issue tracker and didn't find a similar bug report.
  • I have checked Superset's logs for errors and if I found a relevant Python stacktrace, I included it here as text in the "additional context" section.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant