Add "metric" to show actual value of column, i.e. no aggregate #19182

brylie · 2019-03-29T11:32:36Z

brylie
Mar 29, 2019

We have a query that returns timeseries data with two columns:

date
count

I would like to simply graph the data as a line chart. However, when selecting the line chart, it requires that I choose a metric for the data, e.g. sum, average, etc

Since my data are already in the desired form, how can I just tell the chart to use the data as-is?

Note: I can also select the "sum" metric, since I am not performing any additional aggregation, but this seems a bit "kludgy".

rumbin · 2019-03-30T07:10:50Z

rumbin
Mar 30, 2019

You can use a workaround:
Just define an aggregate that would yield the original value if only one element per group is present, like, e.g. sum, min, max, avg.
Then, make sure you configure your chart to use the original time granularity.

However, this workaround is only valid, as long as the timestamps are all distinct. Otherwise the grouping would actually aggregate more than one value...
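
A minimal sketch of why the workaround holds and where it breaks, using Python's built-in `sqlite3`. The table name `daily_counts` and the sample rows are assumptions for illustration, and the `GROUP BY` query only approximates what a chart with a time grain and one metric would run; it is not Superset's actual generated SQL.

```python
import sqlite3

# Hypothetical pre-aggregated data: exactly one row per date.
rows = [("2019-03-01", 12), ("2019-03-02", 7), ("2019-03-03", 9)]

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE daily_counts ("date" TEXT, "count" INTEGER)')
conn.executemany("INSERT INTO daily_counts VALUES (?, ?)", rows)

FUNCS = ("SUM", "MIN", "MAX", "AVG")

def aggregate(func):
    # Group by the original time grain and apply a single aggregate per group.
    sql = (
        f'SELECT "date", {func}("count") FROM daily_counts '
        'GROUP BY "date" ORDER BY "date"'
    )
    return conn.execute(sql).fetchall()

# With one row per group, every aggregate returns the original values.
results = {func: aggregate(func) for func in FUNCS}

# Add a second row for an existing date: the aggregates now disagree,
# and none of them shows the two raw values.
conn.execute("INSERT INTO daily_counts VALUES (?, ?)", ("2019-03-02", 5))
with_dup = {func: dict(aggregate(func))["2019-03-02"] for func in FUNCS}
```

As long as each timestamp appears once, `results` matches the input for all four functions. After the duplicate is inserted, the choice of aggregate changes what is plotted for that date.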

2 replies

shawnesquivel Mar 17, 2023

You can use a workaround: Just define an aggregate that would yield the original value if only one element per group is present, like, e.g. sum, min, max, avg. Then, make sure you configure your chart to use the original time granularity.

However, this workaround is only valid, as long as the timestamps are all distinct. Otherwise the grouping would actually aggregate more than one value...

Can you explain further how to do this, with screenshots or a step-by-step process?

YayaDaxter Apr 12, 2023

I need an additional explanation too please!

brylie · 2019-03-30T12:11:07Z

brylie
Mar 30, 2019
Author

Good workaround.

I still would like to see this as a feature. I.e. aggregations should be optional, since they may be done during the query.

0 replies

2019-05-29T12:48:12Z

stale[bot]
bot May 29, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. For admin, please label this issue .pinned to prevent stale bot from closing the issue.

0 replies

brylie · 2019-05-30T09:15:16Z

brylie
May 30, 2019
Author

Bump.

0 replies

2019-07-29T09:21:52Z

stale[bot]
bot Jul 29, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. For admin, please label this issue .pinned to prevent stale bot from closing the issue.

0 replies

khuranabalvinder · 2020-05-15T11:32:11Z

khuranabalvinder
May 15, 2020

Bump

0 replies

drewgonzales360 · 2020-08-09T19:19:44Z

drewgonzales360
Aug 9, 2020

Bump

0 replies

benmaier · 2020-10-12T09:28:04Z

benmaier
Oct 12, 2020

I'd like to see this feature, too. Having data in two columns and then plotting the first column against the second column is the base case, in my opinion. It's very confusing to not have this as an option for a first-time user (such as myself).

0 replies

villebro · 2020-10-12T09:36:03Z

villebro
Oct 12, 2020
Collaborator

This shouldn't be too difficult to implement. I'll take a stab at adding this in the coming weeks.

0 replies

junlincc · 2020-10-17T06:30:04Z

junlincc
Oct 17, 2020
Collaborator

@villebro I would love to see this feature happen, let's make it a post-1.0 item! Added to the roadmap inbox :) https://github.com/apache-superset/superset-roadmap/projects/1

0 replies

Yattabyte · 2021-02-12T19:14:42Z

Yattabyte
Feb 12, 2021

Kinda baffled that this wouldn't have been the very first behaviour implemented, with aggregates coming after

0 replies

brian-visikon · 2021-05-07T04:22:52Z

brian-visikon
May 7, 2021

bump

0 replies

brian-robillard1 · 2021-05-27T22:34:41Z

brian-robillard1
May 27, 2021

bump

0 replies

mistercrunch · 2021-05-28T06:30:00Z

mistercrunch
May 28, 2021
Collaborator

Superset's explorer is used to explore multidimensional datasets, and semantically metrics in Superset are strictly defined aggregate expressions. That is the case for metric definitions in most BI tools. The dimensions / metric mental model is widely accepted and generally easy to reason about.

Doing a sum of a single row is valid, and I personally fail to see why people see this as a problem. If you happen to add other columns/dimensions to your dataset, things will still work.

There are complex implications here. If you'd like to say "I don't want Superset to aggregate this" through the UI and happen to have duplicates in your dataset for whatever reason, or simply check that box by mistake, there's a whole lot of implications, like dealing with high-volume data and/or duplicates. In the current model, Superset has guarantees around the granularity of the queries it generates. If that's not the case, Superset has to trust that the user is right, or assert that the grain of the query is the one expected. Handling these exceptions and communicating them to the user ("hey, looks like you have dups and you should use an aggregate function") seems overall harder and less intuitive than the original proposition: "metrics are aggregate expressions".
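
To make the "assert that the grain of the query is the one expected" idea concrete, here is a rough sketch of such a check in plain Python. The helper `duplicated_keys`, the column names, and the sample rows are all hypothetical; nothing here reflects how Superset is implemented.

```python
from collections import Counter

def duplicated_keys(rows, key_columns):
    """Return the grain keys that occur more than once in `rows`."""
    counts = Counter(tuple(row[c] for c in key_columns) for row in rows)
    return [key for key, n in counts.items() if n > 1]

# Hypothetical result set; the expected grain is (ts, sensor).
rows = [
    {"ts": 0, "sensor": "a", "pressure": 1.0},
    {"ts": 1, "sensor": "a", "pressure": 1.1},
    {"ts": 1, "sensor": "a", "pressure": 1.4},  # second row at the same grain
    {"ts": 1, "sensor": "b", "pressure": 0.9},
]

dups = duplicated_keys(rows, ["ts", "sensor"])
if dups:
    # This is the point where a tool would have to warn the user
    # or fall back to an aggregate.
    print(f"{len(dups)} key(s) have more than one row: {dups}")
```

An empty list means the data is already at the expected grain, so plotting it unaggregated is unambiguous.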

0 replies

rumbin · 2021-05-28T06:58:05Z

rumbin
May 28, 2021

I think we need to recognize that there are two worlds.
One is the world of BI tools, where it is very common to only show aggregated values.
Then there is the world of technical users who are used to plotting the unaggregated values with tools like Excel, Origin, MATLAB, Python.

The thing is that these worlds happen to converge at some companies.
Technical data (shopfloor, production machines) gets stored in Data Warehouses together with correlated business data. And people desire to explore both with one single tool.
Technically oriented users, e.g. product developers or quality engineers, might start their investigations on aggregated data but soon need to dig down to the unaggregated values in order to see things more clearly.
Very often these people need to plot 2-D scatter plots of two raw dimensions to determine the correlation between them. Real-world examples include pressure-temperature diagrams, hysteresis plots, current-voltage curves, .... In a technical world these are ubiquitous.

I agree with your concerns regarding the potentially huge amount of returned data, @mistercrunch. However, I think that the LIMITs that Superset applies anyway will minimize the damage here. We just need to ensure that the user is well aware of the applied limit.
In well-built dashboards the user would need to narrow down the amount of data by means of filters until displaying the unaggregated data is really useful.
This can to a certain extent be accomplished by employing Jinja logic, I suppose.

0 replies

rumbin · 2023-02-01T22:46:52Z

rumbin
Feb 1, 2023

@manikanta-dornala nicely summarised

0 replies

shawnesquivel · 2023-03-17T19:22:00Z

shawnesquivel
Mar 17, 2023

Bump x100 (is that allowed?)

I have simple time series data that corresponds to sensor data.

time (seconds) = [0, 1, 2, 3]
pressure (in atmospheres) = [1.0, 1.1, 1.2, 1.3]

I want to plot Time against Data, but I can't do it without aggregating values.

Can anyone explain to me why this is not supported? It's the only thing hindering me from using Superset.

import seaborn as sns
import matplotlib.pyplot as plt

time = [0, 1, 2, 3]  # seconds
pressure = [1.0, 1.1, 1.2, 1.3]  # atmospheres
sns.set_style('darkgrid')
fig, ax = plt.subplots()
sns.scatterplot(x=time, y=pressure, ax=ax)
ax.set(title='Sensor data over time',
       xlabel="Time (sec)",
       ylabel="Pressure (atm)")

plt.show()

4 replies

benmaier Mar 17, 2023

Yes, this is because developers know better than users or designers what users need and what they need is to jump through extra hoops for 4 years apparently

shawnesquivel Mar 17, 2023

@benmaier Going through the replies on the thread, it seems quite clear to me that a lot of people want this feature.

However, it doesn't seem like there is a lot of morale to add this feature.

Do you have any workarounds/other similar tools that may include the features that I need?

benmaier Mar 17, 2023

Nope, sorry, I am one of the people that complained higher up that this obvious feature has not been implemented lol

ian-lewis-d May 17, 2023

I've just come across this issue, having started an appraisal of Superset. Frankly, it's ridiculous that Superset can't natively handle pre-aggregated data (or simple value oriented series). Forcing the use of an aggregation adds complexity.

I can write my own aggregate queries, it's not necessary or appropriate for Superset to then force me to aggregate single points of data.

rumbin · 2023-03-17T20:14:32Z

rumbin
Mar 17, 2023

@rusackas, I know that tagging is immoral, but I hope that you would rather appreciate being made aware of this pain point, which users have reported for years now.

I have been in contact with many different users of different backgrounds and at different companies, and this is the No. 1 pain point (now that the generic x-axis is implemented).

Is there any awareness and/or roadmap for un-aggregated plotting?

I wonder if Preset users don't also request this feature.

0 replies

JakobEP16 · 2023-03-31T07:38:55Z

JakobEP16
Mar 31, 2023

Bump.

0 replies

villebro · 2023-03-31T07:42:07Z

villebro
Mar 31, 2023
Collaborator

@srinify I know your team is working on this, would you be able to shed some light on what you're building? Which reminds me I will deliver those action points I promised in our last chat.

0 replies

villebro · 2023-03-31T07:43:34Z

villebro
Mar 31, 2023
Collaborator

For the rest of the people I'd like to remind you that this is an Open Source Community, and everyone is happily encouraged to step up and help develop this feature if it's blocking them.

0 replies

jpedrick · 2023-03-31T17:38:36Z

jpedrick
Mar 31, 2023

Hey @villebro, I think others might be willing to work on this if the Superset team acknowledged that this is desirable. @mistercrunch made the case that this feature doesn't belong in Superset at all and that Superset is only meant for the subset of a subset of BI data visualization which only allows aggregated metrics. Given that the Superset team has verbally invalidated the need for non-aggregated visualization in this tool, nobody will want to spend their valuable time forking and maintaining their own version of Superset when there are other tools that already do the job that they can use.

0 replies

villebro · 2023-03-31T19:46:38Z

villebro
Mar 31, 2023
Collaborator

@jpedrick thanks for sharing some additional background. I think others can also chime in here, but I think it might be a good idea to reiterate some core pillars about how Apache projects operate. Firstly, no one person makes decisions for the project - rather, the project as a collective makes decisions. While it's true that people who have an active role in the codebase can be seen as having more influence over the project than, say, a person who hasn't contributed commits, this is not strictly so - anyone has the power to propose new features, and if there is enough momentum and community support to back them up, they will most certainly be seriously considered. I can't count the times I've reversed my own position on something after getting pushback from non-committers. And IMO, this is one of the best aspects of OSS.

I completely agree that having a non-aggregating scatterplot absolutely makes sense, and just this week we met (virtually) with @srinify and team to discuss this very topic. Please check the Slack thread here that kickstarted the effort: https://apache-superset.slack.com/archives/C0170U650CQ/p1679361957397339 . If there are others who feel strongly about this feature, e.g. have ideas about how a non-aggregated scatterplot should work or would even be open to contributing to the feature, please do speak up! If there's broad interest in this I'm happy to set up a dedicated Slack channel for this to make it easier to coordinate collaboration, and it would definitely help to start a dev@ discussion about it to make sure it gets broad visibility.

At the end of the day this project lives and dies by its community, and if the community feels we're not serving their interests, then we're not doing a good job. For that reason I apologize for not having been more active in this discussion, and hope we can turn a new page on this discussion and make sure Superset keeps evolving in a direction that the community agrees with.

0 replies

rahulideas2it · 2023-05-25T09:13:33Z

rahulideas2it
May 25, 2023

Any update on this?

0 replies

jbest · 2023-06-22T18:57:40Z

jbest
Jun 22, 2023

I encountered the same need today, bump.

0 replies

tecbr · 2023-12-15T20:16:32Z

tecbr
Dec 15, 2023

I installed Superset and the first simple thing I needed I couldn't see: a simple graphic with x vs y with 2 columns of the table.
This is unacceptable in this type of software.

0 replies

jseparovic · 2024-03-12T04:39:41Z

jseparovic
Mar 12, 2024

I'd like to see a gauge of the latest value in my dataset, whatever the time in the datasource may be.
If I'm using a time-based filter of, say, 15 seconds, and my data is scheduled to arrive every 15 seconds, I can sometimes see a NaN if the db insert arrives late.
If I set it to, say, 30 seconds, then using an aggregation can show intermittent averages when the value switches from, say, 1 to 10, which makes no sense in my particular use case.

2 replies

rumbin Mar 12, 2024

Depending on your database, you may succeed by employing a max_by() aggregation function to yield the value at the max timestamp.
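
A small sketch of the difference, emulating the "value at the max timestamp" semantics in plain Python rather than calling a database. The readings and the helper name `max_by` are illustrative assumptions; engines such as Trino expose a comparable `max_by(value, key)` aggregate, but availability and argument order depend on the database.

```python
from statistics import mean

# Hypothetical readings as (timestamp in seconds, value);
# the value switches from 1 to 10 at t=30.
readings = [(0, 1), (15, 1), (30, 10)]

def max_by(rows):
    """Return the value of the row with the largest timestamp."""
    _, value = max(rows, key=lambda row: row[0])
    return value

# A 30-second filter window ending at t=30 may contain two readings.
window = [row for row in readings if row[0] >= 15]

averaged = mean(value for _, value in window)  # mixes the old and new value
latest = max_by(window)                        # only the newest reading
```

Here `averaged` lands between the two states, a value the sensor never reported, while `latest` is the most recent reading regardless of how many rows fall into the window.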

jseparovic Mar 13, 2024

Ah yeh of course, I just need to play with the view to return only the latest record. Cheers

sandeepr43 · 2024-03-13T11:10:08Z

sandeepr43
Mar 13, 2024

+1 need this improvement.

0 replies

ZuzannaSadowska · 2024-04-24T07:37:26Z

ZuzannaSadowska
Apr 24, 2024

Bump

0 replies

rusackas · 2024-04-24T15:17:16Z

rusackas
Apr 24, 2024
Collaborator

Hi everyone,

This issue/discussion is 5 years old, and is not a bug but rather a feature request. I'm closing it mainly for those reasons. There's also the consideration that if someone does want to implement this change, it's a fundamental departure from how Superset currently operates, and should be proposed as a SIP that considers all the implications: how unaggregated data would be visualized by all plugins (i.e. what happens with a line chart when you have many Y values on a single X value), what error/edge cases that might impose and how to solve them, what scaling/performance issues we might face (e.g. more data over the wire), etc. We're open to proposals here, but this issue is not a bug, does not have an implementation plan, and is full of bump comments that do not contribute to its success.

Again, I'm not closing this to sweep it under the rug, but in this issue/discussion's current state, it's effectively unactionable. If anyone seriously wants to contribute towards this, we're happy to continue the discussion on the dev list, on a rebooted ideas thread here on GitHub Discussions, on Slack, at Town Hall, or any other appropriate venue. This thread doesn't seem to be going in a constructive direction at this point, but I'm more than happy to reopen it if anyone disagrees.

0 replies

Replies: 36 comments · 8 replies
