Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Projection (like Vertica) support #7491

Open
simonaubertbd opened this issue Mar 27, 2024 · 12 comments
Open

Projection (like Vertica) support #7491

simonaubertbd opened this issue Mar 27, 2024 · 12 comments

Comments

@simonaubertbd
Copy link

Is your feature request related to a problem? Please describe.
When you design a dashboard in a Bi Tool (like Tableau), you don't write the exact query of the viz. You create a model with one or several tables and then you build your viz on this model (which must be at the finest granularity needed in the dashboard).
However, most of the time, the granularity of your viz is different, meaning there is a lot of grouping at distinct level.

Describe the solution you'd like
Vertica has a unique way to solve this issue by building projections. To make it simple, projections are stored "synonyms" of the table that is queried, it can contains less columns, being filtered, or aggregated (so less data and faster query) and the engine choose the best fitted projection for any query.
https://www.vertica.com/docs/9.2.x/HTML/Content/Authoring/AdministratorsGuide/Projections/WorkingWithProjections.htm
https://www.vertica.com/docs/9.3.x/HTML/Content/Authoring/SQLReferenceManual/Statements/CREATEPROJECTIONLiveAggregate.htm
Of course, it takes more space disk but it's way better for dashboarding.

Describe alternatives you've considered
N/A

Additional context
N/A

@njnes
Copy link
Contributor

njnes commented Mar 27, 2024

you mean updatable materialized view?

@simonaubertbd
Copy link
Author

Hello @njnes and thanks for answering. No, it's not the same.

Let's say we have the table order
order_id, order_date, amount

And a dashboard based on this table on Tableau (or any dashboarding tool) with three objects :
-a total amount KPI
-a line chart of amount by date
-a detail straight table

You can imagine having two aggregated projections with the total amount and the amount by date and when you query the table order, the database system will automatically switch to the good projection instead of querying the table. For the dataviz user, it's totally transparent, all he sees is a query to the order table.
The differences with the materialized view is that's automatically updated every time the table is updated but also, it's not a view/table object, it's "below" and the database engine will switch to the good projection according to the query. pretty fast on Vertica.

@njnes
Copy link
Contributor

njnes commented Mar 27, 2024

It sort of a materialized view, but many (ie one per projection) and updatable automatically, ie when base tables change update the aggregation.

@simonaubertbd
Copy link
Author

simonaubertbd commented Mar 27, 2024

@njnes I don't think so because there is this totally transparent switch to the most accurate projection and in my understanding, this is not a materialized cube feature.

I hope this slide found in the wild will help to understand what I try to mean :
image

NB : there is always the default projection which is exactly like the table. And then, you can add your own projections depending to the query you know that will be launched on this table.

@njnes
Copy link
Contributor

njnes commented Mar 27, 2024

your vertica slide confuses me even more. We are a column store just store some other order of the columns or subset isn't useful. Ordering on different column maybe usefull (we have order index for this). Most of the aggregations will then be order based.

@simonaubertbd
Copy link
Author

simonaubertbd commented Mar 27, 2024

@njnes Ok, this isn't the expected effect, sorry for the confusion. My point was not having less column but less rows, thanks to the aggregated projections. That's typically an issue on dashboarding when you query at different granularity level the same table, cf my example on the order table.

One table which is a virtual object
Three projections (not aggregated, sum of amount, sum of amount by date) that are stored on the base.

The dashboarding tool sends three queries (because three objects) on the same order table (well, that's the way Tableau is designed).

The database system chooses the good projection after parsing the query. It runs pretty fast since there is no live aggregation.

And we can imagine the same system with filters like a projection that only get the amount on the current year... or, as you noticed, with sorting.

I hope this is clearer?

@njnes
Copy link
Contributor

njnes commented Mar 27, 2024

your 3 projections are more or less 3 queries (or view definitions)?
ie select * from base.
select sum(amount) from base
select sum(amount) from base group by extract(year from date). No the match with projections isn't clear yet.

@simonaubertbd
Copy link
Author

"your 3 projections are more or less 3 queries (or view definitions)?"

Somehow.... but :
-it's stored in base (like a materialized view that would update each time the main table is updated)
-it's transparent for the user who only queries the order table and doesn't even know there are projections. Then, the system switch to the good projection.
the idea is to make the fewer live work by storing AND to be easy to use in dashboarding tools.

@njnes
Copy link
Contributor

njnes commented Mar 27, 2024

Okay so the idea is to define a virtual table (using a query which we call 'base') and then the system should keep (and update) aggregations done on this virtual table. Is interesting. But not all that easy. (I'm not sure the term projections is helping in any way). Keeping for example (date based) count or sums. Would make global (sum's) aggregations easy (and fairly cheap). So something is likely possible, but would need more time to design etc

@simonaubertbd
Copy link
Author

@njnes That's it :) I used the term projection because it's the one used in Vertica, I don't know if there is a better wording. And yes, I have no doubt it's pretty hard to implement, probably several months or years I guess but you're the expert here. However the impact on the final user experience is huge.

Best regards,

Simon

@mvdvm
Copy link
Contributor

mvdvm commented Mar 28, 2024

Hi Simon,

For our understanding, who is defining these projections, is it the dashboard application (Tableau) or a human (the developer of the dashboard)?
If it is a human, than this is part of tuning the query by telling the DBMS to store certain (aggregated/joined/filtered) data redundantly for faster retrieval (like a materialized view). But that only works for those specific (dashboard) queries. As soon as you change the aggregation level or a filter condition, it does not work anymore.

Also can the Vertica projections only work on columns of one table? So not on columns of joined tables?

@simonaubertbd
Copy link
Author

Hello @mvdvm and thanks for your interest. The projection is defined by the owner of the table, it can be a data engineer or an advanced business user... but yes, it's defined by a human who must have some knowledge of how the table will be used.

But that only works for those specific (dashboard) queries. As soon as you change the aggregation level or a filter condition, it does not work anymore.

No, that's not the spirit and that's why I talk about an automatic switch. Let's take again my order table example :

Let's say we have the table order
order_id, order_date, amount

And a dashboard based on this table on Tableau (or any dashboarding tool) with three objects :
-a total amount KPI
-a line chart of amount by date
-a detail straight table

You can have three projections but also one or two. If you have only the default detail projection and a projection of amount aggregated by date, the total amount KPI will be automatically calculated with the aggregated by date projection, because it's faster than with the detailed. The human define the projection with in mind the good ratio between gained time/disk space but the database system chooses the good projection for each query.

Idem if you have filtered your data with date>2020 on a projection, it will use even if you filter>2022 because it's still better than not filtered at all. you don't have to create a projection for each year or each date.

Also can the Vertica projections only work on columns of one table? So not on columns of joined tables?

No, projections are under one unique table.

Best regards,

Simon

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants