EXPLAIN query with TYPE IO should also return columns used. #22571

Open
jan119 opened this issue Apr 19, 2024 · 0 comments

When you run an EXPLAIN query with the type set to IO, it returns the input tables but does not return the columns used per table.

Expected Behavior or Use Case

CREATE TABLE user (
    name VARCHAR,
    age INT
);
EXPLAIN (TYPE IO) SELECT name FROM user;

You get the following result:

           Query Plan
---------------------------------
 {
   "inputTableColumnInfos" : [ {
     "table" : {
       "catalog" : "hive",
       "schemaTable" : {
         "schema" : "test_schema",
         "table" : "user"
       }
     },
     "columnConstraints" : [ ]
   } ]
 }

The expected result:

           Query Plan
---------------------------------
 {
   "inputTableColumnInfos" : [ {
     "table" : {
       "catalog" : "hive",
       "schemaTable" : {
         "schema" : "test_schema",
         "table" : "user"
       }
     },
     "columnConstraints" : [ ],
     "columns" : [ {
       "columnName" : "name",
       "type" : "varchar",
       "domain" : {
          ...
       }
     } ]
   } ]
 }
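
To illustrate how a client could consume the requested output, here is a minimal Python sketch. It assumes the proposed `columns` field has exactly the shape shown above, which is not part of Presto's current output; the helper name `columns_by_table` is hypothetical. With today's output the field is absent, so the column list comes back empty.

```python
import json


def columns_by_table(plan_json: str) -> dict[str, list[str]]:
    """Map 'catalog.schema.table' to the column names listed for that table."""
    plan = json.loads(plan_json)
    result: dict[str, list[str]] = {}
    for info in plan.get("inputTableColumnInfos", []):
        table = info["table"]
        schema_table = table["schemaTable"]
        name = f'{table["catalog"]}.{schema_table["schema"]}.{schema_table["table"]}'
        # "columns" is the field proposed in this issue (an assumption);
        # current EXPLAIN (TYPE IO) output does not include it.
        result[name] = [col["columnName"] for col in info.get("columns", [])]
    return result


# Current output shape: tables only.
current = """
{"inputTableColumnInfos": [{
  "table": {"catalog": "hive",
            "schemaTable": {"schema": "test_schema", "table": "user"}},
  "columnConstraints": []}]}
"""

# Proposed output shape (hypothetical), with the elided "domain" left out.
proposed = """
{"inputTableColumnInfos": [{
  "table": {"catalog": "hive",
            "schemaTable": {"schema": "test_schema", "table": "user"}},
  "columnConstraints": [],
  "columns": [{"columnName": "name", "type": "varchar"}]}]}
"""

print(columns_by_table(current))
print(columns_by_table(proposed))
```

Falling back to an empty list keeps the same client code working against servers with and without the proposed field.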

Presto Component, Service, or Connector

query plan

Possible Implementation

I think the query planner already computes this, since the other EXPLAIN formats expose the columns. However, their output is hard to parse and varies between connectors.

Context

We are providing audit logs for our customers so they can find which scientists in their group are querying sensitive data, which is stored at the column level. Based on the column names, we would produce logs that they then need to submit for compliance.
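
As a rough sketch of that audit step, the Python below flags queries that touch sensitive columns. It assumes a per-table column mapping has already been extracted from the proposed EXPLAIN (TYPE IO) output; the function name `audit_entries`, the record fields, and the sample user are all hypothetical.

```python
def audit_entries(
    user: str,
    table_columns: dict[str, list[str]],
    sensitive: dict[str, set[str]],
) -> list[dict]:
    """Return one audit record per table where the query reads a sensitive column."""
    entries = []
    for table, columns in sorted(table_columns.items()):
        # Intersect the columns the query reads with those marked sensitive.
        hits = sorted(set(columns) & sensitive.get(table, set()))
        if hits:
            entries.append(
                {"user": user, "table": table, "sensitive_columns": hits}
            )
    return entries


# Hypothetical policy: "name" in hive.test_schema.user is sensitive.
policy = {"hive.test_schema.user": {"name"}}

print(audit_entries("alice", {"hive.test_schema.user": ["name"]}, policy))
print(audit_entries("alice", {"hive.test_schema.user": ["age"]}, policy))
```

Without column-level information in the IO plan, this check can only be done per table, which is too coarse for the compliance use case described above.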
