Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AMORO-2811][Bug][FLINK]: flink unified catalog doesn't support mixed_hive #2814

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

Aireed
Copy link
Contributor

@Aireed Aireed commented May 7, 2024

Why are the changes needed?

Close #2811 .

Brief change log

create table if not exists xxxx (
....
);

A unifiedCatalog which metastore is hms.

  • when create a mixed-hive table:

    • If there is a table in mixed-hive format with the same name, the program can execute correctly with no error thrown and the table is not overwrited;
    • If there is a table in hive format with the same name, the program will throw an error,
    • If there is no table with the same name, a mixed-hive table will be created.
  • when create a iceberg table

    • if there is a table in iceberg format with the same name, the program can execute correctly with no error ,and the table is not overwrited;
    • If there is no table with the same name, a iceberg table will be created.
  • when create a mixed-iceberg table

    • If there is a table in iceberg format with the same name, the program can execute correctly with no error thrown and the table is not overwrited;
    • If there is a table in hive format with the same name, the program should thrown an error to indicate the table does exist;
    • If there is no table with the same name, an iceberg table will be created.

How was this patch tested?

  • Add some test cases that check the changes thoroughly including negative and positive cases if possible

  • Add screenshots for manual tests if appropriate

  • Run test locally before making a pull request

Documentation

  • Does this pull request introduce a new feature? (yes / no)
  • If yes, how is the feature documented? (not applicable / docs / JavaDocs / not documented)

@github-actions github-actions bot added module:mixed-flink Flink moduel for Mixed Format module:mixed-hive Hive moduel for Mixed Format labels May 7, 2024
@Aireed
Copy link
Contributor Author

Aireed commented May 7, 2024

cc @YesOrNo828 @xieyi888 PTAL.

Comment on lines 229 to 244
if (!configuration.contains(TABLE_FORMAT)) {
// if user doesn't specify the table format, we try to load the table and get the table
// format.
try {
AmoroTable amoroTable =
unifiedCatalog.loadTable(tableIdentifier.getDatabase(), tableIdentifier.getTableName());
format = amoroTable.format();
} catch (Throwable t) {
LOG.warn(
"We can't load table {}.{} through unifiedCatalog, use default format {}",
tablePath.getDatabaseName(),
tablePath.getObjectName(),
format.name());
}
}
final TableFormat catalogFormat = format;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method is used to create a table. How to load the table before creating it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method is used to create a table. How to load the table before creating it?

  public void createTable(ObjectPath tablePath, CatalogBaseTable table, boolean ignoreIfExists)

In the Flink SQL scenario, when users are writing SQL, this situation arises when they use "create table if not exists" to define a table that already exists. The parameter "ignoreIfExists" is used to ignore the existence of the table.

AmoroTable amoroTable =
unifiedCatalog.loadTable(tableIdentifier.getDatabase(), tableIdentifier.getTableName());
format = amoroTable.format();
} catch (Throwable t) {
Copy link
Contributor

@xieyi888 xieyi888 May 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe

catch (org.apache.amoro.NoSuchTableException e) 

would be better

Comment on lines 229 to 247
if (!configuration.contains(TABLE_FORMAT)) {
// if user doesn't specify the table format, we try to load the table and get the table
// format.
try {
AmoroTable amoroTable =
unifiedCatalog.loadTable(tableIdentifier.getDatabase(), tableIdentifier.getTableName());
format = amoroTable.format();
} catch (Throwable t) {
LOG.warn(
"We can't load table {}.{} through unifiedCatalog, use default format {}",
tablePath.getDatabaseName(),
tablePath.getObjectName(),
format.name());
}
}
final TableFormat catalogFormat = format;
AbstractCatalog catalog =
getOriginalCatalog(format).orElseGet(() -> createOriginalCatalog(tableIdentifier, format));
getOriginalCatalog(format)
.orElseGet(() -> createOriginalCatalog(tableIdentifier, catalogFormat));
Copy link
Contributor

@xieyi888 xieyi888 May 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If table exists, we load table and get tableFormat for it, Otherwise, we use default tableFormat to create table, which is MIXED_ICEBERG.

Did we should force user to add 'table.format' option in create table statement here?

Copy link
Contributor Author

@Aireed Aireed May 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did we should force user to add 'table.format' option in create table statement?

Good idea. If the table cannot be loaded via unifiedCatalog and no format is specified, we should report an error to prompt the user to specify the format. WDYT?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we could support a property of catalog which define a defult table format when creating a table without specific the format.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we could support a property of catalog which define a defult table format when creating a table without specific the format.

IMO, Configuring the format on both the table and the catalog at the same time can easily mislead users. The "Unified Catalog" is designed to allow users not to worry about the underlying format (except when it is created).

Let's listen to some other opinions. WDYT @YesOrNo828 @xieyi888

@@ -221,8 +226,25 @@ public void createTable(ObjectPath tablePath, CatalogBaseTable table, boolean ig
TableIdentifier tableIdentifier =
TableIdentifier.of(
unifiedCatalog.name(), tablePath.getDatabaseName(), tablePath.getObjectName());
if (!configuration.contains(TABLE_FORMAT)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (!configuration.contains(TABLE_FORMAT)) {
if (!configuration.contains(TABLE_FORMAT) && ignoreIfExists && unfiiedCatalog.tableExists()) {

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even when the user applies the "create table xxx()" statement to create a table, it is still necessary to validate the format of the table. So there is no need to add additional conditions.

Comment on lines 229 to 247
if (!configuration.contains(TABLE_FORMAT)) {
// if user doesn't specify the table format, we try to load the table and get the table
// format.
try {
AmoroTable amoroTable =
unifiedCatalog.loadTable(tableIdentifier.getDatabase(), tableIdentifier.getTableName());
format = amoroTable.format();
} catch (Throwable t) {
LOG.warn(
"We can't load table {}.{} through unifiedCatalog, use default format {}",
tablePath.getDatabaseName(),
tablePath.getObjectName(),
format.name());
}
}
final TableFormat catalogFormat = format;
AbstractCatalog catalog =
getOriginalCatalog(format).orElseGet(() -> createOriginalCatalog(tableIdentifier, format));
getOriginalCatalog(format)
.orElseGet(() -> createOriginalCatalog(tableIdentifier, catalogFormat));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we could support a property of catalog which define a defult table format when creating a table without specific the format.

@codecov-commenter
Copy link

Codecov Report

Attention: Patch coverage is 0% with 1 lines in your changes are missing coverage. Please review.

Project coverage is 27.47%. Comparing base (a2753b6) to head (c17965a).
Report is 8 commits behind head on master.

Files Patch % Lines
...rg/apache/amoro/hive/catalog/MixedHiveCatalog.java 0.00% 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master    #2814      +/-   ##
============================================
- Coverage     35.95%   27.47%   -8.48%     
- Complexity      383     2483    +2100     
============================================
  Files            45      360     +315     
  Lines          4700    37210   +32510     
  Branches        513     5434    +4921     
============================================
+ Hits           1690    10225    +8535     
- Misses         2842    26026   +23184     
- Partials        168      959     +791     
Flag Coverage Δ
trino 27.47% <0.00%> (-8.48%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
module:mixed-flink Flink moduel for Mixed Format module:mixed-hive Hive moduel for Mixed Format
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Bug][FLINK]: flink unified catalog doesn't support mixed_hive
4 participants