
[Bug]: Mixed Hive Table can't Sync Hive data properly #1880

Open
1 task done
nicochen opened this issue Aug 23, 2023 · 2 comments · May be fixed by #1881
Labels
type:bug Something isn't working

Comments

@nicochen
Contributor

What happened?

ArcticTableFlag is set to true once any data has been written to a Hive partition through Amoro.
But suppose I delete the data in that partition, write it again with Hive, and then try to synchronize it to the Mixed Hive table. The files cannot be added to the Mixed Hive table because of this "if" logic. The Mixed Hive table has no data in this partition, so filesMap.get(partitionData) == null. At the same time, ArcticTableFlag still exists, because the Hive partition itself was never dropped and data had previously been written to it.
So I think there is a problem with this logic.
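The check described above can be modeled as a small decision function. This is a minimal Python sketch, not the actual Amoro code: `SyncAction`, `decide_new_partition` and its parameters are hypothetical names. The sketch assumes the check runs for each Hive partition found during sync.

```python
from enum import Enum


class SyncAction(Enum):
    ALREADY_TRACKED = "partition already tracked; handled elsewhere"
    TREAT_AS_LEFTOVER = "treat partition as leftover of an earlier Amoro write"
    ADD_FILES = "add the Hive files to the Mixed Hive table"


def decide_new_partition(in_mixed_table: bool, has_arctic_flag: bool) -> SyncAction:
    """Hypothetical model of the "if" logic in the Hive data sync.

    in_mixed_table  -- stands in for filesMap.get(partitionData) != null
    has_arctic_flag -- stands in for ArcticTableFlag on the HMS partition
    """
    if in_mixed_table:
        return SyncAction.ALREADY_TRACKED
    if has_arctic_flag:
        # No data in the Mixed Hive table, but Amoro wrote here before:
        # the Hive files are NOT added. The reported scenario lands here.
        return SyncAction.TREAT_AS_LEFTOVER
    # A partition that only Hive has ever written: sync its files.
    return SyncAction.ADD_FILES
```

In the scenario from this report, the Mixed Hive table has no files for the partition but the flag is still set, so `decide_new_partition(False, True)` returns `TREAT_AS_LEFTOVER` and the files rewritten by Hive are never added.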

Affects Versions

master

What engines are you seeing the problem on?

Core

How to reproduce

  1. Create a partitioned Mixed Hive table
  2. Insert overwrite some data into one partition of the table
  3. Delete the data written by the insert overwrite in step 2
  4. Insert the same data into the same partition with Hive
  5. Use HiveDataSync to sync the data from step 4 to the Mixed Hive table

Relevant log output

No response

Anything else

No response

Code of Conduct

  • I agree to follow this project's Code of Conduct
nicochen added the type:bug label on Aug 23, 2023
nicochen linked a pull request on Aug 23, 2023 that will close this issue
@zhoujinsong
Contributor

@nicochen Thanks for reporting this issue.

AFAIK, when a new Hive partition is detected while synchronizing Hive data, we need to check whether it has the Arctic flag because:

  • If deleting a partition of a Mixed Hive table is committed to Iceberg successfully but fails to commit to Hive, AMS will detect this and delete the corresponding data in Hive.
  • The "if" logic here distinguishes that case from a new partition written directly by Hive.
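To make the two scenarios concrete, here is a minimal Python sketch that models a partition as three booleans. The `PartitionState` class and the scenario names are hypothetical, and the sketch assumes the sync decides only from these three signals. A partially failed deletion and a partition written directly by Hive differ only in the flag, which is why the check is needed; the same model shows that the case in this report ends up in exactly the same state as a failed deletion.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class PartitionState:
    """Hypothetical snapshot of one partition as seen by the Hive data sync."""
    hive_has_files: bool   # data files exist in the Hive partition
    in_mixed_table: bool   # the Mixed Hive table (Iceberg side) tracks it
    has_arctic_flag: bool  # the Arctic flag is set in the HMS partition metadata


# A partition previously written through Amoro.
written_by_amoro = PartitionState(
    hive_has_files=True, in_mixed_table=True, has_arctic_flag=True)

# Scenario 1: deletion committed to Iceberg, but the Hive commit failed.
failed_drop = replace(written_by_amoro, in_mixed_table=False)

# Scenario 2: a new partition written directly by Hive.
written_by_hive = PartitionState(
    hive_has_files=True, in_mixed_table=False, has_arctic_flag=False)

# Reported case: data deleted (partition and flag kept), then rewritten with Hive.
after_delete = replace(written_by_amoro, hive_has_files=False, in_mixed_table=False)
deleted_then_hive_write = replace(after_delete, hive_has_files=True)

print(failed_drop == written_by_hive)          # False: the flag tells them apart
print(failed_drop == deleted_then_hive_write)  # True: indistinguishable
```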

Given this, when deleting the data under a Hive partition, we may also need to remove the Arctic flag from the partition metadata in HMS.
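A minimal Python sketch of this suggestion, modeling the HMS partition parameters as a plain dict. `ARCTIC_FLAG_KEY`, `delete_partition_data` and `should_add_files` are hypothetical; the issue does not name the real parameter key or the code path that deletes partition data. If the flag is removed together with the data, a later Hive write to the same partition looks like a Hive-only partition and its files are synced.

```python
from typing import Dict, List

# Hypothetical parameter key for illustration only; the real key Amoro
# writes into the HMS partition parameters is not given in this issue.
ARCTIC_FLAG_KEY = "example.arctic-flag"


def delete_partition_data(files: List[str], params: Dict[str, str]) -> None:
    """Delete a Hive partition's data and, as suggested, its Arctic flag."""
    files.clear()
    # pop() with a default is a no-op if the flag was never set.
    params.pop(ARCTIC_FLAG_KEY, None)


def should_add_files(in_mixed_table: bool, params: Dict[str, str]) -> bool:
    """Hypothetical sync check: add files only for untracked, unflagged partitions."""
    return not in_mixed_table and ARCTIC_FLAG_KEY not in params


# A partition previously written through Amoro carries the flag.
files = ["amoro-file-0.parquet"]
params = {ARCTIC_FLAG_KEY: "true"}

delete_partition_data(files, params)
files.append("hive-file-0.parquet")  # data rewritten with Hive

print(should_add_files(in_mixed_table=False, params=params))  # True
```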

@czy006
Contributor

czy006 commented Apr 15, 2024

@nicochen Do you still have this problem with the new version?


3 participants