Bug in automatic tag creation #3346

Closed
2 tasks done
Hi-luca-Gao opened this issue May 17, 2024 · 3 comments · Fixed by #3457
Labels
bug Something isn't working

Comments

@Hi-luca-Gao

Search before asking

  • I searched in the issues and found nothing similar.

Paimon version

Version 0.8

Compute Engine

flink

Minimal reproduce step

Test table schema:
tEnv.executeSql("CREATE TABLE dim_log_2 (\n" +
" url STRING,\n" +
" ts BIGINT,\n" +
" color STRING,\n" +
" PRIMARY KEY (url) NOT ENFORCED" +
" ) WITH (\n" +
" 'merge-engine' = 'partial-update',\n" +
" 'changelog-producer' = 'input',\n" +
" 'tag.automatic-creation' = 'watermark',\n" +
" 'tag.creation-period' = 'daily',\n" +
" 'tag.num-retained-max' = '90'" +
");");
Using WatermarkStrategy
.<>forBoundedOutOfOrderness(Duration.ofSeconds()).withTimestampAssigner(SerializableTimestampAssigner)

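When the first data is processed, Paimon's tryToCreateTags() method is triggered immediately. At that moment the watermark is still -9223372036854775808 (Long.MIN_VALUE), because this strategy emits watermarks only every 200 ms by default. As a result, this.periodHandler.normalizeToPreviousTag(time) returns +1705471-09-26, so the tag is named tag-1705471-09-26. After that, events with normal dates such as 2024-05-17 no longer trigger automatic tag creation, because the date 1705471-09-26 is far too large.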
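The timing described above can be illustrated with a small model. This is a simplified sketch in plain Java, not Flink's or Paimon's actual classes: `WatermarkTimelineSketch` and its methods are hypothetical names, and the 5000 ms out-of-orderness bound in the usage note is an assumed value (the issue does not state the one actually used). It only shows that a consumer reading the watermark after the first event, but before the first periodic emission, still sees Long.MIN_VALUE.

```java
// Simplified, hypothetical model of a periodic bounded-out-of-orderness watermark.
final class WatermarkTimelineSketch {
    private final long outOfOrdernessMillis;
    private long maxTimestamp;
    // What a downstream consumer sees until the first periodic emission.
    private long currentWatermark = Long.MIN_VALUE;

    WatermarkTimelineSketch(long outOfOrdernessMillis) {
        this.outOfOrdernessMillis = outOfOrdernessMillis;
        // Start value chosen so the first emitted watermark cannot underflow.
        this.maxTimestamp = Long.MIN_VALUE + outOfOrdernessMillis + 1;
    }

    // Called per record: only tracks the largest event timestamp seen so far.
    void onEvent(long eventTimestampMillis) {
        maxTimestamp = Math.max(maxTimestamp, eventTimestampMillis);
    }

    // Called on a timer (every 200 ms in the scenario above): publishes the watermark.
    void onPeriodicEmit() {
        currentWatermark = Math.max(currentWatermark, maxTimestamp - outOfOrdernessMillis - 1);
    }

    long currentWatermark() {
        return currentWatermark;
    }
}
```

With `new WatermarkTimelineSketch(5000L)`, calling `onEvent(...)` alone leaves `currentWatermark()` at Long.MIN_VALUE; it only advances after `onPeriodicEmit()` runs.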

This is the problematic code:
// The wrong result actually comes from the upper limit of Timestamp.fromEpochMillis, but the root problem is that the watermark handling does not account for the first event triggering tag creation before the watermark has had a chance to update.
public LocalDateTime normalizeToPreviousTag(LocalDateTime time) {
    long mills = Timestamp.fromLocalDateTime(time).getMillisecond();
    long periodMills = this.onePeriod().toMillis();
    // This line produces the wrong value
    LocalDateTime normalized = Timestamp.fromEpochMillis(mills / periodMills * periodMills).toLocalDateTime();
    return normalized.minus(this.onePeriod());
}
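One possible shape for a guard is sketched below. It is not the change that was merged for this issue: `TagPeriodSketch` and `previousTagTime` are hypothetical names, the code uses only java.time instead of Paimon's Timestamp class, and it assumes a daily period in UTC. It also uses Math.floorDiv rather than the truncating `/` of the original, so that pre-1970 watermarks snap to the start of their day. Watermarks within one period of Long.MIN_VALUE are not handled.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.util.Optional;

// Hypothetical helper, not Paimon's API: daily tag normalization with a sentinel guard.
final class TagPeriodSketch {
    static final Duration ONE_PERIOD = Duration.ofDays(1);

    // Returns empty while the watermark is still the initial Long.MIN_VALUE sentinel,
    // so the caller can skip tag creation instead of computing a bogus date.
    static Optional<LocalDateTime> previousTagTime(long watermarkMillis) {
        if (watermarkMillis == Long.MIN_VALUE) {
            return Optional.empty();
        }
        long periodMillis = ONE_PERIOD.toMillis();
        // floorDiv rounds toward negative infinity, so negative timestamps stay in their own day.
        long floored = Math.floorDiv(watermarkMillis, periodMillis) * periodMillis;
        LocalDateTime normalized =
                LocalDateTime.ofEpochSecond(Math.floorDiv(floored, 1000L), 0, ZoneOffset.UTC);
        return Optional.of(normalized.minus(ONE_PERIOD));
    }
}
```

For a watermark at 2024-05-17T10:00 UTC this yields 2024-05-16T00:00, while Long.MIN_VALUE yields an empty result.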

What doesn't meet your expectations?

I hope Paimon keeps getting better. Paimon is the future of data integration.
Keep it up!
I hope there will be opportunities to learn from you.

Anything else?

No response

Are you willing to submit a PR?

  • I'm willing to submit a PR!
@Hi-luca-Gao Hi-luca-Gao added the bug Something isn't working label May 17, 2024
@JingsongLi
Contributor

Thanks @Hi-luca-Gao for reporting. Can you use English?

I think we should ignore Long.MIN there.

@Hi-luca-Gao
Author

You're welcome. @JingsongLi
Yes

@Hi-luca-Gao
Author

@JingsongLi
[image]

I think this kind of fix could have unexpected side effects, so the check should be watermark == Long.MIN_VALUE.
A timestamp earlier than 00:00:00 on January 1, 1970 GMT is a negative number.
This is normal.
For example (epoch seconds):
-315585870 ===>1960-01-01T10:15:30+01:00
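The example above can be checked with java.time. The sketch below is illustrative only: `NegativeEpochSketch` and its methods are hypothetical names. It contrasts an exact comparison against the Long.MIN_VALUE sentinel with a broader `watermark < 0` test, which would also reject legitimate pre-1970 timestamps.

```java
import java.time.Instant;
import java.time.OffsetDateTime;
import java.time.ZoneOffset;

// Hypothetical illustration: negative epoch values are valid dates before 1970.
final class NegativeEpochSketch {
    // Converts epoch seconds to a date-time at UTC+01:00, as in the example above.
    static OffsetDateTime atPlusOne(long epochSeconds) {
        return Instant.ofEpochSecond(epochSeconds).atOffset(ZoneOffset.ofHours(1));
    }

    // Exact sentinel check: only the "no watermark yet" value is treated as uninitialized.
    static boolean isUninitializedWatermark(long watermark) {
        return watermark == Long.MIN_VALUE;
    }

    public static void main(String[] args) {
        System.out.println(atPlusOne(-315585870L)); // prints 1960-01-01T10:15:30+01:00
        System.out.println(isUninitializedWatermark(-315585870L * 1000L)); // prints false
        System.out.println(isUninitializedWatermark(Long.MIN_VALUE)); // prints true
    }
}
```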
