[Feature][sftp] Ignore error records #6814

Open
JevonYang opened this issue May 9, 2024 · 1 comment

@JevonYang

Search before asking

  • I have searched the existing feature requests and found no similar requirement.

Description

I am importing a CSV file from an SFTP server into a MySQL database, and one column is declared as a numeric type. However, the source CSV contains dirty data: some rows hold non-numeric strings in that column (for example `-`, as in the trace below). I would like a way to skip these erroneous records so that the job keeps running instead of failing.

Any existing solution or workaround would be welcome.

The error message is shown below:

org.apache.seatunnel.common.exception.SeaTunnelRuntimeException: ErrorCode:[COMMON-01], ErrorDescription:[SeaTunnel read file 'sftp://xxx/xxx.csv' failed.]
		at org.apache.seatunnel.common.exception.CommonError.fileOperationFailed(CommonError.java:60)
		at org.apache.seatunnel.connectors.seatunnel.file.source.BaseFileSourceReader.pollNext(BaseFileSourceReader.java:65)
		at org.apache.seatunnel.engine.server.task.flow.SourceFlowLifeCycle.collect(SourceFlowLifeCycle.java:156)
		at org.apache.seatunnel.engine.server.task.SourceSeaTunnelTask.collect(SourceSeaTunnelTask.java:116)
		at org.apache.seatunnel.engine.server.task.SeaTunnelTask.stateProcess(SeaTunnelTask.java:168)
		at org.apache.seatunnel.engine.server.task.SourceSeaTunnelTask.call(SourceSeaTunnelTask.java:121)
		at org.apache.seatunnel.engine.server.TaskExecutionService$BlockingWorker.run(TaskExecutionService.java:703)
		at org.apache.seatunnel.engine.server.TaskExecutionService$NamedTaskWrapper.run(TaskExecutionService.java:1004)
		at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
		at java.util.concurrent.FutureTask.run(FutureTask.java:266)
		at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
		at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
		at java.lang.Thread.run(Thread.java:750)
	Caused by: java.lang.NumberFormatException: For input string: "-"
		at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:2043)
		at sun.misc.FloatingDecimal.parseDouble(FloatingDecimal.java:110)
		at java.lang.Double.parseDouble(Double.java:538)
		at org.apache.seatunnel.format.text.TextDeserializationSchema.convert(TextDeserializationSchema.java:251)
		at org.apache.seatunnel.format.text.TextDeserializationSchema.deserialize(TextDeserializationSchema.java:152)
		at org.apache.seatunnel.format.text.TextDeserializationSchema.deserialize(TextDeserializationSchema.java:57)
		at org.apache.seatunnel.connectors.seatunnel.file.source.reader.TextReadStrategy.lambda$read$0(TextReadStrategy.java:95)
		at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
		at java.util.stream.SliceOps$1$1.accept(SliceOps.java:204)
		at java.util.Iterator.forEachRemaining(Iterator.java:116)
		at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
		at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
		at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
		at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
		at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
		at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
		at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:485)
		at org.apache.seatunnel.connectors.seatunnel.file.source.reader.TextReadStrategy.read(TextReadStrategy.java:91)
		at org.apache.seatunnel.connectors.seatunnel.file.source.BaseFileSourceReader.pollNext(BaseFileSourceReader.java:63)
		... 11 more
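The `Caused by` line pins down the root cause: `TextDeserializationSchema.convert` calls `Double.parseDouble`, which throws `NumberFormatException` for a cell containing only `-`. The standalone sketch below reproduces that behavior with the JDK alone, outside SeaTunnel; the class name `ParseDashRepro` and the helper `tryParse` are hypothetical and chosen for illustration. It sticks to Java 8 APIs, matching the runtime in the trace.

```java
import java.util.Arrays;
import java.util.List;
import java.util.OptionalDouble;

/** Hypothetical repro: Double.parseDouble rejects "-", as in the "Caused by" line above. */
public class ParseDashRepro {

    /** Returns the parsed value, or empty when the cell is not a valid double. */
    static OptionalDouble tryParse(String cell) {
        try {
            return OptionalDouble.of(Double.parseDouble(cell));
        } catch (NumberFormatException e) {
            // This is the exception that currently aborts the whole read.
            return OptionalDouble.empty();
        }
    }

    public static void main(String[] args) {
        // "-" mimics the dirty cell from the issue; the other values parse normally.
        List<String> cells = Arrays.asList("12.5", "-3", "-");
        for (String cell : cells) {
            OptionalDouble parsed = tryParse(cell);
            System.out.println(cell + " -> "
                    + (parsed.isPresent() ? String.valueOf(parsed.getAsDouble()) : "NumberFormatException"));
        }
    }
}
```

Note that a leading minus sign is fine on its own (`-3` parses); only a cell consisting of `-` with no digits fails.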

Usage Scenario

Importing CSV files into other databases (such as MySQL).

Related issues

No response

Are you willing to submit a PR?

  • Yes I am willing to submit a PR!

Code of Conduct

@JevonYang (Author)

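Help wanted:

While importing a CSV file into MySQL, a CSV field that should hold numeric data has dirty data mixed in (character values), which causes the load to abort.

I would like the program to ignore records with the wrong type and move on to the next one.

Is there an existing solution for this? Any help from the experts would be much appreciated. Thanks!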
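The requested "skip bad records and continue" behavior could look roughly like the minimal sketch below. It assumes a simple comma-separated layout with no quoting and a single numeric column at a known index; rows whose numeric cell fails to parse (or that are too short) are diverted to a rejected list instead of aborting the load. This is plain JDK code for illustration, not SeaTunnel's API: `LenientCsvLoader`, `NUMERIC_COLUMN`, and the two-column row shape are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: parse CSV lines leniently, diverting bad rows instead of failing. */
public class LenientCsvLoader {

    // Assumption: column 1 is the numeric column; cells are split on plain commas (no quoting).
    static final int NUMERIC_COLUMN = 1;

    /** Result of a load: the rows that parsed, plus the raw lines that were skipped. */
    static final class LoadResult {
        final List<Object[]> accepted = new ArrayList<>();
        final List<String> rejected = new ArrayList<>();
    }

    static LoadResult load(List<String> lines) {
        LoadResult result = new LoadResult();
        for (String line : lines) {
            String[] cells = line.split(",", -1);
            try {
                // Double.parseDouble already ignores leading/trailing whitespace.
                double amount = Double.parseDouble(cells[NUMERIC_COLUMN]);
                result.accepted.add(new Object[] {cells[0], amount});
            } catch (NumberFormatException | ArrayIndexOutOfBoundsException e) {
                // Dirty or short row: record it and keep going instead of aborting the job.
                result.rejected.add(line);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("a,1.5", "b,-", "c,42", "d");
        LoadResult result = load(lines);
        System.out.println("accepted=" + result.accepted.size() + " rejected=" + result.rejected);
    }
}
```

Running `main` prints `accepted=2 rejected=[b,-, d]`. Keeping the rejected lines, rather than silently dropping them, leaves the skipped data auditable; a production version would probably also cap the number of tolerated errors so that a wholly malformed file still fails fast.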
