
Sqoop could not parse record when exporting data from MaxCompute to PostgreSQL #9

Open
giaosudau opened this issue Sep 18, 2017 · 5 comments

Hi Ali,
I am using Sqoop to export data from MaxCompute to PostgreSQL:

./odps-sqoop/bin/sqoop export --connect jdbc:postgresql://localhost:5432/replication_db --table dim_wmp_cabinet \
    --username replication_user --password replication_pass \
    --odps-table dim_wmp_cabinet --odps-project xxx --odps-accessid xxx \
    --odps-tunnel-endpoint http://xxxx \
    --odps-partition-spec ds=20170916 \
    --odps-accesskey xxxx --odps-endpoint http://sxxx/api

I am looking into this code in OdpsExportMapper.java:

try {
      odpsImpl.parse(val);
      context.write(odpsImpl, NullWritable.get());
    } catch (Exception e) {
      // catch clause truncated in my quote; this is where the
      // "Exception raised during data export" error below is logged
    }

I succeeded in adding the tunnel endpoint to the source code, but I couldn't get past this error.

Please help fix this issue.

17/09/18 18:16:46 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
17/09/18 18:16:46 INFO mapreduce.Job: Running job: job_local873411290_0001
17/09/18 18:16:46 INFO mapred.LocalJobRunner: OutputCommitter set in config null
17/09/18 18:16:46 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.sqoop.mapreduce.NullOutputCommitter
17/09/18 18:16:46 INFO mapred.LocalJobRunner: Waiting for map tasks
17/09/18 18:16:46 INFO mapred.LocalJobRunner: Starting task: attempt_local873411290_0001_m_000000_0
17/09/18 18:16:46 INFO util.ProcfsBasedProcessTree: ProcfsBasedProcessTree currently is supported only on Linux.
17/09/18 18:16:46 INFO mapred.Task:  Using ResourceCalculatorProcessTree : null
17/09/18 18:16:46 INFO mapred.MapTask: Processing split: org.apache.sqoop.mapreduce.odps.OdpsExportInputFormat$OdpsExportInputSplit@6a6d595f
17/09/18 18:16:46 ERROR odps.OdpsExportMapper: Exception raised during data export
17/09/18 18:16:46 ERROR odps.OdpsExportMapper: Exception:
java.lang.RuntimeException: Can't parse input data: '3'
	at dim_wmp_cabinet.__loadFromFields(dim_wmp_cabinet.java:2090)
	at dim_wmp_cabinet.parse(dim_wmp_cabinet.java:1533)
	at org.apache.sqoop.mapreduce.odps.OdpsExportMapper.map(OdpsExportMapper.java:77)
	at org.apache.sqoop.mapreduce.odps.OdpsExportMapper.map(OdpsExportMapper.java:35)
	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
	at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
	at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NullPointerException
	at dim_wmp_cabinet.__loadFromFields(dim_wmp_cabinet.java:2010)
	... 13 more
17/09/18 18:16:46 ERROR odps.OdpsExportMapper: On input: com.aliyun.odps.data.ArrayRecord@4c665e0d
17/09/18 18:16:46 ERROR odps.OdpsExportMapper: At position 0
17/09/18 18:16:46 ERROR odps.OdpsExportMapper:
@oyz oyz self-assigned this Sep 18, 2017
oyz (Contributor) commented Sep 18, 2017

Can you provide the schemas of the table 'dim_wmp_cabinet' in both PostgreSQL and MaxCompute, and give some example data, please?

oyz (Contributor) commented Sep 19, 2017

The cause of this error has been found: it is the null data in the table. The latest code fixes this.
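For context, here is a minimal sketch of the failure mode; this is an assumed illustration, not the actual Sqoop-generated `dim_wmp_cabinet` class. Generated `__loadFromFields` code converts each field without a null check, so a null column raises a NullPointerException; a null guard, along the lines of what the fix presumably does, avoids it:

```java
// Hypothetical sketch of the NPE in generated field loading (not real Sqoop code).
public class NullFieldSketch {
    // Mimics the failing pattern: converting a null field throws NullPointerException.
    static Integer loadUnsafe(Object field) {
        return Integer.valueOf(field.toString());
    }

    // Null-guarded variant: pass nulls through instead of converting them.
    static Integer loadSafe(Object field) {
        return field == null ? null : Integer.valueOf(field.toString());
    }

    public static void main(String[] args) {
        System.out.println(loadSafe("3"));   // 3
        System.out.println(loadSafe(null));  // null
    }
}
```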

giaosudau (Author) commented Sep 19, 2017

Why doesn't the result include the partition field value?
And why do you require a partitionSpec?

oyz (Contributor) commented Sep 19, 2017

In MaxCompute, reading a partitioned table requires specifying a partitionSpec, but the records read from a specific partition do not include the partition values.

For Sqoop, maybe we should add an option that appends the partition values to each result record, for convenience.
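A minimal sketch of what that option could do; the class and method names here are illustrative, not part of Sqoop or the MaxCompute SDK. It parses a partition spec string such as the `ds=20170916` passed via `--odps-partition-spec` and appends its values to each exported record:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of appending partition values to exported records.
public class PartitionSpecSketch {
    // Parse a spec like "ds=20170916" or "ds=20170916,region=sg"
    // into an ordered column -> value map.
    static Map<String, String> parseSpec(String spec) {
        Map<String, String> parts = new LinkedHashMap<>();
        for (String kv : spec.split(",")) {
            String[] pair = kv.split("=", 2);
            parts.put(pair[0].trim(), pair[1].trim());
        }
        return parts;
    }

    // Append the partition values so the exported row carries
    // the partition columns as well.
    static List<String> appendPartitionValues(List<String> record, String spec) {
        List<String> out = new ArrayList<>(record);
        out.addAll(parseSpec(spec).values());
        return out;
    }

    public static void main(String[] args) {
        List<String> row = Arrays.asList("3", "cabinet-a");
        System.out.println(appendPartitionValues(row, "ds=20170916"));
        // [3, cabinet-a, 20170916]
    }
}
```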

giaosudau (Author) commented Sep 19, 2017

It should be added, because we want to load all the data; without the partition columns, we need an extra step to update the partition data afterwards.
