This repository has been archived by the owner on Jul 9, 2021. It is now read-only.

Changes for Postgresql Copy Export: #68

Open. Wants to merge 3 commits into base: trunk


RobertBHamilton

  1. Support for an empty null string: --null-string ''
  2. Support for non-XML field delimiters: --fields-terminated-by '\0x1c'
  3. Line buffering for better performance: --batch
  4. Optional TEXT mode instead of CSV: -Dpostgresql.format.text=true
  5. Support for PostgreSQL version 8.x: -Dpostgresql.targetdb.ver=8
  6. Option to disable escape sequences: -Dpostgresql.input.israw=true

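Several of the options above map onto options of the PostgreSQL COPY command, whose syntax changed in 9.0. The following is a minimal sketch of how the two forms differ, not the code in this PR: the class name, method names, and the `sales` table are hypothetical, and the way the patch actually reads `postgresql.targetdb.ver` and `postgresql.format.text` is not shown here.

```java
public class CopyStatementSketch {

  /** Doubles single quotes so a value can sit inside a SQL string literal. */
  static String quote(String s) {
    return "'" + s.replace("'", "''") + "'";
  }

  /**
   * Builds a COPY ... FROM STDIN statement (hypothetical helper).
   * majorVersion <= 8 uses the pre-9.0 keyword syntax; otherwise the
   * parenthesized option list introduced in PostgreSQL 9.0 is used.
   */
  static String buildCopy(String table, int majorVersion, boolean textFormat,
      String delimiter, String nullString) {
    StringBuilder sql = new StringBuilder("COPY ").append(table).append(" FROM STDIN WITH ");
    if (majorVersion <= 8) {
      sql.append("DELIMITER AS ").append(quote(delimiter))
         .append(" NULL AS ").append(quote(nullString));
      if (!textFormat) {
        sql.append(" CSV"); // text is the default format, so only CSV is spelled out
      }
    } else {
      sql.append("(FORMAT ").append(textFormat ? "text" : "csv")
         .append(", DELIMITER ").append(quote(delimiter))
         .append(", NULL ").append(quote(nullString)).append(")");
    }
    return sql.toString();
  }

  public static void main(String[] args) {
    // prints: COPY sales FROM STDIN WITH DELIMITER AS '|' NULL AS '' CSV
    System.out.println(buildCopy("sales", 8, false, "|", ""));
    // prints: COPY sales FROM STDIN WITH (FORMAT text, DELIMITER '|', NULL '')
    System.out.println(buildCopy("sales", 9, true, "|", ""));
  }
}
```

The example uses a printable delimiter on purpose; a control character such as 0x1c would likely need an escape-string literal on the SQL side, which this sketch does not attempt.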
@RobertBHamilton
Author

This is a work in progress. These changes were motivated by a project at GM to move significant data sets to Greenplum. Because Greenplum is currently still based on PostgreSQL 8.x, we added support for the 8.x syntax of the COPY command. We also noticed that implementing line buffering in the mapper would significantly improve performance. In addition, we conventionally use a field delimiter character that happens to be an invalid XML character, so we added support for non-XML delimiters in direct mode.
TODO: apply the same changes to the 9.x COPY syntax.
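
To illustrate why a delimiter such as 0x1c counts as an invalid XML character, here is a small sketch of the `Char` production from the XML 1.0 specification. The class and method names are hypothetical and not part of this PR. That the delimiter runs into trouble because option values pass through an XML-serialized job configuration is an assumption; the comment above does not spell out the mechanism.

```java
public class XmlDelimiterCheck {

  /** True if the code point matches the Char production of XML 1.0. */
  static boolean isValidXmlChar(int cp) {
    return cp == 0x9 || cp == 0xA || cp == 0xD
        || (cp >= 0x20 && cp <= 0xD7FF)
        || (cp >= 0xE000 && cp <= 0xFFFD)
        || (cp >= 0x10000 && cp <= 0x10FFFF);
  }

  public static void main(String[] args) {
    System.out.println(isValidXmlChar(','));   // prints true
    System.out.println(isValidXmlChar('\t'));  // prints true
    System.out.println(isValidXmlChar(0x1c));  // prints false: a C0 control character
  }
}
```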

Robert B Hamilton added 2 commits January 21, 2019 22:18
1. moved LineBuffer to inner class of the Export Mapper
2. extended support to 9.x direct copy
3. Added test case
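
For readers who have not opened the diff, the general idea behind line buffering can be sketched as follows. This is an illustration under stated assumptions, not the `LineBuffer` from these commits: the size threshold, the `append`/`flushTo` methods, and the `StringWriter` sink are all hypothetical; only the `StringBuilder`-backed `clear()` mirrors a snippet quoted later in this thread.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;

public class LineBufferSketch {

  /** Hypothetical buffer: collects lines and writes them out in one call. */
  static class LineBuffer {
    private final StringBuilder sb = new StringBuilder();
    private final int maxChars;

    LineBuffer(int maxChars) {
      this.maxChars = maxChars;
    }

    /** Appends one record; returns true once the buffer should be flushed. */
    boolean append(String line) {
      sb.append(line).append('\n');
      return sb.length() >= maxChars;
    }

    /** Writes the buffered lines to the sink in a single write, then resets. */
    void flushTo(Writer out) throws IOException {
      if (sb.length() > 0) {
        out.write(sb.toString());
        clear();
      }
    }

    void clear() {
      sb.setLength(0);
    }
  }

  public static void main(String[] args) throws IOException {
    StringWriter sink = new StringWriter(); // stands in for the COPY input stream
    LineBuffer buffer = new LineBuffer(16);
    for (String row : new String[] {"1,alice", "2,bob", "3,carol"}) {
      if (buffer.append(row)) {
        buffer.flushTo(sink);
      }
    }
    buffer.flushTo(sink); // flush whatever remains at end of input
    System.out.print(sink);
  }
}
```

The point of the pattern is to replace many small writes with fewer large ones; the threshold that works best depends on the workload and is not something this sketch tries to determine.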
@szvasas
Contributor

szvasas commented Feb 1, 2019

Hi Robert,

Thanks for sharing these improvements!
As you described in your email, this PR seems to cover several different improvements, so I would like to ask you to split these changes into separate patches and raise a separate Sqoop issue for each of them on JIRA (https://issues.apache.org/jira/projects/SQOOP/issues).

I haven't done an in-depth review yet, but I noticed that the indentation and formatting you used are quite different from the surrounding code, so please try to follow those conventions, e.g.:

  • Add spaces before and after '=' character (e.g. bufferMode = false instead of bufferMode=false)

  • The content of a block should go on a new line, e.g.

public void clear() {
  sb.setLength(0);
}

instead of
public void clear(){ sb.setLength(0); }

Apart from this, I have seen a few commented-out lines of code; please remove those as well.

Regards,
Szabolcs

@Fokko
Contributor

Fokko commented Mar 9, 2019

Rebasing onto master will fix the CI again.
