Skip to content

Latest commit

 

History

History
166 lines (113 loc) · 6.55 KB

README.md

File metadata and controls

166 lines (113 loc) · 6.55 KB

PostgresWriter Plugin Documentation

Quick Introduction

The PostgresWriter plugin enables writing data to Postgres/Greenplum databases. Under the hood, PostgresWriter connects to remote Postgres/Greenplum databases using github.com/lib/pq and database/sql, executing corresponding SQL statements to write data into the Postgres/Greenplum database.

Implementation Principles

PostgresWriter connects to remote Postgres/Greenplum databases via github.com/lib/pq. It generates SQL statements for writing based on user-provided configuration information and go-etl's custom data types from the Reader. These statements are then sent to the remote Postgres/Greenplum database for execution.

Postgres/Greenplum implements specific queries by utilizing the query process defined in dbmswriter, calling go-etl's custom storage/database DBWrapper. DBWrapper encapsulates numerous interfaces from database/sql and abstracts the database dialect, Dialect. For Postgres/Greenplum, it adopts the Dialect implemented in storage/database/postgres.

Based on your configured writeMode, it generates either:

  • insert into... (which may fail to insert conflicting rows in case of primary key/unique index conflicts)

or

  • copy in ... which behaves similarly to insert into but offers faster performance. For optimal performance, data is buffered in memory and written only when the memory reaches a predefined threshold.

Functionality Description

Configuration Example

Configuring a job to synchronously write data to a Postgres/Greenplum database:

{
    "job":{
        "content":[
            {
               "writer":{
                    "name": "postgreswriter",
                    "parameter": {
                        "username": "postgres",
                        "password": "123456",
                        "writeMode": "insert",
                        "column": ["*"],
                        "session": [],
                        "preSql": [],
                        "connection":  {
                                "url": "postgres://192.168.15.130:5432/postgres?sslmode=disable",
                                "table": {
                                    "schema":"destination",
                                    "name":"type_table"
                                }
                         },
                        "preSql": ["create table a like b"],
                        "postSql": ["drop table a"],
                        "batchTimeout": "1s",
                        "batchSize":1000
                    }
               }
            }
        ]
    }
}

Parameter Description

url

  • Description: Primarily used to configure the connection information for the remote end. The basic format is: postgres://ip:port/db, where ip:port represents the IP address and port of the Postgres database, and db indicates the default database to connect to. It is similar to the connection configuration information of pq, except that the username and password are extracted from the connection configuration for easier encryption in the future. Unlike pq, read/write timeouts can be configured using readTimeout/writeTimeout in the same format as batchTimeout.
  • Required: Yes
  • Default: None

username

  • Description: Used to configure the username for the Postgres database.
  • Required: Yes
  • Default: None

password

  • Description: Used to configure the password for the Postgres database.
  • Required: Yes
  • Default: None

name

Describes the Postgres table information.

schema
  • Description: Primarily used to configure the schema name of the Postgres table.
  • Required: Yes
  • Default: None
table
  • Description: Primarily used to configure the table name of the Postgres table.
  • Required: Yes
  • Default: None

column

  • Description: A set of column names from the configured table that need to be synchronized, described using a JSON array. Users can use * to indicate that all columns should be used by default, for example, ["*"].

    Supports column pruning, allowing only selected columns to be exported.

    Supports column reordering, meaning columns can be exported in an order different from the table schema.

    Supports constant configuration. Users need to follow the PostgreSQL syntax format: ["id", "'hello'::varchar", "true", "2.5::real", "power(2,3)"] where id is a regular column name, 'hello'::varchar is a string constant, true is a boolean value, 2.5 is a floating-point number, and power(2,3) is a function.

  • Required: Yes

  • Default: None

writeMode

  • Description: Write mode. "insert" represents writing data using the insert into method, while "copyIn" represents writing data using the copy in method.
  • Required: No
  • Default: insert

batchTimeout

  • Description: Primarily used to configure the timeout interval for each batch write operation. The format is: number + unit, where the unit can be s for seconds, ms for milliseconds, or us for microseconds. If the specified time interval is exceeded, the data will be written directly. This parameter, along with batchSize, can be adjusted for optimal write performance.
  • Required: No
  • Default: 1s

batchSize

  • Description: Primarily used to configure the size of each batch write operation. If the specified size is exceeded, the data will be written directly. This parameter, along with batchTimeout, can be adjusted for optimal write performance.
  • Required: No
  • Default: 1000

preSql

  • Description: Primarily used for SQL statement groups executed before writing data. Do not use select statements as they will result in an error.
  • Required: No
  • Default: None

postSql

  • Description: Primarily used for SQL statement groups executed after writing data. Do not use select statements as they will result in an error.
  • Required: No
  • Default: None

Type Conversion

Currently, PostgresWriter supports most Postgres types, but there may be some individual types that are not supported. Please check your types accordingly.

Below is a conversion table for PostgresWriter with regards to Postgres types:

go-etl Type Postgres Data Type
bool boolean
bigInt bigint, bigserial, integer, smallint, serial, smallserial
decimal double precision, decimal, numeric, real
string varchar, text
time date, time, timestamp
bytes char

Performance Report

Pending testing.

Constraints and Limitations

Database Encoding Issues

Currently, only the utf8 character set is supported.

FAQ

(Frequently Asked Questions section to be added if applicable.)