JDBC Connector


JDBC Connector

Parent document: Connectors

The JDBC Connector directly connects to the database through JDBC, and imports data into other storages or imports other stored data into the database in a batch manner. JDBC connectors internally read from slaves to minimize the impact on DB.

Currently, supports reading and writing three kinds of data sources including MySQL, Oracle, PgSQL, SqlServer.

Supported data types

Supported by MySQL

  • bit
  • tinyint
  • tinyint unsigned
  • smallint
  • smallint unsigned
  • mediumint
  • mediumint unsigned
  • enum
  • int
  • int unsigned
  • bigint
  • bigint unsigned
  • timestamp
  • datetime
  • float
  • float unsigned
  • double
  • double unsigned
  • decimal
  • decimal unsigned
  • real
  • date
  • time
  • year
  • char
  • varchar
  • longvarchar
  • nvar
  • nvarchar
  • longnvarchar
  • tinytext
  • mediumtext
  • longtext
  • text
  • string
  • json
  • blob
  • mediumblob
  • tinyblob
  • longblob
  • binary
  • longvarbinary
  • varbinary
  • geometry
  • multipolygon
  • set

Supported by Oracle

  • char
  • varchar
  • interval day
  • interval year
  • intervalds
  • intervalym
  • varchar2
  • nchar
  • nvarchar2
  • long
  • blob
  • clob
  • nclob
  • string
  • character
  • number
  • integer
  • int
  • smallint
  • float
  • double
  • double precision
  • numeric
  • decimal
  • real
  • bit
  • bool
  • date
  • timestamp
  • timestamp with time zone
  • timestamp with local time zone
  • datetime
  • blob
  • bfile
  • raw
  • long raw
  • rowid
  • urowid
  • xmltype
  • binary_float
  • binary_double

Supported by PgSQL

  • char
  • bpchar
  • varchar
  • text
  • character varying
  • string
  • character
  • bigint
  • int8
  • integer
  • int
  • int4
  • smallserial
  • serial
  • bigserial
  • smallint
  • int2
  • double
  • money
  • double precision
  • float8
  • numeric
  • decimal
  • real
  • float
  • float4
  • boolean
  • bool
  • date
  • time
  • timetz
  • timestamp
  • timestamptz
  • bytea
  • bit
  • bit varying
  • varbit
  • uuid
  • cidr
  • xml
  • inet
  • macaddr
  • enum
  • json
  • jsonb
  • aclitem
  • _aclitem
  • _int2
  • _int4
  • _float4
  • _text
  • _char
  • cid
  • inet
  • int2vector
  • interval
  • oid
  • _oid
  • pg_node_tree

Supported by SqlServer

  • char
  • varchar
  • text
  • nchar
  • nvarchar
  • ntext
  • bigint
  • int
  • int identity
  • integer
  • smallint
  • tinyint
  • float
  • double precision
  • numeric
  • decimal
  • money
  • real
  • bit
  • date
  • timestamp
  • datetime
  • datetime2
  • time
  • binary
  • varbinary
  • image
  • datetimeoffset
  • smalldatetime
  • sql_variant
  • uniqueidentifier
  • xml

JDBC Source

Please add property `permitMysqlScheme in connection url when use MySQL.

Main function

  • Supports multiple sharding algorithms
  • Support the reading of sub-database and sub-table database
  • Support table synchronization and SQL synchronization
  • Support filter statement

Parameters

General parameters

Param nameDefault valueRequiredParameter typeRecommended value / Example valueDescription
class-Yesstringcom.bytedance.bitsail.connector.legacy.jdbc.source.JDBCInputFormatReader class name for mysql
class-Yesstringcom.bytedance.bitsail.connector.legacy.jdbc.source.OracleInputFormatReader class name for Oracle
class-Yesstringcom.bytedance.bitsail.connector.legacy.jdbc.source.PostgresqlInputFormatReader class name for Pgsql
class-Yesstringcom.bytedance.bitsail.connector.legacy.jdbc.source.SqlServerInputFormatReader class name for SqlServer
columns-Nolist<map<string,string>>"[ { "name":"id", "type":"int" }, { "name":"name", "type":"varchar" } ]Describing fields' names and types. It needs to be consistent with the number of columns specified by the writer.

Database connection configuration

Param nameDefault valueRequiredParameter typeRecommended value / Example valueDescription
user_name-YesstringabcJDBC username
password-YesstringpasswordJDBC password
query_timeout_seconds300Noint300JDBC connection timeout (s)
query_retry_times3Noint3Max retry times for JDBC query
connections-Yes[ { "slaves": [ {"db_url": "jdbc:mysql://address=(protocol=tcp)(host=192.168.1.202)(port=3306)/test?permitMysqlScheme&rewriteBatchedStatements=true&autoReconnect=true&useUnicode=true&characterEncoding=utf-8&zeroDateTimeBehavior=convertToNull&jdbcCompliantTruncation=false"} ]} ]JDBC connection urls

Table synchronization configuration parameters

Param nameDefault valueRequiredParameter typeRecommended value / Example valueDescription
db_name-YesstringdbJDBC connection database name
table_schema-NostringschemaJDBC connection schema name, usually only used for PgSql
table_name-Necessary if using table synchronizationstringtableTable to read
split_pk-Necessary if using table synchronizationstringidThe primary key used by the shard
split_pk_jdbc_typeintNostringInt/StringShard key field type, supports numeric and string types
shard_split_modeaccurateNostringquick, accurate, parallelismSplitting mode
accurate: ensure that only reader_fetch_size if pulled from table in each request.
parallelism: Splitting all data according to the reader parallelism num. The splitting will be fast, but may be nonuniform.
reader_fetch_size10000Noint10000Number of data pulled each time

SQL Synchronization Configuration Parameters

Param nameDefault valueRequiredParameter typeRecommended value / Example valueDescription
customized_sql-Necessary if using SQL SynchronizationstringSelect id,name from xx.xx where id > 10Custom SQL Statement for pulling data from table

Other parameters

Param nameDefault valueRequiredParameter typeRecommended value / Example valueDescription
filter-Nostringid>100Filter conditions when pulling data. Will be placed after the query statement through the where statement

JDBC Sink

Please add property `permitMysqlScheme in connection url when use MySQL.

Main function

  • Supports TTL.
    • Before executing the import task, expired data will be deleted according to the TTL parameter configured by the user.
    • The default TTL is 0, that is, the data is permanently valid.
  • Supports multiple write modes: clear write mode and overwrite write mode
    • Clear write: A time partition field is required. When writing, if the time partition already exists, clear the existing time partition data, and then write.
    • Overwrite write: No time partition field is required. When writing, the data is not cleared. According to the unique key upsert, the old data is overwritten with the new data. When a duplicate key appears in the write, the on duplicate key update operation will be performed to update the field. In addition, note that sharding and sharding do not support updating shards. You need to configure the job.writer.shard_key parameter. The value is sharding. Multiple shards are separated by ','.

Parameters

General parameters

Param nameDefault valueIs necessaryParameter typeRecommended value / Example valueDescription
class-Yesstringcom.bytedance.bitsail.connector.legacy.jdbc.source.JDBCOutputFormatWriter class name for mysql
class-Yesstringcom.bytedance.bitsail.connector.legacy.jdbc.source.OracleOutputFormatWriter class name for Oracle
class-Yesstringcom.bytedance.bitsail.connector.legacy.jdbc.source.PostgresqlOutputFormatWriter class name for Pgsql
class-Yesstringcom.bytedance.bitsail.connector.legacy.jdbc.source.SqlServerOutputFormatWriter class name for SqlServer
columns-Yeslist<map<string,string>>"[ { "name":"id", "type":"int" }, { "name":"name", "type":"varchar" } ]Describing fields' names and types. It needs to be consistent with the number of columns specified by the reader.

Database connection configuration

Param nameDefault valueIs necessaryParameter typeRecommended value / Example valueDescription
user_name-YesstringabcJDBC username
password-YesstringpasswordJDBC password
connections-Yes[ { "db_url": "jdbc:mysql://address=(protocol=tcp)(host=192.168.1.202)(port=3306)/test?rewriteBatchedStatements=true&autoReconnect=true&useUnicode=true&characterEncoding=utf-8&zeroDateTimeBehavior=convertToNull&jdbcCompliantTruncation=false" } ]JDBC connection urls
db_name-YesstringdbDatabase to connect
table_schema"public" for PgSql
"dbo" for Sqlserver
NostringschemaSchema to connect,usually used only in PgSql
table_name-YesstringtableTable to write

Write Mode Configuration Parameter

Param nameDefault valueIs necessaryParameter typeRecommended value / Example valueDescription
write_modeinsertNostringinsertInsert Write mode. In order to ensure the consistency of repeated execution results, data is cleared according to the partition column before writing. The resulting write statement is similar to INSERT INTO xx (xx) VALUES (xx)
write_modeoverwriteOverwrite write mode. Data is not cleared before writing. The resulting write statement looks like INSERT INTO xx (xx) VALUES (xx) ON DUPLICATE KEY UPDATE (xx) VALUES(xx)

In insert mode, data will be deleted according to partition information. The following parameters are for insert mode:

Param nameDefault valueIs necessaryParameter typeRecommended value / Example valueDescription
partition_name-YesstringdatePartition name, this is a logical concept, meaning the data of partition value will be deleted according to this field before writing data.
partition_value-Yesstring20220727Partition value
partition_pattern_format-NostringyyyyMMdd/yyyy-MM-ddPartition Field format
mysql_data_ttl0Noint0The number of days that data is kept in database. The delete operation will be performed according to the value of the configured ddl and partition_name fields.
For example, if ttl is set to 3, partition name is date, and partition value is set to 20220727, all data with date<=20220724 in the database will be deleted.
delete_threshold10000Noint10000When deleting, the number of pieces of data deleted each time
delete_interval_ms100Noint100Interval between deletes

Parameters for batch write

Param nameDefault valueIs necessaryParameter typeRecommended value / Example valueDescription
write_batch_interval100Noint100write batch interval
write_retry_times3Noint3max retry time when writing
retry_interval_seconds10Noint10retry interval (s)

Other parameters

Param nameDefault valueIs necessaryParameter typeRecommended value / Example valueDescription
pre_query-NostringSelect 1The first statement to execute after connecting to the database
verify_query-NostringSelect 1Validation statement to be executed after the task runs
shard_key-Necessary for sharded databasestringidThe shard key of the sharded database, no need to configure the non-sharded database

Parameters for PgSql

Param nameDefault valueIs necessaryParameter typeRecommended value / Example valueDescription
primary_key-NostringidThe primary key of the table, if you need to limit the rate when pgSQL deletes, you need to use the primary key value to use the select limit statement to limit the delete rate
upsert_key-NostringidUnique index, supports overwriting, PG only supports overwriting for a single unique index
delete_threshold_enabledTRUENostringTruefalseWhether to limit the deletion rate, the default is true, when false, you do not need to provide the primary key
is_truncate_modeFALSENostringTruefalseWhether it is truncate mode, true will delete the whole table first and no partition column is required; non-truncate mode requires a partition column

Parameters for Oracle

Param nameDefault valueIs necessaryParameter typeRecommended value / Example valueDescription
primary_key-Yes in Sinkstring (case sensitive)IDThe primary key of the table, if you need to limit the rate when Oracle deletes, you need to use the primary key value to use the select limit statement to limit the delete rate
partition_name-Yesstring (case sensitive)DATETIMESame as general parameters except value is case sensitive.
db_name-Yesstring (case sensitive)DBSame as general parameters except value is case sensitive.
columns.nameopen in new window-Nostring (case sensitive)COLUMNSame as general parameters except value is case sensitive.

Configuration examples: JDBC connector example