Kudu connector



Parent document: Connectors

The BitSail Kudu connector supports reading from and writing to Kudu tables.

Maven dependency

<dependency>
   <groupId>com.bytedance.bitsail</groupId>
   <artifactId>connector-kudu</artifactId>
   <version>${revision}</version>
</dependency>

Kudu Reader

The Kudu reader uses a scanner to read tables and supports common Kudu data types:

  • Integer: int8, int16, int32, int64
  • Float number: float, double, decimal
  • Bool: boolean
  • Date & Time: date, timestamp
  • String: string, varchar
  • Binary: binary, string_utf8

Parameters

Add the following parameters to the job.reader block, for example:

{
  "job": {
    "reader": {
      "class": "com.bytedance.bitsail.connector.kudu.source.KuduSource",
      "kudu_table_name": "kudu_test_table",
      "kudu_master_address_list": ["localhost:1234", "localhost:4321"]
    }
  }
}

Necessary parameters

| Param name | Required | Optional value | Description |
|:--|:--|:--|:--|
| class | yes | | Kudu reader's class name: com.bytedance.bitsail.connector.kudu.source.KuduSource |
| kudu_table_name | yes | | Kudu table to read |
| kudu_master_address_list | yes | | Kudu master addresses, in list format |
| columns | yes | | The name and type of the columns to read |
| reader_parallelism_num | no | | Reader parallelism |

Kudu client parameters

| Param name | Required | Optional value | Description |
|:--|:--|:--|:--|
| kudu_admin_operation_timeout_ms | no | | Timeout for Kudu client admin operations, in ms. Default: 30000 |
| kudu_operation_timeout_ms | no | | Timeout for Kudu client operations, in ms. Default: 30000 |
| kudu_connection_negotiation_timeout_ms | no | | Connection negotiation timeout, in ms. Default: 10000 |
| kudu_disable_client_statistics | no | | Whether to disable statistics collection in the Kudu client |
| kudu_worker_count | no | | Number of client workers |
| sasl_protocol_name | no | | Default: "kudu" |
| require_authentication | no | | Whether to enable authentication |
| encryption_policy | no | OPTIONAL, REQUIRED_REMOTE, REQUIRED | Encryption policy |

Kudu scanner parameters

| Param name | Required | Optional value | Description |
|:--|:--|:--|:--|
| read_mode | no | READ_LATEST, READ_AT_SNAPSHOT | Read mode |
| snapshot_timestamp_us | yes if read_mode=READ_AT_SNAPSHOT | | Specifies which snapshot to read |
| enable_fault_tolerant | no | | Whether to enable fault tolerance |
| scan_batch_size_bytes | no | | Maximum number of bytes in a single batch |
| scan_max_count | no | | Maximum number of rows to scan |
| enable_cache_blocks | no | | Whether to enable block caching. Default: false |
| scan_timeout_ms | no | | Scan timeout, in ms. Default: 30000 |
| scan_keep_alive_period_ms | no | | |
| predicates | no | | Predicates as a JSON string |
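
The read_mode and snapshot_timestamp_us parameters have to be set together for a snapshot read. The following Python sketch shows one way to derive the two options from a point in time. It assumes that the `_us` suffix means microseconds since the Unix epoch and that the value is written as a JSON number; neither is stated in this document. The helper name `snapshot_reader_options` is hypothetical.

```python
import json
from datetime import datetime, timezone


def snapshot_reader_options(snapshot_time: datetime) -> dict:
    """Hypothetical helper: build the reader options for a snapshot read."""
    if snapshot_time.tzinfo is None:
        raise ValueError("snapshot_time must be timezone-aware")
    # Assumption: snapshot_timestamp_us is microseconds since the Unix epoch.
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    delta = snapshot_time - epoch
    # Integer arithmetic avoids the rounding that float timestamps can introduce.
    timestamp_us = (delta.days * 86_400 + delta.seconds) * 1_000_000 + delta.microseconds
    return {"read_mode": "READ_AT_SNAPSHOT", "snapshot_timestamp_us": timestamp_us}


reader = {
    "class": "com.bytedance.bitsail.connector.kudu.source.KuduSource",
    "kudu_table_name": "kudu_test_table",
    "kudu_master_address_list": ["localhost:1234", "localhost:4321"],
}
reader.update(snapshot_reader_options(datetime(2023, 1, 1, tzinfo=timezone.utc)))
print(json.dumps({"job": {"reader": reader}}, indent=2))
```

The required columns parameter is left out here, as in the example above, because its exact format is not shown in this section.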

predicates

Query predicates on columns. Instead of SQL syntax, predicates are written in a simple JSON syntax. Three types of predicates are supported: 'Comparison', 'InList' and 'IsNull'.

  • The 'Comparison' type supports <=, <, =, > and >=, and is represented as '[operator, column_name, value]'

    e.g. '[">=", "col1", "value"]'

  • The 'InList' type is represented as '["IN", column_name, [value1, value2, ...]]'

    e.g. '["IN", "col2", ["value1", "value2"]]'

  • The 'IsNull' type checks whether a value is NULL or not, and is represented as '[operator, column_name]'

    e.g. '["NULL", "col1"]', or '["NOTNULL", "col2"]'

Predicates can be combined with a predicate operator using the syntax [operator, predicate, predicate, ..., predicate], for example ["AND", [">=", "col1", "value"], ["NOTNULL", "col2"]]. The only supported predicate operator is AND. Type: string. Default: "".
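
Because predicates is a string parameter, the predicate array has to be serialized to JSON before it is placed in the job configuration. The Python sketch below builds the three predicate types described above and combines them with AND. The helper names (`comparison`, `in_list`, `is_null`, `not_null`, `all_of`) are hypothetical and are not part of BitSail.

```python
import json

# Comparison operators listed in this section.
COMPARISON_OPS = {"<=", "<", "=", ">", ">="}


def comparison(op, column, value):
    """Hypothetical helper: [operator, column_name, value]."""
    if op not in COMPARISON_OPS:
        raise ValueError(f"unsupported comparison operator: {op}")
    return [op, column, value]


def in_list(column, values):
    """Hypothetical helper: ["IN", column_name, [value1, value2, ...]]."""
    return ["IN", column, list(values)]


def is_null(column):
    """Hypothetical helper: ["NULL", column_name]."""
    return ["NULL", column]


def not_null(column):
    """Hypothetical helper: ["NOTNULL", column_name]."""
    return ["NOTNULL", column]


def all_of(*predicates):
    """Hypothetical helper: combine predicates with AND, the only supported operator."""
    return ["AND", *predicates]


predicate = all_of(comparison(">=", "col1", "value"), not_null("col2"))

# Serialize once to get the string value of the predicates parameter.
predicates_option = json.dumps(predicate)
print(predicates_option)

# Serializing the enclosing config escapes the inner quotes automatically.
print(json.dumps({"job": {"reader": {"predicates": predicates_option}}}, indent=2))
```

Building the string with a JSON serializer rather than by hand avoids quoting mistakes when the predicate is nested inside the job configuration.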


Kudu Writer

Supported data types

The Kudu writer supports common Kudu data types:

  • Integer: int8, int16, int32, int64
  • Float number: float, double, decimal
  • Bool: boolean
  • Date & Time: date, timestamp
  • String: string, varchar
  • Binary: binary, string_utf8

Supported operation types

The Kudu writer supports the following operations:

  • INSERT, INSERT_IGNORE
  • UPSERT
  • UPDATE, UPDATE_IGNORE

Parameters

Add the following parameters to the job.writer block, for example:

{
  "job": {
    "writer": {
      "class": "com.bytedance.bitsail.connector.kudu.sink.KuduSink",
      "kudu_table_name": "kudu_test_table",
      "kudu_master_address_list": ["localhost:1234", "localhost:4321"],
      "kudu_worker_count": 2
    }
  }
}

Necessary parameters

| Param name | Required | Optional value | Description |
|:--|:--|:--|:--|
| class | yes | | Kudu writer's class name: com.bytedance.bitsail.connector.kudu.sink.KuduSink |
| kudu_table_name | yes | | Kudu table to write |
| kudu_master_address_list | yes | | Kudu master addresses, in list format |
| columns | yes | | The name and type of the columns to write |
| writer_parallelism_num | no | | Writer parallelism |

Kudu client parameters

| Param name | Required | Optional value | Description |
|:--|:--|:--|:--|
| kudu_admin_operation_timeout_ms | no | | Timeout for Kudu client admin operations, in ms. Default: 30000 |
| kudu_operation_timeout_ms | no | | Timeout for Kudu client operations, in ms. Default: 30000 |
| kudu_connection_negotiation_timeout_ms | no | | Connection negotiation timeout, in ms. Default: 10000 |
| kudu_disable_client_statistics | no | | Whether to disable statistics collection in the Kudu client |
| kudu_worker_count | no | | Number of client workers |
| sasl_protocol_name | no | | Default: "kudu" |
| require_authentication | no | | Whether to enable authentication |
| encryption_policy | no | OPTIONAL, REQUIRED_REMOTE, REQUIRED | Encryption policy |

Kudu session parameters

| Param name | Required | Optional value | Description |
|:--|:--|:--|:--|
| kudu_session_flush_mode | no | AUTO_FLUSH_SYNC, AUTO_FLUSH_BACKGROUND | Session flush mode. Default: AUTO_FLUSH_BACKGROUND |
| kudu_mutation_buffer_size | no | | The number of operations that can be buffered |
| kudu_session_flush_interval | no | | Session flush interval, in ms |
| kudu_session_timeout_ms | no | | Timeout for operations. The default is 0, which disables the timeout |
| kudu_session_external_consistency_mode | no | CLIENT_PROPAGATED, COMMIT_WAIT | External consistency mode for the Kudu session. Default: CLIENT_PROPAGATED |
| kudu_ignore_duplicate_rows | no | | Whether to ignore row errors when all of them are of the AlreadyPresent type. If false, an exception is thrown. Default: false |
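
Several writer parameters accept only a fixed set of values, so it can help to check them before a job is submitted. The Python sketch below assembles a job.writer block and rejects values outside the sets listed above. The helper `build_writer_block` is hypothetical; BitSail itself may validate these options differently.

```python
import json

# Allowed values copied from the parameter descriptions above.
FLUSH_MODES = {"AUTO_FLUSH_SYNC", "AUTO_FLUSH_BACKGROUND"}
CONSISTENCY_MODES = {"CLIENT_PROPAGATED", "COMMIT_WAIT"}


def build_writer_block(table, masters,
                       flush_mode="AUTO_FLUSH_BACKGROUND",
                       consistency_mode="CLIENT_PROPAGATED"):
    """Hypothetical helper: assemble a job.writer block with session options."""
    if flush_mode not in FLUSH_MODES:
        raise ValueError(f"unsupported kudu_session_flush_mode: {flush_mode}")
    if consistency_mode not in CONSISTENCY_MODES:
        raise ValueError(
            f"unsupported kudu_session_external_consistency_mode: {consistency_mode}"
        )
    return {
        "job": {
            "writer": {
                "class": "com.bytedance.bitsail.connector.kudu.sink.KuduSink",
                "kudu_table_name": table,
                "kudu_master_address_list": list(masters),
                "kudu_session_flush_mode": flush_mode,
                "kudu_session_external_consistency_mode": consistency_mode,
            }
        }
    }


job = build_writer_block(
    "kudu_test_table",
    ["localhost:1234", "localhost:4321"],
    flush_mode="AUTO_FLUSH_SYNC",
)
print(json.dumps(job, indent=2))
```

The required columns parameter is omitted because its exact format is not shown in this section; a real job configuration needs it.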

Configuration example: Kudu connector example