Doris connector




The BitSail Doris connector supports writing to Doris. Its main features are:

  • Writes to Doris using Stream Load.
  • Supports creating partitions first and then writing to them.

Maven dependency

<dependency>
   <groupId>com.bytedance.bitsail</groupId>
   <artifactId>bitsail-connector-doris</artifactId>
   <version>${revision}</version>
</dependency>

Doris Writer

Supported data type

The Doris writer uses Stream Load, and the content type can be csv or json. It supports the common Doris data types:

  • CHAR
  • VARCHAR
  • TEXT
  • BOOLEAN
  • BINARY
  • VARBINARY
  • DECIMAL
  • DECIMALV2
  • INT
  • TINYINT
  • SMALLINT
  • INTEGER
  • INTERVAL_YEAR_MONTH
  • INTERVAL_DAY_TIME
  • BIGINT
  • LARGEINT
  • FLOAT
  • DOUBLE
  • DATE
  • DATETIME

Parameters

Add the following parameters to the job.writer block, for example:

{
  "job": {
    "writer": {
      "class": "com.bytedance.bitsail.connector.doris.sink.DorisSink",
      "db_name": "test_db",
      "table_name": "test_doris_table"
    }
  }
}

Necessary parameters

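| Param name          | Required                                       | Optional value | Description                                                                           |
|:--------------------|:-----------------------------------------------|:---------------|:--------------------------------------------------------------------------------------|
| class               | yes                                            |                | Doris writer class name, com.bytedance.bitsail.connector.doris.sink.DorisSink         |
| fe_hosts            | yes                                            |                | Doris FE addresses; separate multiple addresses with commas                           |
| mysql_hosts         | yes                                            |                | Doris JDBC query addresses; separate multiple addresses with commas                   |
| user                | yes                                            |                | Doris account user                                                                    |
| password            | yes                                            |                | Doris account password; can be empty                                                  |
| db_name             | yes                                            |                | Database to write to                                                                  |
| table_name          | yes                                            |                | Table to write to                                                                     |
| partitions          | yes, if the target table has partitions        |                | Target partitions to write to                                                         |
| table_has_partition | yes, if the target table has no partitions     |                | Whether the target table has partitions; set to false if the target table has none    |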

Note that partitions must meet the following requirements:

  1. You can specify multiple partitions.
  2. Each partition should contain:
    1. name: the name of the partition
    2. start_range, end_range: the lower and upper bounds of the partition range

partitions example:

{
  "partitions": [
    {
      "name": "p20220210_03",
      "start_range": [
        "2022-02-10",
        "3"
      ],
      "end_range": [
        "2022-02-10",
        "4"
      ]
    },
    {
      "name": "p20220211_03",
      "start_range": [
        "2022-02-11",
        "3"
      ],
      "end_range": [
        "2022-02-11",
        "4"
      ]
    }
  ]
}
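
Putting the required parameters together, a writer block for a partitioned table might look like the following sketch. The host addresses, ports, credentials, and partition values are placeholders chosen for illustration, not values from a real deployment; 8030 and 9030 are the usual Doris FE HTTP and MySQL-protocol query ports, but check them against your own cluster.

```json
{
  "job": {
    "writer": {
      "class": "com.bytedance.bitsail.connector.doris.sink.DorisSink",
      "fe_hosts": "127.0.0.1:8030",
      "mysql_hosts": "127.0.0.1:9030",
      "user": "test_user",
      "password": "",
      "db_name": "test_db",
      "table_name": "test_doris_table",
      "partitions": [
        {
          "name": "p20220210_03",
          "start_range": ["2022-02-10", "3"],
          "end_range": ["2022-02-10", "4"]
        }
      ]
    }
  }
}
```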

Optional parameters

| Param name             | Required | Optional value                                 | Description                                                                                        |
|:-----------------------|:---------|:-----------------------------------------------|:---------------------------------------------------------------------------------------------------|
| writer_parallelism_num | no       |                                                | Writer parallelism                                                                                 |
| sink_flush_interval_ms | no       |                                                | Flush interval in upsert mode, default 5000 ms                                                     |
| sink_max_retries       | no       |                                                | Maximum number of retries, default 3                                                               |
| sink_buffer_size       | no       |                                                | Maximum buffer size, default 20971520 bytes (20 MB)                                                |
| sink_buffer_count      | no       |                                                | Maximum number of records that can be buffered, default 100000                                     |
| sink_write_mode        | no       | STREAMING_UPSERT, BATCH_UPSERT, BATCH_REPLACE  | Write mode                                                                                         |
| stream_load_properties | no       |                                                | Stream Load parameters that will be appended to the Stream Load URL, given as a standard JSON map  |
| load_contend_type      | no       | csv, json                                      | Content format used for Stream Load, default json                                                  |
| csv_field_delimiter    | no       |                                                | Field delimiter used in csv, default ","                                                           |
| csv_line_delimiter     | no       |                                                | Line delimiter used in csv, default "\n"                                                           |
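
As a rough illustration, the optional parameters sit in the same job.writer block as the required ones. The fragment below is a sketch that omits the connection parameters (fe_hosts, mysql_hosts, user, password) for brevity and switches the content format to csv. The numeric values are simply the defaults listed above, and the max_filter_ratio entry under stream_load_properties is an assumed example of a Stream Load setting; check which keys your Doris version accepts.

```json
{
  "job": {
    "writer": {
      "class": "com.bytedance.bitsail.connector.doris.sink.DorisSink",
      "db_name": "test_db",
      "table_name": "test_doris_table",
      "sink_write_mode": "BATCH_UPSERT",
      "sink_max_retries": 3,
      "sink_buffer_size": 20971520,
      "sink_buffer_count": 100000,
      "load_contend_type": "csv",
      "csv_field_delimiter": ",",
      "csv_line_delimiter": "\n",
      "stream_load_properties": {
        "max_filter_ratio": "0.1"
      }
    }
  }
}
```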

Configuration examples: Doris connector example