Elasticsearch connector


Main features

The Elasticsearch connector can be used in both streaming and batch scenarios. It writes to Elasticsearch with at-least-once semantics and supports flexible construction of write requests.

Supported versions

  • Elasticsearch 7.x

Maven dependency

<dependency>
   <groupId>com.bytedance.bitsail</groupId>
   <artifactId>connector-elasticsearch</artifactId>
   <version>${revision}</version>
</dependency>

Supported data types

Basic data types supported by the Elasticsearch connector:

  • String type:
    • string
    • text
    • keyword
  • Integer type:
    • long
    • integer
    • short
    • byte
  • Float type:
    • double
    • float
    • half_float
    • scaled_float
  • Bool type:
    • boolean
  • Binary type:
    • binary
  • Date type:
    • date

Parameters

Add the following parameters to the job.writer block of the job configuration file.

Required parameters

| Param name | Default value | Optional value | Description |
|---|---|---|---|
| class | - | | Class name of the Elasticsearch connector: com.bytedance.bitsail.connector.elasticsearch.sink.ElasticsearchSink |
| es_hosts | - | | List of Elasticsearch addresses that handle REST requests |
| es_index | - | | Elasticsearch index to write to |
| columns | - | | Names and types of the fields |
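Since the connector supports only the basic types listed above, it can help to check a job's columns before submitting it. The Java sketch below is one way to do that. The class ColumnTypeCheck and the name/type map shape of each column are assumptions for illustration, not BitSail's API; the connector's own validation may behave differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of BitSail: reports columns whose declared type
// is not in the connector's list of supported basic types.
// Assumes each column is a name/type pair such as {"name": "id", "type": "long"}.
class ColumnTypeCheck {

    // Basic Elasticsearch types listed as supported in this document.
    static final Set<String> SUPPORTED = Set.of(
            "string", "text", "keyword",
            "long", "integer", "short", "byte",
            "double", "float", "half_float", "scaled_float",
            "boolean", "binary", "date");

    // Returns the names of columns with an unsupported (or missing) type.
    static List<String> unsupportedColumns(List<Map<String, String>> columns) {
        List<String> bad = new ArrayList<>();
        for (Map<String, String> column : columns) {
            String type = column.getOrDefault("type", "");
            if (!SUPPORTED.contains(type.toLowerCase())) {
                bad.add(column.get("name"));
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        List<Map<String, String>> columns = List.of(
                Map.of("name", "id", "type", "long"),
                Map.of("name", "title", "type", "text"),
                Map.of("name", "location", "type", "geo_point"));
        System.out.println(unsupportedColumns(columns)); // prints [location]
    }
}
```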

Optional parameters

General optional parameters

| Param name | Default value | Optional value | Description |
|---|---|---|---|
| writer_parallelism_num | | | Writer parallelism |

Parameters for constructing REST requests

| Param name | Default value | Optional value | Description |
|---|---|---|---|
| request_path_prefix | - | | Path prefix the HTTP client adds to each request |
| connection_request_timeout_ms | 10000 | | Timeout (ms) for obtaining a connection from the HTTP connection manager |
| connection_timeout_ms | 10000 | | Timeout (ms) for establishing an HTTP connection |
| socket_timeout_ms | 60000 | | Socket timeout (ms) for an HTTP connection |
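The REST parameters fall back to their defaults when omitted, and request_path_prefix goes in front of every request path. A minimal Java sketch of that resolution follows. The class RestOptions and its methods are hypothetical, and the sketch assumes the writer configuration is available as a flat string map.

```java
import java.util.Map;

// Hypothetical sketch, not BitSail's code: resolves the REST-related writer
// parameters with the documented defaults and prepends the optional path prefix.
class RestOptions {
    final String pathPrefix;              // request_path_prefix (no default)
    final int connectionRequestTimeoutMs; // connection_request_timeout_ms, default 10000
    final int connectionTimeoutMs;        // connection_timeout_ms, default 10000
    final int socketTimeoutMs;            // socket_timeout_ms, default 60000

    RestOptions(Map<String, String> writerConf) {
        this.pathPrefix = writerConf.getOrDefault("request_path_prefix", "");
        this.connectionRequestTimeoutMs =
                Integer.parseInt(writerConf.getOrDefault("connection_request_timeout_ms", "10000"));
        this.connectionTimeoutMs =
                Integer.parseInt(writerConf.getOrDefault("connection_timeout_ms", "10000"));
        this.socketTimeoutMs =
                Integer.parseInt(writerConf.getOrDefault("socket_timeout_ms", "60000"));
    }

    // Joins the optional prefix and an endpoint, e.g. "/es" + "_bulk" -> "/es/_bulk".
    String requestPath(String endpoint) {
        String prefix = pathPrefix.endsWith("/")
                ? pathPrefix.substring(0, pathPrefix.length() - 1) : pathPrefix;
        if (!prefix.isEmpty() && !prefix.startsWith("/")) {
            prefix = "/" + prefix;
        }
        String path = endpoint.startsWith("/") ? endpoint : "/" + endpoint;
        return prefix + path;
    }

    public static void main(String[] args) {
        RestOptions opts = new RestOptions(
                Map.of("request_path_prefix", "/es", "socket_timeout_ms", "30000"));
        System.out.println(opts.requestPath("_bulk")); // prints /es/_bulk
        System.out.println(opts.connectionTimeoutMs);  // prints 10000 (default)
        System.out.println(opts.socketTimeoutMs);      // prints 30000
    }
}
```

In the Elasticsearch low-level REST client, settings like these are usually applied through RestClientBuilder.setPathPrefix and, for the three timeouts, through the Apache HttpClient RequestConfig supplied via setRequestConfigCallback. Whether BitSail wires them exactly this way is an assumption.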

Parameters for bulk requests

| Param name | Default value | Optional value | Description |
|---|---|---|---|
| bulk_flush_max_actions | 300 | | Execute a bulk operation when the number of buffered requests reaches this value |
| bulk_flush_max_size_mb | 10 | | Execute a bulk operation when the size of the buffered request data (in MB) reaches this value |
| bulk_flush_interval_ms | 10000 | | Interval (ms) at which a bulk operation is executed |
| bulk_backoff_policy | EXPONENTIAL | CONSTANT, EXPONENTIAL, NONE | Backoff policy when a bulk operation fails: CONSTANT (fixed-delay backoff), EXPONENTIAL (exponential backoff), NONE (no backoff) |
| bulk_backoff_delay_ms | 100 | | Retry delay (ms) after a failed bulk operation |
| bulk_backoff_max_retry_count | 5 | | Maximum number of retries for a failed bulk operation |
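The bulk parameters combine two mechanisms: buffered requests are flushed as soon as any one threshold is reached, and a failed bulk operation is retried according to the backoff policy. The Java sketch below illustrates both with the default values. BulkSettingsSketch is hypothetical. Modeling EXPONENTIAL as simple doubling and treating NONE as "no retry" are simplifying assumptions, and the connector's actual delay curve may differ (for example, Elasticsearch's BackoffPolicy.exponentialBackoff does not simply double).

```java
// Hypothetical sketch of the documented bulk settings, not BitSail's code:
// (1) when buffered requests are flushed, (2) how long to wait before each retry.
class BulkSettingsSketch {
    enum BackoffPolicy { CONSTANT, EXPONENTIAL, NONE }

    // Defaults from the table above.
    static final int MAX_ACTIONS = 300;           // bulk_flush_max_actions
    static final long MAX_SIZE_MB = 10;           // bulk_flush_max_size_mb
    static final long FLUSH_INTERVAL_MS = 10_000; // bulk_flush_interval_ms

    // A flush is due as soon as any one of the three thresholds is reached.
    static boolean shouldFlush(int bufferedActions, long bufferedBytes, long msSinceLastFlush) {
        return bufferedActions >= MAX_ACTIONS
                || bufferedBytes >= MAX_SIZE_MB * 1024 * 1024
                || msSinceLastFlush >= FLUSH_INTERVAL_MS;
    }

    // Delay (ms) before retry number `attempt` (1-based), or -1 if no retry should happen.
    // Assumption: EXPONENTIAL doubles the delay each time; NONE means no retries at all.
    static long retryDelayMs(BackoffPolicy policy, long baseDelayMs, int maxRetries, int attempt) {
        if (policy == BackoffPolicy.NONE || attempt > maxRetries) {
            return -1;
        }
        if (policy == BackoffPolicy.CONSTANT) {
            return baseDelayMs;
        }
        return baseDelayMs << (attempt - 1); // 100, 200, 400, ... for a 100 ms base
    }

    public static void main(String[] args) {
        System.out.println(shouldFlush(300, 0, 0));     // true: action count reached
        System.out.println(shouldFlush(10, 1024, 500)); // false: no threshold reached
        for (int attempt = 1; attempt <= 6; attempt++) {
            // With the defaults (100 ms delay, 5 retries): 100 200 400 800 1600 -1
            System.out.print(retryDelayMs(BackoffPolicy.EXPONENTIAL, 100, 5, attempt) + " ");
        }
        System.out.println();
    }
}
```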

Parameters for building ActionRequests

| Param name | Default value | Optional value | Description |
|---|---|---|---|
| es_operation_type | "index" | "index", "create", "update", "upsert", "delete" | Type of ActionRequest to build |
| es_dynamic_index_field | - | | Field from which to read the target index name of each record |
| es_operation_type_field | - | | Field from which to read the ActionRequest type of each record |
| es_version_field | - | | Field from which to read the version of each record |
| es_id_fields | "" | | Fields from which to build the document ID. Format: comma-separated string, e.g. "1,2" |
| doc_exclude_fields | "" | | Fields to leave out when creating a document. Format: comma-separated string, e.g. "1,2" |
| ignore_blank_value | false | | Whether to skip fields with null values when creating a document |
| flatten_map | false | | Whether to expand Map-type data into the document when creating it |
| id_delimiter | # | | Separator used when joining multiple fields into one document ID |
| json_serializer_features | - | | JSON features used when building JSON strings. Format: comma-separated string, e.g. "QuoteFieldNames,UseSingleQuotes" |
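The document-building parameters determine the ID and body of each document. The Java sketch below shows one plausible reading of es_id_fields, id_delimiter, doc_exclude_fields, ignore_blank_value and flatten_map applied to a single record. DocumentSketch is hypothetical. It assumes the comma-separated lists contain field names and that flatten_map lifts the entries of a Map-typed field to the top level of the document, which may not match the connector exactly.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.StringJoiner;

// Hypothetical sketch, not BitSail's implementation. Assumptions: es_id_fields and
// doc_exclude_fields hold field names; flatten_map lifts nested map entries to the top level.
class DocumentSketch {

    // Splits a comma-separated parameter such as "a,b" into trimmed, non-empty names.
    static Set<String> splitNames(String csv) {
        Set<String> names = new HashSet<>();
        for (String name : csv.split(",")) {
            if (!name.trim().isEmpty()) {
                names.add(name.trim());
            }
        }
        return names;
    }

    // Joins the ID field values in the configured order using id_delimiter, e.g. "42#en".
    // Returns null when es_id_fields is empty (no explicit document ID).
    static String documentId(Map<String, Object> row, String idFields, String delimiter) {
        StringJoiner id = new StringJoiner(delimiter);
        boolean any = false;
        for (String field : idFields.split(",")) {
            if (!field.trim().isEmpty()) {
                id.add(String.valueOf(row.get(field.trim())));
                any = true;
            }
        }
        return any ? id.toString() : null;
    }

    // Builds the document body: drops excluded fields, optionally drops null values,
    // and optionally lifts the entries of Map-typed fields to the top level.
    static Map<String, Object> documentSource(Map<String, Object> row, String excludeFields,
                                              boolean ignoreBlankValue, boolean flattenMap) {
        Set<String> excluded = splitNames(excludeFields);
        Map<String, Object> doc = new LinkedHashMap<>();
        for (Map.Entry<String, Object> e : row.entrySet()) {
            if (excluded.contains(e.getKey())) {
                continue; // doc_exclude_fields
            }
            if (flattenMap && e.getValue() instanceof Map) {
                for (Map.Entry<?, ?> inner : ((Map<?, ?>) e.getValue()).entrySet()) {
                    putField(doc, String.valueOf(inner.getKey()), inner.getValue(), ignoreBlankValue);
                }
            } else {
                putField(doc, e.getKey(), e.getValue(), ignoreBlankValue);
            }
        }
        return doc;
    }

    private static void putField(Map<String, Object> doc, String key, Object value, boolean ignoreBlank) {
        if (value == null && ignoreBlank) {
            return; // ignore_blank_value=true: skip null fields
        }
        doc.put(key, value);
    }

    public static void main(String[] args) {
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("id", 42);
        row.put("lang", "en");
        row.put("title", "hello");
        row.put("comment", null);
        row.put("tmp", "x");
        row.put("attrs", Map.of("color", "red"));

        System.out.println(documentId(row, "id,lang", "#")); // prints 42#en
        System.out.println(documentSource(row, "tmp", true, true));
        // prints {id=42, lang=en, title=hello, color=red}
    }
}
```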

For a configuration example, see: Elasticsearch connector example