MongoDB connector


Parent document: Connectors

The BitSail MongoDB connector supports reading from and writing to MongoDB. Its main features are:

  • Batch reading of documents from a given collection.
  • Batch writing to a target collection.

Maven dependency

<dependency>
   <groupId>com.bytedance.bitsail</groupId>
   <artifactId>bitsail-connector-mongodb</artifactId>
   <version>${revision}</version>
</dependency>

MongoDB Reader

Supported data types

The MongoDB reader parses data according to the schema. The following data types are supported:

Basic data type

  • string, character
  • boolean
  • short, int, long, float, double, bigint
  • date, time, timestamp

Complex data type

  • array, list
  • map

Parameters

Add the following parameters to the job.reader block, for example:

{
  "job": {
    "reader": {
      "class": "com.bytedance.bitsail.connector.legacy.mongodb.source.MongoDBInputFormat",
      "host": "localhost",
      "port": 1234,
      "db_name": "test_db",
      "collection_name": "test_collection"
    }
  }
}

Necessary parameters

| Param name | Required | Optional value | Description |
|---|---|---|---|
| class | yes | | Class name of the MongoDB reader: com.bytedance.bitsail.connector.legacy.mongodb.source.MongoDBInputFormat |
| db_name | yes | | Database to read |
| collection_name | yes | | Collection to read |
| hosts_str | see note | | Addresses of MongoDB; multiple addresses are separated by commas |
| host | see note | | Host of MongoDB |
| port | see note | | Port of MongoDB |
| split_pk | yes | | Field used for splitting |

  • Note: set either (hosts_str) or (host, port). If both are set, (hosts_str) takes priority.
  • Format of hosts_str: host1:port1,host2:port2,...
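
The address rules above can be sketched in a few lines of Python. This is only an illustration of the documented hosts_str format and priority rule, not the connector's own parsing code; the function name resolve_addresses is hypothetical.

```python
from typing import Dict, List, Tuple


def resolve_addresses(reader: Dict) -> List[Tuple[str, int]]:
    """Return (host, port) pairs described by a job.reader block.

    Assumption: hosts_str wins over host/port when both are present,
    following the note in this section.
    """
    hosts_str = reader.get("hosts_str")
    if hosts_str:
        pairs = []
        for item in hosts_str.split(","):
            # Split on the last colon so the port is always the final part.
            host, _, port = item.strip().rpartition(":")
            if not host or not port.isdigit():
                raise ValueError(f"expected host:port, got {item!r}")
            pairs.append((host, int(port)))
        return pairs
    if "host" in reader and "port" in reader:
        return [(reader["host"], int(reader["port"]))]
    raise ValueError("set either hosts_str or both host and port")
```

For instance, resolve_addresses({"hosts_str": "host1:27017,host2:27018"}) yields two pairs, while a block with only host and port yields one.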

Optional parameters

| Param name | Required | Optional value | Description |
|---|---|---|---|
| reader_parallelism_num | no | | Parallelism of the MongoDB reader |
| user_name | no | | User name for authentication |
| password | no | | Password for authentication |
| auth_db_name | no | | Database used for authentication |
| reader_fetch_size | no | | Maximum number of documents fetched at once. Default: 100000 |
| filter | no | | Filter for collections |

MongoDB Writer

Supported data types

The MongoDB writer builds a document for each record according to the schema, and then inserts it into the collection.

The supported data types are:

Basic data type

  • undefined
  • string
  • objectid
  • date
  • timestamp
  • bindata
  • bool
  • int
  • long
  • object
  • javascript
  • regex
  • double
  • decimal

Complex data type

  • array

Parameters

Add the following parameters to the job.writer block, for example:

{
  "job": {
    "writer": {
      "class": "com.bytedance.bitsail.connector.legacy.mongodb.sink.MongoDBOutputFormat",
      "unique_key": "id",
      "client_mode": "url",
      "mongo_url": "mongodb://localhost:1234/test_db",
      "db_name": "test_db",
      "collection_name": "test_collection",
      "columns": [
        {
          "index": 0,
          "name": "id",
          "type": "string"
        },
        {
          "index": 1,
          "name": "string_field",
          "type": "string"
        }
      ]
    }
  }
}

Necessary parameters

| Param name | Required | Optional value | Description |
|---|---|---|---|
| class | yes | | Class name of the MongoDB writer: com.bytedance.bitsail.connector.legacy.mongodb.sink.MongoDBOutputFormat |
| db_name | yes | | Database to write |
| collection_name | yes | | Collection to write |
| client_mode | yes | url, host_without_credential, host_with_credential | How to create the Mongo client |
| mongo_url | yes if client_mode=url | | URL for connecting to MongoDB, like "mongodb://localhost:1234" |
| mongo_hosts_str | see note | | Addresses of MongoDB; multiple addresses are separated by commas |
| mongo_host | see note | | Host of MongoDB |
| mongo_port | see note | | Port of MongoDB |
| user_name | yes if client_mode=host_with_credential | | User name for authentication |
| password | yes if client_mode=host_with_credential | | Password for authentication |

  • Note: when client_mode is host_without_credential or host_with_credential, you must set either (mongo_hosts_str) or (mongo_host, mongo_port).
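
The dependency between client_mode and the connection parameters can be checked before submitting a job. The sketch below is a hypothetical pre-flight helper, not part of BitSail; it assumes the URL key is mongo_url, as used in the example job.writer block above.

```python
from typing import Dict, List

CLIENT_MODES = ("url", "host_without_credential", "host_with_credential")


def missing_connection_params(writer: Dict) -> List[str]:
    """List the connection parameters a job.writer block still needs."""
    mode = writer.get("client_mode")
    if mode not in CLIENT_MODES:
        raise ValueError(f"client_mode must be one of {CLIENT_MODES}")

    missing = []
    if mode == "url":
        # Assumption: the key is mongo_url, matching the example job.
        if "mongo_url" not in writer:
            missing.append("mongo_url")
        return missing

    # Both host-based modes need an address in one of two forms.
    has_hosts_str = "mongo_hosts_str" in writer
    has_host_port = "mongo_host" in writer and "mongo_port" in writer
    if not (has_hosts_str or has_host_port):
        missing.append("mongo_hosts_str or (mongo_host, mongo_port)")

    # Only host_with_credential additionally requires credentials.
    if mode == "host_with_credential":
        for key in ("user_name", "password"):
            if key not in writer:
                missing.append(key)
    return missing
```

An empty list means the block satisfies the rules in the table; otherwise the returned names indicate what to add.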

Optional parameters

| Param name | Required | Optional value | Description |
|---|---|---|---|
| writer_parallelism_num | no | | Parallelism of the writer |
| pre_sql | no | | SQL executed before inserting into the collection |
| auth_db_name | no | | Database used for authentication |
| batch_size | no | | Number of documents written per batch. Default: 100 |
| unique_key | no | | Field that determines whether a document is unique |
| connect_timeout_ms | no | | Connection timeout. Default: 10000 ms |
| max_wait_time_ms | no | | Timeout when getting a connection from the connection pool. Default: 120000 ms |
| socket_timeout_ms | no | | Socket timeout. Default: 0 (no timeout) |
| write_concern | no | 0, 1, 2, 3 | Data writing guarantee level. Default: 1 |
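
To see which values apply when optional parameters are omitted, the documented defaults can be merged with a job.writer block. This is a minimal sketch that only mirrors the defaults listed in this section; the helper name effective_writer_options is hypothetical and the connector resolves its options internally.

```python
from typing import Dict

# Defaults as stated in the optional-parameters table of this section.
WRITER_DEFAULTS = {
    "batch_size": 100,
    "connect_timeout_ms": 10000,
    "max_wait_time_ms": 120000,
    "socket_timeout_ms": 0,  # 0 means no socket timeout
    "write_concern": 1,
}


def effective_writer_options(writer: Dict) -> Dict:
    """Overlay user-supplied tuning options on the documented defaults."""
    options = dict(WRITER_DEFAULTS)
    for key in WRITER_DEFAULTS:
        if key in writer:
            options[key] = writer[key]
    # The table lists 0, 1, 2 and 3 as the allowed write_concern values.
    if options["write_concern"] not in (0, 1, 2, 3):
        raise ValueError("write_concern must be 0, 1, 2 or 3")
    return options
```

Calling effective_writer_options({"batch_size": 500}) keeps the other four options at their defaults and overrides only the batch size.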

Configuration examples: MongoDB connector example