Job Configuration Guide


A BitSail job is configured with a JSON document. The following skeleton shows its complete structure:

{
    "job":{
        "common":{
        ...
        },
        "reader":{
        ...
        },
        "writer":{
        ...
        }
    }
}

| Module name | Description |
| --- | --- |
| common | General settings, such as job metadata and plugin settings. |
| reader/readers | Parameters for the source side. For a MySQL source, for example, the reader section holds the JDBC connection information and the database and table to read from. |
| writer/writers | Parameters for the target side. For a Hive target, for example, the writer section holds the Hive metastore connection information and the Hive database, table, and partition. |
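
To make the three-module layout concrete, here is a minimal Python sketch that assembles a job configuration as a dictionary and serializes it to JSON. The helper name `build_job_conf` and the placeholder values are assumptions for illustration and are not part of BitSail.

```python
import json


def build_job_conf(common, reader, writer):
    """Assemble the top-level structure: job -> common / reader / writer.

    Hypothetical helper for illustration; it is not a BitSail API.
    """
    return {"job": {"common": common, "reader": reader, "writer": writer}}


# Placeholder values; a real job needs connector-specific settings as well.
conf = build_job_conf(
    common={"job_name": "bitsail_conf"},
    reader={"class": "com.bytedance.bitsail.connector.legacy.jdbc.source.JDBCInputFormat"},
    writer={"class": "com.bytedance.bitsail.connector.legacy.hive.sink.HiveParquetOutputFormat"},
)

conf_text = json.dumps(conf, indent=4)
print(conf_text)
```

Generating the file from a dictionary keeps the nesting correct and avoids hand-written syntax slips.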

Common Module

Example:

{
    "job":{
        "common":{
            "user_name":"bytedance_dts",
            "instance_id":-1,
            "job_id":-1,
            "job_name":"",
            "min_parallelism":1,
            "max_parallelism":5,
            "parallelism_chain":false,
            "max_dirty_records_stored_num":50,
            "dirty_records_count_threshold":-1,
            "dirty_records_percentage_threshold":-1
        }
    }
}

Description:

Metadata Parameters:

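| Parameter name | Required | Default | Description | Example |
| --- | --- | --- | --- | --- |
| user_name | TRUE | - | Submitter of the job. | bitsail |
| job_id | TRUE | - | Unique ID of the job. | 12345 |
| instance_id | TRUE | - | Instance ID of the job; may be used by some scheduler systems. | 12345 |
| job_name | TRUE | - | Name of the job. | bitsail_conf |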
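
Because all four metadata parameters are required, it can help to check a configuration before submitting it. The sketch below is one possible check; the function `missing_metadata` is a hypothetical helper, not a BitSail API.

```python
# Required metadata keys, taken from the table above.
REQUIRED_COMMON_KEYS = ("user_name", "job_id", "instance_id", "job_name")


def missing_metadata(conf):
    """Return the required metadata keys that are absent from job.common.

    Hypothetical pre-submission check for illustration only.
    """
    common = conf.get("job", {}).get("common", {})
    return [key for key in REQUIRED_COMMON_KEYS if key not in common]


# This configuration sets only two of the four required keys.
conf = {"job": {"common": {"user_name": "bitsail", "job_id": 12345}}}
missing = missing_metadata(conf)
print(missing)
```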

Parallelism parameters:

| Parameter name | Required | Default | Description | Example |
| --- | --- | --- | --- | --- |
| min_parallelism | FALSE | 1 | Minimum parallelism of the job. The automatically calculated parallelism is always greater than or equal to this value. | 2 |
| max_parallelism | FALSE | 512 | Maximum parallelism of the job. The automatically calculated parallelism is always less than or equal to this value. | 2 |
| parallelism_chain | FALSE | FALSE | Whether to chain the reader and writer operators. When enabled, the smaller of the reader and writer parallelism is used as the final parallelism. | false |
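
The bounds and the chaining rule can be sketched as follows. This illustrates only the behavior described in the table; how BitSail derives the automatic value is not covered here, so the inputs below are made-up numbers and both function names are hypothetical.

```python
def bounded_parallelism(computed, min_parallelism=1, max_parallelism=512):
    """Clamp an automatically computed parallelism into [min, max]."""
    return max(min_parallelism, min(computed, max_parallelism))


def final_parallelism(reader_parallelism, writer_parallelism, parallelism_chain=False):
    """With chaining enabled, both sides use the smaller of the two values."""
    if parallelism_chain:
        chained = min(reader_parallelism, writer_parallelism)
        return chained, chained
    return reader_parallelism, writer_parallelism


# Made-up computed values, bounded by min_parallelism=1 and max_parallelism=5.
reader_p = bounded_parallelism(8, min_parallelism=1, max_parallelism=5)  # capped at 5
writer_p = bounded_parallelism(0, min_parallelism=1, max_parallelism=5)  # raised to 1
print(final_parallelism(reader_p, writer_p, parallelism_chain=True))
```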

Dirty record settings (batch mode only):

| Parameter name | Required | Default | Description | Example |
| --- | --- | --- | --- | --- |
| max_dirty_records_stored_num | FALSE | 50 | Maximum number of dirty records that each task collects. | 50 |
| dirty_records_count_threshold | FALSE | -1 | Threshold for the total number of dirty records. If the total exceeds the threshold, the job fails at the end. | -1 |
| dirty_records_percentage_threshold | FALSE | -1 | Threshold for the percentage of dirty records. If the percentage exceeds the threshold, the job fails at the end. | -1 |
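
The two thresholds can be read as a check made once the job has finished. The sketch below rests on two assumptions that this guide does not confirm: a negative threshold (the default -1) disables the check, and the percentage threshold is expressed as a fraction between 0 and 1. The function name is hypothetical.

```python
def exceeds_dirty_thresholds(dirty, total, count_threshold=-1, percentage_threshold=-1):
    """Return True if the dirty-record totals should fail the job.

    Assumptions (not confirmed by this guide): a negative threshold
    disables the check, and the percentage threshold is a fraction
    between 0 and 1.
    """
    if count_threshold >= 0 and dirty > count_threshold:
        return True
    if percentage_threshold >= 0 and total > 0:
        return dirty / total > percentage_threshold
    return False


# 10 dirty records out of 1000, checked against different settings.
print(exceeds_dirty_thresholds(10, 1000))
print(exceeds_dirty_thresholds(10, 1000, count_threshold=5))
print(exceeds_dirty_thresholds(10, 1000, percentage_threshold=0.05))
```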

Flow control settings:

| Parameter name | Required | Default | Description | Example |
| --- | --- | --- | --- | --- |
| reader_transport_channel_speed_byte | FALSE | -1 | Throughput limit for each concurrent reader, in bytes per second. | 10 |
| reader_transport_channel_speed_record | FALSE | -1 | Rate limit for each concurrent reader, in rows per second. | 10 |
| writer_transport_channel_speed_byte | FALSE | -1 | Throughput limit for each concurrent writer, in bytes per second. | 10 |
| writer_transport_channel_speed_record | FALSE | -1 | Rate limit for each concurrent writer, in rows per second. | 10 |
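
Since each limit applies to a single concurrent reader or writer, the job-wide ceiling scales with the parallelism. The following sketch builds a common block with reader limits and computes that ceiling. The limit values are arbitrary, `aggregate_limit` is a hypothetical helper, and treating -1 as "no limit" is an assumption.

```python
import json

# Arbitrary example limits: 1 MiB/s and 1000 rows/s per concurrent reader.
common = {
    "reader_transport_channel_speed_byte": 1024 * 1024,
    "reader_transport_channel_speed_record": 1000,
}


def aggregate_limit(per_channel_limit, parallelism):
    """Upper bound across all channels, assuming a negative value means no limit."""
    if per_channel_limit < 0:
        return None  # unlimited under the stated assumption
    return per_channel_limit * parallelism


print(json.dumps({"job": {"common": common}}, indent=4))
# With 4 concurrent readers, the job reads at most 4 MiB per second in total.
print(aggregate_limit(common["reader_transport_channel_speed_byte"], 4))
```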

Reader Module

Example:

{
    "job":{
        "reader":
            {
                "class":"com.bytedance.bitsail.connector.legacy.jdbc.source.JDBCInputFormat",
                "columns":[
                    {
                        "name":"id",
                        "type":"bigint"
                    },
                    {
                        "name":"name",
                        "type":"varchar"
                    }
                ],
                "table_name":"your table name",
                "db_name":"your database name",
                "password":"your database connection password",
                "user_name":"your database connection username",
                "split_pk":"your table primary key",
                "connections":[
                    {
                        "slaves":[
                            {
                                "port":"your connection's port",
                                "db_url":"your connection's url",
                                "host":"your connection's host"
                            }
                        ],
                        "shard_num":0,
                        "master":{
                            "port":"your connection's port",
                            "db_url":"your connection's url",
                            "host":"your connection's host"
                        }
                    }
                ]
            }
    }
}

Common parameters:

| Parameter name | Required | Default | Description | Example |
| --- | --- | --- | --- | --- |
| class | TRUE | - | Class name of the connector. | com.bytedance.bitsail.connector.legacy.jdbc.source.JDBCInputFormat |
| reader_parallelism_num | FALSE | - | Parallelism of the reader operator. | 2 |

For other parameters, refer to the documentation of the specific connector.
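
As a small illustration of the common reader parameters, the sketch below builds a reader block with an explicit `reader_parallelism_num` and generates the `columns` list from name/type pairs. The `columns` helper is hypothetical, and the connection settings shown in the example above are omitted for brevity.

```python
import json


def columns(*pairs):
    """Turn (name, type) pairs into the list format used by "columns".

    Hypothetical helper for illustration; it is not a BitSail API.
    """
    return [{"name": name, "type": col_type} for name, col_type in pairs]


reader = {
    "class": "com.bytedance.bitsail.connector.legacy.jdbc.source.JDBCInputFormat",
    "reader_parallelism_num": 2,
    "columns": columns(("id", "bigint"), ("name", "varchar")),
    # Connection settings (db_name, table_name, connections, ...) omitted here.
}

print(json.dumps({"job": {"reader": reader}}, indent=4))
```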

Writer Module

Example:

{
    "job":{
        "writer":
            {
                "class":"com.bytedance.bitsail.connector.legacy.hive.sink.HiveParquetOutputFormat",
                "db_name":"your Hive database name",
                "table_name":"your Hive table name",
                "partition":"the partition you want to add",
                "metastore_properties":"{\"hive.metastore.uris\":\"thrift://localhost:9083\"}",
                "columns":[
                    {
                        "name":"id",
                        "type":"bigint"
                    }
                ],
                "write_mode":"overwrite",
                "writer_parallelism_num":1
            }
    }
}

Common parameters:

| Parameter name | Required | Default | Description | Example |
| --- | --- | --- | --- | --- |
| class | TRUE | - | Class name of the connector. | com.bytedance.bitsail.connector.legacy.hive.sink.HiveParquetOutputFormat |
| writer_parallelism_num | FALSE | - | Parallelism of the writer operator. By default, BitSail calculates the writer parallelism for the job. | 2 |

For other parameters, refer to the documentation of the specific connector.
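
In the writer example, `metastore_properties` is a JSON object encoded as a string, which is why its inner quotes are escaped. Writing that escaping by hand is error-prone, so one option is to generate it. The sketch below uses only Python's standard `json` module; the surrounding values are taken from the example and the remaining writer settings are omitted.

```python
import json

# Serialize the inner object first; compact separators match the example.
metastore_properties = json.dumps(
    {"hive.metastore.uris": "thrift://localhost:9083"},
    separators=(",", ":"),
)

writer = {
    "class": "com.bytedance.bitsail.connector.legacy.hive.sink.HiveParquetOutputFormat",
    "metastore_properties": metastore_properties,  # a string, not a nested object
    "writer_parallelism_num": 1,
    # Other settings (db_name, table_name, partition, columns, ...) omitted here.
}

# The outer dump escapes the inner quotes automatically.
writer_text = json.dumps({"job": {"writer": writer}}, indent=4)
print(writer_text)
```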