Assert connector


The BitSail Assert connector validates data against user-defined rules. Its main feature:

  • Supports multiple custom check rules

Maven dependency

<dependency>
    <groupId>com.bytedance.bitsail</groupId>
    <artifactId>connector-assert</artifactId>
    <version>${revision}</version>
    <scope>provided</scope>
</dependency>

Supported data types

  • Basic data types supported:
    • Integer type:
      • tinyint
      • smallint
      • int
      • bigint
    • Float type:
      • float
      • double
      • decimal
    • Time type:
      • timestamp
      • date
    • String type:
      • string
      • varchar
      • char
    • Bool type:
      • boolean

Parameters

Add the parameters described below to the job.writer block of the job configuration, for example:

{
  "job": {
    "writer": {
      "class": "com.bytedance.bitsail.connector.assertion.sink.AssertSink",
      "columns": [
        {
          "name": "name",
          "type": "string"
        },
        {
          "name": "price",
          "type": "double"
        }
      ],
      "row_rules": {
        "min_row": 10,
        "max_row": 20
      },
      "column_rules": {
        "name": {
          "not_null": true,
          "min_len": 1,
          "max_len": 1000
        },
        "price": {
          "not_null": true,
          "min": 2,
          "max": 180
        }
      }
    }
  }
}

Necessary parameters

| Param name | Required | Optional value | Description |
|------------|----------|----------------|-------------|
| class      | yes      |                | Assert writer's class name, com.bytedance.bitsail.connector.assertion.sink.AssertSink |
| columns    | yes      |                | The name and type of columns to write |

Optional parameters

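| Param name             | Required | Optional value | Description              |
|------------------------|----------|----------------|--------------------------|
| writer_parallelism_num | no       |                | Writer parallelism num   |
| row_rules              | no       |                | Custom row check rule    |
| column_rules           | no       |                | Custom column check rule |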

Check rules

| Rule     | Description                          | Parameter type |
|----------|--------------------------------------|----------------|
| min_row  | The minimum number of rows           | int            |
| max_row  | The maximum number of rows           | int            |
| not_null | The value must not be null           | boolean        |
| min      | The minimum value of the data        | double         |
| max      | The maximum value of the data        | double         |
| min_len  | The minimum length of a string value | int            |
| max_len  | The maximum length of a string value | int            |
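
To make the rule semantics concrete, here is a minimal Python sketch of how the check rules above could be evaluated. It is not BitSail's implementation: the helper names check_column and check_rows are hypothetical, and the sketch assumes that all bounds are inclusive, that a null value is only checked against not_null, and that min_len/max_len apply to strings while min/max apply to numbers.

```python
from typing import Any, Dict, List


def check_column(value: Any, rules: Dict[str, Any]) -> List[str]:
    """Return the names of the column rules that `value` violates."""
    violations: List[str] = []
    if value is None:
        # Assumption: a null value is only tested against not_null.
        if rules.get("not_null"):
            violations.append("not_null")
        return violations
    if isinstance(value, str):
        # Length rules apply to string values (assumed inclusive bounds).
        if "min_len" in rules and len(value) < rules["min_len"]:
            violations.append("min_len")
        if "max_len" in rules and len(value) > rules["max_len"]:
            violations.append("max_len")
    else:
        # Range rules apply to numeric values (assumed inclusive bounds).
        if "min" in rules and value < rules["min"]:
            violations.append("min")
        if "max" in rules and value > rules["max"]:
            violations.append("max")
    return violations


def check_rows(row_count: int, row_rules: Dict[str, int]) -> List[str]:
    """Return the names of the row rules that the total row count violates."""
    violations: List[str] = []
    if "min_row" in row_rules and row_count < row_rules["min_row"]:
        violations.append("min_row")
    if "max_row" in row_rules and row_count > row_rules["max_row"]:
        violations.append("max_row")
    return violations


# Rules taken from the example configuration in this document.
name_rules = {"not_null": True, "min_len": 1, "max_len": 1000}
price_rules = {"not_null": True, "min": 2, "max": 180}
row_rules = {"min_row": 10, "max_row": 20}
```

With these rules, check_column("", name_rules) reports a min_len violation, check_column(200.0, price_rules) reports a max violation, and check_rows(5, row_rules) reports a min_row violation.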

Descriptions

  • If row_rules is declared, the parallelism of the Assert sink is forced to 1 and any configured writer_parallelism_num value is ignored.
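
The interaction between row_rules and writer_parallelism_num can be sketched as a small Python function. The name effective_writer_parallelism is hypothetical and is not part of BitSail. The sketch assumes that "declared" means the row_rules key is present in the writer block, and it returns None when neither option is set because the default parallelism is not specified here.

```python
from typing import Any, Dict, Optional


def effective_writer_parallelism(writer_conf: Dict[str, Any]) -> Optional[int]:
    """Resolve the writer parallelism described in the note above."""
    if "row_rules" in writer_conf:
        # Declaring row_rules forces a single writer and overrides
        # any configured writer_parallelism_num.
        return 1
    # Otherwise use the configured value; None means "not configured".
    return writer_conf.get("writer_parallelism_num")
```

For example, a writer block with both "row_rules": {"min_row": 10} and "writer_parallelism_num": 4 resolves to a parallelism of 1, while the same block without row_rules resolves to 4.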

Configuration examples: Assert connector example