Hive connector

Parent document: Connectors

The BitSail Hive connector supports reading from and writing to Hive tables. Its main features are:

  • Support reading partitioned and non-partitioned tables
  • Support writing to partitioned tables
  • Support reading and writing Hive tables in multiple formats, such as parquet, orc and text

Supported Hive versions

  • 1.2
    • 1.2.0
    • 1.2.1
    • 1.2.2
  • 2.0
    • 2.0.0
    • 2.1.0
    • 2.1.1
    • 2.3.0
    • 2.3.9
  • 3.0
    • 3.1.0
    • 3.1.2

Maven dependency

<dependency>
   <groupId>com.bytedance.bitsail</groupId>
   <artifactId>bitsail-connector-hive</artifactId>
   <version>${revision}</version>
</dependency>

Hive reader

Supported data types

  • Basic data types:
    • Integer type:
      • tinyint
      • smallint
      • int
      • bigint
    • Float type:
      • float
      • double
      • decimal
    • Time type:
      • timestamp
      • date
    • String type:
      • string
      • varchar
      • char
    • Bool type:
      • boolean
    • Binary type:
      • binary
  • Composite data types:
    • map
    • array

Parameters

The following parameters should be added to the job.reader block, for example:

{
  "job": {
    "reader": {
      "db_name": "test_database",
      "table_name": "test_table"
    }
  }
}

Necessary parameters

Param name            | Required | Optional value | Description
class                 | Yes      |                | Hive read connector class name, com.bytedance.bitsail.connector.legacy.hive.source.HiveInputFormat
db_name               | Yes      |                | Hive database name
table_name            | Yes      |                | Hive table name
metastore_properties  | Yes      |                | Metastore property string in standard JSON format
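
Putting the required parameters together, a minimal job.reader block might look like the sketch below. The database name, table name, and metastore URI are placeholders; hive.metastore.uris is the standard Hive metastore thrift address property, passed inside the JSON-format metastore_properties string.

{
  "job": {
    "reader": {
      "class": "com.bytedance.bitsail.connector.legacy.hive.source.HiveInputFormat",
      "db_name": "test_database",
      "table_name": "test_table",
      "metastore_properties": "{\"hive.metastore.uris\":\"thrift://localhost:9083\"}"
    }
  }
}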

Optional parameters

Param name             | Required | Optional value | Description
partition              | No       |                | The hive partition to read. If not set, read the entire hive table
columns                | No       |                | Describing fields' names and types. If not set, get it from metastore
reader_parallelism_num | No       |                | Read parallelism num
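
As a sketch of the optional parameters, the block below restricts the read to a single partition and spells out the column schema explicitly. The partition value is a placeholder, and the name/type layout assumed for columns should be checked against your table's schema in the metastore.

{
  "job": {
    "reader": {
      "class": "com.bytedance.bitsail.connector.legacy.hive.source.HiveInputFormat",
      "db_name": "test_database",
      "table_name": "test_table",
      "metastore_properties": "{\"hive.metastore.uris\":\"thrift://localhost:9083\"}",
      "partition": "date=20230101",
      "columns": [
        {"name": "id", "type": "bigint"},
        {"name": "name", "type": "string"}
      ],
      "reader_parallelism_num": 2
    }
  }
}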

Hive writer

Supported data types

  • Basic data types supported:
    • Integer type:
      • tinyint
      • smallint
      • int
      • bigint
    • Float type:
      • float
      • double
      • decimal
    • Time type:
      • timestamp
      • date
    • String type:
      • string
      • varchar
      • char
    • Bool type:
      • boolean
    • Binary type:
      • binary
  • Composite data types supported:
    • map
    • array

Parameters

The following parameters should be added to the job.writer block, for example:

{
  "job": {
    "writer": {
      "db_name": "test_database",
      "table_name": "test_table"
    }
  }
}

Necessary parameters

Param name | Required | Optional value | Description
db_name    | Yes      |                | Hive database name
table_name | Yes      |                | Hive table name
partition  | Yes      |                | Hive partition to write
columns    | No       |                | Describing fields' names and types. If not set, get it from metastore
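
For illustration, a job.writer block with the required parameters could look like the following sketch; the database name, table name, and partition value are placeholders rather than values from this document.

{
  "job": {
    "writer": {
      "db_name": "test_database",
      "table_name": "test_table",
      "partition": "date=20230101"
    }
  }
}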

Optional parameters

Param name                   | Default value | Optional value      | Description
date_to_string_as_long       | false         |                     | Whether to convert date data to integer timestamps
null_string_as_null          | false         |                     | Whether to convert null data to null. If false, convert it to an empty string ""
date_precision               | second        | second, millisecond | Precision when converting date data to integer timestamps
convert_error_column_as_null | false         |                     | Whether to set a field that fails conversion to null. If false, throw an exception when the conversion fails
hive_parquet_compression     | gzip          |                     | When the hive file is in parquet format, specifies the compression method of the file
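
Combining the optional parameters with the required ones, a writer block might look like the sketch below; the parameter values shown are illustrative choices, not defaults taken from the source.

{
  "job": {
    "writer": {
      "db_name": "test_database",
      "table_name": "test_table",
      "partition": "date=20230101",
      "date_to_string_as_long": true,
      "date_precision": "millisecond",
      "null_string_as_null": true,
      "convert_error_column_as_null": true,
      "hive_parquet_compression": "gzip"
    }
  }
}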

Configuration examples: Hive connector example