bitsail-component-format-flink-hive
bitsail-component-format-flink-hive
Parent document: bitsail-component-format-flink
This module provides HiveGeneralRowBuilder
for supportinig converting hive Writable
data into Row
.
how to use
The working principle is to first obtain the meta information of the target hive table from the hive metastore, and then convert data according to the meta information ObjectInspector
.
So we need two kinds of parameters to construct a HiveGeneralRowBuilder
:
- Parameters for getting hive meta information:
database
: hive database nametable
: hive table namehiveProperties
: Properties of hive configuration to connect to hive metastore, which is stored as a Map.
columnMapping
: The fields order of row to construct, which si stored as Map<String, Integer>. Map key is the field name, while value is the index of this field in hive table.
Example
Take the following hive table test_db.test_table
as a example:
- Thrift uri for hive metastore is:
thrift://localhost:9083
field name | field type |
---|---|
id | BIGINT |
state | STRING |
county | STRING |
So we can use the following codes to construct a HiveGeneralRowBuilder
:
Map<String, Integer> columnMapping = ImmutableMap.of(
"id", 0,
"state", 1,
"county", 2
);
RowBuilder rowBuilder = new HiveGeneralRowBuilder(
columnMapping,
"test_db",
"test_table",
ImmutableMap.of("metastore_uri", "thrift://localhost:9083")
);
How to parse writable
To parse Writable
data, one needs deserializer
and ObjectInspector
information from hive table.
HiveGeneralRowBuilder
supports getting these meta information according to the hive information (including database, table, and some other properties used to connect to the metastore) passed in by the user.
// step1. Get hive meta info from metastore.
HiveMetaClientUtil.init();
HiveConf hiveConf = HiveMetaClientUtil.getHiveConf(hiveProperties);
StorageDescriptor storageDescriptor = HiveMetaClientUtil.getTableFormat(hiveConf, db, table);
// step2. Construct deserializer.
deserializer = (Deserializer) Class.forName(storageDescriptor.getSerdeInfo().getSerializationLib()).newInstance();
SerDeUtils.initializeSerDe(deserializer, conf, properties, null);
// step3. Construct `ObjectInspector`
structObjectInspector = (StructObjectInspector) deserializer.getObjectInspector();
structFields = structObjectInspector.getAllStructFieldRefs();
How to Convert to Row
HiveGeneralRowBuilder
implements the following interface to convert hive data to a Row
:
void build(Object objectValue, Row reuse, String mandatoryEncoding, RowTypeInfo rowTypeInfo) throws BitSailException {
In this method, it has three steps:
According to "How to parse writable", it gets the
deseiralizer
andstructFields
for parsing.According to
rowTypeInfo
, it extracts fields in order fromcolumnMapping
andstructField
. Based on these two information, it can extract the raw data.According to the field type in
rowTypeInfo
, it converts the extracted raw data intocom.bytedance.bitsail.common.column.Column
, and then wraps it withorg.apache.flink.types.Row
.
Take the above mentioned hive table test_db.test_table
as an example, one can build rowTypeInfo
and columnMapping
as follows:
import com.bytedance.bitsail.flink.core.typeinfo.PrimitiveColumnTypeInfo;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.api.java.typeutils.RowTypeInfo;
TypeInformation<?>[] fieldTypes = new TypeInformation[] {
PrimitiveColumnTypeInfo.LONG_COLUMN_TYPE_INFO,
PrimitiveColumnTypeInfo.STRING_COLUMN_TYPE_INFO,
PrimitiveColumnTypeInfo.STRING_COLUMN_TYPE_INFO
};
RowTypeInfo rowTypeInfo = new RowTypeInfo(
fieldTypes,
new String[] {"id_field", "state_field", "county_field"}
);
Map<String, Integer> columnMapping = ImmutableMap.of(
"id_field", 0,
"state_field", 1,
"county_field", 2
);
Using the above rowTypeInfo
and columnMapping
, one can get a row
of field id
, state
, county
by calling build method.
Supported data types
HiveGeneralRowBuiler
supports parsing common hive built-in data types, including all basic data types, and two complex data types, Map and List.
We support some types of data type conversion as follows:
Hive data type | You can convert the hive data type to | Description |
---|---|---|
TINYINT SMALLINT INT BIGINT | 1. StringColumn 2. LongColumn 3. DoubleColumn | Take 1234L as an example,the converted columns are:1. StringColumn : "1234" 2. LongColumn : 1234 3. DoubleColumn : 1234.0 |
BOOLEAN | 1. StringColumn 2. BooleanColumn | Take false as an example,the converted columns are:1. StringColumn : "false" 2. BooleanColumn : false |
FLOAT DOUBLE DECIMAL | 1. StringColumn 2. DoubleColumn | Take 3.141592 as an example,the converted columns are:1. StringColumn : "3.141592" 2. DoubleColumn : 3.141592 |
STRING CHAR VARCHAR | 1. StringColumn 2. LongColumn 3. DoubleColumn 4. BooleanColumn 5. DateColumn | 1. LongColumn : Use BigDecimal to convert string to integer.2. DoubleColumn : Use Double.parseDouble to convert string to float number3. BooleanColumn : Only recognize"0", "1", "true", "false" |
BINARY | 1. StringColumn 2. BytesColumn | Take byte[]{1, 2, 3} as an example,the converted columns are:1. StringColumn : "[B@1d29cf23" 2. BytesColumn : AQID |
TIMESTAMP | 1. StringColumn 2. LongColumn | Take 2022-01-01 10:00:00 as an example,the converted columns are:1. StringColumn : "2022-01-01 10:00:00" 2. LongColumn : 1641002400 |
DATE | 1. StringColumn 2. DateColumn 3. LongColumn | Take 2022-01-01 as example,the converted columns are:1. StringColumn : "2022-01-01" 2. DateColumn : 2022-01-01 3. LongColumn : 1640966400 |
Example
Take the above mentioned hive table test_db.test_table
as an example,the following codes show how to convert the Writable
data in thie hive table to Row
.
field name | field type |
---|---|
id | BIGINT |
state | STRING |
county | STRING |
- Thrift uri of metastore:
thrift://localhost:9083
/**
* @param rawData Writable data for building a row.
*/
public Row buildRow(Writable rawData) {
// 1. Initialize hive row builder
String database = "test_db";
String table = "test_table";
Map<String, Integer> columnMapping = ImmutableMap.of(
"id", 0,
"state", 1,
"county", 2
);
Map<String, String> hiveProperties = ImmutableMap.of(
"metastore_uri", "thrift://localhost:9083"
);
RowBuilder rowBuilde = new HiveGeneralRowBuilder(
columnMapping, database, table, hiveProperties
);
// 2. Construct row type infomation.
TypeInformation<?>[] typeInformationList = {
PrimitiveColumnTypeInfo.LONG_COLUMN_TYPE_INFO,
PrimitiveColumnTypeInfo.STRING_COLUMN_TYPE_INFO,
PrimitiveColumnTypeInfo.STRING_COLUMN_TYPE_INFO
};
RowTypeInfo rowTypeInfo = new RowTypeInfo(typeInformationList,
new String[] {"id", "state", "county"}
);
// 3. Parse rawData and build row.
Row reuse = new Row(3);
rowBuilder.build(rawData, reuse, "UTF-8", rowTypeInfo);
return reuse;
}