sqoop1实战 数据导入
Last updated on January 17, 2025 am
🧙 Questions
☄️ Ideas
Note:
注意驱动是oracle还是mysql
Mysql数据导入
sqoop import \
-Dorg.apache.sqoop.export.text.dump_data_on_error=true \
-Dorg.apache.sqoop.splitter.allow_text_splitter=true \
--verbose \
--driver com.mysql.cj.jdbc.Driver \
--connect "jdbc:oracle:thin:@192.168.15.56:1521:orcl" \
--username c\#\#jacbomqk0721 \
--password bomtest$\6880 \
--query "select PART.MODIFIER,TO_CHAR(PART.UPDATED, 'YYYY-MM-DD'), from PART where \$CONDITIONS" \
--split-by PART.OID \
--target-dir /user/hive/warehouse/cdh_dev.db/define_part \
--delete-target-dir \
--fields-terminated-by '\001' \
--hive-drop-import-delims \
--num-mappers 2 \
--null-string '\\N' \
--null-non-string '\\N'
导入到avro类型hive表中
avro数据同步,时间类型字段
sqoop import \
-Dorg.apache.sqoop.export.text.dump_data_on_error=true \
-Dorg.apache.sqoop.splitter.allow_text_splitter=true \
--verbose \
--connect "jdbc:mysql://47.103.203.73:3306/VATtest" \
--username admin \
--password gsw921226 \
--target-dir /user/hive/warehouse/rd_dev.db/ispong_table_avro3 \
--query "select ispong_table.username username,ispong_table.age t_age,cast((unix_timestamp(ispong_table.lucky_date) / 60 / 60 / 24) as signed) lucky_date from ispong_table where \$CONDITIONS" \
--split-by ispong_table.username \
--hive-drop-import-delims \
--append \
--num-mappers 2 \
--null-string '\\N' \
--null-non-string '\\N' \
--fields-terminated-by '\001' \
--as-avrodatafile \
--map-column-java lucky_date=Integer
Oracle 同步到 Hive
CREATE TABLE rd_dev.ispong_table_avro4(
t_age int,
t_username string,
lucky_date date
)
COMMENT "just drop the schema right into the HQL"
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS
INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
TBLPROPERTIES (
'avro.schema.literal'='{
"namespace": "com.howdy",
"name": "some_schema",
"doc":"Sqoop import of QueryResult",
"tableName": "ispong_table_avro4",
"type": "record",
"fields": [
{"name":"T_USERNAME","type":["null","string"],"default":null,"columnName":"T_USERNAME","sqlType":"12"},
{"name":"T_AGE","type":["null","double"],"default":null,"columnName":"T_AGE","sqlType":"4"},
{"name":"LUCKY_DATE","type":["null","long"],"default":null,"columnName":"LUCKY_DATE","sqlType":"-5"}
]
}');
CREATE TABLE LEACWH_AVRO_2(
T_LEAC string,
AMNTXSCK double
)
COMMENT "just drop the schema right into the HQL"
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS
INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
TBLPROPERTIES (
'avro.schema.literal'='{
"namespace": "com.howdy",
"name": "some_schema",
"doc":"Sqoop import of QueryResult",
"tableName": "EPC_ZKVIN",
"type": "record",
"fields": [
{"name":"T_LEAC","type":["null","string"],"default":null,"columnName":"T_LEAC","sqlType":"12"},
{"name":"AMNTXSCK","type":["null","double"],"default":null,"columnName":"AMNTXSCK","sqlType":"4"}
]
}');
sqoop import \
-Dorg.apache.sqoop.export.text.dump_data_on_error=true \
-Dorg.apache.sqoop.splitter.allow_text_splitter=true \
--verbose \
--connect "jdbc:oracle:thin:@isxcode:51952/EE.oracle.docker" \
--username ispong \
--password ispong123 \
--target-dir /user/hive/warehouse/rd_dev.db/ispong_table_avro4 \
--query "select ispong_table.t\$username t\$username,ispong_table.t\$age t\$age,TO_NUMBER(TO_DATE(TO_CHAR(ispong_table.lucky_date, 'YYYY-MM-DD'), 'YYYY-MM-DD') - TO_DATE('1970-01-01', 'YYYY-MM-DD')) lucky_date from ispong_table where \$CONDITIONS" \
--split-by T\$USERNAME \
--hive-drop-import-delims \
--delete-target-dir \
--num-mappers 2 \
--null-string '\\N' \
--null-non-string '\\N' \
--fields-terminated-by '\001' \
--as-avrodatafile \
--map-column-java LUCKY_DATE=Integer,T\$AGE=Double
参数说明
verbose
打印更多日志
delete-target-dir
先清除hive表中的所有数据,再执行数据插入
append
将数据库中的数据全部插入
query
手写sql查询语句,注意结尾必须添加
where $CONDITIONS
必须指定split-by切分键 ,指定的参数未sql中select中的参数
当需要转换时间格式的时候,可以使用TO_CHAR(PART.UPDATED, ‘YYYY-MM-DD’),对数据做替换hive-drop-import-delims
删除无用的数据,此参数不加会出现数据同步条数变多的情况
num-mappers
任务并发数,注意当数据比较少的时候,并发数不能设置太多,否则会出现重复插入
split-by 如果使用-num-mappers>1 则需要指定列表target-dir
hive表的在hdfs上的存储位置
fields-terminated-by
字段分割符,建议使用
'\001'
null-string
如果string为null如何添加数值,建议使用
'\\N'
null-non-string
如果不是string类型的值为null如何写入数值,建议使用
'\\N'
as-avrodatafile
导入到avro类型表中
set hive.exec.dynamic.partition.mode=nonstrict;