sqoop1实战 数据导入

Last updated on January 17, 2025 am

🧙 Questions

☄️ Ideas

Note:
注意驱动是oracle还是mysql

Mysql数据导入

sqoop import \
    -Dorg.apache.sqoop.export.text.dump_data_on_error=true \
    -Dorg.apache.sqoop.splitter.allow_text_splitter=true \
    --verbose \
    --driver com.mysql.cj.jdbc.Driver \
    --connect "jdbc:oracle:thin:@192.168.15.56:1521:orcl" \
    --username c\#\#jacbomqk0721 \
    --password bomtest$\6880 \
    --query "select PART.MODIFIER,TO_CHAR(PART.UPDATED, 'YYYY-MM-DD'), from PART where \$CONDITIONS" \
    --split-by PART.OID \
    --target-dir /user/hive/warehouse/cdh_dev.db/define_part \
    --delete-target-dir \
    --fields-terminated-by '\001' \
    --hive-drop-import-delims \
    --num-mappers 2 \
    --null-string '\\N' \
    --null-non-string '\\N'
导入到avro类型hive表中

avro数据同步,时间类型字段

sqoop import \
  -Dorg.apache.sqoop.export.text.dump_data_on_error=true \
  -Dorg.apache.sqoop.splitter.allow_text_splitter=true \
  --verbose \
  --connect "jdbc:mysql://47.103.203.73:3306/VATtest" \
  --username admin \
  --password gsw921226 \
  --target-dir /user/hive/warehouse/rd_dev.db/ispong_table_avro3 \
  --query "select ispong_table.username username,ispong_table.age t_age,cast((unix_timestamp(ispong_table.lucky_date) / 60 / 60 / 24) as signed) lucky_date from ispong_table where \$CONDITIONS" \
  --split-by ispong_table.username \
  --hive-drop-import-delims \
  --append \
  --num-mappers 2 \
  --null-string '\\N' \
  --null-non-string '\\N' \
  --fields-terminated-by '\001' \
  --as-avrodatafile \
  --map-column-java lucky_date=Integer

Oracle 同步到 Hive

CREATE TABLE rd_dev.ispong_table_avro4(
    t_age int,
    t_username string,
    lucky_date date
)
  COMMENT "just drop the schema right into the HQL"
  ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
  STORED AS 
  INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
  TBLPROPERTIES (
    'avro.schema.literal'='{
      "namespace": "com.howdy",
      "name": "some_schema",
      "doc":"Sqoop import of QueryResult",
      "tableName": "ispong_table_avro4",
      "type": "record",
      "fields": [ 
            {"name":"T_USERNAME","type":["null","string"],"default":null,"columnName":"T_USERNAME","sqlType":"12"},
            {"name":"T_AGE","type":["null","double"],"default":null,"columnName":"T_AGE","sqlType":"4"},
            {"name":"LUCKY_DATE","type":["null","long"],"default":null,"columnName":"LUCKY_DATE","sqlType":"-5"}
      ]
    }');
CREATE TABLE LEACWH_AVRO_2(
    T_LEAC string,
    AMNTXSCK double
)
  COMMENT "just drop the schema right into the HQL"
  ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
  STORED AS 
  INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
  TBLPROPERTIES (
    'avro.schema.literal'='{
      "namespace": "com.howdy",
      "name": "some_schema",
      "doc":"Sqoop import of QueryResult",
      "tableName": "EPC_ZKVIN",
      "type": "record",
      "fields": [ 
            {"name":"T_LEAC","type":["null","string"],"default":null,"columnName":"T_LEAC","sqlType":"12"},
            {"name":"AMNTXSCK","type":["null","double"],"default":null,"columnName":"AMNTXSCK","sqlType":"4"}
      ]
    }');
sqoop import \
  -Dorg.apache.sqoop.export.text.dump_data_on_error=true \
  -Dorg.apache.sqoop.splitter.allow_text_splitter=true \
  --verbose \
  --connect "jdbc:oracle:thin:@isxcode:51952/EE.oracle.docker" \
  --username ispong \
  --password ispong123 \
  --target-dir /user/hive/warehouse/rd_dev.db/ispong_table_avro4 \
  --query "select ispong_table.t\$username t\$username,ispong_table.t\$age t\$age,TO_NUMBER(TO_DATE(TO_CHAR(ispong_table.lucky_date, 'YYYY-MM-DD'), 'YYYY-MM-DD') - TO_DATE('1970-01-01', 'YYYY-MM-DD')) lucky_date from ispong_table where \$CONDITIONS" \
  --split-by T\$USERNAME \
  --hive-drop-import-delims \
  --delete-target-dir \
  --num-mappers 2 \
  --null-string '\\N' \
  --null-non-string '\\N' \
  --fields-terminated-by '\001' \
  --as-avrodatafile \
  --map-column-java LUCKY_DATE=Integer,T\$AGE=Double
参数说明
  • verbose

    打印更多日志

  • delete-target-dir

    先清除hive表中的所有数据,再执行数据插入

  • append

    将数据库中的数据全部插入

  • query

    手写sql查询语句,注意结尾必须添加 where $CONDITIONS
    必须指定split-by切分键 ,指定的参数未sql中select中的参数
    当需要转换时间格式的时候,可以使用TO_CHAR(PART.UPDATED, ‘YYYY-MM-DD’),对数据做替换

  • hive-drop-import-delims

    删除无用的数据,此参数不加会出现数据同步条数变多的情况

  • num-mappers

    任务并发数,注意当数据比较少的时候,并发数不能设置太多,否则会出现重复插入
    split-by 如果使用-num-mappers>1 则需要指定列表

  • target-dir

    hive表的在hdfs上的存储位置

  • fields-terminated-by

    字段分割符,建议使用 '\001'

  • null-string

    如果string为null如何添加数值,建议使用 '\\N'

  • null-non-string

    如果不是string类型的值为null如何写入数值,建议使用 '\\N'

  • as-avrodatafile

    导入到avro类型表中

set hive.exec.dynamic.partition.mode=nonstrict;


sqoop1实战 数据导入
https://ispong.isxcode.com/hadoop/sqoop/sqoop1实战 数据导入/
Author
ispong
Posted on
November 17, 2021
Licensed under