sqoop1教程 单节点安装

Last updated on September 15, 2024 pm

🧙 Questions

安装sqoop 1.4.7 版本

☄️ Ideas

下载
# 下载hadoop
nohup wget https://archive.apache.org/dist/sqoop/1.4.7/sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz >> download_sqoop.log 2>&1 &
tail -f download_sqoop.log

# 创建文件夹
sudo mkdir -p /data/sqoop/
sudo chown -R ispong:ispong /data/sqoop/

# 解压且创建软连接
tar -vzxf sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz -C /data/sqoop/
sudo ln -s /data/sqoop/sqoop-1.4.7.bin__hadoop-2.6.0 /opt/sqoop

# 配置环境变量
sudo vim /etc/profile
# === sudo vim /etc/profile ===
export SQOOP_HOME=/opt/sqoop
export PATH=$PATH:$SQOOP_HOME/bin 
# === sudo vim /etc/profile ===
source /etc/profile
配置
cp /opt/sqoop/conf/sqoop-env-template.sh /opt/sqoop/conf/sqoop-env.sh
vim /opt/sqoop/conf/sqoop-env.sh

# === /opt/sqoop/conf/sqoop-env.sh ===
export HADOOP_COMMON_HOME=/opt/hadoop/
export HADOOP_MAPRED_HOME=/opt/hadoop/
export HIVE_HOME=/opt/hive/
# === /opt/sqoop/conf/sqoop-env.sh ===
添加依赖

添加commons-lang

wget https://repo1.maven.org/maven2/commons-lang/commons-lang/2.6/commons-lang-2.6.jar
cp commons-lang-2.6.jar /opt/sqoop/lib/

添加 mysqllib

wget https://dev.mysql.com/get/Downloads/Connector-J/mysql-connector-java-8.0.22.tar.gz
tar vzxf mysql-connector-java-8.0.22.tar.gz
cp mysql-connector-java-8.0.22/mysql-connector-java-8.0.22.jar /opt/sqoop/lib/
测试
CREATE TABLE IF NOT EXISTS default.ispong_table(
    username   STRING  COMMENT 'comment for username',
    age        INT     COMMENT 'comment for age'
    )
    ROW FORMAT 
    DELIMITED FIELDS TERMINATED BY ',' 
    LINES TERMINATED BY '\n'
    STORED AS TEXTFILE;
# 检测
sqoop version

sqoop import \
    -Dorg.apache.sqoop.export.text.dump_data_on_error=true \
    -Dorg.apache.sqoop.splitter.allow_text_splitter=true \
    --verbose \
    --driver com.mysql.cj.jdbc.Driver \
    --connect "jdbc:mysql://192.168.66.66:30102/ispong" \
    --username root \
    --password ispong2021 \
    --query "select ispong_table.nickname,ispong_table.age from ispong_table where \$CONDITIONS" \
    --split-by ispong_table.age \
    --target-dir /user/hive/warehouse/ispong_table \
    --fields-terminated-by ',' \
    --append

title: sqoop1 sqoop1安装
subtitle: Sqoop Sqoop1安装
tags:


🧙 Questions

安装sqoop1

☄️ Ideas

前提

  • java
  • hadoop
  • hive

1. 下载安装包

wget https://mirrors.bfsu.edu.cn/apache/sqoop/1.4.7/sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz -O sqoop-1.4.7.tar.gz
tar -vzxf sqoop-1.4.7.tar.gz
mv sqoop-1.4.7.bin__hadoop-2.6.0 /opt/sqoop

2. 配置环境变量

vim ~/.bashrc
export SQOOP_HOME=/data/sqoop/sqoop1
export JAVA_HOME=/data/jdk1.8.0_271
export HADOOP_HOME=/data/hadoop-3.3.0
export HIVE_HOME=/data/hive
PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$SQOOP_HOME/bin:$HIVE_HOME/bin
source ~/.bashrc
3. 上传数据源驱动
mysql
wget https://dev.mysql.com/get/Downloads/Connector-J/mysql-connector-java-8.0.23.tar.gz -O mysql-connector-jdbc.tar.gz
tar vzxf mysql-connector-jdbc.tar.gz
cp ./mysql-connector-java-8.0.23/mysql-connector-java-8.0.23.jar /sqoop/lib/
hive
cp /hive/lib/hive-jdbc-3.1.2.jar /sqoop/lib
sqlServer
wget https://go.microsoft.com/fwlink/?linkid=2155949 -O sqlserver-connector-jdbc.tar.gz
tar vzxf sqlserver-connector-jdbc.tar.gz
cp ./sqljdbc_9.2/enu/mssql-jdbc-9.2.1.jre8.jar /sqoop/lib
oracle
4. 创建配置文件
cp /data/sqoop/sqoop1/conf/sqoop-env-template.sh /data/sqoop/sqoop1/conf/sqoop-env.sh
vim sqoop-env.sh
export HADOOP_COMMON_HOME=/data/hadoop-3.3.0
export HADOOP_MAPRED_HOME=/data/hadoop-3.3.0
export HIVE_HOME=/data/apache-hive-3.1.2-bin
5. 测试安装
# 查询数据库的源
sqoop list-databases \
	--connect jdbc:mysql://isxcode:30002/data_assets \
	--username ispong_data \
	--password define2021

title: Sqoop Sqoop1 同步命令
subtitle: Sqoop Sqoop1 同步命令
tags:


🧙 Questions

☄️ Ideas

数据导出

sqoop export数据导出,将hive中的数据导出到别的数据库
–driver com.mysql.cj.jdbc.Driver \

hive 导出到 mysql
sqoop export \
    --verbose \
    --connect "jdbc:mysql://isxcode:30102/data_assets" \
    --username ispong \
    --password ispong2021 \
    --table ispong_table \
    --export-dir /user/hive/warehouse/cdh_dev.db/ispong_table \
    --num-mappers 1 \
    --columns username,age,birth \
    --input-fields-terminated-by ',' \
    --input-null-string '' \
    --input-null-non-string 0
导出到hbase中
sqoop export \
    --verbose \
    --driver com.mysql.cj.jdbc.Driver \
    --connect jdbc:mysql://isxcode:30002/data_assets \
    --username root \
    --password ispongdev2020 \
    --table ispong \
    --columns id,username,age \
    --export-dir hdfs://isxcode:30110/user/hive/warehouse/dev.db/ispong \
    --input-fields-terminated-by ' ' \
    --input-lines-terminated-by '\t' \
    --num-mappers 2
常用参数
example desc
–verbose 打印更多的日志
–table ispong 同步到指定表
–columns id,age,username 选择hive中的同步字段,字段顺序需要注意字段类型,必须严格匹配 如果sqoop执行错误,部分合法数据还是会同步进去
–export-dir /data/hive/warehouse/ispong_demo.db/access_log hive的数据来源
–num-mappers 1 任务并发数
–input-fields-terminated-by ‘ ‘ 字段分割符
–input-lines-terminated-by ‘\t’ 行分割符
–update-mode allowinsert 配合update-key生效,更新模式 updateonly(default 只更新) / allowinsert (允许插入)
–update-key id 配合update-mode生效,如果多个主键,使用逗号隔开
增量更新
sqoop export \
    --verbose \
    --connect jdbc:mysql://isxcode:30002/data_assets \
    --username root \
    --password ispongdev2020 \
    --table ispong \
    --columns id,username,age,last_date \
    --export-dir hdfs://isxcode:30110/user/hive/warehouse/dev.db/ispong \
    --input-fields-terminated-by ' ' \
    --input-lines-terminated-by '\t' \
    --num-mappers 2 \
    --update-mode updateonly \
    --update-key id

数据导入

将其他数据导入到hive表中

mysql 到 hive

–driver com.mysql.cj.jdbc.Driver \

sqoop import \
    -Dorg.apache.sqoop.splitter.allow_text_splitter=true \
    --verbose \
    --connect "jdbc:mysql://isxcode:30102/data_assets" \
    --username ispong \
    --password ispong2021 \
    --target-dir /user/hive/warehouse/cdh_dev.db/ispong_table \
    --query "select ispong_table.username,ispong_table.age,ispong_table.birth from ispong_table where \$CONDITIONS" \
    --split-by ispong_table.username \
    --hive-drop-import-delims \
    --delete-target-dir \
    --num-mappers 1 \
    --fields-terminated-by ',' \
    --null-string ' ' \
    --null-non-string 0
oracle 到 hive

如果hive的字段类型为date,则oracle需要转化时间格式

sqoop import \
    -Dorg.apache.sqoop.splitter.allow_text_splitter=true \
    --verbose \
    --connect "jdbc:oracle:thin:@isxcode:51952/EE.oracle.docker" \
    --username ispong \
    --password ispong123 \
    --target-dir /user/hive/warehouse/cdh_dev.db/ispong_table \
    --query "select ISPONG_TABLE.USERNAME,ISPONG_TABLE.AGE,TO_CHAR(ISPONG_TABLE.BIRTH, 'YYYY-MM-DD') from ISPONG_TABLE where \$CONDITIONS" \
    --split-by ispong_table.USERNAME \
    --hive-drop-import-delims \
    --delete-target-dir \
    --num-mappers 1 \
    --fields-terminated-by "," \
    --null-string " " \
    --null-non-string 0

sqoop import \
    -Dorg.apache.sqoop.splitter.allow_text_splitter=true \
    --verbose \
    --connect "jdbc:oracle:thin:@isxcode:51952/EE.oracle.docker" \
    --username ispong \
    --password ispong123 \
    --target-dir /user/hive/warehouse/cdh_dev.db/ispong_table \
    --query "select ISPONG_TABLE.USERNAME,ISPONG_TABLE.AGE,ISPONG_TABLE.BIRTH from ISPONG_TABLE where \$CONDITIONS" \
    --split-by ispong_table.USERNAME \
    --hive-drop-import-delims \
    --delete-target-dir \
    --num-mappers 1 \
    --fields-terminated-by "," \
    --null-string " " \
    --null-non-string 0
mysql导入hbase数据
sqoop import \
    -Dorg.apache.sqoop.splitter.allow_text_splitter=true \
    -Dsqoop.hbase.add.row.key=true \
    --verbose \
    --connect jdbc:mysql://isxcode:30102/data_assets \
    --username ispong \
    --password ispong2021 \
    --query 'select age,username from ispong_table where $CONDITIONS' \
    --split-by age \
    --hbase-table ispong_dev:ispong_table \
    --column-family ispong \
    --hbase-row-key age
example desc
-Dsqoop.hbase.add.row.key=true 添加
–hbase-table hbase表名
–column-family hbase列族 mysql中数据只可以同步到一个列族中
–hbase-row-key age 选择mysql中一个字段作为rowKey去同步数据
常用参数
example desc
-Dorg.apache.sqoop.splitter.allow_text_splitter=true 必须放第一行
–append 数据全部插入
–delete-target-dir 清理hive表数据后再插入数据
–query ‘select id,title,bv_num from popular where $CONDITIONS’ 自定义sql查询 必须有split-by字段,字段类型顺序需要注意
–split-by id 切分键
–num-mappers 1 任务并发数
–target-dir hdfs://isxcode:30110/user/hive/warehouse/dev.db/ispong hive数据的存储位置
–fields-terminated-by ‘ ‘ 字段分割符
–null-string ‘ ‘ null值是string类型的 赋予默认值
–null-non-string 0 null值不是string类型的 赋予默认值 当前一个字段为空格的时候 无法识别
增量更新

sqoop 会默认不得超过当前最新时间,设置时间字段,同步比当前时间之后的数据

sqoop import \
    -Dorg.apache.sqoop.splitter.allow_text_splitter=true \
    --verbose \
    --driver com.mysql.cj.jdbc.Driver \
    --connect jdbc:mysql://isxcode:30002/data_assets \
    --username root \
    --password ispongdev2020 \
    --target-dir hdfs://isxcode:30110/user/hive/warehouse/dev.db/ispong \
    --query ' select id,username,age,last_date from ispong where $CONDITIONS' \
    --split-by id \
    --num-mappers 2 \
    --fields-terminated-by ' ' \
    --null-string ' ' \
    --null-non-string 0 \
    --append \
    --check-column last_date \
    --incremental lastmodified\
    --last-value '2021-06-03'
同步数据到分区表

Note:
分区键数据不可以同步
–append 模式不可用 ,只有–delete-target-dir模式可用
直接指定hive的地址 同步

sqoop import \
    -Dorg.apache.sqoop.splitter.allow_text_splitter=true \
    --verbose \
    --connect jdbc:mysql://isxcode:30002/data_assets \
    --username root \
    --password ispongdev2020 \
    --query 'select id,username,age from ispong where $CONDITIONS' \
    --target-dir hdfs://isxcode:30110/user/hive/warehouse/dev.db/ispong/last_date=2020-12-12/prev_date=2020-12-14 \
    --split-by id \
    --num-mappers 2 \
    --fields-terminated-by ' ' \
    --delete-target-dir

sqoop1教程 单节点安装
https://ispong.isxcode.com/hadoop/sqoop/sqoop1教程 单节点安装/
Author
ispong
Posted on
February 12, 2022
Licensed under