Sqoop1 Tutorial: Single-Node Installation
🧙 Questions
Install Sqoop version 1.4.7
☄️ Ideas
Download
# Download Sqoop
nohup wget https://archive.apache.org/dist/sqoop/1.4.7/sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz >> download_sqoop.log 2>&1 &
tail -f download_sqoop.log
# Create the installation directory
sudo mkdir -p /data/sqoop/
sudo chown -R ispong:ispong /data/sqoop/
# Extract and create a symlink
tar -vzxf sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz -C /data/sqoop/
sudo ln -s /data/sqoop/sqoop-1.4.7.bin__hadoop-2.6.0 /opt/sqoop
# Configure environment variables
sudo vim /etc/profile
# === sudo vim /etc/profile ===
export SQOOP_HOME=/opt/sqoop
export PATH=$PATH:$SQOOP_HOME/bin
# === sudo vim /etc/profile ===
source /etc/profile
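To confirm the variables took effect, a quick check (assuming a freshly sourced shell):
# SQOOP_HOME should resolve and the sqoop binary should be on PATH
echo $SQOOP_HOME
which sqoop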
Configuration
cp /opt/sqoop/conf/sqoop-env-template.sh /opt/sqoop/conf/sqoop-env.sh
vim /opt/sqoop/conf/sqoop-env.sh
# === /opt/sqoop/conf/sqoop-env.sh ===
export HADOOP_COMMON_HOME=/opt/hadoop/
export HADOOP_MAPRED_HOME=/opt/hadoop/
export HIVE_HOME=/opt/hive/
# === /opt/sqoop/conf/sqoop-env.sh ===
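The paths above assume Hadoop and Hive are already installed (or symlinked) under /opt; a quick sanity check before moving on:
# Verify the Hadoop and Hive homes referenced in sqoop-env.sh exist
ls -d /opt/hadoop /opt/hive
ls /opt/hadoop/bin/hadoop /opt/hive/bin/hive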
Add dependencies
Add commons-lang
wget https://repo1.maven.org/maven2/commons-lang/commons-lang/2.6/commons-lang-2.6.jar
cp commons-lang-2.6.jar /opt/sqoop/lib/
Add the MySQL JDBC driver
wget https://dev.mysql.com/get/Downloads/Connector-J/mysql-connector-java-8.0.22.tar.gz
tar vzxf mysql-connector-java-8.0.22.tar.gz
cp mysql-connector-java-8.0.22/mysql-connector-java-8.0.22.jar /opt/sqoop/lib/
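To make sure both jars landed in Sqoop's lib directory:
# Both jars should be listed
ls /opt/sqoop/lib | grep -E 'commons-lang-2.6|mysql-connector-java-8.0.22'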
Test
CREATE TABLE IF NOT EXISTS default.ispong_table(
username STRING COMMENT 'comment for username',
age INT COMMENT 'comment for age'
)
ROW FORMAT
DELIMITED FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
STORED AS TEXTFILE;
# Check the Sqoop version
sqoop version
sqoop import \
-Dorg.apache.sqoop.export.text.dump_data_on_error=true \
-Dorg.apache.sqoop.splitter.allow_text_splitter=true \
--verbose \
--driver com.mysql.cj.jdbc.Driver \
--connect "jdbc:mysql://192.168.66.66:30102/ispong" \
--username root \
--password ispong2021 \
--query "select ispong_table.nickname,ispong_table.age from ispong_table where \$CONDITIONS" \
--split-by ispong_table.age \
--target-dir /user/hive/warehouse/ispong_table \
--fields-terminated-by ',' \
--append
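After the import finishes, the data can be checked on HDFS and through Hive; a minimal verification, assuming the hdfs and hive clients are on PATH:
# Imported part files should appear under the table directory
hdfs dfs -ls /user/hive/warehouse/ispong_table
# The rows should be readable through the Hive table created above
hive -e "SELECT * FROM default.ispong_table LIMIT 10;"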
🔗 Links
Sqoop1 Installation
🧙 Questions
Install Sqoop1
☄️ Ideas
Prerequisites
- Java
- Hadoop
- Hive
1. Download the installation package
wget https://mirrors.bfsu.edu.cn/apache/sqoop/1.4.7/sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz -O sqoop-1.4.7.tar.gz
tar -vzxf sqoop-1.4.7.tar.gz
mv sqoop-1.4.7.bin__hadoop-2.6.0 /opt/sqoop
2. Configure environment variables
vim ~/.bashrc
export SQOOP_HOME=/opt/sqoop
export JAVA_HOME=/data/jdk1.8.0_271
export HADOOP_HOME=/data/hadoop-3.3.0
export HIVE_HOME=/data/hive
PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$SQOOP_HOME/bin:$HIVE_HOME/bin
source ~/.bashrc
3. Add data source JDBC drivers
MySQL
wget https://dev.mysql.com/get/Downloads/Connector-J/mysql-connector-java-8.0.23.tar.gz -O mysql-connector-jdbc.tar.gz
tar vzxf mysql-connector-jdbc.tar.gz
cp ./mysql-connector-java-8.0.23/mysql-connector-java-8.0.23.jar $SQOOP_HOME/lib/
Hive
cp $HIVE_HOME/lib/hive-jdbc-3.1.2.jar $SQOOP_HOME/lib/
SQL Server
wget https://go.microsoft.com/fwlink/?linkid=2155949 -O sqlserver-connector-jdbc.tar.gz
tar vzxf sqlserver-connector-jdbc.tar.gz
cp ./sqljdbc_9.2/enu/mssql-jdbc-9.2.1.jre8.jar $SQOOP_HOME/lib/
Oracle
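The Oracle driver follows the same pattern; a sketch, assuming the ojdbc8 jar has already been downloaded from Oracle's site or Maven Central (the exact version is up to you):
# Copy the Oracle JDBC driver into Sqoop's lib directory (jar name is illustrative)
cp ojdbc8.jar $SQOOP_HOME/lib/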
4. Create the configuration file
cp $SQOOP_HOME/conf/sqoop-env-template.sh $SQOOP_HOME/conf/sqoop-env.sh
vim $SQOOP_HOME/conf/sqoop-env.sh
export HADOOP_COMMON_HOME=/data/hadoop-3.3.0
export HADOOP_MAPRED_HOME=/data/hadoop-3.3.0
export HIVE_HOME=/data/apache-hive-3.1.2-bin
5. Test the installation
# List the databases on the source
sqoop list-databases \
--connect jdbc:mysql://isxcode:30002/data_assets \
--username ispong_data \
--password define2021
Sqoop1 Sync Commands
🧙 Questions
☄️ Ideas
Data Export
sqoop export exports data from Hive (HDFS) into another database.
If the driver cannot be inferred from the JDBC URL, add --driver com.mysql.cj.jdbc.Driver to the command.
Hive to MySQL
sqoop export \
--verbose \
--connect "jdbc:mysql://isxcode:30102/data_assets" \
--username ispong \
--password ispong2021 \
--table ispong_table \
--export-dir /user/hive/warehouse/cdh_dev.db/ispong_table \
--num-mappers 1 \
--columns username,age,birth \
--input-fields-terminated-by ',' \
--input-null-string '' \
--input-null-non-string 0
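sqoop export only writes into a table that already exists in MySQL, so a matching table has to be created first; a minimal sketch, assuming the mysql client can reach the target instance and that these column types fit the Hive data:
# Hypothetical DDL for the export target; adjust the types to the real data
mysql -h isxcode -P 30102 -u ispong -p -e "
CREATE TABLE IF NOT EXISTS data_assets.ispong_table (
  username VARCHAR(100),
  age INT,
  birth DATE
);"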
HDFS to MySQL (explicit driver and full HDFS path)
sqoop export \
--verbose \
--driver com.mysql.cj.jdbc.Driver \
--connect jdbc:mysql://isxcode:30002/data_assets \
--username root \
--password ispongdev2020 \
--table ispong \
--columns id,username,age \
--export-dir hdfs://isxcode:30110/user/hive/warehouse/dev.db/ispong \
--input-fields-terminated-by ' ' \
--input-lines-terminated-by '\t' \
--num-mappers 2
Common parameters
example | desc |
---|---|
--verbose | print verbose logs |
--table ispong | target table to export into |
--columns id,age,username | columns to export from Hive; mind the column order and types, which must strictly match the target table. If the Sqoop job fails partway, rows that were already valid may still end up in the target |
--export-dir /data/hive/warehouse/ispong_demo.db/access_log | HDFS directory holding the Hive data to export |
--num-mappers 1 | number of parallel map tasks |
--input-fields-terminated-by ' ' | field delimiter of the source files |
--input-lines-terminated-by '\t' | line delimiter of the source files |
--update-mode allowinsert | takes effect together with --update-key; update mode: updateonly (default, only updates existing rows) / allowinsert (also allows inserts) |
--update-key id | takes effect together with --update-mode; for multiple key columns, separate them with commas |
Incremental update (export)
sqoop export \
--verbose \
--connect jdbc:mysql://isxcode:30002/data_assets \
--username root \
--password ispongdev2020 \
--table ispong \
--columns id,username,age,last_date \
--export-dir hdfs://isxcode:30110/user/hive/warehouse/dev.db/ispong \
--input-fields-terminated-by ' ' \
--input-lines-terminated-by '\t' \
--num-mappers 2 \
--update-mode updateonly \
--update-key id
Data Import
Import data from other databases into Hive tables.
MySQL to Hive
If the driver cannot be inferred from the JDBC URL, add --driver com.mysql.cj.jdbc.Driver to the command below.
sqoop import \
-Dorg.apache.sqoop.splitter.allow_text_splitter=true \
--verbose \
--connect "jdbc:mysql://isxcode:30102/data_assets" \
--username ispong \
--password ispong2021 \
--target-dir /user/hive/warehouse/cdh_dev.db/ispong_table \
--query "select ispong_table.username,ispong_table.age,ispong_table.birth from ispong_table where \$CONDITIONS" \
--split-by ispong_table.username \
--hive-drop-import-delims \
--delete-target-dir \
--num-mappers 1 \
--fields-terminated-by ',' \
--null-string ' ' \
--null-non-string 0
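The command above only writes delimited files into the table's HDFS directory, so the Hive table itself must already exist. Alternatively, Sqoop can load the data into the Hive metastore directly with --hive-import; a rough sketch, assuming the same source table and a Hive database named cdh_dev:
sqoop import \
--connect "jdbc:mysql://isxcode:30102/data_assets" \
--username ispong \
--password ispong2021 \
--table ispong_table \
--hive-import \
--hive-table cdh_dev.ispong_table \
--num-mappers 1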
Oracle to Hive
If the Hive column type is date, the Oracle date column needs to be converted to a matching string format first (first example); otherwise the column can be imported as-is (second example).
sqoop import \
-Dorg.apache.sqoop.splitter.allow_text_splitter=true \
--verbose \
--connect "jdbc:oracle:thin:@isxcode:51952/EE.oracle.docker" \
--username ispong \
--password ispong123 \
--target-dir /user/hive/warehouse/cdh_dev.db/ispong_table \
--query "select ISPONG_TABLE.USERNAME,ISPONG_TABLE.AGE,TO_CHAR(ISPONG_TABLE.BIRTH, 'YYYY-MM-DD') from ISPONG_TABLE where \$CONDITIONS" \
--split-by ispong_table.USERNAME \
--hive-drop-import-delims \
--delete-target-dir \
--num-mappers 1 \
--fields-terminated-by "," \
--null-string " " \
--null-non-string 0
sqoop import \
-Dorg.apache.sqoop.splitter.allow_text_splitter=true \
--verbose \
--connect "jdbc:oracle:thin:@isxcode:51952/EE.oracle.docker" \
--username ispong \
--password ispong123 \
--target-dir /user/hive/warehouse/cdh_dev.db/ispong_table \
--query "select ISPONG_TABLE.USERNAME,ISPONG_TABLE.AGE,ISPONG_TABLE.BIRTH from ISPONG_TABLE where \$CONDITIONS" \
--split-by ispong_table.USERNAME \
--hive-drop-import-delims \
--delete-target-dir \
--num-mappers 1 \
--fields-terminated-by "," \
--null-string " " \
--null-non-string 0
MySQL to HBase
sqoop import \
-Dorg.apache.sqoop.splitter.allow_text_splitter=true \
-Dsqoop.hbase.add.row.key=true \
--verbose \
--connect jdbc:mysql://isxcode:30102/data_assets \
--username ispong \
--password ispong2021 \
--query 'select age,username from ispong_table where $CONDITIONS' \
--split-by age \
--hbase-table ispong_dev:ispong_table \
--column-family ispong \
--hbase-row-key age
example | desc |
---|---|
-Dsqoop.hbase.add.row.key=true | also store the row key column itself as a cell in the column family |
--hbase-table | HBase table name |
--column-family | HBase column family; the MySQL data can only be synced into a single column family |
--hbase-row-key age | use one MySQL column as the row key when syncing |
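The HBase namespace, table, and column family must exist before the import (or add --hbase-create-table to let Sqoop create the table); a sketch of pre-creating them, assuming the hbase shell is available:
# Create the namespace, table, and column family used in the example above
echo "create_namespace 'ispong_dev'
create 'ispong_dev:ispong_table', 'ispong'" | hbase shell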
Common parameters
example | desc |
---|---|
-Dorg.apache.sqoop.splitter.allow_text_splitter=true | must come first, right after sqoop import |
--append | append all imported rows to the target directory |
--delete-target-dir | delete the target directory (the Hive table data) before importing |
--query 'select id,title,bv_num from popular where $CONDITIONS' | custom SQL query; a --split-by column is required, and mind the column order and types |
--split-by id | split column |
--num-mappers 1 | number of parallel map tasks |
--target-dir hdfs://isxcode:30110/user/hive/warehouse/dev.db/ispong | HDFS location where the Hive data is stored |
--fields-terminated-by ' ' | field delimiter |
--null-string ' ' | default value assigned to null string-type columns |
--null-non-string 0 | default value assigned to null non-string columns; a field containing only a space cannot be recognized as null |
Incremental update (import)
For lastmodified incremental imports, pick a timestamp column as the check column; Sqoop imports rows whose check column is later than --last-value, and by default does not go past the time the job starts.
sqoop import \
-Dorg.apache.sqoop.splitter.allow_text_splitter=true \
--verbose \
--driver com.mysql.cj.jdbc.Driver \
--connect jdbc:mysql://isxcode:30002/data_assets \
--username root \
--password ispongdev2020 \
--target-dir hdfs://isxcode:30110/user/hive/warehouse/dev.db/ispong \
--query ' select id,username,age,last_date from ispong where $CONDITIONS' \
--split-by id \
--num-mappers 2 \
--fields-terminated-by ' ' \
--null-string ' ' \
--null-non-string 0 \
--append \
--check-column last_date \
--incremental lastmodified \
--last-value '2021-06-03'
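Keeping --last-value up to date by hand is error-prone; Sqoop can track it automatically when the import is saved as a job. A sketch of the same incremental import as a saved job (the job name is illustrative):
# Create a saved job; Sqoop records the new last-value after each run
sqoop job --create ispong_incr -- import \
--connect jdbc:mysql://isxcode:30002/data_assets \
--username root \
--password ispongdev2020 \
--target-dir hdfs://isxcode:30110/user/hive/warehouse/dev.db/ispong \
--query 'select id,username,age,last_date from ispong where $CONDITIONS' \
--split-by id \
--append \
--check-column last_date \
--incremental lastmodified \
--last-value '2021-06-03'
# Run the job; subsequent runs continue from the stored last-value
sqoop job --exec ispong_incr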
Sync data into a partitioned table
Note:
The partition key columns themselves cannot be synced (their values are carried by the directory path).
--append mode is not usable here; only --delete-target-dir mode works.
Sync by pointing --target-dir directly at the Hive partition directory.
sqoop import \
-Dorg.apache.sqoop.splitter.allow_text_splitter=true \
--verbose \
--connect jdbc:mysql://isxcode:30002/data_assets \
--username root \
--password ispongdev2020 \
--query 'select id,username,age from ispong where $CONDITIONS' \
--target-dir hdfs://isxcode:30110/user/hive/warehouse/dev.db/ispong/last_date=2020-12-12/prev_date=2020-12-14 \
--split-by id \
--num-mappers 2 \
--fields-terminated-by ' ' \
--delete-target-dir
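Writing straight into the partition directory does not register the partition with the Hive metastore, so it has to be added afterwards; a sketch assuming the Hive table is dev.ispong with partition keys last_date and prev_date:
# Register the freshly written partition so Hive can see the data
hive -e "ALTER TABLE dev.ispong ADD IF NOT EXISTS PARTITION (last_date='2020-12-12', prev_date='2020-12-14');"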
🔗 Links