hive 问题
Last updated on January 18, 2025 am
🧙 Questions
- java.net.SocketTimeoutException: Read timed out
- [Permission denied: user=ispong, access=WRITE](#Permission denied: user=ispong, access=WRITE)
- Hive字段备注乱码问题
- 统计数据条数不准
☄️ Ideas
java.net.SocketTimeoutException: Read timed out
org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timed out
解决方案
这是由于jdbc的登录时间设置的太短了
DriverManager.setLoginTimeout(100);
Permission denied: user=ispong, access=WRITE
2021-04-21 11:22:04.478 INFO 2743 --- [io-30104-exec-8] c.d.d.a.m.h.c.HdfsFIieController : uploadHdfsFile==> Permission denied: user=ispong, access=WRITE, inode="/":hdfs:supergroup:drwxr-xr-x
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:400)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:256)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:194)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1855)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1839)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkAncestorAccess(FSDirectory.java:1798)
at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.resolvePathForStartFile(FSDirWriteFileOp.java:323)
at org.apache.ha
2022-03-22 22:36:48 ERROR:Cannot create staging directory 'hdfs://cdh-master:8020/user/hive/warehouse/ispong_test02.db/ods_s_xy_app_count/.hive-staging_hive_2022-03-22_22-30-01_832_8923653986493560028-6': Permission denied: user=root, access=WRITE, inode="/user/hive/warehouse/ispong_test02.db/ods_s_xy_app_count":ispong:hive:drwxr-xr-x
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:400)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:256)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:194)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1855)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1839)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkAncestorAccess(FSDirectory.java:1798)
at org.apache.hadoop.hdfs.server.namenode.FSDirMkdirOp.mkdirs(FSDirMkdirOp.java:60)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:3101)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:1123)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:696)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:869)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:815)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1875)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2675)
2022-03-22 22:36:48 ERROR Current task status: ERROR
解决方案
因为hive的用户依赖于hdfs,hdfs的权限依赖于Linux系统权限。
supergroup是linux默认用户组,但现在系统默认自动创建。
此问题,可以通过新建supergroup用户组,并将用户添加到用户组的方式解决。
# 创建supergroup用户组
sudo groupadd supergroup
# 将ispong用户添加到supergroup用户组中
sudo usermod -a -G supergroup yarn
# 刷新supergroup用户组
newgrp supergroup
统计数据条数不准
使用desc formatted 查询的表条数,和表的大小不一致,hive返回的条数不准确
解决方案
-- 使用analyze解析一下表,不支持分区表
analyze table users compute statistics;
脏数据
定位数据位置
- 看日志 大多数是时间错位 数字错位
- 二分法,找到错误数据
^@ NU 0x00 0 NULL (NUL)N \u0000
^A SH 0x01 1 START OF HEADING (SOH) \u0001
^B SX 0x02 2 START OF TEXT (STX) \u0002
^C EX 0x03 3 END OF TEXT (ETX) \u0003
^D ET 0x04 4 END OF TRANSMISSION (EOT) \u0004
^E EQ 0x05 5 ENQUIRY (ENQ) \u0005
^F AK 0x06 6 ACKNOWLEDGE (ACK) \u0006
^G BL 0x07 7 BELL (BEL) \u0007
^H BS 0x08 8 BACKSPACE (BS) \u0008
^I HT 0x09 9 CHARACTER TABULATION (HT) \u0009
^@ LF 0x0a 10 LINE FEED (LF) \u0010
^K VT 0x0b 11 LINE TABULATION (VT) \u0011
^L FF 0x0c 12 FORM FEED (FF) \u0012
^M CR 0x0d 13 CARRIAGE RETURN (CR) \u0013
<!-- 脏数据 -->
162554448^B18757471839^BLJ12FKT36M4012088^B21322337L22230TB2021301SEG^B1^B2022-03-29 23:59:54^B^B\N
162554062^B15168374597^BLJ11KBBD6L1330035^B20A1379^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^B1^B2022-03-29 23:52:27^B^B\N
创建错误数据表
drop table ads_t_user_record_define;
CREATE TABLE IF NOT EXISTS ads_t_user_record_define(
id INT comment '主键ID',
user_name STRING comment '用户名',
vin STRING comment '登录所在车VIN',
sn STRING comment '登录所在车MP5的SN',
user_type INT comment '用户类型',
login_time TIMESTAMP comment '登录时间',
content STRING comment '内容',
solution STRING comment '解决方法'
) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\002' LINES TERMINATED BY '\n' NULL DEFINED AS '\\N' STORED AS TEXTFILE
-- # 导入错误数据
insert overwrite table ads_t_user_record_define
select id,user_name,vin,sn,user_type,login_time,content,solution from ads_t_user_record t
where t.id in ('162554448','162554062')
# 下载数据
hadoop fs -get /user/hive/warehouse/ispong_test.db/ads_t_user_record_define/part-00000-4878ea6d-62e3-4bb8-98e6-a0b2757f3957-c000 ~/
select regexp_replace(sn,'\u0000','') from ads_t_user_record_define limit 10