hive 问题

Last updated on September 15, 2024 pm

🧙 Questions

☄️ Ideas

java.net.SocketTimeoutException: Read timed out

org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timed out
解决方案

这是由于jdbc的登录时间设置的太短了

DriverManager.setLoginTimeout(100);

Permission denied: user=ispong, access=WRITE

2021-04-21 11:22:04.478  INFO 2743 --- [io-30104-exec-8] c.d.d.a.m.h.c.HdfsFIieController         : uploadHdfsFile==> Permission denied: user=ispong, access=WRITE, inode="/":hdfs:supergroup:drwxr-xr-x
        at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:400)
        at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:256)
        at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:194)
        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1855)
        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1839)
        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkAncestorAccess(FSDirectory.java:1798)
        at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.resolvePathForStartFile(FSDirWriteFileOp.java:323)
        at org.apache.ha

2022-03-22 22:36:48 ERROR:Cannot create staging directory 'hdfs://cdh-master:8020/user/hive/warehouse/ispong_test02.db/ods_s_xy_app_count/.hive-staging_hive_2022-03-22_22-30-01_832_8923653986493560028-6': Permission denied: user=root, access=WRITE, inode="/user/hive/warehouse/ispong_test02.db/ods_s_xy_app_count":ispong:hive:drwxr-xr-x
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:400)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:256)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:194)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1855)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1839)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkAncestorAccess(FSDirectory.java:1798)
at org.apache.hadoop.hdfs.server.namenode.FSDirMkdirOp.mkdirs(FSDirMkdirOp.java:60)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:3101)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:1123)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:696)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:869)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:815)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1875)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2675)
2022-03-22 22:36:48 ERROR Current task status: ERROR

解决方案

因为hive的用户依赖于hdfs,hdfs的权限依赖于Linux系统权限。
supergroup是linux默认用户组,但现在系统默认自动创建。
此问题,可以通过新建supergroup用户组,并将用户添加到用户组的方式解决。

# 创建supergroup用户组
sudo groupadd supergroup
# 将ispong用户添加到supergroup用户组中
sudo usermod -a -G supergroup yarn
# 刷新supergroup用户组
newgrp supergroup

统计数据条数不准

使用desc formatted 查询的表条数,和表的大小不一致,hive返回的条数不准确

解决方案
-- 使用analyze解析一下表,不支持分区表
analyze table users compute statistics;

脏数据

定位数据位置
  • 看日志 大多数是时间错位 数字错位
  • 二分法,找到错误数据

^@ NU 0x00 0 NULL (NUL)N \u0000
^A SH 0x01 1 START OF HEADING (SOH) \u0001
^B SX 0x02 2 START OF TEXT (STX) \u0002
^C EX 0x03 3 END OF TEXT (ETX) \u0003
^D ET 0x04 4 END OF TRANSMISSION (EOT) \u0004
^E EQ 0x05 5 ENQUIRY (ENQ) \u0005
^F AK 0x06 6 ACKNOWLEDGE (ACK) \u0006
^G BL 0x07 7 BELL (BEL) \u0007
^H BS 0x08 8 BACKSPACE (BS) \u0008
^I HT 0x09 9 CHARACTER TABULATION (HT) \u0009
^@ LF 0x0a 10 LINE FEED (LF) \u0010
^K VT 0x0b 11 LINE TABULATION (VT) \u0011
^L FF 0x0c 12 FORM FEED (FF) \u0012
^M CR 0x0d 13 CARRIAGE RETURN (CR) \u0013

<!-- 脏数据 -->
162554448^B18757471839^BLJ12FKT36M4012088^B21322337L22230TB2021301SEG^B1^B2022-03-29 23:59:54^B^B\N
162554062^B15168374597^BLJ11KBBD6L1330035^B20A1379^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^B1^B2022-03-29 23:52:27^B^B\N
创建错误数据表
drop table ads_t_user_record_define;

CREATE TABLE IF NOT EXISTS ads_t_user_record_define(
    id INT comment '主键ID',
    user_name STRING comment '用户名',
    vin STRING comment '登录所在车VIN',
    sn STRING comment '登录所在车MP5的SN',
    user_type INT comment '用户类型',
    login_time TIMESTAMP comment '登录时间',
    content STRING comment '内容',
    solution STRING comment '解决方法'
) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\002' LINES TERMINATED BY '\n' NULL DEFINED AS '\\N' STORED AS TEXTFILE
-- #  导入错误数据
insert overwrite table ads_t_user_record_define
    select  id,user_name,vin,sn,user_type,login_time,content,solution  from ads_t_user_record t 
     where t.id in ('162554448','162554062')
#  下载数据
hadoop fs -get /user/hive/warehouse/ispong_test.db/ads_t_user_record_define/part-00000-4878ea6d-62e3-4bb8-98e6-a0b2757f3957-c000 ~/
select regexp_replace(sn,'\u0000','') from ads_t_user_record_define limit 10

hive 问题
https://ispong.isxcode.com/hadoop/hive/hive 问题/
Author
ispong
Posted on
April 30, 2021
Licensed under