CDH Troubleshooting Collection

Last updated on November 22, 2024 pm

🧙 Questions

☄️ Ideas

Request to the Service Monitor failed. This may cause slow page responses

java.util.NoSuchElementException: This installation is currently running Cloudera Express.
        at com.cloudera.api.dao.impl.LicenseManagerDaoImpl.readLicense(LicenseManagerDaoImpl.java:86)
        at com.cloudera.api.v1.impl.ClouderaManagerResourceImpl.readLicense(ClouderaManagerResourceImpl.java:57)
        at com.cloudera.api.v32.impl.ClouderaManagerResourceV32Impl.readLicense(ClouderaManagerResourceV32Impl.java:56)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
java.util.concurrent.ExecutionException: java.net.ConnectException: Connection refused: /172.23.39.205:30108
        at com.ning.http.client.providers.netty.future.NettyResponseFuture.abort(NettyResponseFuture.java:231)
        at com.ning.http.client.providers.netty.request.NettyConnectListener.onFutureFailure(NettyConnectListener.java:137)
        at com.ning.http.client.providers.netty.request.NettyConnectListener.operationComplete(NettyConnectListener.java:145)
        at org.jboss.netty.channel.DefaultChannelFuture.notifyListener(DefaultChannelFuture.java:409)
        at org.jboss.netty.channel.DefaultChannelFuture.notifyListeners(DefaultChannelFuture.java:400)
        at org.jboss.netty.channel.DefaultChannelFuture.setFailure(DefaultChannelFuture.java:362)
        at org.jboss.netty.channel.socket.nio.NioClientBoss.processSelectedKeys(NioClientBoss.java:109)
        at org.jboss.netty.channel.socket.nio.NioClientBoss.process(NioClientBoss.java:79)
        at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337)
        at org.jboss.netty.channel.socket.nio.NioClientBoss.run(NioClientBoss.java:42)
        at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
        at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.ConnectException: Connection refused: /172.23.39.205:30108
        at com.ning.http.client.providers.netty.request.NettyConnectListener.onFutureFailure(NettyConnectListener.java:133)
        ... 13 more
Caused by: java.net.ConnectException: Connection refused: /172.23.39.205:30108
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
        at org.jboss.netty.channel.socket.nio.NioClientBoss.connect(NioClientBoss.java:152)
        at org.jboss.netty.channel.socket.nio.NioClientBoss.processSelectedKeys(NioClientBoss.java:105)
        ... 8 more
2021-09-29 10:23:19,382 ERROR ParcelUpdateService:com.cloudera.parcel.components.ParcelDownloaderImpl: Unable to retrieve remote parcel repository manifest
Request to the Service Monitor failed. This may cause slow page responses
Unable to issue query: the Service Monitor is not running
Can't open /var/run/cloudera-scm-agent/process/328-cloudera-mgmt-SERVICEMONITOR/supervisor_status: Permission denied.
Error creating LevelDB timeseries store in directory /data/cloudera/cloudera-service-monitor/ts
java.io.IOException: Unable to create new directory at /data/cloudera/cloudera-service-monitor/ts/ts_entity_metadata
	at com.cloudera.cmon.tstore.leveldb.LDBUtils.openVersionedDB(LDBUtils.java:257)
	at com.cloudera.cmon.tstore.leveldb.LDBUtils.openVersionedDB(LDBUtils.java:212)
	at com.cloudera.cmon.tstore.leveldb.LDBTimeSeriesMetadataStore.openMetadataDB(LDBTimeSeriesMetadataStore.java:147)
	at com.cloudera.cmon.tstore.leveldb.LDBTimeSeriesMetadataStore.<init>(LDBTimeSeriesMetadataStore.java:137)
	at com.cloudera.cmon.firehose.Main.main(Main.java:477)
Solution
# The CDH Service Monitor fails to start
In Cloudera Manager, open the Cloudera Management Service (CMS), go to Instances, and restart the Service Monitor
# sudo chmod -R 777 /var/run/cloudera-scm-agent

# Check the agent status, then restart it
sudo systemctl status cloudera-scm-agent 
sudo systemctl restart cloudera-scm-agent

# Recreate the Service Monitor storage directory
mkdir /data/cloudera/cloudera-service-monitor/ts
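
Both failures above (`Permission denied` on the agent's process directory and `Unable to create new directory` under the LevelDB store) point to a path the Service Monitor cannot write to. A minimal pre-flight sketch, assuming bash; the function name `check_writable_dir` is hypothetical and not part of Cloudera Manager:

```bash
#!/usr/bin/env bash
# Hypothetical helper: report whether a directory exists and is writable
# by the user running this script. Prints one of: ok, missing, not-writable.
check_writable_dir() {
  local dir="$1"
  if [ ! -d "$dir" ]; then
    echo "missing"
    return 1
  fi
  if [ ! -w "$dir" ]; then
    echo "not-writable"
    return 2
  fi
  echo "ok"
}

# Example (path taken from the error message above):
# check_writable_dir /data/cloudera/cloudera-service-monitor/ts
```

The result only reflects the user who runs the check, so it is most useful when run as the account that owns the Service Monitor process (often `cloudera-scm`, but verify this on your installation). Note that root passes the `-w` test on almost any directory.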

####

FAILED: Execution Error, return code 30041 from org.apache.hadoop.hive.ql.exec.spark.SparkTask. Failed to create Spark client for Spark session 2d5ee4ed-bbb2-4020-b3c7-b8a52624569c_0: java.lang.RuntimeException: spark-submit process failed with exit code 1 and error ?
Solution
The combined memory across the nodes is not enough for the job. Raise the memory available on each node through these YARN settings:
yarn.scheduler.maximum-allocation-mb
yarn.nodemanager.resource.memory-mb

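yarn.app.mapreduce.am.resource.mb: memory requested for the MapReduce ApplicationMaster (AM); default 1536 MB
yarn.nodemanager.resource.memory-mb: 10g; total memory the NodeManager can hand out to containers; default 8192 MB
yarn.scheduler.minimum-allocation-mb: 1g; smallest memory allocation a single container can request from the scheduler; default 1024 MB
yarn.scheduler.maximum-allocation-mb: 10g; largest memory allocation a single container can request from the scheduler; default 8192 MB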
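
These three limits constrain each other: a container cannot be smaller than the scheduler minimum, and a container larger than a NodeManager's total memory can never be placed on that node. A small sanity check can be sketched as follows, assuming all values are given in MB; `check_yarn_memory` is a hypothetical helper, not a YARN command:

```bash
#!/usr/bin/env bash
# Hypothetical helper: sanity-check three YARN memory settings (all in MB).
# Arguments: minimum-allocation-mb, maximum-allocation-mb, nodemanager.resource.memory-mb
check_yarn_memory() {
  local min_mb="$1" max_mb="$2" nm_mb="$3"
  if [ "$min_mb" -gt "$max_mb" ]; then
    echo "minimum-allocation-mb exceeds maximum-allocation-mb"
    return 1
  fi
  if [ "$max_mb" -gt "$nm_mb" ]; then
    echo "maximum-allocation-mb exceeds nodemanager.resource.memory-mb"
    return 1
  fi
  echo "ok"
}

# Values from the notes above: 1g minimum, 10g maximum, 10g per NodeManager
check_yarn_memory 1024 10240 10240
```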

The following can be set in Hive:
spark.driver.memory 3g (increase it directly if the data volume is large)
spark.executor.memory 5g
spark.yarn.am.memory

spark.yarn.executor.memoryOverhead: executorMemory * 0.07, with a minimum of 384 (MB)
spark.yarn.driver.memoryOverhead: driverMemory * 0.07, with a minimum of 384 (MB)
spark.yarn.am.memoryOverhead: AM memory * 0.07, with a minimum of 384 (MB)
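
The overhead rule above determines how much memory YARN must actually grant per container: the configured memory plus the overhead. A worked sketch in bash, using the 0.07 factor and the 384 MB floor quoted in these notes (other Spark versions may use a different factor); the function names are hypothetical:

```bash
#!/usr/bin/env bash
# Hypothetical helper: overhead in MB for a given memory setting in MB,
# following the rule quoted above: max(memory * 0.07, 384).
spark_overhead_mb() {
  local mem_mb="$1"
  local overhead=$(( mem_mb * 7 / 100 ))   # integer arithmetic, rounds down
  if [ "$overhead" -lt 384 ]; then
    overhead=384
  fi
  echo "$overhead"
}

# Hypothetical helper: total container size = memory + overhead.
spark_container_mb() {
  local mem_mb="$1"
  echo $(( mem_mb + $(spark_overhead_mb "$mem_mb") ))
}

# spark.executor.memory=5g -> 5120 MB + 384 MB overhead
spark_container_mb 5120   # prints 5504
```

With `spark.executor.memory 5g` the container request is 5504 MB, so `yarn.scheduler.maximum-allocation-mb` needs to be at least that large for the executor to be scheduled.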

hive.spark.client.connect.timeout

hive.spark.client.future.timeout=60000
hive.spark.client.connect.timeout=1000
hive.spark.client.server.connect.timeout=90000

3,597 under replicated blocks in the cluster. 3,624 total blocks in the cluster. Percentage under replicated blocks: 99.25%. Critical threshold: 40.00%.
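
The percentage in this alert is simply under-replicated blocks divided by total blocks. The arithmetic and the threshold comparison can be reproduced with a short sketch, assuming `awk` is available; the function names are hypothetical:

```bash
#!/usr/bin/env bash
# Hypothetical helper: percentage of under-replicated blocks, two decimals.
under_replicated_pct() {
  awk -v u="$1" -v t="$2" 'BEGIN { printf "%.2f\n", u * 100 / t }'
}

# Hypothetical helper: succeeds (exit 0) when the percentage is above the threshold.
exceeds_threshold() {
  awk -v p="$1" -v th="$2" 'BEGIN { exit !(p > th) }'
}

# Figures from the alert above: 3,597 of 3,624 blocks, critical threshold 40%
pct="$(under_replicated_pct 3597 3624)"
if exceeds_threshold "$pct" 40; then
  echo "critical: ${pct}% under-replicated"
fi
```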

Unable to issue query: the Service Monitor is not running

Mon Sep 27 15:52:28 CST 2021
JAVA_HOME=/usr/java/jdk1.8.0_181-cloudera
CONF_DIR=/var/run/cloudera-scm-agent/process/326-cloudera-mgmt-SERVICEMONITOR
CMF_CONF_DIR=
Removing any leveldbjni library files left over from previous runs
Executing: /usr/java/jdk1.8.0_181-cloudera/bin/java -server -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -Dmgmt.log.file=mgmt-cmf-mgmt-SERVICEMONITOR-ispong-demo.log.out -Djava.awt.headless=true -Djava.net.preferIPv4Stack=true -Dfirehose.schema.dir=/opt/cloudera/cm/schema -XX:PermSize=128m -Dsun.rmi.transport.tcp.handshakeTimeout=10000 -Dsun.rmi.transport.tcp.responseTimeout=10000 -Dlibrary.leveldbjni.path=/run/cloudera-scm-agent/process/326-cloudera-mgmt-SERVICEMONITOR -Xms2147483648 -Xmx2147483648 -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/mgmt_mgmt-SERVICEMONITOR-fffb3351784571e873e959ceb7c00c5d_pid4379.hprof -XX:OnOutOfMemoryError=/opt/cloudera/cm-agent/service/common/killparent.sh -cp /run/cloudera-scm-agent/process/326-cloudera-mgmt-SERVICEMONITOR:/usr/share/java/mysql-connector-java.jar:/opt/cloudera/cm/lib/postgresql-42.1.4.jre7.jar:/usr/share/java/oracle-connector-java.jar:/opt/cloudera/cm/lib/*: com.cloudera.cmon.firehose.Main --pipeline-type SERVICE_MONITORING --mgmt-home /opt/cloudera/cm

Cloudera offline cluster installation

sudo yum --disablerepo=* --enablerepo=cloudera* clean all
sudo yum --disablerepo=* --enablerepo=cloudera* list installed oracle-j2sdk1.8
sudo yum --disablerepo=* --enablerepo=cloudera* info cloudera-manager-agent
cd /etc/yum.repos.d
## Run on all three servers
yum clean all && yum makecache

sudo yum install cloudera-manager-agent
sudo yum install oracle-j2sdk1.8
python-psycopg2
openssl-devel
/lib/lsb/init-functions
MySQL-python
mod_ssl
##### Solution

/data/cdh/cloudera/cm/lib/agent

Unable to issue query: the Service Monitor is not running


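Troubleshooting approach

Initial assessment: service monitoring has failed, most likely because the agent has gone down.

Check the agent
# View the agent log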
sudo tail -n 400 /var/log/cloudera-scm-agent/cloudera-scm-agent.log
service cloudera-scm-agent stop

If the command does not exist, the agent has probably been deleted or uninstalled. In that case, remove the host from Cloudera Manager and reinstall the agent.

Garbled Chinese characters
yarn.app.mapreduce.am.command-opts: -Djava.net.preferIPv4Stack=true -Dfile.encoding=utf-8 -Duser.language=zh
mapreduce.map.java.opts: -Djava.net.preferIPv4Stack=true -Dfile.encoding=utf-8 -Duser.language=zh
mapreduce.reduce.java.opts: -Djava.net.preferIPv4Stack=true -Dfile.encoding=utf-8 -Duser.language=zh
Agent cannot connect
A likely cause is that the hostname changed after a server reboot. By default, CDH automatically uses the topmost hostname in the hosts file.
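
To see which name would be picked up for a given address, it can help to print the first hostname listed for that IP in the hosts file and compare it with the machine's current hostname. A minimal sketch, assuming the standard hosts file layout (IP followed by names); `first_hostname_for_ip` is a hypothetical helper:

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the first hostname on the first line of a hosts
# file that starts with the given IP address. Comments are ignored.
first_hostname_for_ip() {
  local ip="$1" hosts_file="${2:-/etc/hosts}"
  awk -v ip="$ip" '
    { sub(/#.*/, "") }                     # strip trailing comments
    $1 == ip && NF >= 2 { print $2; exit } # first match wins
  ' "$hosts_file"
}

# Example (the IP is a placeholder):
# first_hostname_for_ip 192.168.1.10
# hostname -f
```

If the two names differ after a reboot, the hosts entry or the system hostname is the first thing to correct before restarting the agent.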

# Files related to the agent
/var/log/cloudera-scm-agent
/var/log/cloudera-scm-agent/cloudera-scm-agent.log
/var/lib/cloudera-scm-agent
/etc/default/cloudera-scm-agent
/etc/systemd/system/multi-user.target.wants/cloudera-scm-agent.service
/etc/cloudera-scm-agent
/usr/lib/systemd/system/cloudera-scm-agent.service
/run/cloudera-scm-agent

# Restart the agent
sudo systemctl restart cloudera-scm-agent
sudo systemctl stop cloudera-scm-agent
netstat -nlpt | grep 9000

# Check the CDH server service
sudo tail -F /var/log/cloudera-scm-server/cloudera-scm-server.log
sudo systemctl restart cloudera-scm-server
sudo systemctl status cloudera-scm-server

# Check that the server host in the agent configuration is correct
vim /etc/cloudera-scm-agent/config.ini
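
Instead of opening the file in an editor, the configured server host can be read directly. A minimal sketch, assuming the agent's `config.ini` contains a `server_host=` line; `agent_server_host` is a hypothetical helper:

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the value of the first uncommented server_host
# entry in the agent configuration file.
agent_server_host() {
  local config="${1:-/etc/cloudera-scm-agent/config.ini}"
  sed -n 's/^[[:space:]]*server_host[[:space:]]*=[[:space:]]*//p' "$config" | head -n 1
}

# Example:
# agent_server_host
# agent_server_host /path/to/another/config.ini
```

The printed name should resolve from the agent machine to the host running cloudera-scm-server.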
Failed to validate the identity of Cloudera Manager
vim /etc/default/cloudera-scm-server
-Dcom.cloudera.server.cmf.components.scmActive.killOnError=false
export CMF_JAVA_OPTS="-Xmx2G -XX:MaxPermSize=256m -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp -Dcom.cloudera.server.cmf.components.scmActive.killOnError=false"
No route to host
 24/07/15 09:50:13 INFO hdfs.DataStreamer: Exception in createBlockOutputStream blk_1383231278_309492709
2024-07-15 09:50:13.452  INFO 100171 --- [auncher-proc-14] a.test2_c7fb1d1ad0f041e5a3b9c1a7a4a2adb1 : java.io.IOException: Got error, status=ERROR, status message , ack with firstBadLink as 192.168.17.81:9866
2024-07-15 09:50:13.452  INFO 100171 --- [auncher-proc-14] a.test2_c7fb1d1ad0f041e5a3b9c1a7a4a2adb1 :   at org.apache.hadoop.hdfs.protocol.datatransfer.DataTransferProtoUtil.checkBlockOpStatus(DataTransferProtoUtil.java:134)
2024-07-15 09:50:13.452  INFO 100171 --- [auncher-proc-14] a.test2_c7fb1d1ad0f041e5a3b9c1a7a4a2adb1 :   at org.apache.hadoop.hdfs.protocol.datatransfer.DataTransferProtoUtil.checkBlockOpStatus(DataTransferProtoUtil.java:110)
2024-07-15 09:50:13.452  INFO 100171 --- [auncher-proc-14] a.test2_c7fb1d1ad0f041e5a3b9c1a7a4a2adb1 :   at org.apache.hadoop.hdfs.DataStreamer.createBlockOutputStream(DataStreamer.java:1778)
2024-07-15 09:50:13.452  INFO 100171 --- [auncher-proc-14] a.test2_c7fb1d1ad0f041e5a3b9c1a7a4a2adb1 :   at org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1679)
2024-07-15 09:50:13.452  INFO 100171 --- [auncher-proc-14] a.test2_c7fb1d1ad0f041e5a3b9c1a7a4a2adb1 :   at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:716)
2024-07-15 09:50:13.453  INFO 100171 --- [auncher-proc-14] a.test2_c7fb1d1ad0f041e5a3b9c1a7a4a2adb1 : 24/07/15 09:50:13 WARN hdfs.DataStreamer: Abandoning BP-1724925173-192.168.21.71-1635211203085:blk_1383231278_309492709
2024-07-15 09:50:13.457  INFO 100171 --- [auncher-proc-14] a.test2_c7fb1d1ad0f041e5a3b9c1a7a4a2adb1 : 24/07/15 09:50:13 WARN hdfs.DataStreamer: Excluding datanode DatanodeInfoWithStorage[192.168.17.81:9866,DS-39d62b65-c0e6-4410-b67b-25e4591d5965,DISK]
2024-07-15 09:50:13.468  INFO 100171 --- [auncher-proc-14] a.test2_c7fb1d1ad0f041e5a3b9c1a7a4a2adb1 : 24/07/15 09:50:13 INFO hdfs.DataStreamer: Exception in createBlockOutputStream blk_1383231279_309492710
2024-07-15 09:50:13.468  INFO 100171 --- [auncher-proc-14] a.test2_c7fb1d1ad0f041e5a3b9c1a7a4a2adb1 : java.net.NoRouteToHostException: No route to host
2024-07-15 09:50:13.468  INFO 100171 --- [auncher-proc-14] a.test2_c7fb1d1ad0f041e5a3b9c1a7a4a2adb1 :   at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
2024-07-15 09:50:13.468  INFO 100171 --- [auncher-proc-14] a.test2_c7fb1d1ad0f041e5a3b9c1a7a4a2adb1 :   at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
2024-07-15 09:50:13.468  INFO 100171 --- [auncher-proc-14] a.test2_c7fb1d1ad0f041e5a3b9c1a7a4a2adb1 :   at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
2024-07-15 09:50:13.468  INFO 100171 --- [auncher-proc-14] a.test2_c7fb1d1ad0f041e5a3b9c1a7a4a2adb1 :   at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
2024-07-15 09:50:13.468  INFO 100171 --- [auncher-proc-14] a.test2_c7fb1d1ad0f041e5a3b9c1a7a4a2adb1 :   at org.apache.hadoop.hdfs.DataStreamer.createSocketForPipeline(DataStreamer.java:253)
2024-07-15 09:50:13.468  INFO 100171 --- [auncher-proc-14] a.test2_c7fb1d1ad0f041e5a3b9c1a7a4a2adb1 :   at org.apache.hadoop.hdfs.DataStreamer.createBlockOutputStream(DataStreamer.java:1725)
2024-07-15 09:50:13.468  INFO 100171 --- [auncher-proc-14] a.test2_c7fb1d1ad0f041e5a3b9c1a7a4a2adb1 :   at org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1679)
2024-07-15 09:50:13.468  INFO 100171 --- [auncher-proc-14] a.test2_c7fb1d1ad0f041e5a3b9c1a7a4a2adb1 :   at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:716)
2024-07-15 09:50:13.468  INFO 100171 --- [auncher-proc-14] a.test2_c7fb1d1ad0f041e5a3b9c1a7a4a2adb1 : 24/07/15 09:50:13 WARN hdfs.DataStreamer: Abandoning BP-1724925173-192.168.21.71-1635211203085:blk_1383231279_309492710
Solution
java.net.NoRouteToHostException: No route to host
The firewall has not been disabled.
After a server reboot, the hosts configuration does not take effect.
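
The log above already names the DataNode that could not be reached (`firstBadLink` and `Excluding datanode`). A sketch for pulling those addresses out of a log and probing them, assuming bash (for `/dev/tcp`) and the coreutils `timeout` command; both function names are hypothetical:

```bash
#!/usr/bin/env bash
# Hypothetical helper: read a client log on stdin and print the unique
# host:port pairs reported as firstBadLink or as excluded DataNodes.
bad_datanodes() {
  grep -oE '(firstBadLink as |Excluding datanode DatanodeInfoWithStorage\[)[0-9.]+:[0-9]+' \
    | grep -oE '[0-9.]+:[0-9]+$' \
    | sort -u
}

# Hypothetical helper: succeed if a TCP connection to host:port opens within 3 seconds.
can_connect() {
  local host="$1" port="$2"
  timeout 3 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null
}

# Example (job.log is a placeholder file name):
# bad_datanodes < job.log | while IFS=: read -r host port; do
#   can_connect "$host" "$port" && echo "$host:$port reachable" || echo "$host:$port NOT reachable"
# done
```

Run the probe from the machine that submitted the job, since a firewall on the DataNode may block only some source hosts.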
Unable to initialize tables

The following log output is normal:

JAVA_HOME=/usr/lib/jvm/java-openjdk
Verifying that we can write to /etc/cloudera-scm-server
Creating SCM configuration file in /etc/cloudera-scm-server
Executing:  /usr/lib/jvm/java-openjdk/bin/java -cp /usr/share/java/mysql-connector-java.jar:/usr/share/java/oracle-connector-java.jar:/usr/share/java/postgresql-connector-java.jar:/data/cdh/cloudera/cm/schema/../lib/* com.cloudera.enterprise.dbutil.DbCommandExecutor /etc/cloudera-scm-server/db.properties com.cloudera.cmf.db.
log4j:ERROR Could not find value for key log4j.appender.A
log4j:ERROR Could not instantiate appender named "A".
Loading class `com.mysql.jdbc.Driver'. This is deprecated. The new driver class is `com.mysql.cj.jdbc.Driver'. The driver is automatically registered via the SPI and manual loading of the driver class is generally unnecessary.
[2024-08-07 17:40:25,557] INFO     0[main] - com.cloudera.enterprise.dbutil.DbCommandExecutor.testDbConnection(DbCommandExecutor.java) - Successfully connected to database.
All done, your SCM database is configured correctly!

CDH Troubleshooting Collection
https://ispong.isxcode.com/hadoop/cloudera/cdh 问题合集/
Author
ispong
Posted on
September 28, 2021
Licensed under