Hadoop YARN Scheduling Stress Test

Last updated on July 19, 2024

🧙 Questions

How to stress-test YARN scheduling so that jobs use all of the available YARN resources.

☄️ Ideas

Capacity Scheduler

The Capacity Scheduler allocates a dedicated share of resource capacity to each queue and to each user.

Base YARN configuration

yarn.scheduler.minimum-allocation-mb      1GB
yarn.scheduler.minimum-allocation-vcores  1
yarn.scheduler.maximum-allocation-mb      4GB
yarn.scheduler.maximum-allocation-vcores  4
yarn.nodemanager.resource.cpu-vcores      16
yarn.nodemanager.resource.memory-mb       16GB
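
As a sketch, the base values listed above could be written in yarn-site.xml as follows. The file placement and the conversion of GB to MB (1 GB = 1024 MB) are assumptions on my part; the post only lists the property names and sizes.

```xml
<!-- yarn-site.xml (sketch): the base values listed above, expressed in MB -->
<configuration>
  <!-- Smallest and largest container the scheduler will grant -->
  <property>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>1024</value>
  </property>
  <property>
    <name>yarn.scheduler.minimum-allocation-vcores</name>
    <value>1</value>
  </property>
  <property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>4096</value>
  </property>
  <property>
    <name>yarn.scheduler.maximum-allocation-vcores</name>
    <value>4</value>
  </property>
  <!-- Resources each NodeManager offers to YARN -->
  <property>
    <name>yarn.nodemanager.resource.cpu-vcores</name>
    <value>16</value>
  </property>
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>16384</value>
  </property>
</configuration>
```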
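
vim capacity-scheduler.xml

Set the scheduler type (this property is normally read from yarn-site.xml, and CapacityScheduler is already the default in Apache Hadoop)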

<property>
  <name>yarn.resourcemanager.scheduler.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
</property>

Create queues

<!-- Create the default and queue1 queues under root -->
<property>
  <name>yarn.scheduler.capacity.root.queues</name>
  <value>default,queue1</value>
</property>

<!-- The capacity of every queue (root, default, queue1) must be configured, otherwise YARN fails to start -->
<!-- default + queue1 must add up to 100, otherwise YARN fails to start -->
<property>
  <name>yarn.scheduler.capacity.root.capacity</name>
  <value>100</value>
</property>

<!-- Configured Capacity: 90% -->
<!-- Absolute Configured Capacity: 90% -->
<!-- Max Applications: 9000 -->
<!-- Max Applications Per User: 9000 -->
<property>
  <name>yarn.scheduler.capacity.root.default.capacity</name>
  <value>90</value>
</property>

<property>
  <name>yarn.scheduler.capacity.root.queue1.capacity</name>
  <value>10</value>
</property>


At this point the queue can only run jobs one at a time: Max Application Master Resources is only 2048 MB, which is enough for just one container.
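
One way to account for the 2048 MB figure is sketched below. It assumes the cluster total is the single 16384 MB NodeManager from the base configuration, the default maximum-am-resource-percent of 0.1, and rounding up to a multiple of yarn.scheduler.minimum-allocation-mb (1024 MB). The rounding rule is an assumption here. Whether the base is the whole cluster or the queue's 90% share, the result rounds up to the same value.

```latex
\begin{align*}
16384 \times 0.1 &= 1638.4
  \;\rightarrow\; \left\lceil \frac{1638.4}{1024} \right\rceil \times 1024 = 2048 \\
(16384 \times 0.9) \times 0.1 &= 1474.56
  \;\rightarrow\; \left\lceil \frac{1474.56}{1024} \right\rceil \times 1024 = 2048
\end{align*}
```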

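Configure the queue's maximum AM resource percentage

<!-- Maximum share of the queue's resources that ApplicationMasters may use -->
<!-- yarn.scheduler.capacity.<queue-path>.maximum-am-resource-percent -->
<!-- Default: 0.1 -->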
<!-- Configured Max Application Master Limit : 10.0 -->
<!-- Max Application Master Resources:           <memory:2048, vCores:1> -->
<!-- Max Application Master Resources Per User:  <memory:2048, vCores:1> -->

<!-- Set to 1 to allow all resources to be used -->
<!-- Configured Max Application Master Limit : 100.0 -->
<!-- Max Application Master Resources:           <memory:16384, vCores:1> -->
<!-- Max Application Master Resources Per User:  <memory:16384, vCores:1> -->
<!-- However, a single user can use at most <memory:15360, vCores:1> -->
<property>
  <name>yarn.scheduler.capacity.root.default.maximum-am-resource-percent</name>
  <value>1</value>
</property>


With 2 GB per container, at most 7 containers ran at once, as expected. That level of concurrency is too high.
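
The figure of 7 is consistent with the limits noted in the configuration comments, assuming every container is exactly 2048 MB and only whole containers are granted. Both the 15360 MB per-user limit and the queue's 90% share of 16384 MB give the same count:

```latex
\begin{align*}
\left\lfloor \frac{15360}{2048} \right\rfloor &= \lfloor 7.5 \rfloor = 7 \\
\left\lfloor \frac{16384 \times 0.9}{2048} \right\rfloor &= \lfloor 7.2 \rfloor = 7
\end{align*}
```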

Limit the number of concurrent applications in the queue

This did not take effect.

<!-- Maximum number of parallel applications per queue -->
<!-- yarn.scheduler.capacity.<queue-path>.max-parallel-apps -->
<!-- Applies to all queues -->
<property>
  <name>yarn.scheduler.capacity.max-parallel-apps</name>
  <value>2</value>
</property>

<!-- Maximum number of parallel applications per user -->
<!-- yarn.scheduler.capacity.user.<username>.max-parallel-apps -->
<property>
  <name>yarn.scheduler.capacity.user.ispong.max-parallel-apps</name>
  <value>2</value>
</property>
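
The comments above also mention a per-queue form of the limit. A minimal sketch for the root.default queue, following that naming pattern, might look like this. It is untested here, and whether it takes effect depends on the Hadoop release supporting max-parallel-apps, which may also be why the global setting had no effect.

```xml
<!-- capacity-scheduler.xml (sketch): per-queue form of the limit for root.default -->
<configuration>
  <property>
    <name>yarn.scheduler.capacity.root.default.max-parallel-apps</name>
    <value>2</value>
  </property>
</configuration>
```

After editing capacity-scheduler.xml, the queue configuration can be reloaded with `yarn rmadmin -refreshQueues` instead of restarting the ResourceManager.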

Limit the number of submitted applications

<!-- Maximum number of applications -->
<!-- yarn.scheduler.capacity.<queue-path>.maximum-applications -->
<!-- Max Applications           4
     Max Applications Per User  4                              -->
<!-- Applications submitted beyond this limit are rejected -->
<property>
  <name>yarn.scheduler.capacity.root.default.maximum-applications</name>
  <value>10</value>
</property>
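
Set the per-user resource limit within the queue

<!-- Configured User Limit Factor: upper bound on the resources a single user can use -->
<!-- Max Applications Per User is affected as well -->
<!-- yarn.scheduler.capacity.<queue-path>.user-limit-factor -->
<!-- Per-user limit; the default is 1 -->
<!-- -1: disables the limit -->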
<property>
  <name>yarn.scheduler.capacity.root.default.user-limit-factor</name>
  <value>0.5</value>
</property>

Configure the queue's maximum capacity

<!-- Maximum share of resources the queue may occupy -->
<!-- Default: 100% -->
<!-- yarn.scheduler.capacity.<queue-path>.maximum-capacity -->
<!-- Configured Max Capacity: 99% -->
<!-- Absolute Configured Max Capacity: 99% -->
<!-- Must not be less than yarn.scheduler.capacity.root.default.capacity -->
<property>
  <name>yarn.scheduler.capacity.root.default.maximum-capacity</name>
  <value>99</value>
</property>
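
Configure the maximum allocation per container

<!-- Maximum memory that can be allocated -->
<!-- yarn.scheduler.capacity.<queue-path>.maximum-allocation-mb -->
<!-- Overrides yarn.scheduler.maximum-allocation-mb; must not exceed the cluster-wide maximum -->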
<property>
  <name>yarn.scheduler.capacity.root.default.maximum-allocation-mb</name>
  <value>4096</value>
</property>

<!-- Maximum number of vcores that can be allocated -->
<!-- Overrides yarn.scheduler.maximum-allocation-vcores; must not exceed the cluster-wide maximum -->
<!-- yarn.scheduler.capacity.<queue-path>.maximum-allocation-vcores -->
<property>
  <name>yarn.scheduler.capacity.root.default.maximum-allocation-vcores</name>
  <value>2</value>
</property>
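
Pulling the settings above together for the original goal of letting one user's jobs consume the whole cluster, a minimal sketch might look like the following. The specific values (maximum-capacity of 100 and user-limit-factor of 2) are assumptions chosen for illustration, not values tested in this post. The idea is that the default queue may grow into the idle share of queue1, a single user may exceed the queue's configured 90%, and ApplicationMasters are not capped at 10% of the queue.

```xml
<!-- capacity-scheduler.xml (sketch, untested values) -->
<configuration>
  <!-- Let root.default borrow idle capacity up to the whole cluster -->
  <property>
    <name>yarn.scheduler.capacity.root.default.maximum-capacity</name>
    <value>100</value>
  </property>
  <!-- Let one user go beyond the queue's configured capacity -->
  <property>
    <name>yarn.scheduler.capacity.root.default.user-limit-factor</name>
    <value>2</value>
  </property>
  <!-- Do not restrict ApplicationMasters to the default 10% -->
  <property>
    <name>yarn.scheduler.capacity.root.default.maximum-am-resource-percent</name>
    <value>1</value>
  </property>
</configuration>
```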

Fair Scheduler

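CDH uses the Fair Scheduler by default.
Fair scheduling: resources are shared fairly among jobs according to their priority and resource demands.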
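
For comparison with the Capacity Scheduler setup, a minimal sketch of selecting the Fair Scheduler on a plain Apache Hadoop cluster is shown below. Placing it in yarn-site.xml is the usual convention; queue definitions for the Fair Scheduler are not covered in this post and are left out.

```xml
<!-- yarn-site.xml (sketch): select the Fair Scheduler -->
<configuration>
  <property>
    <name>yarn.resourcemanager.scheduler.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
  </property>
</configuration>
```

Changing the scheduler class requires a ResourceManager restart.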

Fifo Scheduler

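Recent versions of the documentation no longer seem to describe it.
First in, first out (FIFO).
All jobs are scheduled one after another in the order they were submitted.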
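
Even though it is hard to find in recent documentation, the scheduler class can be selected in the same way as the other two. This sketch assumes the FifoScheduler class is still shipped in the Hadoop release in use.

```xml
<!-- yarn-site.xml (sketch): select the FIFO scheduler -->
<configuration>
  <property>
    <name>yarn.resourcemanager.scheduler.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler</value>
  </property>
</configuration>
```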


Hadoop YARN Scheduling Stress Test
https://ispong.isxcode.com/hadoop/hadoop/hadoop yarn调度压测/
Author
ispong
Posted on
May 15, 2024
Licensed under