Spark dynamic resource allocation

Last updated on July 18, 2025 am

🧙 Questions

How to enable dynamic resource allocation in Spark

☄️ Ideas

spark.dynamicAllocation.enabled

Default: false
When enabled, Spark scales the number of executors up and down according to the workload
spark.dynamicAllocation.executorIdleTimeout

Default: 60s
An executor that stays idle longer than this is removed
spark.dynamicAllocation.cachedExecutorIdleTimeout

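Default: infinity
Idle timeout for executors that hold cached data blocks; such an executor is removed once it has been idle longer than this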
spark.dynamicAllocation.initialExecutors

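Default: the value of spark.dynamicAllocation.minExecutors
Number of executors to start with. If spark.executor.instances is set and is larger than this value, spark.executor.instances is used as the initial number instead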
spark.dynamicAllocation.maxExecutors

Default: infinity
Upper bound on the number of executors that dynamic allocation may request
spark.dynamicAllocation.minExecutors

Default: 0
Lower bound on the number of executors
spark.dynamicAllocation.executorAllocationRatio

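Default: 1
By default, dynamic allocation requests enough executors to maximize parallelism for the number of tasks to process. This minimizes job latency, but with small tasks it can waste a lot of resources on executor allocation overhead, because some executors may never do any work. This setting is a ratio that reduces the number of executors relative to full parallelism. The default of 1.0 gives maximum parallelism; 0.5 divides the target number of executors by 2. The target computed by dynamic allocation is still bounded by spark.dynamicAllocation.minExecutors and spark.dynamicAllocation.maxExecutors.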
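spark.dynamicAllocation.schedulerBacklogTimeout / spark.dynamicAllocation.sustainedSchedulerBacklogTimeout

Default: 1s (sustainedSchedulerBacklogTimeout defaults to the value of schedulerBacklogTimeout)
How long tasks must stay backlogged before new executors are requested. schedulerBacklogTimeout applies to the first request; sustainedSchedulerBacklogTimeout applies to each subsequent request while the backlog persists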
spark.dynamicAllocation.shuffleTracking.enabled

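Default: true (the default has differed between Spark releases, so check the documentation for your version)
Enables shuffle file tracking for executors, which lets dynamic allocation work without an external shuffle service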
spark.dynamicAllocation.shuffleTracking.timeout

Default: infinity
When shuffle tracking is enabled, the timeout for executors that hold shuffle data
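The way initialExecutors, executorAllocationRatio, minExecutors and maxExecutors combine can be sketched in a few lines of Python. This is only an illustration of the rules described above, not Spark's own code: the function names and arguments are made up for this note, and the "tasks per executor" term assumes spark.executor.cores divided by spark.task.cpus.

```python
import math


def initial_executors(min_executors=0, initial=None, executor_instances=0):
    """Starting executor count: the largest of the three settings.

    Hypothetical helper; `initial` falls back to min_executors,
    mirroring the default of spark.dynamicAllocation.initialExecutors.
    """
    if initial is None:
        initial = min_executors
    return max(min_executors, initial, executor_instances)


def target_executors(tasks, executor_cores=1, task_cpus=1, ratio=1.0,
                     min_executors=0, max_executors=math.inf):
    """Target executor count for `tasks` pending or running tasks.

    The ratio scales the demand down, then min/max bound the result.
    """
    tasks_per_executor = max(1, executor_cores // task_cpus)
    needed = math.ceil(tasks * ratio / tasks_per_executor)
    return int(max(min_executors, min(needed, max_executors)))


print(initial_executors(min_executors=0, initial=0, executor_instances=1))  # 1
print(target_executors(100))                                # 100
print(target_executors(100, ratio=0.5))                     # 50
print(target_executors(100, ratio=0.5, max_executors=3))    # 3
print(target_executors(0, min_executors=2))                 # 2
```

With the test configuration used below (one core per executor), 100 backlogged tasks would ask for 100 executors unless the ratio or maxExecutors brings the number down.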

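Configuration tests

Enable dynamic allocation
Executors are indeed added automatically, but the job starts with few resources and bringing up additional executors is slow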

{
  "spark.driver.cores": "1",
  "spark.driver.memory": "1g",
  "spark.executor.cores": "1",
  "spark.executor.memory": "2g",
  "spark.memory.fraction": "0.9",
  "spark.executor.instances": "1",
  "hive.metastore.uris": "thrift://127.0.0.1:30123",
  "spark.cores.max": "1",
  "spark.driver.extraJavaOptions": "-Dfile.encoding=utf-8",
  "spark.executor.extraJavaOptions": "-Dfile.encoding=utf-8",
  "spark.sql.legacy.timeParserPolicy": "LEGACY",
  "spark.sql.storeAssignmentPolicy": "LEGACY",
  "spark.dynamicAllocation.enabled": true
}
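One likely reason for the slow start is that Spark requests executors in rounds: the first round fires after schedulerBacklogTimeout, later rounds fire every sustainedSchedulerBacklogTimeout, and the number requested per round grows 1, 2, 4, 8 and so on. The sketch below only models the request schedule; the function name is hypothetical, and it ignores the time the cluster manager needs to actually launch each executor, which comes on top.

```python
def ramp_up_schedule(needed, backlog_timeout=1.0, sustained_timeout=1.0):
    """Return (seconds, total_requested) after each request round.

    Simplified model: the batch size doubles every round until the
    total reaches `needed`.
    """
    schedule = []
    total, to_add, now = 0, 1, backlog_timeout
    while total < needed:
        total = min(needed, total + to_add)
        schedule.append((now, total))
        to_add *= 2
        now += sustained_timeout
    return schedule


for seconds, total in ramp_up_schedule(10):
    print(f"t={seconds:.0f}s requested={total}")
# t=1s requested=1
# t=2s requested=3
# t=3s requested=7
# t=4s requested=10
```

Under these assumptions, raising initialExecutors or minExecutors is the simplest way to skip the first few rounds.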

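maxExecutors limits executors only
initialExecutors: the observed container count was spark.executor.instances + initialExecutors, i.e. 1 + 0 = 1
maxExecutors: the observed maximum container count was spark.executor.instances + maxExecutors, i.e. 1 + 3 = 4 (with maxExecutors set to 3)
Since maxExecutors limits executors only, the extra container may also be the driver (for example the YARN ApplicationMaster) rather than spark.executor.instances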

{
  "spark.driver.cores": "1",
  "spark.driver.memory": "1g",
  "spark.executor.cores": "1",
  "spark.executor.memory": "2g",
  "spark.memory.fraction": "0.9",
  "spark.executor.instances": "1",
  "hive.metastore.uris": "thrift://127.0.0.1:30123",
  "spark.cores.max": "1",
  "spark.driver.extraJavaOptions": "-Dfile.encoding=utf-8",
  "spark.executor.extraJavaOptions": "-Dfile.encoding=utf-8",
  "spark.sql.legacy.timeParserPolicy": "LEGACY",
  "spark.sql.storeAssignmentPolicy": "LEGACY",
  "spark.dynamicAllocation.enabled": true,
  "spark.dynamicAllocation.initialExecutors": "0",
  "spark.dynamicAllocation.minExecutors": "0"
}
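If a JSON map like the ones above has to be passed to spark-submit by hand, it can be turned into --conf arguments. This is a minimal sketch under two assumptions: the JSON is a flat key/value map, and only keys starting with spark. are forwarded, because spark-submit ignores other properties (hive.metastore.uris would need another route, such as a spark.hadoop. prefix). The function name is made up for this note.

```python
import json


def to_submit_args(conf_json):
    """Convert a flat JSON config map into spark-submit --conf arguments."""
    args = []
    for key, value in json.loads(conf_json).items():
        if not key.startswith("spark."):
            continue  # spark-submit ignores non-Spark properties
        if isinstance(value, bool):
            value = "true" if value else "false"  # JSON true -> "true"
        args += ["--conf", f"{key}={value}"]
    return args


sample = """{
  "spark.executor.instances": "1",
  "hive.metastore.uris": "thrift://127.0.0.1:30123",
  "spark.dynamicAllocation.enabled": true,
  "spark.dynamicAllocation.initialExecutors": "0",
  "spark.dynamicAllocation.minExecutors": "0"
}"""

print(" ".join(to_submit_args(sample)))
```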

🔗 Links

spark

Spark dynamic resource allocation

https://ispong.isxcode.com/hadoop/spark/spark 资源动态分配/

Author

ispong

Posted on

May 16, 2024
