Spark Dynamic Resource Allocation

Last updated on January 17, 2025 am

🧙 Questions

Implement dynamic resource allocation in Spark.

☄️ Ideas

  • spark.dynamicAllocation.enabled

    Default: false
    Enables dynamic allocation, which scales the number of executors up and down based on the workload.

  • spark.dynamicAllocation.executorIdleTimeout

    Default: 60s
    An executor that has been idle for longer than this is removed.

  • spark.dynamicAllocation.cachedExecutorIdleTimeout

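    Default: infinity
    Idle timeout for executors that hold cached data blocks; such an executor is removed once it has been idle for longer than this.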

  • spark.dynamicAllocation.initialExecutors

    Default: spark.dynamicAllocation.minExecutors
    Initial number of executors. If spark.executor.instances is set and larger than this value, spark.executor.instances is used as the initial number instead.

  • spark.dynamicAllocation.maxExecutors

    Default: infinity
    Upper bound on the number of executors that can be requested dynamically.

  • spark.dynamicAllocation.minExecutors

    Default: 0
    Lower bound on the number of executors.

  • spark.dynamicAllocation.executorAllocationRatio

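    Default: 1
    By default, dynamic allocation requests enough executors to maximize parallelism for the number of tasks to process. This minimizes job latency, but with small tasks it can waste a lot of resources on executor allocation overhead, because some executors may never do any work. This setting is a ratio that reduces the number of executors relative to full parallelism. The default of 1.0 gives maximum parallelism; 0.5 divides the target number of executors by 2. The target computed by dynamic allocation can still be overridden by spark.dynamicAllocation.minExecutors and spark.dynamicAllocation.maxExecutors.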

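  • spark.dynamicAllocation.schedulerBacklogTimeout / spark.dynamicAllocation.sustainedSchedulerBacklogTimeout

    Default: 1s (sustainedSchedulerBacklogTimeout defaults to the value of schedulerBacklogTimeout)
    If tasks have been pending for longer than schedulerBacklogTimeout, new executors are requested. sustainedSchedulerBacklogTimeout is the wait applied before each subsequent request while the backlog persists.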

  • spark.dynamicAllocation.shuffleTracking.enabled

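    Default: true
    Enables shuffle file tracking for executors, so that dynamic allocation can work without an external shuffle service.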

  • spark.dynamicAllocation.shuffleTracking.timeout

    Default: infinity
    When shuffle tracking is enabled, the timeout for executors that are holding shuffle data.
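
The interaction between executorAllocationRatio, minExecutors, and maxExecutors described in the list above can be sketched roughly as follows. This is a simplified model and not Spark's actual code: the function name `target_executors` and its parameters are hypothetical, and it assumes that each executor runs `executor_cores // task_cpus` tasks at once and that minExecutors does not exceed maxExecutors.

```python
import math


def target_executors(pending_and_running_tasks, executor_cores=1, task_cpus=1,
                     allocation_ratio=1.0, min_executors=0, max_executors=None):
    """Simplified estimate of the executor count dynamic allocation aims for.

    Hypothetical helper for illustration; max_executors=None stands for "infinity".
    """
    # Number of tasks one executor can run concurrently (assumption: at least 1).
    tasks_per_executor = max(1, executor_cores // task_cpus)
    # Full parallelism scaled down by the allocation ratio, rounded up.
    needed = math.ceil(pending_and_running_tasks * allocation_ratio / tasks_per_executor)
    # The computed target is still bounded by maxExecutors and minExecutors.
    if max_executors is not None:
        needed = min(needed, max_executors)
    return max(needed, min_executors)


print(target_executors(100))                        # full parallelism
print(target_executors(100, allocation_ratio=0.5))  # ratio halves the target
print(target_executors(100, max_executors=4))       # capped by maxExecutors
```

With one core per executor, 100 pending tasks ask for 100 executors at ratio 1.0 and 50 at ratio 0.5, and a maxExecutors of 4 caps the result regardless of the ratio.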

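Configuration tests

Enable dynamic allocation
Executors are indeed added automatically, but the application starts with few resources and new executors are slow to come up.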

{
  "spark.driver.cores": "1",
  "spark.driver.memory": "1g",
  "spark.executor.cores": "1",
  "spark.executor.memory": "2g",
  "spark.memory.fraction": "0.9",
  "spark.executor.instances": "1",
  "hive.metastore.uris": "thrift://127.0.0.1:30123",
  "spark.cores.max": "1",
  "spark.driver.extraJavaOptions": "-Dfile.encoding=utf-8",
  "spark.executor.extraJavaOptions": "-Dfile.encoding=utf-8",
  "spark.sql.legacy.timeParserPolicy": "LEGACY",
  "spark.sql.storeAssignmentPolicy": "LEGACY",
  "spark.dynamicAllocation.enabled": true
}
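
One likely contributor to the slow start is the request policy itself: Spark asks for executors in rounds, adding 1 in the first round and then 2, 4, 8, and so on while tasks remain backlogged. The sketch below models only that schedule. The function `ramp_up_schedule` and its parameters are hypothetical, and it ignores the time the cluster manager needs to actually launch each executor.

```python
def ramp_up_schedule(target, backlog_timeout=1.0, sustained_timeout=None, start=0):
    """Return [(seconds_since_backlog_began, total_executors_requested), ...].

    Hypothetical model: the first round fires after backlog_timeout, later rounds
    after each sustained_timeout, and the number added doubles every round.
    """
    if sustained_timeout is None:
        # sustainedSchedulerBacklogTimeout defaults to schedulerBacklogTimeout.
        sustained_timeout = backlog_timeout
    schedule = []
    total, to_add, elapsed = start, 1, backlog_timeout
    while total < target:
        total = min(target, total + to_add)  # never request past the target
        schedule.append((elapsed, total))
        to_add *= 2                          # exponential growth per round
        elapsed += sustained_timeout
    return schedule


# Starting from 1 executor with a target of 10 and the default 1s timeouts.
for seconds, executors in ramp_up_schedule(10, start=1):
    print(f"after {seconds:.0f}s: {executors} executors requested")
```

Under these assumptions, raising spark.dynamicAllocation.initialExecutors or minExecutors shortens the ramp-up, because fewer rounds are needed to reach the target.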

maxExecutors only limits executors.
initialExecutors: in this test the container count appeared to be spark.executor.instances + initialExecutors, 1 + 0 = 1.
maxExecutors: in this test the maximum count appeared to be spark.executor.instances + maxExecutors, 1 + 3 = 4.

{
  "spark.driver.cores": "1",
  "spark.driver.memory": "1g",
  "spark.executor.cores": "1",
  "spark.executor.memory": "2g",
  "spark.memory.fraction": "0.9",
  "spark.executor.instances": "1",
  "hive.metastore.uris": "thrift://127.0.0.1:30123",
  "spark.cores.max": "1",
  "spark.driver.extraJavaOptions": "-Dfile.encoding=utf-8",
  "spark.executor.extraJavaOptions": "-Dfile.encoding=utf-8",
  "spark.sql.legacy.timeParserPolicy": "LEGACY",
  "spark.sql.storeAssignmentPolicy": "LEGACY",
  "spark.dynamicAllocation.enabled": true,
  "spark.dynamicAllocation.initialExecutors": "0",
  "spark.dynamicAllocation.minExecutors": "0"
}
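
As a cross-check on the initial executor count for the configuration above: Spark starts with the largest of minExecutors, initialExecutors, and spark.executor.instances. The helper below is a minimal sketch of that rule; the name `resolve_initial_executors` is hypothetical and is not part of any Spark API.

```python
def resolve_initial_executors(min_executors=0, initial_executors=None, executor_instances=0):
    """Initial executor count under dynamic allocation (illustrative helper)."""
    if initial_executors is None:
        # initialExecutors defaults to minExecutors when it is not set.
        initial_executors = min_executors
    return max(min_executors, initial_executors, executor_instances)


# Values from the configuration above: minExecutors=0, initialExecutors=0, instances=1.
print(resolve_initial_executors(min_executors=0, initial_executors=0, executor_instances=1))
```

For this configuration the result is 1, which is consistent with the single initial container observed in the test without needing to add the two settings together.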

https://ispong.isxcode.com/hadoop/spark/spark 资源动态分配/
Author
ispong
Posted on
May 16, 2024