Spark Dynamic Resource Allocation
Last updated on January 17, 2025 am
🧙 Questions
How to enable dynamic resource allocation in Spark
☄️ Ideas
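spark.dynamicAllocation.enabled
Default: false
Whether to use dynamic allocation, which scales the number of executors up and down based on the workload.
spark.dynamicAllocation.executorIdleTimeout
Default: 60s
An executor that has been idle for longer than this is removed.
spark.dynamicAllocation.cachedExecutorIdleTimeout
Default: infinity
Idle timeout for executors that hold cached data blocks.
spark.dynamicAllocation.initialExecutors
Default: spark.dynamicAllocation.minExecutors
Initial number of executors. If spark.executor.instances is set and is larger than this value, spark.executor.instances is used as the initial number instead.
spark.dynamicAllocation.maxExecutors
Default: infinity
Upper bound on the number of executors that can be requested dynamically.
spark.dynamicAllocation.minExecutors
Default: 0
Lower bound on the number of executors.
spark.dynamicAllocation.executorAllocationRatio
Default: 1
By default, dynamic allocation requests enough executors to maximize parallelism for the number of tasks to process. This minimizes job latency, but with small tasks it can waste a lot of resources on executor allocation overhead, because some executors might not do any work at all. This setting defines a ratio that reduces the number of executors relative to full parallelism. The default of 1.0 gives maximum parallelism; 0.5 divides the target number of executors by 2. The target computed by dynamic allocation can still be overridden by spark.dynamicAllocation.minExecutors and spark.dynamicAllocation.maxExecutors.
spark.dynamicAllocation.schedulerBacklogTimeout / spark.dynamicAllocation.sustainedSchedulerBacklogTimeout
Default: 1s (sustainedSchedulerBacklogTimeout defaults to the value of schedulerBacklogTimeout)
If tasks have been pending for longer than schedulerBacklogTimeout, new executors are requested. sustainedSchedulerBacklogTimeout plays the same role for subsequent requests while the backlog persists.
spark.dynamicAllocation.shuffleTracking.enabled
Default: true (as of Spark 3.4; false in earlier 3.x releases)
Enables shuffle file tracking for executors, so that dynamic allocation can work without an external shuffle service.
spark.dynamicAllocation.shuffleTracking.timeout
Default: infinity
When shuffle tracking is enabled, the timeout for executors that are holding shuffle data.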
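To make the executorAllocationRatio description more concrete, here is a rough sketch of how the target executor count could be derived. It is a simplified model rather than Spark's actual code: the function name `target_executors` and its parameters are hypothetical, and it assumes the target is the number of pending plus running tasks, scaled by the ratio, divided by the task slots per executor, then clamped by minExecutors and maxExecutors.

```python
import math


def target_executors(pending_tasks, running_tasks, executor_cores=1, task_cpus=1,
                     allocation_ratio=1.0, min_executors=0, max_executors=None):
    """Simplified model of the dynamic allocation target (hypothetical helper)."""
    # Number of tasks one executor can run at the same time.
    tasks_per_executor = max(1, executor_cores // task_cpus)
    # Executors needed for full parallelism, scaled down by the ratio.
    needed = math.ceil((pending_tasks + running_tasks) * allocation_ratio / tasks_per_executor)
    # minExecutors / maxExecutors still bound the result.
    if max_executors is not None:
        needed = min(needed, max_executors)
    return max(needed, min_executors)


print(target_executors(100, 0))                                          # 100
print(target_executors(100, 0, allocation_ratio=0.5))                    # 50
print(target_executors(100, 0, allocation_ratio=0.5, max_executors=4))   # 4
```

With 100 pending tasks and one core per executor, a ratio of 0.5 halves the target from 100 to 50, and a maxExecutors of 4 caps it regardless of the ratio.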
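Configuration tests
Enabling dynamic allocation
Executors are indeed added automatically, but few resources are allocated at first and new executors are slow to come up.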
{
"spark.driver.cores": "1",
"spark.driver.memory": "1g",
"spark.executor.cores": "1",
"spark.executor.memory": "2g",
"spark.memory.fraction": "0.9",
"spark.executor.instances": "1",
"hive.metastore.uris": "thrift://127.0.0.1:30123",
"spark.cores.max": "1",
"spark.driver.extraJavaOptions": "-Dfile.encoding=utf-8",
"spark.executor.extraJavaOptions": "-Dfile.encoding=utf-8",
"spark.sql.legacy.timeParserPolicy": "LEGACY",
"spark.sql.storeAssignmentPolicy": "LEGACY",
"spark.dynamicAllocation.enabled": true
}
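One likely reason for the slow start is the way executors are requested: according to the Spark documentation, requests go out in rounds whose size grows exponentially (1, 2, 4, 8, ...), the first round after schedulerBacklogTimeout and each later round after sustainedSchedulerBacklogTimeout. The sketch below only models that request schedule. The function `ramp_up_schedule` is hypothetical, and it ignores container startup time and cluster capacity, which can add further delay.

```python
def ramp_up_schedule(target, backlog_timeout=1.0, sustained_timeout=1.0, start=0):
    """Return [(seconds_since_backlog_began, total_executors_requested), ...].

    Simplified model: the number of executors added doubles every round
    until the target is reached.
    """
    schedule = []
    total = start          # executors already requested (e.g. the initial ones)
    to_add = 1             # first round adds a single executor
    elapsed = backlog_timeout
    while total < target:
        total = min(target, total + to_add)
        schedule.append((elapsed, total))
        to_add *= 2        # next round asks for twice as many
        elapsed += sustained_timeout
    return schedule


# Starting from 1 executor with the default 1s timeouts, reaching 16 takes 4 rounds.
print(ramp_up_schedule(16, start=1))
# [(1.0, 2), (2.0, 4), (3.0, 8), (4.0, 16)]
```

Under these assumptions, lowering the two backlog timeouts or starting with a larger spark.dynamicAllocation.initialExecutors would shorten the ramp-up.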
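maxExecutors only limits executors
initialExecutors: in this test the number of containers came out as spark.executor.instances + initialExecutors, i.e. 1 + 0 = 1.
maxExecutors: in this test the maximum came out as spark.executor.instances + maxExecutors, i.e. 1 + 3 = 4. (Spark's documentation describes maxExecutors as an upper bound on executors only, so the extra container may be the driver.)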
{
"spark.driver.cores": "1",
"spark.driver.memory": "1g",
"spark.executor.cores": "1",
"spark.executor.memory": "2g",
"spark.memory.fraction": "0.9",
"spark.executor.instances": "1",
"hive.metastore.uris": "thrift://127.0.0.1:30123",
"spark.cores.max": "1",
"spark.driver.extraJavaOptions": "-Dfile.encoding=utf-8",
"spark.executor.extraJavaOptions": "-Dfile.encoding=utf-8",
"spark.sql.legacy.timeParserPolicy": "LEGACY",
"spark.sql.storeAssignmentPolicy": "LEGACY",
"spark.dynamicAllocation.enabled": true,
"spark.dynamicAllocation.initialExecutors": "0",
"spark.dynamicAllocation.minExecutors": "0"
}
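The initial executor count seen with this configuration (1 + 0 = 1 above) is also consistent with Spark taking the largest of minExecutors, initialExecutors and spark.executor.instances as the starting point. The sketch below reproduces that rule on a plain dict. The helper `initial_executors` is hypothetical and only mirrors the documented behavior; it does not call Spark.

```python
def initial_executors(conf):
    """Starting executor count under dynamic allocation (simplified model)."""
    min_exec = int(conf.get("spark.dynamicAllocation.minExecutors", 0))
    # initialExecutors defaults to minExecutors when it is not set.
    initial = int(conf.get("spark.dynamicAllocation.initialExecutors", min_exec))
    instances = int(conf.get("spark.executor.instances", 0))
    # The largest of the three values wins.
    return max(min_exec, initial, instances)


conf = {
    "spark.executor.instances": "1",
    "spark.dynamicAllocation.initialExecutors": "0",
    "spark.dynamicAllocation.minExecutors": "0",
}
print(initial_executors(conf))  # 1
```

Under this rule, setting initialExecutors to 0 has no visible effect as long as spark.executor.instances stays at 1.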