
HashShuffleManager

2022-01-15 02:23:54 manba_ yqq

1. Normal mechanism


  • Execution process
    1. Each map task writes its results into different buffers, one per reduce partition. Each buffer is 32 KB and acts as a data cache.
    2. Each buffer is eventually flushed to its own small disk file.
    3. Each reduce task pulls the disk files that correspond to its partition.
  • Summary
    1. The partitioner (HashPartitioner by default) decides which disk file each map task result is written to. Each reduce task then pulls its corresponding disk files from the map side.
    2. Number of small disk files produced: M (the number of map tasks) * R (the number of reduce tasks).
  • Producing too many small disk files causes the following problems:
    1. During shuffle write, many objects are created to write the small disk files.
    2. During shuffle read, many objects are created to read the small disk files.
    3. Too many objects in JVM heap memory trigger frequent GC. If GC still cannot free the memory needed to run, the task fails with an OOM error.
    4. Data transfer involves frequent network communication, which greatly increases the chance of a communication failure. A failed transfer leads to a "shuffle file cannot find" error, and the task fails because of it. The TaskScheduler is not responsible for retrying in this case; the DAGScheduler retries the stage.
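
The write and read steps of the normal mechanism can be sketched as a small simulation. This is only an illustration in plain Python, not Spark code: `partition_for`, `shuffle_write`, and `shuffle_read` are hypothetical names, the CRC32-based hash stands in for HashPartitioner, and in-memory lists stand in for the 32 KB buffers and their disk files. It also assumes every map task creates a file for every reduce partition, even an empty one.

```python
import zlib


def partition_for(key, num_reduce_tasks):
    """Hypothetical stand-in for a hash partitioner: stable hash modulo R."""
    return zlib.crc32(str(key).encode("utf-8")) % num_reduce_tasks


def shuffle_write(map_outputs, num_reduce_tasks):
    """Simulate the write side: one bucket (buffer plus disk file)
    per (map task, reduce partition) pair."""
    files = {}
    for map_id, records in enumerate(map_outputs):
        # Assumption: each map task opens R buckets up front.
        for reduce_id in range(num_reduce_tasks):
            files[(map_id, reduce_id)] = []
        for key, value in records:
            reduce_id = partition_for(key, num_reduce_tasks)
            files[(map_id, reduce_id)].append((key, value))
    return files


def shuffle_read(files, reduce_id):
    """Simulate the read side: a reduce task pulls its file from every map task."""
    pulled = []
    for (map_id, rid), records in sorted(files.items()):
        if rid == reduce_id:
            pulled.extend(records)
    return pulled


# Three map tasks (M = 3) and two reduce tasks (R = 2).
map_outputs = [
    [("a", 1), ("b", 1), ("c", 1)],
    [("a", 1), ("d", 1)],
    [("b", 1), ("c", 1), ("e", 1)],
]
files = shuffle_write(map_outputs, num_reduce_tasks=2)
print(len(files))  # M * R = 3 * 2 = 6
```

Because the partition depends only on the key, every record with the same key ends up at the same reduce task, no matter which map task produced it.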

2. Merge mechanism


  • Summary
    Number of small disk files produced: C (the number of cores) * R (the number of reduce tasks).
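
To make the difference between the two mechanisms concrete, the two file-count formulas can be compared directly. The function names and the example values (1000 map tasks, 100 reduce tasks, 16 cores) are assumptions chosen for illustration, not figures from the article.

```python
def files_normal(num_map_tasks, num_reduce_tasks):
    """Normal mechanism: M * R small disk files."""
    return num_map_tasks * num_reduce_tasks


def files_merged(num_cores, num_reduce_tasks):
    """Merge mechanism: C * R small disk files."""
    return num_cores * num_reduce_tasks


# Hypothetical job: M = 1000 map tasks, R = 100 reduce tasks, C = 16 cores.
m, r, c = 1000, 100, 16
print(files_normal(m, r))  # 100000
print(files_merged(c, r))  # 1600
```

The saving grows with the ratio M / C, so the merge mechanism helps most when a job runs far more map tasks than there are cores.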

Copyright notice
This article was written by [manba_ yqq]. Please include a link to the original when reposting. Thank you.
https://chowdera.com/2021/12/202112122242278947.html
