SPARK PARALLELIZATION GENERAL CONCEPTS

SPARK JOINS

Concepts

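  • From my understanding, sort merge join and shuffle hash join share the same first step and differ in the second. (1) Both shuffle the two inputs by the join key, so rows with the same key end up in the same partition. (2) Each partition is then joined locally. Shuffle hash join builds a hash table from one side of the partition (normally the smaller side) and probes it with the other side. Sort merge join instead sorts both sides of the partition by the join key and merges the two sorted lists. Sort merge join is generally preferred and is Spark's default for large equi-joins (`spark.sql.join.preferSortMergeJoin` defaults to true), because it does not need to hold a per-partition hash table in memory.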
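The two steps above can be illustrated with a toy, single-process Python model. No Spark is involved: the function names (`shuffle`, `hash_join_partition`, `sort_merge_join_partition`, `shuffled_join`) are hypothetical, and this is a sketch of the idea, not Spark's implementation. It assumes inner equi-joins on a single key, rows given as `(key, value)` tuples, and treats the left side as the hash join's build side.

```python
from collections import defaultdict


def shuffle(rows, num_partitions):
    """Step 1 (shared by both strategies): hash-partition rows by join key,
    so equal keys from both inputs land in the same partition index."""
    parts = [[] for _ in range(num_partitions)]
    for key, value in rows:
        parts[hash(key) % num_partitions].append((key, value))
    return parts


def hash_join_partition(build, probe):
    """Step 2, shuffle hash join: build a hash table from one side
    (assumed to be the smaller one), then probe it with the other side."""
    table = defaultdict(list)
    for key, value in build:
        table[key].append(value)
    out = []
    for key, value in probe:
        for build_value in table.get(key, ()):
            out.append((key, build_value, value))
    return out


def sort_merge_join_partition(left, right):
    """Step 2, sort merge join: sort both sides by key, then walk them
    together, emitting the cross product of each run of equal keys."""
    left = sorted(left, key=lambda r: r[0])
    right = sorted(right, key=lambda r: r[0])
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the end of the run of equal keys on each side.
            i_end = i
            while i_end < len(left) and left[i_end][0] == lk:
                i_end += 1
            j_end = j
            while j_end < len(right) and right[j_end][0] == rk:
                j_end += 1
            for _, lv in left[i:i_end]:
                for _, rv in right[j:j_end]:
                    out.append((lk, lv, rv))
            i, j = i_end, j_end
    return out


def shuffled_join(left, right, num_partitions, per_partition_join):
    """Shuffle both inputs, then join matching partitions pairwise."""
    left_parts = shuffle(left, num_partitions)
    right_parts = shuffle(right, num_partitions)
    out = []
    for lp, rp in zip(left_parts, right_parts):
        out.extend(per_partition_join(lp, rp))
    return out


if __name__ == "__main__":
    left = [(1, "a"), (2, "b"), (2, "c"), (3, "d")]
    right = [(2, "x"), (3, "y"), (3, "z"), (4, "w")]
    shj = shuffled_join(left, right, 2, hash_join_partition)
    smj = shuffled_join(left, right, 2, sort_merge_join_partition)
    print(sorted(shj) == sorted(smj))  # True
    print(sorted(smj))  # [(2, 'b', 'x'), (2, 'c', 'x'), (3, 'd', 'y'), (3, 'd', 'z')]
```

Both strategies return the same rows; the difference is in resource use. The hash variant keeps the whole build side of each partition in a dictionary, while the sort merge variant only needs each side sorted, and in Spark sorting is the step that can spill to disk when a partition does not fit in memory.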

References