njmarko/tbb-matrix-multiplication

tbb-matrix-multiplication

Parallel matrix multiplication using the Intel TBB library. This approach achieved the fastest running time in the Parallel Programming class in 2020.

Solution

  • Intel C++ compiler v18
  • The second matrix was transposed
    • Enables vectorization
    • Improves cache efficiency
  • AVX 256-bit instructions
    • Doubles the speed of vectorized functions compared to the Visual Studio compiler
    • Uses 256-bit registers
  • std::inner_product
    • Standard library function that is already optimized and vectorizes well
  • TBB tasks achieved the best time
    • The tasks were organized in a tree-like hierarchy
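
The points above can be illustrated with a minimal sketch. It is not the repository's code: the `Matrix` alias, the function names, and the `grain` parameter are hypothetical, and the matrices are assumed to be square and stored row-major. The second matrix is transposed once so that every dot product reads two contiguous rows, `std::inner_product` computes each output element, and the row range is halved recursively to mirror a tree-like task hierarchy. The recursion here runs serially so that the sketch needs only the standard library; in a TBB version each half could be handed to a task, for example through `tbb::task_group::run`.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical layout: an n x n matrix stored row-major in one buffer.
using Matrix = std::vector<double>;

// Return the transpose of an n x n row-major matrix.
Matrix transpose(const Matrix& m, std::size_t n) {
    Matrix t(n * n);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j)
            t[j * n + i] = m[i * n + j];
    return t;
}

// Compute rows [row_begin, row_end) of C = A * B, where bt holds B transposed.
// Ranges larger than `grain` rows are halved, producing a tree of sub-ranges.
void multiply_rows(const Matrix& a, const Matrix& bt, Matrix& c,
                   std::size_t n, std::size_t row_begin, std::size_t row_end,
                   std::size_t grain) {
    const std::size_t rows = row_end - row_begin;
    if (rows > std::max<std::size_t>(grain, 1)) {
        const std::size_t mid = row_begin + rows / 2;
        // Serial recursion; a task-based version would run the halves in parallel.
        multiply_rows(a, bt, c, n, row_begin, mid, grain);
        multiply_rows(a, bt, c, n, mid, row_end, grain);
        return;
    }
    for (std::size_t i = row_begin; i < row_end; ++i) {
        const double* a_row = a.data() + i * n;
        for (std::size_t j = 0; j < n; ++j) {
            // Row j of bt is column j of B, so both operands are contiguous.
            const double* bt_row = bt.data() + j * n;
            c[i * n + j] = std::inner_product(a_row, a_row + n, bt_row, 0.0);
        }
    }
}

// Multiply two n x n matrices; `grain` is the largest row count handled by one leaf.
Matrix multiply(const Matrix& a, const Matrix& b, std::size_t n,
                std::size_t grain = 16) {
    const Matrix bt = transpose(b, n);
    Matrix c(n * n, 0.0);
    multiply_rows(a, bt, c, n, 0, n, grain);
    return c;
}
```

Because each leaf writes a disjoint set of rows of the result, the two halves of every split are independent, which is what makes the recursion suitable for parallel tasks.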

Hardware Specification

  • Windows 10
  • Intel Core i7-9750H processor (6 cores, 12 logical) @ 2.56 GHz (4.5 GHz boost)
  • Nvidia GTX 1660 Ti
  • 16 GB DDR4 RAM (dual channel)
  • 512 GB M.2 SSD

Results

Illustration 1 - Achieved results.

Illustration 2 - Execution time.

Illustration 3 - Speedup compared to serial baseline.
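
For reference, speedup is assumed here to follow the usual definition: the serial running time divided by the parallel running time. The sketch below shows one way such a figure could be measured; the helper names `seconds_taken` and `speedup` are hypothetical and do not come from the repository.

```cpp
#include <chrono>

// Run `work` once and return the elapsed wall-clock time in seconds.
template <typename F>
double seconds_taken(F&& work) {
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}

// Speedup relative to the serial baseline (assumed definition: T_serial / T_parallel).
double speedup(double serial_seconds, double parallel_seconds) {
    return serial_seconds / parallel_seconds;
}
```

A value above 1 means the parallel version finished faster than the serial baseline.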