Matrix Multiplication using Strassen’s Algorithm on CPU & GPU
Keywords:
GPU, CUDA, Matrix Multiplication, Strassen’s Algorithm, Cache, SpeedupAbstract
In this paper we have successfully implemented Matrix Multiplication using Strassen's Algorithm on a NVIDIA GPU using CUDA. We have used the multiple cores of the GPU to reduce the computation time drastically. We have also compared the time taken by matrix multiplication using Strassen's algorithm on both CPU and GPU. We have found that the GPU implementation was much faster, but only when the recursion was performed till a certain limit. Beyond that limit, the computation took much more time than expected. Also, we found that implementing Matrix Multiplication using Strassen's algorithm on the CPU yielded some very positive results. By conducting experiments, we came to the conclusion that the recursion limit can be comparatively smaller for matrix multiplication using Strassen's algorithm on CPU than for matrix multiplication using Strassen's algorithm on GPU.
References
Francois Le Gall, “Powers of Tensors and Fast Matrix Multiplication,” Cornell University Library, arXiv:1401.7714 [cs.DS], 2014.
Wikipedia, Strassen Algorithm,
https://en.wikipedia.org/wiki/Strassen_algorithm.
Junjie Li, Sanjay Ranka, Sartaj Sahni, “Strassen’s Matrix Multiplication on GPUs,” 2011 IEEE 17th International Conference on Parallel and Distributed Systems(ICPADS), pp. 157-164, 2011.
C. P. Patidar and Meena Sharma, "Histogram Computations on GPUs Kernel using Global and Shared Memory Atomics", ISROSET-International Journal of Scientific Research in Computer Science and Engineering, Volume-01, Issue-04, Page No (1-6), Aug 2013
Pujianto Yugopuspito, Sutrisno, Robertus Hudi, “Breaking through memory limitation in GPU parallel processing using Strassen Algorithm,” 2013 International Conference on Computer, Control, Informatics and Its Applications(IC3INA), pp. 201-205, 2013.
Ayaz ul Hasan Khan, Mayez Al-Mouhamed, Allam Fatayer, “Optimizing strassen matrix multiply on GPUs”, 2015 16th IEEE/ACIS International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing (SNPD), pp. 1-6, 2015.
V. Strassen, “Gaussian elimination is not optimal,” Numerische Mathematik, Vol. 13, No. 4, pp. 354-356, August 1969.
Thomas H. Cormen, Charles E. Leiserson, Ronald L. Livest, Clifford Stein, Introduction to Algorithms, Second Edition. MIT Press and McGraw-Hill, 2001. Chapter 28: Section 28.2: Strassen’s algorithm for matrix multiplication, pp. 735-741.
CUDA C Programming Guide,
https://docs.nvidia.com/cuda/cuda-c-programming-guide
John Nickolls, “GPU parallel computing architecture and CUDA programming model,” 2007 IEEE Hot Chips 19 Symposium (HCS), pp. 1-12, 2007.
Fazlul Kader Murshed Nawaz, Arnab Chattopadhyay, Kirthan G J, Girish D Mane, Rohith N Savanth, “Comparison of Open MP and CUDA”, International Journal of Computer Science and Engineering E-ISSN: 2347-2693, Vol.2, Issue-12, pp.38-41, 2014.
Downloads
Published
How to Cite
Issue
Section
License

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors contributing to this journal agree to publish their articles under the Creative Commons Attribution 4.0 International License, allowing third parties to share their work (copy, distribute, transmit) and to adapt it, under the condition that the authors are given credit and that in the event of reuse or distribution, the terms of this license are made clear.
