Matrix Multiplication using Strassen’s Algorithm on CPU & GPU

Authors

  • Ray U Department of Information Technology, Institute of Engineering & Management, Kolkata, India
  • Hazra Tk Departmentof Information Technology, Institute of Engineering & Management, Kolkata, India
  • Ray UK Department of Information Technology, Jadavpur University, Kolkata, India

Keywords:

GPU, CUDA, Matrix Multiplication, Strassen’s Algorithm, Cache, Speedup

Abstract

In this paper we have successfully implemented Matrix Multiplication using Strassen's Algorithm on a NVIDIA GPU using CUDA. We have used the multiple cores of the GPU to reduce the computation time drastically. We have also compared the time taken by matrix multiplication using Strassen's algorithm on both CPU and GPU. We have found that the GPU implementation was much faster, but only when the recursion was performed till a certain limit. Beyond that limit, the computation took much more time than expected. Also, we found that implementing Matrix Multiplication using Strassen's algorithm on the CPU yielded some very positive results. By conducting experiments, we came to the conclusion that the recursion limit can be comparatively smaller for matrix multiplication using Strassen's algorithm on CPU than for matrix multiplication using Strassen's algorithm on GPU.

References

Francois Le Gall, “Powers of Tensors and Fast Matrix Multiplication,” Cornell University Library, arXiv:1401.7714 [cs.DS], 2014.

Wikipedia, Strassen Algorithm,

https://en.wikipedia.org/wiki/Strassen_algorithm.

Junjie Li, Sanjay Ranka, Sartaj Sahni, “Strassen’s Matrix Multiplication on GPUs,” 2011 IEEE 17th International Conference on Parallel and Distributed Systems(ICPADS), pp. 157-164, 2011.

C. P. Patidar and Meena Sharma, "Histogram Computations on GPUs Kernel using Global and Shared Memory Atomics", ISROSET-International Journal of Scientific Research in Computer Science and Engineering, Volume-01, Issue-04, Page No (1-6), Aug 2013

Pujianto Yugopuspito, Sutrisno, Robertus Hudi, “Breaking through memory limitation in GPU parallel processing using Strassen Algorithm,” 2013 International Conference on Computer, Control, Informatics and Its Applications(IC3INA), pp. 201-205, 2013.

Ayaz ul Hasan Khan, Mayez Al-Mouhamed, Allam Fatayer, “Optimizing strassen matrix multiply on GPUs”, 2015 16th IEEE/ACIS International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing (SNPD), pp. 1-6, 2015.

V. Strassen, “Gaussian elimination is not optimal,” Numerische Mathematik, Vol. 13, No. 4, pp. 354-356, August 1969.

Thomas H. Cormen, Charles E. Leiserson, Ronald L. Livest, Clifford Stein, Introduction to Algorithms, Second Edition. MIT Press and McGraw-Hill, 2001. Chapter 28: Section 28.2: Strassen’s algorithm for matrix multiplication, pp. 735-741.

CUDA C Programming Guide,

https://docs.nvidia.com/cuda/cuda-c-programming-guide

John Nickolls, “GPU parallel computing architecture and CUDA programming model,” 2007 IEEE Hot Chips 19 Symposium (HCS), pp. 1-12, 2007.

Fazlul Kader Murshed Nawaz, Arnab Chattopadhyay, Kirthan G J, Girish D Mane, Rohith N Savanth, “Comparison of Open MP and CUDA”, International Journal of Computer Science and Engineering E-ISSN: 2347-2693, Vol.2, Issue-12, pp.38-41, 2014.

Downloads

Published

2025-11-11

How to Cite

[1]
U. Ray, T. K. Hazra, and U. K. Ray, “Matrix Multiplication using Strassen’s Algorithm on CPU & GPU”, Int. J. Comp. Sci. Eng., vol. 4, no. 10, pp. 98–105, Nov. 2025.

Issue

Section

Research Article