Khaled Salah, Mohamed AbdelSalam



Performance Comparison of FPGAs and GPUs: Solving Sparse Matrices Case-Study




This paper presents a performance comparison of FPGAs and GPUs, using numerical methods for solving sparse matrices as the main case study. The experimental results show that GPUs outperform FPGAs/HW emulation in run time when the number of equations is small. For a large number of equations (on the order of ten million), FPGAs/HW emulation outperforms GPUs, because the emulation achieves a higher degree of parallelism in that case.



Numerical Method, FPGA, GPU, Sparse Matrices, Matrices.




Cite this paper

Khaled Salah, Mohamed AbdelSalam. (2017) Performance Comparison of FPGAs and GPUs: Solving Sparse Matrices Case-Study. International Journal of Mathematical and Computational Methods, 2, 161-170


Copyright © 2017 Author(s) retain the copyright of this article.
This article is published under the terms of the Creative Commons Attribution License 4.0