Paper 2025/249
cuFalcon: An Adaptive Parallel GPU Implementation for High-Performance Falcon Acceleration
Abstract
The rapid advancement of quantum computing has ushered in a new era of post-quantum cryptography, urgently demanding quantum-resistant digital signatures to secure modern communications and transactions. Among NIST-standardized candidates, Falcon, a compact lattice-based signature scheme, stands out for its suitability in size-sensitive applications. In this paper, we present cuFalcon, a high-throughput GPU implementation of Falcon that addresses its computational bottlenecks through adaptive parallel strategies. At the operational level, we optimize Falcon's key components for GPU architectures through memory-efficient FFT, adaptive parallel ffSampling, and a compact computation mode. For signature-level optimization, we implement three versions of cuFalcon: the raw key version, the expanded key version, and the balanced version, which achieves a trade-off between efficiency and memory usage. Additionally, we design batch processing, streaming mechanisms, and memory pooling to handle multiple signature tasks efficiently. Performance evaluations show significant improvements, with the raw key version achieving 172k signatures per second and the expanded key version reaching 201k signatures per second. Compared to the raw key version, the balanced version achieves a 7% improvement in throughput, while compared to the expanded key version, it reduces memory usage by 70%. Furthermore, our raw key version outperforms the reference implementation by 36.75$\times$ and achieves a 2.94$\times$ speedup over the state-of-the-art GPU implementation.
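To put the quoted ratios on a common scale, the following minimal sketch derives the throughput figures they imply. It assumes that every ratio is taken against the rounded 172k signatures-per-second raw key figure and that all measurements share the same parameter set and hardware, which the abstract does not state. The variable and function names are illustrative, and the derived values are approximations, not results reported by the authors.

```python
# Figures quoted in the abstract (signatures per second, rounded by the authors).
RAW_KEY_TPUT = 172_000
EXPANDED_KEY_TPUT = 201_000


def implied_baseline(throughput: float, speedup: float) -> float:
    """Throughput of a baseline that `throughput` beats by a factor of `speedup`."""
    return throughput / speedup


# Balanced version: 7% faster than the raw key version (assumed relative to 172k).
balanced_tput = RAW_KEY_TPUT * 1.07

# Baselines implied by the 36.75x and 2.94x speedups of the raw key version.
reference_tput = implied_baseline(RAW_KEY_TPUT, 36.75)
prior_gpu_tput = implied_baseline(RAW_KEY_TPUT, 2.94)

# Relative gain of the expanded key version over the raw key version.
expanded_gain = EXPANDED_KEY_TPUT / RAW_KEY_TPUT - 1

print(f"balanced version (implied):         {balanced_tput:,.0f} sig/s")
print(f"reference implementation (implied): {reference_tput:,.0f} sig/s")
print(f"prior GPU implementation (implied): {prior_gpu_tput:,.0f} sig/s")
print(f"expanded vs. raw key gain:          {expanded_gain:.1%}")
```

Under these assumptions the balanced version sits at roughly 184k signatures per second, between the raw key and expanded key versions, which is consistent with the trade-off described above.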
Metadata
- Available format(s)
- PDF
- Category
- Implementation
- Publication info
- Preprint.
- Keywords
- Post-Quantum Cryptography, Falcon, Fast Fourier Sampling, GPU acceleration
- Contact author(s)
- liwq24 @ m fudan edu cn
- hywei24 @ m fudan edu cn
- crypto @ sher1e dev
- crypto @ d4rk dev
- daiwch @ mail sysu edu cn
- ylzhao @ fudan edu cn
- History
- 2025-02-18: approved
- 2025-02-17: received
- Short URL
- https://ia.cr/2025/249
- License
- CC BY
BibTeX
@misc{cryptoeprint:2025/249,
      author = {Wenqian Li and Hanyu Wei and Shiyu Shen and Hao Yang and Wangchen Dai and Yunlei Zhao},
      title = {{cuFalcon}: An Adaptive Parallel {GPU} Implementation for High-Performance Falcon Acceleration},
      howpublished = {Cryptology {ePrint} Archive, Paper 2025/249},
      year = {2025},
      url = {https://eprint.iacr.org/2025/249}
}