Aspects of Latency Optimization for Hash-based Digital Signatures
Seiten
- Keine Verlagsinformationen verfügbar
- Artikel merken
The Tactile Internet will enable interactive real-time applications in industry and society. Proposed applications require, first, an end-to-end latency of about one millisecond, and second, stringent security. Data authentication is an important pillar for this security objective. Digital signatures provide data authentication, but with quantum computers looming at the horizon, most currently employed signature schemes will become unsecure. Hash-based digital signatures have been proposed as post-quantum secure alternative. These schemes are computationally expensive, leading to processing times that contradict the low-latency requirements of the Tactile Internet. This thesis develops a framework to understand, assess, and minimize the end-to-end latency of selected hash-based signatures.
We argue for the use of the eXtended Merkle Signature Scheme (XMSS), a postquantum secure algorithm with well-established security margins. For different architectural classes, we establish a performance baseline and assess the gap between our targeted latency budget of 100μs for data authentication and state-of-the-art XMSS implementations. High-end desktop processors utilizing vector extensions are above the target by a factor of 33. A hardware accelerator for embedded applications employing a hardware/software co-design is above target by a factor of 270. Pure software implementations on embedded processors miss by three orders of magnitude.
A thorough bottleneck analysis using our enhanced call graph method identifies computational intensive parts of the algorithm, and allows us to focus our later optimizations on the critical processing steps: the authentication path computation, the production of hashes and hash chain nodes, and the processing of hash chains. A comprehensive data dependency analysis studies which XMSS operations can be processed independently. We find that the authentication path computation can be taken out of the critical latency path altogether. By exploiting this, our developed EarlyAuth scheme improves the latency significantly and achieves a speedup of about 4.23 as compared to the XMSS reference implementation.
As software optimizations are insufficient to achieve the required performance, we design an application-specific integrated processor based on the Tensilica LX6 architecture. Using a hardware/software-codesign approach, we develop a generic accelerator for the underlying hash function which constitutes the main bottleneck of the XMSS signing and verification procedures. Although a speedup of 183 over the reference implementation could be achieved, with 440μs for signing alone, this generic architecture fails to meet our latency objective. Our instruction histogram analysis reveals that memory accesses require 51 percent of all cycles of the hash chain node computation. To mitigate these memory loads, we refine the architecture to take the specifics of XMSS into account. This improved design employs a result shift mechanism, a hardware padding generator, and tailored buffers designed to minimize memory operations. Although the cycles for memory loads are reduced to 7 percent, the performance is still insufficient. Further analysis reveals that the hash permutation function emerges as the new bottleneck and requires over 79 percent of the chain node computation cycles. We alleviate this by introducing a hardware unrolling scheme and optimize the trade-offs with respect to area-time and area-time-energy products. Our fastest architecture achieves a minimal total processing time for signing and verification of 265μs. Together with the EarlyAuth scheme, latencies below 100μs become viable.
Finally, we develop an end-to-and latency model that not only takes the signing and verification times into account, but also the data transmission times of message and signature. With this model, we analyze achievable end-to-end latencies and perform XMSS parameter optimizations. In addition, we derive a lower bound for the achievable XMSS end-to-end latency as a tool for design space exploration. Our derived latency efficiency metric relates the latency achieved by an actual design to the lower bound. This allows for comparison of different XMSS implementations.
We present a comprehensive set of analysis tools, implementation methods, and models, that aim to optimize the end-to-end latency of data authentication from an application perspective. Only the combination of all developed software and hardware optimizations makes it possible to achieve our latency objective. We improve the state of the art by up to two orders of magnitude and widen the envelope of viable secure applications for the Tactile Internet.
We argue for the use of the eXtended Merkle Signature Scheme (XMSS), a postquantum secure algorithm with well-established security margins. For different architectural classes, we establish a performance baseline and assess the gap between our targeted latency budget of 100μs for data authentication and state-of-the-art XMSS implementations. High-end desktop processors utilizing vector extensions are above the target by a factor of 33. A hardware accelerator for embedded applications employing a hardware/software co-design is above target by a factor of 270. Pure software implementations on embedded processors miss by three orders of magnitude.
A thorough bottleneck analysis using our enhanced call graph method identifies computational intensive parts of the algorithm, and allows us to focus our later optimizations on the critical processing steps: the authentication path computation, the production of hashes and hash chain nodes, and the processing of hash chains. A comprehensive data dependency analysis studies which XMSS operations can be processed independently. We find that the authentication path computation can be taken out of the critical latency path altogether. By exploiting this, our developed EarlyAuth scheme improves the latency significantly and achieves a speedup of about 4.23 as compared to the XMSS reference implementation.
As software optimizations are insufficient to achieve the required performance, we design an application-specific integrated processor based on the Tensilica LX6 architecture. Using a hardware/software-codesign approach, we develop a generic accelerator for the underlying hash function which constitutes the main bottleneck of the XMSS signing and verification procedures. Although a speedup of 183 over the reference implementation could be achieved, with 440μs for signing alone, this generic architecture fails to meet our latency objective. Our instruction histogram analysis reveals that memory accesses require 51 percent of all cycles of the hash chain node computation. To mitigate these memory loads, we refine the architecture to take the specifics of XMSS into account. This improved design employs a result shift mechanism, a hardware padding generator, and tailored buffers designed to minimize memory operations. Although the cycles for memory loads are reduced to 7 percent, the performance is still insufficient. Further analysis reveals that the hash permutation function emerges as the new bottleneck and requires over 79 percent of the chain node computation cycles. We alleviate this by introducing a hardware unrolling scheme and optimize the trade-offs with respect to area-time and area-time-energy products. Our fastest architecture achieves a minimal total processing time for signing and verification of 265μs. Together with the EarlyAuth scheme, latencies below 100μs become viable.
Finally, we develop an end-to-and latency model that not only takes the signing and verification times into account, but also the data transmission times of message and signature. With this model, we analyze achievable end-to-end latencies and perform XMSS parameter optimizations. In addition, we derive a lower bound for the achievable XMSS end-to-end latency as a tool for design space exploration. Our derived latency efficiency metric relates the latency achieved by an actual design to the lower bound. This allows for comparison of different XMSS implementations.
We present a comprehensive set of analysis tools, implementation methods, and models, that aim to optimize the end-to-end latency of data authentication from an application perspective. Only the combination of all developed software and hardware optimizations makes it possible to achieve our latency objective. We improve the state of the art by up to two orders of magnitude and widen the envelope of viable secure applications for the Tactile Internet.
| Erscheinungsdatum | 25.06.2021 |
|---|---|
| Sprache | englisch |
| Maße | 148 x 210 mm |
| Einbandart | Paperback |
| Themenwelt | Technik ► Elektrotechnik / Energietechnik |
| Schlagworte | 5G • hash • XMSS |
| ISBN-10 | 3-95947-049-5 / 3959470495 |
| ISBN-13 | 978-3-95947-049-0 / 9783959470490 |
| Zustand | Neuware |
| Informationen gemäß Produktsicherheitsverordnung (GPSR) | |
| Haben Sie eine Frage zum Produkt? |
Mehr entdecken
aus dem Bereich
aus dem Bereich
Grundlagen, Systemtechnik und Analysen ausgeführter Beispiele …
Buch | Softcover (2025)
Springer Vieweg (Verlag)
CHF 55,95
Wegweiser für Elektrofachkräfte
Buch | Hardcover (2024)
VDE VERLAG
CHF 67,20