2025-01-12 20:40:48 +08:00

69 lines
2.8 KiB
Plaintext

[/
Copyright Nick Thompson 2020
Distributed under the Boost Software License, Version 1.0.
(See accompanying file LICENSE_1_0.txt or copy at
http://www.boost.org/LICENSE_1_0.txt).
]
[section:rsqrt Reciprocal square root]
[h4 Synopsis]
#include <boost/math/special_functions/rsqrt.hpp>
namespace boost::math {
template<class Real>
Real rsqrt(Real const & x);
} // namespaces
The function `rsqrt` computes the reciprocal square root 1/[sqrt]/x/.
Those in the game programming community might suspect this is a fast, low precision wrapper around the [@https://www.felixcloutier.com/x86/rsqrtss rsqrtss] instruction.
This is not correct: We /tried/ this instruction, but found no performance benefit to using it.
However, the /trick/ of computing a low precision reciprocal square root and then bootstrapping to higher precision via Newton's method /does/ work, but it only yields a performance benefit for quad and higher precision.
We do of course allow you to use `rsqrt` for `float`, `double`, and `long double`, but be aware there is no performance benefit to doing so.
However, the savings for quad precision and higher are very significant.
The use is
using boost::multiprecision::float128;
float128 x = 0.1Q;
float128 y = boost::math::rsqrt(x);
The reciprocal square root of +\u221E is zero, and the reciprocal square root of a NaN is a NaN.
[$../graphs/rsqrt_quad_0_100.svg]
Performance:
```
Running ./reporting/performance/rsqrt_performance.x
Run on (16 X 4300 MHz CPU s)
CPU Caches:
L1 Data 32 KiB (x8)
L1 Instruction 32 KiB (x8)
L2 Unified 1024 KiB (x8)
L3 Unified 11264 KiB (x1)
Load Average: 0.43, 0.49, 0.46
----------------------------------------------------------------------------------
Benchmark Time CPU Iterations
----------------------------------------------------------------------------------
Rsqrt<float> 1.35 ns 1.35 ns 503364351
Rsqrt<double> 2.25 ns 2.25 ns 309753242
Rsqrt<long double> 2.68 ns 2.68 ns 261382652
Rsqrt<float128> 182 ns 182 ns 3756956
Rsqrt<number<mpfr_float_backend<100>>> 299 ns 299 ns 2494027
Rsqrt<number<mpfr_float_backend<200>>> 412 ns 412 ns 1589284
Rsqrt<number<mpfr_float_backend<300>>> 617 ns 617 ns 1067473
Rsqrt<number<mpfr_float_backend<400>>> 812 ns 812 ns 830564
Rsqrt<number<mpfr_float_backend<1000>>> 3183 ns 3183 ns 221079
Rsqrt<cpp_bin_float_50> 4321 ns 4321 ns 163243
Rsqrt<cpp_bin_float_100> 9393 ns 9393 ns 72967
```
[endsect]