69 lines
2.8 KiB
Plaintext
69 lines
2.8 KiB
Plaintext
[/
|
|
Copyright Nick Thompson 2020
|
|
Distributed under the Boost Software License, Version 1.0.
|
|
(See accompanying file LICENSE_1_0.txt or copy at
|
|
http://www.boost.org/LICENSE_1_0.txt).
|
|
]
|
|
|
|
[section:rsqrt Reciprocal square root]
|
|
|
|
[h4 Synopsis]
|
|
|
|
#include <boost/math/special_functions/rsqrt.hpp>
|
|
|
|
namespace boost::math {
|
|
|
|
template<class Real>
|
|
Real rsqrt(Real const & x);
|
|
|
|
} // namespaces
|
|
|
|
|
|
The function `rsqrt` computes the reciprocal square root 1/[sqrt]/x/.
|
|
Those in the game programming community might suspect this is a fast, low precision wrapper around the [@https://www.felixcloutier.com/x86/rsqrtss rsqrtss] instruction.
|
|
This is not correct: We /tried/ this instruction, but found no performance benefit to using it.
|
|
However, the /trick/ of computing a low precision reciprocal square root and then bootstrapping to higher precision via Newton's method /does/ work, but it only yields a performance benefit for quad and higher precision.
|
|
We do of course allow you to use `rsqrt` for `float`, `double`, and `long double`, but be aware there is no performance benefit to doing so.
|
|
However, the savings for quad precision and higher are very significant.
|
|
|
|
|
|
The use is
|
|
|
|
using boost::multiprecision::float128;
|
|
float128 x = 0.1Q;
|
|
float128 y = boost::math::rsqrt(x);
|
|
|
|
The reciprocal square root of +\u221E is zero, and the reciprocal square root of a NaN is a NaN.
|
|
|
|
[$../graphs/rsqrt_quad_0_100.svg]
|
|
|
|
|
|
Performance:
|
|
|
|
```
|
|
Running ./reporting/performance/rsqrt_performance.x
|
|
Run on (16 X 4300 MHz CPU s)
|
|
CPU Caches:
|
|
L1 Data 32 KiB (x8)
|
|
L1 Instruction 32 KiB (x8)
|
|
L2 Unified 1024 KiB (x8)
|
|
L3 Unified 11264 KiB (x1)
|
|
Load Average: 0.43, 0.49, 0.46
|
|
----------------------------------------------------------------------------------
|
|
Benchmark Time CPU Iterations
|
|
----------------------------------------------------------------------------------
|
|
Rsqrt<float> 1.35 ns 1.35 ns 503364351
|
|
Rsqrt<double> 2.25 ns 2.25 ns 309753242
|
|
Rsqrt<long double> 2.68 ns 2.68 ns 261382652
|
|
Rsqrt<float128> 182 ns 182 ns 3756956
|
|
Rsqrt<number<mpfr_float_backend<100>>> 299 ns 299 ns 2494027
|
|
Rsqrt<number<mpfr_float_backend<200>>> 412 ns 412 ns 1589284
|
|
Rsqrt<number<mpfr_float_backend<300>>> 617 ns 617 ns 1067473
|
|
Rsqrt<number<mpfr_float_backend<400>>> 812 ns 812 ns 830564
|
|
Rsqrt<number<mpfr_float_backend<1000>>> 3183 ns 3183 ns 221079
|
|
Rsqrt<cpp_bin_float_50> 4321 ns 4321 ns 163243
|
|
Rsqrt<cpp_bin_float_100> 9393 ns 9393 ns 72967
|
|
```
|
|
|
|
[endsect]
|