[/
Copyright 2018 Nick Thompson
Copyright 2021 Matt Borland

Distributed under the Boost Software License, Version 1.0.
(See accompanying file LICENSE_1_0.txt or copy at
http://www.boost.org/LICENSE_1_0.txt).
]

[section:bivariate_statistics Bivariate Statistics]

[heading Synopsis]

``
#include <boost/math/statistics/bivariate_statistics.hpp>

namespace boost{ namespace math{ namespace statistics {

    template<typename ExecutionPolicy, typename Container>
    auto covariance(ExecutionPolicy&& exec, Container const & u, Container const & v);

    template<typename Container>
    auto covariance(Container const & u, Container const & v);

    template<typename ExecutionPolicy, typename Container>
    auto means_and_covariance(ExecutionPolicy&& exec, Container const & u, Container const & v);

    template<typename Container>
    auto means_and_covariance(Container const & u, Container const & v);

    template<typename ExecutionPolicy, typename Container>
    auto correlation_coefficient(ExecutionPolicy&& exec, Container const & u, Container const & v);

    template<typename Container>
    auto correlation_coefficient(Container const & u, Container const & v);

}}}
``

[heading Description]

This file provides functions for computing bivariate statistics.
The functions are C++11 compatible, but the overloads that take an execution policy require C++17.
If no execution policy is passed to a function, the default is `std::execution::seq`.

[heading Covariance]

Computes the population covariance of two datasets:

    std::vector<double> u{1,2,3,4,5};
    std::vector<double> v{1,2,3,4,5};
    double cov_uv = boost::math::statistics::covariance(u, v);

The implementation follows [@https://doi.org/10.1109/CLUSTR.2009.5289161 Bennett et al.].
The parallel implementation follows [@https://dl.acm.org/doi/10.1145/3221269.3223036 Schubert et al.].
The data is not modified.
The function works with real-valued inputs but not with complex-valued inputs.

/Nota bene:/ If the input is an integer type, the output will be a double-precision type.

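A parallel covariance in the style of the cited references splits the data into chunks, summarizes each chunk independently, and then combines the summaries pairwise. The following is a minimal sketch of that combination step in plain C++, not the Boost.Math implementation. The names `Partial`, `summarize`, and `merge_partials` are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-chunk summary for this sketch; not a Boost.Math type.
struct Partial {
    double n;         // number of (u, v) pairs in the chunk
    double mean_u;    // mean of u over the chunk
    double mean_v;    // mean of v over the chunk
    double comoment;  // sum of (u_i - mean_u) * (v_i - mean_v) over the chunk
};

// Summarize the index range [first, last) with a simple two-pass computation.
inline Partial summarize(std::vector<double> const& u, std::vector<double> const& v,
                         std::size_t first, std::size_t last)
{
    assert(u.size() == v.size() && first < last && last <= u.size());
    Partial p{static_cast<double>(last - first), 0.0, 0.0, 0.0};
    for (std::size_t i = first; i < last; ++i) {
        p.mean_u += u[i];
        p.mean_v += v[i];
    }
    p.mean_u /= p.n;
    p.mean_v /= p.n;
    for (std::size_t i = first; i < last; ++i) {
        p.comoment += (u[i] - p.mean_u) * (v[i] - p.mean_v);
    }
    return p;
}

// Combine the summaries of two disjoint chunks into the summary of their union.
inline Partial merge_partials(Partial const& a, Partial const& b)
{
    double const n = a.n + b.n;
    double const du = b.mean_u - a.mean_u;
    double const dv = b.mean_v - a.mean_v;
    return {n,
            a.mean_u + du * b.n / n,
            a.mean_v + dv * b.n / n,
            a.comoment + b.comoment + du * dv * a.n * b.n / n};
}
```

Dividing the merged `comoment` by the merged `n` gives the population covariance of the combined data. Because each chunk is summarized independently, the calls to `summarize` could run on separate threads before the merge.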
The algorithm computes the mean values of the input data /u/ and /v/ as a by-product of the covariance.
Some applications need all three quantities from a single pass through the data.
For these, we provide `means_and_covariance`:

    std::vector<double> u{1,2,3,4,5};
    std::vector<double> v{1,2,3,4,5};
    auto [mu_u, mu_v, cov_uv] = boost::math::statistics::means_and_covariance(u, v);

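To illustrate why the means come for free, here is a minimal single-pass sketch in plain C++ that updates running means and a running sum of co-deviations together. It is an illustration in the spirit of the cited references, not the Boost.Math implementation. The names `MeansAndCovariance` and `single_pass_means_and_covariance` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical result type for this sketch; not part of Boost.Math.
struct MeansAndCovariance {
    double mean_u;
    double mean_v;
    double covariance;  // population covariance (divides by n)
};

// One pass over the data: running means plus a running co-moment.
inline MeansAndCovariance single_pass_means_and_covariance(
    std::vector<double> const& u, std::vector<double> const& v)
{
    assert(u.size() == v.size() && !u.empty());

    double mean_u = 0.0;
    double mean_v = 0.0;
    double comoment = 0.0;  // sum of (u_i - mean_u) * (v_i - mean_v) seen so far

    for (std::size_t i = 0; i < u.size(); ++i) {
        double const n = static_cast<double>(i + 1);
        double const du = u[i] - mean_u;   // deviation from the previous mean of u
        mean_u += du / n;
        mean_v += (v[i] - mean_v) / n;
        comoment += du * (v[i] - mean_v);  // uses the updated mean of v
    }
    return {mean_u, mean_v, comoment / static_cast<double>(u.size())};
}
```

For `u = v = {1, 2, 3, 4, 5}` this sketch yields means of 3 and a population covariance of 2.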
[heading Correlation Coefficient]

Computes the [@https://en.wikipedia.org/wiki/Pearson_correlation_coefficient Pearson correlation coefficient] of two datasets /u/ and /v/:

    std::vector<double> u{1,2,3,4,5};
    std::vector<double> v{1,2,3,4,5};
    double rho_uv = boost::math::statistics::correlation_coefficient(u, v);
    // rho_uv = 1.

The function works with real-valued inputs but not with complex-valued inputs.

/Nota bene:/ If the input is an integer type, the output will be a double-precision type.

If one or both of the datasets is constant, the correlation coefficient is an indeterminate form (0/0).
In this case, the function returns a quiet NaN (`quiet_NaN()`).

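The constant-data case can be made concrete with a short two-pass sketch in plain C++. It is an illustration only, not the Boost.Math implementation, and the name `pearson_sketch` is hypothetical. The exact comparison against zero is a simplification that is adequate for this example but not a robust test for constant data in general.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Illustrative two-pass Pearson correlation; hypothetical helper, not Boost.Math.
inline double pearson_sketch(std::vector<double> const& u, std::vector<double> const& v)
{
    assert(u.size() == v.size() && !u.empty());
    double const n = static_cast<double>(u.size());

    // First pass: means.
    double mean_u = 0.0;
    double mean_v = 0.0;
    for (std::size_t i = 0; i < u.size(); ++i) {
        mean_u += u[i];
        mean_v += v[i];
    }
    mean_u /= n;
    mean_v /= n;

    // Second pass: sums of squared deviations and of co-deviations.
    double suu = 0.0;
    double svv = 0.0;
    double suv = 0.0;
    for (std::size_t i = 0; i < u.size(); ++i) {
        double const du = u[i] - mean_u;
        double const dv = v[i] - mean_v;
        suu += du * du;
        svv += dv * dv;
        suv += du * dv;
    }

    // A constant dataset has zero variance, so the ratio would be 0/0.
    if (suu == 0.0 || svv == 0.0) {
        return std::numeric_limits<double>::quiet_NaN();
    }
    return suv / std::sqrt(suu * svv);
}
```

Callers can detect the indeterminate case with `std::isnan` on the result.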
[heading References]

* Bennett, Janine, et al. ['Numerically stable, single-pass, parallel statistics algorithms.] Cluster Computing and Workshops, 2009. CLUSTER'09. IEEE International Conference on. IEEE, 2009.
* Schubert, Erich; Gertz, Michael. ['Numerically stable parallel computation of (co-)variance.] Proceedings of the 30th International Conference on Scientific and Statistical Database Management, 2018.

[endsect]
[/section:bivariate_statistics Bivariate Statistics]