Module statkit.asserts
Assertion functions based on, and for, statistical tests.
Functions
def assert_converges_to(true_estimate, samples, verbose=False)
# NOTE: module-level imports inferred from usage; the original excerpt starts at
# the function definition. `_linregress` is assumed to be a linear-regression
# helper imported elsewhere in the module.
import numpy as np
from numpy import arange, array2string, cumsum, floor, hstack, log, mean, ones_like


def assert_converges_to(true_estimate, samples, verbose=False):
    r"""Assert, with 95% confidence, that the sample average converges to `true_estimate`.

    **Theory**

    Let $\mu$ denote the true mean and \( \hat{\mu}_M = M^{-1}\sum_{i=1}^M x^{(i)} \)
    be the (empirical) Monte Carlo mean based on $M$ samples.

    $$
    \text{MSE} = \langle(\hat{\mu}_M - \mu)^2\rangle
    = \langle(\hat{\mu}_M - \langle \hat{\mu}_M \rangle)^2\rangle
    + \langle(\langle \hat{\mu}_M \rangle - \mu)^2\rangle
    = \text{var}[\hat{\mu}_M] + (\langle \hat{\mu}_M \rangle - \mu)^2
    $$

    As the sample size $M \rightarrow \infty$, $\text{var}[\hat{\mu}_M] \rightarrow 0$
    at a rate of $1/M$. If $\langle \hat{\mu}_M \rangle = \mu$ (i.e., the estimate is
    unbiased), then $\text{MSE} \rightarrow 0$; otherwise, it remains constant at the
    squared bias.

    Uses linear regression to test whether the variance indeed decays at a rate of
    $1/M$ by fitting the coefficients $w$ and $b$ of
    $\ln \langle(\hat{\mu}_M - \mu)^2\rangle = w\ln M + b$ and checking whether
    $w = -1$ lies within the 95% confidence interval.

    Example: Generate normally distributed observations with a mean of
    \( \mu = 0.5 \) and verify that the average of the samples \( \hat{\mu}_M \)
    does indeed tend to $0.5$:

    ```
    true_mean = 0.5
    x_samples = np.random.normal(loc=true_mean, size=10_000)
    assert_converges_to(true_mean, x_samples)
    ```

    Args:
        true_estimate: The true mean $\mu$ of the samples.
        samples: The samples $x=[x^{(1)},\dots, x^{(M)}]^\intercal$ to test.
        verbose: Whether to print the regression message.

    Raises:
        AssertionError: If the sample average does not converge to the true mean
            with 95% confidence.
        AssertionError: If the sample size is too small to reliably estimate
            convergence.
    """
    M_samples = len(samples)

    # Split the data into non-overlapping chunks whose sizes follow a geometric
    # series: sum_n (1/2)^n = 1.
    minimum_sample_size = 8
    num_points = floor(log(M_samples / minimum_sample_size) / log(2))
    sample_proportion = 0.5 ** arange(1, num_points + 1)
    sample_sizes = (sample_proportion * M_samples).astype(int)
    slice_indices = cumsum(hstack((0, sample_sizes)))
    datasets = [
        samples[start:stop]
        for start, stop in zip(slice_indices[:-1], slice_indices[1:])
    ]

    mse = [(mean(d, axis=0) - true_estimate) ** 2 for d in datasets]
    slope, _, _, _, stderr, _ = _linregress(log(sample_sizes), log(mse))

    expected_slope = -1.0
    slope_lower = slope - 1.96 * stderr
    slope_upper = slope + 1.96 * stderr
    regression_message = f"""Slope between ln MSE and ln N:
Expected: {array2string(-ones_like(slope))}
Actual: {array2string(slope, precision=2)} (95% confidence interval: \
{array2string(slope_lower, precision=2)} to {array2string(slope_upper, precision=2)})"""
    if verbose:
        print(regression_message)

    if not np.all((slope_lower < expected_slope) & (expected_slope < slope_upper)):
        raise AssertionError(
            f"""Samples do not converge to true estimate \
μ={array2string(true_estimate, precision=2)}.
The mean squared error (MSE) between the Monte Carlo estimate from `samples` and μ
does not go down with sample size as 1/M. With 95% confidence, your samples are
biased.
{regression_message}
"""
        )
    elif np.any(slope_upper > 0.0):
        raise AssertionError(
            "Sample size too small to reliably estimate convergence. "
            "Increase the number of samples."
        )

Assert, with 95% confidence, that the sample average converges to `true_estimate`.

Theory
Let $\mu$ denote the true mean and $\hat{\mu}_M = M^{-1}\sum_{i=1}^M x^{(i)}$ be the (empirical) Monte Carlo mean based on $M$ samples.

$$
\text{MSE} = \langle(\hat{\mu}_M - \mu)^2\rangle = \langle(\hat{\mu}_M - \langle \hat{\mu}_M \rangle)^2\rangle + \langle(\langle \hat{\mu}_M \rangle - \mu)^2\rangle = \text{var}[\hat{\mu}_M] + (\langle \hat{\mu}_M \rangle - \mu)^2
$$

As the sample size $M \rightarrow \infty$, $\text{var}[\hat{\mu}_M] \rightarrow 0$ at a rate of $1/M$. If $\langle \hat{\mu}_M \rangle = \mu$ (i.e., the estimate is unbiased), then $\text{MSE} \rightarrow 0$; otherwise, it remains constant at the squared bias. Uses linear regression to test whether the variance indeed decays at a rate of $1/M$ by fitting the coefficients $w$ and $b$ of $\ln \langle(\hat{\mu}_M - \mu)^2\rangle = w\ln M + b$ and checking whether $w = -1$ lies within the 95% confidence interval.
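The $1/M$ decay of the MSE can be checked empirically. A minimal sketch using NumPy only (the sample sizes and repetition count are illustrative, not part of the module):

```python
import numpy as np

rng = np.random.default_rng(0)
true_mean = 0.0
sizes = np.array([2 ** k for k in range(4, 12)])  # M = 16 ... 2048

# Estimate the MSE of the sample mean at each M by averaging the squared
# error over many independent repetitions.
mse = np.array([
    np.mean([
        (rng.normal(true_mean, 1.0, m).mean() - true_mean) ** 2
        for _ in range(200)
    ])
    for m in sizes
])

# Fit ln MSE = w * ln M + b; for an unbiased estimator, w should be near -1.
w, b = np.polyfit(np.log(sizes), np.log(mse), 1)
print(f"fitted slope w = {w:.2f}")
```

The fitted slope hovers around $-1$, which is exactly the relationship the assertion tests for.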
Example
Generate normally distributed observations with a mean of $\mu = 0.5$ and verify that the average of the samples $\hat{\mu}_M$ does indeed tend to $0.5$:
true_mean = 0.5
x_samples = np.random.normal(loc=true_mean, size=10_000)
assert_converges_to(true_mean, x_samples)

Args
true_estimate: The true mean $\mu$ of the samples.
samples: The samples $x=[x^{(1)},\dots, x^{(M)}]^\intercal$ to test.
verbose: Whether to print the regression message.
Raises
AssertionError: If the sample average does not converge to the true mean with 95% confidence.
AssertionError: If the sample size is too small to reliably estimate convergence.
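Conversely, a biased sampler makes the log-log slope collapse toward zero, which is the condition behind the first AssertionError. A minimal sketch with NumPy only (the constant bias and sample sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
sizes = np.array([2 ** k for k in range(4, 12)])  # M = 16 ... 2048
bias = 0.5  # constant offset: the MSE plateaus at bias**2 instead of shrinking

# Estimate the MSE of the biased sample mean at each M.
mse = np.array([
    np.mean([
        (rng.normal(0.0, 1.0, m).mean() + bias) ** 2
        for _ in range(200)
    ])
    for m in sizes
])

# Fit ln MSE = w * ln M + b; a slope far from -1 signals the bias.
w, b = np.polyfit(np.log(sizes), np.log(mse), 1)
print(f"fitted slope w = {w:.2f}")  # close to 0: the MSE does not decay as 1/M
```

Because the squared bias dominates the variance already at moderate $M$, the fitted slope sits near zero, well outside any 95% confidence interval around $-1$.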