Getting Started

This guide will walk you through the basic usage of effdim.

Installation

Ensure effdim is installed:

pip install effdim

Basic Concepts

EffDim revolves around two main functions:

  • effdim.compute(data, method=...): Calculates a single dimension metric.
  • effdim.analyze(data, methods=[...]): Calculates multiple metrics at once.

Data is typically passed as an \(N \times D\) NumPy array, where \(N\) is the number of samples and \(D\) is the number of features.

Example: Random Noise vs Structured Data

Let's see how effective dimension differs between random noise and structured data.

1. Random Noise

High-dimensional random noise should have a high effective dimension because its variance is spread roughly evenly across all directions.

import numpy as np
import effdim

# 1000 samples, 100 dimensions
noise = np.random.randn(1000, 100)

# Participation Ratio
pr = effdim.compute(noise, method='participation_ratio')
print(f"PR of Noise: {pr:.2f}")
# Expected: somewhat below 100, because finite sampling makes the sample variances unequal

2. Structured Data (Low Rank)

If we create data that lies on a low-dimensional subspace embedded in a high-dimensional space, the effective dimension should be low.

# Create 1000 samples with only 5 meaningful dimensions
latent = np.random.randn(1000, 5)
projection = np.random.randn(5, 100)
structured_data = latent @ projection

# Add a tiny bit of noise
structured_data += 0.01 * np.random.randn(1000, 100)

pr = effdim.compute(structured_data, method='participation_ratio')
print(f"PR of Structured Data: {pr:.2f}")
# Expected: close to 5
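
As a sanity check that does not depend on effdim, the sketch below rebuilds the same kind of low-rank data and looks at how the variance is distributed over the principal directions. It uses only NumPy; the seed and the variable names are illustrative choices, not part of the library.

```python
import numpy as np

rng = np.random.default_rng(0)

# Same construction as above: 5 latent dimensions projected into 100
latent = rng.standard_normal((1000, 5))
projection = rng.standard_normal((5, 100))
structured = latent @ projection + 0.01 * rng.standard_normal((1000, 100))

# Squared singular values of the centered data are proportional
# to the eigenvalues of the sample covariance matrix
centered = structured - structured.mean(axis=0)
singular_values = np.linalg.svd(centered, compute_uv=False)
variance = singular_values ** 2
explained = variance / variance.sum()

top5_share = explained[:5].sum()
print(f"Variance share of the top 5 components: {top5_share:.6f}")
```

Almost all of the variance sits in the first five components, with the remaining 95 carrying only the small added noise. That is the structure a low effective dimension reflects.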

Available Methods

You can check the available methods in the Theory section. Common ones include:

  • 'pca': PCA Explained Variance
  • 'participation_ratio' (or 'pr')
  • 'shannon' (or 'entropy')
  • 'effective_rank' (or 'erank')
  • 'knn': k-Nearest Neighbors
  • 'twonn': Two-Nearest Neighbors
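
For intuition about the entropy-based entries in this list, the following sketch computes a common spectral measure: the exponential of the Shannon entropy of the normalized eigenvalue spectrum. The function name `entropy_dimension` is hypothetical, and it is an assumption that effdim's 'shannon' or 'effective_rank' methods follow this definition; the Theory section is the authority on what the library actually computes.

```python
import numpy as np

def entropy_dimension(eigvals):
    """Hypothetical helper: exp of the Shannon entropy of the normalized spectrum."""
    p = np.asarray(eigvals, dtype=float)
    p = p[p > 0]          # zero eigenvalues contribute nothing to the entropy
    p = p / p.sum()       # normalize so the spectrum sums to 1
    entropy = -np.sum(p * np.log(p))
    return float(np.exp(entropy))

uniform = entropy_dimension([1.0, 1.0, 1.0, 1.0])   # variance spread evenly
single = entropy_dimension([1.0, 0.0, 0.0, 0.0])    # variance in one direction
print(f"uniform spectrum: {uniform:.2f}, single direction: {single:.2f}")
```

A flat spectrum gives a value equal to the number of nonzero eigenvalues, while a spectrum concentrated in one direction gives a value of one.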

Analyzing Multiple Metrics

Use effdim.analyze to compute several metrics in one call and get a report keyed by method name.

report = effdim.analyze(structured_data, methods=['pr', 'pca', 'shannon'])
print(report)
# {'participation_ratio': ..., 'pca': ..., 'shannon': ...}