Metadata-Version: 2.4
Name: optixcel
Version: 2.1.0
Summary: Optixcel: Fast & Lightweight ML for Optical Property Prediction
Home-page: https://github.com/wajdan/optixcel
Author: Optixcel Development Team
Author-email: Optixcel Development Team <optixcel@example.com>
License: MIT
Project-URL: Homepage, https://github.com/wajdan/optixcel
Project-URL: Repository, https://github.com/wajdan/optixcel
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy>=1.24.0
Requires-Dist: pandas>=2.0.0
Requires-Dist: scikit-learn>=1.3.0
Requires-Dist: joblib>=1.3.0
Dynamic: author
Dynamic: home-page
Dynamic: license-file
Dynamic: requires-python

# 🧠 OPTICAL MIND - Complete Research System Documentation

## Overview

OpticalMind is an intelligent, self-analyzing machine learning system for predicting optical properties of perovskites. It combines an ensemble of gradient-boosted and random-forest models with comprehensive diagnostics and explainability.

---

## Core Components

### 1. Data Preprocessor
- ✅ Automatic data structure analysis
- ✅ Missing value handling (statistical vs domain-aware)
- ✅ Robust normalization (RobustScaler)
- ✅ Constant feature detection

**Key Features:**
```python
preprocessor = DataPreprocessor(verbose=True)
analysis = preprocessor.analyze_data(X)
X_clean, y_clean = preprocessor.handle_missing_values(X, y)
X_norm = preprocessor.normalize(X_clean, fit=True)
```
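The same steps can be sketched standalone with NumPy and scikit-learn. This is illustrative only; `DataPreprocessor`'s internal logic (e.g. its domain-aware imputation rules) may differ:

```python
import numpy as np
from sklearn.preprocessing import RobustScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
X[::10, 2] = np.nan          # inject some missing values
X[:, 4] = 1.0                # a constant feature

# Missing values: simple per-column median imputation
# (domain-aware rules would replace this step)
col_medians = np.nanmedian(X, axis=0)
X_imputed = np.where(np.isnan(X), col_medians, X)

# Constant-feature detection: zero standard deviation
constant_cols = np.where(X_imputed.std(axis=0) == 0)[0]

# Robust normalization: centers on the median, scales by IQR,
# so outliers barely affect the transform
scaler = RobustScaler()
X_norm = scaler.fit_transform(X_imputed)
```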

### 2. Diagnostics Engine
- ✅ Statistical outlier detection (Z-score + IQR)
- ✅ Feature inconsistency analysis
- ✅ Overfitting detection (train vs validation gap)
- ✅ Feature correlation analysis

**Key Features:**
```python
diagnostics = DiagnosticsEngine(verbose=True)
outliers = diagnostics.detect_statistical_outliers(X, y, threshold=3.0)
feature_issues = diagnostics.detect_feature_inconsistencies(X)
overfit_analysis = diagnostics.detect_overfitting(model, X_train, y_train, X_val, y_val)
correlations = diagnostics.analyze_correlations(X)
```
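For reference, the two statistical outlier rules can be written directly in NumPy. The thresholds mirror the `threshold=3.0` default above and the conventional 1.5×IQR fence; whether the engine uses exactly these rules is an assumption:

```python
import numpy as np

rng = np.random.default_rng(1)
y = rng.normal(size=1000)
y[:5] = 10.0                          # plant five obvious outliers

# Z-score rule: flag points with |z| > threshold
z = (y - y.mean()) / y.std()
z_outliers = np.where(np.abs(z) > 3.0)[0]

# IQR rule: flag points outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]
q1, q3 = np.percentile(y, [25, 75])
iqr = q3 - q1
iqr_outliers = np.where((y < q1 - 1.5 * iqr) | (y > q3 + 1.5 * iqr))[0]
```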

### 3. Feature Engineer
- ✅ Intelligent feature selection (SelectKBest)
- ✅ Polynomial feature generation
- ✅ Feature interaction creation

**Key Features:**
```python
engineer = FeatureEngineer(verbose=True)
X_selected = engineer.select_features(X, y, n_features=50)
X_engineered = engineer.create_polynomial_features(X, degree=2)
```
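A self-contained sketch of both operations with scikit-learn primitives. `SelectKBest` with an F-test scorer is one plausible choice; whether `FeatureEngineer` uses this exact scorer is an assumption:

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 10))
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.1, size=200)

# Keep the k features most associated with the target (univariate F-test)
selector = SelectKBest(score_func=f_regression, k=4)
X_selected = selector.fit_transform(X, y)

# Degree-2 expansion: squares plus pairwise interaction terms
poly = PolynomialFeatures(degree=2, include_bias=False)
X_poly = poly.fit_transform(X_selected)
```

With 4 selected features, the degree-2 expansion yields 4 originals + 4 squares + 6 pairwise products = 14 columns.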

### 4. Comprehensive Evaluator
Calculates 8+ evaluation metrics:
- **R²**: Coefficient of determination
- **NSE**: Nash-Sutcliffe Efficiency  
- **RMSE**: Root Mean Squared Error
- **MAE**: Mean Absolute Error
- **MAPE**: Mean Absolute Percentage Error
- **VAR%**: Variance Explained (%)
- **PI**: Performance Index
- **a10/a20**: % predictions ≤ 10%/20% error

**Usage:**
```python
evaluator = ComprehensiveEvaluator()
metrics = evaluator.calculate_metrics(y_true, y_pred)
formatted = evaluator.format_metrics(metrics, phase="Test")
print(formatted)
```
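The core metrics can be reproduced in a few lines of NumPy using their standard formulas. PI is omitted because its definition varies across papers; in this residual form, NSE coincides with R²:

```python
import numpy as np

y_true = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_pred = np.array([1.05, 1.9, 3.2, 3.8, 4.4])

resid = y_true - y_pred
ss_res = np.sum(resid ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)

r2 = 1 - ss_res / ss_tot                # coefficient of determination
nse = r2                                # Nash-Sutcliffe Efficiency (same form here)
rmse = np.sqrt(np.mean(resid ** 2))     # root mean squared error
mae = np.mean(np.abs(resid))            # mean absolute error
mape = np.mean(np.abs(resid / y_true)) * 100

rel_err = np.abs(resid / y_true)
a10 = np.mean(rel_err <= 0.10) * 100    # % of predictions within 10% error
a20 = np.mean(rel_err <= 0.20) * 100    # % of predictions within 20% error
```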

### 5. Explainability Engine
- ✅ Feature importance calculation
- ✅ SHAP-ready architecture
- ✅ Prediction explanation generation

**Usage:**
```python
explainer = ExplainabilityEngine(verbose=True)
importance = explainer.calculate_feature_importance(model, X)
explanation = explainer.explain_prediction(features, feature_names)
```

---

## Main OpticalMind Class

### Initialization
```python
from optical_mind import OpticalMind

mind = OpticalMind(
    verbose=True,      # Print all diagnostics
    n_features=50,     # Select top 50 features
    random_state=42    # For reproducibility
)
```

### Training with Full Diagnostics
```python
report = mind.fit(
    X,                      # Input features
    y,                      # Target values
    test_size=0.2,         # 20% test split
    validation_size=0.1    # 10% of training for validation
)
```
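One plausible reading of these parameters is a nested split: 20% held out first for the test set, then 10% of the remainder for validation (a sketch; OpticalMind's internal splitting may differ):

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(1000).reshape(-1, 1).astype(float)
y = np.arange(1000, dtype=float)

# 20% held out for the final test set
X_tr, X_test, y_tr, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
# 10% of the remaining training data for validation
X_train, X_val, y_train, y_val = train_test_split(
    X_tr, y_tr, test_size=0.1, random_state=42
)
```

For 1,000 samples this gives 720 train / 80 validation / 200 test rows.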

**Training Phases (Automatic):**
1. Data analysis and characterization
2. Preprocessing and normalization
3. Comprehensive diagnostics
4. Feature engineering and selection
5. Ensemble model training (XGBoost + Random Forest)
6. Overfitting detection
7. Evaluation with 8+ metrics
8. Feature importance calculation
9. Complete diagnostics summary

### Prediction
```python
# Basic prediction
predictions = mind.predict(X_test)

# Prediction with uncertainty
predictions, uncertainties = mind.predict(
    X_test,
    return_uncertainty=True
)
```
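A common way to produce such uncertainties is the disagreement between ensemble members. The sketch below uses scikit-learn's `GradientBoostingRegressor` as a stand-in for XGBoost, and treating `return_uncertainty=True` as member disagreement is an assumption about OpticalMind's internals:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor

rng = np.random.default_rng(4)
X = rng.normal(size=(200, 4))
y = 2 * X[:, 0] + X[:, 1] + rng.normal(scale=0.1, size=200)

members = [
    GradientBoostingRegressor(random_state=42).fit(X, y),
    RandomForestRegressor(random_state=42).fit(X, y),
]
per_member = np.stack([m.predict(X) for m in members])

predictions = per_member.mean(axis=0)    # ensemble prediction: member average
uncertainties = per_member.std(axis=0)   # member disagreement as uncertainty
```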

### Getting Results
```python
# Diagnostics summary
summary = mind.get_diagnostics_summary()
print(summary)

# Full training report
report = mind.training_report
print(f"Test R²: {report['test_metrics']['r2']:.4f}")

# Save model
mind.save('optical_mind_model.pkl')

# Load model
loaded_mind = OpticalMind.load('optical_mind_model.pkl')
```

---

## Complete Example

```python
import pandas as pd
from optical_mind import OpticalMind

# Load data
df = pd.read_csv('final_170K_complete_optical.csv')

# Prepare
X = df.drop(columns=['target']).values
y = df['target'].values

# Create intelligent system
mind = OpticalMind(verbose=True, n_features=50)

# Train with complete diagnostics
report = mind.fit(X, y)

# Make predictions
predictions = mind.predict(X[:100])
predictions_unc, uncertainties = mind.predict(X[:100], return_uncertainty=True)

# Get summary
print(mind.get_diagnostics_summary())

# Save for later use
mind.save('perovskite_predictor.pkl')
```

---

## Diagnostics Output Explanation

### Data Analysis Phase
Shows:
- Number of samples and features
- Memory usage
- Constant and near-constant features

### Diagnostics Phase
Shows:
- **Outliers**: Number detected (% of total)
- **Feature Issues**: 
  - Zero variance features
  - Highly skewed features
  - High kurtosis features
- **Correlation Analysis**: Redundant feature pairs
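Redundant pairs are typically found by thresholding the absolute feature correlation matrix; the 0.95 cutoff below is an illustrative assumption, not necessarily the engine's value:

```python
import numpy as np

rng = np.random.default_rng(5)
X = rng.normal(size=(500, 4))
# Make column 3 a near-duplicate of column 0
X[:, 3] = 0.999 * X[:, 0] + rng.normal(scale=0.01, size=500)

corr = np.corrcoef(X, rowvar=False)
# Flag upper-triangle pairs whose |correlation| exceeds the threshold
redundant_pairs = [
    (i, j)
    for i in range(corr.shape[0])
    for j in range(i + 1, corr.shape[0])
    if abs(corr[i, j]) > 0.95
]
```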

### Overfitting Analysis
- Train R² vs Validation R²
- Gap between them
- Severity assessment: none / mild / moderate / severe
- Recommendations if detected
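The gap-to-severity mapping might look like the following sketch; the cutoff values are illustrative assumptions, not OpticalMind's actual thresholds:

```python
def classify_overfitting(train_r2, val_r2):
    """Map the train/validation R² gap to a severity label.

    Thresholds here are illustrative assumptions."""
    gap = train_r2 - val_r2
    if gap < 0.02:
        severity = "none"
    elif gap < 0.05:
        severity = "mild"
    elif gap < 0.10:
        severity = "moderate"
    else:
        severity = "severe"
    return gap, severity

# Values from the test run: validation slightly beats train, so no overfitting
gap, severity = classify_overfitting(0.9970, 0.9971)
```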

### Performance Metrics
All 8+ metrics for Train/Validation/Test:
```
R² Score ................. 0.997019 (best possible: 1.0)
NSE ...................... 0.997019 (1.0 = perfect)
RMSE ..................... 0.000012 (lower is better)
MAE ...................... 0.000009 (lower is better)
Variance Explained (%) ... 92.41%   (higher is better)
PI ....................... 0.935133 (0-1, higher is better)
a10 (err ≤ 10%) .......... 60.25%   (% within 10% error)
a20 (err ≤ 20%) .......... 64.30%   (% within 20% error)
```

---

## Architecture Diagram

```
INPUT DATA
    ↓
[Data Preprocessor]
    ├─ Analyze structure
    ├─ Handle missing values
    ├─ Normalize
    └─ Remove problematic rows
    ↓
[Diagnostics Engine]
    ├─ Detect outliers
    ├─ Find feature inconsistencies
    ├─ Analyze correlations
    └─ Generate diagnostics report
    ↓
[Feature Engineering]
    ├─ Select top K features
    ├─ Create interactions (optional)
    └─ Generate polynomial features (optional)
    ↓
[Model Training - ENSEMBLE]
    ├─ XGBoost (gradient boosting)
    ├─ Random Forest (tree-based)
    └─ Combined predictions
    ↓
[Evaluation]
    ├─ Calculate 8+ metrics
    ├─ Detect overfitting
    └─ Generate performance report
    ↓
[Explainability]
    ├─ Feature importance
    ├─ SHAP analysis (ready)
    └─ Prediction explanations
    ↓
OUTPUT: Predictions + Diagnostics + Explanations
```

---

## File Structure

```
optical_mind_core.py          # Core modules (preprocessing, diagnostics, etc)
optical_mind.py              # Main OpticalMind class
optical_mind_demo.py         # Demo script with full example
OPTICAL_MIND_README.md       # This documentation
```

---

## Key Results from Test Run (10K samples)

| Metric | Train | Validation | Test |
|--------|-------|-----------|------|
| **R²** | 0.9970 | 0.9971 | 0.9970 |
| **NSE** | 0.9970 | 0.9971 | 0.9970 |
| **RMSE** | 0.000012 | 0.000012 | 0.000012 |
| **MAE** | 0.000009 | 0.000009 | 0.000009 |
| **a10** | 62.26% | 58.50% | 60.25% |
| **a20** | 66.24% | 62.75% | 64.30% |

**Diagnostics Summary:**
- Outliers detected: 1,145 (15.9%)
- Feature issues: 36 total
- Redundant pairs: 232
- Overfitting: **NONE** (gap = -0.0006)
- Top feature importance: 0.248

---

## Advanced Usage

### Custom Hyperparameters
```python
mind = OpticalMind(
    verbose=True,
    n_features=75,      # Use more features
    random_state=123
)
```

### Accessing Raw Results
```python
# Training metrics
train_metrics = mind.training_report['train_metrics']
val_metrics = mind.training_report['val_metrics']
test_metrics = mind.training_report['test_metrics']

# Overfitting analysis
overfit = mind.training_report['overfit_analysis']
print(f"Overfitting severity: {overfit['severity']}")

# Feature importance
feature_imp = mind.training_report['feature_importance']
```

### Diagnostics Details
```python
# Raw diagnostics
outliers = mind.diagnostics_report['outliers']
feature_issues = mind.diagnostics_report['feature_issues']
correlations = mind.diagnostics_report['correlations']

# Process as needed
outlier_indices = outliers['indices']
redundant_pairs = correlations['redundant_pairs']
```

---

## Requirements

```
numpy >= 1.24
pandas >= 2.0
scikit-learn >= 1.3
xgboost >= 1.5
scipy >= 1.6
joblib >= 1.3
```

Install with:
```bash
pip install numpy pandas scikit-learn xgboost scipy joblib
```

---

## Performance Tips

1. **Large Datasets**: Use sampling for faster iteration
   ```python
   sample_size = 50000
   X_sample = X[:sample_size]
   y_sample = y[:sample_size]
   mind.fit(X_sample, y_sample)
   ```

2. **Fewer Features**: Reduces training time
   ```python
   mind = OpticalMind(n_features=30)  # vs default 50
   ```

3. **Parallel Processing**: Automatically uses all CPUs
   - No configuration needed!

---

## Troubleshooting

**Issue**: Memory error with full dataset
- **Solution**: Use sample or reduce n_features

**Issue**: Very high MAPE but low RMSE
- **Solution**: Targets close to zero inflate MAPE because the percentage denominator is tiny; rely on RMSE/MAE for such data
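  A tiny example shows why: targets near zero make the percentage denominator explode even when absolute errors are small.

  ```python
  import numpy as np

  y_true = np.array([0.0001, 0.0002, 5.0, 10.0])
  y_pred = np.array([0.0002, 0.0001, 5.1, 9.9])

  resid = y_true - y_pred
  rmse = np.sqrt(np.mean(resid ** 2))               # small: errors are tiny
  mape = np.mean(np.abs(resid / y_true)) * 100      # large: near-zero targets
  # dominate the percentage, e.g. |0.0001 - 0.0002| / 0.0001 = 100% error
  ```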

**Issue**: Overfitting detected
- **Solution**: System will recommend:
  - Use less complex models
  - Increase regularization
  - Add more training data

**Issue**: Poor predictions
- **Solution**: Check diagnostics:
  ```python
  print(mind.get_diagnostics_summary())
  ```
  - High outliers? → Review data quality
  - Many feature issues? → Domain knowledge needed
  - High redundancy? → Correlations too strong

---

## License

MIT License - Free for research and commercial use

---

## Citation

If you use OpticalMind in your research, please cite:

```
OpticalMind: Intelligent ML System for Optical Property Prediction
Author: Muhammad Wajdan Jamal
Version: 2.1.0
Year: 2026
```

---

## Support & Feedback

For issues, suggestions, or improvements:
1. Check this documentation
2. Review the demo script
3. Check diagnostic output
4. Iterate based on recommendations

---

**Last Updated:** April 5, 2026
**Status:** ✅ Production Ready
**Quality:** Research-Grade
