Imagine you're learning to draw, and you have a teacher who gives you feedback. If you're drawing something very different from what the teacher wants, they might not give you much feedback (because it's too far off).
The ⵟ-product works the same way! When inputs are far away from the weight vector, the gradients (feedback) become very small. This means each neuron only "learns" from nearby examples, creating natural localization.
This is like having a teacher who focuses on helping you with things that are close to what you're trying to learn, rather than getting distracted by completely different things.
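To make this concrete, here is a minimal NumPy sketch, assuming the ⵟ-product takes the quotient form $\langle\mathbf{w},\mathbf{x}\rangle^2 / \|\mathbf{x}-\mathbf{w}\|^2$; the helper name `yat_product` and the small `eps` stabilizer are illustrative, not part of any library:

```python
import numpy as np

def yat_product(w, x, eps=1e-12):
    # Assumed form: squared dot product, scaled down by the squared
    # distance between the input and the weight vector.
    return np.dot(w, x) ** 2 / (np.dot(x - w, x - w) + eps)

w = np.array([1.0, 0.0])
near = np.array([1.1, 0.1])    # close to w: inside this neuron's "territory"
far = np.array([50.0, 40.0])   # far from w

print(yat_product(w, near))    # large response for the nearby input
print(yat_product(w, far))     # much smaller response for the distant one
```

The nearby input produces a far larger response, which previews the localization property derived below.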
Traditional neurons have global gradient influence: for a linear neuron $y = \mathbf{w} \cdot \mathbf{x}$, the gradient $\nabla_{\mathbf{w}} y = \mathbf{x}$ is nonzero no matter how far $\mathbf{x}$ is from $\mathbf{w}$. Every training example therefore affects every weight, leading to interference between unrelated patterns and sensitivity to distant outliers.
Computing the gradient of $\text{ⵟ}(\mathbf{w}, \mathbf{x}) = \langle \mathbf{w}, \mathbf{x} \rangle^2 / \|\mathbf{x} - \mathbf{w}\|^2$ with respect to $\mathbf{x}$ via the quotient rule:

$$\nabla_{\mathbf{x}} \text{ⵟ} = \frac{2\langle \mathbf{w}, \mathbf{x} \rangle \, \mathbf{w} - 2\,\text{ⵟ}(\mathbf{w}, \mathbf{x})\,(\mathbf{x} - \mathbf{w})}{\|\mathbf{x} - \mathbf{w}\|^2}$$

As $\|\mathbf{x}\| = k \to \infty$, the denominator grows as $k^2$, while the numerator grows as $k$ (note that $\text{ⵟ}(\mathbf{w}, \mathbf{x})$ itself stays bounded). Therefore, $\|\nabla_{\mathbf{x}} \text{ⵟ}\| \sim \mathcal{O}(1/k) \to 0$.
Each neuron creates a "territory" around its weight vector. Only nearby inputs contribute significant gradients, enabling local pattern learning.
The ⵟ-product is infinitely differentiable (away from $\mathbf{x} = \mathbf{w}$), making it well suited to physics-informed neural networks (PINNs), which require higher-order derivatives.
The gradient is Lipschitz continuous, providing theoretical guarantees for optimization and stability during training.
Distant outliers contribute vanishingly small gradients, so they don't disrupt learning of local patterns.
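A quick contrast of the two behaviors on an outlier, again assuming the quotient form of the ⵟ-product (the helper `yat_grad_norm` is illustrative):

```python
import numpy as np

def yat_grad_norm(w, x):
    # Assumed yat-product <w,x>^2 / ||x - w||^2; gradient via the quotient rule.
    d = x - w
    y = np.dot(w, x) ** 2 / np.dot(d, d)
    g = (2 * np.dot(w, x) * w - 2 * y * d) / np.dot(d, d)
    return np.linalg.norm(g)

w = np.array([1.0, 0.0])
inlier = np.array([1.2, 0.3])
outlier = np.array([200.0, -150.0])

print(yat_grad_norm(w, inlier))    # sizeable: inside the neuron's territory
print(yat_grad_norm(w, outlier))   # tiny: the outlier barely perturbs learning
print(np.linalg.norm(outlier))     # a linear neuron's gradient norm for the same outlier
```

For the linear neuron the outlier produces the largest update of all; for the ⵟ-product it produces almost none.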
This proposition shows that ⵟ-product neurons have spatial awareness. They naturally create "learning territories": regions of input space where they actively learn, with smooth falloff outside.
This is fundamentally different from linear or ReLU neurons, which have global influence. The localization property enables local pattern learning, robustness to distant outliers, and stable training dynamics.