Gradient Descent

Gradient descent is an optimization algorithm that iteratively adjusts a model's parameters (weights and biases) to minimize the loss function in machine learning. It is one of the most fundamental and widely used optimization techniques, and it forms the backbone of training for neural networks and many other models.

The method computes the gradient of the loss function, which points in the direction of steepest increase, and moves the parameters a small step in the opposite direction, steering the model toward lower error. It is like walking downhill from a high elevation (high error) toward the valley floor (minimum error). The valley reached may be a local minimum rather than the global one, so the result can depend on the starting point and the step size.

The basic update rule for gradient descent:

θ = θ - η × ∇L(θ)

• θ: Model parameters (e.g., weights)
• η (eta): Learning rate, i.e., the step size
• ∇L(θ): Gradient of the loss function with respect to θ

Main variants:

• Batch Gradient Descent: Computes the gradient using the entire dataset for each update. Updates are stable, but each one is slow and memory-intensive on large datasets.
• Stochastic Gradient Descent (SGD): Computes the gradient from one sample at a time. Each update is cheap, but the updates are noisy and can be unstable.
• Mini-batch Gradient Descent: Computes the gradient over a small batch of samples. It strikes a good balance between speed and stability and is the most widely used approach.

Advanced methods that build on gradient descent for faster and more stable training include:

• Momentum
• AdaGrad
• RMSprop
• Adam (one of the most widely used in practice)

Gradient descent is the optimization principle underlying the learning process in AI and machine learning, so understanding it is essential for building and improving models.
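
To make the update rule and the three main variants concrete, here is a minimal sketch in Python with NumPy. It is only an illustration under stated assumptions: the model is a simple linear regression, the loss is mean squared error, and the function name gradient_descent and its parameters (lr for η, epochs, batch_size, seed) are hypothetical names chosen for this example rather than part of any library.

```python
import numpy as np


def gradient_descent(X, y, lr=0.1, epochs=200, batch_size=None, seed=0):
    """Fit y ≈ X @ w + b by minimizing mean squared error (illustrative only).

    batch_size=None -> batch gradient descent (all samples per update)
    batch_size=1    -> stochastic gradient descent (one sample per update)
    otherwise       -> mini-batch gradient descent
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)  # theta: weights
    b = 0.0          # theta: bias
    size = n if batch_size is None else batch_size

    for _ in range(epochs):
        order = rng.permutation(n)  # reshuffle the samples every epoch
        for start in range(0, n, size):
            idx = order[start:start + size]
            Xb, yb = X[idx], y[idx]

            # Gradient of the mean squared error on this batch
            err = Xb @ w + b - yb
            grad_w = 2.0 * Xb.T @ err / len(idx)
            grad_b = 2.0 * err.mean()

            # Update rule: theta = theta - eta * grad L(theta)
            w -= lr * grad_w
            b -= lr * grad_b
    return w, b


# Toy data (an assumption for this sketch): y = 3x + 2 with no noise
X = np.linspace(-1.0, 1.0, 100).reshape(-1, 1)
y = 3.0 * X[:, 0] + 2.0

w_batch, b_batch = gradient_descent(X, y, epochs=500)                # batch
w_mini, b_mini = gradient_descent(X, y, epochs=200, batch_size=16)   # mini-batch
w_sgd, b_sgd = gradient_descent(X, y, epochs=50, batch_size=1)       # SGD
```

On this noise-free toy problem all three settings should recover a weight close to 3 and a bias close to 2. The only difference between the variants is how many samples contribute to each gradient estimate, which is why a single batch_size parameter is enough to switch between them in this sketch.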

Related terms

Learning Rate