
Bias–variance tradeoff

In statistics and machine learning, the bias–variance tradeoff is the property of a set of predictive models whereby models with a lower bias in parameter estimation have a higher variance of the parameter estimates across samples, and vice versa. The bias–variance dilemma or problem is the conflict in trying to simultaneously minimize these two sources of error, which prevent supervised learning algorithms from generalizing beyond their training set.

The bias–variance decomposition is a way of analyzing a learning algorithm's expected generalization error with respect to a particular problem as a sum of three terms: the bias, the variance, and a quantity called the irreducible error, resulting from noise in the problem itself. The tradeoff applies to all forms of supervised learning: classification, regression (function fitting), and structured output learning. It has also been invoked to explain the effectiveness of heuristics in human learning.

The bias–variance tradeoff is a central problem in supervised learning. Ideally, one wants a model that both accurately captures the regularities in its training data and generalizes well to unseen data. Unfortunately, it is typically impossible to do both simultaneously. High-variance learning methods may represent their training set well but are at risk of overfitting to noisy or unrepresentative training data.
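The overfitting/underfitting contrast can be sketched numerically by fitting polynomials of increasing degree to noisy samples of a known function. A minimal sketch, in which the true function sin(2πx), the noise level, the sample sizes, and the degrees tried are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative setup: true function sin(2*pi*x) plus Gaussian noise.
def sample(n):
    x = rng.uniform(0.0, 1.0, n)
    y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.3, n)
    return x, y

x_train, y_train = sample(20)    # small training set
x_test, y_test = sample(2000)    # large held-out set

results = {}
for degree in (1, 3, 12):
    coeffs = np.polyfit(x_train, y_train, degree)  # least-squares polynomial fit
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    results[degree] = (train_mse, test_mse)
    print(f"degree {degree:2d}: train MSE {train_mse:.3f}, test MSE {test_mse:.3f}")
```

The degree-1 fit underfits (high bias: large error on both sets), while a high-degree fit drives the training error down yet tends to carry the memorized noise into its held-out error (high variance).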
In contrast, algorithms with low variance typically produce simpler models that do not tend to overfit but may underfit their training data, failing to capture important regularities. Models with low bias are usually more complex (e.g. higher-order regression polynomials), enabling them to represent the training set more accurately. In the process, however, they may also represent a large noise component in the training set, making their predictions less accurate despite their added complexity. In contrast, models with higher bias tend to be relatively simple (low-order or even linear regression polynomials) but may produce lower-variance predictions when applied beyond the training set.

Suppose that we have a training set consisting of a set of points $x_1, \dots, x_n$ and real values $y_i$ associated with each point $x_i$. We assume that there is a function with noise $y = f(x) + \varepsilon$, where the noise $\varepsilon$ has zero mean and variance $\sigma^2$. We want to find a function $\hat{f}(x)$ that approximates the true function $f(x)$ as well as possible, by means of some learning algorithm. We make "as well as possible" precise by measuring the mean squared error between $y$ and $\hat{f}(x)$: we want $(y - \hat{f}(x))^2$ to be minimal, both for $x_1, \dots, x_n$ and for points outside of our sample. Of course, we cannot hope to do so perfectly, since the $y_i$ contain noise $\varepsilon$; this means we must be prepared to accept an irreducible error in any function we come up with.
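The quantities in this setup can be estimated empirically: draw many independent training sets, fit a model on each, and examine the spread of the predictions at a fixed query point. A minimal Monte Carlo sketch, where the true function, noise level, polynomial degree, and sample sizes are all assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

def f(x):                          # assumed true function
    return np.sin(2 * np.pi * x)

sigma = 0.3                        # noise standard deviation
x0 = 0.5                           # fixed query point
n, degree, reps = 30, 3, 500

preds = np.empty(reps)
for r in range(reps):              # one fitted model per independent training set
    x = rng.uniform(0.0, 1.0, n)
    y = f(x) + rng.normal(0.0, sigma, n)
    preds[r] = np.polyval(np.polyfit(x, y, degree), x0)

bias_sq = (preds.mean() - f(x0)) ** 2   # squared bias of f_hat at x0
variance = preds.var()                  # variance of f_hat at x0

# Direct Monte Carlo estimate of the expected squared error at x0,
# pairing each fitted model with a fresh noisy observation y0:
y0 = f(x0) + rng.normal(0.0, sigma, reps)
direct = np.mean((y0 - preds) ** 2)

# The decomposition predicts: direct ≈ bias_sq + variance + sigma**2
print(bias_sq, variance, direct, bias_sq + variance + sigma ** 2)
```

Up to Monte Carlo error, the directly measured expected squared error matches the sum of the squared bias, the variance, and the irreducible noise term.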
Finding an $\hat{f}$ that generalizes to points outside of the training set can be done with any of the countless algorithms used for supervised learning. It turns out that whichever function $\hat{f}$ we select, we can decompose its expected error on an unseen sample $x$ as follows:
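Writing $\mathbb{E}[\cdot]$ for the expectation over the noise $\varepsilon$ and over the random draw of the training set, the standard decomposition reads

$$
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
= \big(\mathrm{Bias}[\hat{f}(x)]\big)^2 + \mathrm{Var}\big[\hat{f}(x)\big] + \sigma^2,
$$

where $\mathrm{Bias}[\hat{f}(x)] = \mathbb{E}[\hat{f}(x)] - f(x)$ and $\mathrm{Var}[\hat{f}(x)] = \mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]$. All three terms are nonnegative, and $\sigma^2$ is the irreducible error: no choice of $\hat{f}$ can bring the expected error below it.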

[ "Estimator", "Variance (accounting)" ]
Parent Topic
Child Topic
    No Parent Topic