Additive smoothing

In statistics, additive smoothing, also called Laplace smoothing^[1] or Lidstone smoothing, is a technique used to smooth count data, eliminating issues caused by certain values having 0 occurrences. Given a set of observation counts $\mathbf {x} =\langle x_{1},x_{2},\ldots ,x_{d}\rangle$ from a $d$ -dimensional multinomial distribution with $N$ trials, a "smoothed" version of the counts gives the estimator

{\hat {\theta }}_{i}={\frac {x_{i}+\alpha }{N+\alpha d}}\qquad (i=1,\ldots ,d),

where the smoothed count ${\hat {x}}_{i}=N{\hat {\theta }}_{i}$ , and the "pseudocount" α > 0 is a smoothing parameter, with α = 0 corresponding to no smoothing (this parameter is explained in § Pseudocount below). Additive smoothing is a type of shrinkage estimator, as the resulting estimate will be between the empirical probability (relative frequency) $x_{i}/N$ and the uniform probability $1/d.$ Common choices for α are 0 (no smoothing), +1⁄2 (the Jeffreys prior), or 1 (Laplace's rule of succession),^[2]^[3] but the parameter may also be set empirically based on the observed data.

From a Bayesian point of view, this corresponds to the expected value of the posterior distribution, using a symmetric Dirichlet distribution with parameter α as a prior distribution. In the special case where the number of categories is 2, this is equivalent to using a beta distribution as the conjugate prior for the parameters of the binomial distribution.

^ C. D. Manning, P. Raghavan and H. Schütze (2008). Introduction to Information Retrieval. Cambridge University Press, p. 260.
^ Jurafsky, Daniel; Martin, James H. (June 2008). Speech and Language Processing (2nd ed.). Prentice Hall. p. 132. ISBN 978-0-13-187321-6.
^ Russell, Stuart; Norvig, Peter (2010). Artificial Intelligence: A Modern Approach (2nd ed.). Pearson Education, Inc. p. 863.

[1] C. D. Manning, P. Raghavan and H. Schütze (2008). Introduction to Information Retrieval. Cambridge University Press, p. 260.

[2] Jurafsky, Daniel; Martin, James H. (June 2008). Speech and Language Processing (2nd ed.). Prentice Hall. p. 132. ISBN 978-0-13-187321-6.

[3] Russell, Stuart; Norvig, Peter (2010). Artificial Intelligence: A Modern Approach (2nd ed.). Pearson Education, Inc. p. 863.

[1]

[2]

[3]