Laplace smoothing, also called additive smoothing, is a statistical technique that raises the probability of rare events by adding one to the count of every element in the dataset. Smoothing is important in natural language processing because words absent from the training vocabulary (out-of-vocabulary words) would otherwise receive zero probability, yet those same rare words may well appear in the test data.
The one is added to the count of each word, and the result is divided by the number of tokens in the corpus plus the vocabulary size, so that the smoothed probabilities still sum to one: P(w) = (count(w) + 1) / (N + V), where N is the number of tokens in the corpus and V is the number of distinct words in the vocabulary.
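As a minimal sketch of this formula (the toy corpus, the vocabulary, and the function name laplace_unigram_probs are all illustrative assumptions, not a standard API), the following computes Laplace-smoothed unigram probabilities:

```python
from collections import Counter

def laplace_unigram_probs(tokens, vocab):
    """Laplace-smoothed unigram probabilities: P(w) = (count(w) + 1) / (N + V)."""
    counts = Counter(tokens)
    n = len(tokens)   # N: total tokens in the corpus
    v = len(vocab)    # V: vocabulary size, including words never seen in training
    return {w: (counts[w] + 1) / (n + v) for w in vocab}

corpus = "the cat sat on the mat".split()
vocab = set(corpus) | {"dog"}   # "dog" never occurs in the corpus
probs = laplace_unigram_probs(corpus, vocab)
print(probs["the"])  # (2 + 1) / (6 + 6) = 0.25
print(probs["dog"])  # (0 + 1) / (6 + 6) ≈ 0.083: nonzero despite a zero count
```

The unseen word "dog" receives a small but nonzero probability, and because the denominator is N + V rather than just N, the probabilities over the whole vocabulary still sum to one.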
Effect of smoothing
Smoothing improves the accuracy of statistical models such as Naïve Bayes on highly sparse data by removing the penalty that zero-probability n-grams impose: a single unseen n-gram would otherwise zero out an entire product of probabilities. The trade-off is that smoothing cannot distinguish events that are merely unseen in training from events whose true probability really is zero; both receive the same small amount of probability mass.
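To make the zero-probability penalty concrete, here is a toy sketch of a Naïve Bayes class log-likelihood with and without additive smoothing (the class counts, vocabulary, and function name are invented for illustration):

```python
import math

# Toy per-class word counts (assumed data, for illustration only)
class_counts = {"spam": {"free": 3, "win": 2}, "ham": {"meeting": 4, "notes": 1}}
vocab = {"free", "win", "meeting", "notes"}

def log_likelihood(words, cls, alpha):
    """Sum of log P(w | cls) with additive smoothing; alpha = 0 disables it."""
    counts = class_counts[cls]
    total = sum(counts.values())
    score = 0.0
    for w in words:
        p = (counts.get(w, 0) + alpha) / (total + alpha * len(vocab))
        score += math.log(p) if p > 0 else float("-inf")
    return score

# "win" never appears under "ham": without smoothing the whole score collapses.
print(log_likelihood(["meeting", "win"], "ham", alpha=0))  # -inf
print(log_likelihood(["meeting", "win"], "ham", alpha=1))  # finite log-probability
```

Without smoothing, one unseen word drives the whole class score to negative infinity; with alpha = 1 the score stays finite and the other words still contribute evidence.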
Smoothing types
- Laplace: adds exactly one to the count of every word.
- Lidstone: adds a tunable value alpha to the counts instead of one, which allows the probabilities to be adjusted more flexibly (see the sketch after this list).
- Interpolation: mixes lower-order n-gram probabilities with higher-order ones, for example combining unigram probabilities with bigram probabilities, so that well-attested bigrams keep more probability mass than the fixed pseudo-count (alpha or 1) of the previous two methods would grant them (also sketched below).
- Kneser-Ney: discounts ("steals") probability mass from high-count words and redistributes it to low-count ones; it also uses interpolation and the notion of fertility, i.e., the number of distinct contexts in which a word appears.
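The first three variants can be sketched in a few lines. This is an illustrative sketch under assumed values of alpha and the interpolation weight lam (in practice both would be tuned on held-out data); Kneser-Ney is omitted because its discounting and continuation counts need more machinery than fits here:

```python
from collections import Counter

corpus = "the cat sat on the mat the cat ran".split()
vocab = set(corpus)
unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))
N, V = len(corpus), len(vocab)

def lidstone_unigram(w, alpha=0.1):
    """Lidstone: add a tunable alpha instead of one (Laplace is alpha = 1)."""
    return (unigrams[w] + alpha) / (N + alpha * V)

def interpolated_bigram(w2, w1, lam=0.7, alpha=0.1):
    """Linear interpolation: mix the bigram estimate with the unigram estimate."""
    p_bigram = (bigrams[(w1, w2)] + alpha) / (unigrams[w1] + alpha * V)
    return lam * p_bigram + (1 - lam) * lidstone_unigram(w2, alpha)

print(lidstone_unigram("cat"))            # frequent word: stays close to its raw estimate
print(interpolated_bigram("cat", "the"))  # well-attested bigram keeps most of its mass
print(interpolated_bigram("dog", "the"))  # unseen word still gets a small probability
```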
Reference: NLP lectures, University of Essex.