Boosting (machine learning)

In machine learning, boosting is an ensemble meta-algorithm for primarily reducing bias, variance.^[1] It is used in supervised learning and a family of machine learning algorithms that convert weak learners to strong ones.^[2]

The concept of boosting is based on the question posed by Kearns and Valiant (1988, 1989):^[3]^[4] "Can a set of weak learners create a single strong learner?" A weak learner is defined to be a classifier that is only slightly correlated with the true classification (it can label examples better than random guessing). In contrast, a strong learner is a classifier that is arbitrarily well-correlated with the true classification.

Robert Schapire answered the question posed by Kearns and Valiant in the affirmative in a paper published in 1990^[5].This has had significant ramifications in machine learning and statistics, most notably leading to the development of boosting.^[6]

When first introduced, the hypothesis boosting problem simply referred to the process of turning a weak learner into a strong learner. "Informally, [the hypothesis boosting] problem asks whether an efficient learning algorithm […] that outputs a hypothesis whose performance is only slightly better than random guessing [i.e. a weak learner] implies the existence of an efficient algorithm that outputs a hypothesis of arbitrary accuracy [i.e. a strong learner]."^[3] Algorithms that achieve hypothesis boosting quickly became simply known as "boosting". Freund and Schapire's arcing (Adapt[at]ive Resampling and Combining),^[7] as a general technique, is more or less synonymous with boosting.^[8]

^ Leo Breiman (1996). "BIAS, VARIANCE, AND ARCING CLASSIFIERS" (PDF). TECHNICAL REPORT. Archived from the original (PDF) on 2015-01-19. Retrieved 19 January 2015. Arcing [Boosting] is more successful than bagging in variance reduction
^ Zhou Zhi-Hua (2012). Ensemble Methods: Foundations and Algorithms. Chapman and Hall/CRC. p. 23. ISBN 978-1439830031. The term boosting refers to a family of algorithms that are able to convert weak learners to strong learners
^ ^a ^b Michael Kearns(1988); Thoughts on Hypothesis Boosting, Unpublished manuscript (Machine Learning class project, December 1988)
^ Michael Kearns; Leslie Valiant (1989). "Crytographic limitations on learning Boolean formulae and finite automata". Proceedings of the twenty-first annual ACM symposium on Theory of computing - STOC '89. Vol. 21. ACM. pp. 433–444. doi:10.1145/73007.73049. ISBN 978-0897913072. S2CID 536357.
^ Schapire, Robert E. (1990). "The Strength of Weak Learnability" (PDF). Machine Learning. 5 (2): 197–227. CiteSeerX 10.1.1.20.723. doi:10.1007/bf00116037. S2CID 53304535. Archived from the original (PDF) on 2012-10-10. Retrieved 2012-08-23.
^ Leo Breiman (1998). "Arcing classifier (with discussion and a rejoinder by the author)". Ann. Stat. 26 (3): 801–849. doi:10.1214/aos/1024691079. Schapire (1990) proved that boosting is possible. (Page 823)
^ Yoav Freund and Robert E. Schapire (1997); A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting, Journal of Computer and System Sciences, 55(1):119-139
^ Leo Breiman (1998); Arcing Classifier (with Discussion and a Rejoinder by the Author), Annals of Statistics, vol. 26, no. 3, pp. 801-849: "The concept of weak learning was introduced by Kearns and Valiant (1988, 1989), who left open the question of whether weak and strong learnability are equivalent. The question was termed the boosting problem since [a solution must] boost the low accuracy of a weak learner to the high accuracy of a strong learner. Schapire (1990) proved that boosting is possible. A boosting algorithm is a method that takes a weak learner and converts it into a strong learner. Freund and Schapire (1997) proved that an algorithm similar to arc-fs is boosting.

[1] Leo Breiman (1996). "BIAS, VARIANCE, AND ARCING CLASSIFIERS" (PDF). TECHNICAL REPORT. Archived from the original (PDF) on 2015-01-19. Retrieved 19 January 2015. Arcing [Boosting] is more successful than bagging in variance reduction

[2] Zhou Zhi-Hua (2012). Ensemble Methods: Foundations and Algorithms. Chapman and Hall/CRC. p. 23. ISBN 978-1439830031. The term boosting refers to a family of algorithms that are able to convert weak learners to strong learners

[Kearns88-3] Michael Kearns(1988); Thoughts on Hypothesis Boosting, Unpublished manuscript (Machine Learning class project, December 1988)

[4] Michael Kearns; Leslie Valiant (1989). "Crytographic limitations on learning Boolean formulae and finite automata". Proceedings of the twenty-first annual ACM symposium on Theory of computing - STOC '89. Vol. 21. ACM. pp. 433–444. doi:10.1145/73007.73049. ISBN 978-0897913072. S2CID 536357.

[Schapire90-5] Schapire, Robert E. (1990). "The Strength of Weak Learnability" (PDF). Machine Learning. 5 (2): 197–227. CiteSeerX 10.1.1.20.723. doi:10.1007/bf00116037. S2CID 53304535. Archived from the original (PDF) on 2012-10-10. Retrieved 2012-08-23.

[6] Leo Breiman (1998). "Arcing classifier (with discussion and a rejoinder by the author)". Ann. Stat. 26 (3): 801–849. doi:10.1214/aos/1024691079. Schapire (1990) proved that boosting is possible. (Page 823)

[7] Yoav Freund and Robert E. Schapire (1997); A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting, Journal of Computer and System Sciences, 55(1):119-139

[8] Leo Breiman (1998); Arcing Classifier (with Discussion and a Rejoinder by the Author), Annals of Statistics, vol. 26, no. 3, pp. 801-849: "The concept of weak learning was introduced by Kearns and Valiant (1988, 1989), who left open the question of whether weak and strong learnability are equivalent. The question was termed the boosting problem since [a solution must] boost the low accuracy of a weak learner to the high accuracy of a strong learner. Schapire (1990) proved that boosting is possible. A boosting algorithm is a method that takes a weak learner and converts it into a strong learner. Freund and Schapire (1997) proved that an algorithm similar to arc-fs is boosting.

[1]

[2]

[3]

[4]

[5]

[6]

[7]

[8]