Language model

A language model is a model of the human brain's ability to produce natural language.^[1]^[2] Language models are useful for a variety of tasks, including speech recognition,^[3] machine translation,^[4] natural language generation (generating more human-like text), optical character recognition, route optimization,^[5] handwriting recognition,^[6] grammar induction,^[7] and information retrieval.^[8]^[9]

Large language models (LLMs), currently the most advanced form of language model, are predominantly based on transformers trained on large datasets (frequently text scraped from the public internet). They have superseded recurrent neural network-based models, which had in turn superseded purely statistical models such as the word n-gram language model.
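As an illustration of the purely statistical approach, the following is a minimal sketch of a word bigram (n = 2) model that estimates the probability of the next word from raw counts. The function names `train_bigram` and `bigram_prob`, the `<s>` and `</s>` boundary markers, and the toy corpus are assumptions made for this example and do not come from the cited sources. Practical n-gram models also apply smoothing, which is omitted here.

```python
from collections import Counter, defaultdict


def train_bigram(sentences):
    """Count how often each word follows each preceding word.

    Returns a mapping: previous word -> Counter of following words.
    """
    counts = defaultdict(Counter)
    for sentence in sentences:
        # Hypothetical boundary markers for sentence start and end.
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts


def bigram_prob(counts, prev, nxt):
    """Maximum-likelihood estimate of P(nxt | prev), without smoothing."""
    following = counts.get(prev)
    if not following:
        return 0.0  # the previous word was never seen in training
    return following[nxt] / sum(following.values())


# Toy corpus, invented for illustration only.
corpus = ["the cat sat", "the cat ran", "the dog sat"]
model = train_bigram(corpus)
print(bigram_prob(model, "the", "cat"))  # "cat" follows "the" in 2 of 3 cases
```

Because the estimate is a ratio of counts, any word pair absent from the training text receives probability zero. This sparsity is one commonly cited limitation of count-based models compared with neural ones.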

[1] Blank, Idan A. (November 2023). "What are large language models supposed to model?". Trends in Cognitive Sciences. 27 (11): 987–989. doi:10.1016/j.tics.2023.08.006. PMID 37659920. "LLMs are supposed to model how utterances behave."

[2] Jurafsky, Dan; Martin, James H. (2021). "N-gram Language Models" (PDF). Speech and Language Processing (3rd ed.). Archived from the original on 22 May 2022. Retrieved 24 May 2022.

[3] Kuhn, Roland; De Mori, Renato (1990). "A cache-based natural language model for speech recognition". IEEE Transactions on Pattern Analysis and Machine Intelligence. 12 (6): 570–583.

[4] Andreas, Jacob; Vlachos, Andreas; Clark, Stephen (2013). "Semantic parsing as machine translation". Archived 15 August 2020 at the Wayback Machine. Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers).

[5] Liu, Yang; Wu, Fanyou; Liu, Zhiyuan; Wang, Kai; Wang, Feiyue; Qu, Xiaobo (2023). "Can language models be used for real-world urban-delivery route optimization?". The Innovation. 4 (6): 100520. Bibcode:2023Innov...400520L. doi:10.1016/j.xinn.2023.100520. PMC 10587631. PMID 37869471.

[6] Pham, Vu; et al. (2014). "Dropout improves recurrent neural networks for handwriting recognition". Archived 11 November 2020 at the Wayback Machine. 14th International Conference on Frontiers in Handwriting Recognition. IEEE.

[7] Htut, Phu Mon; Cho, Kyunghyun; Bowman, Samuel R. (2018). "Grammar induction with neural language models: An unusual replication". Archived 14 August 2022 at the Wayback Machine. arXiv:1808.10000.

[8] Ponte, Jay M.; Croft, W. Bruce (1998). A language modeling approach to information retrieval. Proceedings of the 21st ACM SIGIR Conference. Melbourne, Australia: ACM. pp. 275–281. doi:10.1145/290941.291008.

[9] Hiemstra, Djoerd (1998). A linguistically motivated probabilistic model of information retrieval. Proceedings of the 2nd European conference on Research and Advanced Technology for Digital Libraries. LNCS, Springer. pp. 569–584. doi:10.1007/3-540-49653-X_34.
