Foundation model

In artificial intelligence (AI), a foundation model (FM), also known as large X model (LxM), is a machine learning or deep learning model trained on vast datasets so that it can be applied across a wide range of use cases.^[1] Generative AI applications like large language models (LLM) are common examples of foundation models.^[1]

Building foundation models is often highly resource-intensive, with the most advanced models costing hundreds of millions of dollars to cover the expenses of acquiring, curating, and processing massive datasets, as well as the compute power required for training.^[2] These costs stem from the need for sophisticated infrastructure, extended training times, and advanced hardware, such as GPUs. In contrast, adapting an existing foundation model for a specific task or using it directly is far less costly, as it leverages pre-trained capabilities and typically requires only fine-tuning on smaller, task-specific datasets.

Early examples of foundation models are language models (LMs) like OpenAI's GPT series and Google's BERT.^[3]^[4] Beyond text, foundation models have been developed across a range of modalities—including DALL-E and Flamingo^[5] for images, MusicGen^[6] for music, and RT-2^[7] for robotic control. Foundation models are also being developed for fields like astronomy,^[8] radiology,^[9] genomics,^[10] music,^[11] coding,^[12] times-series forecasting,^[13] mathematics,^[14] and chemistry.^[15]

^ ^a ^b Competition and Markets Authority (2023). AI Foundation Models: Initial Report. Available at: https://assets.publishing.service.gov.uk/media/65081d3aa41cc300145612c0/Full_report_.pdf
^ Nestor Maslej, Loredana Fattorini, Erik Brynjolfsson, John Etchemendy, Katrina Ligett, Terah Lyons, James Manyika, Helen Ngo, Juan Carlos Niebles, Vanessa Parli, Yoav Shoham, Russell Wald, Jack Clark, and Raymond Perrault, "The AI Index 2023 Annual Report," AI Index Steering Committee, Institute for Human-Centered AI, Stanford University, Stanford, CA, April 2023.
^ Rogers, Anna; Kovaleva, Olga; Rumshisky, Anna (2020). "A Primer in BERTology: What we know about how BERT works". arXiv:2002.12327 [cs.CL].
^ Haddad, Mohammed. "How does GPT-4 work and how can you start using it in ChatGPT?". Al Jazeera. Retrieved 20 October 2024.
^ Tackling multiple tasks with a single visual language model, 28 April 2022, retrieved 13 June 2022
^ Copet, Jade; Kreuk, Felix; Gat, Itai; Remez, Tal; Kant, David; Synnaeve, Gabriel; Adi, Yossi; Défossez, Alexandre (7 November 2023). "Simple and Controllable Music Generation". arXiv:2306.05284 [cs.SD].
^ "Speaking robot: Our new AI model translates vision and language into robotic actions". Google. 28 July 2023. Retrieved 11 December 2023.
^ Nguyen, Tuan Dung; Ting, Yuan-Sen; Ciucă, Ioana; O'Neill, Charlie; Sun, Ze-Chang; Jabłońska, Maja; Kruk, Sandor; Perkowski, Ernest; Miller, Jack (12 September 2023). "AstroLLaMA: Towards Specialized Foundation Models in Astronomy". arXiv:2309.06126 [astro-ph.IM].
^ Tu, Tao; Azizi, Shekoofeh; Driess, Danny; Schaekermann, Mike; Amin, Mohamed; Chang, Pi-Chuan; Carroll, Andrew; Lau, Chuck; Tanno, Ryutaro (26 July 2023). "Towards Generalist Biomedical AI". arXiv:2307.14334 [cs.CL].
^ Zvyagin, Maxim; Brace, Alexander; Hippe, Kyle; Deng, Yuntian; Zhang, Bin; Bohorquez, Cindy Orozco; Clyde, Austin; Kale, Bharat; Perez-Rivera, Danilo (11 October 2022). "GenSLMs: Genome-scale language models reveal SARS-CoV-2 evolutionary dynamics". bioRxiv 10.1101/2022.10.10.511571.
^ Engineering, Spotify (13 October 2023). "LLark: A Multimodal Foundation Model for Music". Spotify Research. Retrieved 11 December 2023.
^ Li, Raymond; Allal, Loubna Ben; Zi, Yangtian; Muennighoff, Niklas; Kocetkov, Denis; Mou, Chenghao; Marone, Marc; Akiki, Christopher; Li, Jia (9 May 2023). "StarCoder: may the source be with you!". arXiv:2305.06161 [cs.CL].
^ Se, Ksenia; Spektor, Ian (5 April 2024). "Revolutionizing Time Series Forecasting: Interview with TimeGPT's creators". Turing Post. Retrieved 11 April 2024.
^ Azerbayev, Zhangir; Schoelkopf, Hailey; Paster, Keiran; Santos, Marco Dos; McAleer, Stephen; Jiang, Albert Q.; Deng, Jia; Biderman, Stella; Welleck, Sean (30 November 2023). "Llemma: An Open Language Model For Mathematics". arXiv:2310.10631 [cs.CL].
^ "Orbital".

[:1-1] Competition and Markets Authority (2023). AI Foundation Models: Initial Report. Available at: https://assets.publishing.service.gov.uk/media/65081d3aa41cc300145612c0/Full_report_.pdf

[2] Nestor Maslej, Loredana Fattorini, Erik Brynjolfsson, John Etchemendy, Katrina Ligett, Terah Lyons, James Manyika, Helen Ngo, Juan Carlos Niebles, Vanessa Parli, Yoav Shoham, Russell Wald, Jack Clark, and Raymond Perrault, "The AI Index 2023 Annual Report," AI Index Steering Committee, Institute for Human-Centered AI, Stanford University, Stanford, CA, April 2023.

[3] Rogers, Anna; Kovaleva, Olga; Rumshisky, Anna (2020). "A Primer in BERTology: What we know about how BERT works". arXiv:2002.12327 [cs.CL].

[4] Haddad, Mohammed. "How does GPT-4 work and how can you start using it in ChatGPT?". Al Jazeera. Retrieved 20 October 2024.

[deepmind_20220428-5] Tackling multiple tasks with a single visual language model, 28 April 2022, retrieved 13 June 2022

[6] Copet, Jade; Kreuk, Felix; Gat, Itai; Remez, Tal; Kant, David; Synnaeve, Gabriel; Adi, Yossi; Défossez, Alexandre (7 November 2023). "Simple and Controllable Music Generation". arXiv:2306.05284 [cs.SD].

[7] "Speaking robot: Our new AI model translates vision and language into robotic actions". Google. 28 July 2023. Retrieved 11 December 2023.

[8] Nguyen, Tuan Dung; Ting, Yuan-Sen; Ciucă, Ioana; O'Neill, Charlie; Sun, Ze-Chang; Jabłońska, Maja; Kruk, Sandor; Perkowski, Ernest; Miller, Jack (12 September 2023). "AstroLLaMA: Towards Specialized Foundation Models in Astronomy". arXiv:2309.06126 [astro-ph.IM].

[9] Tu, Tao; Azizi, Shekoofeh; Driess, Danny; Schaekermann, Mike; Amin, Mohamed; Chang, Pi-Chuan; Carroll, Andrew; Lau, Chuck; Tanno, Ryutaro (26 July 2023). "Towards Generalist Biomedical AI". arXiv:2307.14334 [cs.CL].

[10] Zvyagin, Maxim; Brace, Alexander; Hippe, Kyle; Deng, Yuntian; Zhang, Bin; Bohorquez, Cindy Orozco; Clyde, Austin; Kale, Bharat; Perez-Rivera, Danilo (11 October 2022). "GenSLMs: Genome-scale language models reveal SARS-CoV-2 evolutionary dynamics". bioRxiv 10.1101/2022.10.10.511571.

[11] Engineering, Spotify (13 October 2023). "LLark: A Multimodal Foundation Model for Music". Spotify Research. Retrieved 11 December 2023.

[12] Li, Raymond; Allal, Loubna Ben; Zi, Yangtian; Muennighoff, Niklas; Kocetkov, Denis; Mou, Chenghao; Marone, Marc; Akiki, Christopher; Li, Jia (9 May 2023). "StarCoder: may the source be with you!". arXiv:2305.06161 [cs.CL].

[13] Se, Ksenia; Spektor, Ian (5 April 2024). "Revolutionizing Time Series Forecasting: Interview with TimeGPT's creators". Turing Post. Retrieved 11 April 2024.

[14] Azerbayev, Zhangir; Schoelkopf, Hailey; Paster, Keiran; Santos, Marco Dos; McAleer, Stephen; Jiang, Albert Q.; Deng, Jia; Biderman, Stella; Welleck, Sean (30 November 2023). "Llemma: An Open Language Model For Mathematics". arXiv:2310.10631 [cs.CL].

[15] "Orbital".

[1]

[2]

[3]

[4]

[5]

[6]

[7]

[8]

[9]

[10]

[11]

[12]

[13]

[14]

[15]