Information Theory and Large Language Models

Large Language Models (LLMs) are deeply connected to information theory, a connection that traces back to foundational ideas in artificial intelligence and language processing. Language models are a cornerstone of natural language processing: they use mathematical methods to capture the regularities of language and the knowledge it carries, and apply them to prediction and generation.
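
As a rough illustration of what "prediction" means for a statistical language model, the sketch below builds a toy bigram model from word counts. The corpus, the function names (`train_bigram`, `next_word_probs`), and the whitespace tokenization are all assumptions made for this example. Real LLMs use neural networks over subword tokens rather than count tables.

```python
from collections import Counter, defaultdict


def train_bigram(text):
    """Count how often each word follows each other word (toy, whitespace-tokenized)."""
    words = text.lower().split()
    counts = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts


def next_word_probs(counts, prev):
    """Turn the follower counts for `prev` into a probability distribution."""
    followers = counts.get(prev, Counter())
    total = sum(followers.values())
    return {w: c / total for w, c in followers.items()} if total else {}


# Hypothetical toy corpus, chosen only for illustration.
corpus = "the cat sat on the mat the cat ate the fish"
model = train_bigram(corpus)
probs = next_word_probs(model, "the")
print(probs)
```

In this toy corpus "the" is followed by "cat" twice and by "mat" and "fish" once each, so the model assigns "cat" half of the probability mass. Generation would then amount to repeatedly sampling from such distributions.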

The historical and contemporary links between LLMs and cognitive science can be examined through the lens of information theory and statistical language models. The emergence of LLMs highlights the enduring significance of information-based and statistical learning theories for understanding human communication.

These theories, proposed in the mid-20th century, offered a framework for integrating computational science, the social sciences, and the humanities. Current research investigates the information encoded in LLM embeddings, analyzes representation entropy, and proposes theories based on (conditional) entropy to explain scaling laws. Insights from Information Bottleneck Theory are also being used to study how LLMs process information: how they compress input into task spaces and extract the information relevant for predictions.
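
To make the notions of entropy and conditional entropy concrete, here is a minimal sketch that estimates both from a toy word sequence. The corpus and the helper names (`entropy`, `conditional_entropy`) are assumptions made for illustration. The sketch does not reproduce the methods of any particular paper and works on word counts, not on LLM embeddings.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy in bits of the distribution given by a Counter of counts."""
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def conditional_entropy(pairs):
    """H(Y | X) in bits, estimated from a list of (x, y) pairs."""
    x_counts = Counter(x for x, _ in pairs)
    total = len(pairs)
    h = 0.0
    for x, n_x in x_counts.items():
        # Distribution of y among the pairs whose first element is x.
        y_counts = Counter(y for xx, y in pairs if xx == x)
        h += (n_x / total) * entropy(y_counts)
    return h


# Hypothetical toy corpus, whitespace-tokenized for illustration only.
words = "the cat sat on the mat the cat ate the fish".split()
pairs = list(zip(words, words[1:]))  # (previous word, next word)

h_next = entropy(Counter(y for _, y in pairs))  # H(next)
h_next_given_prev = conditional_entropy(pairs)  # H(next | previous)
print(f"H(next) = {h_next:.3f} bits")
print(f"H(next | prev) = {h_next_given_prev:.3f} bits")
```

Knowing the previous word lowers the estimated uncertainty about the next one, which is the basic sense in which context carries information for prediction.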
