Computers are limited in their ability to handle language, but LLMs are changing that (Image source: DeepMind/Unsplash)
Evolved over millions of years, modern human language is incredibly complex. The English language alone comprises over 1.7 lakh words, including nouns, verbs, adjectives, and a lot more, which can be combined in billions of ways to create new sentences every time. While communication comes naturally to us – humans are hard-wired to speak and understand speech seamlessly – computers remain limited in their ability to handle language. But thanks to the advent of LLMs (Large Language Models) and natural language processing (NLP), things are changing.
One of the most popular LLMs in recent times is OpenAI’s GPT-3/GPT-3.5, the foundation of the AI chatbot ChatGPT. It has got a lot of people talking thanks to its ability to produce remarkably human-like text. This breakthrough can be useful for companies that wish to automate tasks, as well as for regular users looking up specific information. But it’s not the only LLM out there – NVIDIA’s MT-NLG, for example, is made up of significantly more parameters. Here’s a look at some of the key LLMs.
OpenAI’s AI chatbot ChatGPT is built on top of the GPT-3.5 language model (Express photo)
What are LLMs (Large Language Models)?
Large language models use deep learning techniques to process vast amounts of text, learning its structure and meaning as they go. LLMs are ‘trained’ to identify meanings and relationships between words. The more training data a model is fed, the better it gets at understanding and producing text.
The training data usually comes from large datasets such as Wikipedia, OpenWebText, and the Common Crawl corpus, which contain enormous amounts of text that the models use to learn to understand and generate natural language.
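To make the training idea concrete, here is a minimal, illustrative sketch of the next-token prediction objective that this learning boils down to. It uses the small, publicly available GPT-2 model via the Hugging Face transformers library purely as a stand-in; none of the models covered below publish their training pipelines in this form.

```python
# A minimal sketch (not any one company's training pipeline) of the idea behind
# LLM training: the model repeatedly predicts the next token in real text and
# is corrected when it gets it wrong. Uses the small, public GPT-2 model.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

text = "Large language models learn the structure of language from text."
inputs = tokenizer(text, return_tensors="pt")

# Passing the same token ids as labels makes the model compute the
# next-token prediction loss, the quantity minimised during training.
outputs = model(**inputs, labels=inputs["input_ids"])
print(f"Next-token prediction loss: {outputs.loss.item():.2f}")
```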
GPT-3
Generative Pre-trained Transformer 3 (GPT-3) is a language model that uses deep learning to generate human-like text. Introduced by OpenAI in May 2020 as a successor to GPT-2, the model can also generate code, stories, poems, and a lot more. It gained widespread attention following the release of ChatGPT in November 2022, and it also forms the foundation of the image-generation model DALL-E. It boasts 175 billion trainable parameters.
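For readers who want to try GPT-3-style text generation themselves, here is a rough sketch using OpenAI’s Python library. The API key, the model name (“text-davinci-003”) and the library interface are assumptions that can change over time, so treat it as an outline rather than official sample code.

```python
# Hedged sketch: generating text with a GPT-3-family model through OpenAI's
# Python library. Assumes an API key is set in the environment; model names
# and the library interface are assumptions that may change over time.
import os
import openai

openai.api_key = os.environ["OPENAI_API_KEY"]

response = openai.Completion.create(
    model="text-davinci-003",   # a GPT-3-family completion model
    prompt="Write a two-line poem about large language models.",
    max_tokens=60,
)
print(response["choices"][0]["text"].strip())
```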
ERNIE Titan LLM
Baidu, which made its name in search engines, has recently been stepping up its game in AI. The Chinese company has developed its own large language model called ERNIE (Enhanced Representation through Knowledge Integration). “Titan” is an enhanced version of ERNIE, designed to improve natural language understanding and generation tasks. It is pre-trained on a massive corpus of text data and can be fine-tuned for specific NLP tasks.
Although models like GPT-3 are promising, it is still difficult for users to control results and obtain factually consistent output. ERNIE proposes to fix this shortcoming with a technique used during training in which the model learns to tell the difference between real text and text it has generated itself. This also allows the model to rank the credibility of generated text, making it more reliable and trustworthy.
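Baidu has not released Titan’s training code, so as a loose illustration of the general idea of ranking generated candidates by a model-assigned score, here is a sketch that orders candidate sentences by their likelihood under a small public model (GPT-2). This is generic likelihood ranking, not Baidu’s actual credibility-ranking technique.

```python
# Illustrative only: ranking candidate texts by per-token log-likelihood under
# a small public model (GPT-2). This is generic likelihood ranking, NOT Baidu's
# actual credibility-ranking method, which has not been released.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def score(text: str) -> float:
    ids = tokenizer(text, return_tensors="pt")["input_ids"]
    with torch.no_grad():
        loss = model(ids, labels=ids).loss  # mean next-token loss
    return -loss.item()  # higher score = more "plausible" to the model

candidates = [
    "The Eiffel Tower is located in Paris, France.",
    "The Eiffel Tower is located on the surface of the Moon.",
]
for text in sorted(candidates, key=score, reverse=True):
    print(f"{score(text):6.2f}  {text}")
```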
Yandex YaLM 100B
YaLM 100B, as its name suggests, uses 100 billion parameters. Parameters are the values that are learned and adjusted during training to optimise the model’s performance on a specific task; broadly, they determine how capable a model is. While the 100 billion figure is smaller than GPT-3’s 175 billion parameters, YaLM stands out by being available for free. Training the model took 65 days, with 1.7 TB of online texts, books, and “countless other sources” fed to a pool of 800 A100 graphics cards. Yandex claims that this LLM is “currently the world’s largest GPT-like neural network freely available for English.” The model has been published on GitHub under the Apache 2.0 license, permitting both research and commercial use.
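As a quick illustration of what “parameters” are in practice, the snippet below counts the learned weights of a small public model (GPT-2, roughly 124 million of them); YaLM 100B and GPT-3 are the same idea at a vastly larger scale.

```python
# What "parameters" means in practice: the learned weights of the network.
# Counting them for a small public model (GPT-2); YaLM 100B and GPT-3 are
# the same idea at a vastly larger scale.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("gpt2")
total = sum(p.numel() for p in model.parameters())
print(f"GPT-2 small has about {total / 1e6:.0f} million parameters")
```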
BLOOM
BLOOM has been trained to continue text from a prompt on vast amounts of text data using industrial-scale computational resources, according to BigScience, the research collaboration behind it. BigScience is an open collaboration of hundreds of researchers and institutions around the world, hosted on Hugging Face. BLOOM can output text in 46 natural languages and 13 programming languages, which BigScience claims is “hardly distinguishable from text written by humans.” It can also perform tasks it hasn’t been specifically trained for by converting them into text generation tasks. Like GPT-3, BLOOM uses around 175 billion parameters. But it has one big difference – it is accessible to everyone. The model’s training started on March 11, 2022 and lasted four months, utilising 384 graphics cards of 80 gigabytes each on the Jean Zay supercomputer in France.
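Because BLOOM is openly available on Hugging Face, anyone can prompt it with a few lines of Python. The sketch below uses the much smaller bigscience/bloom-560m checkpoint from the same family, since the full-size BLOOM model is far too large for an ordinary machine.

```python
# Hedged sketch: prompting a small member of the BLOOM family with the
# Hugging Face transformers library. bigscience/bloom-560m is used because
# the full-size BLOOM checkpoint is far too large for most machines.
from transformers import pipeline

generator = pipeline("text-generation", model="bigscience/bloom-560m")

# BLOOM is multilingual, so a French prompt works just as well as English.
result = generator("Les grands modèles de langue sont", max_new_tokens=30)
print(result[0]["generated_text"])
```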
Gopher
Gopher is an autoregressive transformer-based dense LLM. It employs a staggering 280 billion parameters, rivalled only by NVIDIA’s MT-NLG (530 billion) in size. The model was trained on MassiveText, a 10.5-terabyte dataset containing sources like Wikipedia, GitHub, and MassiveWeb. The company behind the model, DeepMind, is a British AI subsidiary of Alphabet Inc. that was acquired by Google in 2014. Gopher reportedly beats models like GPT-3 in performance across disciplines like math, reasoning, knowledge, science, reading comprehension, and ethics.
MT-NLG
Megatron-Turing Natural Language Generation (MT-NLG) was developed by NVIDIA in collaboration with Microsoft. It was first introduced in October 2021 as a successor to the Turing NLG 17B and Megatron-LM models. The Turing project was launched by Microsoft in 2019 with the goal of enabling AI-powered enterprise search. MT-NLG is the largest of its kind, with 530 billion parameters. It can perform a wide range of natural language tasks like completion prediction, reading comprehension, commonsense reasoning, natural language inference, and word sense disambiguation. The model was trained on NVIDIA’s Selene machine learning supercomputer, which is the sixth fastest supercomputer in the world.