“Beware”: LLM machine-learning neural networks are running among us.
Along with OpenAI’s GPT-3 and GPT-4, popular LLMs include Google’s LaMDA and PaLM (the basis for Bard), as well as open models such as Hugging Face’s BLOOM and XLM-RoBERTa, Nvidia’s NeMo LLM, XLNet, Co:here, and GLM-130B.
Open-source LLMs, in particular, are gaining traction, enabling a cadre of developers to create more customizable models at a lower cost. Meta’s February launch of LLaMA (Large Language Model Meta AI) kicked off an explosion among developers looking to build on top of open-source LLMs.
An LLM is a machine-learning neural network trained on input/output data sets; frequently, the text is unlabeled or uncategorized, and the model uses a self-supervised or semi-supervised learning methodology. Information is ingested, or content entered, into the LLM, and the output is what the algorithm predicts the next word will be. The input can be proprietary corporate data or, as in the case of ChatGPT, whatever data it’s fed and scraped directly from the internet.
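As a rough illustration of that self-supervised setup, the sketch below turns a plain, unlabeled sentence into (context, next-word) training pairs. The tiny corpus and whitespace tokenizer are stand-ins for the subword tokenizers and web-scale data real LLMs use.

```python
# A minimal sketch of how unlabeled text becomes self-supervised training
# pairs: the "label" for each position is simply the next word in the text.
# The corpus and tokenization here are illustrative placeholders, not any
# particular model's real pipeline.

corpus = "the model predicts the next word in the sentence"
tokens = corpus.split()  # real LLMs use subword tokenizers, not whitespace

# Each training example pairs a context with the word that follows it.
examples = [
    (tokens[:i], tokens[i])          # (input context, target next word)
    for i in range(1, len(tokens))
]

for context, target in examples[:3]:
    print(f"context={context!r} -> target={target!r}")
# context=['the'] -> target='model'
# context=['the', 'model'] -> target='predicts'
# context=['the', 'model', 'predicts'] -> target='the'
```

No human labeling is needed: the text itself supplies both the question and the answer, which is why LLMs can be trained on raw scraped data at scale.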
Training LLMs on the right data requires massive, expensive server farms that act as supercomputers.
LLMs are controlled by parameters, as in millions, billions, and even trillions of them. (Think of a parameter as something that helps an LLM decide between different answer choices.) OpenAI’s GPT-3 LLM has 175 billion parameters, and the company’s latest model – GPT-4 – is purported to have 1 trillion parameters.
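To make the term concrete, the toy PyTorch model below simply counts its own parameters; every learned weight is one parameter. The layer sizes are arbitrary and chosen only to show how quickly the count grows toward the millions.

```python
# A rough sketch of what "parameters" means in practice: every weight in the
# network is one parameter. This tiny model is purely illustrative; a
# 175-billion-parameter model like GPT-3 is the same idea at enormous scale.
import torch.nn as nn

toy_model = nn.Sequential(
    nn.Embedding(50_000, 512),   # vocabulary -> vector lookup table
    nn.Linear(512, 512),         # one hidden transformation
    nn.Linear(512, 50_000),      # scores over every word in the vocabulary
)

n_params = sum(p.numel() for p in toy_model.parameters())
print(f"{n_params:,} parameters")  # ~51.5 million, vs. 175 billion for GPT-3
```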
For example, you could type into an LLM prompt window “For lunch today I ate….” The LLM could come back with “cereal,” “rice,” or “steak tartare.” There’s no 100% right answer, but there is a probability based on the data already ingested in the model. The answer “cereal” might be the most probable answer based on existing data, so the LLM could complete the sentence with that word. But, because the LLM is a probability engine, it assigns a percentage to each possible answer. “Cereal” might occur 50% of the time, “rice” 20% of the time, and “steak tartare” 0.005% of the time.
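One hedged way to see this probability engine in action is to ask a small open model for its next-token distribution. The sketch below uses GPT-2 via the Hugging Face transformers library (not ChatGPT, whose internals aren’t public), so the specific words and percentages it returns will differ from the lunch example above.

```python
# Query GPT-2 for the probability it assigns to each possible next token.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("For lunch today I ate", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits           # scores for every vocabulary token
probs = torch.softmax(logits[0, -1], dim=-1)  # distribution over the *next* token

top = torch.topk(probs, k=5)
for p, token_id in zip(top.values, top.indices):
    print(f"{tokenizer.decode(token_id)!r}: {p.item():.1%}")
```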
“The point is it learns to do this,” said Yoon Kim, an assistant professor at MIT who studies machine learning, natural language processing, and deep learning. “It’s not like a human; a large enough training set will assign these probabilities.”
But beware: junk in, junk out. In other words, if the information an LLM has ingested is biased, incomplete, or otherwise undesirable, the responses it gives can be equally unreliable, bizarre, or even offensive. When a response goes off the rails, data analysts refer to it as a “hallucination,” because it can be so far off track.
“Hallucinations happen because LLMs, in their most vanilla form, don’t have an internal state representation of the world,” said Jonathan Siddharth, CEO of Turing, a Palo Alto, California, company that uses AI to find, hire, and onboard software engineers remotely. “There’s no concept of fact. They’re predicting the next word based on what they’ve seen so far; it’s a statistical estimate.”
Because some LLMs also train themselves on internet-based data, they can move well beyond what their initial developers created them to do. For example, Microsoft’s Bing uses GPT-3 as its basis, but it’s also querying a search engine and analyzing the first 20 results or so. It uses both an LLM and the internet to offer responses.
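The pattern described here, retrieving search results first and then handing them to the model as context, can be sketched roughly as follows. Both search_web and call_llm are hypothetical placeholders for a real search API and a real model endpoint; nothing here reflects Bing’s actual implementation.

```python
# A simplified sketch of combining a search engine with an LLM: retrieve
# snippets, assemble them into a prompt, and let the model answer from them.

def search_web(query: str, top_k: int = 20) -> list[str]:
    """Placeholder: return the text snippets of the top search results."""
    raise NotImplementedError("plug in a real search API here")

def call_llm(prompt: str) -> str:
    """Placeholder: send the prompt to whatever LLM you are using."""
    raise NotImplementedError("plug in a real model endpoint here")

def answer_with_search(question: str) -> str:
    snippets = search_web(question)
    context = "\n".join(f"- {s}" for s in snippets)
    prompt = (
        "Answer the question using only the search results below.\n"
        f"Search results:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
    return call_llm(prompt)
```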
“We see things like a model being trained on one programming language, and these models then automatically generate code in another programming language it has never seen,” Siddharth said. “Even natural language; it’s not trained on French, but it’s able to generate sentences in French.”
“It’s almost like there’s some emergent behavior. We don’t quite know how these neural networks work,” he added. “It’s both scary and exciting at the same time.”
Another problem with LLMs and their parameters is the unintended bias that can be introduced by LLM developers and by self-supervised data collection from the internet.
Are LLMs biased?
For example, systems like ChatGPT are highly likely to provide gender-biased answers based on the data they’ve ingested from the internet and programmers, according to Sayash Kapoor, a Ph.D. candidate at Princeton University’s Center for Information Technology Policy.
“We tested ChatGPT for biases that are implicit; that is, the gender of the person is not obviously mentioned, but only included as information about their pronouns,” Kapoor said. “That is, if we replace ‘she’ in the sentence with ‘he,’ ChatGPT would be three times less likely to make an error.”
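A minimal sketch of this kind of implicit-bias probe is shown below: the same coreference question is posed twice, differing only in the pronoun, and the two answers are compared for consistency. The ask_model function and the template sentence are hypothetical placeholders; this is not Kapoor’s actual test suite.

```python
# Probe a chat model for implicit gender bias by swapping only the pronoun
# and checking whether the model resolves the two versions the same way.

TEMPLATE = (
    "The doctor called the nurse because {pron} was late for the surgery. "
    "Who was late for the surgery?"
)

def ask_model(prompt: str) -> str:
    """Placeholder: send the prompt to the chat model under test."""
    raise NotImplementedError("plug in a real chat API here")

def pronoun_swap_probe() -> dict[str, str]:
    # Identical question, only the pronoun changes; a consistent model
    # should resolve both versions the same way.
    return {pron: ask_model(TEMPLATE.format(pron=pron)) for pron in ("he", "she")}
```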
Innate biases can be dangerous, Kapoor said, if language models are used in consequential real-world settings. For example, if biased language models are used in hiring processes, they can lead to real-world gender bias.
