The Evolution of Large Language Models

From Translation to DeepSeek

 

There has been much discussion in the media about DeepSeek and the step change it caused in the share price of NVIDIA and the valuations of AI companies such as OpenAI. Our intention in this short article is to trace the history behind the DeepSeek product and consider how it may affect the wider AI landscape.

DeepSeek addresses just one form of AI, the Large Language Model (LLM). This is an important application of AI because manipulating the written word is a task most of us perform every day, but it is by no means the only one.

 

From Words to Tokens: The Birth of LLMs

LLMs evolved from the desire to translate text automatically from one language to another, a task that statistical methods began to tackle in the early 1990s.

Automated translation first converts words into tokens, where each token may represent one or more words (or parts of words) in the reference language, and then assembles these tokens into an ordered sequence, which computer scientists usually call a vector.

Translation is performed by looking for a matching vector in the target language. LLMs are designed to perform this task efficiently.
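A minimal sketch of the tokenisation step in Python may make this concrete. The vocabulary here is invented purely for illustration; real systems learn vocabularies of tens of thousands of tokens from data.

```python
# Toy illustration only: map each word to an integer token,
# then represent a sentence as an ordered sequence of tokens.
# This invented five-word vocabulary stands in for the learned
# vocabularies used by real translation systems and LLMs.
vocab = {"the": 0, "cat": 1, "sat": 2, "on": 3, "mat": 4}

def tokenize(sentence):
    """Convert a sentence into its sequence (vector) of tokens."""
    return [vocab[word] for word in sentence.lower().split()]

tokens = tokenize("The cat sat on the mat")
print(tokens)  # [0, 1, 2, 3, 0, 4]
```

A translation system would then search for the token sequence in the target language that best matches this vector, before converting those tokens back into words.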

 

Beyond Translation: The Expansion of LLMs

It is important to note that this translation task is not restricted to natural-language pairs such as English to French.

Because the translation is made via tokens, rather than directly, we can choose the inputs and outputs to be whatever we wish. They may be physics equations, chemical reactions, or architectural design rules. The reference language may be an English description of a computer program, and the translation is the Python code needed to execute that program.

In the case of ChatGPT, the reference language is in the form of questions, and the output language is in the form of answers.

LLMs are neither artificial nor intelligent; they perform a translation defined by rules extracted from samples.

 

In our next article, we’ll explore how DeepSeek tackled this challenge, finding ways to train an advanced LLM without access to the fastest GPUs.
