DeepSeek’s Breakthrough – Experts and Efficiency in AI
The addition of attention enabled translators to process complex concepts
During the evolution of automatic translators, it was realised that the chance of a word occurring in a sentence depends heavily on the other words around it. A sentence that contains the word ‘tomato’ is much more likely to include ‘greenhouse’ than ‘particle accelerator’.
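To make that idea concrete, here is a toy sketch, using a tiny invented four-sentence corpus (not data from any real translator), of how the presence of one word shifts the odds of another word appearing in the same sentence:

```python
# A toy illustration with an invented corpus: estimate how likely one word is
# to appear in a sentence, given that another word already appears in it.
corpus = [
    "the tomato ripened in the greenhouse",
    "she watered the tomato plants in the greenhouse",
    "the particle accelerator ran overnight",
    "engineers calibrated the particle accelerator",
]

def cooccurrence(word, other):
    """Estimate P(other appears in a sentence | word appears in it) by counting."""
    with_word = [s for s in corpus if word in s.split()]
    if not with_word:
        return 0.0
    return sum(other in s.split() for s in with_word) / len(with_word)

print(cooccurrence("tomato", "greenhouse"))   # 1.0 in this toy corpus
print(cooccurrence("tomato", "accelerator"))  # 0.0 in this toy corpus
```

Real language models capture the same intuition statistically, over vastly larger corpora and far richer notions of context than simple co-occurrence counts.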
The Power of Attention in AI
Humans pay attention to the context of what they plan to read by looking at a headline, skimming the document or reading the first line. This allows us to make sense of the content before we read it, so the content is processed more accurately and efficiently. There are many ways to replicate attention in automated translation, all emulating what we do instinctively.
Overcoming GPU Limitations
Processing text word-by-word to focus the algorithm’s attention on the context is painfully slow in all but the most trivial cases, even on the fastest computer. There was a need for a method that could identify the train of thought of long sequences of tokens by working on all the tokens at the same time.
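The bottleneck is easy to see in a minimal sketch of word-by-word processing (a simplified recurrent-style update with toy numbers, not the method of any particular system): each step needs the result of the previous one, so the steps cannot run in parallel.

```python
# A toy sketch of strictly sequential processing: step t depends on step t-1,
# so no two steps can be computed at the same time.
import numpy as np

rng = np.random.default_rng(0)
tokens = rng.normal(size=(1000, 64))     # 1000 token vectors, each 64 numbers wide
W = rng.normal(size=(64, 64)) * 0.01     # a toy, fixed mixing matrix

state = np.zeros(64)                     # running summary of the text so far
for t in range(len(tokens)):             # this loop cannot be parallelised
    state = np.tanh(W @ state + tokens[t])
```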
The parallel processing of tokens to compute attention was first implemented in 2017, in the Transformer architecture, although it came with a substantial increase in computing requirements.
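The key trick is that attention for every token can be expressed as a handful of large matrix operations, so all tokens are handled at once. The sketch below shows the core calculation with toy random numbers; real models add learned projections, multiple attention heads and many stacked layers.

```python
# A minimal sketch of attention computed for all tokens at once.
import numpy as np

rng = np.random.default_rng(0)
n, d = 1000, 64                          # sequence length, vector width
Q = rng.normal(size=(n, d))              # queries: what each token is looking for
K = rng.normal(size=(n, d))              # keys: what each token offers
V = rng.normal(size=(n, d))              # values: the information to be mixed

scores = Q @ K.T / np.sqrt(d)            # one n-by-n table scoring every token pair
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)   # softmax over each row
output = weights @ V                     # every token blends information from every other token
```

Every line is a bulk matrix operation, which is exactly the kind of work GPUs perform in parallel.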
The use of Graphics Processing Units (GPUs) for fast parallel numerical computation was well established by 2017, and the translation of long sequences of tokens with large language models (LLMs) became feasible for the first time.
Even so, scoring the colossal number of relationships between every input token and every generated output token represents a computational, financial and technical barrier to all but the wealthiest AI companies.
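A rough back-of-the-envelope count shows why: because every token is scored against every other, the number of pairs grows with the square of the sequence length. The figures below are illustrative only, not measurements of any particular model.

```python
# Illustrative only: how the number of token pairs grows with sequence length.
for n in (1_000, 10_000, 100_000):
    pairs = n * n
    print(f"sequence length {n:>7,} -> {pairs:>17,} token pairs to score")
```

Multiply that by many layers, many attention heads and billions of training sequences, and the cost quickly runs into the millions of GPU-hours.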
In our next article, we’ll explore the use of embedded experts in DeepSeek’s approach.