The vast capabilities of Artificial Intelligence come at the cost of immense computational power and resource requirements. This is particularly true for training Large Language Models (LLMs) that can understand and process human language. Selective forgetting allows AI to learn more efficiently by strategically forgetting some information.
Training Language Models
Today’s AI language engines rely heavily on artificial neural networks. These complex structures loosely mimic the human brain, with interconnected “neurons” processing information. Selective forgetting in AI is inspired by how our own memory works: the brain prioritizes storing the “gist” of experiences rather than minute details, which allows for flexible adaptation and continuous learning. By incorporating similar forgetting mechanisms, AI models can achieve more human-like learning processes.
Training an LLM involves feeding it massive amounts of text data in multiple languages. The network gradually adjusts connections between neurons, learning to associate words and concepts across languages.
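To make this concrete, here is a minimal, illustrative sketch of that training loop in PyTorch: a tiny model predicts the next token in a sequence, and its connection weights are nudged by gradient descent. The model size, random “text,” and hyperparameters are stand-ins chosen for illustration, not the setup of any real LLM.

```python
import torch
import torch.nn as nn

vocab_size, embed_dim, seq_len = 1000, 64, 32

# Tiny stand-in for an LLM: token embeddings, one transformer layer, output head.
# (A real LLM adds causal masking, many more layers, and billions of parameters.)
model = nn.Sequential(
    nn.Embedding(vocab_size, embed_dim),
    nn.TransformerEncoderLayer(d_model=embed_dim, nhead=4, batch_first=True),
    nn.Linear(embed_dim, vocab_size),
)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for step in range(100):
    # Random token ids stand in for a batch of real multilingual text.
    tokens = torch.randint(0, vocab_size, (8, seq_len))
    logits = model(tokens[:, :-1])                 # predict each next token
    loss = loss_fn(logits.reshape(-1, vocab_size), tokens[:, 1:].reshape(-1))
    optimizer.zero_grad()
    loss.backward()                                # adjust the "connections"
    optimizer.step()
```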
But this approach comes with its own limitations, especially for languages with limited training data –
- High Computational Cost: Training LLMs requires significant computing power, making it expensive and time-consuming.
- Limited Adaptability: Once trained, adapting an LLM to new languages is difficult. Starting from scratch is impractical, while retraining the entire model can be resource-intensive.
Selective Forgetting
Selective forgetting is inspired by how our brains consolidate memories, and it offers a promising answer to the limitations of training on huge datasets. The core idea is to allow the LLM to strategically forget specific information during training. This approach brings its own benefits –
- Improved Efficiency: By focusing on essential knowledge, the LLM can learn faster and require less computational power.
- Enhanced Adaptability: The ability to forget allows the LLM to learn new languages more easily by building upon existing knowledge.
Testing Selective Forgetting
Researchers have explored the potential of selective forgetting for language learning. One study focused on a technique called “embedding layer erasure.” Here’s how it works (a brief code sketch follows the steps):
- Train an LLM in one language.
- Erase information about word building blocks (tokens) stored in the first layer (embedding layer).
- Retrain the LLM on a second language, allowing the embedding layer to adapt to new tokens.
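Below is a minimal sketch of this procedure, using a toy PyTorch model in place of a real LLM. The pretrain and finetune calls are hypothetical placeholders for full training runs on each language’s corpus.

```python
import torch.nn as nn

# Toy stand-in for a pretrained LLM: token embeddings plus deeper layers.
model = nn.Sequential(
    nn.Embedding(1000, 64),   # first layer: language-specific token embeddings
    nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True),
    nn.Linear(64, 1000),
)

def erase_embedding_layer(m: nn.Sequential) -> None:
    """Re-initialize only the embedding layer; the deeper layers keep their
    more abstract, language-agnostic knowledge."""
    nn.init.normal_(m[0].weight, std=0.02)

# 1. pretrain(model, language_a_corpus)    # hypothetical: full training on language A
erase_embedding_layer(model)                # 2. forget language-A token details
# 3. finetune(model, language_b_corpus)    # hypothetical: adapt embeddings to language B
```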
Surprisingly, even with this seemingly disruptive process, the LLM successfully learned the second language. This suggests that while the embedding layer stores language-specific details, deeper network layers capture more abstract knowledge about language concepts. This “conceptual understanding” facilitates learning new languages.
One Step Ahead: Periodic Forgetting
While embedding layer erasure proved effective, it still required significant retraining. Researchers proposed a more efficient approach: periodic forgetting. Here, the embedding layer is periodically reset during the initial training itself. This “trains” the entire model to adapt to forgetting, making it easier to learn new languages later.
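A rough sketch of what that schedule might look like in a toy PyTorch training loop is shown below; the reset interval, model, and random data are illustrative assumptions, not the configuration used in the study.

```python
import torch
import torch.nn as nn

vocab_size, embed_dim = 1000, 64
model = nn.Sequential(
    nn.Embedding(vocab_size, embed_dim),
    nn.TransformerEncoderLayer(d_model=embed_dim, nhead=4, batch_first=True),
    nn.Linear(embed_dim, vocab_size),
)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()
reset_every = 50   # illustrative forgetting interval

for step in range(200):
    tokens = torch.randint(0, vocab_size, (8, 32))   # stand-in for real text batches
    logits = model(tokens[:, :-1])
    loss = loss_fn(logits.reshape(-1, vocab_size), tokens[:, 1:].reshape(-1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # Periodic forgetting: wipe the embedding layer at fixed intervals so the
    # deeper layers learn to function with freshly reset embeddings.
    # (A fuller implementation might also reset the optimizer state for these weights.)
    if (step + 1) % reset_every == 0:
        nn.init.normal_(model[0].weight, std=0.02)
```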
Experiments comparing the periodic forgetting model with the standard approach yielded encouraging results:
- The forgetting model performed slightly lower on the initial language (85.1 vs. 86.1).
- When retrained on new languages with very limited data (5 million tokens instead of the original 70 billion), the forgetting model retained far more of its performance than the standard model (62.7 vs. 53.3).
- When computational limitations were imposed, the forgetting model maintained significantly better performance compared to the standard model (57.8 vs. 37.2).
These findings suggest that periodic forgetting enhances the LLM’s general language learning ability. By continuously forgetting and relearning during training, the model becomes adept at adapting to new languages with less data and fewer computational resources.
Future Applications
- Reduce Bias: By forgetting potentially biased data from training datasets, AI models can become fairer and more objective.
- Improve Privacy: Sensitive data used during training can be selectively forgotten before deploying the model, enhancing privacy protection.
- Promote Lifelong Learning: AI systems can continuously learn and adapt to new information streams by strategically forgetting irrelevant data.
- More Flexible LLMs: AI models capable of adapting to a wider range of languages, even those with limited data availability.
- Democratization of AI: Selective forgetting can make AI breakthroughs accessible to languages currently underserved by large tech companies.
- A New Generation of AI: AI systems with improved adaptability, efficiency, and ethical grounding.