Future of LLMs

Alina Khay ∙ Oct 19, 2024
The GPU shortage is showing signs of improvement, and tech giants like Amazon, Google, and Microsoft are stepping up to challenge Nvidia with their own AI accelerator chips.

The Rise of Compact Models

A noteworthy trend is the shift toward smaller models that deliver outsized performance. For instance, Mistral's 7B model outperforms Llama 2 13B across benchmarks, and Llama 2 70B can be fine-tuned to match the performance of GPT-3.5 (~175B).
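Fine-tuning a 70B-parameter model end to end is still costly, so in practice it is often done with parameter-efficient methods such as LoRA. Below is a minimal sketch using the Hugging Face transformers and peft libraries; the model id and hyperparameters are illustrative assumptions, not a recipe from this post.

```python
# Hypothetical LoRA fine-tuning setup; the model id and hyperparameters
# below are illustrative assumptions, not taken from this post.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-70b-hf")

lora = LoraConfig(
    r=16,                                 # rank of the low-rank update matrices
    lora_alpha=32,                        # scaling factor for the update
    target_modules=["q_proj", "v_proj"],  # attach adapters to attention projections
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora)
model.print_trainable_parameters()  # only the small adapter weights are trainable
```

Because only the adapter matrices are updated, the memory and compute cost of tuning drops by orders of magnitude compared with full fine-tuning.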

Mixture of Experts (MoE) Architecture

The future may lie in the MoE architecture, in which a router dispatches each input to open-source "expert" models that have been fine-tuned for specific tasks. Such an ensemble of experts has the potential to achieve GPT-4-like performance. And because each expert stays under 100B parameters, the models remain accessible for research and mass-market use, even for teams with limited GPU resources.
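To make the routing idea concrete, here is a toy PyTorch sketch of an MoE layer: a learned router scores the experts for each token, and only the top-k experts actually run. The names and sizes below are illustrative, not from any production model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    """Toy mixture-of-experts layer: a router picks the top-k experts per token."""

    def __init__(self, dim: int, num_experts: int = 8, k: int = 2):
        super().__init__()
        self.k = k
        # Each "expert" is a small feed-forward network.
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
             for _ in range(num_experts)]
        )
        # The router scores every expert for every token.
        self.router = nn.Linear(dim, num_experts)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, dim)
        scores = self.router(x)                     # (tokens, num_experts)
        weights, idx = scores.topk(self.k, dim=-1)  # keep only the top-k experts
        weights = F.softmax(weights, dim=-1)        # normalize their gate weights
        out = torch.zeros_like(x)
        # Route each token through its selected experts and mix the results.
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

layer = MoELayer(dim=64)
tokens = torch.randn(10, 64)
print(layer(tokens).shape)  # torch.Size([10, 64])
```

The key property is sparsity: the model can hold many experts' worth of parameters, yet each token only pays the compute cost of the k experts it is routed to.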

Microsoft's DeepSpeed team, part of the AI at Scale initiative, explores MoE model applications and optimizations at scale. They were able to cut training cost by 5x and to deliver up to 4.5x faster and 9x cheaper inference compared with quality-equivalent dense models.
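For readers who want to experiment, DeepSpeed exposes this through its MoE layer, which shards experts across GPUs via expert parallelism. The sketch below follows the public DeepSpeed MoE tutorial; the hidden size and expert count are assumptions, and the script needs to run under a distributed launcher such as `deepspeed`.

```python
# Sketch of wrapping a feed-forward block in DeepSpeed's MoE layer.
# Hidden size and expert count are illustrative assumptions; run this
# under the deepspeed launcher so torch.distributed can initialize.
import torch.nn as nn
import deepspeed
from deepspeed.moe.layer import MoE

deepspeed.init_distributed()  # expert parallelism needs a process group

hidden = 1024
ffn = nn.Sequential(
    nn.Linear(hidden, 4 * hidden), nn.GELU(), nn.Linear(4 * hidden, hidden)
)

moe = MoE(
    hidden_size=hidden,
    expert=ffn,     # expert module, replicated num_experts times internally
    num_experts=8,  # total experts, spread across GPUs
    k=1,            # route each token to its single best expert
)

# Forward returns the routed output plus an auxiliary load-balancing loss
# that keeps tokens spread evenly across experts:
# output, aux_loss, expert_counts = moe(hidden_states)
```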
