Future of LLMs
The GPU shortage is showing signs of improvement. Tech giants like Amazon, Google, and Microsoft are stepping up to challenge Nvidia with their own AI accelerator chips.
The Rise of Compact Models
A noteworthy trend is the shift toward smaller models that deliver outsized performance. For instance, Mistral 7B outperforms Llama 2 13B on many benchmarks, and Llama 2 70B can be fine-tuned to match the performance of GPT-3.5 (estimated at roughly 175B parameters).
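As one illustration of how such fine-tuning is commonly done in practice, here is a minimal sketch using LoRA via the Hugging Face transformers and peft libraries. The model name, rank, and target modules below are placeholder assumptions, not a recipe known to reproduce GPT-3.5-level quality.

```python
# Minimal LoRA fine-tuning sketch (illustrative assumptions, not the exact setup
# behind any published result). Requires transformers, peft, and accelerate.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_name = "meta-llama/Llama-2-70b-hf"  # placeholder; gated model, access required
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name)

# LoRA attaches small trainable low-rank matrices to the attention projections,
# so only a tiny fraction of the 70B parameters is updated during fine-tuning.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of total parameters
```

The wrapped model can then be trained with a standard causal-language-modeling loop or the transformers Trainer on the task-specific dataset of interest.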
Mixture of Experts (MoE) Architecture
The future may lie in the MoE architecture, in which multiple open-source models are each fine-tuned for specific tasks. This ensemble of "expert" models has the potential to achieve GPT-4-like performance. With each expert model under 100B parameters, such systems become accessible for research and mass-market use, even for teams with limited GPU resources.
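To make the routing idea concrete, here is a minimal sketch of a top-k gated MoE layer in PyTorch. The class name TopKMoE and all hyperparameters are illustrative assumptions, not a description of any particular production model: each token is sent to only a few expert feed-forward networks, so total parameter count grows with the number of experts while per-token compute stays roughly constant.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    """Minimal top-k gated mixture-of-experts layer (illustrative sketch only)."""

    def __init__(self, d_model: int, num_experts: int = 8, k: int = 2):
        super().__init__()
        self.k = k
        # Each "expert" is a small feed-forward network.
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(d_model, 4 * d_model),
                nn.GELU(),
                nn.Linear(4 * d_model, d_model),
            )
            for _ in range(num_experts)
        )
        # The router scores every token against every expert.
        self.router = nn.Linear(d_model, num_experts)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, d_model)
        scores = self.router(x)                         # (tokens, num_experts)
        weights, indices = scores.topk(self.k, dim=-1)  # route each token to k experts
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = indices[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot:slot + 1] * expert(x[mask])
        return out

# Only the k chosen experts run for each token, which is what keeps
# per-token compute modest even as the total parameter count grows.
moe = TopKMoE(d_model=512)
tokens = torch.randn(16, 512)
print(moe(tokens).shape)  # torch.Size([16, 512])
```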
Microsoft's DeepSpeed team, part of the AI at Scale initiative, explores MoE model applications and optimizations at scale. They report cutting training cost by 5x, along with up to 4.5x faster and 9x cheaper inference.
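For illustration, the snippet below sketches how a dense feed-forward block can be wrapped with DeepSpeed's MoE layer. It assumes the script is launched with the deepspeed launcher so that distributed process groups are available, and the exact keyword arguments may differ between DeepSpeed versions.

```python
# Illustrative sketch only: wrapping a dense feed-forward block in DeepSpeed's
# MoE layer. Assumes execution via the `deepspeed` launcher; argument names
# follow the DeepSpeed MoE tutorial and may vary across versions.
import torch.nn as nn
import deepspeed
from deepspeed.moe.layer import MoE

deepspeed.init_distributed()  # set up the process groups the MoE layer expects

hidden_size = 1024
expert = nn.Sequential(            # the dense block to replicate as experts
    nn.Linear(hidden_size, 4 * hidden_size),
    nn.GELU(),
    nn.Linear(4 * hidden_size, hidden_size),
)

moe_layer = MoE(
    hidden_size=hidden_size,
    expert=expert,
    num_experts=8,   # total experts, sharded across GPUs via expert parallelism
    k=1,             # each token is routed to a single expert
)

# The layer returns the routed output plus auxiliary load-balancing terms:
# output, aux_loss, expert_counts = moe_layer(hidden_states)
```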