Microsoft Phi SLMs Trained on NVIDIA GPUs

Large language models (LLMs) have permeated every industry and changed the potential of technology. However, their massive size makes them impractical under the resource constraints many companies face. The rise of small language models (SLMs) bridges quality and cost by creating models with a smaller resource footprint. SLMs are a subset of language models that tend to focus on specific domains and are built with simpler neural architectures.

Microsoft Announces New Generation of Open SLMs

Microsoft has announced the new generation of open SLMs, which includes two new additions: Phi-4-mini and Phi-4-multimodal.

Phi-4-Multimodal

Phi-4-multimodal is the first multimodal model to join the family, accepting text, audio, and image inputs, and it is small enough for on-device deployment. The two new models build on the December 2024 research-only release of the 14B-parameter Phi-4 SLM, and both are licensed for commercial use.

Why Invest in SLMs?

SLMs enable generative AI capabilities in memory and compute-constrained environments. For example, SLMs can be deployed directly on smartphones and consumer-grade devices. On-device deployment can facilitate privacy and compliance for use cases that must adhere to regulatory requirements. Other benefits of SLMs include lower latency due to inherently faster inference compared to an LLM of similar quality.

Phi-4-Multimodal

Phi-4-multimodal has 5.6B parameters and reasons over audio, image, and text inputs. This enables it to support use cases such as automated speech recognition (ASR), multimodal summarization, translation, OCR, and visual reasoning. The model was trained on 512 NVIDIA A100-80GB GPUs over 21 days.

Why is Phi-4-Multimodal Important?

Phi-4-multimodal excels at ASR, ranking #1 on the Hugging Face Open ASR leaderboard with a word error rate of 6.14%. Word error rate (WER) is the standard metric for quantifying speech recognition performance: it is the number of incorrectly transcribed words (substitutions, insertions, and deletions) divided by the number of words in the reference transcript.
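To make the metric concrete, here is a minimal sketch of computing WER with a word-level Levenshtein distance; the example sentences are illustrative, not from the leaderboard:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + insertions + deletions)
    divided by the number of words in the reference."""
    ref = reference.split()
    hyp = hypothesis.split()
    # d[i][j] = word-level edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deleting all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # inserting all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,        # deletion
                d[i][j - 1] + 1,        # insertion
                d[i - 1][j - 1] + cost, # substitution (or match)
            )
    return d[len(ref)][len(hyp)] / len(ref)

# One dropped word out of six in the reference -> WER of 1/6 (~16.7%)
print(wer("the cat sat on the mat", "the cat sat on mat"))
```

A 6.14% WER therefore means roughly 6 word-level errors per 100 reference words.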

Get Started Today

Bring your data and try out Phi-4 on the NVIDIA-accelerated platform at build.nvidia.com/microsoft. In the multimodal sandbox for Phi-4-multimodal, you can try text, image, and audio inputs, as well as sample tool calling, to see how the model would work for you in production.

Conclusion

The new generation of open SLMs, including Phi-4-mini and Phi-4-multimodal, offers a more accessible and cost-effective way to leverage AI capabilities. With their smaller resource footprints, SLMs can be deployed in memory- and compute-constrained environments, making them ideal for a wide range of applications.

FAQs

Q: What are the benefits of using SLMs?
A: SLMs enable generative AI capabilities in memory and compute-constrained environments, and can be deployed directly on smartphones and consumer-grade devices.

Q: What is Phi-4-Multimodal?
A: Phi-4-Multimodal is a multimodal model that accepts text, audio, and image data inputs, and supports use cases such as automated speech recognition, multi-modal summarization, translation, OCR, and visual reasoning.

Q: How do I get started with Phi-4?
A: You can bring your data and try out Phi-4 on the NVIDIA-accelerated platform at build.nvidia.com/microsoft.
