MAI-Transcribe-1, Voice-1, Image-2: Microsoft’s big AI upgrade

New Delhi: Microsoft has officially unveiled its latest set of AI models, MAI-Transcribe-1, MAI-Voice-1, and MAI-Image-2, focused on improving speech transcription, voice generation, and image generation. The models are currently available via Microsoft Foundry and the MAI Playground, and are aimed at faster performance, greater efficiency, and competitive pricing.

The rollout brings upgrades in transcription accuracy, voice generation, and image creation, and Microsoft is also integrating these capabilities into its own products.

MAI-Transcribe-1

This model is designed for speech-to-text tasks and supports transcription across the 25 most widely used languages, with accuracy measured on the FLEURS benchmark. It has been built to handle real-world audio conditions and delivers batch transcription speeds 2.5 times faster than Microsoft's existing Azure Fast offering.

MAI-Voice-1

This model targets voice generation, producing speech with natural tone, emotional range, and consistency across longer content. The company has also added support for creating custom voices from a short audio sample. The model can generate up to 60 seconds of audio in one second, with Microsoft highlighting efficient GPU usage for cost-effective performance.
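To put the speed claim in perspective, the short Python sketch below works out the minimum time needed to synthesize a clip of a given length. It assumes the quoted peak of 60 seconds of audio per second holds for the whole clip, which real workloads may not achieve; the constant and function names are illustrative and not part of any Microsoft API.

```python
# Assumption: the peak rate quoted above (60 seconds of audio per second
# of generation) is sustained for the entire clip.
MAX_AUDIO_SECONDS_PER_SECOND = 60.0


def min_generation_seconds(audio_seconds: float) -> float:
    """Return the best-case generation time, in seconds, for a clip."""
    if audio_seconds < 0:
        raise ValueError("audio_seconds must be non-negative")
    return audio_seconds / MAX_AUDIO_SECONDS_PER_SECOND


if __name__ == "__main__":
    one_hour = 60 * 60  # an hour of narration, in seconds
    print(f"Best case for 1 hour of audio: {min_generation_seconds(one_hour):.0f} s")
```

At that peak rate, an hour of narration would take about a minute to generate in the best case.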

MAI-Image-2

MAI-Image-2 generates images at least twice as fast as earlier systems on Foundry and Copilot, according to production data. The model is designed to deliver realistic lighting, accurate skin tones, and clear text rendering for visual content. It is also being rolled out in phases across services such as Bing and PowerPoint.

All three models are available on Microsoft Foundry, with MAI Playground access currently limited to users in the US. Microsoft has also positioned the models as offering competitive price-to-performance among cloud providers.

Pricing:
  • MAI-Transcribe-1 at $0.36 per hour.
  • MAI-Voice-1 at $22 per one million characters.
  • MAI-Image-2 at $5 per one million tokens for text input and $33 per one million tokens for image output.
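
For a rough sense of what these rates mean in practice, the Python sketch below turns them into simple cost estimates. It uses only the figures listed above; the function names are illustrative, and actual bills may differ depending on minimum charges, rounding, or regional pricing, none of which are covered here.

```python
# Rates in USD, taken from the pricing list above.
TRANSCRIBE_USD_PER_HOUR = 0.36
VOICE_USD_PER_MILLION_CHARS = 22.0
TEXT_INPUT_USD_PER_MILLION_TOKENS = 5.0
IMAGE_OUTPUT_USD_PER_MILLION_TOKENS = 33.0


def transcription_cost(audio_hours: float) -> float:
    """Estimated cost of transcribing the given hours of audio."""
    return audio_hours * TRANSCRIBE_USD_PER_HOUR


def voice_cost(characters: int) -> float:
    """Estimated cost of generating speech for the given character count."""
    return characters / 1_000_000 * VOICE_USD_PER_MILLION_CHARS


def token_cost(text_input_tokens: int, image_output_tokens: int) -> float:
    """Estimated cost of token-priced usage (text input plus image output)."""
    text = text_input_tokens / 1_000_000 * TEXT_INPUT_USD_PER_MILLION_TOKENS
    image = image_output_tokens / 1_000_000 * IMAGE_OUTPUT_USD_PER_MILLION_TOKENS
    return text + image


if __name__ == "__main__":
    print(f"100 hours of transcription: ${transcription_cost(100):.2f}")
    print(f"500,000 characters of speech: ${voice_cost(500_000):.2f}")
    print(f"2,000 input + 4,000 image tokens: ${token_cost(2_000, 4_000):.4f}")
```

The token counts in the last line are made-up inputs for illustration; the article does not say how many tokens a typical prompt or image consumes.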

Microsoft noted that the new models are already being used within its own products and are available for developers to build applications and services.
