Microsoft's Maia and Cobalt chips: The background

With the Maia 100 and Cobalt 100, Microsoft has added two in-house chip designs for artificial intelligence and machine learning to its portfolio; both are scheduled to enter service in the first months of 2024. Maia, designed by Microsoft, is intended to optimize the performance of large AI models. Its main purpose is to increase the efficiency and performance of AI computations, with a particular focus on large-scale models. The development of Maia is a response to the growing demands of AI applications that require significant computing power.

In contrast to conventional, universally applicable hardware components, the design of the Maia chip is tailor-made for machine learning tasks such as matrix multiplication, one of the foundations of neural network calculations. In computer graphics, matrix multiplication is used to apply transformations such as rotation, scaling and translation to objects. The operation is just as central to neural networks and deep learning: here, matrices represent the weights and inputs of neurons, and multiplying these matrices is at the heart of the process that enables a network to learn from data and make predictions.
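
To make this concrete, here is a minimal Python/NumPy sketch (all names and dimensions are purely illustrative) of how a single dense layer of a neural network reduces to exactly this operation: a weight matrix multiplied by an input vector, followed by a nonlinearity.

```python
import numpy as np

# Minimal sketch: a dense layer is essentially one matrix multiplication.
rng = np.random.default_rng(0)

x = rng.standard_normal(4)        # input vector (4 features)
W = rng.standard_normal((3, 4))   # weight matrix (3 neurons x 4 inputs)
b = np.zeros(3)                   # bias vector

z = W @ x + b                     # the matrix multiplication step
y = np.maximum(z, 0)              # ReLU activation

print(y)                          # activations of the 3 neurons
```

Accelerators like Maia are built to execute vast numbers of exactly these multiply-accumulate operations in parallel.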

Maia 100: Functionality and capabilities

The Maia chip is characterized by its efficiency in AI tasks such as natural language processing and image recognition; its architecture is optimized for the fast execution of deep learning algorithms. At a time of rising energy prices, energy efficiency was one of the most important criteria in the chip's development. The bottom line is that Maia performs computations more efficiently than the hardware commonly used to date, significantly reducing energy consumption, which is particularly important for large data centers. In addition, the chip is designed to integrate seamlessly into Microsoft's Azure cloud infrastructure. Microsoft AI services such as Bing, Microsoft 365 and the Azure OpenAI Service can therefore make use of the improved computing power without significant architectural changes.

A key element of the Maia 100, which was developed in close collaboration with OpenAI, is its support for sub-8-bit data types, which speeds up the training and inference of models. The processor benefits from OpenAI's deep insight into these workloads and is specifically tailored to large language models. In addition, the Maia 100 is a fully liquid-cooled server processor, which in turn enables higher server density. Dedicated server racks and liquid-cooling units have been developed alongside it to allow the most seamless integration possible into Microsoft's existing data center infrastructure.
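
Microsoft has not published the exact numeric formats Maia uses, so the following sketch simulates generic symmetric 4-bit integer quantization in NumPy simply to illustrate the trade-off that sub-8-bit data types exploit: each value needs far less memory and bandwidth, at the cost of some precision.

```python
import numpy as np

# Hedged sketch: generic symmetric 4-bit quantization, not Maia's
# actual (undisclosed) formats. Signed 4-bit values cover [-7, 7].

def quantize_int4(w: np.ndarray):
    """Map float weights to integers in [-7, 7] plus one scale factor."""
    scale = np.abs(w).max() / 7.0
    q = np.clip(np.round(w / scale), -7, 7).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal(8).astype(np.float32)

q, scale = quantize_int4(w)
w_approx = dequantize(q, scale)

print("original :", np.round(w, 3))
print("recovered:", np.round(w_approx, 3))
print("max error:", float(np.abs(w - w_approx).max()))
```

A 4-bit weight takes a quarter of the space of a 16-bit one, which is why narrow data types can speed up both training and inference when the hardware supports them natively.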

Details of the Cobalt 100

Microsoft's Cobalt 100 CPU is an energy-efficient 128-core chip based on the Arm Neoverse CSS architecture, which is optimized for efficiency and performance in cloud-native offerings; it will power new virtual machines for customers this year and beyond. The choice of Arm technology was driven by the goal of optimizing "performance per watt" in the data centers.

Unlike the Maia 100, which was developed as a dedicated AI accelerator, the Cobalt 100 is a more versatile CPU designed for a broader range of cloud-based services and applications. While the Maia 100 focuses specifically on AI-related tasks such as model training and inference, the Cobalt 100 is designed to support a wide variety of cloud-native services.

Microsoft is still keeping some of the specific technical details of the Maia 100 and Cobalt 100 under wraps, and no results from benchmark suites such as MLCommons' MLPerf have yet been published for the Maia 100. It is therefore too early to say much about the performance of the new chips or how they compare with competing products. However, unlike Amazon's Trainium 2 (still in development and focused on generative AI and machine learning), Nvidia's new H200 AI GPU or Google's recently launched TPU v5e, the two chips are explicitly tailored for tight integration with Microsoft's portfolio of AI tools and services. By focusing on specialized hardware for AI, Microsoft has positioned itself prominently with the Maia 100 and Cobalt 100 in the competitive landscape of cloud computing and AI hardware. Further insights and detailed comparisons, however, will require specific performance benchmarks and case studies.

Develop Generative AI Solutions with Azure OpenAI Service

The course "Develop Generative AI Solutions with Azure OpenAI Service", aimed at software developers and data scientists, provides an introduction to generative AI, a very rapidly developing area of artificial intelligence. You will learn how to use the Azure OpenAI Service to build solutions with AI models within Azure.
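
As a taste of what such a course covers, the hedged Python sketch below sends a chat completion request to an Azure OpenAI deployment using the official openai package (v1.x); the endpoint and deployment name are placeholders, not real resources.

```python
import os
from openai import AzureOpenAI  # official "openai" Python package, v1.x

# Placeholder endpoint/deployment: substitute your own Azure OpenAI
# resource and the name you gave your model deployment.
client = AzureOpenAI(
    azure_endpoint="https://YOUR-RESOURCE.openai.azure.com",
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-02-01",
)

response = client.chat.completions.create(
    model="YOUR-DEPLOYMENT-NAME",  # deployment name, not the base model name
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain what an AI accelerator is."},
    ],
)

print(response.choices[0].message.content)
```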

Author
Stefan Schasche
As an experienced IT editor, Stefan Schasche writes about everything that has microchips or Li-ion batteries under the hood. He also reports on campaigns, programmatic advertising and international business topics.