
Microsoft Unveils Real-Time AI for Azure

Microsoft on Wednesday unveiled a new deep learning acceleration platform designed for real-time artificial intelligence, codenamed “Project Brainwave,” at Hot Chips 2017.

The platform has three main layers:

  • a high-performance, distributed system architecture;
  • a hardware DNN (deep neural network) engine synthesized onto FPGAs (field programmable gate arrays); and
  • a compiler and runtime for low-friction deployment of trained models.

Project Brainwave leverages the massive FPGA infrastructure from Project Catapult that Microsoft has been deploying in Azure and Bing over the past few years.

AI in Real Time

“FPGA is a way to add and apply dedicated task-specific computing power geared to deep neural nets to conventional cloud infrastructure,” said Doug Henschen, principal analyst at Constellation Research.

“This makes it easier to develop for conventional server capacity and let the FPGAs provide the computing power necessary for AI workloads,” he told TechNewsWorld.

Attaching high-performance FPGAs directly to Microsoft’s data center network lets DNNs be served as hardware microservices: a server can call a DNN with no software in the loop. This reduces latency and allows very high throughput.

“Real-time AI is the eventual goal for the vast majority of projects,” said Rob Enderle, principal analyst at the Enderle Group.

“AI should be able to move at the speed of thought, or it’ll just be an advanced script,” he told TechNewsWorld.

Project Brainwave’s Guts

Project Brainwave uses a soft DNN processing unit, or DPU, synthesized onto commercially available FPGAs. This lets it scale across a range of data types, with the desired data type being a synthesis-time decision.

Microsoft’s soft DPUs combine the ASIC digital signal processing blocks on the FPGAs with the synthesizable logic to provide a greater and more optimized number of functional units.

The DPUs use highly customized, narrow-precision data types defined by Microsoft, which increase performance without real losses in model accuracy. Research innovations can be incorporated into the hardware platform rapidly, typically in weeks.
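To illustrate what a narrow-precision data type does to model weights, here is a minimal Python sketch. The bit layouts of Microsoft's formats are not described here, so the layout below (1 sign bit, 5 exponent bits, 2 mantissa bits) is purely an assumption for illustration, and quantize is a hypothetical helper rather than Microsoft's implementation.

```python
import math


def quantize(x: float, man_bits: int = 2, exp_bits: int = 5) -> float:
    """Round x to the nearest value of a toy narrow float format.

    ASSUMPTION: the 5-bit exponent / 2-bit mantissa layout is illustrative
    only. It is not the layout of any Microsoft format. Infinities and NaNs
    are not modeled.
    """
    if x == 0.0:
        return 0.0
    bias = 2 ** (exp_bits - 1) - 1
    _, e = math.frexp(abs(x))            # abs(x) = m * 2**e with 0.5 <= m < 1
    exp = max(1 - bias, min(bias, e - 1))  # clamp to the representable exponents
    step = 2.0 ** (exp - man_bits)       # spacing of values at this exponent
    q = round(abs(x) / step) * step      # round to the nearest grid point
    max_val = (2 - 2.0 ** -man_bits) * 2.0 ** bias
    return math.copysign(min(q, max_val), x)  # saturate instead of overflowing


weights = [0.3, -0.72, 1.0, 0.051]
for w in weights:
    q = quantize(w)
    print(f"{w:+.4f} -> {q:+.6f} (relative error {abs(q - w) / abs(w):.1%})")
```

With only two mantissa bits, each weight lands on a coarse grid, which is the trade the article describes: less precision per value in exchange for more functional units and higher throughput.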

Project Brainwave incorporates a software stack supporting the Microsoft Cognitive Toolkit (MCTK) and Google’s TensorFlow. Support for other frameworks will be added later.

TensorFlow is “the currently dominating machine learning technique,” said Holger Mueller, principal analyst at Constellation Research.

“That buys Microsoft time to strengthen MCTK,” he told TechNewsWorld.

Microsoft’s Project Brainwave Demo

At Hot Chips, Microsoft demonstrated the Project Brainwave system ported to Intel’s 14nm Stratix 10 FPGA.

It ran a gated recurrent unit (GRU) model five times larger than ResNet-50 with no batching, using Microsoft’s custom 8-bit floating point format (ms-fp8).

It sustained 39.5 teraflops, running each request in under one millisecond.
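Those two figures together bound how much arithmetic a single request can involve. The short Python calculation below is a back-of-the-envelope sketch that assumes the full sustained rate is applied to one request at a time (consistent with no batching) and treats "under one millisecond" as an upper limit of exactly one millisecond. The function name is hypothetical.

```python
SUSTAINED_TFLOPS = 39.5   # sustained rate reported for the demo
LATENCY_BOUND_S = 1e-3    # ASSUMPTION: "under one millisecond" taken as 1 ms


def max_ops_per_request(tflops: float, latency_s: float) -> float:
    """Upper bound on floating point operations completed within one request."""
    return tflops * 1e12 * latency_s


ops = max_ops_per_request(SUSTAINED_TFLOPS, LATENCY_BOUND_S)
print(f"At most {ops:.3e} floating point operations per request")
```

Under these assumptions, a single request would involve no more than about 39.5 billion floating point operations.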

Microsoft will bring Project Brainwave to Azure users, complementing indirect access through services such as Bing.

“This is a good place to start for many of Microsoft’s AI efforts,” said Ray Wang, principal analyst at Constellation Research.

“What’s been visibly missing is a rich neural network. You can’t do machine learning or AI without one,” he told TechNewsWorld.

Fraud detection, retail mass personalization at scale, dynamic pricing and insurance adjustment are among the applications that would benefit from real-time AI, Wang noted.

Dealing With Competitors

Most customers and technology partners that Constellation has spoken to have gone to the Google Cloud Platform using TensorFlow, Wang said.

Google will be Microsoft’s biggest competitor at first, he predicted.

“In the long run, it’ll be those with massive compute power that will lead AI,” Wang said, “such as Facebook, Alibaba, Tencent and Amazon.”

The FPGA-based service “will likely be a popular and cost-effective option, but Microsoft will surely also offer GPU infrastructure options geared to AI as well,” Henschen remarked. “IBM and Google have both brought GPU compute power to their respective clouds.”

Richard Adhikari

Richard Adhikari has been an ECT News Network reporter since 2008. His areas of focus include cybersecurity, mobile technologies, CRM, databases, software development, mainframe and mid-range computing, and application development. He has written and edited for numerous publications, including Information Week and Computerworld. He is the author of two books on client/server technology.
