As the pace of business increases, business intelligence technologies have been forced to keep up, crunching more data and crunching it faster. On Thursday, IBM unveiled InfoSphere Streams, a high-performance computing system designed, in essence, to provide business intelligence capabilities on steroids.
Based on IBM’s System S stream computing technology, InfoSphere Streams lets users analyze structured and unstructured data in real time from thousands of sources.
Business users can mash up data to create new applications simply by describing their goals.
Stream computing automatically reconfigures system resources to cope with workloads as needed.
Stream Computing Tech Details
Like other business intelligence processing systems, stream computing processes raw data in response to queries from business users and then returns results. Unlike some other systems, however, it can process the raw data in real time, assemble and serve up answers on the fly, and dynamically adapt to accept new data streams and respond to changing user requirements.
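To make that concrete, here is a minimal, hypothetical sketch in Python of what processing on the fly means: each record is folded into a running answer the moment it arrives, and a new data source can be attached while the pipeline is running. It is illustrative only and does not use IBM’s InfoSphere Streams APIs.

```python
# Illustrative sketch of stream computing: incremental answers, dynamic sources.
import random
from collections import defaultdict


def sensor_source(name, n=5):
    """Simulate a data source that emits readings one at a time."""
    for _ in range(n):
        yield {"source": name, "value": random.uniform(0.0, 100.0)}


class RunningAverage:
    """An incrementally maintained answer; nothing is recomputed in batch."""

    def __init__(self):
        self.count = defaultdict(int)
        self.total = defaultdict(float)

    def update(self, record):
        src = record["source"]
        self.count[src] += 1
        self.total[src] += record["value"]
        return self.total[src] / self.count[src]


if __name__ == "__main__":
    answer = RunningAverage()
    streams = [sensor_source("temperature")]

    while streams:
        # Attach a new stream mid-run, as stream-computing systems allow.
        if len(streams) == 1 and answer.count["temperature"] == 3:
            streams.append(sensor_source("pressure"))

        for stream in list(streams):
            try:
                record = next(stream)
                print(record["source"], "running avg:",
                      round(answer.update(record), 2))
            except StopIteration:
                streams.remove(stream)
```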
Stream computing systems can process raw data coming in from video, television broadcasts, blogs, electronic sensors and other sources.
They can scale seamlessly from a single server to thousands of nodes in racks or to special-purpose hardware architectures such as IBM Cell Broadband Engine and IBM Blue Gene.
The Cell Broadband Engine, or Cell for short, is a microprocessor architecture jointly developed by Sony Computer Entertainment, Toshiba and IBM beginning in 2001. Its first major commercial application was in Sony’s PlayStation 3 video game console.
Blue Gene is a family of supercomputers from IBM, one of which will participate in the television quiz show “Jeopardy” next year.
More About Stream Computing
The stream processing core continually monitors and adapts to the state and utilization of the system’s computing resources, users’ information needs, and the availability of data to meet those needs.
Feedback mechanisms make it a self-learning system that continuously recalibrates itself to improve the speed and accuracy of analysis.
The stream computing platform can let thousands of user-generated applications share computational resources by sharing intermediate analytics results across multiple applications whenever possible, so the applications do not have to duplicate previously performed analyses.
User-generated applications are dynamically constructed from an extensible set of reusable components. New applications are created as new data sources and algorithms are added.
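A rough sketch of those two ideas, reusable components and shared intermediate results, might look like the following; the feed name and components are invented for illustration and are not InfoSphere Streams constructs.

```python
# Illustrative only: two "applications" built from shared, reusable components.
from functools import lru_cache


@lru_cache(maxsize=None)
def parse_feed(feed_name):
    """Reusable component: the expensive parse runs at most once per feed."""
    print(f"parsing {feed_name} ...")   # printed only on the first request
    return [len(word) for word in feed_name.split("_")]


def average(values):
    return sum(values) / len(values)


def maximum(values):
    return max(values)


# Two applications composed from the same components; the second one reuses
# the cached intermediate result instead of repeating the parse.
app_one = lambda feed: average(parse_feed(feed))
app_two = lambda feed: maximum(parse_feed(feed))

if __name__ == "__main__":
    print("app one:", app_one("blog_comments_feed"))
    print("app two:", app_two("blog_comments_feed"))   # no second parse
```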
“In a sense, you’re talking about a system that enables dynamic or near real-time data analysis,” Charles King, principal analyst at Pund-IT, told TechNewsWorld.
Super MARIO
The on-the-fly applications served up by the system are created through a technology codenamed “MARIO” — Mashup Automation with Runtime Invocation and Orchestration.
Here’s how it works: Users select familiar business and analytics terms from a menu to tell the system what they want. MARIO then assembles a stream processing application from this input, using configurable components.
MARIO also gives business users feedback by presenting the application as a graph of connected components, so they can refine their requests.
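Here is a loose, hypothetical sketch of that flow: the user picks terms from a menu, the system wires matching components into a pipeline, and the resulting chain is shown back for refinement. The component names are invented and stand in for whatever MARIO actually generates.

```python
# Illustrative sketch of menu-driven application assembly; not MARIO's real API.
REGISTRY = {
    "news feed": lambda _: ["ACME beats forecast", "ACME recalls product"],
    "filter: mentions ACME": lambda items: [i for i in items if "ACME" in i],
    "sentiment score": lambda items: [(i, -1 if "recalls" in i else 1) for i in items],
}


def assemble(selections):
    """Build an application as an ordered chain of registered components."""
    return [REGISTRY[name] for name in selections]


def describe(selections):
    """Feedback step: show the application as a chain of connected components."""
    return " -> ".join(selections)


def run(app, seed=None):
    data = seed
    for component in app:
        data = component(data)
    return data


if __name__ == "__main__":
    choices = ["news feed", "filter: mentions ACME", "sentiment score"]
    print(describe(choices))
    print(run(assemble(choices)))
```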
Implications of InfoSphere Streams
The global economy might be in recession, but the pace of business keeps increasing, and business intelligence technology must deliver results at ever higher speed to remain useful.
That has led business intelligence software vendors to move into complex event processing, or CEP.
CEP deals with processing multiple events to identify the ones that are meaningful. It uses techniques such as detection of complex patterns, event correlation and abstraction, event hierarchies, and relationships between events.
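A toy example of the CEP idea, assuming a made-up event format and threshold, could look like this: individual failed-login events are unremarkable on their own, but three of them correlated within a short time window become a meaningful higher-level event.

```python
# Illustrative CEP sketch: correlate low-level events into a meaningful pattern.
from collections import defaultdict

WINDOW_SECONDS = 60
THRESHOLD = 3

events = [
    {"time": 10, "user": "alice", "type": "login_failed"},
    {"time": 15, "user": "bob",   "type": "login_ok"},
    {"time": 20, "user": "alice", "type": "login_failed"},
    {"time": 25, "user": "alice", "type": "login_failed"},
]


def detect_bursts(stream):
    """Flag repeated login failures by one user inside a sliding time window."""
    recent = defaultdict(list)
    for event in stream:
        if event["type"] != "login_failed":
            continue
        history = recent[event["user"]] + [event["time"]]
        # Keep only failures still inside the window.
        recent[event["user"]] = [t for t in history
                                 if event["time"] - t <= WINDOW_SECONDS]
        if len(recent[event["user"]]) >= THRESHOLD:
            yield {"user": event["user"], "pattern": "repeated_login_failure"}


if __name__ == "__main__":
    for alert in detect_bursts(events):
        print(alert)
```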
By being able to process both structured and unstructured raw data, InfoSphere Streams addresses one of the problems enterprises face when they try to make sense of the business intelligence they receive: how to manage and analyze unstructured data. Much corporate data lies in unstructured piles of documents, and dealing with these growing stacks of tangled information has become increasingly difficult.
But What Does It All Mean?
Crunching massive amounts of data in real time and serving up results on the fly is impressive, but will it help? After all, Wall Street firms had the most complex, rapid data processing systems available, and many of them still tanked last autumn.
“Real-time data may be valuable in a very few specialized niches, but that’s a long way from conventional business intelligence,” Nigel Pendse, an analyst at The BI Survey, told TechNewsWorld.
“Just because it’s possible doesn’t make it useful to businesses,” he remarked.
“Stream computing is not necessary or applicable for a huge number of applications,” agreed Pund-IT’s King, “but trading analysis, and weather or ecological system analysis are areas where it would be useful.”
It’s about time. This is a great development, because the mountains of data growing out there are waiting for mining techniques and, by extension, monetization. The possibilities are endless.
The challenge for IBM will be to create a GUI that can be used by people other than IT PhDs. The biggest market share will come from “regular” users in smaller businesses, not pointy-headed analysts (of which I am one). This has been the curse of Wolfram Alpha.
In any event, it is a great leap forward because, as the saying goes, “Thar’s gold in them thar hills!”