IBM and Google have announced an initiative the two technology heavyweights hope will help computer science students and researchers learn more about, and gain experience in, what has come to be known as “cloud computing.” Ultimately, the goal is to create a new generation of software developers able to write software that takes full advantage of the capabilities of parallel computing.
“This project combines IBM’s historic strengths in scientific, business and secure-transaction computing with Google’s complementary expertise in Web computing and massively scaled clusters,” said Samuel J. Palmisano, chairman, president and CEO of IBM. “We’re aiming to train tomorrow’s programmers to write software that can support a tidal wave of global Web growth and trillions of secure transactions every day.”
Google is “excited to partner with IBM to provide resources which will better equip students and researchers to address today’s developing computational challenges,” said the company’s CEO, Eric Schmidt. The training, according to Schmidt, is imperative to ensure that students are equipped to “harness the potential of modern computing systems” and provide a way for researchers to “innovate ways to address emerging problems.” The project is no luxury but a necessity, he continued, one that will enable Google to “most effectively serve the long-term interests” of its users.
Stormy Clouds
The issue IBM and Google are attempting to address is an apparent dearth of technical training at colleges and universities in this realm of extremely complex Internet-scale computing. Cloud computing, or parallel computing, is the process by which scores of processors are used to solve a common problem.
“The way we compute today is we have a problem and we give it to a computer and we say to that computer, ‘you solve it and you will dedicate your resources to it,'” Martin Reynolds, an analyst at Gartner, told TechNewsWorld. “That’s pretty easy. You give it the problem, and it’s done. The problem is that computers are going in a different direction.”
This cutting-edge form of computing is already visible at large companies such as Google, which runs its services on massively scaled clusters of machines.
“You see it happening in the small as well with what’s going on with multicore [server processors],” Reynolds continued. “If you look at a server now, you’ve got four-core processors, and soon it will be eight-core, each with two threads. You put four of those in a box and all of a sudden you’ve got 64 balls in the air that you have to juggle, up from four just a few years ago.”
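To make that arithmetic concrete (four sockets times eight cores times two threads per core comes to 64), the short Java sketch below asks the JVM how many hardware threads it can see and hands it that many independent pieces of work. The program and its class name are purely illustrative; only standard library calls are used.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class JugglingThreads {
    public static void main(String[] args) {
        // How many "balls" the machine can keep in the air at once:
        // on a box with four 8-core, 2-thread processors this reports 64.
        int hardwareThreads = Runtime.getRuntime().availableProcessors();
        System.out.println("Hardware threads: " + hardwareThreads);

        // Keeping them all busy means splitting the work into at least
        // that many independent pieces and handing them to a pool.
        ExecutorService pool = Executors.newFixedThreadPool(hardwareThreads);
        for (int i = 0; i < hardwareThreads; i++) {
            final int chunk = i;
            pool.submit(() -> System.out.println(
                "chunk " + chunk + " running on " + Thread.currentThread().getName()));
        }
        pool.shutdown();
    }
}
```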
The problem, then, exists at both ends: in the large-scale environments created by Amazon, Google, Microsoft and others, and in smaller systems that are becoming richer and richer in terms of the work they can juggle at once.
“The problem is that our computer problems are not really built for that system. The way we build computer problems today is we expect to put them on one computer and have them run,” Reynolds said. “There is not a lot of scalability in that approach. It also turns out that when you look at these very large-scale computing systems, one of the things that happens is that the individual elements become somewhat unreliable. There’s actually no guarantee that a particular computer will be available all the time.
“It might be available most of the time, but your problem solving has to be such that if something goes away, it doesn’t matter. Something else picks it up and runs it.”
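A hedged illustration of that point: in the Java sketch below, a task is simply handed to the next node whenever the current one turns out to be unavailable. The Worker interface and runSomewhere helper are hypothetical, invented for this example, and are not part of any IBM or Google API.

```java
import java.util.List;
import java.util.concurrent.Callable;

public class FaultTolerantDispatch {
    // Hypothetical view of a node in the pool: it may fail or vanish at any time.
    interface Worker {
        String run(Callable<String> task) throws Exception;
    }

    // Try the task on each worker in turn; if one goes away, it doesn't matter --
    // the next one picks the work up and runs it.
    static String runSomewhere(List<Worker> workers, Callable<String> task) {
        for (Worker worker : workers) {
            try {
                return worker.run(task);
            } catch (Exception nodeUnavailable) {
                // This node failed or disappeared; fall through to the next one.
            }
        }
        throw new IllegalStateException("no worker in the pool could run the task");
    }
}
```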
The benefit of this type of approach, Reynolds pointed out, is that computing costs can be extraordinarily low because there is a whole pool of servers out there that everyone can share. The collective will deliver very low-cost computing.
“You can see there’s a real benefit to doing this, but it is very, very hard to program this stuff. There are just too many problems, and the solutions are not well known. Google succeeds because Google builds very specific, narrow applications that run on its cloud. But it hasn’t been able to extend that to a general-purpose programming model where anyone can come along and tap in.”
The big goal, according to Reynolds, is for the technology industry to encourage the next generation of computer scientists to figure out how to build software that can run in these massively parallel environments.
Clouds in the Future
To achieve that end, IBM and Google are teaming up to provide the hardware, software and services universities need to augment their curricula and expand the scope of their research. The aim is to lower the financial and logistical impediments that have prevented the academic community from exploring this emerging model of computing.
The project will start off small as a pilot program available at six schools that have joined the initiative: the University of Washington, Carnegie Mellon University, the Massachusetts Institute of Technology, Stanford University, the University of California at Berkeley and the University of Maryland. Google and IBM said the program will expand to include additional researchers, educators and scientists in the future.
The two partners have created a variety of resources to simplify the development of massively parallel programs.
Among them are: a cluster of processors running an open source implementation of Google’s published computing infrastructure; a Creative Commons-licensed university curriculum focused on massively parallel computing techniques; open source software from IBM to help students develop programs for clusters running Hadoop; management, monitoring and dynamic resource provisioning of the cluster by IBM; and a Web site to encourage collaboration among universities in the program.
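As a rough indication of what programming against such a Hadoop cluster looks like, the sketch below is the standard word-count pattern written against Hadoop’s MapReduce API (the modern form of the API, which differs in detail from the 2007-era interface). It is a minimal example for illustration, not part of the IBM/Google curriculum itself.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Map step: runs in parallel across the cluster, one split of the input
    // per task, emitting (word, 1) for every token it sees.
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer tokens = new StringTokenizer(value.toString());
            while (tokens.hasMoreTokens()) {
                word.set(tokens.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Reduce step: the framework groups the (word, 1) pairs by word and
    // sums the counts, again spread across many machines.
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable value : values) {
                sum += value.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        // Input and output paths come from the command line; the framework
        // handles scheduling, data movement and retrying failed tasks.
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

The point of the model is that the programmer writes only the map and reduce functions; the cluster decides where each piece runs and reruns any piece whose node disappears.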
More specifically, Google will complete a data center containing more than 1,600 processors by the end of 2007. IBM is also creating a data center of its own.
Likely Pair
Solving the challenges of massively parallel computing would mean that workloads could be flipped around the world from one server network to another without concern about reliability or other problems that could trip them up, Reynolds said.
“We’d get the problem solved in the most cost-effective and fastest way possible,” he noted.
IBM and Google are engaged in the problem from two different sides.
“IBM sees it from the side of, ‘Computing is always getting cheaper, and what is happening is that it is becoming so cheap to compute that all kinds of problems can now be solved, except we can’t program them. We can’t afford the programmers to deal with this stuff anymore. And that is becoming a bottleneck.’”
The company is trying to deliver new business solutions that will make its future investments in technology pay off.
Google, on the other hand, has a very large computing system that it would like to use to do more and more computing, and it’s hoping to plant the seeds that will allow people to do general purpose computing in its environment.
“It’s a fundamental shift in computer science. What we are looking at here is an exercise to try to make parallel computing applicable to everyday computing problems. In the long run, about five to 10 years out, these programs will start to show results,” Reynolds concluded.