Few people hold the status of “visionary” in the computer processor field, but Sun Microsystems’ Marc Tremblay fits the bill. Sun fellow, vice president, and chief architect for the firm’s Scalable Systems Group, Tremblay foresaw the advent of “throughput computing” and the jump in performance afforded by multiple cores and multiple computing threads.
Tremblay was co-architect for Sun’s UltraSPARC I, and chief architect for the UltraSPARC II microprocessor. He was also the chief architect for Sun’s MAJC (Microprocessor Architecture for Java Computing) Program, and architected the picoJava processor core, a Java bytecode engine.
Recently Sun announced the eight-core Niagara chip, also known as the UltraSPARC T1, which quadruples the number of processing threads to 32. This throughput thoroughbred not only ups performance, but also reduces power consumption. TechNewsWorld got in touch with Tremblay to talk about Niagara, and to get his take on what’s next for server chips.
TechNewsWorld: The new UltraSPARC T1 processor represents a number of changes, including 32-thread, simultaneous computing and reduced power consumption. What do you see as the most significant aspects of the new chips?
Marc Tremblay: We had the courage to challenge the old way of building processors and go all out for throughput. The UltraSPARC T1 maps so well to commercial applications, as opposed to SPECint and SPECfp (benchmarks), for instance. We had the courage to build a server chip as opposed to a derivative of a desktop chip. From a technical standpoint, I would say that the achievement was to build 32-way Symmetric MultiProcessing (SMP) in silicon at 90 nanometers and only dissipate 70 Watts. Few people would have thought that this was possible. The rest of the world has two or four general-purpose threads dissipating more than 100 Watts.
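To put the figures Tremblay quotes side by side, here is a rough back-of-the-envelope sketch in Python. It compares hardware threads per watt only, which is not the same as performance per watt, since each T1 thread is deliberately simpler. The competitor values are assumptions: the most generous reading of "two or four" threads (four) and the lower bound of "more than 100 Watts" (100).

```python
# Figures quoted in the interview for the UltraSPARC T1.
T1_THREADS, T1_WATTS = 32, 70

# Assumed competitor figures: the upper end of "two or four" threads
# and the lower bound of "more than 100 Watts".
OTHER_THREADS, OTHER_WATTS = 4, 100


def threads_per_watt(threads: int, watts: float) -> float:
    """Hardware threads supported per watt dissipated."""
    return threads / watts


t1 = threads_per_watt(T1_THREADS, T1_WATTS)
other = threads_per_watt(OTHER_THREADS, OTHER_WATTS)

print(f"T1:    {t1:.3f} threads/W")
print(f"Other: {other:.3f} threads/W")
print(f"Ratio: {t1 / other:.1f}x")
```

Under those assumptions the T1 supports roughly eleven times as many threads per watt. How much of that turns into application throughput depends on the workload.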
TNW: Would you agree that these processors are not necessarily very efficient with instructions, but make up for it in sheer numbers with the 32-thread capability?
Tremblay: The basic pipeline is indeed very simple and does not offer much Instruction Level Parallelism (ILP). It does not support multi-issue, branch prediction, speculation, etc. That is key since it buys us tons of power and area and allows us to have four threads per core and the ability to put eight cores per chip. Not trimming down cores would have made that impossible. The art is really to trim things that don’t hurt server applications.
TNW: Sun has touted the chips’ energy use as half that of competing processors from Intel and IBM. How did Sun achieve the power savings?
Tremblay: Mostly by getting rid of power-hungry structures such as large associative memories, or content addressable memory (CAM) that are found in all out-of-order processors. Getting rid of any kind of speculation also means that the processor always does work that counts, as opposed to speculative work and/or predication.
TNW: How important is it to keep pace with the multi-threaded application environment of the Internet?
Tremblay: The Internet has millions of threads, as the Google server farm shows. The key for the T1 is that it can run 32 threads or processes, in any mix. So even if the application isn’t multi-threaded, e.g. a search engine, it can still run multiple copies of it and therefore run 32 simultaneous searches. In addition to that, we are of course launching the next-wave development environment to enable efficient, bug-free, multi-threaded programming.
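The idea of running many copies of a single-threaded program can be sketched in a few lines of Python. This is a minimal illustration, not Sun's software: the `DOCUMENTS` index, the `search` function, and the `run_copies` helper are all hypothetical. Worker threads stand in for the chip's 32 hardware threads, and a real deployment could just as well launch separate operating-system processes.

```python
from concurrent.futures import ThreadPoolExecutor

HARDWARE_THREADS = 32  # thread count quoted for the UltraSPARC T1

# Hypothetical toy index: document id -> text.
DOCUMENTS = {
    1: "sun announces eight core server chip",
    2: "throughput computing for web servers",
    3: "power and cooling in the data center",
}


def search(query: str) -> list[int]:
    """Single-threaded search: ids of documents containing the query word."""
    return sorted(
        doc_id for doc_id, text in DOCUMENTS.items() if query in text.split()
    )


def run_copies(queries: list[str], workers: int = HARDWARE_THREADS) -> list[list[int]]:
    """Run one independent copy of `search` per query, up to `workers` at once."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in the same order as the queries.
        return list(pool.map(search, queries))


# 32 independent searches, one per hardware thread.
queries = ["server", "power", "chip", "throughput"] * 8
results = run_copies(queries)
print(len(results), results[:4])
```

The search code itself contains no threading logic. The parallelism comes entirely from dispatching independent requests, which is the kind of workload Tremblay describes.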
TNW: How did Sun approach the balance of performance, energy and heat issues in the design of the UltraSPARC T1?
Tremblay: That’s the art: identifying what really matters to the customer and having the courage to delete features that have been accepted as “good” by the community for years. At the end of the day, it was fairly easy to decide once we adopted that mindset.
TNW: Do you expect competitors will respond to Sun’s message to the industry “that the problems associated with power and cooling are just as important as keeping up with performance,” as IDC analyst Vernon Turner stated recently?
Tremblay: From what I have already seen, not a single vendor will go against that. They are all going in that direction. We are fortunate to have taken the turn four years ago, and started a design from scratch. I don’t believe that one can get this class of performance/watt with legacy cores. One has to design for it, from scratch.
TNW: If parallel processing and power savings are some of the UltraSPARC T1’s greatest achievements, what can we expect next from such server processors from Sun?
Tremblay: More throughput, more parallelism and more robustness per thread, both from a performance and RAS standpoint. Since we are a system company, we are also spending transistors on system-level functions such as network acceleration and crypto-acceleration.