Tony Cass, the leader of the European Organization for Nuclear Research’s (CERN’s) database services group, outlined some of the challenges the organization’s computer system faces during his keynote speech Wednesday at LISA, the 24th Large Installation System Administration Conference, being held in San Jose, Calif., through Friday.
Smashing beams of protons and ions together at high speeds in CERN’s Large Hadron Collider generates a staggering amount of data that requires a sophisticated computer system to handle.
The CERN computing system has to winnow out a few hundred good events from the 40 million events generated every second by the particle collisions, store and analyze the data, manage and control the high-energy beams used, and send and receive gigabytes of data every day.
Numbers, Numbers, Numbers
The accelerator generates 40 million particle collisions, or events, every second. CERN’s computers pick out a “few hundred” good events per second, then begin processing the data, Cass said.
These good events are recorded on disks and magnetic tapes at 100 to 150 MBps (megabytes per second). That adds up to 15 petabytes of data a year for all four CERN detectors — Alice, Atlas, CMS and LHCb. The data is transferred at 2 Gbps (gigabits per second), and CERN requires three full Oracle SL8500 tape robots a year.
CERN forecasts it will store 23 to 25 petabytes of data per year, which is 100 million to 120 million files. That requires 20,000 to 25,000 1-terabyte tapes a year. The archives will need to store 0.1 exabytes, or 1 billion files, in 2015.
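As a rough sanity check, the back-of-envelope arithmetic below works only from the figures quoted above (it is an illustration, not CERN’s own accounting) and shows what the forecast implies for tape counts and average file size.

```python
# Back-of-envelope arithmetic using the figures quoted above.
# Illustrative only; these are the article's numbers, not CERN accounting.

PB = 10**15          # petabyte, in bytes
TB = 10**12          # terabyte, in bytes

yearly_volume = 25 * PB        # upper end of the 23-25 PB/year forecast
yearly_files = 120_000_000     # upper end of 100-120 million files a year
tape_capacity = 1 * TB         # 1-terabyte tapes

print(f"Tapes per year:    {yearly_volume / tape_capacity:,.0f}")           # 25,000
print(f"Average file size: {yearly_volume / yearly_files / 10**6:.0f} MB")  # ~208 MB
```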
“IBM and StorageTek and Oracle have good roadmaps for their tape technology, but still managing the tapes and data is a problem,” Cass said. “We have to reread all past data between runs. That’s 60 petabytes in four months at 6 GBps.”
A “run” refers to when the accelerator is put into action. StorageTek is now part of Oracle, whose databases CERN uses.
CERN has to run 75 drives flat out at a sustained 80 MBps (megabytes per second) just to handle controlled access, Cass said.
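Those throughput figures hang together if the 80 MBps is read as a per-drive rate, an assumption the article does not state explicitly: 75 drives running flat out roughly match the rate needed to reread 60 petabytes in four months.

```python
# Cross-check of the reread throughput quoted above.
# Assumption: the sustained 80 MBps figure is per tape drive.

MB, GB, PB = 10**6, 10**9, 10**15

drives = 75
per_drive_rate = 80 * MB                   # bytes/second per drive (assumed)
aggregate_rate = drives * per_drive_rate   # combined drive throughput

reread_volume = 60 * PB                    # all past data, reread between runs
four_months = 4 * 30 * 24 * 3600           # roughly 10.4 million seconds

print(f"Aggregate drive throughput: {aggregate_rate / GB:.1f} GB/s")               # 6.0
print(f"Rate needed for the reread: {reread_volume / four_months / GB:.1f} GB/s")  # ~5.8
```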
Dealing With the Data
CERN uses three Oracle database applications for the accelerator.
The first is a short-term settings and control configuration database that retains data for about a week. “As you ramp up the energy (for the beams) you need to know how it should behave and to have control systems to see how it’s behaving and, if there’s a problem, where does it come from,” Cass explained.
The second is a real-time measurement log database that retains data for a week.
The third is a long-term archive of logs that retains data for about 20 years. There are 2 trillion records in the archives, which are growing by 4 billion records a day. Managing that is complicated. “They want to do searches across the full 2 trillion rows every now and then,” Cass remarked.
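To put that growth rate in perspective, the short calculation below projects the archive size from the figures quoted above; it is a simple extrapolation at the current rate, not a CERN projection.

```python
# Growth of the long-term logging archive, extrapolated from the figures above.
# Simple arithmetic at the current rate; not an official CERN projection.

current_rows = 2 * 10**12      # 2 trillion records today
rows_per_day = 4 * 10**9       # growing by 4 billion records a day
retention_years = 20           # stated retention period

rows_per_year = rows_per_day * 365
rows_after_retention = current_rows + rows_per_year * retention_years

print(f"Rows added per year: {rows_per_year:.2e}")         # ~1.5e+12
print(f"Rows after 20 years: {rows_after_retention:.2e}")  # ~3.1e+13
```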
There are 98 PCs in all in CERN’s control system, which consists of 150 federated Supervisory Control and Data Acquisition (SCADA) systems called “PVSS” from ETM, a company now owned by Siemens. The PCs monitor 934,000 parameters.
Overall, CERN has about 5,000 PCs, Cass stated.
CERN’s processing power is distributed worldwide over a grid. “There are not many computing grids used on the scale of the LHC computing grid, which federates the EG, EGI and ARC science grids in Europe and the Open Science Grid in the United States,” Cass said. “The Grid is enabling distributed computing resources to be brought together to run 1 million jobs a day. Grid usage is really good.”
CERN has a Tier Zero center, 11 Tier One centers at different labs, and 150 Tier Two centers at various universities. Tier Zero performs data recording, initial data reconstruction and data redistribution, Cass said. Tier One is for permanent storage, reprocessing and analysis, while Tier Two is for simulation and end-user analysis.
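That division of labor can be summarized as a small lookup table; the sketch below simply restates the roles and site counts given above.

```python
# The LHC computing grid tiers as described above, as a simple lookup table.

TIERS = {
    "Tier Zero": {"sites": 1,   "hosts": "CERN",
                  "roles": ["data recording", "initial data reconstruction", "data redistribution"]},
    "Tier One":  {"sites": 11,  "hosts": "partner labs",
                  "roles": ["permanent storage", "reprocessing", "analysis"]},
    "Tier Two":  {"sites": 150, "hosts": "universities",
                  "roles": ["simulation", "end-user analysis"]},
}

for name, tier in TIERS.items():
    print(f"{name}: {tier['sites']} site(s) at {tier['hosts']}: {', '.join(tier['roles'])}")
```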
CERN has also developed a Google Earth-based monitoring system, which tracks about 11,400 jobs running worldwide at a data transfer rate of 6.55 Gbps.
Problems, Problems, Problems
CERN’s still struggling with system failures because of the complexity of its setup.
“There’s a lot of complex technology, a lot of systems that need to interoperate to transfer data within CERN, the system to talk between different storage systems that have slightly different mindsets, all this is complicated, and because it’s complicated, it fails,” Cass pointed out.
For example, there are conflicts between file sizes, file placement policies and user access patterns.
“When people want to read data back they want all the data recorded at the same moment at the same time and have to mount several tapes to do that, so there’s a conflict between the right access patterns and the right storage patterns,” Cass pointed out.
Hardware failures are frequent and can cause problems for storage and database systems, Cass stated. “We have something like 200,000 disks across the grid and are getting disk failures every hour around the grid, and when they cause storage system problems, there are even more failures,” he added.
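Those disk numbers are easier to read as an annualized failure rate; the quick calculation below is an illustration based only on the figures Cass gave.

```python
# Implied annualized disk failure rate from the figures quoted above:
# about 200,000 disks grid-wide, with failures arriving roughly every hour.

disks = 200_000
failures_per_hour = 1
hours_per_year = 24 * 365

failures_per_year = failures_per_hour * hours_per_year
annual_failure_rate = failures_per_year / disks

print(f"Failures per year:       {failures_per_year:,}")      # 8,760
print(f"Annualized failure rate: {annual_failure_rate:.1%}")  # ~4.4%
```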
Infrastructure failures are “a fact of life,” Cass said. These usually consist of a loss of power and cooling systems going down.
The overall computing structure is also an issue. “We’re trying to move from computer center empires to a federation with consensus rather than control,” Cass remarked. “We do have consensus, but communication is a problem.”
For example, the Quattor and Lemon tools CERN’s IT department developed for fabric management and monitoring haven’t been adopted by all its Tier One and Tier Two sites.
“A major financial institution with tens of thousands of boxes — far more than we have — has adopted them,” Cass stated. “The idea that we’d be able to have a common configuration management across the system still hasn’t worked out.”
There are also problems with the shared file system. CERN’s software has to be distributed to 150 sites around the world, and it uses AFS while “a few hundred” nodes use NFS, which creates a bottleneck.
AFS, or the Andrew File System, is a distributed networked file system that uses trusted servers to present a homogeneous file name space to all connected workstations, regardless of their location. NFS, the Network File System, is a protocol developed by Sun Microsystems. It lets client computers access files over a network as if those files were in local storage.
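The appeal of both is transparency: application code reads a file on a network mount exactly as it would a local one. The snippet below is a generic illustration; the paths are placeholders, not real CERN mount points.

```python
# A network file system is transparent to application code: reading from an
# AFS or NFS mount looks identical to reading a local file.
# The paths below are placeholders, not real CERN locations.

import os

def read_first_line(path: str) -> str:
    with open(path) as f:      # the same call works for local and network paths
        return f.readline().rstrip()

for path in ("/tmp/example-settings.txt",                  # local disk
             "/afs/example.org/sw/example-settings.txt",   # hypothetical AFS path
             "/nfs/software/example-settings.txt"):        # hypothetical NFS mount
    if os.path.exists(path):
        print(path, "->", read_first_line(path))
```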
Solutions, Solutions, Solutions
CERN is developing a solution for the shared file system bottleneck. This is CernVM, a virtual software appliance with an HTTP-based file system (derived from GROW-FS) that uses HTTP caches to cache data on nodes.
“Over the next couple of years, CernVM will be used more and more for software distribution and to resolve file system bottlenecks,” Cass said.
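The underlying idea is that software files are fetched over plain HTTP and then served from a cache on each node, so the central servers are hit only once per file. The sketch below shows that caching pattern in general terms; it is not CernVM itself, and the repository URL and cache directory are placeholders.

```python
# Generic sketch of HTTP-based software distribution with a local on-disk
# cache, the pattern behind CERN's approach. Not CernVM's implementation;
# the repository URL and cache directory are placeholders.

import hashlib
import os
import urllib.request

CACHE_DIR = "/tmp/sw-cache"                       # placeholder local cache
REPO_URL = "http://repo.example.org/software"     # placeholder HTTP repository

def fetch(relative_path: str) -> str:
    """Return a local path for the file, downloading it only on a cache miss."""
    cache_key = hashlib.sha1(relative_path.encode()).hexdigest()
    local_path = os.path.join(CACHE_DIR, cache_key)

    if not os.path.exists(local_path):            # cache miss: fetch over HTTP once
        os.makedirs(CACHE_DIR, exist_ok=True)
        urllib.request.urlretrieve(f"{REPO_URL}/{relative_path}", local_path)

    return local_path                             # subsequent reads come from local disk
```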
To resolve the problems with storage placement and access, CERN is changing the way it manages data transfer.
Other improvements to storage include putting data being accessed for analysis into a separate system from data reported from experiments. The analysis data storage will have lower latency because “people don’t like latency, they want immediate access to the file at the risk of a penalty later,” Cass pointed out.