(Image courtesy of SLAC and Fermilab)
By Monica Bobra
Over 10,000 CPUs scattered across the globe hum in unison as one of the world's major distributed computing systems, known as the Open Science Grid (OSG). Built and operated by teams from U.S. universities and national laboratories, this national grid computing infrastructure for large-scale science is open to research groups large and small nationwide, from many scientific disciplines. The OSG, which began operating on July 21, was designed to let particle physics labs share their computing resources more efficiently to meet growing computing needs.
Now, many scientific institutions—including those unrelated to particle physics—can connect over the internet and run scientific applications on the Grid. The DOE, the NSF and member institutions provide funding. To become a member, institution-affiliated scientists must contribute computing power, storage space or human resources. SLAC currently contributes 100 CPUs to the OSG; it also contributes expertise by designing software systems to track Grid resource usage and by helping to define security policies. Twenty other institutions provide computing power and many terabytes of data storage.
“There are no specific thresholds,” said Richard Mount, head of SCS. But, he explained, adapting computers and storage to support the OSG infrastructure is not easy. “Nobody will do it unless they’re serious players,” said Mount.
To submit data-processing jobs to the Grid, institutions and experiments agree on how much CPU time a given institution will provide to an experiment. The request zooms through cyberspace into what is known as the middleware layer of the OSG architecture, which prioritizes and schedules computing jobs. Because the number of users is still small, prioritization has not yet been a problem. As the number of users increases, developers must devise a prioritization plan. Eventually, paying commercial users may use portions of the Grid.
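One way such a middleware layer could prioritize jobs is "fair-share" ordering: experiments that have consumed the smallest fraction of their agreed CPU share run first. Here is a minimal sketch of that idea; the class names, fields and priority rule are illustrative assumptions, not the actual OSG middleware.

```python
import heapq

# Sketch of a fair-share job scheduler: jobs from experiments that
# have used the smallest fraction of their agreed CPU-time share
# run first. All names here are hypothetical illustrations.

class Job:
    def __init__(self, experiment, cpu_hours_requested):
        self.experiment = experiment
        self.cpu_hours_requested = cpu_hours_requested

class Scheduler:
    def __init__(self, agreed_shares):
        # agreed_shares: CPU hours each experiment was promised
        self.agreed_shares = agreed_shares
        self.used = {exp: 0.0 for exp in agreed_shares}
        self._queue = []
        self._counter = 0  # tie-breaker for stable ordering

    def submit(self, job):
        # Priority = fraction of the agreed share already used;
        # lower values are served first.
        share = self.agreed_shares[job.experiment]
        priority = self.used[job.experiment] / share
        heapq.heappush(self._queue, (priority, self._counter, job))
        self._counter += 1

    def next_job(self):
        _, _, job = heapq.heappop(self._queue)
        self.used[job.experiment] += job.cpu_hours_requested
        return job
```

For example, if CMS was promised 1,000 CPU hours and has already used 100 while ATLAS has used none of its 500, a newly submitted ATLAS job would run before a newly submitted CMS one.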
“There’s no clear policy on how resources will be allocated,” said Matteo Melani (SCS), an engineer who is developing software to track resource usage for the middleware layer. To do this, Melani asks the resource providers and the experiments which resources they want to track, such as CPU time, disk space and network bandwidth. He then designs software to answer questions like: How much CPU time did SLAC contribute to CMS? How much storage space did SLAC contribute to ATLAS? With this information, each institution can determine how to allocate its resources.
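At its core, answering those questions amounts to keeping a ledger of usage records and summing them per site, experiment and resource. A minimal sketch might look like this; the class, its fields and the sample numbers are assumptions for illustration, not Melani's actual software.

```python
from collections import defaultdict

# Hypothetical sketch of resource-usage accounting: accumulate
# usage records keyed by (site, experiment, resource) so that
# questions like "how much CPU time did SLAC give CMS?" reduce
# to a dictionary lookup.

class UsageLedger:
    def __init__(self):
        # (site, experiment, resource) -> total amount used
        self.totals = defaultdict(float)

    def record(self, site, experiment, resource, amount):
        self.totals[(site, experiment, resource)] += amount

    def contribution(self, site, experiment, resource):
        return self.totals[(site, experiment, resource)]

ledger = UsageLedger()
ledger.record("SLAC", "CMS", "cpu_hours", 120.0)   # made-up figures
ledger.record("SLAC", "CMS", "cpu_hours", 30.0)
ledger.record("SLAC", "ATLAS", "disk_gb", 500.0)

# How much CPU time did SLAC contribute to CMS?
print(ledger.contribution("SLAC", "CMS", "cpu_hours"))  # 150.0
```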
Bob Cowles (SCS) is helping to develop security policies for the OSG and manages Grid security issues for SLAC. The security measures preserve each site's local autonomy over its own OSG computers.
All of SLAC’s Open Science Grid computers also process data from the BABAR experiment, using dedicated software. To do this, the computers retrieve BABAR data, run algorithms and then send the results to other computers. During some stages of this process, such as data retrieval, the computers have a few free cycles.
When the computers have cycles to spare, they run the OSG software, which processes Grid jobs. Currently CMS—a future experiment at CERN’s Large Hadron Collider (LHC), scheduled to turn on in 2007—runs Monte Carlo simulations on them. In the future, the computers will also process data from ATLAS, another LHC experiment.
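This pattern is often called "cycle scavenging": local work always has priority, and Grid jobs fill whatever idle time slices remain. A toy simulation of the idea, with made-up job names and an assumed even/odd busy pattern standing in for real workloads:

```python
# Sketch of cycle scavenging: run Grid work only in time slices
# where the local (e.g. BABAR) workload leaves the CPU idle.
# The function names, the busy pattern and the job names are
# illustrative assumptions, not OSG internals.

def run_when_idle(local_busy, grid_jobs, steps):
    """Simulate `steps` time slices; run one queued Grid job in
    each slice where the local workload is not using the CPU."""
    completed = []
    for t in range(steps):
        if local_busy(t):
            continue  # the local experiment's work has priority
        if grid_jobs:
            completed.append(grid_jobs.pop(0))
    return completed

# Suppose local work occupies the even time slices (crunching
# numbers) and waits on data retrieval during the odd ones:
done = run_when_idle(lambda t: t % 2 == 0,
                     ["cms_sim_1", "cms_sim_2", "cms_sim_3"],
                     steps=6)
print(done)  # slices 1, 3 and 5 are free, so all three jobs run
```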
“The infrastructure is smart enough to make sure all the computers are utilized all the time,” said Melani. “It’s very, very efficient in managing resources.”
Together, the SLAC computing infrastructure and the OSG middleware ensure that computer utilization is maximized. Cowles, Mount, Melani and many others at SLAC are working to improve both the software and hardware contributions to the OSG. In the future, Mount hopes to add more computers to the OSG.
The 10,000 OSG CPUs are collectively processing jobs from about six collaborations worldwide. As the OSG develops, more institutions will join and more data will be processed. Mount can only conjecture what the future holds. “If grid technology is a success, it will create this marketplace in which all sorts of things can happen,” said Mount.