Problems in computational anatomy and multidimensional modeling present fundamental computing challenges. These include the ability to visualize and appreciate complex, dynamic information, and the need to organize biological data so that it can be shared by collaborating researchers. The objective of the Computation Core (CC) is to manage, improve, and make available the extensive computational resources and expertise needed to support the resource and its collaborations. We have created a technology-rich environment and infrastructure to integrate and facilitate interdisciplinary research.
Rapid advancements in imaging technology have provided researchers with the ability to produce very high-resolution, time-varying, multidimensional datasets of the human brain. Population-based longitudinal studies using this data drive a continually increasing demand for computing power. Today, LONI relies on a 306-node, dual-processor SUN Microsystems V20z cluster, one of the largest V20z installations in the world. Each compute node has dual 64-bit 2.4 gigahertz AMD Opteron CPUs with 4 gigabytes of memory. In addition to the SUN cluster, LONI has a 64-node Dell development cluster, with each node using dual 64-bit 3.6 gigahertz Intel EM64T processors and 4 gigabytes of memory. To augment the facility’s cluster resources, LONI has a 64-processor SGI Origin 3800 SMP supercomputer with 32 gigabytes of memory. Two further SMP supercomputers, a 32-processor SGI Onyx2 Reality Monster with 16 gigabytes of memory and a 6-processor SGI Onyx2 with 8 gigabytes of memory, drive graphics-intensive applications and interactive real-time multidimensional visualization of structural brain models and volumetric datasets.
To facilitate the submission and execution of compute jobs in this heterogeneous computing environment, SUN’s Grid Engine (SGE) is used to virtualize the resources above into a single compute service. A grid layer sits atop the compute resources and dispatches jobs to available resources according to user-defined criteria such as CPU type and processor count. The laboratory has successfully integrated the latest version of the LONI Pipeline (http://pipeline.loni.usc.edu) with SGE using SUN’s Java DRMAA bindings. The bindings allow jobs to be submitted natively from the LONI Pipeline to the grid without the need for external scripts. Furthermore, the LONI Pipeline can control the grid directly through DRMAA, which increases the versatility and efficacy of the operating environment and improves the overall end-user experience.
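The idea of dispatching jobs according to user-defined criteria can be illustrated with a minimal Python sketch. It is not how Grid Engine or the LONI Pipeline is implemented; it only models the compute resources described above and filters them by a request. The class names, field names, resource labels, and the `eligible` function are hypothetical, and the `"mips"` CPU label for the Origin 3800 is an assumption, since the text does not state that system's processor type.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Resource:
    """One pool of compute hosts, using the figures quoted in the text."""
    name: str                # hypothetical label, not an actual queue name
    cpu_type: str
    cpus_per_host: int
    hosts: int
    memory_gb_per_host: int

    @property
    def total_cpus(self) -> int:
        return self.cpus_per_host * self.hosts


INVENTORY = [
    Resource("sun-v20z-cluster", "opteron", 2, 306, 4),
    Resource("dell-dev-cluster", "em64t", 2, 64, 4),
    # CPU type below is an assumption; the text gives only the processor count.
    Resource("sgi-origin-3800", "mips", 64, 1, 32),
]


@dataclass(frozen=True)
class JobRequest:
    """User-defined criteria; unset fields place no constraint."""
    cpu_type: Optional[str] = None
    min_cpus_per_host: int = 1
    min_memory_gb_per_host: int = 0


def eligible(request: JobRequest,
             inventory: List[Resource] = INVENTORY) -> List[str]:
    """Return the names of resources that satisfy every requested criterion."""
    return [
        r.name
        for r in inventory
        if (request.cpu_type is None or r.cpu_type == request.cpu_type)
        and r.cpus_per_host >= request.min_cpus_per_host
        and r.memory_gb_per_host >= request.min_memory_gb_per_host
    ]


if __name__ == "__main__":
    print(eligible(JobRequest(cpu_type="opteron")))
    print(eligible(JobRequest(min_cpus_per_host=16)))
    print(INVENTORY[0].total_cpus)
```

In this sketch, a request for Opteron CPUs matches only the V20z cluster, while a request for 16 or more processors in a single host matches only the SMP system. A real grid layer would also track load, queue state, and scheduling policy, none of which is modeled here.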
Institutions and scientists worldwide rely on the facility’s resources to conduct research. LONI has made a decisive move towards fault-tolerant, high-availability systems designed to ensure near 24/7 functionality. Alongside its graphics and computation systems, the laboratory uses a fault-tolerant Storage Area Network (SAN) to accommodate current and projected storage requirements. The SAN hardware infrastructure is composed of the cluster and supercomputers previously mentioned, RAID storage, dual robotic tape silos, and a full complement of Brocade Fibre Channel switches delivering up to 800 megabytes per second of data throughput.
Alternate paths run throughout the fabric so that there is no single point of failure, preserving access to critical data and processing power. Two SGI Origin300 servers, each with four 500 MHz MIPS R14000 processors, mediate all data transactions and provide networking services. A high-availability application ties the two servers together and ensures failover in the event of a hardware failure.
LONI relies on two silos, a StorageTek SL8500 and a Powderhorn 9310, to store mirrored copies of the facility’s offline tape data. These tape robots are housed in two different locations, ensuring that a catastrophic event in either data center will leave a copy of all tape data intact in the other. Seven high-speed 40-gigabyte and two high-capacity 400-gigabyte tape drives provide LONI’s tape services. To leverage the available SAN throughput, SGI TP9500, TP9400, and TP9100 RAID5 arrays, together with SUN 3510s, provide nearly 50 terabytes of fault-tolerant disk storage.