Data Center HPC Aurora Au-5600: an energy-efficient, scalable, powerful, noiseless and versatile range of systems
Aurora Au-5600 ushers in a new era of high-performance computing at all levels, from departmental systems down to office-sized machines. Its system-level optimization allows unprecedented efficiency and cost reduction while delivering state-of-the-art performance. Aurora Au-5600 scales seamlessly while maximizing efficiency and reducing the total cost of ownership across the entire operational life of the equipment. Au-5600 is the first example of a noiseless supercomputer: thanks to its liquid cooling system, it generates no vibration or noise, which makes installation simpler and more flexible and widens the choice of installation sites.
Aurora represents Eurotech's latest HPC product offering, built on an innovative, leading-edge architecture. Aurora systems scale from a single working unit (node card) to many computing racks with no performance degradation. Aurora HPC systems offer high density and power efficiency, thanks to liquid cooling of all their modules and to an advanced design. A wide range of Aurora configurations can be obtained, all based on the same building blocks.
The Aurora node card is the main processing unit of every Aurora system: a single blade hosting two Intel Xeon 5600 series processors (six-core, up to 3.33 GHz), each connected to 6/12/24 GB of DDR3 memory running at 1333 MHz. The CPUs are linked to peripherals and interfaces through the Intel 5600 chipset (Tylersburg) via QPI (Intel QuickPath Interconnect). Node cards provide local storage by means of an on-board 1.8" SATA SSD. Each node hosts one Mellanox ConnectX-2 device providing Infiniband QDR connectivity, with 40 Gbps bandwidth and <2 µs latency, used to implement a switched network and also for data storage.
A large, high-end FPGA allows implementation of a point-to-point network with 3D-torus topology, for applications that do not require a centralized switched network. The main features and advantages of the 3D torus are low latency (about 1 µs), an aggregate bandwidth of 60 Gbps, and high robustness and reliability thanks to redundant lines. The same on-board FPGA also enables reconfigurable computing functions such as coprocessing, acceleration and GPU-like arithmetic, using the logic resources available on the device (up to 700 Gops per device). One node card provides a maximum computing power of 166 GFlops at a typical power consumption of 340 W.
Aurora node cards are hosted in chassis, in multiples of 16. Each chassis also features a 36-port QDR Infiniband switch, which leaves 20 ports usable for external connections, each running at 40 Gbps. All chassis feature two independent monitoring and control networks, for redundancy and safe operation. Maintenance and inspection of Aurora systems can also be carried out through a touchscreen monitor that displays diagnostic data, providing a user-friendly man-machine interface.
Aurora racks can contain up to 16 chassis each, providing power, cabling, signal connections, and piping for heat removal via a liquid cooling circuit.
Aurora systems can scale to many racks, each connected to its nearest neighbour via short, in-rack (and therefore hidden) cables. This arrangement causes no performance degradation and no difficulty in installation, management or maintenance, regardless of system size.
• Energy-efficient heat removal using hot-liquid cooling. Using dry coolers, free cooling, heat reclamation, varying flow rates and different inlet coolant temperatures, substantial savings on energy bills can be achieved, resulting in a substantially lower TCO.
• Noiseless and vibrationless operation: Aurora does not shake, rattle or make noise; it has no moving fans and requires no dedicated room for installation.
• High packaging density: Aurora systems are best in class in terms of computing density per rack. Up to 3072 cores/512 CPUs/256 blades can be hosted in a single 48U rack, meaning reduced floor occupation and easier installation.
• Using Intel Xeon 5600 series processors, Aurora reaches state-of-the-art performance levels while keeping compatibility with x86 applications, tools and operating systems, open source or proprietary.
• Aurora is also well equipped to tackle the OS-jitter issue, thanks to its three independent synchronization networks.
• Aurora benefits from a dual network of sensors and controls, which enhances reliability and robustness and makes maintenance easier.
Interconnects are crucial for the "P" in HPC, as they can enhance or stifle performance. Aurora, with its two available networks, allows maximum flexibility and adaptability.
QDR INFINIBAND Aurora QDR Infiniband is designed to achieve state-of-the-art speed while preserving flexibility in switching topologies and expandability. QDR Infiniband runs at 40 Gbps per port, with memory-to-memory latency below 2 µs. Within each chassis, every Aurora node is connected to the others through its QDR IB port. 20 external IB ports are available via QSFP connectors, allowing users to connect Aurora to an external federation of switches and to storage systems; a minimal latency microbenchmark is sketched below.
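The quoted memory-to-memory latency can be checked with a standard ping-pong microbenchmark. The sketch below is our own illustration, not Eurotech code: it bounces a 1-byte message between two MPI ranks and reports half of the average round-trip time as the one-way latency.

```c
/* Minimal MPI ping-pong latency sketch. Run with two ranks placed on
 * different nodes (e.g. mpirun -np 2 ./pingpong) so the messages
 * actually traverse the Infiniband fabric. */
#include <mpi.h>
#include <stdio.h>

#define REPS 10000

int main(int argc, char **argv)
{
    int rank;
    char byte = 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();

    for (int i = 0; i < REPS; i++) {
        if (rank == 0) {
            MPI_Send(&byte, 1, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(&byte, 1, MPI_CHAR, 1, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(&byte, 1, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            MPI_Send(&byte, 1, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }

    double t1 = MPI_Wtime();
    if (rank == 0)  /* one-way latency = half the round-trip time */
        printf("latency: %.2f us\n", (t1 - t0) / (2.0 * REPS) * 1e6);

    MPI_Finalize();
    return 0;
}
```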
3D TORUS A switchless interconnect that allows HPC systems to tackle classes of problems based on nearest-neighbour data communication. The Aurora 3D Torus features a high data rate, with 6 channels per node, each with a signalling rate of 10 Gbps, for a total of 60 Gbps plus a redundant 60 Gbps, and memory-to-memory latency of ~1 µs. The Aurora 3D Torus is a robust and flexible implementation: its redundant links allow on-the-fly machine repartitioning without performance degradation. Being implemented on an FPGA device, it is flexible by design and easy to upgrade. The 3D Torus transfer mechanism can be based on direct memory copy, when data move in regular patterns, or on RDMA access, for applications with different data structures. The Aurora 3D Torus requires no external equipment or cabling: everything needed for the network is provided. Short cables, confined within racks, allow easy scaling, up to sizes where real estate and power become the limiting factors. To allow reuse of a large body of existing code, MPI has been ported to the Aurora 3D Torus, as sketched below.
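To give a flavour of the programming model, here is a minimal sketch of nearest-neighbour communication written against MPI's standard Cartesian-topology interface; this is our own illustration, not Eurotech code. Any MPI library will run it, and on Aurora the ported MPI would be expected to map such exchanges onto the torus links.

```c
/* Nearest-neighbour halo exchange on a periodic 3D process grid,
 * i.e. a torus. Grid dimensions are chosen automatically from the
 * process count; the payload here is a single placeholder double. */
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int dims[3] = {0, 0, 0};     /* let MPI factor the process count */
    int periods[3] = {1, 1, 1};  /* periodic in all axes: a torus */
    int nprocs;
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
    MPI_Dims_create(nprocs, 3, dims);

    MPI_Comm torus;
    MPI_Cart_create(MPI_COMM_WORLD, 3, dims, periods, 1, &torus);

    double halo_out = 1.0, halo_in = 0.0;
    for (int axis = 0; axis < 3; axis++) {
        int minus, plus;
        /* minus = neighbour at -1 along this axis, plus = at +1 */
        MPI_Cart_shift(torus, axis, 1, &minus, &plus);
        MPI_Sendrecv(&halo_out, 1, MPI_DOUBLE, plus, 0,
                     &halo_in,  1, MPI_DOUBLE, minus, 0,
                     torus, MPI_STATUS_IGNORE);
    }

    MPI_Comm_free(&torus);
    MPI_Finalize();
    return 0;
}
```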
Monitoring and Management
Consistent and reliable Aurora operation depends on efficient acquisition and processing of diagnostic data. Aurora has two fully independent sensor networks operating simultaneously, powered from separate supplies, to ensure the highest coverage and to minimize downtime and unavailability.
IPMI One of the two sensor and management networks is based on IPMI, through IBMC modules on every node, which also collect data from sensors on other boards. The IBMCs allow flexible and powerful monitoring of the machine. This network acquires and logs data from voltage, temperature and humidity sensors; its diagnostics and maintenance are performed while the machine is running.
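As an illustration of what out-of-band IPMI monitoring looks like in practice, the sketch below drives the standard ipmitool utility from C to dump a node's sensor readings. The BMC address and credentials are placeholders: the specifics of the Aurora IBMC firmware are not covered by this document.

```c
/* Reading node sensor data out-of-band via ipmitool (a standard,
 * widely available IPMI client). Hostname and credentials below are
 * hypothetical placeholders. */
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    /* "sensor" lists SDR sensors with current readings and thresholds */
    const char *cmd =
        "ipmitool -I lanplus -H node-bmc-01 -U admin -P secret sensor";

    FILE *pipe = popen(cmd, "r");
    if (!pipe) {
        perror("popen");
        return EXIT_FAILURE;
    }

    char line[512];
    while (fgets(line, sizeof line, pipe))
        fputs(line, stdout);   /* e.g. forward to a logging daemon */

    return pclose(pipe) == 0 ? EXIT_SUCCESS : EXIT_FAILURE;
}
```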
AuroraMON is a distributed application running on an external server, connected via Ethernet/SNMP to a main module in each Aurora chassis and, through it, to every Aurora module; it is based on the ServNET network. AuroraMON performs monitoring and management tasks, much like a SCADA system. Its web UI is displayed on all the Aurora rack monitors and on a management console, and it integrates information from various sources: ServNET sensors, operating-system-level readings, external infrastructure, etc. Its monitoring daemons and control agents collect and process data to and from Aurora modules; management actions are performed locally to raise alarms, run tests and collect data. AuroraMON can be connected via SNMP to building facilities, to monitor and control not only Aurora but also external services such as AC power supply, UPS, cooling, networks and other hardware. AuroraMON archives data logs in nonvolatile memory, implementing a distributed database for health monitoring.
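The following sketch shows the kind of SNMP polling an application like AuroraMON performs, written against the standard net-snmp C API. The chassis hostname and the queried OID (the generic sysDescr) are placeholders; the actual AuroraMON MIB is product specific and not described in this document.

```c
/* SNMPv2c GET against a hypothetical chassis module, using net-snmp.
 * Build with: gcc poll.c $(net-snmp-config --cflags --libs) */
#include <net-snmp/net-snmp-config.h>
#include <net-snmp/net-snmp-includes.h>
#include <string.h>

int main(void)
{
    netsnmp_session session, *ss;
    netsnmp_pdu *pdu, *response = NULL;
    oid name[MAX_OID_LEN];
    size_t name_len = MAX_OID_LEN;

    init_snmp("auroramon-poll");
    snmp_sess_init(&session);
    session.peername = strdup("chassis-main-01");  /* placeholder host */
    session.version = SNMP_VERSION_2c;
    session.community = (u_char *)"public";
    session.community_len = strlen("public");

    if (!(ss = snmp_open(&session)))
        return 1;

    pdu = snmp_pdu_create(SNMP_MSG_GET);
    read_objid("SNMPv2-MIB::sysDescr.0", name, &name_len);
    snmp_add_null_var(pdu, name, name_len);

    if (snmp_synch_response(ss, pdu, &response) == STAT_SUCCESS &&
        response->errstat == SNMP_ERR_NOERROR) {
        /* print the returned variable binding */
        print_variable(response->variables->name,
                       response->variables->name_length,
                       response->variables);
    }

    if (response)
        snmp_free_pdu(response);
    snmp_close(ss);
    return 0;
}
```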
All Aurora modules are hot-replaceable, allowing intervention on a module while the rest of the machine keeps running, hence minimizing downtime. Each Aurora chassis can be extracted from the rack on sliding rails, keeping the remainder of the system running when major maintenance, repair or disassembly work is needed. The use of 48 VDC power supplies reduces safety and compliance issues: Aurora qualifies as a low-voltage system for regulatory and product-certification bodies.
ServNET is an independent network of sensors that can operate even when an Aurora system is completely powered off. By means of ServNET, system administrators can perform selective system shutdown, bring-up and power cycling when necessary. ServNET also allows monitoring data to be double-checked against the system sensors connected through IPMI, gathering information on the reliability of the acquisition systems themselves.
Computing performance:
• 166 GFlops/node, 5 TFlops/chassis, 42 TFlops/rack

Power consumption:
• 340 W/node, 11.2 kW/chassis, 90 kW/rack (typical)

Processors:
• Intel six-core Xeon 5600 series (TDP 130 W max): 2 CPUs/12 cores per node, 64 CPUs/384 cores per chassis, 512 CPUs/3072 cores per rack
• 12 MB cache, 3.33 GHz clock, 6.40 GT/s Intel QPI

Memory:
• 6/12/24 GB soldered on-board ECC DDR3 SDRAM per node
• Memory bandwidth: 40 GB/s per node
Local storage:
• 80/160/256 GB 1.8" SATA SSD
Interconnects:
• 1 QDR Infiniband port per node (bandwidth 40 Gbps, latency <2 µs)
• 20+20 QDR IB ports (QSFP connectors) per chassis
• 1+1 switchless 3D Torus nearest-neighbour links per node (bandwidth 60+60 Gbps, latency ~1 µs)
Power supply:
• External AC/DC converter (85-300 VAC to 48 VDC), n+1 redundant, 97% efficiency
• In-rack DC/DC trays (48 VDC to 10 VDC), 97% efficiency
Cooling:
• Entirely liquid cooled, ambient heat spillage <2%
Monitoring and control:
Two independent sensor networks:
• IPMI: 960 measurement points per rack
• ServNET: sensor and actuation network, 960 measurement points per rack
Configurations:
• AU-5600SRxx: single-rack systems, where xx indicates the number of chassis populating the rack
• AU-5600MRyy-xx: multiple-rack systems, where yy indicates the number of racks and xx the total number of chassis
Service and support:
Eurotech provides post-sale support through a variety of flexible arrangements. Aurora systems have several advantages over competitors in modularity, robustness and rational system design, resulting in an offering aimed at minimizing downtime and the fraction of an Aurora system that becomes unavailable during service and maintenance.
Physical characteristics:
• Dimensions (rack): H 2260 mm x W 1095 mm x D 1500 mm
• Weight (maximum): 1560 kg (3440 lbs) per fully populated rack
• Acoustic noise level: <20 dB at 1 m
Adoption of Intel processors ensures compatibility with a vast range of applications, tools, operating systems and HPC-specific middleware. Being x86-based gives Aurora an almost unlimited choice of compilers, debuggers, libraries, applications, operating systems, HPC middleware, and clustering and administration tools, open source or proprietary.
Operating Systems
• Scientific Linux and others
Compilers
• Intel Cluster Toolkit
• GNU toolchain
• Portland CDK
MPI
• Intel MPI
• Portland MVAPICH
Debuggers and performance tools
• Intel Trace Analyzer and Collector
• Intel VTune
Math Libraries (compatibility of math libraries is implementation specific)
• Intel MKL
• IMSL (with ICT requires adaptation)
Resource Management/Deployment
• OpenPBS, SunGridEngine
• PBS Professional
• Bright Cluster Manager
• Platform LSF/Cluster Manager
• Rocks, Rocks+
• Torque, MOAB, xCAT
Distributed File System
• Lustre over QDR Infiniband, either via OFED or TCP.
• pNFS, panFS, GPFS under test
Maintenance and Management
• Intel Cluster Checker
Typical Applications
• Oil and Gas reservoir simulation
• Climate Modeling, Weather Forecast
• Molecular Dynamics
• Finance and banking
• Computational Fluid Dynamics, Finite Element Analysis
• High Energy Physics