University of Kentucky Cluster Fluid Dynamics (CFD) Group

CFD Clusters

A collaboration between UK CFD and KAOS

A fundamental barrier to the widespread application of computational fluid dynamics is cost: realistic numerical simulations require large amounts of computer power, and the more accurate the simulation, the more costly it is. The result has been that complex CFD simulations have been restricted to large academic, government, and corporate laboratories that can afford the high up-front and maintenance costs of top-notch shared-memory supercomputers. Consequently, many smaller companies and institutions that could benefit from CFD research are unable to access the necessary technology.

To address this challenge, the UK CFD group is working with the Compilers, Hardware Architectures, and Operating Systems (KAOS) Laboratory in UK Electrical Engineering to design inexpensive PC clusters that are optimized for CFD applications. By carefully engineering the PC cluster design, using and building tools to improve application performance, and restructuring the application code explicitly for the cluster, we can create a high-performance CFD platform at a relatively low cost. These Personalized Turnkey Superclusters (PeTS) can then be provided to numerous smaller research organizations, allowing them to get optimal performance at a moderate cost. Further, such systems are much more easily upgraded as processor speeds continue to increase, helping to avoid the problem of sunk costs in an outmoded computational dinosaur.

An Award-Winning Prototype

The initial collaboration between UK CFD and KAOS has already received notice as a Finalist (Honorable Mention) in the Gordon Bell Price/Performance competition at SC2000. Combining DNSTool, a CFD code developed by Dr. Thomas Hauser of the UK CFD group, and KLAT2, a 64-PC cluster designed and built by the KAOS lab, we were able to perform simulations of the flow over a turbine blade at an outstanding ratio of computational power to computational cost. The system achieved a sustained price/performance of $1.86/MFLOPS for single-precision calculations and $2.75/MFLOPS for double precision, with an overall hardware and construction cost of about $40,000.
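The price/performance figures above can be turned around to estimate the sustained rates they imply. The following is a minimal sketch, not a figure taken from the SC2000 paper: it assumes the rounded $40,000 cost quoted here, so the results are approximate, and the function name is purely illustrative.

```python
def implied_sustained_mflops(total_cost_usd, usd_per_mflops):
    """Sustained MFLOPS implied by a system cost and a $/MFLOPS ratio."""
    if usd_per_mflops <= 0:
        raise ValueError("price/performance ratio must be positive")
    return total_cost_usd / usd_per_mflops

COST_USD = 40_000  # assumption: the rounded cost quoted in the text

single = implied_sustained_mflops(COST_USD, 1.86)  # single precision
double = implied_sustained_mflops(COST_USD, 2.75)  # double precision

print(f"single precision: ~{single / 1000:.1f} GFLOPS")  # ~21.5 GFLOPS
print(f"double precision: ~{double / 1000:.1f} GFLOPS")  # ~14.5 GFLOPS
```

Because the cost is rounded, these values should be read as order-of-magnitude checks rather than measured rates.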

KLAT2 (Kentucky Linux Athlon Testbed 2) is a PC cluster based on 64+2 Athlon processors running at 700 MHz, connected by 264 NICs (Network Interface Cards) and nine 32-way switches. KLAT2 is the first cluster incorporating the Flat Neighborhood Network (FNN) topology, in which each PC has multiple NICs (4 each on KLAT2) connected to the switches in such a manner that each PC has at least one communication path to every other PC with only one switch latency. This approach minimizes communication delays and increases the available bandwidth, allowing the use of much cheaper 100 Mb/s rather than 1 Gb/s technology. A Genetic Algorithm (GA) is employed to design this complex network efficiently based on the anticipated demands of the computer codes to be run on the system. Independent of the CFD research, KLAT2 has achieved over 64 GFLOPS on 32-bit ScaLAPACK runs, a speed better than that of many computers on the most recent TOP500 supercomputer listing.
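The defining FNN property, that every pair of PCs shares at least one switch, is easy to state as a check on a wiring table. The sketch below is only an illustration under stated assumptions: the toy wiring, the function names, and the dictionary representation are hypothetical and are not the KAOS design tool. A checker like this could serve as the feasibility test inside a search such as the GA mentioned above.

```python
from itertools import combinations

def is_flat_neighborhood(pc_switches):
    """True if every pair of PCs shares at least one switch (one-switch path)."""
    return all(pc_switches[a] & pc_switches[b]
               for a, b in combinations(pc_switches, 2))

def fits_hardware(pc_switches, nics_per_pc, ports_per_switch):
    """True if no PC needs more NICs, and no switch more ports, than available."""
    ports_used = {}
    for switches in pc_switches.values():
        if len(switches) > nics_per_pc:
            return False
        for s in switches:
            ports_used[s] = ports_used.get(s, 0) + 1
    return all(n <= ports_per_switch for n in ports_used.values())

# Toy wiring (hypothetical): 6 PCs, 2 NICs each, three 4-port switches A, B, C.
toy = {
    0: {"A", "B"}, 1: {"A", "B"},
    2: {"A", "C"}, 3: {"A", "C"},
    4: {"B", "C"}, 5: {"B", "C"},
}
print(is_flat_neighborhood(toy), fits_hardware(toy, 2, 4))  # True True

# Wiring PC 5 to switch C alone leaves it with no switch in common with PC 0.
broken = dict(toy)
broken[5] = {"C"}
print(is_flat_neighborhood(broken))  # False

# KLAT2 totals from the text: (64 + 2) PCs x 4 NICs, nine 32-port switches.
nics_needed = (64 + 2) * 4
ports_available = 9 * 32
print(nics_needed, ports_available)  # 264 288
```

In the toy case no single switch has six ports, yet every pair of PCs is still exactly one switch apart, which is the point of the topology.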

DNSTool is a parallel CFD code designed to solve complex fluid flows based on Direct Numerical Simulation (DNS) of the turbulent flow properties. It is a precursor of LESTool, a primary code at UK CFD. To optimize the performance of DNSTool on the cluster, several modifications were made to the code beyond a thorough "cleaning" of unnecessary or cumbersome computations. Two specific modifications were the restructuring of the data storage and the incorporation of SIMD Within A Register (SWAR) 3DNow! routines. The original data structure was based on large spatial arrays for each variable, which was not well suited to the cache-based architecture of PC processors. The restructured data, in which related variables at a given spatial location are grouped together in storage, significantly improved the code performance. SWAR takes advantage of fast processor instructions that operate on several values packed into a single register, so that multiple calculations are effectively made simultaneously. SWAR instructions are included as inline assembly macros, replacing traditional C or FORTRAN coding. The section of the code that is rewritten is quite small: by isolating the most time-consuming calculations, the benefits of SWAR can be maximized.
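The data restructuring described above amounts to a change in index arithmetic. The sketch below is an illustration only, not DNSTool's actual code: the function names and the choice of five variables per grid point are assumptions. Python does not expose cache behavior, so the example simply shows how far apart one grid point's variables sit in memory under each layout.

```python
from array import array

def separate_index(var, point, npoints):
    """Original layout: one large array per variable, stored back to back."""
    return var * npoints + point

def grouped_index(var, point, nvars):
    """Restructured layout: all variables of one grid point stored together."""
    return point * nvars + var

def regroup(separate, npoints, nvars):
    """Copy data from the per-variable layout into the per-point layout."""
    grouped = array("d", [0.0]) * (npoints * nvars)
    for p in range(npoints):
        for v in range(nvars):
            grouped[grouped_index(v, p, nvars)] = separate[separate_index(v, p, npoints)]
    return grouped

def span_per_point(layout, point, npoints, nvars):
    """Number of array elements spanned when reading every variable at one point."""
    if layout == "separate":
        idx = [separate_index(v, point, npoints) for v in range(nvars)]
    else:
        idx = [grouped_index(v, point, nvars) for v in range(nvars)]
    return max(idx) - min(idx) + 1

# Hypothetical sizes: 5 variables per point, one million grid points.
NVARS, NPOINTS = 5, 1_000_000
print(span_per_point("separate", 0, NPOINTS, NVARS))  # 4000001
print(span_per_point("grouped", 0, NPOINTS, NVARS))   # 5

# Tiny demonstration: 2 variables at 3 points.
small = array("d", [1.0, 2.0, 3.0, 10.0, 20.0, 30.0])
print(list(regroup(small, npoints=3, nvars=2)))  # [1.0, 10.0, 2.0, 20.0, 3.0, 30.0]
```

With the grouped layout, the values needed to update one grid point fall within a few adjacent elements, which is what makes the layout friendlier to a cache line.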

The simulation performed by DNSTool was the computation of the flow over a turbine blade, requiring a curvilinear grid of about 16 million points. The flow was subsonic throughout (inflow M = 0.1) and included a transitional region over the blade. Technically, the results of this simulation should be considered Quasi-DNS (QDNS), as the number of grid points is probably about a factor of 5 too low to capture all the relevant turbulence scales. Still, the simulation is representative of modern CFD research and an excellent demonstration of the potential of this system.

Current Research

Based on our experience with DNSTool, we have been incorporating other CFD codes into the PeTS concept. These codes include LESTool (work carried out by its primary author, Dr. Thomas Hauser, at Utah State University), OVERSET Tools for CFD Analysis, and two in-house UK CFD codes, GHOST and UNCLE. The single-node optimization work for the latter two codes is discussed on separate project pages. We also continue to investigate new cluster architectures and new approaches to boost code performance, such as virtual parallel file servers.

Relevant Publications and Presentations

Hauser, Th., T.I. Mattox, R.P. LeBeau, H.G. Dietz, and P.G. Huang, "High-Cost CFD on a Low-Cost Cluster," Gordon Bell Price/Performance Finalist (Honorable Mention) and regular paper in SC2000, Dallas, TX, November 4-10, 2000.

LeBeau, R.P., H. Chen, P. Kristipati, S. Gupta, and P.G. Huang, "Joint Performance Evaluation and Optimization of Two CFD Codes on Commodity Clusters," 43rd AIAA Aerospace Sciences Meeting and Exhibit, AIAA-2005-1380, Reno, NV, January 10-13, 2005.

Hauser, Th., T.I. Mattox, R.P. LeBeau, Jr., H.G. Dietz, and P.G. Huang, "CFD code optimizations for complex microprocessors," SIAM Journal on Scientific Computing, 25, 1461-1477, 2004.

Huang, P.G., R.P. LeBeau, H.G. Dietz, T.I. Mattox, and T.E. Dowling, "Applications of Computational Fluid Dynamics (CFD) on Commodity Clusters," 10th Annual Kentucky EPSCoR Conference, Lexington, KY, May 13, 2004.

Chen, H., P.G. Huang, and R.P. LeBeau, "Performance Tests of Parallel 2D/3D Unstructured Incompressible CFD Code in Different Architecture of Linux Clusters," 29th Annual Dayton-Cincinnati Aerospace Science Symposium, Dayton, OH, March 9, 2004.

Hauser, Th., R.P. LeBeau, Jr., T.I. Mattox, P.G. Huang, and H.G. Dietz, "Improving the performance of a CFD program across different Linux Cluster Architectures," 16th AIAA Computational Fluid Dynamics Conference, Orlando, FL, June 23-26, 2003.

Hauser, Th., R.P. LeBeau, Jr., T.I. Mattox, P.G. Huang, and H.G. Dietz, "A Comparative Study of the Performance of a CFD program across different Linux Cluster Architectures," 3rd LCI International Conference on Linux Clusters: The HPC Revolution 2002, St. Petersburg, FL, October 23-25, 2002.

Hauser, Th., T.I. Mattox, R.P. LeBeau, H.G. Dietz, and P.G. Huang, "CFD on Low-Cost, High Performance Cluster," 26th Annual Dayton-Cincinnati Aerospace Science Symposium, March 30, 2001.

©2007 University of Kentucky Cluster Fluid Dynamics Group