Research Papers: Fuel Combustion

Development of a Stiffness-Based Chemistry Load Balancing Scheme, and Optimization of Input/Output and Communication, to Enable Massively Parallel High-Fidelity Internal Combustion Engine Simulations

Author and Article Information
Janardhan Kodavasal

Argonne National Laboratory,
9700 S. Cass Avenue,
Argonne, IL 60439
e-mail: jkodavasal@anl.gov

Kevin Harms

Argonne Leadership Computing Facility,
9700 S. Cass Avenue,
Argonne, IL 60439
e-mail: harms@alcf.anl.gov

Priyesh Srivastava

Convergent Science, Inc.,
6400 Enterprise Lane,
Madison, WI 53719
e-mail: priyesh.srivastava@convergecfd.com

Sibendu Som

Argonne National Laboratory,
9700 S. Cass Avenue,
Argonne, IL 60439
e-mail: ssom@anl.gov

Shaoping Quan

Convergent Science, Inc.,
6400 Enterprise Lane,
Madison, WI 53719
e-mail: shaoping.quan@convergecfd.com

Keith Richards

Convergent Science, Inc.,
6400 Enterprise Lane,
Madison, WI 53719
e-mail: krichards@convergecfd.com

Marta García

Argonne Leadership Computing Facility,
9700 S. Cass Avenue,
Argonne, IL 60439
e-mail: mgarcia@alcf.anl.gov

1Corresponding author.

Contributed by the Internal Combustion Engine Division of ASME for publication in the JOURNAL OF ENERGY RESOURCES TECHNOLOGY. Manuscript received January 12, 2016; final manuscript received January 12, 2016; published online February 23, 2016. Editor: Hameed Metghalchi.

The United States Government retains, and by accepting the article for publication, the publisher acknowledges that the United States Government retains, a non-exclusive, paid-up, irrevocable, worldwide license to publish or reproduce the published form of this work, or allow others to do so, for United States government purposes.

J. Energy Resour. Technol. 138(5), 052203 (Feb 23, 2016) (11 pages). Paper No: JERT-16-1022; doi: 10.1115/1.4032623. History: Received January 12, 2016; Revised January 12, 2016

A closed-cycle gasoline compression ignition (GCI) engine simulation near top dead center (TDC) was used to profile the performance of a parallel commercial engine computational fluid dynamics (CFD) code as it was scaled up to 4096 cores of an IBM Blue Gene/Q (BG/Q) supercomputer. The test case has 9 × 10⁶ cells near TDC, with a fixed mesh size of 0.15 mm, and was run on configurations ranging from 128 to 4096 cores. Profiling was done over a short duration of 0.11 crank angle degrees near TDC, during ignition. Optimizing input/output (I/O) performance gave a significant speedup in reading restart files and a speedup of more than 100 times in writing restart files and files for postprocessing. Improvements to communication gave a 1400-times speedup in the mesh load balancing operation during initialization on 4096 cores. An improved, “stiffness-based” algorithm for load balancing chemical kinetics calculations was developed, which makes the run more than three times faster near ignition on 4096 cores relative to the original load balancing scheme. With this improvement to load balancing, the code achieves over 78% scaling efficiency on 2048 cores and over 65% scaling efficiency on 4096 cores, relative to 256 cores.
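The scaling-efficiency figures in the abstract are stated relative to a 256-core baseline. The abstract does not spell out the formula, so the sketch below assumes the conventional strong-scaling definition (fixed problem size): actual speedup over the baseline divided by the ideal speedup. The function name and the wall-clock times are hypothetical placeholders chosen only to exercise the calculation; they are not results from the paper.

```python
def strong_scaling_efficiency(base_cores, base_time, cores, time):
    """Fraction of the ideal speedup achieved relative to a baseline run.

    Assumed (conventional) definition, not quoted from the paper:
    actual speedup = base_time / time, ideal speedup = cores / base_cores.
    """
    if min(base_cores, cores) <= 0 or min(base_time, time) <= 0:
        raise ValueError("core counts and times must be positive")
    actual_speedup = base_time / time
    ideal_speedup = cores / base_cores
    return actual_speedup / ideal_speedup


# Hypothetical wall-clock times in seconds (NOT taken from the paper).
timings = {256: 1200.0, 512: 640.0, 1024: 400.0}

for n in sorted(timings):
    eff = strong_scaling_efficiency(256, timings[256], n, timings[n])
    print(f"{n:5d} cores: efficiency = {eff:.1%}")
```

Under this definition the baseline run always has an efficiency of 1, and any value below 1 on a larger configuration reflects overheads such as communication, I/O, or load imbalance.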

Copyright © 2016 by ASME

Figures

Fig. 1

CFD domain of 360 deg shown during injection. Note that the profiling and optimization of the code were done near TDC, during ignition; the spray is shown for context, since the simulation itself was run from IVC through injection.

Fig. 2

Darshan output for I/O sizes with the original code, showing on the order of half a billion file write calls in the 4–12 byte range

Fig. 3

Darshan output for I/O sizes after improving the file write subroutines: far fewer file write calls, on the order of a thousand, each writing a larger chunk of data on the order of megabytes

Fig. 4

Scaling performance of binary file write operation

Fig. 5

Illustration of data transferred between ranks. Here, rank 0 sends rank 1 the data for C neighbor cells, each having K properties (pressure, temperature, etc.), represented by P.

Fig. 6

Original communication scheme

Fig. 7

Improved collective communication scheme

Fig. 8

Scaling performance of the communication operation during mesh load balancing. Results shown for the first load balancing cycle during initialization.

Fig. 9

Illustration of chemistry load imbalance during combustion. R1 denotes a region where chemical kinetics is solved by an arbitrary rank 1 and R2 denotes a region where chemical kinetics is solved by another arbitrary rank 2.

Fig. 10

Improvement in chemistry load balance with stiffness-based load balancing schemes. Imbalance shown in terms of the ratio of chemistry time spent in every time-step by the slowest and fastest ranks.

Fig. 11

Computation time for simulation with the original code, and the improved code with stiffness-based chemistry load balancing

Fig. 12

Actual compared to ideal speedup in computation time for the original code, and improved code with stiffness-based chemistry load balancing

Fig. 13

Computational expense in terms of CPU-hours for the original code, and the improved code with stiffness-based chemistry load balancing, over the range of configurations studied
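The captions for Figs. 9 and 10 describe chemistry load imbalance as the ratio of chemistry time spent by the slowest and fastest ranks. The page does not describe the stiffness-based algorithm itself, so the following is only a toy illustration of the general idea of balancing by estimated cost rather than by cell count. It is not the authors' scheme: the greedy assignment, the function names, and the per-cell costs (a stand-in for how stiff, and hence how expensive, each cell's kinetics solve is) are all assumptions made for this sketch.

```python
import heapq


def imbalance_ratio(rank_loads):
    """Ratio of the slowest rank's load to the fastest rank's load."""
    fastest = min(rank_loads)
    return float("inf") if fastest == 0 else max(rank_loads) / fastest


def balance_by_count(costs, n_ranks):
    """Give each rank a contiguous, equal-sized block of cells, ignoring cost."""
    block = -(-len(costs) // n_ranks)  # ceiling division
    loads = [0.0] * n_ranks
    for i, cost in enumerate(costs):
        loads[i // block] += cost
    return loads


def balance_by_cost(costs, n_ranks):
    """Greedy sketch: hand the next most expensive cell to the least-loaded rank."""
    heap = [(0.0, rank) for rank in range(n_ranks)]
    heapq.heapify(heap)
    for cost in sorted(costs, reverse=True):
        load, rank = heapq.heappop(heap)
        heapq.heappush(heap, (load + cost, rank))
    return [load for load, _ in sorted(heap, key=lambda item: item[1])]


# Hypothetical per-cell chemistry costs (NOT from the paper): a cluster of
# 8 igniting cells that are 10x as expensive as the 24 quiescent cells.
costs = [10.0] * 8 + [1.0] * 24

print("count-based imbalance:", imbalance_ratio(balance_by_count(costs, 4)))
print("cost-based imbalance: ", imbalance_ratio(balance_by_cost(costs, 4)))
```

In this toy setup, the count-based split leaves one rank holding the entire igniting region, while the cost-weighted split spreads the expensive cells across ranks. A production scheme would also have to weigh the cost of moving cell data between ranks, which this sketch ignores.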
