Joint Laboratory Publications

Scalable Reed-Solomon-based Reliable Local Storage for HPC Applications on IaaS Clouds

Proceedings of Euro-Par 2012
L. Bautista Gomez, B. Nicolae, N. Maruyama, F. Cappello, S. Matsuoka

Energy Considerations in Checkpointing and Fault Tolerance Protocols

Proceedings of IEEE/IFIP DSN/FTXS 2012
M. el Mehdi Diouri, O. Glück, L. Lefevre, F. Cappello

Towards Efficient Live Migration of I/O Intensive Workloads: A Transparent Storage Transfer Proposal

Proceedings of ACM HPDC 2012
B. Nicolae, F. Cappello

Hybrid static/dynamic scheduling for already optimized dense matrix factorization

Proceedings of IEEE IPDPS 2012
S. Donfack, L. Grigori, B. Gropp, V. Kale

HydEE: Failure Containment without Event Logging for Large Scale Send-Deterministic MPI Applications

Proceedings of IEEE IPDPS 2012
Technical report TR-JLPC-11-05
A. Guermouche, T. Ropars, M. Snir, F. Cappello

Taming of the Shrew: Modeling the Normal and Faulty Behavior of Large-scale HPC Systems

Proceedings of IEEE IPDPS 2012
Technical report TR-JLPC-11-10
A. Gainaru, F. Cappello, B. Kramer

Adaptive Event Prediction Strategy with Dynamic Time Window for Large-Scale HPC Systems

Proceedings of SLAMS 2011 (Managing Large-Scale Systems via the Analysis of System Logs and the Application of Machine Learning Techniques)
A. Gainaru, F. Cappello, J. Fullop, S. Trausan-Matu, B. Kramer

FTI: high performance Fault Tolerance Interface for hybrid systems

Proceedings of IEEE/ACM SC11
Technical report TR-JLPC-11-09
L. Bautista Gomez, D. Komatitsch, N. Maruyama, S. Tsuboi, F. Cappello, S. Matsuoka, T. Nakamura

Modeling and Tolerating Heterogeneous Failures in Large Parallel Systems

Proceedings of IEEE/ACM SC11
Technical report TR-JLPC-11-08
E. M. Heien, D. Kondo, A. Gainaru, D. Lapine, B. Kramer, F. Cappello

Damaris: Leveraging Multicore Parallelism to Mask I/O Jitter

Technical report TR-JLPC-11-07
M. Dorier, G. Antoniu, F. Cappello, M. Snir, L. Orf

BlobCR: Efficient Checkpoint-Restart for HPC Applications on IaaS Clouds using Virtual Disk Image Snapshots

Proceedings of IEEE/ACM SC11
Technical report TR-JLPC-11-06
B. Nicolae, F. Cappello

Checkpointing strategies for parallel jobs

Proceedings of IEEE/ACM SC11
Technical report TR-JLPC-11-04
M. Bougeret, H. Casanova, M. Rabie, Y. Robert, F. Vivien

Comparing archival policies for Blue Waters

Proceedings of HiPC 2011
Technical report TR-JLPC-11-03
F. Cappello, M. Jacquelin, L. Marchal, Y. Robert, M. Snir

Improving Parallel System Performance with a NUMA-aware Load-Balancer

Technical report TR-JLPC-11-02
L. Pilla, C. Pousa, D. Cordeiro, A. Bhatele, P. Navaux, J-F. Méhaut, L. Kale

On the Use of Cluster-Based Partial Message Logging to Improve Fault Tolerance for MPI HPC Applications

Proceedings of Euro-Par 2011
Technical report TR-JLPC-11-01
T. Ropars, A. Guermouche, B. Ucar, E. Meneses, L. V. Kale, F. Cappello

Optimizing multi-deployment on clouds by means of self-adaptive prefetching

Proceedings of Euro-Par 2011
B. Nicolae, F. Cappello, G. Antoniu

Event log mining tool for large scale HPC systems

Proceedings of Euro-Par 2011
A. Gainaru, F. Cappello, B. Kramer

The International Exascale Software Project roadmap

IJHPCA 25(1): 3-60 (2011)
J. Dongarra, F. Cappello, T. H. Dunning, B. Gropp, S. Kale, B. Kramer, M. Snir, et al.

Uncoordinated Checkpointing Without Domino Effect for Send-Deterministic Message Passing Applications

Proceedings of IPDPS 2011
Technical Report of the INRIA-Illinois Joint Laboratory on Petascale Computing (TR-JLPC-10-03)
Amina Guermouche, Thomas Ropars, Elisabeth Brunet, Marc Snir, Franck Cappello

Preventive Migration vs. Preventive Checkpointing for Extreme Scale Supercomputers

Parallel Processing Letters 21(2): 111-132 (2011)
F. Cappello, H. Casanova, Y. Robert

On Communication Determinism in Parallel HPC Applications

Proceedings of IEEE ICCCN 2010
Franck Cappello, Amina Guermouche, Marc Snir

Checkpointing vs. Migration for Post-Petascale Supercomputers

Proceedings of ICPP 2010
Franck Cappello, Henri Casanova, Yves Robert

Distributed Diskless Checkpoint for Large Scale Systems

Proceedings of IEEE CCGRID 2010
Leonardo Arturo Bautista Gomez, Naoya Maruyama, Franck Cappello, Satoshi Matsuoka

Hierarchical Event Log Organizer

Technical Report of the INRIA-Illinois Joint Laboratory on Petascale Computing (TR-JLPC-10-02)
Ana Gainaru, Franck Cappello, Stephan Trausan-Matu, William Kramer

State of the art on event analysis for large scale computers

Technical Report of the INRIA-Illinois Joint Laboratory on Petascale Computing (TR-JLPC-10-01)
Ana Gainaru, Franck Cappello, Stephan Trausan-Matu

Toward Exascale Resilience

IJHPCA 23(4): 374-388 (2009)
Franck Cappello, Al Geist, Bill Gropp, Laxmikant Kale, Bill Kramer, Marc Snir

Fault Tolerance in Petascale/Exascale Systems: Current Knowledge, Challenges and Research Opportunities

IJHPCA 23(3): 212-226 (2009)
Franck Cappello, INRIA and UIUC

The International Exascale Software Project: a Call To Cooperative Action By the Global High-Performance Community

IJHPCA 23(4): 309-322 (2009)
Jack Dongarra, Pete Beckman, Patrick Aerts, Franck Cappello, Thomas Lippert, Satoshi Matsuoka, Paul Messina, Terry Moore, Rick Stevens, Anne E. Trefethen, Mateo Valero

Revisiting Fault Tolerant Protocols for HPC Applications

Technical Report of the INRIA-Illinois Joint Laboratory on Petascale Computing (TR-JLPC-09-02), submitted
Franck Cappello, INRIA, UIUC; Amina Guermouche, Univ. Paris Sud; Thomas Herault, Univ. Paris Sud, INRIA, UTK; Marc Snir, UIUC

Toward Exascale Resilience

Technical Report of the INRIA-Illinois Joint Laboratory on Petascale Computing (TR-JLPC-09-01)
Franck Cappello, INRIA, UIUC; Al Geist, ORNL; Bill Gropp, UIUC; Sanjay Kale, UIUC; Bill Kramer, UIUC; Marc Snir, UIUC

The Joint Laboratory for Petascale Computing includes researchers from the French National Institute for Research in Computer Science and Control (INRIA), the University of Illinois at Urbana-Champaign's Center for Extreme-Scale Computation, and the National Center for Supercomputing Applications. The Joint Lab is part of Parallel@Illinois.