References:
[1] O. Marin, “The darx framework: Adapting fault tolerance for agent
systems,” Ph.D. dissertation, Universit´e de Have, 2003.
[2] B. Hamid, “Distributed fault-tolerance techniques for local
computations,” Ph.D. dissertation, Universit´e Bordeaux I, 2007.
[3] F. Reichenbach, “Service snmp de dtection de faute pour des systmes
rpartis,” Ph.D. dissertation, Ecole polytechnique de Lausane, 2002.
[4] M. Wiesmann, F. Pedone, and A. Schiper, “A systematic classification
of replicated database protocols based on atomic broadcast,” in 3rd
Europeean Research Seminar on Advances in Distributed Systems, 1999.
[5] X. Besseron, “Tol´erance aux fautes et reconfiguration dynamique
pour les applications distribu´ees `a grande ´echelle,” Ph.D. dissertation,
Universit´e de Grenoble, 2010.
[6] N. M. Ndiaye, “Techniques de gestion des d´e faillances dans les grilles
informatiques tol´e rantes aux fautes,” Ph.D. dissertation, Universit´e
Pierre et Marie Curie, 2013.
[7] S. Drapeau, “Un canevas adaptable de services de duplication,” Ph.D.
dissertation, Institut National Polytechnique de Grenoble, 2003.
[8] R. Souli-Jbali, M. S. Hidri, and R. B. Ayed, “Dynamic data
replication-driven model in data grids,” in 39th Annual Computer
Software and Applications Conference, COMPSAC Workshops 2015,
Taichung, Taiwan, July 1-5, 2015, 2015, pp. 393–397.
[9] Chandy and Lamport, “Distributed snapshots : Determining global states
of distributed systems,” ACM Transactions on Computer Systems, vol. 3,
no. 1, pp. 63–75, 1985.
[10] H. S.Paul, A. Gupta, and R. Badrinath, “Hierarchical coordinated
checkpointing protocol,” in International Conference on Parallel and
Distributed Computing Systems, 2002, pp. 240–245.
[11] K. Bhatia, K. Marzullo, and L. Alvisi, “Scalable causal message logging
for wide-area environments,” Concurrency and Computation: Practice
and Experience, vol. 15, no. 3, pp. 243–250, 2003.
[12] S. Monnet, C. Morin, and R. Badrinath, “Hybrid checkpointing for
parallel applications in cluster federations,” in 3rd Workshop on
Resiliency in High Performance Computing (Resilience) in Clusters,
Clouds, and Grids, 2004, pp. 773–782.
[13] E. Meneses, C. L. Mendes, and L. V. Kale, “Team based message
logging : Preliminary results,” in 4th IEEE ACM International
Symposium on Cluster Computing and the Grid, 2010.
[14] J.-M. Yang, K. Li, W.-W. Li, and D.-F. Zhang, “Trading off logging
overhead and coordinating overhead to achieve efficient rollback
recovery,” Concurrency and Computation: Practice and Experience,
vol. 21, no. 3, pp. 819–853, 2009.
[15] A. Guermouche, “Nouveaux protocoles de tolrance aux fautes pour les
applications du calcul haute performance,” Ph.D. dissertation, Universit´e
Paris-Sud, 2011.
[16] D. B. Johnson and W. Zwaenepoel, “Sender based message logging,”
in The Seventeenth Annual International Symposium on Fault-Tolerant
Computing, 1987, pp. 14–19.
[17] A. Varga and R. Hornig, “An overview of the omnet++ simulation
environment,” in Proceedings of the 1st International Conference on
Simulation Tools and Techniques for Communications, Networks and
Systems & Workshops, 2008, pp. 60:1–60:10.