journal article Apr 04, 2017

Failover strategy for fault tolerance in cloud computing environment

Software: Practice and Experience Vol. 47 No. 9 pp. 1243-1274 · Wiley
Abstract
Cloud fault tolerance is an important issue for cloud computing platforms and applications. In the event of an unexpected system failure or malfunction, a robust fault‐tolerant design may allow the cloud to continue functioning correctly, possibly at a reduced level, instead of failing completely. Various fault‐tolerant techniques exist for building self‐autonomous cloud systems that ensure high availability of critical cloud services, application execution, and hardware performance. Compared with current approaches, this paper proposes a more robust and reliable architecture that uses an optimal checkpointing strategy to ensure high system availability and reduce task service finish time. Using pass rates and virtualization mechanisms, the proposed smart failover strategy (SFS) scheme combines a cloud fault manager, a cloud controller, a cloud load balancer, and a selection mechanism, providing fault tolerance through redundancy, optimized selection, and checkpointing. In our approach, the cloud fault manager repairs faults that occur before the task deadline is reached and blocks unrecoverable faulty nodes together with their virtual nodes. The scheme also removes temporary software faults from recoverable faulty nodes, making them available for future requests. We argue that the proposed SFS algorithm makes the system highly fault tolerant by considering both forward and backward recovery using diverse software tools. Compared with existing approaches, preliminary experiments with the SFS algorithm indicate an increase in pass rates and a consequent decrease in failure rates, showing good overall performance in task allocation. We present these results using experimental validation tools and compare them with other techniques, laying a foundation for a fully fault‐tolerant infrastructure as a service (IaaS) cloud environment. Copyright © 2017 John Wiley & Sons, Ltd.
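The abstract combines three mechanisms: pass-rate-based node selection, a fault manager that blocks unrecoverable nodes while keeping recoverable ones available, and checkpoint-based rollback. The sketch below is a minimal, hypothetical illustration of how these pieces could fit together. It is not the authors' SFS algorithm. The `Node` class, the functions `select_node`, `handle_fault`, and `run_task`, the optimistic 1.0 pass rate for untried nodes, and the fixed checkpoint interval are all assumptions made for illustration. Faults are injected from a dictionary rather than detected, and only backward recovery (rollback) is shown.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Node:
    """Hypothetical compute node that tracks its own pass rate."""
    name: str
    passes: int = 0
    attempts: int = 0
    blocked: bool = False

    @property
    def pass_rate(self) -> float:
        # Fraction of attempts that succeeded; an untried node is assumed healthy (1.0).
        return self.passes / self.attempts if self.attempts else 1.0

    def record(self, success: bool) -> None:
        self.attempts += 1
        if success:
            self.passes += 1


def select_node(nodes: List[Node]) -> Optional[Node]:
    """Return the unblocked node with the highest pass rate (first one wins ties)."""
    candidates = [n for n in nodes if not n.blocked]
    return max(candidates, key=lambda n: n.pass_rate, default=None)


def handle_fault(node: Node, recoverable: bool) -> None:
    """Hypothetical fault-manager step: penalize the node, block it if unrecoverable."""
    node.record(False)
    if not recoverable:
        node.blocked = True  # permanently removed from selection
    # A recoverable (transient) fault leaves the node in the pool, ranked lower.


def run_task(
    total_steps: int,
    nodes: List[Node],
    faults: Dict[Tuple[str, int], bool],
    interval: int = 3,
) -> Tuple[int, List[str]]:
    """Run a task step by step with a checkpoint every `interval` steps.

    `faults` maps (node name, step) -> recoverable flag and stands in for real
    failure detection; each fault fires once. On a fault the task rolls back to
    the last checkpoint (backward recovery) and fails over to the next-best node.
    Returns (steps executed including re-execution, names of nodes used).
    """
    pending = dict(faults)  # copy so the caller's dict is not mutated
    checkpoint = step = executed = 0
    node = select_node(nodes)
    if node is None:
        raise RuntimeError("no available node")
    used = [node.name]
    while step < total_steps:
        key = (node.name, step)
        if key in pending:
            handle_fault(node, pending.pop(key))
            step = checkpoint  # roll back to the last saved state
            node = select_node(nodes)
            if node is None:
                raise RuntimeError("no available node")
            used.append(node.name)
            continue
        step += 1
        executed += 1
        if step % interval == 0:
            checkpoint = step  # save progress
    node.record(True)
    return executed, used
```

For example, running a 7-step task on nodes A, B, and C with an unrecoverable fault on A at step 4 and a recoverable fault on B at step 5 executes 10 steps in total and uses A, then B, then C. Each failure costs only the work done since the checkpoint at step 3 rather than a restart from step 0. Node A ends up blocked, while B stays selectable with a lowered pass rate. The paper describes an optimal checkpointing strategy, which this fixed interval does not attempt to reproduce.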
