Scheduling Algorithms for MapReduce Framework

Abstract:
Google proposed MapReduce as a simple and flexible parallel programming model for large-scale distributed data processing. The MapReduce framework allows users to quickly develop big-data applications and process big data effectively. However, unexpected malfunctions can occur in a cloud environment because a distributed system consists of many hardware components, and such malfunctions often delay overall processing. In the MapReduce framework, the underlying runtime system automatically parallelizes the computation across large clusters of machines, handles machine failures, and schedules inter-machine communication to make efficient use of the network and disks. Scheduling is one of the important factors in MapReduce. To achieve good performance, a MapReduce scheduler must avoid unnecessary data transmission. Hence, different scheduling algorithms for MapReduce are necessary to provide good performance, and how to schedule service resources to achieve the lowest cost is becoming increasingly important. In this paper, we give an overview of fifteen different scheduling algorithms for MapReduce in Hadoop and their scheduling issues and problems. Finally, the advantages and disadvantages of these algorithms are identified.
Pages: 70-78

International Journal of Science and Engineering Investigations, Volume 4, Issue 43, August 2015
www.IJSEI.com Paper ID: 44315-11
ISSN: 2251-8843
