Performance Evaluation of Distributed Deep Learning Frameworks in Cloud Environment
2016 became the year of the Artificial Intelligence explosion. AI technologies have matured to the point that most of the world's leading technology companies are investing heavily to expand their AI capabilities. Machine learning is the science of getting computers to act without being explicitly programmed, and deep learning is a subset of machine learning that uses deep neural networks to learn features directly from data. Deep learning makes many machine learning applications practical, which in turn expands the field of AI. At present, deep learning frameworks are widely deployed on servers for deep learning applications in both academia and industry. Although training deep neural networks relies on many standard procedures and algorithms, performance can differ from one framework to another. In this paper, we evaluate the runtime performance of two state-of-the-art distributed deep learning frameworks that run training computations in parallel across multiple GPUs and multiple nodes in our cloud environment. We evaluate the training performance of the frameworks with the ResNet-50 convolutional neural network, and we analyze the factors that account for the performance difference between the two distributed frameworks. Through experimental analysis, we identify overheads that could be further optimized. The main contribution is that the evaluation results point to further optimization directions in both performance tuning and algorithmic design.