Analyzing the Factors that Cause Parallel Performance Degradation in Parallel Graph-Based Computations Using Graph500
 A. Lumsdaine, D. Gregor, B. Hendrickson, and J. W. Berry, “Challenges in Parallel Graph Processing,” Parallel Processing Letters, vol. 17, no. 1, pp. 5–20, 2007.
 D. Bader, J. Berry, S. Kahan, R. Murphy, E. Riedy, and J. Willcock, “Graph 500 Benchmark 1 (Search),” 2010, www.graph500.org.
 M. El-Fituri, “Analyzing and Improving the Performance of Parallel Graph Algorithms, Ph.D. Thesis,” New Mexico State University.
 M. Elfituri, J. Cook, and J. Cook, “Binary instrumentation support for measuring performance in openmp programs,” in Proceedings of the 5th International Workshop on Software Engineering for Computational Science and Engineering, ser. SE-CSE ’13. Piscataway, NJ, USA: IEEE Press, 2013, pp. 19–23. (Online). Available: http://dl.acm.org/citation.cfm?id=2663370.2663374
 J. Vetter and M. McCracken, “Statistical Scalability Analysis of Communication Operations in Distributed Applications,” in Proc. ACM SIGPLAN Symp. on Principles and Practice of Parallel Programming (PPOPP), 2001.
 M. Roth, M. J. Best, C. Mustard, and A. Fedorova, “Deconstructing the overhead in parallel applications,” 2012 IEEE International Symposium on Workload Characterization (IISWC), vol. 0, pp. 59–68, 2012.
 M. Pavlovic, Y. Etsion, and A. Ramirez, “On the memory system requirements of future scientific applications: Four case-studies,” in Proceedings of the 2011 IEEE International Symposium on Workload Characterization, ser. IISWC ’11. Washington, DC, USA: IEEE Computer Society, 2011, pp. 159–170.
 T. Suzumura, K. Ueno, H. Sato, K. Fujisawa, and S. Matsuoka, “Performance characteristics of graph500 on large-scale distributed environment,” in Proceedings of the 2011 IEEE International Symposium on Workload Characterization, ser. IISWC ’11. Washington, DC, USA: IEEE Computer Society, 2011, pp. 149–158.
 M. Kambadur, K. Tang, and M. A. Kim, “Harmony: Collection and analysis of parallel block vectors,” in Proceedings of the 39th Annual International Symposium on Computer Architecture, ser. ISCA ’12. Washington, DC, USA: IEEE Computer Society, 2012, pp. 452–463.
 M. A. Kim and S. A. Edwards, “Computation vs. memory systems: Pinning down accelerator bottlenecks,” in Proceedings of the 2010 International Conference on Computer Architecture, ser. ISCA’10. Berlin, Heidelberg: Springer-Verlag, 2012, pp. 86–98.
 R. Murphy, “On the effects of memory latency and bandwidth on supercomputer application performance,” in Proc. 2007 IEEE 10th Int’l Symp. on Workload Characterization, ser. IISWC ’07. Washington, DC, USA: IEEE Computer Society, 2007, pp. 35–43.
 M. Burtscher, R. Nasre, and K. Pingali, “A quantitative study of irregular programs on gpus,” 2012 IEEE International Symposium on Workload Characterization (IISWC), vol. 0, pp. 141–151, 2012.
 R. Jongerius, P. Stanley-Marbell, and H. Corporaal, “Quantifying the common computational problems in contemporary applications,” in Proc. 2011 IEEE Int’l Symp. on Workload Characterization, ser. IISWC ’11. Washington, DC, USA: IEEE Computer Society, 2011, pp. 74–.
 H. Inoue and T. Nakatani, “Performance of multi-process and multithread processing on multi-core smt processors,” in Proceedings of the IEEE International Symposium on Workload Characterization (IISWC’10), ser. IISWC ’10. Washington, DC, USA: IEEE Computer Society, 2010, pp. 1–10.
 V. Agarwal, F. Petrini, D. Pasetto, and D. A. Bader, “Scalable graph exploration on multicore processors,” in Proc. 2010 ACM/IEEE Int’l Conf. for High Performance Computing, Networking, Storage and Analysis, ser. SC ’10. Washington, DC, USA: IEEE Computer Society, 2010, pp. 1–11.
 E. Krepska, T. Kielmann, W. Fokkink, and H. Bal, “Hipg: Parallel processing of large-scale graphs,” SIGOPS Oper. Syst. Rev., vol. 45, no. 2, pp. 3–13, Jul. 2011.
 G. Malewicz, M. H. Austern, A. J. Bik, J. C. Dehnert, I. Horn, N. Leiser, and G. Czajkowski, “Pregel: A system for large-scale graph processing,” in Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data, ser. SIGMOD ’10. New York, NY, USA: ACM, 2010, pp. 135–146.
 V. Prabhakaran, M. Wu, X. Weng, F. McSherry, L. Zhou, and M. Haridasan, “Managing large graphs on multi-cores with graph awareness,” in Proceedings of the 2012 USENIX Conference on Annual Technical Conference, ser. USENIX ATC’12. Berkeley, CA, USA: USENIX Association, 2012, pp. 4–4.
 S. Beamer, K. Asanovic, and D. Patterson, "Locality exists in graph processing: Workload characterization on an ivy bridge server," in Proc. of IEEE International Symposium on Workload Characterization (IISWC), pp. 56--65, October 2015.
 Gurbinder Gill, Roshan Dathathri, Loc Hoang, Ramesh Peri, and Keshav Pingali, “Single Machine Graph Analytics on Massive Datasets Using Intel Optane DC Persistent Memory”, PVLDB 13, 8 , 2020, 1304–13
 Laxman Dhulipala, Changwan Hong, and Julian Shun, ConnectIt, “A Framework for Static and Incremental Parallel Graph Connectivity Algorithms”, Proceedings of the VLDB Endowment, 2021. To appear.
 G. Gill, R. Dathathri, L. Hoang, and K. Pingali. A Study of Partitioning Policies for Graph Analytics on Large-scale Distributed Platforms. volume 12 of PVLDB, 2018.
 Kartik Lakhotia, Rajgopal Kannan, Sourav Pati, and Viktor Prasanna, “A cache and memory-efficient framework for graph processing over partitions”, In Proceedings of the 24th Symposium on Principles and Practice of Parallel Programming, 2019, GPOP, (PPoPP’19). ACM, New York, NY, 393—394
 D. A. B. et al. Hpcs “scalable synthetic compact applications #2 graph analysis”, version 2.2. http://www.graphanalysis.org/benchmark/HPCSSSCA2