贾禛 (Zhen Jia)

Postdoctoral Research Associate
Department of Computer Science
Princeton University

35 Olden Street
Princeton, NJ 08540
Email: zhenj (at) cs (dot) princeton (dot) edu

Google Scholar | LinkedIn | Facebook

[CV]

I'm a Postdoctoral Research Associate working with Prof. Kai Li and Prof. Sebastian Seung. Before starting my postdoc, I received my Ph.D. degree in 2016 from the Institute of Computing Technology (ICT), Chinese Academy of Sciences (CAS), where I was supervised by Prof. Lixin Zhang and Prof. Jianfeng Zhan. I received my Bachelor's degree from Dalian University of Technology in 2010.

My research interests include convolutional neural networks, benchmarking, workload characterization, and performance optimization for data analytics.

Experience

Princeton University
Postdoctoral Research Associate
Sep 2016--Present

Institute of Computing Technology, Chinese Academy of Sciences
Research Assistant, Ph.D. Candidate
Sep 2010--Jul 2016

IBM China Research Laboratory
Research Intern
Oct 2014--Feb 2016

Dalian University of Technology
Undergraduate Student
Sep 2006--Jul 2010

Projects

Efficient deep learning package for Intel® Xeon Phi™

Intel® Xeon Phi™, based on the Many Integrated Core (MIC) architecture, offers an alternative to GPUs for deep learning: its peak floating-point performance and cost are on par with those of a GPU, and it has several additional advantages, such as ease of programming, binary compatibility with the host processor, and direct access to large host memory. However, fully exploiting its hardware capabilities remains challenging. This project aims to develop an efficient deep learning package for Intel® Xeon Phi™ processors. The proposed optimizations include trading memory space for computation, intelligently choosing between direct and FFT-based convolution for each layer of the network, choosing the right flavor of task parallelism, intelligent tiling to optimize L2 cache performance, and careful data structure layouts to maximize the utilization of the AVX-512 vector units.
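The per-layer choice between direct and FFT-based convolution can be illustrated with a simple cost model. The sketch below is an assumption for illustration, not the package's actual heuristic: it counts only floating-point operations (ignoring memory traffic and cache effects), assumes stride-1 "valid" convolution, and uses the common rough estimate of 2.5·n·log2(n) FLOPs for a real-input FFT of n points. All function names are hypothetical.

```python
import math


def direct_conv_flops(b, c_in, c_out, h, w, k):
    """FLOPs for a stride-1, unpadded ('valid') k x k direct convolution."""
    h_out, w_out = h - k + 1, w - k + 1
    # One multiply and one add per kernel tap per output element.
    return 2 * b * c_in * c_out * h_out * w_out * k * k


def fft_conv_flops(b, c_in, c_out, h, w):
    """Rough FLOPs for FFT-based convolution using h x w transforms (assumed model)."""
    n = h * w
    fft_cost = 2.5 * n * math.log2(n)  # rough cost of one real-input 2D FFT
    # Forward FFTs of inputs and (zero-padded) kernels, inverse FFTs of outputs.
    transforms = (b * c_in + c_in * c_out + b * c_out) * fft_cost
    # A real-input FFT keeps h * (w // 2 + 1) complex coefficients.
    coeffs = h * (w // 2 + 1)
    # A complex multiply-accumulate costs about 8 real FLOPs.
    pointwise = 8 * b * c_in * c_out * coeffs
    return transforms + pointwise


def choose_conv_algorithm(b, c_in, c_out, h, w, k):
    """Pick the cheaper algorithm for one layer under this FLOP-only model."""
    direct = direct_conv_flops(b, c_in, c_out, h, w, k)
    fft = fft_conv_flops(b, c_in, c_out, h, w)
    return "fft" if fft < direct else "direct"
```

Under this model, small kernels (such as 3x3) tend to favor direct convolution, while large kernels favor FFT because the direct cost grows with k² whereas the FFT cost does not depend on k. A real implementation would also account for the extra memory that frequency-domain buffers consume.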

ArchTuner: An auto-tuning tool for IBM POWER8 Systems

ArchTuner is a machine-learning-based architectural tuner for IBM POWER8 systems running big data, HPC, and database workloads. An SMT tuner for Spark-based applications has been completed, and work on a prefetcher tuner is ongoing. For the SMT tuner, we propose a novel user- and application-transparent framework for online SMT configuration adaptation and integrate it into the Spark software stack to implement dynamic SMT.
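Prediction-based dynamic SMT can be sketched as follows: profile each execution phase, predict the best SMT mode from the profile, and then switch modes. In this sketch a 1-nearest-neighbour classifier stands in for ArchTuner's actual model. The feature vectors (for example, counter-derived metrics such as IPC or cache miss rates) and all function names are hypothetical. The only hardware fact used is that POWER8 cores support SMT1, SMT2, SMT4, and SMT8.

```python
import math
from typing import List, Sequence, Tuple

# POWER8 cores can run in SMT1, SMT2, SMT4 or SMT8 mode.
SMT_LEVELS = (1, 2, 4, 8)

# A training sample: (hypothetical normalized feature vector, best SMT level observed).
Sample = Tuple[Sequence[float], int]


def predict_smt_level(profile: Sequence[float], training: List[Sample]) -> int:
    """Predict an SMT level for one phase using a 1-nearest-neighbour rule (stand-in model)."""
    if not training:
        raise ValueError("need at least one training sample")
    _, level = min(training, key=lambda sample: math.dist(profile, sample[0]))
    if level not in SMT_LEVELS:
        raise ValueError(f"unsupported SMT level: {level}")
    return level


def plan_smt(phase_profiles: List[Sequence[float]], training: List[Sample]) -> List[int]:
    """Re-predict the SMT level for every phase, so the configuration adapts online."""
    return [predict_smt_level(p, training) for p in phase_profiles]
```

Predicting per phase, rather than once per job, is what makes the configuration dynamic. Features should be normalized to comparable scales, because the Euclidean distance is sensitive to units.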

BDTune: A multi-level profiler for big data workloads

To identify potential bottlenecks in big data systems, BDTune gathers performance data at multiple levels of the system: the application, the software stack, the OS, and the hardware. It collects this data using tracing, log analysis, and access to hardware performance monitoring units (PMUs).
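A simple way to relate metrics collected at different levels to an observed slowdown is to rank them by how strongly they correlate with a target measurement. The sketch below is a flat Pearson-correlation ranking. It is not BDTune's actual hierarchical analysis or rule-based diagnosis. The metric names, the per-interval sampling, and the function names are all hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def rank_metrics(target: Sequence[float],
                 metrics: Dict[str, Sequence[float]]) -> List[Tuple[str, float]]:
    """Rank metrics (e.g. 'os.ctx_switches', 'hw.llc_misses') by |correlation| with the target.

    Every series is assumed to be sampled over the same time intervals as `target`.
    """
    scored = [(name, pearson(series, target)) for name, series in metrics.items()]
    # sorted() is stable, so ties keep their original insertion order.
    return sorted(scored, key=lambda item: abs(item[1]), reverse=True)
```

A multi-level tool would apply this kind of ranking level by level, drilling down from application metrics to OS and hardware counters. Correlation only flags candidates for diagnosis and does not establish causation.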

BigDataBench: A benchmark suite for big data systems and researchers

A benchmark suite that not only covers broad application scenarios but also includes diverse and representative data sets. It also covers the most widely used big data software stacks and their representative workloads.

BigDataBench Architecture Subset: A subset for computer architects

Intended for the architecture community, this subset currently reduces the 77 workloads of the full BigDataBench 3.0 to 17 representative workloads. Each representative stands for one workload cluster, and the clusters differ in size.
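Subsetting by clustering can be sketched as follows: describe each workload by a vector of characterization metrics, group the vectors into k clusters, and keep the workload closest to each cluster centroid. This sketch assumes plain k-means with deterministic farthest-first initialization. It skips the metric normalization and dimensionality reduction that a real study would apply, and all function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def _nearest(point: Sequence[float], centroids: List[List[float]]) -> int:
    """Index of the centroid closest to `point` (ties go to the lowest index)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def kmeans(points: List[Sequence[float]], k: int,
           iters: int = 100) -> Tuple[List[List[float]], List[int]]:
    """Basic k-means with farthest-first initialization (deterministic)."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [_nearest(p, centroids) for p in points]
        updated = []
        for i in range(k):
            members = [p for p, label in zip(points, labels) if label == i]
            # Keep the old centroid if a cluster becomes empty.
            updated.append([sum(col) / len(members) for col in zip(*members)]
                           if members else centroids[i])
        if updated == centroids:
            break
        centroids = updated
    return centroids, labels


def select_representatives(workloads: Dict[str, Sequence[float]], k: int) -> List[str]:
    """Return one workload name per cluster: the one nearest that cluster's centroid."""
    names = list(workloads)
    centroids, labels = kmeans([workloads[n] for n in names], k)
    reps = []
    for i, centroid in enumerate(centroids):
        members = [n for n, label in zip(names, labels) if label == i]
        if members:
            reps.append(min(members, key=lambda n: math.dist(workloads[n], centroid)))
    return reps
```

Choosing the member nearest the centroid keeps each representative a real workload rather than a synthetic average. In this scheme, the size of a cluster indicates how many workloads its representative stands for.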

DCBench: A benchmark suite for data center workloads

The first release of DCBench provides 19 representative workloads from data center systems. The suite includes diverse kinds of workloads (online and offline), built with different programming models (MPI, MapReduce, etc.) and written in different programming languages. DCBench has since been merged into BigDataBench.

Search: A search prototype

Because we lacked permission to probe real-world web search engines, we set up a search server in our lab, using Nutch as the search engine and the SoGou web corpus as the index and snapshot data. However, we did obtain permission to use three real workload traces: one from SoGou and two from two of the largest search service providers in China. We have released the system, simply named Search, as a benchmark for datacenter computing.

Publications

Towards Optimal Winograd Convolution on Manycores.
Zhen Jia, Aleksandar Zlateski, Fredo Durand, Kai Li.
SysML 2018.

A Deeper Look at FFT and Winograd Convolutions.
Aleksandar Zlateski, Zhen Jia, Kai Li, Fredo Durand.
SysML 2018.

Optimizing N-Dimensional, Winograd-Based Convolution for Manycore CPUs.
Zhen Jia, Aleksandar Zlateski, Fredo Durand, Kai Li.
ACM Principles and Practice of Parallel Programming (PPoPP), to appear in February 2018.

Understanding Processors Design Decisions for Data Analytics in Homogeneous Data Centers.
Zhen Jia, Wanling Gao, Yingjie Shi, Sally A McKee, Jianfeng Zhan, Lei Wang, Lixin Zhang.
IEEE Transactions on Big Data.

BigDataBench-S: An Open-Source Scientific Big Data Benchmark Suite.
Xinhui Tian, Shaopeng Dai, Zhihui Du, Wanling Gao, Rui Ren, Yaodong Cheng, Zhifei Zhang, Zhen Jia, Peijian Wang, Jianfeng Zhan.
IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), 2017.

Understanding Big Data Analytics Workloads on Modern Processors.
Zhen Jia, Jianfeng Zhan, Lei Wang, Chunjie Luo, Wanling Gao, Yi Jin, Rui Han, and Lixin Zhang.
IEEE Transactions on Parallel and Distributed Systems (TPDS), 28(6): 1797-1810.

CloudMix: Generating Diverse and Reducible Workloads for Cloud Systems.
Rui Han, Zan Zong, Fan Zhang, Jose Luis Vazquez-Poletti, Zhen Jia, Lei Wang.
IEEE 10th International Conference on Cloud Computing (CLOUD 2017).

BDTune: Hierarchical correlation-based performance analysis and rule-based diagnosis for big data systems.
Rui Ren, Zhen Jia, Lei Wang, Jianfeng Zhan, Tianxu Yi.
IEEE International Conference on Big Data (IEEE Big Data 2016), December 2016, pp. 555-562.

Understanding Data Analytics Workloads on Intel Xeon Phi.
Biwei Xie, Xu Liu, Sally A McKee, Jianfeng Zhan, Zhen Jia, Lei Wang, Lixin Zhang.
IEEE 18th International Conference on High Performance Computing and Communications (HPCC 2016), December 2016, pp. 206-215.

Characterizing OS Behaviors of Datacenter and Big Data Workloads.
Chen Zheng, Jianfeng Zhan, Zhen Jia, Lixin Zhang.
IEEE 18th International Conference on High Performance Computing and Communications (HPCC 2016), December 2016, pp. 1079-1086.

Auto-tuning Spark Big Data Workloads on POWER8: Prediction-Based Dynamic SMT Threading.
Zhen Jia, Chao Xue, Guancheng Chen, Jianfeng Zhan, Lixin Zhang, Yonghua Lin, Peter Hofstee.
International Conference on Parallel Architectures and Compilation Techniques (PACT 2016), September 2016, pp. 387-400.

Characterization and Architectural Implications of Big Data Workloads.
Lei Wang, Rui Ren, Jianfeng Zhan, Zhen Jia.
IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS 2016), April 2016, pp. 145-146.

Characterizing Data Analytics Workloads on Intel Xeon Phi.
Biwei Xie, Xu Liu, Jianfeng Zhan, Zhen Jia, Yuqing Zhu, Lei Wang, Lixin Zhang.
IEEE International Symposium on Workload Characterization (IISWC 2015), October 2015, pp. 114-115.

Characterizing and Subsetting Big Data Workloads.
Zhen Jia, Jianfeng Zhan, Lei Wang, Rui Han, Sally A. McKee, Qiang Yang, Chunjie Luo, and Jingwei Li.
IEEE International Symposium on Workload Characterization (IISWC 2014), October 2014, pp. 191-201.

Understanding the Behavior of In-Memory Computing Workloads.
Tao Jiang, Qianlong Zhang, Rui Hou, Lin Chai, Sally A. McKee, Zhen Jia, Ninghui Sun.
IEEE International Symposium on Workload Characterization (IISWC 2014), October 2014, pp. 22-30.

BigDataBench: a Big Data Benchmark Suite from Internet Services.
Lei Wang, Jianfeng Zhan, Chunjie Luo, Yuqing Zhu, Qiang Yang, Yongqiang He, Wanling Gao, Zhen Jia, Yingjie Shi, Shujie Zhang, Cheng Zhen, Gang Lu, Kent Zhan, Xiaona Li, Bizhu Qiu.
IEEE International Symposium on High Performance Computer Architecture (HPCA 2014), February 2014, pp. 488-499.

The Implications of Diverse Applications and Scalable Data Sets in Benchmarking Big Data Systems.
Zhen Jia, Runlin Zhou, Chunge Zhu, Lei Wang, Wanling Gao, Yingjie Shi, Jianfeng Zhan, Lixin Zhang.
Springer Lecture Notes in Computer Science (LNCS) vol. 8163, 2014, pp. 44-59.

CloudRank-V: A Desktop Cloud Benchmark with Complex Workloads.
Lin Cai, Zhen Jia, Yong Qi.
15th IEEE International Conference on High Performance Computing and Communications (HPCC 2013), November 2013, pp. 415-421.

Characterizing Data Analysis Workloads in Data Centers (Best Paper Award).
Zhen Jia, Lei Wang, Jianfeng Zhan, Lixin Zhang, and Chunjie Luo.
IEEE International Symposium on Workload Characterization (IISWC 2013), September 2013, pp. 66-76.

BigDataBench: a Big Data Benchmark Suite from Web Search Engines.
Wanling Gao, Yuqing Zhu, Zhen Jia, Chunjie Luo, Lei Wang, Zhiguo Li, Jianfeng Zhan, Yong Qi, Yongqiang He, Shiming Gong, Xiaona Li, Shujie Zhang, Bizhu Qiu.
Third Workshop on Architectures and Systems for Big Data (ASBD), in conjunction with ISCA, June 2013.

Characterizing OS behavior of Scale-out Data Center Workloads.
Chen Zheng, Jianfeng Zhan, Zhen Jia, Lixin Zhang.
Seventh Annual Workshop on the Interaction amongst Virtualization, Operating Systems and Computer Architecture (WIVOSCA), in conjunction with ISCA, June 2013.

The Implications of Diverse Applications and Scalable Data Sets in Benchmarking Big Data Systems.
Zhen Jia, Lei Wang, Jianfeng Zhan, and Lixin Zhang.
Second Workshop on Big Data Benchmarking (WBDB India), December 2012.

LogMaster: Mining Event Correlations in Logs of Large-scale Cluster Systems.
Xiaoyu Fu, Rui Ren, Jianfeng Zhan, Wei Zhou, Zhen Jia, Gang Lu.
31st IEEE International Symposium on Reliable Distributed Systems (SRDS), October 2012, pp. 71-80.

CloudRank-D: Benchmarking and Ranking Cloud Computing Systems for Data Processing Applications.
Chunjie Luo, Jianfeng Zhan, Zhen Jia, Lixin Zhang, Cheng-Zhong Xu, and Ninghui Sun.
Springer Frontiers in Computer Science (FCS), 6(4):347-362, August 2012.

Precise, Scalable, and Online Request Tracing of Multi-tier Services of Black Boxes.
Bo Sang, Jianfeng Zhan, Gang Lu, Haining Wang, Dongyan Xu, Lei Wang, Zhen Jia.
IEEE Transactions on Parallel and Distributed Systems (TPDS), 23(6): 1159-1167, June 2012.

High Volume Throughput Computing: Identifying and Characterizing Throughput Oriented Workloads in Data Centers.
Jianfeng Zhan, Lixin Zhang, Ninghui Sun, Lei Wang, Zhen Jia, Chunjie Luo.
Workshop on Large-Scale Parallel Processing (LSPP) in conjunction with IPDPS, May 2012.

Characterization of Real Workloads of Web Search Engines.
Huafeng Xi, Jianfeng Zhan, Zhen Jia, Xuehai Hong, Lei Wang, Lixin Zhang, Ninghui Sun, Gang Lu.
IEEE International Symposium on Workload Characterization (IISWC 2011), November 2011, pp. 15-25.

Invited Talks & Tutorials

Invited Talks

Characterizing High Volume Computing Workloads.
Huawei. Hangzhou, China, June 2013.

DCBench: a Data Center Benchmark Suite.
Second Workshop on Big Data Benchmarks, Performance Optimization, and Emerging Hardware, in conjunction with CCF HPC China. Guilin, China, October 2013.

Tutorials

High Volume Computing: The Motivations, Metrics, and Benchmarks Suites for Data Center Computer Systems.
In conjunction with the 19th IEEE International Symposium on High Performance Computer Architecture (HPCA 2013).

Tutorial: BigDataBench
In conjunction with the 47th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2014).

Honors and Awards

IISWC 2013 Best Paper Award

The National Scholarship for Ph.D.

Director’s Scholarship of Institute of Computing Technology, Chinese Academy of Sciences

IBM Ph.D. Fellowship

Professional Activities

PC Member, BPOE-9, in conjunction with ASPLOS 2018
PC Member, BPOE-8, in conjunction with ASPLOS 2017
Publicity Chair, BPOE-6, in conjunction with VLDB 2015
Submission Chair, IISWC 2014
Publicity Chair, BPOE-5, in conjunction with VLDB 2014
Publicity Chair, BPOE-4, in conjunction with ASPLOS 2014
Subreviewer, NPC 2014
Publicity Chair, BPOE-1, in conjunction with IEEE Big Data 2013
Reviewer, BPOE-1, in conjunction with IEEE Big Data 2013
Reviewer, Journal of Parallel and Distributed Computing (JPDC)
Reviewer, Transactions on Architecture and Code Optimization (TACO)
Reviewer, ACM Transactions on Storage (TOS)
