Kan Bowen
Email: kanbowen17@163.com
Professional Skills
Main Research Areas:
- High-Performance Computing (HPC) processor architecture
- Parallel computing
- NVIDIA GPGPU architecture
- Domestic processors SW26010 and MT3000 architecture
Education
Jilin University
Master’s Degree in Computer System Architecture
School of Computer Science and Technology
September 2017 - July 2020
Changchun, Jilin Province
Research Experience:
- Researched traditional parallel models, including the PRAM, LogP, BSP, and Multi-BSP models.
- Investigated NVIDIA GPGPU HPC architecture models, focusing on memory hierarchy.
- Led the “Shenwei” chip group at the 2018 “Star of Scientific Research” summer camp: planned the overall research on the SW26010 domestic processor, built qualitative and quantitative models of it, and achieved a 13.9% performance improvement in software simulations.
- Researched memory and bandwidth-based optimizations for GPUs.
Jilin University
Bachelor’s Degree in Computer Science and Technology
School of Computer Science and Technology
September 2010 - July 2014
Changchun, Jilin Province
Internship Experience
National Supercomputing Center in Tianjin
System Department Intern
October 2019 - January 2020
Binhai New Area, Tianjin
Deployed and tested benchmarks including STREAM, memtester, OSU, IOzone, mdtest, iperf, IOR, SPEC CPU2006, and HPCG on the TH-1A, HPC, and TH-3E systems.
Work Experience
China National Petroleum Corporation Jilin Oilfield Branch
Supervision Engineer
July 2014 - July 2015
Songyuan, Jilin Province
China National Petroleum Corporation Jilin Oilfield Branch
Secretary in the General Office
July 2015 - July 2017
Songyuan, Jilin Province
National Supercomputing Center in Tianjin
HPC System Engineer
July 2020 - June 2021
Binhai New Area, Tianjin
- Energy Efficiency in HPC:
- Investigated energy-saving plugins based on SLURM, CPU-based energy-saving modes, and application optimization to reduce execution time.
- SLURM Resource Management:
- Managed plugins and modules such as Power Management, Power Saving, GRES:GPU, Burst Buffer, Failure Management, PAM, and MCS.
- Configured various parameters in files such as slurm.conf, node.conf, and partition.conf.
- Cluster Environment Setup:
- Assisted in deploying an HPC cluster at Tianjin Medical University and testing its network, CPU, and I/O performance.
- Assisted in setting up a high-performance cluster environment for the “Artificial Intelligence Project” at Zhejiang Lab.
National Supercomputing Center in Tianjin
Parallel Computing Engineer
June 2021 - Present
Tianjin
- Developed 10 Level 1 BLAS functions in float and double precision, along with 4 single-precision complex and 2 double-precision complex functions.
- Studied the GROMACS core code, then wrote and optimized heterogeneous code for the TH3 processor.
- Developed a set of inline assembly functions based on SVR registers, enabling vector-scalar conversions and vector assembly.
- Optimized the GROMACS source code to simulate a 300-million-atom system.
- Reduced preprocessing time in GROMACS by 50% through source code optimization.
- Developed a high-throughput computing (HTC) system for Vina with minimal impact on SLURM and Lustre, supporting task submission, monitoring, inspection, analysis, and resubmission.
Project Experience
Research on Parameter Analysis and Optimization Methods for Earth System Models and System Development
(2017YFA0604503)
Researcher
July 2017 - July 2022
Wuxi
- Tsinghua University: Earth System Model Parameter Analysis, Optimization, and Integrated Coupling System.
- Jilin University: Automated Optimization Deployment Platform for Earth System Model Public Software.
- Institute of Atmospheric Physics, Chinese Academy of Sciences: Earth System Model Experiment Scenario System.
- National Meteorological Center: Earth System Model Diagnostic and Evaluation System.
- Jiangnan Institute of Computing Technology, Wuxi: Environmental Support and System Maintenance.
Personal role: responsible for modeling and optimizing the SW26010 processor.
Phase I Completion of Project 1903
Parallel Computing Engineer
June 2021 - September 2021
Tianjin
- Ported GROMACS to the new-generation domestic processor MT3000 and optimized its memory access.
- Built the basic environment for TH-3F system applications.
- Ported HPL to MT3000, with responsibility for the host-side code.
Tianhe-3 EF-level HPC System Optimization and Upgrade
(2021YFB0300105)
Parallel Computing Engineer
December 2021 - Present
Tianjin
- Raised the GROMACS molecular simulation limit to 410 million atoms, enabling parallel runs on 65,536 cores.
- Optimized the preprocessing stage of GROMACS, reducing its time by 50%; the work resulted in a patent (1/4).
- Developed a high-performance, low-overhead, high-throughput scheduling system that sustains 100,000 concurrent Vina tasks and can schedule 320,000 CPU cores for parallel computing. It completed 750 million Vina molecular docking tasks in 15 days with minimal impact on SLURM and Lustre; the work resulted in a patent (1/8).
Honors and Awards
- Patent: “Job Scheduling Method, Device, Electronic Equipment, and Storage Medium” (1/8) Patent No.: ZL 2022 1 1100132.2 (2022.12.20)
- Patent: “Preprocessing Acceleration Method, Device, Equipment, and Storage Medium” (1/4) Patent No.: ZL 2022 1 0650452.9 (2022.09.16)
- Patent: “Numerical Processing System for Magnetic Confinement Fusion Based on Supercomputers” (8/8) Patent No.: ZL 2022 1 0388166.X (2022.07.08)
- Software Copyright: “Sunway 26010 Chip Performance Simulation Software” (1/10) Registration No.: 2019SR0646021 (2019.06.24)
- Software Copyright: “GPU Performance Simulation Software Based on Fermi Architecture” (2/10) Registration No.: 2019SR0547211 (2019.05.30)
- Software Copyright: “Workflow Visualization Software Based on Dynamic Programming” (3/10) Registration No.: 2019SR0525649 (2019.05.27)
- Outstanding Student Award, School of Computer Science and Technology, Jilin University (2014.06.15)
- Second Prize Scholarship, School of Computer Science and Technology, Jilin University (2014.06.15)