Machine Learning CPU Compiler Engineer
Apple · Full-time
Jul 2023 - Present
• 1 yr 1 mo
Cerebras Systems
Sep 2019 - Jul 2023
Senior Member of Technical Staff
Nov 2020 - Jul 2023
• 2 yrs 9 mos
I lead a team of ~10 engineers to develop our automatic kernel generator and integrate it into the Cerebras Graph Compiler (CGC). Automatically generated kernels are critical to the flexibility and robustness of our accelerator, as they provide efficient on-demand kernels when hand-written implementations do not exist.
I remain primarily hands-on and spend a large portion of my time implementing features and collaborating on technical problems within my team and across teams.
Beyond the code generator, I have worked on and led several efforts to ensure the end-to-end robustness, reliability, and generalizability of the CGC, so that a user's high-level model is efficiently compiled to binaries that execute the model with high performance on the wafer-scale engine.
Member of Technical Staff
Sep 2019 - Nov 2020
• 1 yr 3 mos
Software engineer designing and implementing an automatic code generator for the massively parallel Cerebras Wafer Scale Engine. Using polyhedral compilation techniques, the code generator takes high-level graph operations and compiles them to efficient, low-level, architecture-specific code for our chips.
PhD Research Intern
Google, LASER team
Jun 2017 - Sep 2017
• 4 mos
The primary goal of the internship was to research new approaches to computing low-rank matrix completions, such as the Hadamard Multifactorization, with a focus on applications such as recommendation systems (e.g., movie/music recommendations) and natural language processing.
I implemented high-performance solvers for computing approximate low-rank matrix factorizations using weighted alternating least squares. The solvers were written in Python using NumPy and SciPy, and the team continued to use them for ongoing experiments after my internship ended.
As part of the research, I demonstrated cases where Hadamard Multifactorization outperforms traditional low-rank matrix completion for computing word embeddings, particularly when embeddings for several languages are computed simultaneously.
Science Researcher
The University of British Columbia
Jun 2016 - Sep 2016
• 4 mos
- Derived and developed fast iterative methods for (possibly non-symmetric) saddle-point linear systems; such linear systems are ubiquitous within engineering applications.
- Work was performed in collaboration with Prof. Chen Greif.
Software Development Engineering Intern
Microsoft
Jun 2015 - Sep 2015
• 4 mos
- Worked on the Elastic Scale team, implementing a feature for distributed database transactions in the cloud based on research conducted at Microsoft Research.
- Created the design document, implemented the design within the SQL Server Engine code, and wrote a test suite for the feature.
- More details to follow when the feature is in public preview.