Will Chen

Software Engineer

Building production-grade ETL pipelines and distributed systems. Passionate about data infrastructure, cloud-native tooling, and reliable engineering at scale.

Python • Go • SQL • AWS • Docker • Kubernetes

01 /About Me

Education

Master of Science in Analytics (Data Science)

Georgia Institute of Technology

Current Student • January 2026 - Present

Bachelor of Arts in Computer Science and Psychology

Rutgers University

GPA 3.44 • Graduated May 2022

Technical Skills

Languages & Databases

Python
Go
SQL
JavaScript
TypeScript
PostgreSQL
Redis
Amazon Redshift

Cloud & Infrastructure

AWS
Glue
Lambda
Step Functions
S3
Redshift
Docker
Kubernetes
GitHub Actions
Datadog
Bash

Data Platform & Infrastructure

CI/CD pipelines
distributed systems
REST APIs
observability and alerting
pipeline orchestration
Terraform
Infrastructure as Code
AI-assisted development
Claude
GitHub Copilot
technical documentation

Experience

Data Platform Engineer (Federal Contract)

Viatrie

April 2024 - March 2026
  • Pipeline Platform: Built the reusable pipeline platform for a USDA federal contract, shipping 50+ end-to-end ETL pipelines from S3 to Amazon Redshift in Python and AWS Glue; owned 150+ pipelines across the team and cut new feed onboarding time by 50% through reusable templates other engineers built on top of
  • Runtime Wins: Killed recurring Glue job timeout failures by re-architecting multi-step workflows with AWS Lambda and Step Functions, partitioning loads, and tuning DPU allocation, cutting end-to-end pipeline runtime by 60%
  • Delivery Infrastructure: Built the deployment system engineers used daily, combining GitHub Actions CI/CD, Docker container builds, and Terraform-managed AWS resources across 200+ tables in 3 environments; took deploys from ~1 hour of manual work to ~10 minutes and eliminated unplanned schema rollbacks
  • Catching Bad Data Early: Wrote data quality checks into AWS Glue jobs that fail fast on whitespace, type mismatches, and malformed records; automated schema drift detection caught 3 upstream schema changes over 6 months before any bad data reached USDA analyst dashboards
  • Platform Observability: Instrumented 150+ pipelines in Datadog with SLA tracking, volume anomaly detection, and on-call alerting, giving the team a single pane of glass into pipeline health instead of grepping through Glue logs
  • Unblocking Analysts: Shipped self-serve Tableau dashboards and wrote runbooks serving 100+ users at USDA, powering congressional reports and internal budget reviews, and saving the team ~2 hours/week of ad-hoc report pulls

Research Software Engineer

Rutgers Chlamydia Lab

August 2022 - Present
  • Protein Data Analysis: Lifted unknown protein classification accuracy from ~70% to 89% by training scikit-learn ensemble models on mass spectrometry data, serving 10-30 researchers, scientists, and doctors across 10+ collaborating labs in bacterial sample analysis
  • Automation & Reliability: Cut 2+ hours of daily manual work by scripting instrument data processing in Python; kept the 5TB research database intact with automated weekly backups and upload validation that catches corrupted writes before downstream analysis

Open Source Contributor

Isaac Lab (NVIDIA)

January 2024 - Present

  • Open Source Contribution: Contribute to Isaac Lab, NVIDIA's open-source GPU-accelerated framework for reinforcement learning and imitation learning across multiple robot embodiments, building modular Python simulation environments for manipulator and AMR tasks in PhysX

02 /Featured Projects

Distributed Task Queue

  • Job Engine: Built a distributed task queue in Go using Redis lists with BLPOP for blocking dequeue, processing 5K jobs across 4 workers in ~10 seconds; wrote the worker and scheduler from scratch instead of pulling in Celery to learn the internals of Go concurrency and Redis
  • Failure Handling: Handled worker crashes with a 5-second heartbeat written to Redis; jobs from workers that go silent for 30 seconds get requeued by a sweeper, giving at-least-once delivery without a separate broker, plus PostgreSQL-backed status history and exponential-backoff retries
  • Ship Path: Containerized with Docker and deployed to Kubernetes with separate deployments for API, workers, and databases; horizontal pod autoscaling on queue depth and automated build/test/rollout through GitHub Actions
Go • Redis • PostgreSQL • Docker • Kubernetes • GitHub Actions
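
A minimal sketch of the heartbeat sweeper and retry backoff described in the Failure Handling bullet, not the project's actual code. To stay self-contained it keeps state in Go maps and slices instead of Redis; the function names (`staleWorkers`, `requeue`, `backoff`) and the choice to store heartbeats as Unix seconds are illustrative assumptions.

```go
package main

import (
	"sort"
	"time"
)

// staleWorkers returns the IDs of workers whose last heartbeat is at least
// timeoutSec old. lastBeat maps worker ID -> Unix timestamp (seconds) of the
// most recent heartbeat; in the real system this would be read from Redis.
func staleWorkers(lastBeat map[string]int64, now, timeoutSec int64) []string {
	var stale []string
	for id, beat := range lastBeat {
		if now-beat >= timeoutSec {
			stale = append(stale, id)
		}
	}
	sort.Strings(stale) // map iteration order is random; sort for stable output
	return stale
}

// requeue appends every in-flight job owned by a stale worker to the pending
// queue and clears that worker's in-flight set. A slow (not dead) worker may
// still finish its job, so a job can run more than once: at-least-once.
func requeue(pending []string, inFlight map[string][]string, stale []string) []string {
	for _, id := range stale {
		pending = append(pending, inFlight[id]...)
		delete(inFlight, id)
	}
	return pending
}

// backoff returns base * 2^attempt, capped at max.
func backoff(attempt int, base, max time.Duration) time.Duration {
	d := base
	for i := 0; i < attempt; i++ {
		d *= 2
		if d >= max {
			return max
		}
	}
	if d > max {
		return max
	}
	return d
}
```

With a 30-second timeout, a sweeper loop would call `staleWorkers` on each tick and pass the result to `requeue`. Because requeued jobs can execute twice, job handlers in a design like this need to be idempotent.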

Portfolio Website

Modern, minimal portfolio with dark mode, Notion-powered content management, and a 95+ Lighthouse score. Built with TDD practices.

React • TypeScript • Tailwind CSS • Vite

03 /Let's Connect

Open to new data engineering and software engineering roles. Let's chat.
