Experience

Data Engineering, Analytics Engineering, and Applied AI—delivering end-to-end systems with measurable business impact.

Featured

$10M annual infra savings (cloud modernization)
150M+ records/day processed (streaming)
1,500+ tables migrated with 99%+ accuracy
16 hrs/day manual ops eliminated (Airflow)

Data Analyst Research Assistant — Applied AI / MLOps

University of Southern California | Los Angeles, CA

May 2025 – Present

Orchestrated ML training workflows across Vertex AI and Snowflake; introduced Git-based CI/CD automation to standardize runs and reduce training time by 2 hours per cycle.
Operationalized reproducible experimentation by versioning data extracts, feature definitions, and training artifacts to improve traceability and reviewability in research iterations.
Built feature engineering pipelines (Python/SQL) and collaborated with stakeholders to translate research requirements into production-ready ML features.

Developed exploratory analysis in Tableau to surface signal quality issues, guide feature prioritization, and accelerate model iteration.
Optimized BigQuery workloads through partitioning and clustering strategies, driving a 38% reduction in query costs while preserving analytical fidelity.

Prototyped retrieval-augmented analysis patterns (embeddings + semantic retrieval) to enable natural-language exploration of datasets, documentation, and research outputs.
Established lightweight evaluation checks (e.g., relevance and consistency) to keep model-assisted outputs auditable and aligned with source data.

Faster ML cycles: 2 hours saved per training cycle via CI/CD and standardized runs
Lower analytics spend: 38% BigQuery cost reduction through partitioning and clustering

Vertex AI, Snowflake, BigQuery, Python, SQL, MLOps, RAG, Embeddings, LangChain, Tableau, CI/CD

SAG-AFTRA Health Plan | Burbank, CA

May 2024 – Aug 2024

Managed AWS infrastructure for big data analytics; optimized EC2 sizing and scheduling, reducing operating costs by $23k annually.
Implemented monitoring and alerting standards for pipeline health (latency, failures, SLA breaches), improving operational reliability and shortening time-to-detect.

Automated ingestion from ThoughtSpot APIs into Snowflake via REST integrations, reducing data latency by 42%.
Automated ThoughtSpot worksheet metadata synchronization (TML processing) using Python and business glossary inputs from Snowflake, improving governance and self-serve discoverability.

AWS EC2, Snowflake, Python, REST APIs, ThoughtSpot, Observability

Quantiphi, Inc. | Bengaluru, India

Feb 2021 – Jul 2023

Led migration of a legacy Hadoop estate to Google Cloud Platform; modernized 1,500+ Hadoop/MapReduce jobs to Dataproc.
Optimized Apache Spark parameters (executors, shuffle management, SSD balancing), delivering $10M annual infrastructure savings.

Directed an 8-engineer program to migrate 1,500+ tables to BigQuery with zero downtime.
Implemented SQL validation checks and Data Validation Tool (DVT), achieving 99%+ accuracy and audit-ready reporting.

Productionized 50+ ETL/ELT pipelines using Apache Airflow (GCS → BigQuery), eliminating 16 hours/day of manual operations.
Authored 100+ LookML models in Looker to operationalize a governed metrics layer; dashboards enabled $1.2M annual savings.
Presented KPI insights and adoption plan to client VPs, enabling cross-functional rollout of governed metrics and self-serve analytics.

Built a Kafka + PySpark streaming pipeline processing 150M+ raw records/day from 30+ sources for real-time analytics.

GCP, Dataproc, BigQuery, Airflow, Kafka, PySpark, Looker

Download the resume or connect on LinkedIn.
