About Me

  • Innovative and results-driven Data Engineering leader with 10+ years of deep technical expertise, including 5+ years leading, designing, and implementing complex distributed data engineering projects in startup environments.
  • Managed a team of 10+ data and analytics engineers as Technical Lead, consistently delivering high-performance, scalable data solutions.
  • Proven success in migrating on-premises data services to cloud environments utilizing managed services, implementing comprehensive data governance strategies, and optimizing data processing workflows for performance and efficiency.
  • Collaborated extensively with Product and Engineering teams to build integrated data solutions for larger organizational groups.
  • Designed robust data quality frameworks to ensure data integrity and reliability.
  • Proficient in a wide range of technologies, including Python, SQL, Spark, Airflow, Scala, Kafka, and cloud data platforms, with a strong commitment to mentoring and guiding junior engineers and facilitating effective cross-functional collaboration.

Skill-Set

Programming Languages

Python, Scala, Java, SQL, Terraform

Big Data Tools

Apache Spark, PySpark, Hive, Flink

Streaming & ETL

Kafka, dbt

AWS Services

S3, EMR, Glue, DynamoDB, Kinesis, MWAA, RDS, Lambda, Secrets Manager

Azure Services

HDInsight, ADLS, ACR, Blob Storage, Key Vault

GCP Services

Dataproc, GCS, Cloud SQL, Secret Manager

Data Warehouses

Snowflake, BigQuery, Databricks

Orchestration

Airflow, Control-M

Containerization

Kubernetes, Docker

Key Achievements

On-Prem to GCP Migration

Migrated existing Spark pipelines to GCP Dataproc, orchestrated by Airflow, and moved data from Hive warehouses to GCS and BigQuery.

Augmented Lakehouse Engine

Designed and built a configuration-driven ETL framework using Spark, incorporating ingestion, processing, and data quality features.

Snowflake Cost Optimization

Reduced annual Snowflake costs from $600K to $450K (a 25% reduction) through performance optimization and resource management.

GCP Cost Optimization

Cut monthly GCP analytics infrastructure costs from $57K to $2.3K, a reduction of roughly 96%.

AWS re:Invent Presentation

Contributed to the demonstration of a commercial analytics solution at the AWS re:Invent conference.

Industry Recognition

Accenture Innovation Award, IBM Best Graduate Hire Award, and IBM Star Award for excellence.