Home | Arindam Bhattacharjee

About Me

Innovative and results-driven Data Engineering leader with 10+ years of deep technical expertise, including 5+ years of experience leading, designing, and implementing complex distributed data engineering projects within startup environments.
Managed a team of 10+ data and analytics engineers, acting as a Technical Lead, and consistently delivered high-performance, scalable data solutions.
Proven success in migrating on-premises data services to cloud environments utilizing managed services, implementing comprehensive data governance strategies, and optimizing data processing workflows for performance and efficiency.
Collaborated extensively with Product and Engineering teams to build integrated data solutions for larger organizational groups.
Designed robust data quality frameworks to ensure data integrity and reliability.
Proficient in a wide range of technologies, including Python, SQL, Spark, Airflow, Scala, Kafka, and cloud data platforms, with a strong commitment to mentoring and guiding junior engineers and facilitating effective cross-functional collaboration.

Python Scala Java SQL Terraform

Apache Spark PySpark Hive Flink

Kafka dbt

S3 EMR Glue DynamoDB Kinesis MWAA RDS Lambda Secrets Manager

HDInsight ADLS ACR Blob Storage Key Vault

Dataproc GCS Cloud SQL Secret Manager

Snowflake BigQuery Databricks

Airflow Control-M

Kubernetes Docker

Migrated existing Spark pipelines to GCP Dataproc, orchestrated by Airflow, and moved data from Hive warehouses to GCS and BigQuery.

Designed and built a configuration-driven ETL framework using Spark, incorporating ingestion, processing, and data quality features.

Reduced annual Snowflake costs from $600K to $450K (a 25% reduction) through performance optimization and resource management.

Cut monthly GCP analytics infrastructure costs from $57K to $2.3K, a reduction of roughly 96%.

Contributed to the demonstration of a commercial analytics solution at the AWS re:Invent conference.

Accenture Innovation Award, IBM Best Graduate Hire Award, and IBM Star Award for excellence