Azure Data Engineer (Lead) with skills Data Engineering, Python, Apache Hadoop, Apache Hive, Apache Airflow, Synapse, Databricks, SQL, Apache Spark, Azure Data Factory, PySpark, GenAI Fundamentals, Cloud Pub/Sub, BigQuery for location Gurugram, India
ROLES & RESPONSIBILITIES

Key Responsibilities

  • Lead design and execution of the Dataproc → Databricks PySpark migration roadmap.

  • Define modernization strategy, including data ingestion, transformation, orchestration, and governance.

  • Architect scalable Delta Lake and Unity Catalog–based solutions.

  • Manage and guide teams on code conversion, dependency mapping, and data validation.

  • Collaborate with platform, infra, and DevOps teams to optimize compute costs and performance.

  • Own the automation & GenAI acceleration layer, integrating code parsers, lineage tools, and validation utilities.

  • Conduct performance benchmarking, cost optimization, and platform tuning (Photon, autoscaling, Delta caching).

  • Mentor senior and mid-level developers, ensuring quality standards, documentation, and delivery timelines.

Technical Skills

  • Languages: Python, PySpark, SQL

  • Platforms: Databricks (Jobs, Workflows, Delta Live Tables, Unity Catalog), GCP Dataproc

  • Data Tools: Hadoop, Hive, Pig, Spark (RDD & DataFrame APIs), Delta Lake

  • Cloud & Integration: GCS, BigQuery, Pub/Sub, Cloud Composer, Airflow

  • Automation: GenAI-powered migration tools, custom Python utilities for code conversion

  • Version Control & DevOps: Git, Terraform, Jenkins, CI/CD pipelines

  • Other: Performance tuning, cost optimization, and lineage tracking with Unity Catalog

Preferred Experience

  • 10–14 years of data engineering experience with at least 3 years leading Databricks or Spark modernization programs.

  • Proven success in migration or replatforming projects from Hadoop or Dataproc to Databricks.

  • Exposure to AI/GenAI in code transformation or data engineering automation.

  • Strong stakeholder management and technical leadership skills.

EXPERIENCE
  • 11-12 Years
SKILLS
  • Primary Skill: Data Engineering
  • Sub Skill(s): Data Engineering
  • Additional Skill(s): Python, Apache Hadoop, Apache Hive, Apache Airflow, Synapse, Databricks, SQL, Apache Spark, Azure Data Factory, PySpark, GenAI Fundamentals, Cloud Pub/Sub, BigQuery
ABOUT THE COMPANY

Infogain is a human-centered digital platform and software engineering company based out of Silicon Valley. We engineer business outcomes for Fortune 500 companies and digital natives in the technology, healthcare, insurance, travel, telecom, and retail & CPG industries using technologies such as cloud, microservices, automation, IoT, and artificial intelligence. We accelerate experience-led transformation in the delivery of digital platforms. Infogain is also a Microsoft (NASDAQ: MSFT) Gold Partner and Azure Expert Managed Services Provider (MSP).

Infogain, an Apax Funds portfolio company, has offices in California, Washington, Texas, the UK, the UAE, and Singapore, with delivery centers in Seattle, Houston, Austin, Kraków, Noida, Gurgaon, Mumbai, Pune, and Bengaluru.
