GCP Data Engineer (Lead) with skills Data Engineering, Big Data, GCP-Apps, PySpark, BigQuery for location Any Infogain Base Location (Noida, Gurugram, Bangalore, Mumbai, Pune)
ROLES & RESPONSIBILITIES

Core Skills

Required Skills & Experience

  • 9–12 years of experience in data engineering or analytics, with 3+ years hands-on on GCP.

  • Strong experience with PySpark, Dataproc, GCS, BigQuery, and JDBC ingestion.

  • Proven experience migrating SAS workloads to PySpark or SQL-based systems.

  • Hands-on knowledge of the Medallion architecture (Bronze/Silver/Gold) on GCP.

  • Understanding of Dataplex, IAM, policy tags, and secure data handling.

  • Experience with CI/CD (Cloud Build/GitHub Actions) and workflow orchestration (Cloud Composer/Airflow).

  • Strong problem-solving and debugging skills, with the ability to guide teams through technical challenges.


Preferred Skills

  • Experience with Vertex AI, MLOps, and ML pipeline deployment.

  • Knowledge of Delta/Iceberg/Hudi table formats on GCS.

  • Exposure to real-time ingestion (Pub/Sub, Dataflow).

  • Google Cloud Professional Data Engineer or Cloud Architect certification.


Soft Skills

  • Strong leadership and mentoring capabilities.

  • Excellent communication skills to support developers, architects, and business teams.

  • Ability to manage multiple priorities, resolve conflicts, and maintain steady progress under pressure.



Key Responsibilities

1. Hands-on Technical Leadership

  • Work closely with development teams daily to guide solution design, troubleshoot issues, and resolve technical blockers.

  • Enforce engineering best practices, coding standards, and architectural guidelines across all data pipelines and workloads.

  • Perform design and code reviews, ensuring quality, scalability, and reliability of the platform.

2. Data Engineering on GCP

  • Lead development of ingestion pipelines using direct JDBC connectivity from Oracle and Teradata into the Raw/Bronze layer on GCS.

  • Develop and optimize PySpark workloads on Dataproc for data cleansing, transformation, and harmonization into the Curated/Silver layer.

  • Contribute to design of the Gold layer in BigQuery, including table structures, partitioning, clustering, and performance optimization.

3. Migration from SAS to GCP

  • Translate existing SAS logic into PySpark, ensuring functional parity, improved performance, and operational efficiency.

  • Provide guidance on PySpark coding patterns, UDFs, optimization strategies, shuffle/skew handling, and best practices for Dataproc jobs.

4. BigQuery Engineering & Optimization

  • Build and optimize SQL models, materialized views, and analytical datasets in BigQuery.

  • Apply query optimization techniques, cost controls, and data modeling best practices (star/snowflake).

  • Implement RLS/CLS for secure reporting and work with BI teams to integrate BigQuery into reporting tools.
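
To make the partitioning/clustering and RLS points concrete, here is a sketch of Gold-layer DDL held as Python strings (as they might be submitted via the BigQuery client or `bq query`). The project, dataset, table, grantee, and filter predicate are all illustrative assumptions.

```python
# Sketch: Gold-layer table with partitioning + clustering, and a
# row access policy for RLS. All identifiers below are assumed.
GOLD_SALES_DDL = """
CREATE TABLE IF NOT EXISTS `my_project.gold.sales_fact` (
  sale_date   DATE,
  region      STRING,
  customer_id STRING,
  revenue     NUMERIC
)
PARTITION BY sale_date            -- prunes scanned bytes (and cost) by date
CLUSTER BY region, customer_id    -- co-locates rows for common filter columns
"""

RLS_POLICY_DDL = """
CREATE ROW ACCESS POLICY emea_only
ON `my_project.gold.sales_fact`
GRANT TO ('group:emea-analysts@example.com')
FILTER USING (region = 'EMEA')
"""
```

Partitioning plus clustering is the usual first lever for BigQuery cost control; the row access policy keeps the security rule in the warehouse rather than in each reporting tool.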

5. Vertex AI & ML Support

  • Assist data scientists in building ML pipelines using Vertex AI (training, prediction, feature engineering).

  • Guide integration of feature pipelines from the Silver layer into the Vertex AI Feature Store.

  • Ensure reproducibility, lineage, and model monitoring (drift, bias).

6. Data Governance & Security (Dataplex + IAM)

  • Implement and enforce governance standards using Dataplex, including cataloging, policy tags, and data domains.

  • Ensure datasets follow proper IAM roles, tagging, and compliance (PII/PCI/PHI masking where needed).

  • Support lineage metadata, DQ implementation, and documentation.

7. Operations, Monitoring & Cost Optimization

  • Optimize Dataproc clusters (autoscaling, preemptible/Spot VMs), GCS storage lifecycle policies, and BigQuery costs.

  • Establish monitoring dashboards, logs, alerts, and operational KPIs.

  • Troubleshoot and resolve production issues, ensuring high availability and reliability.
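
One concrete cost lever mentioned above is a GCS lifecycle policy. The sketch below downgrades the storage class of aging Bronze objects and deletes Spark scratch output; the prefixes, ages, and target class are illustrative assumptions, and the policy would be applied with `gsutil lifecycle set` or the google-cloud-storage client.

```python
# Sketch: GCS bucket lifecycle policy as the JSON structure the
# lifecycle API expects. Prefixes and thresholds are assumed.
LIFECYCLE_POLICY = {
    "rule": [
        {   # Bronze landing files: rarely re-read after 30 days
            "action": {"type": "SetStorageClass", "storageClass": "NEARLINE"},
            "condition": {"age": 30, "matchesPrefix": ["bronze/"]},
        },
        {   # Spark scratch/temp output: safe to delete after a week
            "action": {"type": "Delete"},
            "condition": {"age": 7, "matchesPrefix": ["tmp/"]},
        },
    ]
}
```

Pairing a policy like this with BigQuery partition expiration and Dataproc autoscaling covers the three main cost surfaces of the platform.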

EXPERIENCE
  • 11-12 Years
SKILLS
  • Primary Skill: Data Engineering
  • Sub Skill(s): Data Engineering
  • Additional Skill(s): Big Data, GCP-Apps, PySpark, BigQuery
ABOUT THE COMPANY

Infogain is a human-centered digital platform and software engineering company based out of Silicon Valley. We engineer business outcomes for Fortune 500 companies and digital natives in the technology, healthcare, insurance, travel, telecom, and retail & CPG industries using technologies such as cloud, microservices, automation, IoT, and artificial intelligence. We accelerate experience-led transformation in the delivery of digital platforms. Infogain is also a Microsoft (NASDAQ: MSFT) Gold Partner and Azure Expert Managed Services Provider (MSP).

Infogain, an Apax Funds portfolio company, has offices in California, Washington, Texas, the UK, the UAE, and Singapore, with delivery centers in Seattle, Houston, Austin, Kraków, Noida, Gurgaon, Mumbai, Pune, and Bengaluru.
