GCP Data Architect (Principal) with skills in Data Engineering, Big Data, GCP-Apps, PySpark, and BigQuery; location: Any Infogain Base Location (Noida, Gurugram, Bangalore, Mumbai, Pune)
ROLES & RESPONSIBILITIES
Qualification
· 18+ years in data/analytics engineering with 10+ years architecting solutions on public cloud; 5+ years hands-on with GCP.
· Proven delivery of Medallion (Bronze/Silver/Gold) architectures on GCP with GCS + Dataproc + BigQuery at enterprise scale.
· Expert in PySpark and Dataproc (job orchestration, autoscaling, cluster policies, tuning, troubleshooting).
· Strong BigQuery expertise: storage/compute separation, partitioning, clustering, materialized views, BI Engine, slot management, RLS/CLS.
· Hands-on experience with Vertex AI (Pipelines, Feature Store, training/serving, model registry, monitoring) and MLOps best practices.
· Implemented Dataplex for centralized governance (catalog, policy tags) and IAM for least-privilege access, plus data security/compliance controls.
· Practical integration with Oracle and Teradata via JDBC; familiarity with CDC patterns and schema evolution.
· CI/CD for data platforms (Cloud Build/GitHub Actions), orchestration (Cloud Composer/Airflow), and Infrastructure as Code (Terraform).
· Deep understanding of data modeling, data quality, lineage, and observability for data systems.
· Excellent communication, stakeholder management, and leadership across technical and business teams.
· Google certifications: Professional Cloud Architect and/or Professional Data Engineer (highly preferred).
· Experience modernizing SAS workloads and translating SAS macros/PROCs to PySpark/SQL on GCP.
· Knowledge of streaming (Pub/Sub, Dataflow/Flink/Spark Structured Streaming) for near-real-time requirements.
· Experience with VPC Service Controls, Private Service Connect, Organization Policy, Workload Identity Federation.
· Familiarity with Delta/Iceberg/Hudi tables and open table formats on GCS; data sharing patterns (Analytics Hub).
· Bachelor’s/Master’s in Computer Science, Engineering, Information Systems, or equivalent experience.
Job Description:
· Design and own the end-to-end analytics architecture on GCP, ensuring alignment with business, security, cost, and performance goals.
· Implement a Medallion architecture:
o Bronze (Raw): land source data on GCS, ingested via JDBC from Oracle/Teradata.
o Silver (Curated): transformations using PySpark on Dataproc.
o Gold (Consumption): optimized models in BigQuery for analytics and BI.
· Define canonical data models, storage formats (Parquet/ORC/Delta/Iceberg), and partitioning/clustering strategies.
· Lead migration from SAS to PySpark, establish coding standards, and optimize Spark jobs.
· Build JDBC ingestion pipelines with CDC, robust retries, schema evolution, and orchestrate workflows via Cloud Composer/Airflow and CI/CD.
· Architect BigQuery models, manage cost/performance, enforce SLAs, and integrate securely with BI tools using RLS/CLS.
· Define MLOps workflows on Vertex AI, including feature pipelines, automated training, deployment, and model monitoring for drift/bias.
· Implement centralized governance via Dataplex (catalog, policy tags), IAM least privilege, VPC-SC, and data security/compliance controls.
· Drive cost optimization, reliability/SRE practices, monitoring, DR/BCP, and FinOps governance.
· Provide architectural leadership, mentor teams, set standards, and create roadmaps, ADRs, and executive-level communication.
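To illustrate the BigQuery modeling, partitioning/clustering, and row-level security responsibilities described above, a minimal sketch in BigQuery DDL (the dataset, table, column, and group names are hypothetical):

```sql
-- Hypothetical Gold-layer table; names are illustrative only.
-- Partitioning by date and clustering by frequently filtered columns
-- reduces scanned bytes and slot usage for typical BI queries.
CREATE TABLE IF NOT EXISTS analytics_gold.sales_daily (
  sale_date  DATE,
  region     STRING,
  product_id STRING,
  revenue    NUMERIC
)
PARTITION BY sale_date
CLUSTER BY region, product_id;

-- Example row-level security (RLS) policy restricting one region's rows
-- to a single analyst group; the group address is an assumption.
CREATE ROW ACCESS POLICY emea_only
ON analytics_gold.sales_daily
GRANT TO ("group:emea-analysts@example.com")
FILTER USING (region = "EMEA");
```

Column-level security (CLS) would additionally be enforced by attaching Dataplex/Data Catalog policy tags to sensitive columns rather than through DDL alone.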
EXPERIENCE
- 18-21 Years
SKILLS
- Primary Skill: Data Engineering
- Sub Skill(s): Data Engineering
- Additional Skill(s): Big Data, GCP-Apps, PySpark, BigQuery