Job Description

ROLE SUMMARY

We are looking for a Senior Data Platform / Migration Engineer to lead the modernization of an enterprise data ecosystem, including migration from Cloudera DataIQ DSS to MapR. This role requires deep expertise in large-scale distributed data systems, migration strategy, and performance optimization, with a strong focus on zero data loss, minimal downtime, and production stability.

KEY RESPONSIBILITIES

• Lead end-to-end migration of enterprise data lake from Cloudera (DataIQ, DSS, CDP) to MapR
• Define and execute migration strategy ensuring data integrity, minimal downtime, and rollback readiness
• Design and build scalable, production-grade data pipelines post-migration
• Optimize cluster performance including compute, storage, and resource utilization
• Partner with BI/reporting teams to ensure schema consistency and data availability
• Implement data validation frameworks to ensure accuracy and completeness post-migration
• Document architecture, runbooks, lineage, and operational procedures
• Collaborate with governance teams on data quality, lineage, and compliance requirements

REQUIRED SKILLS AND EXPERIENCE

• 8+ years in Data Engineering / Data Platform Engineering
• Strong hands-on experience with Cloudera (CDP, DSS, DataIQ) and/or MapR
• Strong hands-on experience with Apache Spark, Hive, Hadoop, HDFS
• Proven experience executing large-scale data lake migrations
• Strong programming skills in Python, Scala, or SQL
• Deep understanding of distributed data processing and storage systems
• Experience with ETL/ELT frameworks (Informatica, Talend, dbt, or similar)

PREFERRED QUALIFICATIONS

• Prior MapR implementation or certification
• Experience with streaming platforms (Kafka, Pulsar)
• Exposure to cloud-native data platforms (AWS S3, Azure Data Lake, Google Cloud Platform)
• Familiarity with data governance, lineage, and catalog tools
• Experience working in high-scale enterprise environments (multi-terabyte/petabyte)

CORE TECHNOLOGY STACK

Cloudera DSS / DataIQ / CDP, MapR, Apache Spark, Hive, Hadoop, HDFS, Kafka, Python, SQL, dbt, Informatica / Talend