Be wary of WhatsApp messages impersonating Jobline Resources' staff offering job opportunities. Those who encounter suspicious messages can contact Jobline at +65 6339 7198.

Responsibilities

• Build and optimize ETL/ELT processes leveraging Databricks' native capabilities to handle large volumes of structured and unstructured data from various sources
• Implement data quality frameworks and monitoring solutions using Databricks data quality features to ensure data accuracy and reliability across all data products
• Establish best practices for data governance, security, and compliance within the Databricks ecosystem and integrate with enterprise systems
• Monitor and maintain production data pipelines to ensure 99.9% uptime and optimal performance across all Databricks workloads and clusters
• Implement comprehensive logging, alerting, and monitoring systems using Databricks monitoring capabilities and integration with enterprise monitoring tools
• Perform regular health checks on Databricks cluster performance, job execution times, and resource utilization to identify and resolve bottlenecks proactively
• Manage incident response procedures for Databricks pipeline failures, including root cause analysis, resolution, and post-incident reviews
• Establish and maintain disaster recovery procedures and backup strategies for critical data assets within the Databricks environment
• Conduct regular performance tuning of Spark jobs and Databricks cluster configurations to optimize cost and execution efficiency
• Implement automated testing frameworks for Databricks-based data pipelines, including unit tests, integration tests, and data validation checks
• Maintain comprehensive documentation for all Databricks operational procedures, runbooks, and troubleshooting guides
• Coordinate scheduled maintenance windows and Databricks system upgrades with minimal business impact
• Manage user access controls, workspace configurations, and security policies within Databricks environments
• Monitor data lineage using Databricks Unity Catalog and maintain metadata management systems to support operational transparency and compliance requirements
• Establish capacity planning processes to forecast Databricks infrastructure needs and manage cloud costs effectively
• Provide technical guidance and mentorship to junior team members on Databricks best practices and data engineering principles
• Participate in on-call rotation for critical production systems with a focus on Databricks platform stability
• Lead operational reviews and contribute to continuous improvement initiatives for Databricks platform reliability and efficiency
• Coordinate with infrastructure teams on Databricks cluster provisioning, network configurations, and security implementations

Requirements

• Degree in Computer Science or Computer Engineering
• Minimum 8-10 years of working experience in the areas of system operations, compliance, and management
• Hands-on project experience with the Databricks platform (primary requirement)
• Project experience in cloud operations or cloud architecture
• Must be cloud certified (AWS)
• Databricks certification (Associate or Professional level) is highly preferred
• Exposure to hospital information/clinical systems is an added advantage
• Understanding of DevOps practices and CI/CD pipelines for Databricks-based data engineering projects
• Knowledge of ITIL frameworks and operational best practices
• Expert-level proficiency in Databricks platform, including workspace management, cluster configuration, and job orchestration
• Strong expertise in Apache Spark within the Databricks environment, including Spark SQL, DataFrames, and RDDs
• Extensive experience with Delta Lake, including data versioning, time travel, and ACID transactions
• Proficiency in Databricks Unity Catalog for data governance and metadata management
• In-depth understanding of data warehouse concepts, data profiling, data verification, and advanced analytics techniques
• Strong knowledge of monitoring, incident management, and cloud cost control
• Databricks (primary and most critical skill)
• AWS cloud services and architecture
• IDMC (Informatica Intelligent Data Management Cloud)
• Tableau for data visualization
• Oracle Database management
• MLOps practices within the Databricks environment (Good to have)
• Stata for statistical analysis (Good to have)
• Amazon SageMaker integration with Databricks (Good to have)
• DataRobot platform integration (Good to have)
• Good interpersonal skills with the ability to work with different groups of stakeholders
• Strong problem-solving skills and ability to work independently in a fast-paced environment with minimal supervision
• Excellent communication skills for technical documentation and cross-team collaboration


Shortlisted candidates will be offered a 1-year agency contract.