Be wary of WhatsApp messages from people impersonating Jobline Resources staff and offering job opportunities. If you receive a suspicious message, contact Jobline at +65 6339 7198.
Responsibilities
Perform availability monitoring, outage detection, and performance optimization of the Azure AI cloud platform.
Support incident response and root cause analysis, and implement disaster recovery strategies to ensure business continuity.
Support security audits and compliance reporting, and ensure alignment with Singtel policies, regulatory frameworks, and industry best practices.
Collaborate with other development teams to integrate monitoring, automation, and security best practices into AI/ML workflows.
Drive continuous improvement in platform operations through automation, observability, and operational excellence initiatives.
Requirements
Bachelor’s degree in Computer Science, Engineering, or a related field.
1-2 years of experience in cloud administration and/or operations.
Expertise in Azure operations and monitoring services, including Azure Monitor, Log Analytics, and Application Insights.
Proficiency in infrastructure as code (Terraform, Bicep, ARM) and automation scripting (PowerShell, Python).
Familiarity with AI/ML infrastructure (AKS, GPU VMs, data pipelines, model hosting) and its operational demands.
Excellent problem-solving, communication, and leadership skills, especially in high-pressure incident scenarios.
Forward-thinking approach to identifying possible failure scenarios and formulating effective response plans.