Be wary of WhatsApp messages impersonating Jobline Resources staff and offering job opportunities. Anyone who receives a suspicious message can contact Jobline at +65 6339 7198.

Responsibilities

  • Perform availability monitoring, outage detection, and performance optimization of the Azure AI cloud platform
  • Support incident response and root cause analysis, and implement disaster recovery strategies to ensure business continuity
  • Support security audits and compliance reporting, and ensure alignment with Singtel policies, regulatory frameworks, and industry best practices
  • Collaborate with other development teams to integrate monitoring, automation, and security best practices into AI/ML workflows
  • Drive continuous improvement in platform operations through automation, observability, and operational excellence initiatives

Requirements

  • Bachelor’s degree in Computer Science, Engineering, or a related field
  • 1-2 years of experience in cloud administration and/or operations
  • Expertise in Azure operations and monitoring services, including Azure Monitor, Log Analytics, and Application Insights
  • Proficiency in infrastructure-as-code (Terraform, Bicep, ARM) and automation scripting (PowerShell, Python)
  • Familiarity with AI/ML infrastructure (AKS, GPU VMs, data pipelines, model hosting) and their operational demands
  • Excellent problem-solving, communication, and leadership skills, especially in high-pressure incident scenarios
  • Forward-thinking ability to anticipate possible failure scenarios and formulate effective response plans