Be wary of WhatsApp messages impersonating Jobline Resources staff and offering job opportunities. Anyone who receives a suspicious message can contact Jobline at +65 6339 7198.

Responsibilities

  • Build and maintain runbooks for telemetry onboarding, parsers, and dashboards; contribute improvements via code reviews and documentation.
  • Run short enablement sessions so product squads can self-serve standardized dashboards and apply tagging/SLO standards.
  • Implement and operate log/metric/trace pipelines (agents, processors, parsing, routing, archive) targeting p95 ingest latency ≤ 60s and drop rate ≤ 0.1%.
  • Execute phased Splunk → Datadog migrations with query/dashboard/monitor parity and validation checks.
  • Apply and enforce tag standards (service, env, tier, team, owner_email, cost_center) via IaC/CI.
  • Improve multi-cloud/on-prem discovery to >98% asset coverage; reconcile CIs/relationships; track and reduce CMDB data deltas.
  • Align telemetry tags with the service portfolio/catalog; maintain service maps linking infrastructure to business services.
  • Define and monitor CI data-quality KPIs (staleness, duplicates, orphaned CIs) and drive remediation with owning squads.
  • Partner with SRE to define SLIs/SLOs, burn-rate alerts, and “golden” dashboards (≤15-minute freshness) for critical services.
  • Provide post-incident analytics and feed learnings into instrumentation and configuration hygiene.
  • Deliver infrastructure-as-code (Terraform/Ansible) for agents, pipelines, monitors, and dashboards.
  • Build API/ETL integrations from observability/CMDB into BI platforms (e.g., Power BI/Fabric) for executive reporting.
  • Evaluate lightweight streaming/collector options (e.g., OpenTelemetry/Fluent/“Tool X”) to control cost and enable fan-out where justified.

Requirements

  • Bachelor's/Master's in Computer Science/IT or equivalent practical experience; 5-8 years across Observability/Platform/CMDB engineering with production ownership at scale.
  • Hands-on experience with Datadog (Logs, APM/RUM, monitors, facets/measures, APIs).
  • Strong in multi-cloud (AWS/Azure/GCP) discovery/inventory and CI reconciliation patterns (tool-agnostic).
  • Scripting (Python/PowerShell), parsing (JSON/grok/regex), APIs; IaC (Terraform/Ansible).
  • Familiar with SRE practices (SLIs/SLOs, error budgets), containers/Kubernetes, and secure RBAC for high-privilege systems.
  • Demonstrated ability to build and guide high-performing, cross-functional teams through clear direction and structured planning.
  • Strong interpersonal skills to collaborate with a diverse set of stakeholders and drive consensus on complex technical decisions.
  • Organized, detail-oriented approach focused on delivering consistent, measurable results.