Responsibilities
Own and evolve the Datadog-based observability platform (collection, pipelines, analytics, alerting, dashboards, and SLOs) to deliver real-time visibility and faster incident response.
As a secondary capability, apply asset-discovery knowledge to publish high-quality discovery feeds and support the CMDB team with accurate, timely inventory data. This is a Tier-0 role: administrative access is granted to full-time employees (FTEs) only.
1) Datadog platform engineering (primary)
- Operate Datadog orgs/projects, RBAC, log pipelines/indexes/archives, metrics, traces/APM, Synthetics, RUM, and DBM at enterprise scale.
- Drive tagging standards and ownership metadata to enable service-aligned dashboards and alert routing.
- Optimize cost/performance (sampling, routing, tiering/archives, retention, metric cardinality).
2) Monitoring-as-Code (MaC) & CI/CD (primary)
- Define monitors, dashboards, SLOs, Synthetic tests, notebooks, Service Catalog entries, and RBAC as code using Terraform/OpenTofu (Datadog provider) and datadog-ci.
- Build gated pipelines: linting, query/unit tests, cost/volume guardrails, PII/residency checks, drift detection, and promotion (dev → staging → prod) with automated rollback.
- Maintain change evidence (who/what/when), versioning, and approvals; rotate tokens and secrets through a secrets vault.
3) Telemetry ingestion & data quality (primary)
- Engineer unified ingestion through the Datadog Agent, APIs, and gateways; integrate OpenTelemetry where appropriate.
- Enforce schema contracts and mandatory tags (e.g., service, env, tier, owner, cost_center); implement validation, deduplication, lineage, and freshness checks.
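To make the mandatory-tag contract concrete, here is a minimal sketch in Python, assuming tags arrive as Datadog-style "key:value" strings. The helper names (parse_tags, missing_mandatory_tags, conflicting_tags) are hypothetical, and the required keys simply mirror the example tags listed above rather than any official standard.

```python
from typing import Iterable

# Hypothetical contract: required keys copied from the example tag list above.
MANDATORY_TAGS = ("service", "env", "tier", "owner", "cost_center")


def parse_tags(tags: Iterable[str]) -> dict[str, list[str]]:
    """Group "key:value" tags by normalized key (hypothetical helper)."""
    parsed: dict[str, list[str]] = {}
    for tag in tags:
        key, sep, value = tag.partition(":")
        # A tag with no colon or an empty value does not satisfy the contract.
        if not sep or not value.strip():
            continue
        parsed.setdefault(key.strip().lower(), []).append(value.strip())
    return parsed


def missing_mandatory_tags(tags: Iterable[str]) -> list[str]:
    """Return required keys that are absent, in MANDATORY_TAGS order."""
    present = parse_tags(tags)
    return [key for key in MANDATORY_TAGS if key not in present]


def conflicting_tags(tags: Iterable[str]) -> list[str]:
    """Return required keys that carry more than one distinct value."""
    present = parse_tags(tags)
    return [key for key in MANDATORY_TAGS if len(set(present.get(key, []))) > 1]


if __name__ == "__main__":
    tags = ["service:checkout", "env:prod", "env:staging", "owner:payments"]
    print(missing_mandatory_tags(tags))  # ['tier', 'cost_center']
    print(conflicting_tags(tags))        # ['env']
```

A check like this could run as one of the gated pipeline steps, failing a change when a required key is missing or ambiguous; where it runs and how violations are reported are left open here.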
4) Asset discovery support for CMDB (secondary)
- Apply discovery expertise across datacenter/VM estates, containers/K8s, multi-cloud (AWS/Azure/GCP), network devices, endpoints, and key SaaS applications.
- Publish curated discovery feeds (coverage, freshness, deltas) and support reconciliation/exception workflows.