Requirements
- Bachelor’s degree in Computer Science, Information Systems, or a related field.
- 5+ years of experience in observability engineering or SRE roles within large-scale distributed systems.
- Deep, hands-on expertise with Datadog, including APM, Logs, Metrics, RUM, and Synthetics.
- Strong proficiency in:
  - Infrastructure as Code (IaC): Terraform
  - Automation: Python, Bash, or similar scripting languages
  - CI/CD pipelines: Jenkins, GitLab, or GitHub Actions
- Strong understanding of monitoring patterns, tracing, and event correlation for complex systems.
- Familiarity with OpenTelemetry and modern observability frameworks.
- Experience supporting multi-cloud environments (AWS, GCP, Azure).
- Familiarity with container orchestration (Kubernetes, ECS) and service mesh observability.
- Understanding of data visualization and analytics for operational reporting.
- Exposure to AI-driven observability enhancements or integration with LLM-based insights (a plus).
- Certification in Datadog, AWS, or GCP is advantageous.
Shortlisted candidates will be offered a 1-year agency contract.