Site Reliability Engineer

Tasq Staffing Solutions, Inc. · Quezon City, Metro Manila, Philippines

Software Development · Senior Level · Full-time · Posted about 2 hours ago

About The Role

  • At least 5 years of experience in a similar role, with a deep understanding of monitoring and application performance management
  • Bachelor's degree in any specialization
  • Knowledge of SLO platforms (e.g., Nobl9) and experience contributing to standards/governance artifacts.
  • Knowledge of proactive monitoring using Azure monitor services, telemetry, and synthetic transactions.
  • Understanding of network architecture and security: WAN/LAN, TCP/IP, PKI.
  • Familiarity with ITSM processes and tools (e.g., ServiceNow), and compliance processes
  • Vision for and awareness of AIOps
  • Excellent communication skills to drive continuous improvement by reducing alert noise, shortening MTTR, and improving change success by embedding postmortem learnings into patterns, rules, and pipelines.
  • Cloud Observability: Azure Monitor/App Insights/Log Analytics (KQL)
  • Knowledge of Grafana, Prometheus, AppDynamics, ThousandEyes
  • Experience using SLIs/SLOs, postmortems, CMDB data, and other context to reduce noise, drive self-healing, and measurably improve MTTR and KPIs.

Key Responsibilities

  • You will design and define standards, patterns, and automation opportunities that elevate monitoring and reliability across platforms and applications, with a strong focus on Azure Monitor, ServiceNow ITOM Event Management, Grafana, and APM/Synthetics tooling.
  • You’ll partner with product teams to implement SLO/SLI-driven operations, reduce alert noise, accelerate incident response, and embed self-healing where it matters most.
  • Engineer enterprise monitoring and event patterns by authoring and maintaining reference architectures, runbooks, and event management models (alert → event → incident) with actionable alerts and incident routing.
  • Contribute to the Monitoring, Observability, and Event Management strategy and to tooling intake/governance checkpoints, and coach product teams.
  • Drive continuous improvement by reducing alert noise, shortening MTTR, and improving change success by embedding postmortem learnings into patterns, rules, and pipelines.

Technologies and Tools

  • SRE Practices: Observability and Monitoring
  • Cloud Observability: Azure Monitor/App Insights/Log Analytics (KQL)
  • Grafana/Prometheus for metrics visualization where applicable
  • ServiceNow ITOM Event Management
  • Azure Fundamentals, Azure Monitor
  • DevOps and Automation Tools
  • Grafana, Prometheus, AppDynamics, ThousandEyes
  • Application Performance Monitoring and Digital User Experience tools

Additional Details

  • Work set-up: Hybrid 3x / RTO 2x per week
  • Work shift: Night shift
  • Location: Eton Centris or Ayala, Makati

This listing was posted by a verified recruiter at Tasq Staffing Solutions, Inc.