Observability Engineer – Production Support & Monitoring (SRE) (Toronto)

16 Apr | Company | Toronto

Observability Engineer – Production Support & Monitoring (SRE)

Contract: 6 months (high likelihood of extension)

Location: core downtown Toronto

Hybrid – 2 days onsite

Rate: market rate (looking for the best experience/rate ratio)

Main Deliverables

- Ensure reliability, performance, and capacity of enterprise production platforms
- Own and operate observability and monitoring tooling across infrastructure and applications
- Execute automation and operational hygiene to support roadmap-driven growth

Technical Stack

- Monitoring & Observability: ITRS Geneos (primary), Icinga / Insignia, Faddom, Corvil, Dynatrace
- Infrastructure: Linux / Unix, VMware, AWS (CloudWatch)
- Scripting & Automation: Perl, Bash / Shell, Python
- Messaging / Middleware: IBM MQ, Market Data Monitoring
- Databases: SQL-based relational databases (operational support)
- ITSM & Collaboration: ServiceNow, Microsoft Teams
- Legacy / Transition: SCOM (planned decommissioning)

Must‑Haves

- 5+ years of experience in Production Support, SRE, or Operations Engineering
- Solid, hands-on ITRS Geneos experience in enterprise production environments
- Advanced scripting skills in Perl, Bash/Shell, and Python
- Experience supporting large-scale production environments (hundreds to thousands of servers)
- Strong Linux / Unix systems knowledge
- Experience with enterprise monitoring platforms (Geneos, Dynatrace, Corvil, Faddom)
- Experience with incident and event management using ServiceNow
- Operational SQL skills for troubleshooting and validation
- Willingness to participate in a defined on-call rotation

Other Requirements

- Experience monitoring infrastructure and applications (CPU, memory, disk, network, processes)
- Experience with capacity planning, trend analysis, and platform scaling
- Familiarity with monitoring integrations: AWS CloudWatch, VMware, IBM MQ, synthetic monitoring, and market data monitoring
- Experience integrating alerts with ServiceNow, Microsoft Teams, email, and webhook-based notifications
- Exposure to hybrid environments (on‑prem + cloud)

Responsibilities

- Provide L2/L3 production support for business‑critical platforms
- Operate and enhance enterprise monitoring platforms, with Geneos as the core solution
- Perform capacity planning and infrastructure performance analysis
- Develop automation to:
  - Execute hygiene routines (log cleanup, validation, health checks)
  - Reduce alert noise and manual operational effort
  - Support reporting and alert validation
- Configure monitoring for infrastructure, applications, APIs, logs, batch jobs, FIX, file watches, and databases
- Maintain runbooks, SOPs, and monitoring configuration lifecycle
- Participate in incident response, RCA, and post‑incident remediation
- Support monitoring platform rollouts, onboarding, and gateway scaling
- Improve on-call effectiveness through tuning, automation, and proactive monitoring
