Site Reliability Engineer / AIOps Engineer - Toronto

19 Apr | Cloudious | Toronto

Required Skills & Experience

Core Technical Skills

- Hands‑on experience with:
  - Dynatrace (including Davis AI)
  - Splunk (ITSI, Machine Learning Toolkit preferred)
  - Moogsoft AIOps
  - PagerDuty
  - Ansible
  - Git & GitHub Actions
  - Python scripting

AIOps & Automation

- Experience leveraging AI/ML features within observability and incident‑management tools.
- Ability to design automated workflows that use AI insights for:
  - Event correlation
  - Predictive alerting
  - Automated remediation
  - Intelligent routing

SRE Expertise

- Strong understanding of distributed systems, cloud infrastructure, and reliability engineering.
- Experience with SLO/SLI design, error budgets, and performance optimization.
- Familiarity with containerized environments (Kubernetes, Docker) is a plus.





Soft Skills

- Strong analytical mindset with a passion for automation and continuous improvement.
- Excellent communication and cross‑team collaboration abilities.
- Ability to translate operational challenges into scalable engineering solutions.

Preferred Qualifications

- Experience with the Red Hat OpenShift cloud platform.
- Exposure to LLM‑based automation or generative AI for operational workflows.
- Background in building or integrating with ChatOps frameworks.
- Knowledge of event‑driven architectures and message queues.
