19 Apr | Cloudious | Toronto
Apply on Kit Job: kitjob.ca/job/2g8pkl
Required Skills & Experience
Core Technical Skills
- Hands‑on experience with:
  - Dynatrace (including Davis AI)
  - Splunk (ITSI and Machine Learning Toolkit preferred)
  - Moogsoft AIOps
  - PagerDuty
  - Ansible
  - Git & GitHub Actions
  - Python scripting
AIOps & Automation
- Experience leveraging AI/ML features within observability and incident‑management tools.
- Ability to design automated workflows that use AI insights for:
  - Event correlation
  - Predictive alerting
  - Automated remediation
  - Intelligent routing
SRE Expertise
- Strong understanding of distributed systems, cloud infrastructure, and reliability engineering.
- Experience with SLO/SLI design, error budgets, and performance optimization.
- Familiarity with containerized environments (Kubernetes, Docker) is a plus.
Soft Skills
- Strong analytical mindset with a passion for automation and continuous improvement.
- Excellent communication and cross‑team collaboration abilities.
- Ability to translate operational challenges into scalable engineering solutions.
Preferred Qualifications
- Experience with the Red Hat OpenShift cloud platform.
- Exposure to LLM‑based automation or generative AI for operational workflows.
- Background in building or integrating with ChatOps frameworks.
- Knowledge of event‑driven architectures and message queues.
📌 Site Reliability Engineer / AIOps Engineer - Toronto
🏢 Cloudious
📍 Toronto