Drive operational excellence as a Principal Site Reliability Engineer in a complex cloud setting. Shape and oversee infrastructure to ensure reliability and security across distributed AWS/EKS systems.
This technical leadership role involves taking end-to-end ownership of large-scale infrastructure projects. Collaborate with engineering leaders and customer-focused teams to implement automated incident responses and optimize performance and cost. Ideal candidates have a strong problem-solving mindset, thrive in high-autonomy situations, and aim to make a positive impact on customer success.
Key Responsibilities:
• Lead cloud infrastructure enhancement initiatives
• Manage Kubernetes cluster operations and scaling
• Conduct incident response and root cause analysis
• Ensure cloud security and compliance protocols
• Collaborate on infrastructure design and deployment strategies
Requirements:
• Minimum 5 years in SRE or infrastructure roles
• Expertise in AWS services and Kubernetes management
• Proficient in Terraform for infrastructure automation
• Familiarity with CI/CD frameworks and observability tools
• Strong networking design knowledge
Use your skills to lead the enhancement of cloud infrastructure and ensure high availability for essential workloads.
Apply on Kit Job: kitjob.ca/job/2g9cgw
📌 Principal Engineer for Cloud Reliability (Toronto)
🏢 Jobgether
📍 Toronto