Lead Site Reliability Engineer (Toronto)

17 Apr

Movable Ink

Toronto

Movable Ink scales content personalization for marketers through data-activated content generation and AI decisioning. The world's most cutting-edge brands rely on Movable Ink to maximize revenue, simplify workflow and boost marketing agility. Headquartered in New York City with close to 600 employees, Movable Ink serves its global client base with operations throughout North America, Central America, Europe, Australia, and Japan.

As one of our Lead Site Reliability Engineers, you will combine hands-on technical expertise with strategic technical leadership across infrastructure and software development. You will own the design and evolution of major systems within our multi-cloud, multi-region, active-active content serving platform that serves upwards of 25 billion requests daily. Through a combination of architectural vision, cross-team collaboration and mentorship, you will help drive the reliability initiatives and define the technical strategy that scales our platform to 50 billion requests per day and beyond.

Responsibilities

- Define and drive the automation strategy for infrastructure tooling, establishing standards that minimize manual work, increase performance and reduce the frequency and severity of incidents.
- Own the design, reliability and evolution of core platform applications, mentoring team members on best practices and ensuring systems meet long-term business objectives.
- Architect and lead the logging platform strategy, driving its design and balancing availability, retention and cost optimization.
- Establish capacity planning and performance management frameworks, proactively identifying scaling opportunities and guiding teams through complex troubleshooting scenarios.
- Lead cross-functional reliability initiatives with SRE and service engineering teams, influencing architectural decisions and championing practices that ensure resilient service delivery.
- Demonstrate a high level of autonomy in anticipating, identifying, and addressing systemic weaknesses and opportunities for platform improvement without direct supervision.

Qualifications

- Proven track record in Site Reliability or Software Engineering, designing, building, and owning scalable, resilient services with a focus on long-term reliability strategy.
- Deep expertise in architecting and operating complex distributed systems such as Apache Pulsar, Apache Kafka, Grafana Loki, ScyllaDB/Cassandra, with the ability to guide teams through distributed system challenges.
- Experience designing and owning automation strategies to manage services at scale, with expertise in establishing performance analysis frameworks and mentoring others on diagnostics and resolution.
- Deep, hands-on experience (6+ years) in Site Reliability or Software Engineering, specifically leading and shaping multi-cloud architecture and strategy (AWS and GCP).
- Experience architecting and leading large-scale observability platforms, including defining observability standards and SLO frameworks. We use Prometheus and Thanos with Grafana Alloy, Loki and Tempo.
- Experience leading on-call excellence, including driving improvements to monitoring and alerting strategies, automating runbooks and mentoring team members on incident response best practices. Every member of the SRE team does a week-long on-call rotation.
- Expert-level proficiency with infrastructure as code, including defining IaC standards and patterns across teams. We use Terraform and Chef.
- Advanced Kubernetes expertise, including cluster architecture design, multi-tenancy strategies, and guiding teams on container orchestration best practices. We use EKS and GKE.
- Proficiency in multiple programming languages with the ability to design and review code that meets reliability standards. We use NodeJS, Golang, Ruby, Python and shell scripting.
- Advanced Linux systems expertise, with the ability to diagnose complex system-level issues and mentor others on performance tuning and troubleshooting.

Studies have shown that women, communities of color, and historically underrepresented people are less likely to apply to jobs unless they meet every single qualification. We are committed to building a diverse and inclusive culture where all Inkers can thrive. If you're excited about the role but don't meet all of the qualifications above, we encourage you to apply. Our differences bring a breadth of knowledge and perspectives that makes us collectively stronger.

We welcome and employ people regardless of race, color, gender identity or expression, religion, genetic information, parental or pregnancy status, national origin, sexual orientation, age, citizenship, marital status, ethnicity, family or marital status, physical and mental ability, political affiliation, disability, Veteran status, or other protected characteristics. We are proud to be an equal opportunity employer.

Toronto SEO Expert: Local, Technical & Content Strategy

05 May

Pankajkumarseo

Toronto

If you are running a business in Canada’s most competitive market, Toronto, then having the right SEO expert in Toronto is no longer an option – it’s a necessity. With thousands of business [...]

Manager, Special Collections Operations (Toronto)

06 May

Toronto Public Library

Toronto

Manager, Special Collections Operations JOB SUMMARY: Oversee the operations, services, and staff of Toronto Public Library’s Special Collections team to ensure effective public access, preservation [...]

GP/Family Physician (Toronto)

06 May

Prospect Health

Toronto

International GP Team 01423 813451. GP/Family Physician J679866 Toronto O [...]

Shelter Support Worker - Evening- YWCA Davenport (Toronto)

07 May

YWCA Toronto

Toronto

Employment Type: Full-Time, Permanent. Position Status: Vacant. Work Hours: 35 hours per week, Friday to Tuesday, 4:00 pm to 12:00 am (on-call responsibilities; may be required to work outside of regu [...]
