Site Reliability Engineer for Cloud Infrastructure and GenAI Systems (Toronto)

17 Apr

apptoza

Toronto

Join a progressive team as an SRE focusing on cloud infrastructure for GenAI systems. With 8+ years of experience, you’ll drive automation and implement robust monitoring strategies.
This position involves scaling and supporting infrastructure for cutting-edge GenAI applications. You will automate GPU clusters and define crucial SLOs and SLAs while maintaining a strong focus on incident response. Your comprehensive understanding of networking and system engineering will be vital to achieving operational excellence.
Key Responsibilities:
• Scale and automate GPU cluster operations
• Define SLOs and SLAs for system reliability
• Implement monitoring solutions and incident responses
• Optimize cloud infrastructure for performance
• Drive security and disaster recovery initiatives
Requirements:
• Minimum 8 years in Site Reliability Engineering
• Expertise with monitoring tools like Datadog and ELK
• Strong background in networking and systems
• Experience in finance or security regulations a plus
• AI/ML infrastructure knowledge is beneficial
Make an impact by enhancing system reliability and security for advanced GenAI applications.

Toronto SEO Expert: Local, Technical & Content Strategy

05 May

Pankajkumarseo

Toronto

If you are running a business in Canada’s most competitive market, Toronto, then having the right SEO expert in Toronto is no longer an option – it’s a necessity. With thousands of business [...]

Manager, Special Collections Operations (Toronto)

06 May

Toronto Public Library

Toronto

Manager, Special Collections Operations JOB SUMMARY: Oversee the operations, services, and staff of Toronto Public Library’s Special Collections team to ensure effective public access, preservation [...]

GP/Family Physician (Toronto)

06 May

Prospect Health

Toronto

International GP Team, 01423 813451. GP/Family Physician J679866 Toronto O [...]

Shelter Support Worker - Evening- YWCA Davenport (Toronto)

07 May

YWCA Toronto

Toronto

Employment Type: Full-Time, Permanent Position Status: Vacant Work Hours: 35 hours per week, Friday to Tuesday 4:00 pm to 12:00 am (on-call responsibilities; may be required to work outside of regu [...]
