19 Apr | ThoughtStorm | Toronto
Apply on Kit Job: kitjob.ca/job/2g9ato
The AIOps Engineer is responsible for architecting, provisioning, and operationalizing multi-environment AI platforms on Google Cloud (Sandbox, Dev, Prod). The role includes cloud environment setup, IAM governance, CI/CD pipeline development, AIOps automation, drift detection, lifecycle process design, documentation, and alignment with broader enterprise platforms.

Responsibilities
Conduct workshops to gather GCP environment requirements. Design cloud architecture including VPC, IAM, subnetting, quotas, endpoints, and security controls. Lead the provisioning of Sandbox, Dev, and Prod GCP projects using Terraform. Oversee API enablement, configuration, and validation testing.

Role Definitions and IAM Governance
Define IAM roles for AI platform users (Owner, Support, ML Engineer, Viewer). Create IAM matrices, RACI charts, and detailed access control documentation. Ensure least-privilege access policies across Vertex AI and GCP services. Coordinate reviews and approvals with security and architecture teams.

AIOps Framework Development
Design and implement drift detection, anomaly monitoring, canary releases, automated rollback, and observability components. Build reusable CI/CD pipelines using Vertex Pipelines and Cloud Build. Develop SOPs, diagrams, runbooks, and the full AIOps operations playbook. Execute and validate synthetic drift, monitoring, and pipeline test scenarios.

Lifecycle Processes
Define the complete ML lifecycle from environment setup through deployment, monitoring, retraining triggers, and retirement. Integrate lifecycle processes within CI/CD and AIOps automation. Document all lifecycle flows in Confluence and conduct validation sessions. Develop team structure, roles, and support plans. Build cost and usage models using GCP calculators and automation scripts. Prepare development and production usage forecasts and long-term TCO estimates.

Core Technical Skills
Deep experience with Terraform and Infrastructure as Code workflows. Practical experience with AIOps and MLOps frameworks. Proficiency in Python for automation and monitoring jobs. Experience designing and operating CI/CD pipelines for ML workloads. Knowledge of observability tools such as Cloud Monitoring, Cloud Logging, and OpenTelemetry.

Preferred Qualifications
GCP Professional ML Engineer or Cloud Architect certification. Experience with Looker or other operational dashboards.
📌 AI/ML Engineer (Toronto)