17 Apr | Aarorn Technologies | Toronto
Apply on Kit Job: kitjob.ca/job/2fs8qv
Overview
Job Title: Senior Production Support Engineer
Location: Toronto, ON (onsite 4 days a week)
Employment Type: Contract
Pay Rate: CAD $45-$50/hr INC
Interview Type: Face-to-face (onsite interview only)
Job Description
Responsibilities
- Toil Removal & Infrastructure Maintenance (15%)
  - Execute SSL/TLS certificate updates and renewals across production environments
  - Perform Windows and Linux server patching and security updates
  - Manage NPID password updates and credential rotation protocols
  - Implement security vulnerability remediation in production systems
  - Identify, document, and eliminate repetitive manual operational tasks
- Infrastructure & Database Cluster Management (20%)
  - Manage and support Elasticsearch cluster operations (deployment, scaling, monitoring, troubleshooting, performance tuning)
  - Administer MongoDB clusters, including replication, sharding, backup, recovery, and maintenance
  - Operate and maintain Redis instances for caching and session management
  - Monitor cluster health and carry out capacity planning and optimization
  - Execute failover and disaster recovery procedures
  - Ensure data integrity and backup compliance
- Automation & SRE Activities (15%)
  - Develop, maintain, and enhance Ansible playbooks for infrastructure automation
  - Build infrastructure-as-code solutions to reduce manual intervention
  - Create and maintain comprehensive runbooks and operational playbooks
  - Design monitoring, alerting, and observability solutions
  - Implement automated remediation for common operational issues
  - Quantify and prioritize toil reduction opportunities
- Production Application Support (50%)
  - Troubleshoot and resolve production incidents affecting digital applications
  - Collaborate with application development and support teams on issue diagnosis
  - Participate in incident response, root cause analysis, and post-mortems
  - Monitor and respond to application performance degradation
Technical Requirements
Required Expertise (Must-Have)
- Ansible: 2+ years hands-on experience writing playbooks, roles, and automation workflows
- Elasticsearch: 2+ years managing and troubleshooting Elasticsearch clusters in production
- MongoDB: 2+ years with replica sets, sharding, backup, recovery, and performance tuning
- Redis: Proficiency in deployment, configuration, and operational support
- OpenShift: Experience deploying and managing containerized applications on OpenShift
- Azure: Knowledge of Azure cloud services, resource management, and deployments
- Linux Administration: 3+ years with RHEL, CentOS, or Ubuntu in production environments
- Windows Server Administration: Experience with patching, certificate management, and maintenance
- Shell Scripting: Bash scripting for automation and operational tasks
- Incident Management: Experience responding to and resolving critical production incidents
Preferred Skills
- Kubernetes or container orchestration platforms
- Python or Go scripting for automation
- CI/CD pipeline experience (Jenkins, GitLab CI, Azure DevOps)
- Monitoring and observability tools (Prometheus, Grafana, ELK Stack, Datadog)
- Infrastructure-as-Code tools (Terraform, CloudFormation)
- Security best practices and vulnerability management
- Relevant certifications (AZ-900, CKA, Elasticsearch, etc.)
Required Qualifications
- Minimum 5 years of production infrastructure support or SRE experience
- Minimum 3 years with at least 2 of the core technologies: Elasticsearch, MongoDB, Ansible, OpenShift
- Experience working in a regulated financial services environment (preferred)
- Ability to work independently and in teams
- Strong troubleshooting and analytical skills
- Excellent documentation and communication skills
- Must be available for on-call support rotation with reasonable notice
Operational Expectations
- On-Call Rotation: Participates in production support on-call schedule
- Incident Response: Available for critical incident resolution outside standard business hours as required
- Availability: Core business hours + flexibility for critical production issues
- Response Time: First response to critical incidents within 30 minutes
- Documentation: Maintains detailed runbooks, playbooks, and knowledge base articles
- Collaboration: Regular communication with infrastructure, development, and operations teams
Disclaimer: AI tools may assist in the recruitment process; however, all hiring decisions are made by the recruitment team based on a comprehensive evaluation of candidates.