Site Reliability Engineer (SRE)

Job Type: Permanent
Job ID: 17066
Job Description

Title: Site Reliability Engineer (SRE)

Location: Kitchener-Waterloo, ON

Type: Full Time Permanent

Base Salary: TBD

Benefits: Dental Care, Vision Care, Disability Insurance, Extended Health Care, Life Insurance, Wellness Program, Company Events, Flexible Schedule, Paid Time Off, Casual

Remote: Yes


Our client develops machine learning and artificial intelligence (ML/AI) solutions for the automotive and manufacturing industries.


Job Summary

We are passionate about building software that solves real-world challenges in the manufacturing industry. We depend on our site reliability engineering team to give our users a highly performant, highly available platform with a rich feature set that augments their own quality control processes.


We are seeking an experienced SRE to help deliver excellence through our software development and software delivery processes. Specifically, we are searching for someone who brings fresh and creative ideas, demonstrates a unique and informed point of view, and enjoys collaboration with a cross-functional team to develop real-world solutions and deliver positive, measurable user experiences at every interaction.



  • Manage our Production, Development and other environments and infrastructure through monitoring, automation, and other methods
  • Create, manage, and operate CI/CD Pipelines across a suite of services and applications
  • Drive improvements in reliability, quality, and time-to-market for our various platforms
  • Measure and optimize system performance, continually contributing to innovation and improvement
  • Provide primary operational support and engineering for multiple distributed software components
  • Contribute to the improvement of our development, testing, and deployment processes

 Daily and Monthly Responsibilities

  • Partner with development teams to improve CI/CD services with a focus on developer enablement and successful deployments
  • Gather and analyze metrics to assist in performance tuning and fault finding
  • Participate in system design consulting, platform management, and capacity planning discussions
  • Create sustainable systems and services through automation
  • Balance feature development and delivery with reliability and user experience
  • Participate in an on-call rotation to provide rapid response to critical issues in production

Department: SRE
Reports to: SRE Director

 Required Skills and Qualifications

  • Bachelor’s degree in Computer Science or another highly technical, scientific, or engineering discipline
  • Ability to program (structured and OO) with one or more high-level languages such as Python, Java, Ruby, JavaScript
  • Ability to work in a fast-paced agile environment 
  • A passion for identifying reusable patterns and automating them
  • An appropriate respect for Best Practices, Documentation, and Process
  • Experience in a disciplined production environment with good knowledge of Azure cloud services
  • Experience in deployment and developer workflows using Docker and Kubernetes 
  • Deployment, logging, monitoring, security, and automatic failover experience with container orchestration platforms on Azure, AWS, or GCP
  • Experience in microservices architecture and service mesh 
  • Hands-on expertise in configuration management and infrastructure deployment tools such as Terraform
  • Detail-oriented with excellent analytical skills 
  • Flexibility to adjust to changing priorities, requirements, and schedules
  • 5+ years of work experience across multiple roles, with a demonstrated strong understanding of SRE/DevOps and service management principles

 Interview Process:

  • 3 rounds of virtual interviews

If you have the skills and experience that we are looking for to be successful in this role, please submit your resume to Masood Noor by email at
