Site Reliability Engineer
We are looking for a Lead Site Reliability Engineer who will take the initiative in developing and maintaining the systems and services for our Cash Management Platform. The role covers automating the deployment process, ensuring the system scales, investigating and resolving outages, proactively identifying and implementing preventive measures, collaborating with key stakeholders, and continuously looking for ways to provide real-time visual feedback on all metrics and statuses.
What you will do:
Proactively build and implement services that help IT and support teams work more effectively.
Design and implement dashboards that provide valuable real-time insight into the platform's key metrics.
Lead engagement with software developers, DevOps and other infrastructure engineers to integrate software development and delivery from inception to full operation, ensuring robust released software and systems.
Optimize on-call rotations and processes.
Ensure incidents assigned to the team are managed within agreed SLAs.
Ensure alarms are documented in up-to-date Knowledge Base articles.
Conduct post-incident reviews to assess platform status.
What we’re looking for:
Bachelor’s degree in Computer Science, or equivalent experience relevant to SRE, automation, or development.
7+ years of experience in Site Reliability Engineering or a related position on one or more of the major cloud platforms.
Experience automating multi-tenant systems, preferably in a cloud environment.
Good understanding of Site Reliability Engineering (SRE) philosophies, technologies, platforms and tools, SLO management, incident resolution, and automation.
Ability to explain technical concepts in clear, non-technical language.
Experience building Infrastructure as Code.
Experience with Docker, Kubernetes, and networking concepts.
Experience with Grafana and Prometheus.
Integration experience with PagerDuty, ServiceNow, and Datadog.
Expertise with system and performance monitoring tools (Dynatrace, Splunk, etc.).
Hybrid position based in Mexico City, Monterrey or Guadalajara.
- Client
- Engineering
- Locations: Guadalajara
- Remote status: Hybrid
- Client: Tech Mahindra
About Valce Talent Solutions
We help our clients strengthen their ability to attract talent, especially for technology roles.
We constantly innovate and actively seek the best solutions for clients and professionals. We understand our customers' needs and aim to be specialists in our industry.
We offer consulting services to technology companies in various areas, including IT, software development, cybersecurity, and project management. Our employees are the reason for the company's existence, and their satisfaction translates into that of our customers.