Domino Data Lab
Who we are
At Domino Data Lab, we have an ambitious vision for data science. Our platform helps data science teams accelerate research, increase collaboration, and rapidly deploy predictive models.
Our customers are the most sophisticated analytical organizations in the world, including companies like Bristol Myers Squibb, Allstate, Bayer, and Red Hat. Backed by Sequoia Capital, Coatue Management, Bloomberg Beta, and Zetta Venture Partners, we are at the epicenter of the data science revolution, helping companies develop the next breakthrough in medicine, build better cars, or recommend the best song to play next.
What we are building
The Customer Reliability Engineering team is focused on making sure that our customers have a performant and reliable experience on Domino. We take SRE principles and apply them directly to customer-managed deployments of our product. As a senior engineer in our customer-facing organization, you will help clients govern their infrastructure to maximize uptime while also serving as a subject matter expert within the team.
What your impact will be
Your work directly assists our largest customers in producing the next generation of AI products. They rely heavily on our product to perform smoothly and stably, and this is not currently possible without the involvement of the Customer Reliability Engineering team.
You will be responsible for production deployments of our product, which run on a variety of infrastructure and involve a growing number of technical components. Ensuring stability involves building out our observability systems and automation while also optimizing existing deployments and outage response.
What we look for in this role
- Experience managing cloud environments (AWS, GCP, Azure)
- Strong coding ability (Python, Bash, Go)
- Systems fluency (Linux, storage, networking)
- Experience with container management (Kubernetes, Docker, EKS)
- Observability systems (New Relic, Prometheus, Grafana)
- Infrastructure and configuration automation (Terraform)
- Operating stacks based on modern software components (e.g., MongoDB, RabbitMQ, Redis, Elasticsearch, PostgreSQL)
Customer Focus: The Customer Reliability Engineering (CRE) team improves the Domino experience for our most valuable customers.
CRE, in collaboration with other customer-facing teams, leads urgent, coordinated responses to large-scale, customer-facing production issues. This involves:
- Incident response: Investigating unexpected loss of Domino functionality
- Broader deployment challenges: We investigate when multiple, deep technical issues on a deployment threaten to negatively impact a customer's experience
- Comprehensive technical health checks: We inspect customers' deployments to ensure they are configured properly based on their usage patterns
CRE partners with the larger Reliability Engineering team, in addition to Domino's Engineering pods, to act as a center of excellence for any urgent investigations.
What we value
We value a growth mindset: high-performing, creative individuals who dig into problems and see the opportunities for success.
We believe in individuals who seek truth and speak the truth and can be their whole selves at work.
We value all of you who believe improving is always possible. At Domino, everything is a work in progress; we can do better at everything.
We emphasize an environment of teaching and learning to equip employees with the tools needed to be successful in their function and the company. We strongly believe in the value of growing a diverse team and encourage people of all backgrounds, genders, ethnicities, abilities, and sexual orientations to apply.