Machine Learning / Data Pipeline Engineer

Telecommute · San Francisco, California, United States · Engineering

Description

About Us

Vorstella is an AI platform that automatically manages large-scale distributed systems like Cassandra, Hadoop, Spark and Kafka for large companies. Founded by ex-DataStax engineers, we’ve designed some of the largest distributed system deployments in the world, and we want to make this technology accessible to everyone. You shouldn’t need 3 years of experience to feel comfortable running a new system at scale. We take the guesswork out of using new technology and let you focus on building your applications.

Who we're looking for

We’re looking for someone who is probably 60% engineer, 40% machine learning. A little more street fighter, a little less ivory tower. Someone who can write production-quality code and solve engineering problems, and who knows enough ML to find good-enough solutions. Someone who’s creative and can solve problems without always reaching for the ML hammer. Sometimes we use rules, sometimes we use ML, sometimes we need to ask the user better questions.

What you’ll be working on

You’ll be working on the machine learning pipeline and models. We’ve got multiple signals, both synthesized and raw, being fed into root-cause analysis, database tuning algorithms, and cost optimization. These models feed data to the UI/API, which presents the next best action to the end user. We’re always looking for a solution that gets us to good outcomes as quickly as possible. Sometimes it’s basic, sometimes we’re pushing beyond the boundaries of what’s published.

Our stack

Our deployment target is Docker and Kubernetes. On the front end we use React/Redux. Back-end services are written in Go, with the machine learning code written in Python. Our continuous integration system is CircleCI, and we use GitHub for all our code. We’re multi-cloud, with deployments currently in Google and AWS.

Requirements

Benefits
