Senior Site Reliability Engineer, Availability Group

InfluxData is a fast-growing Series D startup that offers InfluxDB, a time series database recognized industry-wide as the clear leader of the fastest-growing database category.

InfluxDB is running on hundreds of thousands of machines, used by teams worldwide to monitor and understand their systems and devices. The majority of our users are developers and builders using our open source software, but we also offer enterprise and cloud SaaS products, all exposing a single unified API.

We’re looking for a Senior Site Reliability Engineer to join a team responsible for architecting, maintaining, and monitoring our cloud-first SaaS products. We have big plans to expand across numerous cloud platforms – and we need you to help us get there!

Our SRE team is geographically distributed with a home office in San Francisco, CA. For remote work, we currently support the UK, Germany, Italy and the following US states: AZ, CA, CO, CT, FL, GA, ID, IL, MA, MD, MN, NC, NJ, NY, OK, OH, OR, TX, UT, VA, WA.

What You’ll Be Doing

You’ll use your operations skills, software-crafting abilities, and leadership qualities to maintain and extend the foundational technologies that provide highly available, distributed services to customers who rely on InfluxData’s time series platform, where metrics and events are first-class citizens.

You’ll join a strong team of SREs as a peer and a leader, bringing extensive experience in designing, implementing, and operating large-scale, multi-cloud Kubernetes infrastructure driven by GitOps methodology. You understand that great products require great culture – you design and implement with respect and kindness.

You’ll work across disciplines with engineering teams to provide the infrastructure and tools that accelerate our customers’ Time to Awesome on the cloud and beyond using cutting-edge technology.

We embrace an empathetic, supportive, and communicative environment, drawing on one another’s strengths and persevering through failure and the lessons it teaches.

Our SRE Team

  • Encourages engineering-wide on-call duty to respond to availability incidents
  • Uses alerting and telemetry data alongside coding skill to prevent incidents from recurring: alerts are simply gaps in automation that can be fixed
  • Runs a multi-cloud, multi-tenant SaaS platform built on Kubernetes, Terraform, and the complete InfluxData software suite
  • Drives documentation and activity analysis to automate manual actions and reduce the surface area for error
  • Consults with adjacent engineering teams to further promote SRE methodologies and mindset
  • Runs blameless post-mortem analyses on incidents, identifies areas for improvement or optimization, and owns implementation of the next steps

What You’ll Bring

  • 3+ years of experience running Kubernetes; bonus points for having CKAD/CKA certification
  • 5+ years of experience running production-level workloads on AWS and GCP
  • Intermediate to strong programming skills: you’ve written Go and/or Python code to solve real-world problems at scale
  • Strong asynchronous, remote-first communication skills; previous remote work experience is a plus
  • Self-starting, ambitious, and inclusive: you want to get problems solved and help others follow your lead
  • Open-source software is a passion; consumer or producer, the value of OSS is important to you

InfluxData is the creator of InfluxDB, the leading open source time series database. We are a Series D-funded startup, backed by Sapphire Ventures, Norwest Venture Partners, Mayfield Fund, Trinity Ventures, and Battery Ventures, and a Y Combinator success story. Headquartered in San Francisco, InfluxData’s workforce is distributed throughout the U.S. and across Europe. The company was recently named one of the 50 Best Workplaces for Innovators by Fast Company.

Our technology is purpose-built to handle the massive volumes of time-stamped data produced by IoT devices, applications, networks, containers and computers. We are on a mission to help developers and organizations, such as Cisco, IBM, PayPal, and Tesla, store and analyze real-time data, empowering them to build transformative monitoring, analytics, and IoT applications faster and at scale.

We offer fantastic benefits; in the US these include:

  • Medical/dental/vision insurance with 100% coverage for employees and dependents
  • Company contribution to FSA and commuter benefits
  • Open PTO – take the time you need
  • Life insurance, short- and long-term disability insurance
  • 401(k) (non-matching)
  • …and more!

Our Core Values

We Hire And Live By These Core Values

Our employees are the heart of the company, and only by having a core set of beliefs and values will we be successful.

  • We value each other
  • We get stuff done
  • We believe humility drives learning
  • We embrace failure
  • We are committed to open source

Visit our careers page to learn more about working at InfluxData.

InfluxData is an Equal Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability or veteran status.

InfluxData does not accept unsolicited resumes from headhunters and recruitment agencies through our website, job board or directly to employees. InfluxData will not pay fees to any third-party agency, headhunter or company that does not have a signed agreement for this position in place.

Technology we use

JavaScript
Go
MongoDB
Redis
React