SRE Engineer (Site Reliability Engineering)

  • New Jersey

At Cellwize, we are on a mission to connect the world with the networks of tomorrow, through our industry-leading, advanced, cloudified, AI-driven RAN automation and orchestration platform.

We are growing fast, and we’re looking for exceptional people to join our team as we move rapidly to establish dominance in the mobile market. If you live and breathe innovation, are passionate about building things that change the world, and will settle for nothing less than amazing, then we want you to join us as our newest SRE Engineer in New Jersey.

In a nutshell

In this role, you will keep our services, streaming frameworks, NoSQL and RDBMS databases, and distributed analytical platforms running in multi-cloud environments. This infrastructure powers our AI services, which deliver unprecedented IT automation and insight into user experience across our customers’ geographically distributed networks.

What you’ll be doing

  • Build infrastructure as code using Terraform.
  • Build and enable Kubernetes clusters.
  • Manage and performance-tune data stores (Elasticsearch), data flow tools (NiFi), or streaming data pipelines (Kafka).
  • Manage CI/CD pipelines and the configuration and automation tools used for infrastructure provisioning.
  • Write and maintain runbooks for knowledge-driven automated processes and bots.
  • Perform capacity planning based on performance, usage, and utilization statistics.
  • Partner with developers and quality engineering teams to automate the monitoring, alerting, availability, and scalability of our applications and systems.
  • Ensure system availability and business continuity by implementing redundant servers and services.
  • Manage after-hours infrastructure updates and maintenance.
  • Proactively research and propose the use of new concepts, processes, technologies, and tools.
  • As part of an on-call rotation, proactively monitor, diagnose, and resolve issues in a 24×7 multi-cloud environment (including OpenStack), analyze failures, and help software engineers debug production issues across microservices and distributed platforms.
  • Follow SRE best practices and procedures.

What you’ll bring with you

  • Experience delivering reliable operations for web-scale infrastructure serving a global market at high release velocity.
  • Solid experience with at least one of the following languages: Go or Python (required).
  • Experience with Kafka, Mesos, NiFi, Elasticsearch, MySQL, Vertica, ZooKeeper, and Nginx.
  • 5-6 years of industry experience in managing infrastructure.
  • 5 years of Linux administration in a large-scale SaaS environment.
  • 5 years maintaining production systems on AWS and/or OpenStack.
  • 3 years of experience managing Kubernetes in a large-scale production environment.
  • Strong familiarity with running and optimizing relational and NoSQL databases.
  • 3 years using infrastructure-as-code tools (e.g., Terraform, AWS CloudFormation, Google Cloud Deployment Manager).
  • 5 years of experience with continuous integration practices and tools (e.g., Jenkins).
  • Experience with monitoring solutions such as Prometheus, Grafana, ELK.

Why you’ll love it here

  • You want flexibility? You’ve got it. We get the ‘new normal’ and what it means to work from home, the office, the car, or a park bench.
  • What about your needs? We’ve got your back – with the right working environment and equipment, and perks that will brighten your every day.
  • We make products that are shaking up the mobile world. Do you want to do big things? Cellwize is where you will do them.
  • Blue-chip, world-class investors love us and are investing in our fast track to the future.
  • Our people are rock stars. Our culture not only makes coming to work every day so much fun; it also drives us to spectacular new heights of success.

Cellwize is a champion of inclusion and opportunity. We don’t discriminate based on race, color, religion, age, sex, sexual orientation, or gender identity.