Site Reliability Engineer

South Jakarta, Jakarta
Work Type: Full Time

At Sampingan, we’re passionate about building software that solves problems. We count on our Site Reliability Engineers (SREs) to give our users the rich feature set, high availability, and stellar performance they need to pursue their missions. As we expand our customer deployments, we are seeking an experienced SRE to deliver insights from massive-scale data in real time. Specifically, we are searching for someone who brings fresh ideas, demonstrates a unique and informed viewpoint, and enjoys collaborating with a cross-functional team to develop real-world solutions and positive user experiences at every interaction.

What you will do:

  • Joining an on-call rotation to respond to Sampingan service availability incidents and to support service engineers with customer incidents;
  • Using your on-call shift to prevent incidents from ever happening;
  • Running our infrastructure with Ansible, Terraform and AWS ECS;
  • Building monitoring and alerting that triggers on symptoms and not on outages;
  • Documenting every action so your findings turn into repeatable actions–and then into
  • Building software and systems to manage platform infrastructure and applications;
  • Improving the deployment process to make it as boring as possible;
  • Measuring and optimizing system performance, with an eye toward pushing our capabilities forward, getting ahead of customer needs, and innovating to continually improve;
  • Designing, building and maintaining core infrastructure pieces that allow Sampingan to scale to support hundreds of thousands of concurrent users;
  • Debugging production issues across services and levels of the stack;
  • Planning the growth of Sampingan's infrastructure.

What you need to have:

  • Bachelor’s degree in computer science or another highly technical or scientific discipline;
  • Think about systems - edge cases, failure modes, behaviors, specific implementations;
  • Know your way around Linux, Unix system internals and the Unix shell;
  • Know what config management systems like Ansible (the one we use) are for;
  • Have strong programming skills - Python, Go and/or Rust;
  • Have an urge to collaborate and communicate asynchronously;
  • Have an urge to document all the things so you don't need to learn the same thing twice;
  • Have an enthusiastic, go-for-it attitude. When you see something broken, you can't help but fix it;
  • Have an urge for delivering quickly and iterating fast;
  • Have experience with AWS, Nginx, PostgreSQL, MySQL, S3, Kafka, Grafana, Prometheus, InfluxDB, Terraform, Docker, Kubernetes.
