SPS Commerce

  • Site Reliability Engineer

    Job Locations US-MN-Minneapolis
    Posted Date 2 weeks ago
    Job ID 2018-3231
    Category Operations
    # of Openings 1
    Job Type Regular
    FTE Status Full-time
  • Description

    Are you in search of a role where you will work within a fast-paced and collaborative environment? We are looking for a Site Reliability Engineer who will partner with development to deliver market-leading products and services. The Site Reliability Engineering (SRE) team is responsible for delivering highly available platform services and deployment automation that empower our product engineering teams with services that are secure, reliable, and cost-effective, and that foster high velocity.

    Why Join SPS?

    You’ll work alongside talented and enthusiastic professionals who embrace the world of technology. Become a part of the largest retail-driven and omnichannel-focused community, which has gained the trust of 70,000 customers globally. We lead retail’s transformation to the digital retail era by providing retailers with a set of bleeding-edge technology solutions designed to address today’s retail challenges and opportunities.

    The Day-to-Day

    • Maintain highly available, secure, and cost-effective container orchestration platforms such as Kubernetes and ECS
    • Engineer Continuous Integration & Continuous Delivery (CI/CD) solutions that simplify and improve software deployments to enable high velocity for our Product Engineering partners
    • Develop robust monitoring and observability services and patterns to consistently improve the team’s ability to identify, react to, respond to, and recover from complex failures
    • Collaborate with Technology Engineering, Development, and Product Management to help develop, scale, and improve production systems and services
    • Partner with service teams to provide appropriate documentation, cross-training, architecture planning, capacity management, and recommendations for future state
    • Engineer effective ways to cope with failures that may occur
    • Participate in an On-Call rotation to support the availability demands of SPS services
    • Engineer technical solutions to prevent or reduce the frequency of failures
    • Consistently demonstrate superior problem-solving and collaboration skills
    • Collaborate with various technology teams to ensure the designs of new systems will have a high degree of reliability and dependability

    What experience and skills do I need?

    • College Degree or equivalent years of experience
    • 2 or more years of experience in the Information Technology field
    • Software engineering mindset, with experience in Python and/or Go preferred
    • Experience:
      • administering Linux
      • in Agile development methodology and task execution
      • with immutable and scalable infrastructure (infrastructure as code concepts)
    • Understanding of networking systems
    • Demonstrated understanding of various identity and authorization systems

    Extra credit for the following:

    • Experience building or operating CI/CD pipelines or other deployment automation solutions
    • Interest in platform and service mesh technologies such as Docker, Kubernetes, Istio, Envoy, Consul, ECS, Mesos/Marathon, etc.
    • Experience with Amazon Web Services, including EC2, RDS, DynamoDB, Route 53, Elastic Load Balancers, AMIs, IAM Roles, OpsWorks, and CloudFormation
    • A background with advanced monitoring solutions such as metrics platforms, logging, distributed tracing, etc.

     
