Reliability Engineer

Systems Engineering Solutions Corporation
Boston, MA

This role supports the U.S. Air Force Cloud One Architecture and Common Shared Services contract, which currently has an opening for a Reliability Engineer. The Reliability Engineer is responsible for ensuring the availability, performance, scalability, and resiliency of mission‑critical systems. This role applies software engineering principles to infrastructure and operations, with a strong emphasis on automation, monitoring, incident response, and continuous reliability improvement. The Reliability Engineer serves as the bridge between development, operations, and platform teams to ensure production systems consistently meet defined service level objectives (SLOs) while supporting rapid, safe delivery of new capabilities.

Location: This position is hybrid remote. Candidates will be required to work onsite as needed. Candidates located near Hanscom AFB (Boston, MA) are preferred.

Requirements

System Reliability & Availability

  • Design, implement, and maintain highly available, fault-tolerant systems in cloud and hybrid environments
  • Define, measure, and report Service Level Indicators (SLIs), Service Level Objectives (SLOs), and error budgets
  • Identify reliability risks and implement mitigation strategies across the system lifecycle
  • Conduct capacity planning and performance modeling to ensure systems scale to meet demand

Monitoring, Observability & Alerting

  • Implement and manage monitoring, logging, and tracing solutions to provide full system observability
  • Define actionable alerting thresholds that minimize noise and enable rapid incident detection
  • Analyze trends and metrics to proactively identify potential reliability issues

Incident Response & Problem Management

  • Participate in on‑call rotations and lead incident response activities for production systems
  • Coordinate troubleshooting efforts across development, infrastructure, and security teams
  • Conduct post‑incident reviews (PIRs) and develop corrective and preventive action plans
  • Track recurring issues and ensure root causes are resolved

Automation & Engineering Excellence

  • Automate operational tasks to reduce manual intervention and operational risk
  • Develop scripts, tools, and services that improve system reliability and reduce mean time to recovery (MTTR)
  • Promote “automation over toil” and standardize operational workflows

Reliability‑Focused Engineering

  • Participate in architecture and design reviews with an emphasis on reliability, resiliency, and recoverability
  • Validate disaster recovery (DR) and business continuity plans; test failover mechanisms
  • Support chaos engineering, fault injection testing, and resilience validation where appropriate

Collaboration & Governance

  • Partner with DevOps, Platform, and Security teams to ensure reliability aligns with delivery and compliance objectives
  • Document system reliability standards, runbooks, and operational procedures
  • Support compliance and audit activities (e.g., FedRAMP, FISMA, internal operational controls)

Required Skills

  • Bachelor's degree and eight (8) or more years of experience, or Master's degree and six (6) or more years of experience. Additional experience may be accepted in lieu of a degree.
  • Active Secret clearance, at a minimum, required to start
  • U.S. citizenship required
  • Experience with cloud platforms (AWS, Azure, OCI, or GCP), including managed services
  • Experience with containerized environments (Docker, Kubernetes)
  • Familiarity with CI/CD pipelines and deployment automation
  • SLOs and error budgets
  • Capacity modeling and performance testing
  • Strong understanding of:
      • Distributed systems and high‑availability architectures
      • Linux/Windows system administration
      • Networking fundamentals (DNS, TCP/IP, load balancing)
  • Hands-on experience with:
      • Monitoring and observability tools (e.g., Prometheus, Grafana, ELK/Elastic, Datadog, Azure Monitor)
      • Infrastructure as Code (Terraform, ARM, CloudFormation)
      • Scripting or programming languages (Python, Bash, Go, PowerShell, or similar)
  • Experience supporting incident management and on‑call operations

Preferred Skills

  • Experience with USAF Cloud One or Platform 1
  • Experience with Zero Trust Architecture
  • Cloud certifications in AWS, Azure, Google, or Oracle clouds

Benefits

SES provides a competitive salary and the following benefits:

  • Medical
  • Dental
  • Vision
  • Accidental Death & Dismemberment (AD&D)
  • Short-Term Disability (STD)
  • Long-Term Disability (LTD)
  • Company-paid life insurance
  • 401(k) with employer contribution
  • Paid Time Off
  • Pet Insurance
Posted 2026-03-31
