Reliability Engineer

Systems Engineering Solutions Corporation
Boston, MA

This role supports the U.S. Air Force Cloud One Architecture and Common Shared Services contract and currently has an opening for a Reliability Engineer. The Reliability Engineer is responsible for ensuring the availability, performance, scalability, and resiliency of mission‑critical systems. This role applies software engineering principles to infrastructure and operations, with a strong emphasis on automation, monitoring, incident response, and continuous reliability improvement. The Reliability Engineer serves as the bridge between development, operations, and platform teams to ensure production systems consistently meet defined service level objectives (SLOs) while supporting rapid, safe delivery of new capabilities.

Location: This position is hybrid remote. Candidates will be required to work onsite as needed. Candidates located near Hanscom AFB (Boston, MA) are preferred.

Responsibilities

System Reliability & Availability

  • Design, implement, and maintain highly available, fault-tolerant systems in cloud and hybrid environments
  • Define, measure, and report Service Level Indicators (SLIs), Service Level Objectives (SLOs), and error budgets
  • Identify reliability risks and implement mitigation strategies across the system lifecycle
  • Conduct capacity planning and performance modeling to ensure systems scale to meet demand

Monitoring, Observability & Alerting

  • Implement and manage monitoring, logging, and tracing solutions to provide full system observability
  • Define actionable alerting thresholds that minimize noise and enable rapid incident detection
  • Analyze trends and metrics to proactively identify potential reliability issues

Incident Response & Problem Management

  • Participate in on‑call rotations and lead incident response activities for production systems
  • Coordinate troubleshooting efforts across development, infrastructure, and security teams
  • Conduct post‑incident reviews (PIRs) and develop corrective and preventive action plans
  • Track recurring issues and ensure root causes are resolved

Automation & Engineering Excellence

  • Automate operational tasks to reduce manual intervention and operational risk
  • Develop scripts, tools, and services that improve system reliability and reduce mean time to recovery (MTTR)
  • Promote “automation over toil” and standardize operational workflows

Reliability‑Focused Engineering

  • Participate in architecture and design reviews with an emphasis on reliability, resiliency, and recoverability
  • Validate disaster recovery (DR) and business continuity plans; test failover mechanisms
  • Support chaos engineering, fault injection testing, and resilience validation where appropriate

Collaboration & Governance

  • Partner with DevOps, Platform, and Security teams to ensure reliability aligns with delivery and compliance objectives
  • Document system reliability standards, runbooks, and operational procedures
  • Support compliance and audit activities (e.g., FedRAMP, FISMA, internal operational controls)

Required Skills:

· Bachelor's degree and eight (8) or more years of experience, or Master's degree and six (6) or more years of experience. Additional experience may be accepted in lieu of a degree.

· Active Secret clearance (at minimum) required to start

· U.S. citizenship required

· Experience with cloud platforms (AWS, Azure, OCI, or GCP), including managed services

· Experience with containerized environments (Docker, Kubernetes)

· Familiarity with CI/CD pipelines and deployment automation

· Experience defining and managing SLOs and error budgets

· Experience with capacity modeling and performance testing

· Strong understanding of:

    · Distributed systems and high‑availability architectures

    · Linux/Windows system administration

    · Networking fundamentals (DNS, TCP/IP, load balancing)

· Hands-on experience with:

    · Monitoring and observability tools (e.g., Prometheus, Grafana, ELK/Elastic, Datadog, Azure Monitor)

    · Infrastructure as Code (Terraform, ARM, CloudFormation)

    · Scripting or programming languages (Python, Bash, Go, PowerShell, or similar)

· Experience supporting incident management and on‑call operations

Preferred Skills

  • Experience with USAF Cloud One or Platform 1
  • Experience with Zero Trust Architecture
  • Cloud certifications in AWS, Azure, Google, or Oracle clouds

Benefits

SES provides a competitive salary and the following benefits:

  • Medical
  • Dental
  • Vision
  • Accidental Death & Dismemberment (AD&D) insurance
  • Short-Term Disability (STD)
  • Long-Term Disability (LTD)
  • Company-paid life insurance
  • 401(k) with employer contribution
  • Paid Time Off
  • Pet Insurance

Posted 2026-03-31