Senior Site Reliability Engineer
Location: United States
Description
Location: Remote; Hybrid to Wayne, PA; Hybrid to Naperville, IL
How You’ll Contribute to Our Mission
Frontline Education seeks a senior-level Site Reliability Engineer (SRE) to ensure the reliability, availability and performance of our production services. The role aligns with our SRE responsibilities—proactive reliability improvements, automation of operational workflows, observability, and incident management—while requiring deep expertise in Windows environments. You will work across platform, application and support teams to drive operational excellence in a large-scale, enterprise environment and embody our core working agreements of ownership, feedback, initiative and consistency.
How You’ll Drive Success
- Ensure service reliability: Maintain uptime and performance of production services. Design and implement high availability and failover strategies for critical workloads.
- Automation and infrastructure as code: Build and maintain automated workflows for provisioning, configuration and patching of systems using Terraform, Ansible and scripting (e.g., PowerShell). Use infrastructure‑as‑code and configuration management to reduce manual toil and ensure consistency.
- Monitoring and observability: Implement and tune monitoring, tracing and logging using tools like Dynatrace, Nagios and log aggregation platforms (e.g., ELK or Splunk). Ensure telemetry provides actionable insights and supports data‑driven incident detection and response.
- Incident response and root cause analysis: Participate in a senior on‑call rotation, lead incident response, perform thorough root cause analysis and drive permanent fixes. Conduct blameless post‑mortems and share learnings to prevent recurrence.
- Reliability improvements: Identify systemic issues, define service level indicators and objectives, and lead projects to improve reliability, reduce technical debt and automate manual processes.
- Process and documentation: Adhere to ITIL processes for incident, problem and change management; provide clear runbooks, knowledge base articles and documentation. Participate in change advisory boards (CABs) to assess reliability impact of proposed changes.
- Collaboration and mentorship: Work closely with other SREs, DevOps engineers and developers to set reliability standards, influence architectural decisions and mentor junior engineers. Foster a culture of ownership, feedback, initiative and consistency across teams.
What You Bring to Help Us Grow
- Senior-level systems expertise: 7+ years of experience supporting enterprise Windows Server environments (2016/2019/2022) with working knowledge of Linux. Experience should include configuration, performance tuning, patch management and troubleshooting at scale.
- Automation and IaC skills: Hands-on experience with Terraform and Ansible for infrastructure provisioning and configuration management; proficiency in PowerShell and scripting for automation.
- Observability: Experience implementing and using monitoring and logging platforms for Windows environments (e.g., Dynatrace, Nagios, Splunk/ELK) and designing alerts and dashboards that drive action.
- Enterprise and ITIL experience: Proven track record working in large-scale, enterprise environments with strict uptime requirements. Familiarity with ITIL processes (incident, change, problem management) and participation in CABs.
- Containers and cloud: Expertise with containerization and Kubernetes/EKS/ECS.
- Service Mesh: Experience operating or supporting service mesh and API gateway platforms to improve resiliency, security, and observability in distributed systems; hands-on experience with Kong Service Mesh and Kong Gateway is a plus.
- Collaboration and communication: Strong interpersonal skills and the ability to work across disciplines. Demonstrated experience mentoring peers and championing reliability best practices.
What You Need to Thrive
- Experience with ArgoCD or other GitOps deployment tools (as a consumer, not as an owner).
- Exposure to agentic AI or machine learning for observability and incident prediction.
- Background in DevOps or software development that informs reliability improvements.
Our Mission, Our People, Our Purpose
At Frontline Education, we’re reimagining what’s possible by becoming an AI-first organization, transforming how we think, work, and serve the educators who shape our schools every day. By using AI in thoughtful, practical ways, we’re creating tools that help educators save time, gain insights, and focus more on what matters most — their students.
As part of our team, you’ll be expected and empowered to build and apply AI skillsets that grow with you, because at Frontline Education, technology amplifies what matters most: the human drive to learn, improve, and make a difference.
How We Support Growth, Balance, and Well-Being
• Personalized Time Off: Take time when it’s needed most — whether that’s a family vacation, a reset day, or simply time to rest and refocus.
• Paid Sick Time: Separate, dedicated sick leave to care for yourself or loved ones.
• Volunteer Time Off: Paid time to give back and support causes that matter to you.
• Ten Paid Holidays: Enjoy meaningful moments and traditions throughout the year.
• Our Philosophy: We believe time away from work helps you bring your best self to it.
Continuous Learning and Growth
• World-Class Learning Access: Explore thousands of on-demand courses through platforms like LinkedIn Learning.
• Leadership & Technical Skill Building: Develop new capabilities and chart your own professional path.
• AI Empowerment: Use OpenAI tools to build fluency with emerging technology and harness AI as a creative partner for innovation and problem-solving.
• Tuition Reimbursement: Invest in formal education to advance your skills and career.
• Ongoing Learning Culture: Participate in company-led webinars on AI, inclusion, and industry trends—designed to inspire curiosity and continuous improvement.
Health, Happiness, and Purpose
• Wellness Initiatives: Company-sponsored programs that support physical, mental, and emotional well-being.
• Employee Assistance Program (EAP): Confidential support for your needs and your family’s.
• Comprehensive Benefits: Health and financial benefits that support your happiness and future.
• A Culture That Cares: At Frontline Education, we want every team member to learn, grow, and thrive—personally, professionally, and purposefully.
Compensation & Benefits
Salary Range: The salary range for this position is $120,000 to $137,000, based on experience, skills, and internal equity.
Compensation also includes bonus eligibility, 401(k) match, ESPP, comprehensive health benefits, and tuition reimbursement for eligible coursework.
Inclusion, Belonging, & Equal Opportunity
Frontline Education is an equal opportunity/affirmative action employer. We aspire to have an inclusive workplace and strongly encourage suitably qualified applicants from a wide range of backgrounds to apply and join our team.
Interview Process & Data Privacy
As part of our interview process, Frontline uses video conferencing tools that include photo capture and may include automated transcription features. A screenshot or photo will be taken at the start of the interview for internal identification and record-keeping purposes only, and transcription may be used to support notetaking and evaluation consistency. These materials are used solely by our recruiting and hiring teams, stored securely, and not shared outside the hiring process. Candidates may opt out of the transcription at any time by notifying their recruiter in advance. Frontline processes this information in accordance with applicable data privacy laws and only for legitimate business purposes related to recruitment and hiring.
Our Privacy Policy: Your privacy is important to us. Click here to read our general Privacy Statement, and click here to read our Applicant Privacy Statement.