- Vishakha Sadhwani
So You Want to Be an SRE?
Role, skills, salary, AI impact and a downloadable interview guide!!
Hi Inner Circle,
Welcome to the series, where I’ll walk you through infrastructure roles: the skills, responsibilities, salary ranges, and the portals where you can start looking for these roles.
If you’ve been following the DevOps or Cloud track, this could be your next step.
Let’s talk about the role that quietly runs the internet (and keeps it running): SRE (Site Reliability Engineering).
Reliability Means:
At its core, reliability means keeping systems and services available, fast, and consistent when your users need them most.
In day-to-day work, you’ll see SREs focus on:
→ Minimizing downtime ~ tracing cascading failures, restarting services, and recovering systems before users even notice.
→ Reducing errors ~ finding that one misconfigured load balancer or bad rollout that caused a 1% spike in latency.
→ Keeping every release stable ~ building guardrails so new features don’t bring production down.
→ Managing incidents ~ leading war rooms, writing postmortems, and making sure the same failure never happens twice.
→ Balancing speed vs. safety ~ letting teams ship fast while keeping systems predictable under real traffic.
In the AI era, reliability has a new meaning.
It’s no longer just about keeping servers up — it’s about building systems that can watch, learn, and fix themselves.
Modern SREs design:
→ Self-healing infrastructure that detects issues and recovers automatically.
→ Automated observability that connects metrics, logs, and traces without manual setup.
→ Intelligent incident response where AI helps detect patterns, predict failures, and suggest next steps.
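To make "self-healing" a bit more concrete, here is a minimal sketch of a check-and-heal loop. Everything in it is hypothetical: the `Supervisor` class, the `is_healthy` probe, the `restart` action, and the limit of three restarts are assumptions for illustration, not the API of any real orchestrator.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Supervisor:
    """Hypothetical self-healing loop: probe a service, restart it when unhealthy."""

    is_healthy: Callable[[], bool]  # health probe supplied by the caller (assumed)
    restart: Callable[[], None]     # remediation action supplied by the caller (assumed)
    max_restarts: int = 3           # assumed limit before handing off to a human
    events: List[str] = field(default_factory=list)

    def reconcile(self) -> str:
        """Run one check-and-heal cycle and return the outcome."""
        if self.is_healthy():
            self.events.append("healthy")
            return "healthy"
        for attempt in range(1, self.max_restarts + 1):
            self.restart()
            self.events.append(f"restart-{attempt}")
            if self.is_healthy():
                return "recovered"
        # Automation could not fix it: page the on-call engineer.
        self.events.append("escalate")
        return "escalate"
```

In a real system the probe would hit a health endpoint and the restart would call your orchestrator; the point of the sketch is the shape of the loop: detect, remediate, verify, and escalate when automation runs out of options.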
Origin of SRE — and Why It Matters
SRE started at Google when Ben Treynor asked a small group of engineers to handle operations as if it were a software problem. This transformed how organizations approached reliability. Rather than relying on manual approvals and endless checklists, SRE turns reliability into code: automated, repeatable, measurable.
Three Levels of SRE Responsibilities
Level 1 — Beginner SRE (Foundations of Reliability)
Focus: If you come from DevOps, shift towards understanding how reliability is measured and how your choices affect real users.
Core Responsibilities:
Define and interpret SLIs (Service Level Indicators), SLOs (Service Level Objectives), and SLAs (Service Level Agreements) for availability, latency, and error rate
Set up dashboards and alerts (using tools like Prometheus, Grafana, ELK stack)
Participate in incident response and blameless postmortems
Maintain runbooks and reliability documentation
In any production outage, SREs are the first responders.
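If SLIs and SLOs still feel abstract, a tiny example may help. The sketch below treats an availability SLI as the share of successful requests and checks it against an SLO target. The function names and the choice to treat zero traffic as "meeting the objective" are my assumptions for illustration.

```python
def availability_sli(good_requests: int, total_requests: int) -> float:
    """SLI: fraction of requests that were served successfully."""
    if total_requests == 0:
        # Assumption: with no traffic, nothing failed, so report 100%.
        return 1.0
    return good_requests / total_requests


def meets_slo(sli: float, slo_target: float) -> bool:
    """SLO check: is the measured SLI at or above the target?"""
    return sli >= slo_target


# Example: 10,000 requests, 20 of them failed, target is 99.9%.
sli = availability_sli(good_requests=9_980, total_requests=10_000)
print(f"SLI = {sli:.4f}, meets 99.9% SLO: {meets_slo(sli, 0.999)}")
```

The SLA is the contractual layer on top: it is usually set looser than the internal SLO so the team gets a warning before customers are owed anything.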
Level 2 — Intermediate SRE (Automating Reliability)
Focus: Move from reactively fixing issues to embedding reliability checks in pipelines and automating core ops work.
Core Responsibilities:
Design deployment automation (automated rollbacks, canary releases)
Implement SLI/SLO validation in CI/CD delivery pipelines
Use error budgets to guide release velocity versus stability
Facilitate postmortems and reduce alert fatigue
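Here is one way the error budget idea from the list above could be wired into a release gate. This is a sketch under stated assumptions: the function names, the 25% "slow down" threshold, and the three outcomes are illustrative choices, not a standard policy.

```python
def error_budget_remaining(slo_target: float, good: int, total: int) -> float:
    """Return the unspent fraction of the error budget (negative when overspent)."""
    if total <= 0 or not 0 < slo_target < 1:
        raise ValueError("need total > 0 and 0 < slo_target < 1")
    allowed_failures = (1 - slo_target) * total  # the budget, in requests
    actual_failures = total - good
    return 1 - actual_failures / allowed_failures


def release_decision(budget_remaining: float, slow_threshold: float = 0.25) -> str:
    """Map the remaining budget to a release policy (thresholds are assumptions)."""
    if budget_remaining <= 0:
        return "freeze"  # budget spent: only reliability fixes ship
    if budget_remaining < slow_threshold:
        return "slow"    # budget nearly spent: smaller, more careful rollouts
    return "ship"        # plenty of budget: keep release velocity


# Example: 99.9% SLO, 1,000,000 requests, 500 failures -> about half the budget left.
remaining = error_budget_remaining(0.999, good=999_500, total=1_000_000)
print(f"{remaining:.2f} -> {release_decision(remaining)}")
```

A check like this can run as a pipeline step before a deploy, which is how the budget ends up guiding release velocity instead of living in a dashboard nobody reads.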
Level 3 — Advanced SRE (AI-Driven Reliability)
Focus: Shape systems that diagnose and resolve their own failures using AI and advanced automation.
Core Responsibilities:
Apply AIOps practices: architect auto-remediation workflows (event-driven healing)
Apply machine learning for anomaly detection and log analysis
Lead resilience design: failover, capacity planning, chaos testing
Champion reliability mindset/culture and coach teams on blameless postmortems
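Anomaly detection does not have to start with a heavy ML stack. As a hedged illustration, the sketch below flags points that sit far from a trailing window using a simple z-score. This is a basic statistical stand-in for the machine learning mentioned above; the function name, window size, and threshold are assumptions.

```python
from statistics import mean, pstdev
from typing import List


def zscore_anomalies(values: List[float], window: int = 5, threshold: float = 3.0) -> List[int]:
    """Return indexes whose value is more than `threshold` standard
    deviations away from the mean of the preceding `window` values."""
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu = mean(history)
        sigma = pstdev(history)
        if sigma == 0:
            # Flat history: treat any change at all as anomalous (an assumption).
            if values[i] != mu:
                anomalies.append(i)
            continue
        if abs(values[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


# Example: p95 latency in milliseconds with one obvious spike at index 6.
latencies_ms = [100, 102, 98, 101, 99, 100, 450, 101]
print(zscore_anomalies(latencies_ms))
```

In an event-driven setup, each flagged index would emit an event that triggers a remediation workflow or pages the on-call engineer.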
Career Progression & Transitions
The SRE growth ladder can look like this (though not always):
DevOps Engineer → Cloud Engineer → Platform Engineer → Site Reliability Engineer → Sr. SRE → Staff/Principal SRE → Distinguished Engineer → Director of Infrastructure/CTO
Salary Ranges
Entry-level SRE:
US: $99,500 – $114,499
India: ₹5.5L – ₹8.7L
Early-career SRE (1–4 years):
US: $115,000 – $129,999
India: ₹8.5L – ₹12L
Mid-level SRE (5–9 years):
US: $130,000 – $151,500
India: ₹19L – ₹24L
Senior/Principal SRE:
US: $151,500 – $175,000+
India: ₹29.8L – ₹50L+
Top Companies/Highest Roles:
US: $175,000 – $220,000+
India: ₹40L – ₹109L, with some rare roles up to ₹141L+
Best Job Boards for SRE Roles (2025)
SRE Tools Comparison (2025 Snapshot)
| Area | Top Tools (2025) |
|---|---|
| Monitoring & Observability | Prometheus, Grafana, Datadog, New Relic, ELK Stack, OpenTelemetry |
| Incident Response | PagerDuty, Opsgenie, Zenduty, Blameless |
| Automation & CI/CD | Jenkins, GitHub Actions, GitLab CI, ArgoCD, Spinnaker, CircleCI |
| Infrastructure-as-Code | Terraform, Pulumi, AWS CloudFormation, Ansible, Chef |
| Containerization & Orchestration | Docker, Kubernetes, Helm, OpenShift |
| AIOps & Automation | Moogsoft, KeepHQ, StackState, Datadog ML Models |
| Chaos Engineering | Gremlin, Chaos Mesh, LitmusChaos |
| Log Management | ELK Stack, Splunk, Fluentd, Loki |
| Collaboration | Slack, Teams (for integrated on-call & incident comms) |
(Sources: industry aggregators)
Interview Prep Checklist for SRE Roles
Explain SLIs/SLOs/SLAs and give real-world examples
Deep-dive on monitoring/observability (metrics, logs, traces)
Walk through an incident: root cause, blameless postmortem, and improvement
Automate deployment, rollbacks, and error budget policies
Show familiarity with both cloud and container-native tooling
Discuss AI-driven and self-healing architectures (for advanced roles)
Full list of SRE Kubernetes Scenario-Based Interview Questions here:
Hope this gave you a clearer picture of what the SRE role really looks like — not just in theory, but in practice, and how it can shape your career path in the AI era.
Thanks for reading till the end.
Next week, we’ll shift gears and dive deep into the role of Platform Engineers, the invisible backbone that keeps modern teams moving fast and safely.
Good job, V!!:)