Lead OS Service Level Agreement (SLA)
Effective Date: March 30, 2026 Last Updated: April 22, 2026 Version: 1.0
Template — not a self-executing contract. This Markdown file is a starting point for your legal and ops teams. It does not create monitoring, status pages, support desks, backups, or credits until you configure infrastructure, publish your real URLs and emails, and sign customer-facing agreements. Replace every placeholder (YOUR_*) below. RPO/RTO, synthetic monitoring regions, and backup frequency in §8 are design targets that only apply when you implement the backup and Postgres posture described in `DEPLOYMENT.md` (continuous backup / PITR, staffed on-call, etc.). For what the open-source kernel ships without extra infra, see `PRODUCT-SURFACES.md` and the in-app summary at/docs/sla.
---
1. Service Level Commitment
Lead OS commits to a 99.9% monthly uptime for all production services, measured as the percentage of time the platform's core API endpoints respond successfully within each calendar month.
1.1 Uptime Calculation
Monthly Uptime % = ((Total Minutes - Downtime Minutes) / Total Minutes) * 100- Total Minutes: All minutes in the calendar month.
- Downtime Minutes: Minutes during which the platform returns HTTP 5xx errors or is unreachable from two or more monitoring regions, excluding Scheduled Maintenance Windows.
1.2 Covered Services
| Service | Endpoint | SLA Target |
|---|---|---|
| Core API | /api/* | 99.9% |
| Dashboard | /dashboard | 99.9% |
| Webhook Delivery | Outbound webhook POST | 99.5% |
| Agent Execution | Background agent runs | 99.5% |
---
2. Scheduled Maintenance
- Window: Tuesdays, 02:00 - 04:00 UTC
- Advance Notice: Minimum 72 hours via email and status page
- Emergency Maintenance: May occur outside windows with best-effort advance notice (minimum 4 hours)
- Scheduled maintenance is excluded from downtime calculations.
---
3. Service Credits
If Lead OS fails to meet the monthly uptime commitment, eligible customers may request a service credit applied to the next billing cycle.
| Monthly Uptime | Credit (% of Monthly Fee) |
|---|---|
| 99.0% -- 99.9% | 10% |
| 95.0% -- 99.0% | 25% |
| Below 95.0% | 50% |
3.1 Credit Request Process
- Submit a request within 30 days of the affected month.
- Include dates, times (UTC), and affected services.
- Credits are reviewed and applied within 15 business days.
- Maximum credit per month: 50% of that month's fees.
---
4. Exclusions
This SLA does not apply to downtime caused by:
- Force Majeure: Natural disasters, war, government actions, pandemics.
- Customer Actions: Misconfiguration, exceeding rate limits, or abuse.
- Third-Party Services: Outages of upstream providers (DNS, CDN, payment processors) unless Lead OS fails to activate documented failover mechanisms.
- Alpha/Beta Features: Services explicitly marked as preview or experimental.
- Scheduled Maintenance: As defined in Section 2.
---
5. Monitoring and Reporting
5.1 Synthetic Monitoring
- Frequency: Every 60 seconds
- Regions: US-East, EU-West, AP-Southeast (minimum 3)
- Protocol: HTTPS GET to
/api/healthand/api/health/deep - Alerting Threshold: 2 consecutive failures from 2+ regions triggers an incident
5.2 Status Page
- Public URL:
YOUR_STATUS_PAGE_URL(configure your Better Stack, Instatus, Atlassian Statuspage, or self-hosted status deployment; example only:https://status.example.com) - Real-time component status, uptime history (30-day and 90-day), and incident timeline — after you connect probes to
/api/healthand/api/health/deepfrom production. - Subscribers receive email/SMS notifications for status changes when your status product is configured.
---
6. Incident Response
6.1 Severity Levels
| Priority | Definition | Response Time | Update Frequency |
|---|---|---|---|
| P1 | Complete service outage or data loss risk | 15 minutes | Every 30 minutes |
| P2 | Major feature degraded, no workaround | 1 hour | Every 2 hours |
| P3 | Minor feature degraded, workaround available | 4 hours | Every 8 hours |
| P4 | Cosmetic issue or documentation question | 24 hours | Best effort |
6.2 Incident Communication
- P1/P2 incidents are posted to the status page within the response time above.
- A post-incident report (PIR) is published within 5 business days of P1 resolution.
- PIR includes root cause, timeline, impact scope, and preventive measures.
---
7. Performance SLA
| Metric | Target |
|---|---|
| API Response Time (p50) | < 200 ms |
| API Response Time (p95) | < 500 ms |
| API Response Time (p99) | < 1,500 ms |
| Webhook Delivery (p95) | < 5 seconds |
| Dashboard Load (p95) | < 3 seconds |
Performance targets are measured across all monitoring regions and exclude client-side network latency.
---
8. Data Protection
Targets below assume managed PostgreSQL (or equivalent) with continuous backup / PITR and object-store or cross-region snapshot policy enabled by the operator. A default local or single-region install does not automatically meet these numbers.
- Backup Frequency: Continuous WAL streaming with daily full snapshots (when configured per
DEPLOYMENT.md). - Recovery Point Objective (RPO): < 1 hour (requires WAL archiving and tested restore drills).
- Recovery Time Objective (RTO): < 4 hours (requires runbooks, spare capacity, and verified failover).
- Retention: 30-day backup retention with geographic redundancy (policy choice — document yours).
---
9. Contact
- Support Portal:
YOUR_SUPPORT_PORTAL_URL(e.g. Zendesk, Linear Asks, or/helpon your own domain) - Email:
YOUR_SUPPORT_EMAIL(map fromNEXT_PUBLIC_SUPPORT_EMAILin production) - Emergency Hotline: Available to Enterprise plan customers only when you publish a real roster (not included in the repository).
---
*This SLA is subject to the Lead OS Terms of Service. In the event of conflict, the Terms of Service prevail.*