
26 Safeguards We Learned the Hard Way

Synthcore Team · 10 February 2026 · 5 min read

When we first started running autonomous AI agents on real codebases, things went wrong. A lot. Agents would force-push to main. Delete node_modules and reinstall from scratch. Create circular dependencies that crashed the build.

Every single safeguard in Synthcore exists because we hit the problem first.

The categories

Our 26 safeguards fall into five categories:

Git safety — protecting your repository
Execution boundaries — limiting what agents can do
Quality gates — ensuring code meets standards
Operational controls — keeping everything running smoothly
Prompt injection defense — preventing agents from being manipulated

Git safety

These are the most critical. An agent with unrestricted git access can cause irreversible damage in seconds.

1. Safe push system

All pushes go through a controlled push script that validates the target branch, checks for conflicts, and prevents force pushes. No agent can bypass it.

2. Push protection

Agents never push directly to main or master. Every change goes through a feature branch and pull request.
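
To make the first two rules concrete, here is a minimal sketch of the kind of check a controlled push script might run. It is an illustration, not Synthcore's actual script: the function name `push_violations` and the protected-branch set are assumptions, and conflict detection is left out.

```python
PROTECTED_BRANCHES = {"main", "master"}
FORCE_FLAGS = {"-f", "--force", "--force-with-lease"}


def push_violations(target_branch: str, push_args: list[str]) -> list[str]:
    """Return the reasons a push should be refused; an empty list means allowed."""
    violations = []
    if target_branch in PROTECTED_BRANCHES:
        violations.append(f"direct push to protected branch '{target_branch}'")
    for arg in push_args:
        # --force-with-lease may carry a value, e.g. --force-with-lease=main
        if arg in FORCE_FLAGS or arg.startswith("--force-with-lease="):
            violations.append(f"force flag '{arg}'")
        # A refspec with a leading '+' also forces the update
        elif arg.startswith("+"):
            violations.append(f"forcing refspec '{arg}'")
    return violations
```

A wrapper would call this before invoking `git push` and refuse to continue if the list is non-empty.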

3. Git self-healing

When git state gets corrupted — orphaned branches, broken refs, merge conflicts left behind — the system detects and repairs it automatically before the next cycle.

4. File boundaries

Agents can only modify files within their designated scope. The backend agent can't touch frontend code, and vice versa.
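
A scope check like this can be sketched in a few lines. The function name and the example paths are hypothetical; the important detail is resolving the path first, so that `..` segments can't be used to escape the allowed directory.

```python
from pathlib import Path


def is_within_scope(candidate: str, repo_root: str, allowed_dirs: list[str]) -> bool:
    """True if `candidate` (relative to the repo root) lies inside an allowed directory."""
    root = Path(repo_root).resolve()
    # Resolve before comparing so "backend/../frontend" is seen for what it is
    target = (root / candidate).resolve()
    for allowed in allowed_dirs:
        scope = (root / allowed).resolve()
        if target == scope or scope in target.parents:
            return True
    return False
```

With `allowed_dirs=["backend"]`, a write to `backend/api/users.py` passes while `backend/../frontend/app.tsx` is rejected.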

5. Destructive command blocks

Commands like git reset --hard, rm -rf, and git push --force are blocked at the shell level. No exceptions.
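
As a rough illustration of the matching logic only, the sketch below flags the three commands named above. It is a simplified string-level filter with a hypothetical function name. Real shell-level enforcement needs more than this, since a check like this one misses variants such as `git -C dir reset --hard` or commands hidden inside scripts.

```python
import shlex


def is_destructive(command: str) -> bool:
    """Flag `rm -rf`, `git reset --hard`, and forced `git push` invocations."""
    tokens = shlex.split(command)
    if not tokens:
        return False
    prog, rest = tokens[0], tokens[1:]
    if prog == "rm":
        # Collect short flags so -rf, -fr, and "-r -f" are all recognized
        short = "".join(t[1:] for t in rest if t.startswith("-") and not t.startswith("--"))
        recursive = "r" in short or "R" in short or "--recursive" in rest
        force = "f" in short or "--force" in rest
        return recursive and force
    if prog == "git" and rest:
        if rest[0] == "reset" and "--hard" in rest:
            return True
        if rest[0] == "push" and ("--force" in rest or "-f" in rest):
            return True
    return False
```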

6. Config auto-recovery

If an agent corrupts a config file (package.json, tsconfig, etc.), the system restores it from the last known-good state automatically.
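
One simple way to approximate this is snapshot-or-restore: if the config still parses, save a copy as the new known-good state, and if it doesn't, copy the last snapshot back. The sketch below assumes strict JSON (so it suits package.json, but not a tsconfig that contains comments), and the function names and backup location are hypothetical.

```python
import json
import shutil
from pathlib import Path


def is_valid_json(path: Path) -> bool:
    """True if the file exists and parses as strict JSON."""
    try:
        json.loads(path.read_text(encoding="utf-8"))
        return True
    except (OSError, ValueError):  # JSONDecodeError is a ValueError
        return False


def snapshot_or_restore(config: Path, backup: Path) -> str:
    """Refresh the backup when the config is valid; otherwise restore from it."""
    if is_valid_json(config):
        shutil.copyfile(config, backup)
        return "snapshotted"
    if backup.exists():
        shutil.copyfile(backup, config)
        return "restored"
    return "no-backup"
```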

Execution boundaries

7. Directory isolation

Each agent is sandboxed to its own working directory. Agents can read shared code but can only write within their assigned boundaries.

8. Privilege separation

Different agents have different permission levels. The QA agent can run tests but can't deploy. The DevOps agent can deploy but can't modify application code.
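
At its simplest, this is a deny-by-default lookup table. The sketch below uses the QA and DevOps roles described above. The `backend` role, the action names, and the function name are assumptions made for illustration.

```python
PERMISSIONS = {
    "qa": {"run_tests"},
    "devops": {"deploy"},
    "backend": {"modify_code", "run_tests"},  # hypothetical role for illustration
}


def is_allowed(role: str, action: str) -> bool:
    """Deny by default: unknown roles and unlisted actions are refused."""
    return action in PERMISSIONS.get(role, set())
```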

9. Workspace isolation

Every project runs in its own VM. One project's agents can never access another project's files, environment variables, or secrets.

10. Timeout limits

Long-running tasks are broken into smaller steps with appropriate timeouts. If an agent hangs or loops, it's terminated and restarted cleanly.
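
For a single step, a timeout can be sketched with Python's standard `subprocess` module. The function name and the shape of the result are assumptions; restart logic would sit in whatever calls it.

```python
import subprocess
import sys


def run_step(cmd: list[str], timeout_s: float) -> dict:
    """Run one step; report 'timeout' instead of hanging forever."""
    try:
        # subprocess.run kills the child process when the timeout expires
        done = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return {"status": "timeout", "returncode": None}
    status = "ok" if done.returncode == 0 else "failed"
    return {"status": status, "returncode": done.returncode}


if __name__ == "__main__":
    # A step that sleeps longer than its budget is cut off
    print(run_step([sys.executable, "-c", "import time; time.sleep(5)"], timeout_s=0.5))
```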

11. Cost controls

Daily API spend is capped per project. If agents approach the limit, they pause non-critical work and notify you. No surprise bills.
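
The pause-before-the-cap behaviour can be sketched as a small budget tracker. The 80% warning threshold, the class name, and the three decisions are assumptions for illustration, not Synthcore's actual policy.

```python
from dataclasses import dataclass


@dataclass
class DailyBudget:
    cap_usd: float
    warn_ratio: float = 0.8  # assumed threshold for "approaching the limit"
    spent_usd: float = 0.0

    def record(self, cost_usd: float) -> None:
        """Add the cost of one API call to today's total."""
        self.spent_usd += cost_usd

    def decision(self, critical: bool) -> str:
        """Return 'run', 'pause' (non-critical work near the cap), or 'stop'."""
        if self.spent_usd >= self.cap_usd:
            return "stop"
        if self.spent_usd >= self.cap_usd * self.warn_ratio and not critical:
            return "pause"
        return "run"
```

With a $10 cap, non-critical work pauses once spend reaches $8, critical work continues, and everything stops at $10.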

Quality gates

12. Test validation

Every code change must pass existing tests before being committed. Agents run the full test suite after each modification.

13. Code review gates

Changes that affect security-sensitive files (auth, payments, config) trigger mandatory human review before they can be merged.
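
A gate like this comes down to classifying the changed paths. In the sketch below, the directory and file names are illustrative guesses at what "auth, payments, config" might map to in a repository; the function name is hypothetical too.

```python
from pathlib import PurePosixPath

SENSITIVE_DIRS = {"auth", "payments", "config"}
SENSITIVE_FILES = {"package.json", "tsconfig.json", ".env"}


def needs_human_review(changed_files: list[str]) -> list[str]:
    """Return the changed files that should block merging until a human approves."""
    flagged = []
    for name in changed_files:
        path = PurePosixPath(name)
        in_sensitive_dir = bool(SENSITIVE_DIRS & set(path.parts[:-1]))
        if in_sensitive_dir or path.name in SENSITIVE_FILES:
            flagged.append(name)
    return flagged
```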

14. Memory protection

Agent context windows are bounded and monitored. When context approaches its limit, older memories are summarized rather than dropped, which helps limit hallucination drift.
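
The summarize-instead-of-drop idea can be sketched as follows. This is a toy version under stated assumptions: character count stands in for token count, and `summarize` is a caller-supplied function (in practice probably a model call), not a real API.

```python
from typing import Callable


def compact(
    messages: list[str],
    max_chars: int,
    keep_recent: int,
    summarize: Callable[[list[str]], str],
) -> list[str]:
    """Replace older messages with one summary once the context is over budget."""
    if keep_recent < 1:
        raise ValueError("keep_recent must be at least 1")
    total = sum(len(m) for m in messages)
    if total <= max_chars or len(messages) <= keep_recent:
        return messages
    older, recent = messages[:-keep_recent], messages[-keep_recent:]
    # The summary takes the place of the older messages; recent ones stay verbatim
    return [summarize(older)] + recent
```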

15. Lint & type checks

All code must pass the project's linter and type checker before being committed. TypeScript errors, ESLint violations, and formatting issues are caught before they reach your PR.
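
The test, lint, and type-check gates all follow the same pattern: run a command, and refuse to commit if it exits non-zero. A minimal sketch, with a hypothetical function name and example commands that assume a typical npm-based TypeScript project:

```python
import subprocess
from typing import Optional


def run_gates(gates: list[tuple[str, list[str]]]) -> tuple[bool, Optional[str]]:
    """Run each gate in order; stop at the first failure and report its name."""
    for name, cmd in gates:
        if subprocess.run(cmd).returncode != 0:
            return False, name
    return True, None


# Example gate list (assumed commands; adjust to the project's own tooling):
EXAMPLE_GATES = [
    ("lint", ["npx", "eslint", "."]),
    ("types", ["npx", "tsc", "--noEmit"]),
    ("tests", ["npm", "test"]),
]
```

Calling `run_gates(EXAMPLE_GATES)` before a commit returns `(True, None)` only when every command succeeds.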

Operational controls

16. Watchdog agent

A dedicated agent (Kai) monitors all other agents in real time. It catches regressions, detects anomalies, and flags issues before they compound.

17. Heartbeat monitoring

Every agent sends a heartbeat signal on a regular interval. If an agent goes silent, the system detects it within minutes and takes corrective action.
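
Detecting a silent agent only requires remembering when each one was last heard from. The sketch below is illustrative: the class name, the two-minute threshold in the usage note, and the agent names are assumptions.

```python
import time
from typing import Optional


class HeartbeatMonitor:
    def __init__(self, max_silence_s: float):
        self.max_silence_s = max_silence_s
        self.last_seen: dict[str, float] = {}

    def beat(self, agent: str, now: Optional[float] = None) -> None:
        """Record a heartbeat; `now` can be injected for testing."""
        self.last_seen[agent] = time.monotonic() if now is None else now

    def silent_agents(self, now: Optional[float] = None) -> list[str]:
        """Return agents whose last heartbeat is older than the allowed silence."""
        now = time.monotonic() if now is None else now
        return sorted(
            agent for agent, seen in self.last_seen.items()
            if now - seen > self.max_silence_s
        )
```

With `max_silence_s=120`, an agent that last reported 150 seconds ago shows up in `silent_agents()`, while one that reported 50 seconds ago does not.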

18. Health checks

Continuous system health checks monitor CPU, memory, disk, and network. Degraded infrastructure is flagged before it impacts agent performance.

19. Config validation

Agent configurations are validated before deployment. Invalid persona files, malformed schedules, or conflicting settings are caught and rejected.

20. Bloat prevention

Workspace size and file counts are monitored. If an agent creates too many files or inflates the repo size, it's flagged and the growth is contained.
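
Measuring a workspace is a directory walk. In this sketch the function names and limits are hypothetical, and symlinks are skipped so that linked directories aren't counted twice.

```python
import os


def workspace_usage(root: str) -> tuple[int, int]:
    """Return (file_count, total_bytes) for regular files under `root`."""
    files = 0
    size = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue
            files += 1
            size += os.path.getsize(path)
    return files, size


def exceeds_limits(root: str, max_files: int, max_bytes: int) -> bool:
    """True if the workspace has grown past either limit."""
    files, size = workspace_usage(root)
    return files > max_files or size > max_bytes
```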

21. Escalation protocol

When an agent encounters something it can't handle — merge conflicts, ambiguous requirements, security-sensitive decisions — it escalates to a human rather than guessing.

22. Process management

Agent processes are managed by the scheduler with clean start, stop, and restart semantics. Zombie processes are detected and cleaned up automatically.

23. Output sanitization

Agent output (commit messages, PR descriptions, log entries) is sanitized to prevent accidental exposure of secrets, internal paths, or sensitive data.
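
A basic form of sanitization is pattern-based redaction. The rules below are illustrative assumptions, not an exhaustive or production list: one for AWS-style access key IDs, one for `key=value` style secrets, and one for home-directory paths.

```python
import re

_RULES = [
    # AWS access key IDs: "AKIA" followed by 16 uppercase letters or digits
    (re.compile(r"AKIA[0-9A-Z]{16}"), "[REDACTED_AWS_KEY]"),
    # Assignments such as password=..., token: ..., api_key=...
    (re.compile(r"(?i)\b(api[_-]?key|token|secret|password)\s*[=:]\s*\S+"), r"\1=[REDACTED]"),
    # Usernames embedded in home-directory paths
    (re.compile(r"/(home|Users)/[^/\s]+"), r"/\1/[USER]"),
]


def sanitize(text: str) -> str:
    """Apply each redaction rule in order and return the cleaned text."""
    for pattern, replacement in _RULES:
        text = pattern.sub(replacement, text)
    return text
```

Pattern matching only catches secrets that look like the patterns, so it complements keeping secrets out of agent-visible state rather than replacing that.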

Prompt injection defense

24. Trust boundaries

We enforce strict boundaries between what agents can read, write, and execute. External inputs (user comments, PR reviews, issue descriptions) are treated as untrusted and never executed directly.

25. Security directives

Agent system prompts include explicit security directives designed to resist manipulation. Agents are instructed to recognize and reject prompt injection attempts in code comments, commit messages, and external content.

26. Signal validation

When agents communicate with each other (task handoffs, status updates, escalations), those signals are validated against expected schemas. Malformed or unexpected signals are rejected, preventing cascade failures.
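
Schema validation for inter-agent signals can be sketched without any external library. The signal types and field names here are invented for illustration; the source does not specify the real schemas.

```python
SCHEMAS = {
    "task_handoff": {"from_agent": str, "to_agent": str, "task_id": str},
    "status_update": {"agent": str, "status": str},
}


def validate_signal(signal: dict) -> list[str]:
    """Return a list of validation errors; an empty list means the signal is accepted."""
    kind = signal.get("type")
    schema = SCHEMAS.get(kind)
    if schema is None:
        return [f"unknown signal type: {kind!r}"]
    errors = []
    for field, expected in schema.items():
        if field not in signal:
            errors.append(f"missing field: {field}")
        elif not isinstance(signal[field], expected):
            errors.append(f"wrong type for field: {field}")
    # Reject fields the schema does not know about instead of passing them along
    for extra in sorted(set(signal) - set(schema) - {"type"}):
        errors.append(f"unexpected field: {extra}")
    return errors
```

Rejecting unknown types and unexpected fields outright means a malformed signal stops at the receiver instead of propagating to the next agent.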

Why this matters

Without these safeguards, autonomous agents are a liability. With them, they're a force multiplier.

Every safeguard was learned from a real incident. We broke things so you don't have to.

"The best safety features are the ones you never notice — until they save you from a disaster."

Want to see these safeguards in action? Get started and see them for yourself.