Synergy Agent Platform — Technical Documentation

Document Purpose: Project approval brief for the Synergy multi-agent AI platform. This document covers the proposal rationale, system architecture, agent design, trade-offs, safety model, and implementation roadmap.

0. Executive Summary

Synergy is a case-study-based proof of concept (PoC) that demonstrates a multi-agent AI platform built for WebMD Health Corp. It houses three specialized AI products, each solving a distinct operational challenge in a regulated healthcare technology environment:

| Product | Mission | Primary Users |
| --- | --- | --- |
| Voyager | AI-powered GitHub PR code review with parallel 3-way analysis and human-in-the-loop PR selection | Engineering teams |
| Kite | Product Requirements Document generator with iterative human refinement loop | Product managers, engineers |
| Nucleus | Operational intelligence for incident log analysis, SEV classification, and runbook generation | SRE/DevOps, incident commanders |

Why now: Healthcare technology organizations face an acute productivity and safety paradox: engineering velocity must increase while regulatory compliance and patient-safety obligations demand higher quality gates. Manual code review, ad-hoc PRD writing, and reactive incident triage are the three biggest drags on WebMD engineering throughput. Synergy targets these bottlenecks with AI agents that amplify, rather than replace, human judgment through structured human-in-the-loop checkpoints.

What this is not: A speculative prototype. Every agent graph, API route, database schema, and safety primitive described in this document is production-quality code, running today on Vercel + Neon Postgres.


1. Proposal & Justification

1.1 Business Problem Statement

Voyager — The Code Review Bottleneck

WebMD engineering teams submit hundreds of PRs monthly. Manual review is time-consuming, inconsistent across reviewers, and creates a compliance risk when reviewers miss HIPAA-relevant data handling patterns. Junior engineers receive delayed feedback; senior engineers spend disproportionate time on review rather than design.

Voyager solves this by performing simultaneous three-dimensional analysis (code quality, documentation completeness, bug/security detection) on any GitHub PR in under 60 seconds, then delivering a structured report that reviewers can act on immediately.

Kite — The PRD Quality Gap

Product requirements documents at WebMD are often written ad-hoc, lack consistent structure, and miss critical acceptance criteria and edge cases for healthcare workflows (e.g., PHI handling, accessibility, fallback states for high-availability requirements). Misaligned requirements discovered late in the development cycle are one of the top sources of rework.

Kite generates comprehensive, structured PRDs from a product idea description, automatically produces acceptance criteria and edge cases with healthcare context, asks targeted clarifying questions, and iterates with the product manager until the document meets their standard — all in a single guided session.

Nucleus — Reactive Incident Response

WebMD's operational environment processes patient health data. Incident response must be fast (patient impact), accurate (avoid mis-remediation), and auditable (regulatory). Current processes involve manual log triage, ad-hoc severity assessment, and runbook lookup — each introducing delay and inconsistency.

Nucleus ingests raw operational logs, classifies incident severity (SEV1–SEV4) with clinical precision, correlates signals across systems, hypothesizes root causes, and for high-severity incidents, auto-generates runbooks in parallel with remediation plans — all while requiring human validation before any action recommendations are finalized.

1.2 Market Context

The healthcare technology sector is experiencing an AI adoption inflection point driven by:

Synergy positions WebMD to lead in responsible AI-augmented engineering — not by automating humans out, but by making every human decision better-informed and faster.

1.3 Value Metrics & ROI Narrative

Diagram

2. Agent & App Architecture

2.1 Platform-Level Architecture

Synergy is a Next.js 15 application deployed on Vercel, using LangGraph for agent orchestration, Neon Postgres for persistence and LangGraph checkpointing, and OpenAI GPT-4o/GPT-4o-mini for intelligence.

Diagram

2.1.1 Database Schema

Diagram

2.1.2 SSE Streaming Data Flow

Diagram
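As a rough illustration of what travels over the SSE stream, the sketch below frames one event in the `text/event-stream` wire format. The helper name `formatSseEvent`, the event names, and the JSON payload shape are assumptions for illustration and are not taken from the Synergy codebase.

```typescript
// Hypothetical helper: serialize one event in the text/event-stream format.
// An event is an optional "event:" line, a "data:" line, and a blank line
// that terminates the event.
function formatSseEvent(data: unknown, event?: string): string {
  const lines: string[] = [];
  if (event) lines.push(`event: ${event}`);
  // JSON.stringify never emits raw newlines, so a single data line suffices.
  lines.push(`data: ${JSON.stringify(data)}`);
  return lines.join("\n") + "\n\n";
}

// Example frames (event names are assumed): a token chunk, then a completion marker.
const chunk = formatSseEvent({ token: "Hello" }, "token");
const done = formatSseEvent({ ok: true }, "done");
```

A route handler would write frames like these to a response whose `Content-Type` is `text/event-stream`; the browser side can consume them with `EventSource` or a streaming `fetch` reader.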

2.2 Voyager — PR Review Agent

Voyager automates GitHub pull request review through a structured multi-stage pipeline with a human-in-the-loop gate for PR selection and parallel analysis execution.

Key capabilities:

Diagram

2.3 Kite — PRD Generator Agent

Kite generates comprehensive, structured PRDs through a sequential generation pipeline followed by a human refinement loop. After a PRD is finalized, subsequent sessions enter a Q&A chat mode with the full document as context.
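The three phases described above (generation, refinement, and post-finalization Q&A) can be pictured as a small mode selector. This is only a sketch under assumed names: `KiteSession`, `KiteMode`, and `nextMode` are hypothetical and do not reflect the actual Kite graph, which uses LangGraph nodes and interrupts.

```typescript
// Assumed session state; field names are illustrative only.
interface KiteSession {
  prd: string | null; // current PRD draft, if any
  finalized: boolean; // set once the product manager accepts the draft
}

type KiteMode = "generate" | "refine" | "chat";

function nextMode(session: KiteSession): KiteMode {
  if (session.prd === null) return "generate"; // no draft yet: run the generation pipeline
  if (!session.finalized) return "refine"; // draft exists: loop with the human reviewer
  return "chat"; // finalized: answer questions with the PRD as context
}
```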

Key capabilities:

Diagram

2.4 Nucleus — Operational Intelligence Agent

Nucleus provides a full incident analysis pipeline, from raw log ingestion through severity classification, signal correlation, and root-cause hypothesis generation to response generation. SEV1/SEV2 incidents trigger runbook and remediation generation in parallel to shorten response time.

Key capabilities:

Diagram

2.5 Shared Infrastructure

All three agents share a common infrastructure layer that provides consistency, performance, and reliability.

Diagram

3. Trade-offs & Design Decisions

The following decisions reflect deliberate engineering choices with clear rationale for a healthcare-regulated, startup-velocity context.

| Decision | Choice Made | Alternative Considered | Rationale |
| --- | --- | --- | --- |
| Agent orchestration | LangGraph StateGraph | Custom orchestration | Built-in checkpointing, interrupt/resume, parallel Send API, and graph visualization; would take months to reimplement reliably |
| Model routing | GPT-4o for reasoning, GPT-4o-mini for parsing/classification | Single model | 3–4× cost reduction on high-frequency operations (log parsing, idea normalization) with no quality loss; GPT-4o reserved for synthesis and judgment tasks |
| Human-in-the-loop | LangGraph interrupt primitive | Webhook + polling | Interrupt maintains graph state atomically, with no separate state machine to manage; webhook would require external state reconciliation |
| Database | Neon Postgres (serverless HTTP) | WebSocket connection / Supabase | HTTP driver works on Vercel Edge without connection pool limits; PostgresSaver for LangGraph is Postgres-native |
| Deployment | Vercel (serverless) | Containers (ECS/GKE) | Zero-ops scaling, instant preview deploys, git integration; acceptable for current load profile; containers deferred to Phase 4 |
| Streaming | SSE (Server-Sent Events) | WebSocket | SSE is unidirectional and stateless, perfect for streaming LLM tokens; WebSocket adds bidirectional complexity not needed for this pattern |
| Rate limiting | In-memory sliding window | Redis | No Redis dependency for MVP; sliding window with cleanup interval is correct and performant; Redis upgrade is Phase 4 |
| Authentication | Session cookie (anonymous) | OAuth / Auth0 | Reduces onboarding friction for demo/pilot; OAuth is Phase 1 of productionalization roadmap |

Diagram
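The model routing decision above amounts to a lookup from task category to model name. The sketch below is one way to express it, assuming a fixed set of categories drawn from the rationale (parsing, classification, synthesis, judgment); the names `TaskKind` and `pickModel` are hypothetical.

```typescript
// Task categories taken from the routing rationale; the type and function names are assumptions.
type TaskKind = "parsing" | "classification" | "synthesis" | "judgment";

// Route high-frequency, low-complexity work to the smaller model and
// reserve the larger model for synthesis and judgment.
function pickModel(task: TaskKind): "gpt-4o" | "gpt-4o-mini" {
  switch (task) {
    case "parsing":
    case "classification":
      return "gpt-4o-mini";
    case "synthesis":
    case "judgment":
      return "gpt-4o";
  }
}
```

Keeping the mapping in one function means a later change of model tier touches a single place rather than every agent node.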

4. Safety, Governance & Trust

Synergy is designed with defense-in-depth across every layer. In a healthcare technology context, trust is non-negotiable.

4.1 Security Architecture

Session Isolation

Every database query is scoped by sessionId, which is read from an HTTP-only cookie, so a user cannot access another user's conversations, agent states, or outputs. The session cookie is set with HttpOnly and SameSite=Lax: HttpOnly makes it inaccessible to JavaScript, and SameSite=Lax mitigates most cross-site request forgery by withholding the cookie from cross-site POST requests.
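For reference, a session cookie with these attributes could be serialized roughly as follows. This is a sketch, not the platform's code: the cookie name `sid`, the `Path=/` attribute, and the `Secure` flag are assumptions, while the 30-day lifetime follows the technology stack table in the appendix.

```typescript
// 30 days expressed in seconds, for the Max-Age attribute.
const THIRTY_DAYS_SECONDS = 30 * 24 * 60 * 60;

// Hypothetical helper: build the Set-Cookie header value for the session cookie.
function buildSessionCookie(sessionId: string, secure = true): string {
  const parts = [
    `sid=${encodeURIComponent(sessionId)}`, // cookie name "sid" is assumed
    "Path=/",
    `Max-Age=${THIRTY_DAYS_SECONDS}`,
    "HttpOnly", // not readable from document.cookie
    "SameSite=Lax", // withheld from cross-site POSTs and subrequests
  ];
  if (secure) parts.push("Secure"); // HTTPS only; assumed for production
  return parts.join("; ");
}
```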

Rate Limiting

A sliding-window rate limiter (20 requests per 60 seconds per session+operation key) limits abuse. The in-memory implementation uses a Map with automatic cleanup every 5 minutes to prevent unbounded growth. Because the state lives in process memory, limits are enforced per server instance; Phase 4 moves the limiter to Redis for distributed enforcement.
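The periodic cleanup can be sketched as a pass over the Map that drops keys with no recent activity. The store shape, the `sessionId:operation` key format, and the function names below are assumptions; the source only states that a Map is cleaned every 5 minutes.

```typescript
// Assumed store shape: rate-limit key -> timestamps (ms) of recent requests,
// appended in ascending order.
type RateStore = Map<string, { timestamps: number[] }>;

// Compose the session+operation key (the separator is an assumption).
function rateKey(sessionId: string, operation: string): string {
  return `${sessionId}:${operation}`;
}

// Drop entries whose newest request is older than the window so the Map
// does not grow without bound. Returns the number of entries removed.
function pruneStale(store: RateStore, now: number, windowMs = 60_000): number {
  let removed = 0;
  store.forEach((entry, key) => {
    const newest = entry.timestamps[entry.timestamps.length - 1] ?? 0;
    if (newest <= now - windowMs) {
      store.delete(key); // deleting during Map.forEach is safe
      removed++;
    }
  });
  return removed;
}

// In a long-lived process this could be scheduled, for example:
// setInterval(() => pruneStale(store, Date.now()), 5 * 60_000);
```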

Input Validation

All API boundaries validate inputs with Zod schemas. Agent node inputs are typed via LangGraph state annotations. User content is never interpolated into system prompts without sanitization.

LLM Safeguards

Audit Trail

The agent_runs table records every agent invocation with its input payload, output, error (if any), duration_ms, and a nodeTrace array capturing per-node execution timing. This creates a full audit trail for compliance review.
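One way to collect per-node timings for a nodeTrace array is to wrap each node call in a small timing helper. The sketch below is synchronous for brevity and uses an injectable clock so it can be tested deterministically; `NodeTraceEntry`, `traced`, and the node name `classify_severity` are hypothetical, and an async variant would `await fn()` inside the same `try`/`finally`.

```typescript
// Assumed entry shape; the source names duration_ms and nodeTrace but not the exact fields.
interface NodeTraceEntry {
  node: string;
  duration_ms: number;
}

// Hypothetical wrapper: run one node function and append its timing to the trace.
function traced<T>(
  node: string,
  trace: NodeTraceEntry[],
  fn: () => T,
  now: () => number = Date.now,
): T {
  const start = now();
  try {
    return fn();
  } finally {
    // Recorded even when the node throws, so failed runs stay auditable.
    trace.push({ node, duration_ms: now() - start });
  }
}

// Example usage with an illustrative node name.
const nodeTrace: NodeTraceEntry[] = [];
const severity = traced("classify_severity", nodeTrace, () => "SEV2");
```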

No PII in Logs

Application logs contain no PHI or PII. Log entries reference only IDs (conversationId, sessionId) and node names, never content.
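A logging helper can enforce this rule structurally by copying only an allow-list of ID fields into each entry. The following is a minimal sketch under assumed names (`LogContext`, `logLine`); it is not the platform's logger.

```typescript
// Only identifiers and node names are allowed in a log entry.
interface LogContext {
  conversationId?: string;
  sessionId?: string;
  node?: string;
}

// Hypothetical helper: build one JSON log line from an event name and IDs.
function logLine(event: string, ctx: LogContext): string {
  // Copy only the allow-listed fields; any extra properties on ctx are dropped.
  const { conversationId, sessionId, node } = ctx;
  // JSON.stringify omits properties whose value is undefined.
  return JSON.stringify({ event, conversationId, sessionId, node });
}
```

Because the function destructures specific fields rather than spreading `ctx`, a caller that accidentally passes an object carrying message content still produces an entry with IDs only.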

Diagram

4.2 HIPAA & Healthcare Compliance Considerations


5. Project & Implementation Plan

5.1 Current State — What's Built

The following capabilities are fully implemented and functional today:

| Capability | Voyager | Kite | Nucleus |
| --- | --- | --- | --- |
| Core agent graph | ✅ Complete | ✅ Complete | ✅ Complete |
| Human-in-the-loop | ✅ PR selection | ✅ PRD review | ✅ Hypothesis validation |
| SSE streaming | | | |
| LangGraph checkpointing | ✅ Postgres | ✅ Postgres | ✅ Postgres |
| Session isolation | | | |
| Rate limiting | | | |
| Conversation persistence | | | |
| Parallel execution | ✅ 3-way review | ❌ Sequential | ✅ SEV1/2 parallel |
| Follow-up chat mode | | | |
| GitHub integration | ✅ Repos + PRs | ❌ N/A | ❌ N/A |
| Mock data / demo mode | | | |

5.2 Productionalization Roadmap

Diagram

5.3 Productionalization Maturity Levels

Diagram

5.4 Scaling Architecture — Future State

Diagram

5.5 Functional Trustworthiness

A production AI system must be testable, observable, and auditable. Synergy's quality strategy:

Testing Strategy

Observability

Human-in-the-Loop as Quality Gate

The interrupt pattern is not just UX; it is a quality gate. No action recommendations (remediation steps, PR merge decisions) are issued without explicit human validation. This is the primary safeguard against LLM overconfidence in high-stakes scenarios.
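The gate can be pictured as a status field that only a human decision can move out of "pending". The sketch below is plain TypeScript and deliberately does not use the LangGraph interrupt primitive; the `Recommendation` shape and both function names are assumptions made for illustration.

```typescript
// Assumed model: a recommendation stays "pending" until a human decides on it.
type ReviewStatus = "pending" | "approved" | "rejected";

interface Recommendation {
  id: string;
  text: string;
  review: ReviewStatus;
}

// Only explicitly approved recommendations are released downstream.
function releasable(recs: Recommendation[]): Recommendation[] {
  return recs.filter((r) => r.review === "approved");
}

// Record a human decision; returns a new object rather than mutating the input.
function decide(rec: Recommendation, approved: boolean): Recommendation {
  return { ...rec, review: approved ? "approved" : "rejected" };
}
```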


6. Appendix

6.1 Technology Stack Reference

| Layer | Technology | Version / Notes |
| --- | --- | --- |
| Framework | Next.js | 15.x (App Router) |
| Language | TypeScript | 5.x, strict mode |
| Agent Orchestration | LangGraph (@langchain/langgraph) | 0.2.x |
| LLM Integration | LangChain OpenAI (@langchain/openai) | Latest |
| LLM Models | GPT-4o, GPT-4o-mini | OpenAI API |
| Database | Neon Postgres (serverless) | HTTP driver for edge compat |
| ORM | Drizzle ORM | Type-safe schema + migrations |
| Checkpointing | @langchain/langgraph-checkpoint-postgres | PostgresSaver |
| UI Components | shadcn/ui + Tailwind CSS | Radix UI primitives |
| Markdown Rendering | react-markdown + remark-gfm | With Mermaid diagram support |
| Diagram Rendering | mermaid.js | Client-side rendering |
| Deployment | Vercel | Serverless, edge-compatible |
| Session | HTTP-only cookie (uuid v4) | 30-day TTL |
| Rate Limiting | In-memory sliding window | → Redis in Phase 4 |
| GitHub Integration | GitHub REST API | Read-only token scoping |

6.2 Environment Configuration

| Variable | Purpose | Required |
| --- | --- | --- |
| DATABASE_URL | Neon Postgres connection string | Yes |
| OPENAI_API_KEY | OpenAI API access | Yes |
| GITHUB_TOKEN | GitHub API read access | Yes (Voyager) |
| NEXT_PUBLIC_APP_URL | App base URL for absolute links | Production |
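A startup check for the required variables might look like the sketch below. The variable names come from the configuration table; the function `missingEnv` and its `usesVoyager` flag are hypothetical, and the environment is passed in as a plain object so the check stays testable.

```typescript
// Environment as a plain lookup so the check does not depend on process.env.
type Env = Record<string, string | undefined>;

// Hypothetical helper: list the required variables that are unset or empty.
function missingEnv(env: Env, usesVoyager = true): string[] {
  const required = ["DATABASE_URL", "OPENAI_API_KEY"];
  if (usesVoyager) required.push("GITHUB_TOKEN"); // only needed by Voyager
  // Treat unset and empty-string values as missing.
  return required.filter((name) => !env[name]);
}
```

In a Node.js runtime this could be called as `missingEnv(process.env)` during boot, failing fast when the returned list is non-empty.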

6.3 Key Code Artifacts

LLM Singleton Pattern — prevents redundant model initialization:

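typescript
// src/lib/agents/shared/llm.ts
import { ChatOpenAI } from "@langchain/openai";

// Module-level cache so every caller shares one client instance.
let _reasoningModel: ChatOpenAI | null = null;

export function getReasoningModel(): ChatOpenAI {
  // Reuse the cached instance after the first call.
  if (_reasoningModel) return _reasoningModel;
  _reasoningModel = new ChatOpenAI({
    modelName: "gpt-4o",
    temperature: 0, // minimize sampling randomness
    openAIApiKey: process.env.OPENAI_API_KEY,
  });
  return _reasoningModel;
}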

Parallel Fan-out via Send API — Voyager's 3-way parallel review:

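typescript
// src/lib/agents/voyager/graph.ts
// Excerpt from the graph builder chain; Send is imported from "@langchain/langgraph".
.addConditionalEdges("analyze_diff", (state) => {
  // Send delivers its second argument to the target node as that node's input,
  // so the current state is forwarded to each of the three reviewers.
  return [
    new Send("code_quality_review", state),
    new Send("doc_review", state),
    new Send("bug_check", state),
  ];
})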

Severity-Conditional Parallel Execution — Nucleus SEV1/2 parallel runbook:

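typescript
// src/lib/agents/nucleus/graph.ts
// Excerpt from the graph builder chain; Send is imported from "@langchain/langgraph".
.addConditionalEdges("human_validate", (state) => {
  // High-severity incidents get a runbook drafted in parallel with remediation.
  if (state.severity === "SEV1" || state.severity === "SEV2") {
    return [
      new Send("generate_remediation", state),
      new Send("draft_runbook", state),
    ];
  }
  // Lower severities only need a remediation plan.
  return [new Send("generate_remediation", state)];
})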

Session-Scoped Rate Limiting:

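typescript
// src/lib/rate-limit.ts
// In-memory store: request timestamps per key (the name "store" is assumed;
// the original excerpt omits this setup).
const store = new Map<string, { timestamps: number[] }>();

export function rateLimit(
  key: string,
  limit = 20,
  windowMs = 60_000,
): { success: boolean; remaining: number } {
  const now = Date.now();
  const windowStart = now - windowMs;
  // Fetch the entry for this key, creating it on first use.
  let entry = store.get(key);
  if (!entry) {
    entry = { timestamps: [] };
    store.set(key, entry);
  }
  // Sliding window: prune timestamps outside the current window.
  entry.timestamps = entry.timestamps.filter((t) => t > windowStart);
  if (entry.timestamps.length >= limit) return { success: false, remaining: 0 };
  entry.timestamps.push(now);
  return { success: true, remaining: limit - entry.timestamps.length };
}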

6.4 Glossary

| Term | Definition |
| --- | --- |
| Agent | An AI system that perceives state, calls LLMs and tools, and produces actions |
| StateGraph | LangGraph's graph type where each node reads and writes to a shared typed state object |
| Checkpointer | LangGraph persistence layer that saves graph state after each node; enables interrupt/resume |
| Interrupt | LangGraph mechanism to pause graph execution and yield control to a human |
| Send API | LangGraph primitive for parallel fan-out: dispatches multiple nodes simultaneously |
| SSE | Server-Sent Events: HTTP streaming protocol for pushing events from server to browser |
| SEV1–SEV4 | Incident severity levels: SEV1 (critical, patient impact) → SEV4 (minor, no user impact) |
| PRD | Product Requirements Document: structured specification for a software feature or product |
| HITL | Human-in-the-Loop: explicit human decision point within an automated agent workflow |
| RAG | Retrieval-Augmented Generation: not used in current Synergy MVP; future enhancement |
| BAA | Business Associate Agreement: HIPAA contract between covered entities and vendors |
| RBAC | Role-Based Access Control: permission system controlling what users can access |

Thank you for your time and consideration. Synergy was built with care — every design decision, agent graph, and safety primitive reflects a genuine belief that AI should make human judgment sharper, not replace it.

Abhishek Choudhury

ValueLabs · March 2026