A production-style, voice-enabled chat interface that connects a React front end with a cookie-auth REST API and a Wit.ai agent for intent recognition and speech synthesis/transcription. The app implements: chatroom discovery, retrieval of latest messages, login/logout, “who am I”, and a delegated, multi-turn flow for creating a post in a chosen chatroom. Sensitive inputs are masked end-to-end in the client UI.
Agent orchestration layer. I wrote the `ChatAgent` that interprets free text/voice via Wit.ai and dispatches to concrete behaviors. The agent bootstraps by fetching available chatrooms from the backend (so downstream intents operate over real data), then routes user prompts to specific handlers based on the first-ranked intent (`get_help`, `get_chatrooms`, `get_messages`, `login`, `register`, `create_message`, `logout`, `whoami`) and returns either a single response or a timed sequence to emulate conversational pacing. This design localizes intent resolution and API I/O in one cohesive module for testability and clarity.
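The dispatch idea can be sketched roughly as follows (a minimal illustration, not the actual module: the `WitIntent` shape, the handler map, and the stub replies are all assumptions):

```typescript
// Hypothetical sketch: route the first-ranked Wit.ai intent to a handler.
// Intent names mirror those listed above; handler bodies are stubs.
type WitIntent = { name: string; confidence: number };

type Handler = (text: string) => string | string[];

const handlers: Record<string, Handler> = {
  get_help: () => "Try: 'show chatrooms' or 'log me in'.",
  get_chatrooms: () => ["Here are the chatrooms I found:", "(list goes here)"],
  logout: () => "You have been logged out.",
};

// Wit.ai's /message response carries intents ranked by confidence;
// we dispatch on the first one and fall back to help when nothing matches.
function dispatch(intents: WitIntent[], text: string): string | string[] {
  const top = intents[0];
  const handler = top ? handlers[top.name] : undefined;
  return handler ? handler(text) : handlers.get_help(text);
}
```

Returning `string | string[]` lets the UI decide between a single bubble and a staggered sequence.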
Stateful dialog via delegation. For multi-turn tasks I implemented a deterministic delegator with sub-agents. The delegator owns a single “active” delegate (LOGIN, REGISTER, CREATE) and forwards subsequent messages until the flow ends. This makes the top-level agent thin and lets each flow implement a precise finite-state sequence (initialize → follow-ups → terminate). I implemented the Login and Create Post sub-agents; the delegator supports Register but that module is intentionally not included here.
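The single-active-delegate pattern can be sketched like this (interface names and the `done` signal are illustrative assumptions, not the real API):

```typescript
// Hypothetical sketch of the single-active-delegate pattern.
// A sub-agent reports whether its multi-turn flow has ended.
interface SubAgent {
  start(): string;                                 // first prompt of the flow
  handle(msg: string): { reply: string; done: boolean };
}

class ChatDelegator {
  private active: SubAgent | null = null;

  get isDelegating(): boolean { return this.active !== null; }

  beginDelegation(agent: SubAgent): string {
    this.active = agent;
    return agent.start();
  }

  // Forward every message to the active delegate until it signals completion.
  handleDelegation(msg: string): string {
    if (!this.active) throw new Error("no active delegation");
    const { reply, done } = this.active.handle(msg);
    if (done) this.endDelegation();
    return reply;
  }

  endDelegation(): void { this.active = null; }
}
```

The top-level agent only needs to check `isDelegating` before intent resolution, which is what keeps it thin.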
Login flow with privacy by design. The login sub-agent first refuses to proceed if the user is already logged in, then collects the username and masks the subsequent credential prompt. The client renders the user’s entry as “Sensitive information redacted!” and sets the input field type to password during collection; on submission the agent posts credentials to the cookie-auth endpoint and returns success/failure with an emote. This balances usability (chat-like login) with responsibility (no secrets in the chat log).
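The masking decision reduces to one small piece of logic (a sketch; `echoFor` and the flag name are hypothetical, the redaction string is the one the UI actually shows):

```typescript
// Hypothetical sketch: decide how the client echoes a user entry.
// When the agent flags the next field as sensitive, the input renders
// as a password field and the transcript stores a neutral placeholder.
const REDACTED = "Sensitive information redacted!";

function echoFor(entry: string, nextIsSensitive: boolean) {
  return {
    inputType: nextIsSensitive ? "password" : "text",
    transcriptText: nextIsSensitive ? REDACTED : entry,
  };
}
```

The secret itself is only held long enough to POST to the auth endpoint; it never enters message state.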
Create-post flow with confirmation. The post sub-agent requires an authenticated session and a valid chatroom (from the NLU entity), then collects title → content → explicit confirmation (using a confirmation intent round-trip through Wit.ai). Only on affirmative confirmation does it POST to the API; otherwise it cancels with a clear, user-safe message. This reduces accidental writes and teaches the system to be intent-safe.
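The confirm-before-write gate can be sketched as a pure function (names like `buildPost` and the `"confirmation"` intent label are illustrative assumptions; the real Wit.ai app defines its own intent names):

```typescript
// Hypothetical sketch of the confirm-before-write gate: the draft only
// becomes a POST descriptor on an affirmative NLU intent; anything else
// cancels with a user-safe message.
interface Draft { chatroom: string; title: string; content: string }

type Outcome =
  | { kind: "post"; url: string; body: string }
  | { kind: "cancel"; message: string };

function buildPost(draft: Draft, topIntent: string | undefined): Outcome {
  if (topIntent !== "confirmation") {
    return { kind: "cancel", message: "Okay, I won't post that." };
  }
  return {
    kind: "post",
    url: `/messages?chatroom=${encodeURIComponent(draft.chatroom)}`,
    body: JSON.stringify({ title: draft.title, content: draft.content }),
  };
}
```

Treating "no recognized confirmation" the same as an explicit "no" is what makes the flow intent-safe by default.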
Voice input and synthesis hooks. I wired an `AudioRecorder` that captures, compresses, and hands raw audio to a synthesizer that performs streaming dictation via Wit.ai’s `/dictation` endpoint and returns the final transcript to the text input; the synthesizer also exposes TTS via `/synthesize` for short replies. The UI shows a clear recording state, allows playback, and avoids blocking the input form. The synthesizer is modular and returns an object URL for the audio, keeping the chat loop snappy.
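Assembling the transcript from the streamed response can be sketched as follows (an assumption-laden illustration: it treats the body as line-delimited JSON chunks with `text` and `is_final` fields, as described above, and skips anything that doesn't parse):

```typescript
// Hypothetical sketch: build the final transcript from a dictation stream.
// Only chunks marked is_final contribute; partial hypotheses are dropped.
function transcriptFromStream(raw: string): string {
  const finals: string[] = [];
  for (const line of raw.split("\n")) {
    const trimmed = line.trim();
    if (!trimmed) continue;
    try {
      const chunk = JSON.parse(trimmed);
      if (chunk.is_final && typeof chunk.text === "string") {
        finals.push(chunk.text);
      }
    } catch {
      // Partial or non-JSON lines in the stream are skipped.
    }
  }
  return finals.join(" ");
}
```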
Chat UX and progressive delivery. The TextApp container maintains message state, injects the agent/synthesizer into context, and progressively renders batched AI replies (staggered by 750ms) to mimic conversational cadence. It also auto-scrolls the list on new messages and shows a loader while agent work is in flight. This keeps the experience readable and voice-friendly.
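The staggered delivery can be sketched as a small scheduler (illustrative only; the injectable timer parameter is an assumption added here to make the logic testable):

```typescript
// Hypothetical sketch of staggered delivery: each reply in a batch is
// scheduled 750 ms after the previous one to mimic conversational pacing.
const STAGGER_MS = 750;

function scheduleReplies(
  replies: string[],
  deliver: (msg: string) => void,
  setTimer: (fn: () => void, ms: number) => void = setTimeout,
): number[] {
  const delays = replies.map((_, i) => i * STAGGER_MS);
  replies.forEach((msg, i) => setTimer(() => deliver(msg), delays[i]));
  return delays; // [0, 750, 1500, ...] for inspection/testing
}
```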
Backend contract (read-only + guarded writes). The app integrates with a campus chat API that exposes chatrooms and messages and supports cookie-based sessions for login/logout and posting. All requests include a required X-CS571-ID header; write operations include credentials and JSON bodies with server-enforced size limits. This contract is respected strictly by the agent/sub-agents.
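The per-request contract can be sketched as a small request builder (a sketch under stated assumptions: the helper name and return shape are hypothetical; the header name, cookie requirement, and JSON bodies are from the contract above):

```typescript
// Hypothetical sketch of the request contract: every call carries the
// X-CS571-ID header; writes add a JSON content type, a JSON body, and
// cookies for the session.
interface ApiRequestInit {
  method: "GET" | "POST";
  headers: Record<string, string>;
  credentials: "include";
  body?: string;
}

function buildRequestInit(csId: string, body?: object): ApiRequestInit {
  const headers: Record<string, string> = { "X-CS571-ID": csId };
  if (body !== undefined) {
    headers["Content-Type"] = "application/json";
    return { method: "POST", headers, credentials: "include", body: JSON.stringify(body) };
  }
  return { method: "GET", headers, credentials: "include" };
}
```

Centralizing this in one builder is one way to guarantee no call ever omits the required header.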
Front end (React).
- `TextApp` is the orchestration point: it creates the chat agent and synthesizer, manages state, masks sensitive entries, and handles streaming delivery of messages. It switches the input to `password` type when the agent indicates the next field is sensitive and replaces the visible echo with a redaction string. The component renders `TextAppMessageList` and a compact `AudioRecorder` docked to the input for voice.
- `TextAppMessageList` auto-scrolls to the latest entry using a ref, ensuring keyboard and screen-reader users land on the newest content without manual scroll.
- `AudioRecorder` presents a crisp mic → recording → stop loop with visible spinners, a replay control, and a transcription callback into the input field. It calls into the synthesizer to run dictation.
Agent layer (Wit.ai + flows).
- `ChatAgent` boots by fetching chatrooms, then sends user text to Wit.ai `/message`, checks the top intent, and dispatches. It returns either a single string or an array, which the UI renders with staggered timing. It implements read flows (`get_chatrooms`, `get_messages`), session flows (`login`, `logout`, `whoami`), and task delegation entry points (`create_message`, `register`).
- `ChatDelegator` guarantees one active sub-agent and exposes `beginDelegation`, `handleDelegation`, and `endDelegation`. This isolates conversational state machines and lets each flow own its state and follow-ups cleanly.
- `LoginSubAgent` runs a two-stage flow (username → masked pin) and hits `/login` with `credentials: "include"`; it returns success/failure with emote metadata for the UI.
- `CreatePostSubAgent` enforces auth and chatroom presence, collects `title`/`content`, asks for explicit confirmation via Wit.ai, and then posts to `/messages?chatroom=...`. It returns structured success (with a success emote) or cancellation/errors.
- `ChatSynthesizer` streams final segments from Wit.ai `/dictation` by parsing the server’s line-delimited JSON and concatenating `is_final` chunks; TTS returns a browser-safe object URL for playback.

Backend contract (what’s required and how I adhere).
- Endpoints include `/chatrooms`, `/messages`, `/register`, `/login`, `/logout`, `/whoami`; messages can be filtered by `chatroom` and `num`, and protected ops require a valid cookie session. All requests include a valid `X-CS571-ID` header and proper JSON content types for POSTs. The agent and sub-agents enforce these protocol details.
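As an illustration, the optional `chatroom`/`num` filters can be composed like this (a sketch; `base` is a placeholder for the campus API root, and the helper name is hypothetical):

```typescript
// Hypothetical sketch: build the messages URL with optional filters.
// URLSearchParams handles encoding of chatroom names with spaces, etc.
function messagesUrl(base: string, chatroom?: string, num?: number): string {
  const params = new URLSearchParams();
  if (chatroom) params.set("chatroom", chatroom);
  if (num !== undefined) params.set("num", String(num));
  const qs = params.toString();
  return qs ? `${base}/messages?${qs}` : `${base}/messages`;
}
```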
- Privacy by default. When the agent requests sensitive input (e.g., the login pin), the UI flips the field to `password`, suppresses the raw echo in the transcript, and stores a neutral “Sensitive information redacted!” message instead of the secret. This prevents shoulder-surfing and log exfiltration of credentials without complicating the flow for the user.
- Explicit confirmation for writes. Posting requires a clear, affirmative confirmation via the NLU before touching the API. If the confirmation is absent or negative, the agent cancels the operation safely.
- Error-aware, human feedback. Success and failure states propagate up with emote metadata so the UI can shift avatar state (success/error), reinforcing clarity for users.
- Accessibility & flow. Autoscroll on new messages, loader indicators while the agent works, and progressive delivery timing keep the conversation readable and accessible without overwhelming the user.
- Least surprise API usage. The agent strictly follows backend requirements: cookie session for protected routes, required header for all calls, and body size constraints on posts. This avoids ambiguous auth states and ensures predictable results.
- `VITE_WITAIAccessToken` for Wit.ai client access (used by the agent and sub-agents).
- Ensure your browser sends/accepts cookies; protected endpoints require cookie auth.
```shell
# install deps and run server
npm install
npm run dev
# open http://localhost:5173
```

Check out the demo video: https://github.com/DeboJp/Accessible-Campus-Chat-Agent/edit/main/BadgerChat_GenAI.mp4
