Fix Session reconciler error in stream.space#118

Merged
JoshuaAFerguson merged 14 commits into main from claude/fix-session-reconciler-error-013WosBjy6ohkEvXEt6Pfmxy
Nov 20, 2025

@JoshuaAFerguson

No description provided.

Fix error "template brave is not valid: Template is valid and ready to use"
which occurs when the session controller reads a stale cache where Valid=false
but Message contains the success message from a previous validation.

When Valid=false but Message="Template is valid and ready to use", the
controller now waits and requeues instead of failing, allowing the cache
to sync with the template controller's status update.
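The stale-cache check described above can be sketched roughly as follows. This is a minimal illustration, not the controller's actual code: the `TemplateStatus` type, the `checkTemplate` helper, and the two-second delay are assumptions made for the example.

```go
package main

import (
	"fmt"
	"time"
)

// validMessage is the success message quoted in the error above.
const validMessage = "Template is valid and ready to use"

// TemplateStatus mirrors only the two status fields discussed here (hypothetical type).
type TemplateStatus struct {
	Valid   bool
	Message string
}

// checkTemplate returns a non-zero requeue delay when the status looks stale,
// an error when the template is genuinely invalid, and (0, nil) when it is usable.
func checkTemplate(name string, st TemplateStatus) (time.Duration, error) {
	if st.Valid {
		return 0, nil
	}
	if st.Message == validMessage {
		// Valid=false together with the success message is contradictory,
		// so treat it as a stale cache read and retry later.
		return 2 * time.Second, nil // delay is an assumed value
	}
	return 0, fmt.Errorf("template %s is not valid: %s", name, st.Message)
}

func main() {
	delay, err := checkTemplate("brave", TemplateStatus{Valid: false, Message: validMessage})
	fmt.Println(delay, err) // prints "2s <nil>"
}
```

In a controller-runtime reconciler the returned delay would typically be passed on as a requeue interval instead of surfacing an error.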

The Template CRD was missing the status.valid field that the controller
expects. This caused the warning "unknown field status.valid" and prevented
template validation status from being persisted.

Changes:
- Replace status.phase with status.valid boolean field
- Add Valid column to kubectl output
- Update field descriptions

Applications are no longer automatically disabled when their folder path
doesn't exist. Instead, they remain enabled and the controller sync
mechanism will recreate missing templates via AppInstallEvents.

This prevents applications from being incorrectly disabled when the
controller hasn't yet created the Kubernetes resources.

… missing

When the database doesn't have the session URL or phase, fetch the status
directly from Kubernetes. This fixes the issue where:
- Resources are not showing in the UI
- Connect button is unclickable (URL missing)

The database cache may not have the latest status because the controller
updates Kubernetes status but doesn't communicate it back to the API via NATS.
This fix ensures the UI always gets the latest session status.
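A rough sketch of that fallback, assuming a hypothetical `SessionStatus` type and a `StatusFetcher` callback standing in for the Kubernetes lookup (neither name comes from the PR):

```go
package main

import "fmt"

// SessionStatus holds the two fields the UI depends on (hypothetical type).
type SessionStatus struct {
	Phase string
	URL   string
}

// StatusFetcher stands in for a live Kubernetes lookup (assumed signature).
type StatusFetcher func(sessionID string) (SessionStatus, error)

// resolveStatus returns the cached status when it is complete and otherwise
// fills the missing fields from the live lookup.
func resolveStatus(id string, cached SessionStatus, fetch StatusFetcher) SessionStatus {
	if cached.URL != "" && cached.Phase != "" {
		return cached
	}
	live, err := fetch(id)
	if err != nil {
		return cached // keep whatever the cache had
	}
	if cached.URL == "" {
		cached.URL = live.URL
	}
	if cached.Phase == "" {
		cached.Phase = live.Phase
	}
	return cached
}

func main() {
	fetch := func(string) (SessionStatus, error) {
		return SessionStatus{Phase: "Running", URL: "/sessions/s1"}, nil
	}
	got := resolveStatus("s1", SessionStatus{}, fetch)
	fmt.Println(got.Phase, got.URL)
}
```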

…eRequirements

The Session CRD had a flat resources schema with just memory/cpu fields,
but the Go types and event handler use the standard Kubernetes
ResourceRequirements structure with nested requests/limits.

This was causing "unknown field spec.resources.limits" and
"unknown field spec.resources.requests" warnings when creating sessions,
preventing resources from being properly set on session pods.

Updated the CRD schema to use the proper structure with:
- resources.requests (minimum resources required)
- resources.limits (maximum resources allowed)

Both use the standard Kubernetes quantity format with x-kubernetes-int-or-string.

… missing

This commit addresses issues with session URL and resources not showing in the UI:

1. Cache sessions to database on creation
   - CreateSession now caches the session to the database immediately
   - This ensures status updates via NATS can find the session to update

2. Update k8s client to parse nested resources structure
   - parseSession now handles the new ResourceRequirements structure
   - Supports both requests/limits format and flat format for compatibility

3. Add publishSessionStatusWithURL function
   - New function allows publishing status events that include URL and pod name
   - Original publishSessionStatus delegates to the new function

The API's convertDBSessionToResponse already has a fallback to fetch from
Kubernetes when the database cache is missing the URL, so sessions should
now display correctly once these changes are deployed.
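To illustrate item 2, here is a hedged sketch of parsing both resource layouts from an unstructured map. The `Resources` type and `parseResources` helper are hypothetical, and treating the flat cpu/memory fields as requests is an assumption, not something the PR states.

```go
package main

import "fmt"

// Resources holds the parsed quantities as plain strings (hypothetical type).
type Resources struct {
	RequestsCPU, RequestsMemory string
	LimitsCPU, LimitsMemory     string
}

// str reads a quantity that may be stored as a string or a number,
// matching the int-or-string schema.
func str(m map[string]interface{}, key string) string {
	v, ok := m[key]
	if !ok || v == nil {
		return ""
	}
	return fmt.Sprint(v)
}

// parseResources accepts the nested requests/limits layout and falls back
// to the older flat layout when neither nested key is present.
func parseResources(raw map[string]interface{}) Resources {
	var r Resources
	nested := false
	if req, ok := raw["requests"].(map[string]interface{}); ok {
		nested = true
		r.RequestsCPU, r.RequestsMemory = str(req, "cpu"), str(req, "memory")
	}
	if lim, ok := raw["limits"].(map[string]interface{}); ok {
		nested = true
		r.LimitsCPU, r.LimitsMemory = str(lim, "cpu"), str(lim, "memory")
	}
	if !nested {
		// Flat format: interpreted as requests here (assumption).
		r.RequestsCPU, r.RequestsMemory = str(raw, "cpu"), str(raw, "memory")
	}
	return r
}

func main() {
	nested := map[string]interface{}{
		"requests": map[string]interface{}{"cpu": "500m", "memory": "1Gi"},
		"limits":   map[string]interface{}{"cpu": 2, "memory": "2Gi"},
	}
	fmt.Printf("%+v\n", parseResources(nested))
}
```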

When a user tries to launch a session but the Kubernetes Template CRD is
missing (even though the application shows as installed), the API now
automatically triggers a reinstallation:

1. Queries the application details from the database
2. Publishes an AppInstallEvent to trigger the controller
3. Updates install_status to 'creating' to track progress
4. Returns 503 Service Unavailable, asking the user to retry

This handles the case where templates are deleted from Kubernetes but
the database still has the application record. The user just needs to
click the launch button again after a few seconds.
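The four steps above could look roughly like this. The `AppStore` and `Publisher` interfaces, the in-memory fakes, and the function name are all invented for the sketch; only the 'creating' status and the 503 response come from the description.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
)

// App is a minimal application record (hypothetical type).
type App struct{ ID, Name string }

// AppStore and Publisher are assumed abstractions over the database and NATS.
type AppStore interface {
	GetApp(id string) (App, error)
	SetInstallStatus(id, status string) error
}

type Publisher interface {
	PublishAppInstall(app App) error
}

// recoverMissingTemplate runs the reinstall steps and returns the HTTP status
// the API would send, plus a message for the caller.
func recoverMissingTemplate(store AppStore, pub Publisher, appID string) (int, error) {
	app, err := store.GetApp(appID) // 1. query application details
	if err != nil {
		return http.StatusInternalServerError, fmt.Errorf("load application: %w", err)
	}
	if err := pub.PublishAppInstall(app); err != nil { // 2. trigger the controller
		return http.StatusInternalServerError, fmt.Errorf("publish install event: %w", err)
	}
	if err := store.SetInstallStatus(appID, "creating"); err != nil { // 3. track progress
		return http.StatusInternalServerError, fmt.Errorf("update install status: %w", err)
	}
	// 4. ask the user to retry
	return http.StatusServiceUnavailable, errors.New("template is being reinstalled, retry in a few seconds")
}

// In-memory fakes so the sketch runs on its own.
type memStore struct {
	apps   map[string]App
	status map[string]string
}

func (s *memStore) GetApp(id string) (App, error) {
	a, ok := s.apps[id]
	if !ok {
		return App{}, errors.New("application not found")
	}
	return a, nil
}

func (s *memStore) SetInstallStatus(id, status string) error {
	s.status[id] = status
	return nil
}

type recorder struct{ published []App }

func (r *recorder) PublishAppInstall(a App) error {
	r.published = append(r.published, a)
	return nil
}

func main() {
	store := &memStore{apps: map[string]App{"1": {ID: "1", Name: "brave"}}, status: map[string]string{}}
	code, err := recoverMissingTemplate(store, &recorder{}, "1")
	fmt.Println(code, err, store.status["1"])
}
```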

The ApplicationInstallReconciler was updating the CRD status but not
publishing AppStatusEvent via NATS to notify the API. This caused
applications to remain in "creating" or "pending" status in the database
even after templates were successfully created.

Changes:
- Add NATS connection and controller ID to ApplicationInstallReconciler
- Add publishAppStatus method to send status events to API
- Publish "installed" status when template is created or already exists
- Publish "failed" status when template creation fails
- Update main.go to create NATS connection for the reconciler

Now when applications are reinstalled or templates are created, the API
database will be updated with the correct install_status.

The UI Connect button checks if session.status.phase === 'Running' but
the API was storing event.Status (like 'created') instead of event.Phase
(like 'Running'). This caused the Connect button to remain disabled even
when sessions were running successfully.

Also stores pod_name from the event for completeness.

The SessionReconciler was updating the CRD status with Phase and URL but
never publishing to NATS. This meant the API database was never updated,
so the UI's Connect button remained disabled (it requires phase = "Running").

Changes:
- Add NATS connection fields to SessionReconciler struct
- Add publishSessionStatus method to publish status events
- Call publishSessionStatus in handleRunning/handleHibernated/handleTerminated
- Create NATS connection for SessionReconciler in main.go

The controller now publishes session status events after updating the CRD,
allowing the API to update its database and enabling the Connect button.
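A minimal sketch of the publish step, written against a small interface so it runs without a NATS server. The `Publish(subject, data)` shape matches the method of that name on the NATS Go client's connection type; the event fields and the `sessions.status` subject are assumptions for illustration only.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Publisher is the subset of a NATS connection this sketch needs.
type Publisher interface {
	Publish(subject string, data []byte) error
}

// SessionStatusEvent is a hypothetical payload; field names are assumed.
type SessionStatusEvent struct {
	SessionID    string `json:"session_id"`
	Phase        string `json:"phase"`
	URL          string `json:"url,omitempty"`
	PodName      string `json:"pod_name,omitempty"`
	ControllerID string `json:"controller_id"`
}

// publishSessionStatus serializes the event and publishes it. A nil publisher
// is treated as "NATS not configured" and skipped.
func publishSessionStatus(p Publisher, subject string, ev SessionStatusEvent) error {
	if p == nil {
		return nil
	}
	data, err := json.Marshal(ev)
	if err != nil {
		return fmt.Errorf("marshal session status: %w", err)
	}
	return p.Publish(subject, data)
}

// capture records the last published message, standing in for a real connection.
type capture struct {
	subject string
	data    []byte
}

func (c *capture) Publish(subject string, data []byte) error {
	c.subject, c.data = subject, data
	return nil
}

func main() {
	c := &capture{}
	err := publishSessionStatus(c, "sessions.status", SessionStatusEvent{
		SessionID: "s1", Phase: "Running", URL: "/sessions/s1", ControllerID: "ctrl-1",
	})
	fmt.Println(err, c.subject, string(c.data))
}
```

Calling this after the CRD status update, as the commit describes, keeps the published phase consistent with what was just written to Kubernetes.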

The WebSocket session broadcast was returning status fields with PascalCase
keys (Phase, PodName, URL) but the UI expects camelCase (phase, podName, url).

This caused the Connect button check `session.status.phase !== 'Running'`
to always evaluate to true, keeping the button disabled: JavaScript property
names are case-sensitive, so a payload carrying 'Phase' leaves 'phase' undefined.

Explicitly map the status fields to the correct JSON key names.
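One way to get the expected key names in Go is with struct tags, sketched below. Whether the actual fix uses tags or builds a map by hand is not stated here, and the type and function names are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// untaggedStatus marshals with the Go field names as keys (Phase, PodName, URL).
type untaggedStatus struct {
	Phase   string
	PodName string
	URL     string
}

// sessionStatus maps each field to the camelCase key the UI reads.
type sessionStatus struct {
	Phase   string `json:"phase"`
	PodName string `json:"podName"`
	URL     string `json:"url"`
}

// encodeStatus returns the JSON the UI would receive over the WebSocket.
func encodeStatus(phase, podName, url string) (string, error) {
	b, err := json.Marshal(sessionStatus{Phase: phase, PodName: podName, URL: url})
	return string(b), err
}

func main() {
	before, _ := json.Marshal(untaggedStatus{"Running", "pod-1", "/sessions/s1"})
	fmt.Println(string(before)) // {"Phase":"Running","PodName":"pod-1","URL":"/sessions/s1"}

	after, _ := encodeStatus("Running", "pod-1", "/sessions/s1")
	fmt.Println(after) // {"phase":"Running","podName":"pod-1","url":"/sessions/s1"}
}
```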

When an application is installed from the catalog, the following are now stored:
- Icon binary data (downloaded from icon URL)
- Icon media type (MIME type)
- Description, category, manifest from the template

This enables applications to survive API restarts and supports
offline/air-gapped deployments where external icon URLs may not
be accessible.

Changes:
- Add icon_data, icon_media_type, description, category, manifest
  columns to installed_applications table
- Update InstallApplication to fetch and store all template data
- Add downloadIcon helper to fetch icons from URLs
- Add GET /api/v1/applications/:id/icon endpoint to serve icons
- Update InstalledApplication model with new fields
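A helper along the lines of downloadIcon might look like the sketch below. The signature, the 1 MiB size cap, and the fallback to content sniffing are assumptions; `fetchFromTestServer` exists only to make the example runnable without network access.

```go
package main

import (
	"fmt"
	"io"
	"mime"
	"net/http"
	"net/http/httptest"
)

// maxIconBytes caps the download size (assumed limit).
const maxIconBytes = 1 << 20

// downloadIcon fetches an icon and returns its bytes and media type.
func downloadIcon(client *http.Client, url string) ([]byte, string, error) {
	resp, err := client.Get(url)
	if err != nil {
		return nil, "", fmt.Errorf("fetch icon: %w", err)
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, "", fmt.Errorf("fetch icon: unexpected status %d", resp.StatusCode)
	}
	// Read one byte past the cap so oversized icons can be detected.
	data, err := io.ReadAll(io.LimitReader(resp.Body, maxIconBytes+1))
	if err != nil {
		return nil, "", fmt.Errorf("read icon: %w", err)
	}
	if len(data) > maxIconBytes {
		return nil, "", fmt.Errorf("icon exceeds %d bytes", maxIconBytes)
	}
	// Prefer the declared Content-Type; fall back to sniffing the bytes.
	mediaType, _, err := mime.ParseMediaType(resp.Header.Get("Content-Type"))
	if err != nil {
		mediaType = http.DetectContentType(data)
	}
	return data, mediaType, nil
}

// fetchFromTestServer serves a tiny PNG signature locally and downloads it.
func fetchFromTestServer() ([]byte, string, error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "image/png")
		w.Write([]byte("\x89PNG\r\n\x1a\n"))
	}))
	defer srv.Close()
	return downloadIcon(srv.Client(), srv.URL)
}

func main() {
	data, mediaType, err := fetchFromTestServer()
	fmt.Println(len(data), mediaType, err)
}
```

The returned bytes and media type correspond to the icon_data and icon_media_type columns listed above.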

The handleSessionStatus function was storing the capitalized Kubernetes
phase (e.g., "Running") in the state field, but the UI expects lowercase
values (e.g., "running") for session lifecycle checks. This caused the
session viewer to show "Session is not running" even when sessions were
actually running.

Convert Phase to lowercase using strings.ToLower() before storing in
the state database field.

The UI expects status.phase to be capitalized (e.g., "Running") while
state should be lowercase (e.g., "running"). The convertDBSessionToResponse
function was using the lowercase state value directly for status.phase.

Inline the capitalization logic to convert "running" -> "Running"
when building the session response status field.
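The two conversions from the last two commits are small enough to sketch together. The helper names are hypothetical, and the capitalization assumes ASCII phase names such as "running" or "hibernated".

```go
package main

import (
	"fmt"
	"strings"
)

// toState converts a Kubernetes phase ("Running") to the stored state ("running").
func toState(phase string) string {
	return strings.ToLower(phase)
}

// toPhase converts a stored state ("running") back to the capitalized form
// used for status.phase. Assumes ASCII input.
func toPhase(state string) string {
	if state == "" {
		return ""
	}
	return strings.ToUpper(state[:1]) + state[1:]
}

func main() {
	fmt.Println(toState("Running"), toPhase("running")) // prints "running Running"
}
```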

JoshuaAFerguson merged commit b183316 into main on Nov 20, 2025
7 of 23 checks passed
JoshuaAFerguson deleted the claude/fix-session-reconciler-error-013WosBjy6ohkEvXEt6Pfmxy branch on November 20, 2025 00:06