107 changes: 107 additions & 0 deletions GEMINI_API.md
@@ -0,0 +1,107 @@
# Gemini API Adapter Specifications

This document outlines the plan to build an adapter to consume the Google Gemini API, achieving feature parity with the existing Anthropic API integration.

## Goals

1. **Generate Text:** Support text generation with Gemini models. [DONE]
2. **Multimodal:** Support file uploads (images, PDFs) and usage in requests. [DONE]
3. **Tool Calling:** Support function calling (tools). [DONE]
4. **Streaming:** Support streaming responses. [DONE]

## Architecture

The integration will follow the existing pattern used for Anthropic:
- **Namespace:** `Gemini` module in `lib/gemini.rb` and `lib/gemini/`. [DONE]
- **Client:** `Gemini::Client` to handle HTTP requests. [DONE]
- **Request Object:** `Gemini::InvokeModelRequest` to format the payload. [DONE]
- **Response Object:** `Gemini::InvokeModelResponse` and `Gemini::StreamResponse` to normalize outputs. [DONE]
- **Turns:** `Gemini::Turn` to format conversation history. [DONE]
- **Files:** `Gemini::FilesClient` for the Files API. [DONE]

## Development Phases

### Phase 1: Foundation & Text Generation

**Goal:** Successfully generate text from a single prompt using a Gemini model.

- [x] Create `lib/gemini.rb` and `lib/gemini/client.rb`.
- [x] Implement `Gemini::Client#initialize` using `GEMINI_API_KEY`.
- [x] Define Gemini models in `lib/gemini.rb` (e.g., `gemini-3-flash-preview`, `gemini-2.5-pro`).
- [x] Update `GenerativeText::MODELS` to include Gemini models (vendor: `:google`).
- [x] Update `GenerativeText.client_for` to handle `:google` vendor.
- [x] Create `lib/gemini/invoke_model_request.rb` to format basic text prompts.
- [x] Create `lib/gemini/invoke_model_response.rb` to wrap the response.
- [x] Implement `Gemini::Client#invoke_model`.
- [x] Verify basic text generation in Rails console.
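For reference, here is a minimal sketch of the request behind `Gemini::Client#invoke_model` for a single prompt, using only the Ruby standard library. It assumes the public REST endpoint `POST https://generativelanguage.googleapis.com/v1beta/models/{model}:generateContent` with the key sent in the `x-goog-api-key` header. `build_generate_request` and `extract_text` are hypothetical helpers, not the adapter's actual methods, and the request is built but never sent.

```ruby
require "json"
require "net/http"
require "uri"

# Assumed base URL for the public v1beta REST API.
GEMINI_BASE_URL = "https://generativelanguage.googleapis.com/v1beta"

# Hypothetical helper: builds (but does not send) a generateContent request.
def build_generate_request(model:, prompt:, api_key:, max_tokens: 65_536)
  uri = URI("#{GEMINI_BASE_URL}/models/#{model}:generateContent")
  request = Net::HTTP::Post.new(uri)
  request["Content-Type"] = "application/json"
  request["x-goog-api-key"] = api_key
  request.body = JSON.generate({
    contents: [{ role: "user", parts: [{ text: prompt }] }],
    generation_config: { max_output_tokens: max_tokens }
  })
  request
end

# Hypothetical helper: joins the text parts of the first candidate.
def extract_text(response_json)
  parts = response_json.dig("candidates", 0, "content", "parts") || []
  parts.filter_map { |part| part["text"] }.join
end
```

Keeping request construction apart from the HTTP call makes the payload easy to assert on in specs without network access.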

### Phase 2: Multi-turn Conversations (Chat)

**Goal:** Support conversation history.

- [x] Create `lib/gemini/turn.rb`.
- [x] Implement `Gemini::Turn.for(request, turns:)` to format `GenerateTextRequest` and history into Gemini's `contents` format (`role`, `parts`).
- [x] Update `GenerateTextRequest#to_turn` to handle `:google` vendor.
- [x] Update `Gemini::InvokeModelRequest` to accept and format the full conversation history.
- [x] Verify multi-turn chat in Rails console.
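Gemini's `contents` array only recognizes the roles `user` and `model`, so replaying history means renaming the assistant role. The following sketch assumes history arrives as plain hashes with hypothetical `:role` and `:text` keys. The real `Gemini::Turn.for` works from `GenerateTextRequest` objects instead.

```ruby
# Maps app-side roles to Gemini roles; Gemini calls the assistant "model".
GEMINI_ROLES = { user: "user", assistant: "model" }.freeze

# Hypothetical helper. history: array of { role: :user | :assistant, text: String }.
# Returns the history plus the new prompt in Gemini's contents format.
def to_gemini_contents(history, prompt)
  contents = history.map do |turn|
    role = GEMINI_ROLES.fetch(turn[:role]) { raise ArgumentError, "unknown role: #{turn[:role]}" }
    { role: role, parts: [{ text: turn[:text] }] }
  end
  contents << { role: "user", parts: [{ text: prompt }] }
end
```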

### Phase 3: Streaming

**Goal:** Support real-time response streaming.

- [x] Create `lib/gemini/stream_event.rb` to parse SSE chunks.
- [x] Create `lib/gemini/stream_response.rb` to aggregate chunks.
- [x] Implement `Gemini::Client#invoke_model_stream`.
- [x] Verify streaming works in the UI.
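With `:streamGenerateContent?alt=sse`, the API responds with server-sent events whose `data:` lines each hold a partial response in the same shape as a non-streaming one. The sketch below approximates what `Gemini::StreamEvent` and `Gemini::StreamResponse` divide between them. `SseTextAggregator` is a hypothetical name, and the sketch assumes events end in newlines and that network chunks may split anywhere, including mid-line.

```ruby
require "json"

# Hypothetical aggregator: feed it raw SSE text in arbitrary chunks.
class SseTextAggregator
  attr_reader :text

  def initialize
    @buffer = +""
    @text = +""
  end

  # Buffers partial input; parses each complete "data:" line as JSON
  # and appends any text parts of the first candidate.
  def feed(chunk)
    @buffer << chunk
    while (newline = @buffer.index("\n"))
      line = @buffer.slice!(0..newline).strip
      next unless line.start_with?("data:")

      event = JSON.parse(line.delete_prefix("data:").strip)
      parts = event.dig("candidates", 0, "content", "parts") || []
      parts.each { |part| @text << part["text"] if part["text"] }
    end
    self
  end
end
```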

### Phase 4: Multimodal & Files

**Goal:** Support attaching images and PDFs to prompts.

- [x] Create `lib/gemini/files_client.rb` to wrap Gemini Files API (`upload`, `get`, `delete`).
- [x] Add `upload_file` and `delete_file` methods to `lib/gemini.rb`.
- [x] Update `Gemini::Turn` to include file parts in the content.
- [x] Verify image/PDF analysis.
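Uploaded files are referenced by URI instead of being inlined, so adding a file to a turn only needs the stored `file_uri` and MIME type, which together produce the `file_data` part shown under Reference: Data Structures. The sketch uses a hypothetical `FileRef` struct as a stand-in for a stored upload. Placing file parts before the prompt text is a choice made for this sketch, not a requirement stated in this document.

```ruby
# Hypothetical stand-in for a stored upload (e.g. a ConversationContext row).
FileRef = Struct.new(:file_uri, :mime_type, keyword_init: true)

# Hypothetical helper: builds a user content block with file parts first,
# followed by the prompt text.
def user_content_with_files(prompt, files)
  file_parts = files.map do |f|
    { file_data: { mime_type: f.mime_type, file_uri: f.file_uri } }
  end
  { role: "user", parts: file_parts + [{ text: prompt }] }
end
```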

### Phase 5: Tool Calling

**Goal:** Support defining and invoking tools.

- [x] Update `Gemini::InvokeModelRequest` to map `LlmTool` definitions to Gemini's `tools` -> `function_declarations` format.
- [x] Handle tool use responses in `Gemini::InvokeModelResponse`.
- [x] Verify that the model can call the defined tools (e.g., a weather lookup).
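Declarations go inside a `tools` entry as `function_declarations`. When the model decides to call one, it returns a `functionCall` part (with `name` and `args`) instead of text. The sketch below assumes a hypothetical tool hash with `:name`, `:description`, and `:input_schema` keys, because the `LlmTool` interface is not shown here.

```ruby
# Hypothetical mapper: converts app-side tool hashes into the "tools" entry.
def to_gemini_tools(tools)
  declarations = tools.map do |t|
    { name: t[:name], description: t[:description], parameters: t[:input_schema] }
  end
  [{ function_declarations: declarations }]
end

# Hypothetical extractor: collects function calls from the first candidate
# of a parsed response. Response parts use the camelCase key "functionCall".
def extract_function_calls(response_json)
  parts = response_json.dig("candidates", 0, "content", "parts") || []
  parts.filter_map do |part|
    call = part["functionCall"]
    { name: call["name"], args: call["args"] || {} } if call
  end
end
```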

### Phase 6: Polish & Testing

**Goal:** Ensure code quality and stability.

- [x] Add RSpec tests for `Gemini::Client`.
- [x] Add RSpec tests for `Gemini::Turn` and `Gemini::InvokeModelRequest`.
- [x] Stub API interactions in specs (the plan called for VCR cassettes; this implementation uses WebMock stubs instead).
- [x] Ensure error handling (map Gemini errors to `Gemini::ClientError` equivalents).
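Failed calls return a non-2xx status and, usually, a JSON body in Google's standard error shape: `{"error": {"code": ..., "message": ..., "status": ...}}`. The following is one way to surface these as `Gemini::ClientError`. It uses a locally defined error class, and its attribute names (`http_status`, `api_status`) and helper name (`raise_for_response!`) are assumptions.

```ruby
require "json"

module Gemini
  # Stand-in for the app's error class; the attribute names are assumptions.
  class ClientError < StandardError
    attr_reader :http_status, :api_status

    def initialize(message, http_status:, api_status: nil)
      super(message)
      @http_status = http_status
      @api_status = api_status
    end
  end

  # Hypothetical helper: returns the parsed body for 2xx responses and
  # raises ClientError otherwise, tolerating non-JSON error bodies.
  def self.raise_for_response!(http_status, body)
    return JSON.parse(body) if (200..299).cover?(http_status)

    error = begin
      parsed = JSON.parse(body.to_s)
      parsed.is_a?(Hash) && parsed["error"].is_a?(Hash) ? parsed["error"] : {}
    rescue JSON::ParserError
      {} # e.g. an HTML error page from a proxy
    end

    raise ClientError.new(error["message"] || "HTTP #{http_status}",
                          http_status: http_status,
                          api_status: error["status"])
  end
end
```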

## Reference: Data Structures

**Gemini Content Format:**
```json
{
  "role": "user",
  "parts": [
    { "text": "Hello" },
    { "file_data": { "mime_type": "...", "file_uri": "..." } }
  ]
}
```

**Gemini Tools Format:**
```json
{
  "function_declarations": [
    {
      "name": "get_weather",
      "description": "...",
      "parameters": { ... }
    }
  ]
}
```
91 changes: 91 additions & 0 deletions GEMINI_API_INTEGRATION.md
@@ -0,0 +1,91 @@
# Gemini API Integration Specifications

This document outlines the plan to integrate the Gemini API adapter into the existing Rails application, ensuring seamless user interaction for model selection, text generation, and file uploads.

## Analysis

### Current Architecture
1. **Model Selection:**
   - **UI:** `PromptFormComponent` uses `GenerativeText.active_models` to populate the model dropdown. Since Gemini models are now registered in `GenerativeText::MODELS`, they automatically appear in the UI.
   - **Frontend:** `prompt_form_controller.js` handles file input toggling based on model capabilities (already supported via `model_data` serialization).
   - **Settings:** User settings for default models (`User#setting.text_model`) are string-based and agnostic to the vendor.

2. **Request Handling:**
   - **Controller:** `ConversationsController` uses `ConversationForm` to process requests.
   - **Form:** `ConversationForm` creates a `GenerateTextRequest` with the selected model.
   - **Job:** `GenerateTextJob` executes the request in the background. It calls `GenerativeText.new.invoke_model(request)`.
   - **Service:** `GenerativeText` delegates to the appropriate client (Anthropic or Gemini) based on the model's vendor.

3. **File Uploads:**
   - **Controller:** `ConversationContextsConversationsController` handles file uploads. Logic has been updated to use `Gemini.upload_file` when the user's preferred model is a Gemini model.
   - **Context:** `ConversationContext` stores the file reference (URI for Gemini, ID for Anthropic) and MIME type.

4. **Response Handling:**
   - **Job:** `GenerateTextJob` expects the response object to respond to `.data` (for storage) and `.content` (for broadcasting).
   - **View:** `GenerateTextRequestComponent` renders the response content.
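The vendor-based delegation in `GenerativeText` comes down to finding the selected model and branching on its vendor. The sketch below uses hypothetical stand-ins: `ModelSpec`, the placeholder registry entries, and the two client classes are illustrative. The real `GenerativeText.client_for` and `GenerativeText::MODELS` may be shaped differently.

```ruby
# Hypothetical model record; the real registry entries carry more fields.
ModelSpec = Struct.new(:api_name, :vendor, :active, keyword_init: true)

# Illustrative stand-ins for the real adapter clients.
AnthropicClient = Class.new
GeminiClient = Class.new

# Placeholder registry (the Anthropic name is not a real model).
MODELS = [
  ModelSpec.new(api_name: "gemini-2.5-pro", vendor: :google, active: true),
  ModelSpec.new(api_name: "anthropic-model-placeholder", vendor: :anthropic, active: true)
].freeze

# Hypothetical dispatcher: picks a client from the model's vendor.
def client_for(api_name)
  model = MODELS.find { |m| m.api_name == api_name }
  raise ArgumentError, "unknown model: #{api_name}" unless model

  case model.vendor
  when :google then GeminiClient.new
  when :anthropic then AnthropicClient.new
  else raise ArgumentError, "unsupported vendor: #{model.vendor}"
  end
end
```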

### Identified Gaps
1. **Missing `data` Method:** `Gemini::InvokeModelResponse` does not expose the raw response data via a `data` method, which is required by `GenerateTextJob` to save the raw response to the database.

## Development Phases

### Phase 1: Fix Response Interface

**Goal:** Ensure `Gemini::InvokeModelResponse` adheres to the interface expected by `GenerateTextJob`.

- [x] Update `Gemini::InvokeModelResponse` to expose `@response_json` via a `data` method/attribute.
- [x] Add a spec to verify `data` returns the raw hash.
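`GenerateTextJob` relies on duck typing here: any response object that answers `data` (the raw hash to persist) and `content` (the text to broadcast) fits. Here is a sketch of that interface for Gemini. It simplifies `content` to the concatenated text parts of the first candidate, whereas the actual class also handles tool-use responses.

```ruby
module Gemini
  # Simplified sketch of the response wrapper's public interface.
  class InvokeModelResponse
    def initialize(response_json)
      @response_json = response_json
    end

    # Raw parsed JSON, returned unchanged so GenerateTextJob can persist it.
    def data
      @response_json
    end

    # Text for broadcasting: the text parts of the first candidate, joined.
    def content
      parts = @response_json.dig("candidates", 0, "content", "parts") || []
      parts.filter_map { |p| p["text"] }.join
    end
  end
end
```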

### Phase 2: User Interface Polish (Optional)

**Goal:** Improve visual distinction between models.

- [x] (Optional) Update `ConversationTurnComponent` or CSS to display vendor-specific icons (e.g., a Google logo for Gemini). Currently, it uses a generic robot icon.

### Phase 3: End-to-End Verification

**Goal:** Verify the full flow from UI to Database.

- [x] Verify `GenerateTextJob` runs successfully with a Gemini model.
- [x] Verify `GenerateTextRequest` saves the raw Gemini JSON response in the `response` column.
- [x] Verify streaming works in the browser (simulated via system tests).

## TODOs

- [x] Fix `Gemini::InvokeModelResponse#data`.
- [x] Verify `GenerateTextJob` with Gemini model via console/test.

### Phase 4: Provider-Aware Conversation Contexts

**Goal:** Make `ConversationContext` explicitly aware of its provider to ensure only compatible contexts are used.

- [x] Add `vendor` column to `ConversationContext` table.
- [x] Update `ConversationContextsConversationsController#create` to determine and save the `vendor` when uploading a new file.
- [x] Update `InvokeModelRequest` for both `Anthropic` and `Gemini` to filter contexts based on the active model's vendor.
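Filtering by vendor keeps one provider's file references out of the other provider's requests, since Gemini contexts store a file URI and Anthropic contexts store a file ID. In the app this is likely a query on `ConversationContext`. The plain-Ruby sketch below uses a hypothetical `Context` struct and shows only the selection rule.

```ruby
# Hypothetical stand-in for ConversationContext with a vendor column.
Context = Struct.new(:filename, :vendor, keyword_init: true)

# Keeps only contexts uploaded to the same vendor as the active model.
# Vendors are compared as symbols so "google" and :google match;
# contexts with no vendor are excluded.
def contexts_for_vendor(contexts, vendor)
  target = vendor.to_sym
  contexts.select { |c| c.vendor&.to_sym == target }
end
```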

### Phase 5: Dynamic Context UI

**Goal:** Update the "Attach File" UI to dynamically show contexts that are compatible with the selected model.

- [x] Add a new route/action to fetch available `ConversationContext` records filtered by `vendor`.
- [x] Update `prompt_form_controller.js` to fetch and render the filtered context list when the model selection changes.
- [x] Create a Turbo Stream view to render the updated context list.
- [x] Add vendor badges to the context selection UI.
- [x] Allow uploading `.md` (markdown) files.

### Phase 6: Update Gemini Models

**Goal:** Update the Gemini model list to the latest models.

- [x] Update `lib/gemini.rb` with the latest model names and ensure they are marked as active.
- [x] Set max tokens to 65,536 for all models.


### Phase 7: Conversation Contexts

- [x] In app/views/conversation_contexts_conversations/index.html.haml, scope `@available_contexts` to the model selected in the prompt form component: when an Anthropic model is selected, list only Anthropic file uploads; when a Google model is selected, list only Google file uploads. This needs to happen dynamically.
- [x] In app/views/conversation_contexts_conversations/index.html.haml show a vendor badge, Anthropic or Google. This needs to change dynamically when the user selects a model in the prompt form component. Also show a vendor badge next to each of the selected conversation contexts. Show them in a disabled state when the currently selected model vendor is different from the context's vendor.
- [x] In app/views/conversation_contexts/_conversation_context.html.haml show a vendor badge next to each conversation context.
- [x] Create a scheduled Sidekiq job that deletes Google file uploads / conversation contexts 48 hours after they are uploaded / created.
- [x] When creating a conversation_conversation_context record, the front end should pass the vendor. Currently the vendor is inferred from the user's settings, which may not match the model selected in the prompt form component.
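The 48-hour window matches the Gemini Files API, which deletes uploaded files 48 hours after upload. The selection step of the scheduled job might look like the sketch below. `Upload` and `expired_google_uploads` are hypothetical, and in the app the same condition would more naturally be an ActiveRecord query run by the Sidekiq job.

```ruby
# Hypothetical stand-in for a stored upload / conversation context.
Upload = Struct.new(:id, :vendor, :created_at, keyword_init: true)

RETENTION_SECONDS = 48 * 60 * 60

# Returns Google uploads created at least 48 hours before `now`.
def expired_google_uploads(uploads, now: Time.now)
  cutoff = now - RETENTION_SECONDS
  uploads.select { |u| u.vendor.to_s == "google" && u.created_at <= cutoff }
end
```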

12 changes: 9 additions & 3 deletions TODO.org
@@ -20,11 +20,17 @@ CLOSED: [2025-09-15 Mon 08:19]
* TODO Make it optional to include generated images in the conversation context
Right now it's decided by the backend to include the previous one automatically. This
isn't always -- or usually -- what I want

* TODO Fix Markdown tables in assistant response
- [ ] When converting markdown to HTML, the tables do not appear properly (e.g., not
in table format). Find a way to render tables properly
* TODO Gemini integration
- - [ ] Refactor existing claude integration to adapter
- - [ ] Add Gemini interface
- - [ ] Add Gemini models selectable in UI
+ - [X] Refactor existing claude integration to adapter
+ - [X] Add Gemini interface
+ - [X] Add Gemini models selectable in UI
+ - [ ] Handle context documents
+ - [ ] Pass vendor when creating conversation contexts. Right now the vendor is
+ inferred from the user settings default model.
* TODO Enhanced copy response
- [ ] Option to copy markdown formatted response
* DONE Add support for PDFs
49 changes: 45 additions & 4 deletions app/controllers/conversation_contexts_conversations_controller.rb
@@ -17,8 +17,15 @@
end

if conversation_context_params[:file].present?
- file_response = Anthropic.upload_file(conversation_context_params[:file])
- contexts << ConversationContext.create_for!(current_user, file_response)
+ vendor = conversation_context_params[:vendor] || current_vendor
+
+ file_response = if vendor.to_sym == :google
+ Gemini.upload_file(conversation_context_params[:file])
+ else
+ Anthropic.upload_file(conversation_context_params[:file])
+ end
+
+ contexts << ConversationContext.create_for!(current_user, file_response, vendor: vendor)
end

contexts.select(&:persisted?).each do |context|
@@ -51,12 +58,35 @@

def index
@contexts = @conversation.conversation_contexts.order(created_at: :desc)
+ @vendor = current_vendor
+ @available_contexts = available_contexts
respond_to do |format|
format.html
format.json { render json: @contexts, status: :ok }
end
end

+ def available
+ @vendor = params[:vendor]
+ @available_contexts = available_contexts
+
+ respond_to do |format|
+ format.turbo_stream do
+ render turbo_stream: [
+ turbo_stream.update(
+ 'context-files-selector',
+ partial: 'context_files_selector',
+ locals: { conversation: @conversation, available_contexts: @available_contexts, current_vendor: @vendor }
+ ),
+ turbo_stream.update(
+ 'conversationContextModalLabel',
+ "Conversation Context <span class='badge rounded-pill bg-info ms-2'>#{@vendor.to_s.titleize}</span>".html_safe

Check warning: Code scanning / CodeQL, Reflected server-side cross-site scripting (Medium). `params[:vendor]` is user-provided and is interpolated into a string that is then marked `html_safe`.

The Copilot autofix suggests replacing `@vendor = params[:vendor]` with `@vendor = current_vendor` in `available`. That change alone does not close the hole: `current_vendor` returns `params[:vendor].to_sym` whenever the parameter is present, so arbitrary input still reaches the label. The value needs to be checked against the known vendors, or escaped (e.g. with `ERB::Util.html_escape`), before `html_safe` is applied.
+ )
+ ]
+ end
+ end
+ end

def destroy
@context = @conversation.conversation_contexts.find(params[:id])

@@ -100,15 +130,26 @@
@conversation = current_user.conversations.find(params[:conversation_id])
end

+ def default_vendor
+ current_user.setting&.text_model&.yield_self do |m|
+ GenerativeText::MODELS.find { |mod| mod.api_name == m }
+ end&.vendor
+ end

def set_available_contexts
@available_contexts = available_contexts
end

def available_contexts
- current_user.conversation_contexts.available_for(@conversation).order(:filename)
+ current_user.conversation_contexts.available_for(@conversation).order(created_at: :desc)
end

+ def current_vendor
+ vendor = params[:vendor] || @conversation.turns.last&.turnable&.model&.vendor || default_vendor || :anthropic
+ vendor.to_sym
+ end

def conversation_context_params
- params.require(:conversation_context).permit(:file, conversation_context_ids: [])
+ params.require(:conversation_context).permit(:file, :vendor, conversation_context_ids: [])
end
end
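The code-scanning warning on `available` arises because `params[:vendor]` reaches a string marked `html_safe`, and `current_vendor` passes that parameter through unchanged. Below is a sketch of one fix, assuming `:anthropic` and `:google` are the only vendors: normalize the value against an allowlist, then escape it before building the label. `normalize_vendor` and `vendor_badge_html` are hypothetical helpers, and `capitalize` stands in for ActiveSupport's `titleize` so the sketch runs without Rails.

```ruby
require "erb"

# Assumed allowlist; extend if more vendors are registered.
KNOWN_VENDORS = %i[anthropic google].freeze

# Hypothetical helper: maps any user-supplied value to a known vendor symbol,
# falling back to a default for unknown or missing input.
def normalize_vendor(raw, default: :anthropic)
  candidate = raw.to_s.strip.downcase.to_sym
  KNOWN_VENDORS.include?(candidate) ? candidate : default
end

# Hypothetical helper: builds the modal label, escaping the vendor name
# even though it should already be allowlisted.
def vendor_badge_html(vendor)
  label = ERB::Util.html_escape(vendor.to_s.capitalize)
  "Conversation Context <span class='badge rounded-pill bg-info ms-2'>#{label}</span>"
end
```

Normalizing and escaping are both applied here, so a future caller that skips `normalize_vendor` still cannot inject markup through the label.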
46 changes: 45 additions & 1 deletion app/javascript/controllers/conversation_context_controller.js
@@ -3,20 +3,63 @@ import { createConversationContext } from '@javascript/http';

export default class ConversationContextController extends Controller {
static targets = [
- "dropZone", "fileInput", "successAlert", "errorAlert", "successMessage", "errorMessage", "spinner"
+ "dropZone", "fileInput", "successAlert", "errorAlert", "successMessage", "errorMessage", "spinner", "item"
]

abortController = null;
conversationId = null;

connect() {
this.conversationId = this.element.dataset.conversationId
+ this.boundOnModelChanged = this.onModelChanged.bind(this)
+ document.addEventListener('prompt-form:model-changed', this.boundOnModelChanged)
+
+ this.updateItems()
+
+ // Slight delay to ensure other controllers (like the selector) are connected
+ setTimeout(() => {
+ this.dispatch('opened', { detail: { vendor: this.currentVendor } })
+ }, 100)
}

disconnect() {
if (this.abortController) {
this.abortController.abort()
}
+ document.removeEventListener('prompt-form:model-changed', this.boundOnModelChanged)
}

+ get currentVendor() {
+ if (this.element.dataset.vendor) return this.element.dataset.vendor
+
+ const modelSelect = document.querySelector('[data-prompt-form-target="modelSelect"]')
+ if (!modelSelect) return 'anthropic'
+ const modelData = JSON.parse(modelSelect.dataset.modelData)
+ const selectedModel = modelData.find(m => m.api_name === modelSelect.value)
+ return selectedModel ? selectedModel.vendor : 'anthropic'
+ }
+
+ onModelChanged(event) {
+ this.element.dataset.vendor = event.detail.vendor
+ this.updateItems()
+ // The frame reload handled in PromptForm will replace the modal content,
+ // but we can also trigger immediate updates if needed.
+ }
+
+ updateItems() {
+ const vendor = this.currentVendor
+ this.itemTargets.forEach(item => {
+ const itemVendor = item.dataset.vendor
+ if (itemVendor && itemVendor !== vendor) {
+ item.classList.add('disabled-context')
+ item.style.opacity = '0.5'
+ item.style.pointerEvents = 'none'
+ } else {
+ item.classList.remove('disabled-context')
+ item.style.opacity = '1'
+ item.style.pointerEvents = 'auto'
+ }
+ })
+ }

openFileDialog() {
@@ -74,6 +117,7 @@ export default class ConversationContextController extends Controller {
const response = await createConversationContext(
this.conversationId,
file,
+ this.currentVendor,
this.abortController.signal
)
