
ADFA-4388 | embedding model crash #1429

Open
jatezzz wants to merge 4 commits into stage from fix/ADFA-4388-embedding-model-crash

@jatezzz jatezzz commented Jun 19, 2026

Collaborator

Description

Crash Before Fix:
FATAL EXCEPTION: Llm-RunLoop
SIGABRT
decode: cannot decode batches with this context (calling encode() instead)

Error Message After Fix:
The selected model 'all-MiniLM-L6-v2-ggml-model-f16.gguf' is an embedding model
designed for semantic search and similarity tasks. It cannot be used for chat or
text generation.

Please select a chat/instruct model instead (e.g., models with 'chat', 'instruct', 'conversational' in their name).

Details

telegram-cloud-document-1-5022011912393590501.mp4

Ticket

ADFA-4388

Observation

This fix was developed from a consistent reproduction of the crash with embedding models (e.g., all-MiniLM-L6-v2). The crash occurred every time a user selected an embedding model for chat, resulting in a SIGABRT on the Llm-RunLoop thread.

The multi-layer approach ensures that:

  1. Native layer catches the issue first (pooling type check)
  2. Kotlin layer provides fallback validation
  3. UI layer gracefully handles exceptions and displays helpful guidance

The app now correctly identifies incompatible model types and guides users to select appropriate chat/instruct models, eliminating the crash entirely.
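
To illustrate the native-layer check in step 1, here is a minimal sketch of rejecting any model whose pooling type is not "none". It is not the code from this PR: `PoolingType` is a stand-in enum whose values are assumed to mirror llama.cpp's `llama_pooling_type`, and `reject_if_embedding` is a hypothetical helper name.

```cpp
#include <optional>
#include <string>

// Stand-in for llama.cpp's pooling enum (assumption: values mirror llama.h,
// where "none" marks a generative model and any other value means pooling).
enum PoolingType {
    POOLING_NONE = 0,
    POOLING_MEAN = 1,
    POOLING_CLS  = 2,
};

// Hypothetical helper: returns an error message when the model pools its
// outputs (an embedding model), or std::nullopt when it can generate text.
std::optional<std::string> reject_if_embedding(PoolingType pooling) {
    if (pooling == POOLING_NONE) {
        return std::nullopt;
    }
    return "The selected model is an embedding model (pooling type " +
           std::to_string(static_cast<int>(pooling)) +
           ") and cannot be used for chat or text generation.";
}
```

In a JNI function, the returned message would typically be passed to `ThrowNew` so the Kotlin layer receives an exception instead of the process aborting.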

jatezzz and others added 4 commits June 19, 2026 14:26
Added multi-layer protection to detect and reject embedding models:

**Native Layer (C++):**
- Check pooling_type in new_context() - reject if not LLAMA_POOLING_TYPE_NONE
- Added get_pooling_type() JNI function for Kotlin validation
- Clear error messages explaining embedding vs generative models

**Kotlin Layer:**
- Validate model during load() in LLamaAndroid.kt
- Catch IllegalStateException and wrap with user-friendly message
- File format validation for ONNX, PyTorch, TensorFlow, etc.

**UI Layer:**
- Proper exception handling in AiSettingsViewModel
- Display error in ModelLoadingState.Error instead of crashing
- Keep bottom sheet expanded after file picker to show error/status

**Infrastructure:**
- Rebuilt llama.cpp AAR with updated native code (v8)
- Updated LLAMA_LIB_VERSION to 8 in DynamicLibraryLoader

The app now gracefully handles embedding models with clear error
messages instead of crashing with SIGABRT.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
- Removed inline comments explaining bottom sheet behavior
- Extracted file extension strings to named constants (EXT_*)
- Extracted keyword strings to named constants (KEYWORD_*)
- Improved code maintainability and reduced duplication

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

@claude claude Bot left a comment

Claude Code Review

This repository is configured for manual code reviews. Comment @claude review to trigger a review and subscribe this PR to future pushes, or @claude review once for a one-time review.

Tip: disable this comment in your organization's Code Review settings.

@coderabbitai

coderabbitai Bot commented Jun 19, 2026

📝 Walkthrough

Release Notes

Critical Bug Fix

  • Fixed SIGABRT crash when users selected embedding models (e.g., all-MiniLM-L6-v2) for chat operations by implementing multi-layer validation across native, Kotlin, and UI layers

User-Facing Changes

  • Embedding models are now rejected at load time with a clear error message: "The selected model is an embedding model designed for semantic search and similarity tasks. It cannot be used for chat or text generation. Please select a chat/instruct model instead"
  • Model selection bottom sheet now remains expanded to display error messages or status updates after file selection
  • Added file format validation that rejects non-GGUF formats (ONNX, PyTorch, TensorFlow, etc.) with conversion guidance

Technical Changes

Native Layer (C++)

  • Added GGUF magic number validation in model loading to reject non-GGUF files early
  • Added pooling type validation in context creation to detect and reject embedding models
  • Improved text generation initialization with stricter safety checks for context validation and buffer validation
  • Added detailed error handling for prompt evaluation and text generation stepping
  • New JNI functions: get_pooling_type() and get_model_desc() for Kotlin-level model validation
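
As a rough illustration of the magic-number validation listed above, the sketch below reads the first four bytes of a file and compares them with the ASCII magic `GGUF`. The helper name `has_gguf_magic` is hypothetical and the PR's actual implementation may differ.

```cpp
#include <array>
#include <cstddef>
#include <cstdio>
#include <cstring>
#include <string>

// Hypothetical helper: returns true when the file starts with the four ASCII
// bytes "GGUF". Unreadable or too-short files are treated as invalid.
bool has_gguf_magic(const std::string &path) {
    std::FILE *file = std::fopen(path.c_str(), "rb");
    if (file == nullptr) {
        return false;
    }
    std::array<char, 4> header{};
    const std::size_t read = std::fread(header.data(), 1, header.size(), file);
    std::fclose(file);
    return read == header.size() &&
           std::memcmp(header.data(), "GGUF", header.size()) == 0;
}
```

Running this check before handing the path to the model loader lets ONNX, PyTorch, or TensorFlow files be rejected with a specific message rather than a generic load failure.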

Kotlin Layer

  • Enhanced LLamaAndroid.kt with model metadata fetching via native get_model_desc() call
  • Added pooling type detection and validation to catch embedding models before initialization
  • Implemented dry-run validation using completion_init() to verify model compatibility
  • Improved error messages to distinguish between embedding model failures and other incompatibilities

UI Layer

  • Updated AiSettingsViewModel with comprehensive exception handling (catches IllegalArgumentException for validation failures and other exceptions for generic failures)
  • Improved ModelLoadingState.Error display for user-friendly error reporting

Code Quality

  • Extracted magic strings to named constants (EXT_* and KEYWORD_*) for better maintainability
  • Removed inline comment clutter

Infrastructure

  • Rebuilt llama.cpp AAR with updated native code (version 8)
  • Updated LLAMA_LIB_VERSION from 5 to 8 to force AAR asset refresh and re-optimization

⚠️ Risks and Best Practices Considerations

  • Breaking Change: Models previously loadable but identified as embedding models will now be rejected. Users with saved embedding model configurations will need to select different models.

  • Version Bump Impact: Increasing LLAMA_LIB_VERSION from 5 to 8 forces a rebuild and re-optimization cycle. Any mismatch in version detection could cause unexpected behavior.

  • Fragile Error Detection: The Kotlin layer detects embedding model failures partly by matching error message content from native exceptions, which is prone to breaking if error messages change.

  • Pooling Type Validation Accuracy: The fix relies on pooling type detection to identify embedding models. Incorrect detection could either falsely reject valid chat models or allow embedding models to proceed, partially defeating the fix.

  • Performance Impact: The dry-run completion_init() call during model loading adds validation overhead to every model load operation. Monitor for potential user experience delays during model selection.

  • Resource Management: Native resource cleanup (freeing context, batch, sampler) when rejecting models must be guaranteed to prevent memory leaks in edge cases.

Walkthrough

This PR adds end-to-end GGUF-only and embedding-model rejection across the JNI/C++ layer (llama-android.cpp), the Kotlin native bridge (LLamaAndroid), the engine repository (LlmInferenceEngine), and the ViewModel (AiSettingsViewModel). It also adds auto-expansion of the agent bottom sheet after model load in AiSettingsFragment and bumps the native library version constant.

Changes

Model Format Validation and Embedding Rejection

| Layer / File(s) | Summary |
|---|---|
| **Native model load, context creation, and JNI utilities** (`llama-impl/src/main/cpp/llama-android.cpp`) | Adds GGUF magic-number validation before llama_model_load_from_file, clamps n_ctx to training context, rejects embedding models immediately after context creation, and exports get_pooling_type and get_model_desc JNI functions. |
| **Native completion_init and completion_loop safety checks** (`llama-impl/src/main/cpp/llama-android.cpp`) | Adds pooling-type checks, input/batch validation, tokenization error handling, and detailed decode-failure exceptions in completion_init; adds batch-state validation and decode-failure early stop in completion_loop. |
| **Kotlin native bridge load() and send() enhancements** (`llama-impl/src/main/java/android/llama/cpp/LLamaAndroid.kt`) | Declares get_pooling_type and get_model_desc native bindings; enhances load() with model-description logging, pooling-type rejection, and a dry-run completion_init validation step; reworks send() with pooling-type pre-checks, context-window validation, empty-output detection, and iteration limiting. |
| **Engine format constants, validateModelFormat, and loadModelFromUri** (`app/src/main/java/com/itsaky/androidide/agent/repository/LlmInferenceEngine.kt`) | Adds private const val entries for file extensions and keyword substrings; reworks validateModelFormat to throw IllegalArgumentException for non-GGUF formats and warn on embedding-model filenames; moves URI parsing before the try block; updates exception handling; refactors detectModelFamily to use the new constants. |
| **ViewModel exception propagation** (`app/src/main/java/com/itsaky/androidide/agent/viewmodel/AiSettingsViewModel.kt`) | Wraps initModelFromFile in try/catch, separately handling IllegalArgumentException (validation errors) and generic Exception (unexpected errors), and removes stale comment blocks. |
| **UI auto-expand and library version bump** (`app/src/main/java/com/itsaky/androidide/agent/fragments/AiSettingsFragment.kt`, `app/src/main/java/com/itsaky/androidide/utils/DynamicLibraryLoader.kt`) | Schedules postDelayed bottom-sheet expansion and agent-tab selection after both file-picker and saved-model load paths; bumps LLAMA_LIB_VERSION from 5 to 8. |

Sequence Diagram(s)

sequenceDiagram
  participant User
  participant AiSettingsFragment
  participant AiSettingsViewModel
  participant LlmInferenceEngine
  participant LLamaAndroid
  participant NativeJNI as llama-android.cpp

  User->>AiSettingsFragment: picks model file (URI)
  AiSettingsFragment->>AiSettingsViewModel: loadModelFromUri(uri)
  AiSettingsViewModel->>LlmInferenceEngine: loadModelFromUri(uri)
  LlmInferenceEngine->>LlmInferenceEngine: validateModelFormat(filename)
  alt non-GGUF or embedding keyword
    LlmInferenceEngine-->>AiSettingsViewModel: throw IllegalArgumentException
    AiSettingsViewModel-->>AiSettingsFragment: ModelLoadingState.Error(message)
  else valid
    LlmInferenceEngine->>LLamaAndroid: initModelFromFile(path)
    LLamaAndroid->>NativeJNI: load_model → validate GGUF magic
    LLamaAndroid->>NativeJNI: new_context → check pooling type
    LLamaAndroid->>NativeJNI: completion_init (dry run)
    alt embedding or incompatible
      NativeJNI-->>LLamaAndroid: throw IllegalStateException
      LLamaAndroid-->>LlmInferenceEngine: throw IllegalStateException
      LlmInferenceEngine-->>AiSettingsViewModel: throw IllegalArgumentException
      AiSettingsViewModel-->>AiSettingsFragment: ModelLoadingState.Error(message)
    else success
      LLamaAndroid-->>LlmInferenceEngine: loadedModelName
      LlmInferenceEngine-->>AiSettingsViewModel: ModelLoadingState.Loaded
      AiSettingsViewModel-->>AiSettingsFragment: ModelLoadingState.Loaded
      AiSettingsFragment->>AiSettingsFragment: postDelayed → expand bottom sheet, switch to TAB_AGENT
    end
  end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Suggested reviewers

  • itsaky-adfa
  • jomen-adfa

Poem

🐇 Hop hop, I check the magic bytes,
No ONNX sneaks past my sights!
Only .gguf may load and run,
Embeddings? Rejected, every one.
The sheet expands when models shine—
A bunny's validation, oh so fine! ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 14.81% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title 'ADFA-4388 \| embedding model crash' directly reflects the main change: fixing a crash caused by embedding models being used for chat operations. |
| Description check | ✅ Passed | The description comprehensively details the crash scenario, the fix implementation across multiple layers, and the improved user experience with clear error messaging. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 5

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In
`@app/src/main/java/com/itsaky/androidide/agent/repository/LlmInferenceEngine.kt`:
- Around line 343-344: The if statement in LlmInferenceEngine.kt that checks the
error message for "embedding model" is case-sensitive and will miss messages
with different casing, such as "Embedding model". Make the comparison
case-insensitive, either by lowercasing the error message before the contains
check or by passing ignoreCase = true to contains. This ensures that every
casing of the embedding model error message is detected and the user receives
appropriate guidance.

In `@llama-impl/src/main/cpp/llama-android.cpp`:
- Around line 814-821: The validation check for `batch->n_tokens < 0` in the
llama-android.cpp decode function is unreachable because it is placed after a
code path that already requires `batch->n_tokens > 0`, meaning negative values
will never reach this validation block. Move the negative `n_tokens` validation
earlier in the function, before any conditional logic that assumes a positive
token count, to ensure the check is actually executed when the batch object has
been corrupted with a negative count value.
- Around line 993-997: The batch null-check block containing the LOGe statement
and return nullptr is positioned too late in the function. Move this entire
safety check block to the very beginning of the function implementation, before
any code that dereferences or uses the batch parameter, to ensure the
null/invalid batch is caught before any potential crash occurs from
dereferencing batch in subsequent operations.

In `@llama-impl/src/main/java/android/llama/cpp/LLamaAndroid.kt`:
- Around line 276-291: The broad catch block around the context validation logic
is catching the intentionally-thrown IllegalStateException about message length
being too long and re-wrapping it with a generic "Failed to validate message
length" error, which loses the specific user-facing error detail. Modify the
exception handling to either rethrow the IllegalStateException that is
explicitly thrown with the "Message is too long for the model's context window"
message, or restructure the try-catch to only catch exceptions from specific
operations like tokenize() and model_n_ctx() calls rather than catching all
exceptions after the explicit throw statement.
- Around line 192-197: The new_batch() and new_sampler() calls in the
initialization sequence do not properly clean up previously allocated native
resources when allocation fails. If new_sampler() fails after new_batch()
succeeds, the batch handle is leaked. Wrap these allocation calls with proper
resource cleanup by implementing a try-catch block that ensures all allocated
native resources (batch, sampler, model, and context handles) are freed before
re-throwing the exception when either new_batch() or new_sampler() fails. Use
the corresponding free functions to release the handles in the correct order.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 78d630fd-79f4-49b1-81c7-a75f68d14029

📥 Commits

Reviewing files that changed from the base of the PR and between 7acac35 and 8200a6b.

📒 Files selected for processing (6)
  • app/src/main/java/com/itsaky/androidide/agent/fragments/AiSettingsFragment.kt
  • app/src/main/java/com/itsaky/androidide/agent/repository/LlmInferenceEngine.kt
  • app/src/main/java/com/itsaky/androidide/agent/viewmodel/AiSettingsViewModel.kt
  • app/src/main/java/com/itsaky/androidide/utils/DynamicLibraryLoader.kt
  • llama-impl/src/main/cpp/llama-android.cpp
  • llama-impl/src/main/java/android/llama/cpp/LLamaAndroid.kt

Comment on lines +343 to +344
if (e.message?.contains("embedding model") == true) {
    log.error("Cannot use embedding model for chat: {}", displayName, e)

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Make embedding-error detection case-insensitive.

The current message check can miss variants like Embedding model, which bypasses the intended IllegalArgumentException mapping and user guidance.

Suggested fix
-            if (e.message?.contains("embedding model") == true) {
+            if (e.message?.contains("embedding model", ignoreCase = true) == true) {

Comment on lines +814 to +821
// Validate batch before decode
if (batch->n_tokens < 0) {
    LOGe("Invalid batch token count: %d", batch->n_tokens);
    env->ReleaseStringUTFChars(jtext, text);
    env->ThrowNew(env->FindClass("java/lang/IllegalStateException"),
                  "Batch state corrupted. Token count is negative.");
    return 0;
}

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

The negative n_tokens validation is unreachable in its current placement.

batch->n_tokens < 0 is checked inside a branch that already requires batch->n_tokens > 0, so corrupted negative counts bypass the intended error path.
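
One way to avoid this kind of ordering problem is to classify the token count once, before any branch that assumes a positive value. The sketch below is illustrative only; `BatchCheck` and `check_batch_tokens` are hypothetical names and are not part of llama-android.cpp.

```cpp
#include <cstdint>

// Hypothetical classification of a batch's token count.
enum class BatchCheck { Ok, Empty, Corrupted };

// Evaluates the negative case first so it cannot be shadowed by a later
// `n_tokens > 0` branch, then distinguishes empty from usable batches.
BatchCheck check_batch_tokens(std::int32_t n_tokens) {
    if (n_tokens < 0) {
        return BatchCheck::Corrupted;
    }
    if (n_tokens == 0) {
        return BatchCheck::Empty;
    }
    return BatchCheck::Ok;
}
```

The caller can then throw the "Batch state corrupted" exception on `Corrupted` and only enter the decode path on `Ok`.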

Comment on lines +993 to +997
// Safety check before decode
if (!batch || batch->n_tokens <= 0) {
    LOGe("Invalid batch state before decode: n_tokens=%d", batch ? batch->n_tokens : -1);
    return nullptr;
}

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Move the batch null-check before any batch dereference.

The new safety guard runs after batch has already been dereferenced, so a null/invalid batch can still crash before this check executes.

Suggested fix
-    common_batch_clear(*batch);
-    common_batch_add(*batch, new_token_id, n_cur, {0}, true);
+    if (!batch) {
+        LOGe("Invalid batch pointer before decode");
+        return nullptr;
+    }
+
+    common_batch_clear(*batch);
+    common_batch_add(*batch, new_token_id, n_cur, {0}, true);

-    // Safety check before decode
-    if (!batch || batch->n_tokens <= 0) {
+    // Safety check before decode
+    if (batch->n_tokens <= 0) {
         LOGe("Invalid batch state before decode: n_tokens=%d", batch ? batch->n_tokens : -1);
         return nullptr;
     }

Comment on lines 192 to 197
val batch = new_batch(2048, 0, 1)
if (batch == 0L) throw IllegalStateException("new_batch() failed")

val sampler = new_sampler()
if (sampler == 0L) throw IllegalStateException("new_sampler() failed")

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Release native resources when batch/sampler allocation fails.

If new_batch() or new_sampler() fails, already-allocated native handles are not freed, which leaks model/context (and batch for sampler failure).

Suggested fix
                     val batch = new_batch(2048, 0, 1)
-                    if (batch == 0L) throw IllegalStateException("new_batch() failed")
+                    if (batch == 0L) {
+                        free_context(context)
+                        free_model(model)
+                        throw IllegalStateException("new_batch() failed")
+                    }

                     val sampler = new_sampler()
-                    if (sampler == 0L) throw IllegalStateException("new_sampler() failed")
+                    if (sampler == 0L) {
+                        free_batch(batch)
+                        free_context(context)
+                        free_model(model)
+                        throw IllegalStateException("new_sampler() failed")
+                    }

Comment on lines +276 to +291
try {
    val contextSize = model_n_ctx(state.context)
    val tokenCount = tokenize(state.context, message, true).size
    log.debug("Context size: {}, message tokens: {}, max output: {}", contextSize, tokenCount, nlen)

    if (tokenCount + nlen > contextSize) {
        log.error("Message too long: {} tokens + {} max output > {} context", tokenCount, nlen, contextSize)
        throw IllegalStateException(
            "Message is too long for the model's context window. " +
                "Message requires $tokenCount tokens plus $nlen for output, but context is only $contextSize tokens."
        )
    }
} catch (e: Exception) {
    log.error("Failed to validate context size", e)
    throw IllegalStateException("Failed to validate message length: ${e.message}", e)
}

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Preserve the explicit context-window error instead of re-wrapping it.

The broad catch currently wraps the intentionally-thrown “message too long for context” error into a generic validation failure, which reduces user-facing precision.

Suggested fix
-                } catch (e: Exception) {
+                } catch (e: IllegalStateException) {
+                    throw e
+                } catch (e: Exception) {
                     log.error("Failed to validate context size", e)
                     throw IllegalStateException("Failed to validate message length: ${e.message}", e)
                 }

} catch (e: IllegalStateException) {
    if (e.message?.contains("embedding model") == true) {
        log.error("Cannot use embedding model for chat: {}", displayName, e)
        throw IllegalArgumentException(

Uncaught exception reintroduces the crash this PR fixes (on other load paths).

After this change initModelFromFile now throws IllegalArgumentException — both for the embedding-model path here and for validateModelFormat rejections (rethrown at line 358). Previously it caught everything and returned false.

Only AiSettingsViewModel was updated to catch it. The other three callers still treat the result as a plain Boolean and have no try/catch, so the exception escapes into their coroutines:

  • agent/.../ChatViewModel.kt:466 (autoLoadLocalModelIfNeeded)
  • app/.../ChatViewModel.kt:161 (getOrCreateRepository)
  • LocalLlmRepositoryImpl.kt:73 (loadModel)

Scenario: a saved local-model path that resolves to an embedding/unsupported model (e.g. saved by a pre-fix build, or a .gguf embedding model) → on startup/chat-open the auto-load throws an uncaught IllegalArgumentException → app crash. That is exactly the ADFA-4388 crash class, just relocated to the auto-load path.

Fix: either wrap those three call sites in try/catch and surface the message, or have initModelFromFile return a sealed result (Loaded/Rejected(reason)/Failed) instead of throwing, so the contract is uniform for all callers.


log.info("Model validation passed - {} tokens processed", testResult)

kv_cache_clear(context)

Dry-run leaves native KV-reuse state inconsistent → can corrupt the first real message.

The dry-run completion_init("Hi", …) populates the native g_cached_tokens prefix (set in llama-android.cpp ~line 793). kv_cache_clear(context) here clears the llama KV cache but does not reset g_cached_tokens, breaking the invariant that g_cached_tokens reflects what is actually resident in the KV cache.

On the first real user message, completion_init’s reuse path (llama-android.cpp:769-808, g_kv_cache_reuse defaults true) sees g_cached_tokens non-empty. If the message tokenizes to a sequence sharing the full "Hi" prefix (realistic with formatChat=false — e.g. user types "Hi"/"Hi there", both start BOS,Hi), it sets reuse=true and skips decoding those prefix positions (lines 804-807), assuming they are in the KV cache — but they were just cleared. Decode then attends over missing KV entries → corrupted/garbage first response.

Fix options:

  • After the dry run, clear the cached-token bookkeeping too (add a JNI call that does g_cached_tokens.clear() under g_globals_mutex, not just kv_cache_clear), or
  • Better, drop the dry-run entirely — new_context already rejects embedding models authoritatively via pooling_type (llama-android.cpp ~line 369), so the dry run validates nothing new while adding this state hazard plus a full decode to every load.
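
The first option can be sketched as follows. This is a simplified model and not the PR's code: the globals below are stand-ins that borrow the names `g_cached_tokens` and `g_globals_mutex` from the comment above, tokens are plain `int`s, and `reset_cached_tokens` and `reusable_prefix` are hypothetical helpers.

```cpp
#include <cstddef>
#include <mutex>
#include <vector>

// Stand-ins for the native globals named in the review comment (assumption).
static std::mutex g_globals_mutex;
static std::vector<int> g_cached_tokens;

// Hypothetical helper: clears the cached-token bookkeeping under the lock.
// It would be called together with kv_cache_clear so both stay in sync.
void reset_cached_tokens() {
    std::lock_guard<std::mutex> lock(g_globals_mutex);
    g_cached_tokens.clear();
}

// Hypothetical helper: number of leading prompt tokens that match the cached
// tokens, i.e. how many positions a reuse path would skip decoding.
std::size_t reusable_prefix(const std::vector<int> &prompt) {
    std::lock_guard<std::mutex> lock(g_globals_mutex);
    std::size_t n = 0;
    while (n < g_cached_tokens.size() && n < prompt.size() &&
           g_cached_tokens[n] == prompt[n]) {
        ++n;
    }
    return n;
}
```

With the reset in place, the first real message sees an empty cache and decodes its full prompt; without it, a prompt sharing the dry-run prefix would skip positions that are no longer in the KV cache.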

3 participants