ADFA-4388 | embedding model crash by jatezzz · Pull Request #1429 · appdevforall/CodeOnTheGo

jatezzz · 2026-06-19T18:52:32Z

Description

Crash Before Fix:
FATAL EXCEPTION: Llm-RunLoop
SIGABRT
decode: cannot decode batches with this context (calling encode() instead)

Error Message After Fix:
The selected model 'all-MiniLM-L6-v2-ggml-model-f16.gguf' is an embedding model
designed for semantic search and similarity tasks. It cannot be used for chat or
text generation.

Please select a chat/instruct model instead (e.g., models with 'chat', 'instruct', 'conversational' in their name).

Details

telegram-cloud-document-1-5022011912393590501.mp4

Ticket

ADFA-4388

Observation

This fix was developed based on consistent crash reproduction with embedding models (e.g., all-MiniLM-L6-v2). The crash occurred 100% of the time when users selected
embedding models for chat, resulting in SIGABRT in the Llm-RunLoop thread.

The multi-layer approach ensures that:

Native layer catches the issue first (pooling type check)
Kotlin layer provides fallback validation
UI layer gracefully handles exceptions and displays helpful guidance

The app now correctly identifies incompatible model types and guides users to select appropriate chat/instruct models, eliminating the crash entirely.
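
For illustration, the native-layer rule listed above (reject any context whose pooling type is not NONE) might look roughly like the sketch below. It is self-contained and makes assumptions: `PoolingType` and `reject_if_embedding` are hypothetical names invented here, standing in for llama.cpp's pooling-type enum and for the check inside new_context(). The error text is paraphrased, and the actual code in this PR may differ.

```cpp
#include <optional>
#include <string>

// Hypothetical stand-in for llama.cpp's pooling-type enum.
// Only the distinction "None vs. anything else" matters for this check.
enum class PoolingType { None, Mean, Cls, Last, Rank };

// Returns an error message when the context cannot be used for text
// generation, or std::nullopt when the pooling type is None.
std::optional<std::string> reject_if_embedding(PoolingType pooling) {
    if (pooling != PoolingType::None) {
        return std::string(
            "The selected model is an embedding model and cannot be used "
            "for chat or text generation.");
    }
    return std::nullopt;
}
```

Returning the message instead of aborting lets the caller turn it into a Java exception at the JNI boundary, which is the behavior the description above aims for.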

Added multi-layer protection to detect and reject embedding models:

**Native Layer (C++):**
- Check pooling_type in new_context() - reject if not LLAMA_POOLING_TYPE_NONE
- Added get_pooling_type() JNI function for Kotlin validation
- Clear error messages explaining embedding vs generative models

**Kotlin Layer:**
- Validate model during load() in LLamaAndroid.kt
- Catch IllegalStateException and wrap with user-friendly message
- File format validation for ONNX, PyTorch, TensorFlow, etc.

**UI Layer:**
- Proper exception handling in AiSettingsViewModel
- Display error in ModelLoadingState.Error instead of crashing
- Keep bottom sheet expanded after file picker to show error/status

**Infrastructure:**
- Rebuilt llama.cpp AAR with updated native code (v8)
- Updated LLAMA_LIB_VERSION to 8 in DynamicLibraryLoader

The app now gracefully handles embedding models with clear error messages instead of crashing with SIGABRT.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

…ing-model-crash

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

- Removed inline comments explaining bottom sheet behavior
- Extracted file extension strings to named constants (EXT_*)
- Extracted keyword strings to named constants (KEYWORD_*)
- Improved code maintainability and reduced duplication

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

coderabbitai · 2026-06-19T18:59:26Z

📝 Walkthrough

Release Notes

Critical Bug Fix

Fixed SIGABRT crash when users selected embedding models (e.g., all-MiniLM-L6-v2) for chat operations by implementing multi-layer validation across native, Kotlin, and UI layers

User-Facing Changes

Embedding models are now rejected at load time with a clear error message: "The selected model is an embedding model designed for semantic search and similarity tasks. It cannot be used for chat or text generation. Please select a chat/instruct model instead"
Model selection bottom sheet now remains expanded to display error messages or status updates after file selection
Added file format validation that rejects non-GGUF formats (ONNX, PyTorch, TensorFlow, etc.) with conversion guidance

Technical Changes

Native Layer (C++)

Added GGUF magic number validation in model loading to reject non-GGUF files early
Added pooling type validation in context creation to detect and reject embedding models
Improved text generation initialization with stricter safety checks for context validation and buffer validation
Added detailed error handling for prompt evaluation and text generation stepping
New JNI functions: get_pooling_type() and get_model_desc() for Kotlin-level model validation
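
As a rough sketch of the magic-number check mentioned in the native-layer list above: a GGUF file begins with the four ASCII bytes `GGUF`, so a loader can reject other formats before handing the path to llama.cpp. The helper name `has_gguf_magic` is hypothetical and is not taken from this PR; the real check in llama-android.cpp may be structured differently.

```cpp
#include <array>
#include <cstring>
#include <fstream>
#include <string>

// Returns true when the file at `path` starts with the 4-byte GGUF magic.
// Hypothetical helper for illustration only.
bool has_gguf_magic(const std::string& path) {
    std::ifstream file(path, std::ios::binary);
    if (!file) {
        return false;  // unreadable or missing file: treat as invalid
    }
    std::array<char, 4> magic{};
    file.read(magic.data(), static_cast<std::streamsize>(magic.size()));
    if (file.gcount() != static_cast<std::streamsize>(magic.size())) {
        return false;  // file is shorter than the magic itself
    }
    return std::memcmp(magic.data(), "GGUF", magic.size()) == 0;
}
```

Reading only the first four bytes keeps the check cheap, and it catches files whose extension was renamed to .gguf, which a filename-based check would miss.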

Kotlin Layer

Enhanced LLamaAndroid.kt with model metadata fetching via native get_model_desc() call
Added pooling type detection and validation to catch embedding models before initialization
Implemented dry-run validation using completion_init() to verify model compatibility
Improved error messages to distinguish between embedding model failures and other incompatibilities

UI Layer

Updated AiSettingsViewModel with comprehensive exception handling (catches IllegalArgumentException for validation failures and other exceptions for generic failures)
Improved ModelLoadingState.Error display for user-friendly error reporting

Code Quality

Extracted magic strings to named constants (EXT_* and KEYWORD_*) for better maintainability
Removed inline comment clutter

Infrastructure

Rebuilt llama.cpp AAR with updated native code (version 8)
Updated LLAMA_LIB_VERSION from 5 to 8 to force AAR asset refresh and re-optimization

⚠️ Risks and Best Practices Considerations

Breaking Change: Models previously loadable but identified as embedding models will now be rejected. Users with saved embedding model configurations will need to select different models.
Version Bump Impact: Increasing LLAMA_LIB_VERSION from 5 to 8 forces a rebuild and re-optimization cycle. Any mismatch in version detection could cause unexpected behavior.
Fragile Error Detection: The Kotlin layer detects embedding model failures partly by matching error message content from native exceptions, which is prone to breaking if error messages change.
Pooling Type Validation Accuracy: The fix relies on pooling type detection to identify embedding models. Incorrect detection could either falsely reject valid chat models or allow embedding models to proceed, partially defeating the fix.
Performance Impact: The dry-run completion_init() call during model loading adds validation overhead to every model load operation. Monitor for potential user experience delays during model selection.
Resource Management: Native resource cleanup (freeing context, batch, sampler) when rejecting models must be guaranteed to prevent memory leaks in edge cases.

Walkthrough

This PR adds end-to-end GGUF-only and embedding-model rejection across the JNI/C++ layer (llama-android.cpp), the Kotlin native bridge (LLamaAndroid), the engine repository (LlmInferenceEngine), and the ViewModel (AiSettingsViewModel). It also adds auto-expansion of the agent bottom sheet after model load in AiSettingsFragment and bumps the native library version constant.

Changes

Model Format Validation and Embedding Rejection

| Layer / File(s) | Summary |
| --- | --- |
| Native model load, context creation, and JNI utilities `llama-impl/src/main/cpp/llama-android.cpp` | Adds GGUF magic-number validation before `llama_model_load_from_file`, clamps `n_ctx` to training context, rejects embedding models immediately after context creation, and exports `get_pooling_type` and `get_model_desc` JNI functions. |
| Native completion_init and completion_loop safety checks `llama-impl/src/main/cpp/llama-android.cpp` | Adds pooling-type checks, input/batch validation, tokenization error handling, and detailed decode-failure exceptions in `completion_init`; adds batch-state validation and decode-failure early stop in `completion_loop`. |
| Kotlin native bridge load() and send() enhancements `llama-impl/src/main/java/android/llama/cpp/LLamaAndroid.kt` | Declares `get_pooling_type` and `get_model_desc` native bindings; enhances `load()` with model-description logging, pooling-type rejection, and a dry-run `completion_init` validation step; reworks `send()` with pooling-type pre-checks, context-window validation, empty-output detection, and iteration limiting. |
| Engine format constants, validateModelFormat, and loadModelFromUri `app/src/main/java/com/itsaky/androidide/agent/repository/LlmInferenceEngine.kt` | Adds `private const val` entries for file extensions and keyword substrings; reworks `validateModelFormat` to throw `IllegalArgumentException` for non-GGUF formats and warn on embedding-model filenames; moves URI parsing before the try block; updates exception handling; refactors `detectModelFamily` to use the new constants. |
| ViewModel exception propagation `app/src/main/java/com/itsaky/androidide/agent/viewmodel/AiSettingsViewModel.kt` | Wraps `initModelFromFile` in try/catch, separately handling `IllegalArgumentException` (validation errors) and generic `Exception` (unexpected errors), and removes stale comment blocks. |
| UI auto-expand and library version bump `app/src/main/java/com/itsaky/androidide/agent/fragments/AiSettingsFragment.kt`, `app/src/main/java/com/itsaky/androidide/utils/DynamicLibraryLoader.kt` | Schedules `postDelayed` bottom-sheet expansion and agent-tab selection after both file-picker and saved-model load paths; bumps `LLAMA_LIB_VERSION` from `5` to `8`. |

Sequence Diagram(s)

sequenceDiagram
  participant User
  participant AiSettingsFragment
  participant AiSettingsViewModel
  participant LlmInferenceEngine
  participant LLamaAndroid
  participant NativeJNI as llama-android.cpp

  User->>AiSettingsFragment: picks model file (URI)
  AiSettingsFragment->>AiSettingsViewModel: loadModelFromUri(uri)
  AiSettingsViewModel->>LlmInferenceEngine: loadModelFromUri(uri)
  LlmInferenceEngine->>LlmInferenceEngine: validateModelFormat(filename)
  alt non-GGUF or embedding keyword
    LlmInferenceEngine-->>AiSettingsViewModel: throw IllegalArgumentException
    AiSettingsViewModel-->>AiSettingsFragment: ModelLoadingState.Error(message)
  else valid
    LlmInferenceEngine->>LLamaAndroid: initModelFromFile(path)
    LLamaAndroid->>NativeJNI: load_model → validate GGUF magic
    LLamaAndroid->>NativeJNI: new_context → check pooling type
    LLamaAndroid->>NativeJNI: completion_init (dry run)
    alt embedding or incompatible
      NativeJNI-->>LLamaAndroid: throw IllegalStateException
      LLamaAndroid-->>LlmInferenceEngine: throw IllegalStateException
      LlmInferenceEngine-->>AiSettingsViewModel: throw IllegalArgumentException
      AiSettingsViewModel-->>AiSettingsFragment: ModelLoadingState.Error(message)
    else success
      LLamaAndroid-->>LlmInferenceEngine: loadedModelName
      LlmInferenceEngine-->>AiSettingsViewModel: ModelLoadingState.Loaded
      AiSettingsViewModel-->>AiSettingsFragment: ModelLoadingState.Loaded
      AiSettingsFragment->>AiSettingsFragment: postDelayed → expand bottom sheet, switch to TAB_AGENT
    end
  end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Suggested reviewers

itsaky-adfa
jomen-adfa

Poem

🐇 Hop hop, I check the magic bytes,
No ONNX sneaks past my sights!
Only .gguf may load and run,
Embeddings? Rejected, every one.
The sheet expands when models shine—
A bunny's validation, oh so fine! ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 14.81% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title 'ADFA-4388 \| embedding model crash' directly reflects the main change: fixing a crash caused by embedding models being used for chat operations. |
| Description check | ✅ Passed | The description comprehensively details the crash scenario, the fix implementation across multiple layers, and the improved user experience with clear error messaging. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

coderabbitai

Actionable comments posted: 5

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In
`@app/src/main/java/com/itsaky/androidide/agent/repository/LlmInferenceEngine.kt`:
- Around line 343-344: The error-message check in LlmInferenceEngine.kt (the if
statement that looks for "embedding model") is case-sensitive and will miss
messages with different casing, such as "Embedding model". Make the comparison
case-insensitive, either by lowercasing the message before the contains check
or by passing ignoreCase = true to contains. This ensures that all variations
of the embedding model error message are detected and the user receives
appropriate guidance.

In `@llama-impl/src/main/cpp/llama-android.cpp`:
- Around line 814-821: The validation check for `batch->n_tokens < 0` in the
llama-android.cpp decode function is unreachable because it is placed after a
code path that already requires `batch->n_tokens > 0`, meaning negative values
will never reach this validation block. Move the negative `n_tokens` validation
earlier in the function, before any conditional logic that assumes a positive
token count, to ensure the check is actually executed when the batch object has
been corrupted with a negative count value.
- Around line 993-997: The batch null-check block containing the LOGe statement
and return nullptr is positioned too late in the function. Move this entire
safety check block to the very beginning of the function implementation, before
any code that dereferences or uses the batch parameter, to ensure the
null/invalid batch is caught before any potential crash occurs from
dereferencing batch in subsequent operations.

In `@llama-impl/src/main/java/android/llama/cpp/LLamaAndroid.kt`:
- Around line 276-291: The broad catch block around the context validation logic
is catching the intentionally-thrown IllegalStateException about message length
being too long and re-wrapping it with a generic "Failed to validate message
length" error, which loses the specific user-facing error detail. Modify the
exception handling to either rethrow the IllegalStateException that is
explicitly thrown with the "Message is too long for the model's context window"
message, or restructure the try-catch to only catch exceptions from specific
operations like tokenize() and model_n_ctx() calls rather than catching all
exceptions after the explicit throw statement.
- Around line 192-197: The new_batch() and new_sampler() calls in the
initialization sequence do not properly clean up previously allocated native
resources when allocation fails. If new_sampler() fails after new_batch()
succeeds, the batch handle is leaked. Wrap these allocation calls with proper
resource cleanup by implementing a try-catch block that ensures all allocated
native resources (batch, sampler, model, and context handles) are freed before
re-throwing the exception when either new_batch() or new_sampler() fails. Use
the corresponding free functions to release the handles in the correct order.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 78d630fd-79f4-49b1-81c7-a75f68d14029

📥 Commits

Reviewing files that changed from the base of the PR and between 7acac35 and 8200a6b.

📒 Files selected for processing (6)

app/src/main/java/com/itsaky/androidide/agent/fragments/AiSettingsFragment.kt
app/src/main/java/com/itsaky/androidide/agent/repository/LlmInferenceEngine.kt
app/src/main/java/com/itsaky/androidide/agent/viewmodel/AiSettingsViewModel.kt
app/src/main/java/com/itsaky/androidide/utils/DynamicLibraryLoader.kt
llama-impl/src/main/cpp/llama-android.cpp
llama-impl/src/main/java/android/llama/cpp/LLamaAndroid.kt

coderabbitai · 2026-06-19T18:59:30Z

+            if (e.message?.contains("embedding model") == true) {
+                log.error("Cannot use embedding model for chat: {}", displayName, e)


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Make embedding-error detection case-insensitive.

The current message check can miss variants like Embedding model, which bypasses the intended IllegalArgumentException mapping and user guidance.

Suggested fix

-            if (e.message?.contains("embedding model") == true) {
+            if (e.message?.contains("embedding model", ignoreCase = true) == true) {

coderabbitai · 2026-06-19T18:59:31Z

+        // Validate batch before decode
+        if (batch->n_tokens < 0) {
+            LOGe("Invalid batch token count: %d", batch->n_tokens);
+            env->ReleaseStringUTFChars(jtext, text);
+            env->ThrowNew(env->FindClass("java/lang/IllegalStateException"),
+                          "Batch state corrupted. Token count is negative.");
+            return 0;
+        }


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

The negative n_tokens validation is unreachable in its current placement.

batch->n_tokens < 0 is checked inside a branch that already requires batch->n_tokens > 0, so corrupted negative counts bypass the intended error path.

coderabbitai · 2026-06-19T18:59:31Z

+    // Safety check before decode
+    if (!batch || batch->n_tokens <= 0) {
+        LOGe("Invalid batch state before decode: n_tokens=%d", batch ? batch->n_tokens : -1);
+        return nullptr;
+    }


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Move the batch null-check before any batch dereference.

The new safety guard runs after batch has already been dereferenced, so a null/invalid batch can still crash before this check executes.

Suggested fix

-    common_batch_clear(*batch);
-    common_batch_add(*batch, new_token_id, n_cur, {0}, true);
+    if (!batch) {
+        LOGe("Invalid batch pointer before decode");
+        return nullptr;
+    }
+
+    common_batch_clear(*batch);
+    common_batch_add(*batch, new_token_id, n_cur, {0}, true);

-    // Safety check before decode
-    if (!batch || batch->n_tokens <= 0) {
+    // Safety check before decode
+    if (batch->n_tokens <= 0) {
         LOGe("Invalid batch state before decode: n_tokens=%d", batch ? batch->n_tokens : -1);
         return nullptr;
     }
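
The ordering issue in this comment can be shown in isolation. The sketch below is only a model of the pattern, not the PR's code: `Batch` and `prepare_next_token` are hypothetical stand-ins for the native batch and for the part of the loop that clears the batch and adds the next token.

```cpp
#include <vector>

// Hypothetical stand-in for the native batch; only the token list matters here.
struct Batch {
    std::vector<int> tokens;
};

// Replaces the batch contents with one token and reports whether the batch
// is ready for decode. The pointer is validated before the first
// dereference, so a null batch returns false instead of crashing.
bool prepare_next_token(Batch* batch, int new_token_id) {
    if (batch == nullptr) {
        return false;  // guard runs before any use of *batch
    }
    batch->tokens.clear();
    batch->tokens.push_back(new_token_id);
    return !batch->tokens.empty();  // state check after the mutation
}
```

The null guard and the token-count check answer different questions, so splitting them (pointer first, contents after the mutation) keeps each one reachable.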

coderabbitai · 2026-06-19T18:59:31Z

                    val batch = new_batch(2048, 0, 1)
                    if (batch == 0L) throw IllegalStateException("new_batch() failed")

                    val sampler = new_sampler()
                    if (sampler == 0L) throw IllegalStateException("new_sampler() failed")



⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Release native resources when batch/sampler allocation fails.

If new_batch() or new_sampler() fails, already-allocated native handles are not freed, which leaks model/context (and batch for sampler failure).

Suggested fix

                     val batch = new_batch(2048, 0, 1)
-                    if (batch == 0L) throw IllegalStateException("new_batch() failed")
+                    if (batch == 0L) {
+                        free_context(context)
+                        free_model(model)
+                        throw IllegalStateException("new_batch() failed")
+                    }

                     val sampler = new_sampler()
-                    if (sampler == 0L) throw IllegalStateException("new_sampler() failed")
+                    if (sampler == 0L) {
+                        free_batch(batch)
+                        free_context(context)
+                        free_model(model)
+                        throw IllegalStateException("new_sampler() failed")
+                    }

coderabbitai · 2026-06-19T18:59:31Z

+                try {
+                    val contextSize = model_n_ctx(state.context)
+                    val tokenCount = tokenize(state.context, message, true).size
+                    log.debug("Context size: {}, message tokens: {}, max output: {}", contextSize, tokenCount, nlen)
+
+                    if (tokenCount + nlen > contextSize) {
+                        log.error("Message too long: {} tokens + {} max output > {} context", tokenCount, nlen, contextSize)
+                        throw IllegalStateException(
+                            "Message is too long for the model's context window. " +
+                            "Message requires $tokenCount tokens plus $nlen for output, but context is only $contextSize tokens."
+                        )
+                    }
+                } catch (e: Exception) {
+                    log.error("Failed to validate context size", e)
+                    throw IllegalStateException("Failed to validate message length: ${e.message}", e)
+                }


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Preserve the explicit context-window error instead of re-wrapping it.

The broad catch currently wraps the intentionally-thrown “message too long for context” error into a generic validation failure, which reduces user-facing precision.

Suggested fix

-                } catch (e: Exception) {
+                } catch (e: IllegalStateException) {
+                    throw e
+                } catch (e: Exception) {
                     log.error("Failed to validate context size", e)
                     throw IllegalStateException("Failed to validate message length: ${e.message}", e)
                 }

hal-eisen-adfa · 2026-06-19T20:57:51Z

+        } catch (e: IllegalStateException) {
+            if (e.message?.contains("embedding model") == true) {
+                log.error("Cannot use embedding model for chat: {}", displayName, e)
+                throw IllegalArgumentException(


Uncaught exception reintroduces the crash this PR fixes (on other load paths).

After this change initModelFromFile now throws IllegalArgumentException — both for the embedding-model path here and for validateModelFormat rejections (rethrown at line 358). Previously it caught everything and returned false.

Only AiSettingsViewModel was updated to catch it. The other three callers still treat the result as a plain Boolean and have no try/catch, so the exception escapes into their coroutines:

agent/.../ChatViewModel.kt:466 (autoLoadLocalModelIfNeeded)

app/.../ChatViewModel.kt:161 (getOrCreateRepository)

LocalLlmRepositoryImpl.kt:73 (loadModel)

Scenario: a saved local-model path that resolves to an embedding/unsupported model (e.g. saved by a pre-fix build, or a .gguf embedding model) → on startup/chat-open the auto-load throws an uncaught IllegalArgumentException → app crash. That is exactly the ADFA-4388 crash class, just relocated to the auto-load path.

Fix: either wrap those three call sites in try/catch and surface the message, or have initModelFromFile return a sealed result (Loaded/Rejected(reason)/Failed) instead of throwing, so the contract is uniform for all callers.

hal-eisen-adfa · 2026-06-19T20:58:08Z

+
+                        log.info("Model validation passed - {} tokens processed", testResult)
+
+                        kv_cache_clear(context)


Dry-run leaves native KV-reuse state inconsistent → can corrupt the first real message.

The dry-run completion_init("Hi", …) populates the native g_cached_tokens prefix (set in llama-android.cpp ~line 793). kv_cache_clear(context) here clears the llama KV cache but does not reset g_cached_tokens, breaking the invariant that g_cached_tokens reflects what is actually resident in the KV cache.

On the first real user message, completion_init’s reuse path (llama-android.cpp:769-808, g_kv_cache_reuse defaults true) sees g_cached_tokens non-empty. If the message tokenizes to a sequence sharing the full "Hi" prefix (realistic with formatChat=false — e.g. user types "Hi"/"Hi there", both start BOS,Hi), it sets reuse=true and skips decoding those prefix positions (lines 804-807), assuming they are in the KV cache — but they were just cleared. Decode then attends over missing KV entries → corrupted/garbage first response.

Fix options:

After the dry run, clear the cached-token bookkeeping too (add a JNI call that does g_cached_tokens.clear() under g_globals_mutex, not just kv_cache_clear), or

Better, drop the dry-run entirely — new_context already rejects embedding models authoritatively via pooling_type (llama-android.cpp ~line 369), so the dry run validates nothing new while adding this state hazard plus a full decode to every load.
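
To make the invariant concrete, here is a small self-contained model of the hazard, under stated assumptions: `ReuseState`, `init_is_consistent`, and the two clear functions are hypothetical names invented for this sketch, and the prefix-reuse logic is simplified compared with what llama-android.cpp does. `kv` models what is actually resident in the KV cache and `cached_tokens` models the bookkeeping that decides how much decoding to skip.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical model: `kv` is what the KV cache really holds,
// `cached_tokens` is the bookkeeping consulted to decide on reuse.
struct ReuseState {
    std::vector<int> kv;
    std::vector<int> cached_tokens;
};

// Simulates an init step that skips the prefix shared with the bookkeeping.
// Returns true only if every skipped position is really present in `kv`.
bool init_is_consistent(ReuseState& s, const std::vector<int>& prompt) {
    const std::size_t limit = std::min(s.cached_tokens.size(), prompt.size());
    std::size_t skip = 0;
    while (skip < limit && s.cached_tokens[skip] == prompt[skip]) {
        ++skip;
    }
    const bool ok =
        skip <= s.kv.size() &&
        std::equal(prompt.begin(),
                   prompt.begin() + static_cast<std::ptrdiff_t>(skip),
                   s.kv.begin());
    // After the (simulated) decode, cache and bookkeeping agree again.
    s.kv = prompt;
    s.cached_tokens = prompt;
    return ok;
}

// Clears only the cache, leaving stale bookkeeping behind (the hazard).
void clear_kv_only(ReuseState& s) { s.kv.clear(); }

// Clears both together, which preserves the invariant (the first fix option).
void clear_kv_and_bookkeeping(ReuseState& s) {
    s.kv.clear();
    s.cached_tokens.clear();
}
```

In this model, a dry run followed by `clear_kv_only` makes the next prompt that shares the dry-run prefix report an inconsistency, while `clear_kv_and_bookkeeping` does not. Dropping the dry run, the second option above, avoids the question entirely.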

jatezzz and others added 4 commits June 19, 2026 14:26

Merge remote-tracking branch 'origin/stage' into fix/ADFA-4388-embedd…

e5fb835

…ing-model-crash

chore(ADFA-4388): Remove explanatory comments from Kotlin files

7ef3e80

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

jatezzz requested review from Daniel-ADFA, dara-abijo-adfa, hal-eisen-adfa, itsaky-adfa and jomen-adfa June 19, 2026 18:52

claude Bot reviewed Jun 19, 2026

View reviewed changes

coderabbitai Bot reviewed Jun 19, 2026

View reviewed changes

hal-eisen-adfa reviewed Jun 19, 2026

View reviewed changes

jomen-adfa approved these changes Jun 22, 2026

View reviewed changes
