ADFA-4388 | embedding model crash #1429
Added multi-layer protection to detect and reject embedding models:

**Native Layer (C++):**
- Check pooling_type in new_context(); reject if not LLAMA_POOLING_TYPE_NONE
- Added get_pooling_type() JNI function for Kotlin validation
- Clear error messages explaining embedding vs generative models

**Kotlin Layer:**
- Validate model during load() in LLamaAndroid.kt
- Catch IllegalStateException and wrap with user-friendly message
- File format validation for ONNX, PyTorch, TensorFlow, etc.

**UI Layer:**
- Proper exception handling in AiSettingsViewModel
- Display error in ModelLoadingState.Error instead of crashing
- Keep bottom sheet expanded after file picker to show error/status

**Infrastructure:**
- Rebuilt llama.cpp AAR with updated native code (v8)
- Updated LLAMA_LIB_VERSION to 8 in DynamicLibraryLoader

The app now gracefully handles embedding models with clear error messages instead of crashing with SIGABRT.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
- Removed inline comments explaining bottom sheet behavior
- Extracted file extension strings to named constants (EXT_*)
- Extracted keyword strings to named constants (KEYWORD_*)
- Improved code maintainability and reduced duplication

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
📝 Walkthrough
Release Notes
Critical Bug Fix
User-Facing Changes
Technical Changes
Native Layer (C++)
Kotlin Layer
UI Layer
Code Quality
Infrastructure
| Layer / File(s) | Summary |
|---|---|
| Native model load, context creation, and JNI utilities: `llama-impl/src/main/cpp/llama-android.cpp` | Adds GGUF magic-number validation before `llama_model_load_from_file`, clamps `n_ctx` to training context, rejects embedding models immediately after context creation, and exports `get_pooling_type` and `get_model_desc` JNI functions. |
| Native `completion_init` and `completion_loop` safety checks: `llama-impl/src/main/cpp/llama-android.cpp` | Adds pooling-type checks, input/batch validation, tokenization error handling, and detailed decode-failure exceptions in `completion_init`; adds batch-state validation and decode-failure early stop in `completion_loop`. |
| Kotlin native bridge `load()` and `send()` enhancements: `llama-impl/src/main/java/android/llama/cpp/LLamaAndroid.kt` | Declares `get_pooling_type` and `get_model_desc` native bindings; enhances `load()` with model-description logging, pooling-type rejection, and a dry-run `completion_init` validation step; reworks `send()` with pooling-type pre-checks, context-window validation, empty-output detection, and iteration limiting. |
| Engine format constants, `validateModelFormat`, and `loadModelFromUri`: `app/src/main/java/com/itsaky/androidide/agent/repository/LlmInferenceEngine.kt` | Adds `private const val` entries for file extensions and keyword substrings; reworks `validateModelFormat` to throw `IllegalArgumentException` for non-GGUF formats and warn on embedding-model filenames; moves URI parsing before the try block; updates exception handling; refactors `detectModelFamily` to use the new constants. |
| ViewModel exception propagation: `app/src/main/java/com/itsaky/androidide/agent/viewmodel/AiSettingsViewModel.kt` | Wraps `initModelFromFile` in try/catch, separately handling `IllegalArgumentException` (validation errors) and generic `Exception` (unexpected errors), and removes stale comment blocks. |
| UI auto-expand and library version bump: `app/src/main/java/com/itsaky/androidide/agent/fragments/AiSettingsFragment.kt`, `app/src/main/java/com/itsaky/androidide/utils/DynamicLibraryLoader.kt` | Schedules `postDelayed` bottom-sheet expansion and agent-tab selection after both file-picker and saved-model load paths; bumps `LLAMA_LIB_VERSION` from 5 to 8. |
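The walkthrough mentions GGUF magic-number validation in the native layer. As a rough illustration of that idea (not the PR's C++ code), the same check could be sketched in Kotlin. The function name `hasGgufMagic` is hypothetical; the only fact relied on is that GGUF files begin with the four ASCII bytes `GGUF`.

```kotlin
import java.io.File
import java.io.InputStream

// The four ASCII bytes "GGUF" that open a GGUF file.
private val GGUF_MAGIC = byteArrayOf(0x47, 0x47, 0x55, 0x46)

/** Hypothetical helper: true only if the stream starts with the GGUF magic bytes. */
fun hasGgufMagic(input: InputStream): Boolean {
    val header = ByteArray(GGUF_MAGIC.size)
    var read = 0
    // read() may return fewer bytes than requested, so loop until full or EOF.
    while (read < header.size) {
        val n = input.read(header, read, header.size - read)
        if (n < 0) return false // input is shorter than the magic
        read += n
    }
    return header.contentEquals(GGUF_MAGIC)
}

/** Convenience overload that opens and closes the file itself. */
fun hasGgufMagic(file: File): Boolean = file.inputStream().use { hasGgufMagic(it) }
```

A check like this only rules out files that are not GGUF at all. A `.gguf` embedding model passes it, which is why the pooling-type check is still needed.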
Sequence Diagram(s)
sequenceDiagram
participant User
participant AiSettingsFragment
participant AiSettingsViewModel
participant LlmInferenceEngine
participant LLamaAndroid
participant NativeJNI as llama-android.cpp
User->>AiSettingsFragment: picks model file (URI)
AiSettingsFragment->>AiSettingsViewModel: loadModelFromUri(uri)
AiSettingsViewModel->>LlmInferenceEngine: loadModelFromUri(uri)
LlmInferenceEngine->>LlmInferenceEngine: validateModelFormat(filename)
alt non-GGUF or embedding keyword
LlmInferenceEngine-->>AiSettingsViewModel: throw IllegalArgumentException
AiSettingsViewModel-->>AiSettingsFragment: ModelLoadingState.Error(message)
else valid
LlmInferenceEngine->>LLamaAndroid: initModelFromFile(path)
LLamaAndroid->>NativeJNI: load_model → validate GGUF magic
LLamaAndroid->>NativeJNI: new_context → check pooling type
LLamaAndroid->>NativeJNI: completion_init (dry run)
alt embedding or incompatible
NativeJNI-->>LLamaAndroid: throw IllegalStateException
LLamaAndroid-->>LlmInferenceEngine: throw IllegalStateException
LlmInferenceEngine-->>AiSettingsViewModel: throw IllegalArgumentException
AiSettingsViewModel-->>AiSettingsFragment: ModelLoadingState.Error(message)
else success
LLamaAndroid-->>LlmInferenceEngine: loadedModelName
LlmInferenceEngine-->>AiSettingsViewModel: ModelLoadingState.Loaded
AiSettingsViewModel-->>AiSettingsFragment: ModelLoadingState.Loaded
AiSettingsFragment->>AiSettingsFragment: postDelayed → expand bottom sheet, switch to TAB_AGENT
end
end
Estimated code review effort
🎯 4 (Complex) | ⏱️ ~60 minutes
Suggested reviewers
- itsaky-adfa
- jomen-adfa
Poem
🐇 Hop hop, I check the magic bytes,
No ONNX sneaks past my sights!
Only .gguf may load and run,
Embeddings? Rejected, every one.
The sheet expands when models shine—
A bunny's validation, oh so fine! ✨
🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 14.81%, which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title 'ADFA-4388 | embedding model crash' directly reflects the main change: fixing a crash caused by embedding models being used for chat operations. |
| Description check | ✅ Passed | The description comprehensively details the crash scenario, the fix implementation across multiple layers, and the improved user experience with clear error messaging. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
Actionable comments posted: 5
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In
`@app/src/main/java/com/itsaky/androidide/agent/repository/LlmInferenceEngine.kt`:
- Around line 343-344: The error message check in LlmInferenceEngine.kt at the
if statement checking for "embedding model" is case-sensitive and will miss
error messages with different casing like "Embedding model". Make the message
comparison case-insensitive by converting the error message to lowercase before
performing the contains check, or by using a case-insensitive comparison method
that ignores the case parameter. This ensures that all variations of the
embedding model error message are properly detected and the user receives
appropriate guidance.
In `@llama-impl/src/main/cpp/llama-android.cpp`:
- Around line 814-821: The validation check for `batch->n_tokens < 0` in the
llama-android.cpp decode function is unreachable because it is placed after a
code path that already requires `batch->n_tokens > 0`, meaning negative values
will never reach this validation block. Move the negative `n_tokens` validation
earlier in the function, before any conditional logic that assumes a positive
token count, to ensure the check is actually executed when the batch object has
been corrupted with a negative count value.
- Around line 993-997: The batch null-check block containing the LOGe statement
and return nullptr is positioned too late in the function. Move this entire
safety check block to the very beginning of the function implementation, before
any code that dereferences or uses the batch parameter, to ensure the
null/invalid batch is caught before any potential crash occurs from
dereferencing batch in subsequent operations.
In `@llama-impl/src/main/java/android/llama/cpp/LLamaAndroid.kt`:
- Around line 276-291: The broad catch block around the context validation logic
is catching the intentionally-thrown IllegalStateException about message length
being too long and re-wrapping it with a generic "Failed to validate message
length" error, which loses the specific user-facing error detail. Modify the
exception handling to either rethrow the IllegalStateException that is
explicitly thrown with the "Message is too long for the model's context window"
message, or restructure the try-catch to only catch exceptions from specific
operations like tokenize() and model_n_ctx() calls rather than catching all
exceptions after the explicit throw statement.
- Around line 192-197: The new_batch() and new_sampler() calls in the
initialization sequence do not properly clean up previously allocated native
resources when allocation fails. If new_sampler() fails after new_batch()
succeeds, the batch handle is leaked. Wrap these allocation calls with proper
resource cleanup by implementing a try-catch block that ensures all allocated
native resources (batch, sampler, model, and context handles) are freed before
re-throwing the exception when either new_batch() or new_sampler() fails. Use
the corresponding free functions to release the handles in the correct order.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 78d630fd-79f4-49b1-81c7-a75f68d14029
📒 Files selected for processing (6)
- app/src/main/java/com/itsaky/androidide/agent/fragments/AiSettingsFragment.kt
- app/src/main/java/com/itsaky/androidide/agent/repository/LlmInferenceEngine.kt
- app/src/main/java/com/itsaky/androidide/agent/viewmodel/AiSettingsViewModel.kt
- app/src/main/java/com/itsaky/androidide/utils/DynamicLibraryLoader.kt
- llama-impl/src/main/cpp/llama-android.cpp
- llama-impl/src/main/java/android/llama/cpp/LLamaAndroid.kt
    if (e.message?.contains("embedding model") == true) {
        log.error("Cannot use embedding model for chat: {}", displayName, e)
Make embedding-error detection case-insensitive.
The current message check can miss variants like Embedding model, which bypasses the intended IllegalArgumentException mapping and user guidance.
Suggested fix
    - if (e.message?.contains("embedding model") == true) {
    + if (e.message?.contains("embedding model", ignoreCase = true) == true) {
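For illustration only, the case-insensitive check can be isolated in a small helper so it is easy to test. The name `isEmbeddingModelError` is hypothetical; the PR performs the check inline in a catch block.

```kotlin
/**
 * Hypothetical helper: true when an exception message mentions an embedding
 * model, regardless of letter case. A null message never matches.
 */
fun isEmbeddingModelError(message: String?): Boolean =
    message?.contains("embedding model", ignoreCase = true) == true
```

Matching on message text remains fragile even when case-insensitive. A dedicated exception type or error code would avoid depending on wording, though that is a larger change than this review comment asks for.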
    // Validate batch before decode
    if (batch->n_tokens < 0) {
        LOGe("Invalid batch token count: %d", batch->n_tokens);
        env->ReleaseStringUTFChars(jtext, text);
        env->ThrowNew(env->FindClass("java/lang/IllegalStateException"),
                      "Batch state corrupted. Token count is negative.");
        return 0;
    }
The negative n_tokens validation is unreachable in its current placement.
batch->n_tokens < 0 is checked inside a branch that already requires batch->n_tokens > 0, so corrupted negative counts bypass the intended error path.
    // Safety check before decode
    if (!batch || batch->n_tokens <= 0) {
        LOGe("Invalid batch state before decode: n_tokens=%d", batch ? batch->n_tokens : -1);
        return nullptr;
    }
Move the batch null-check before any batch dereference.
The new safety guard runs after batch has already been dereferenced, so a null/invalid batch can still crash before this check executes.
Suggested fix
    - common_batch_clear(*batch);
    - common_batch_add(*batch, new_token_id, n_cur, {0}, true);
    + if (!batch) {
    +     LOGe("Invalid batch pointer before decode");
    +     return nullptr;
    + }
    +
    + common_batch_clear(*batch);
    + common_batch_add(*batch, new_token_id, n_cur, {0}, true);
    - // Safety check before decode
    - if (!batch || batch->n_tokens <= 0) {
    + // Safety check before decode
    + if (batch->n_tokens <= 0) {
          LOGe("Invalid batch state before decode: n_tokens=%d", batch ? batch->n_tokens : -1);
          return nullptr;
      }
    val batch = new_batch(2048, 0, 1)
    if (batch == 0L) throw IllegalStateException("new_batch() failed")

    val sampler = new_sampler()
    if (sampler == 0L) throw IllegalStateException("new_sampler() failed")
Release native resources when batch/sampler allocation fails.
If new_batch() or new_sampler() fails, already-allocated native handles are not freed, which leaks model/context (and batch for sampler failure).
Suggested fix
      val batch = new_batch(2048, 0, 1)
    - if (batch == 0L) throw IllegalStateException("new_batch() failed")
    + if (batch == 0L) {
    +     free_context(context)
    +     free_model(model)
    +     throw IllegalStateException("new_batch() failed")
    + }
      val sampler = new_sampler()
    - if (sampler == 0L) throw IllegalStateException("new_sampler() failed")
    + if (sampler == 0L) {
    +     free_batch(batch)
    +     free_context(context)
    +     free_model(model)
    +     throw IllegalStateException("new_sampler() failed")
    + }
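The cleanup order described in this comment (free in reverse order of allocation, then rethrow) can also be expressed without repeating the free calls at each failure point. The following is a minimal sketch under stated assumptions: `NativeHandles` and `acquire` are hypothetical names, and the allocator and free functions are passed in as lambdas rather than being the real JNI bindings.

```kotlin
/**
 * Hypothetical helper: tracks native handles as they are acquired and, when an
 * allocation fails, releases everything acquired so far in reverse order.
 */
class NativeHandles {
    // Most recently acquired handle sits at the front, so release order is reversed.
    private val releasers = ArrayDeque<() -> Unit>()

    /** Calls [alloc]; a 0 handle counts as failure. Registers [free] on success. */
    fun acquire(name: String, alloc: () -> Long, free: (Long) -> Unit): Long {
        val handle = alloc()
        if (handle == 0L) {
            releaseAll()
            throw IllegalStateException("$name failed")
        }
        releasers.addFirst { free(handle) }
        return handle
    }

    /** Frees every registered handle, newest first. Safe to call more than once. */
    fun releaseAll() {
        while (releasers.isNotEmpty()) releasers.removeFirst().invoke()
    }
}
```

With this shape, a failing sampler allocation would free the batch, then the context, then the model, which matches the order in the suggested fix above.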
    try {
        val contextSize = model_n_ctx(state.context)
        val tokenCount = tokenize(state.context, message, true).size
        log.debug("Context size: {}, message tokens: {}, max output: {}", contextSize, tokenCount, nlen)

        if (tokenCount + nlen > contextSize) {
            log.error("Message too long: {} tokens + {} max output > {} context", tokenCount, nlen, contextSize)
            throw IllegalStateException(
                "Message is too long for the model's context window. " +
                "Message requires $tokenCount tokens plus $nlen for output, but context is only $contextSize tokens."
            )
        }
    } catch (e: Exception) {
        log.error("Failed to validate context size", e)
        throw IllegalStateException("Failed to validate message length: ${e.message}", e)
    }
Preserve the explicit context-window error instead of re-wrapping it.
The broad catch currently wraps the intentionally-thrown “message too long for context” error into a generic validation failure, which reduces user-facing precision.
Suggested fix
    - } catch (e: Exception) {
    + } catch (e: IllegalStateException) {
    +     throw e
    + } catch (e: Exception) {
          log.error("Failed to validate context size", e)
          throw IllegalStateException("Failed to validate message length: ${e.message}", e)
      }
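The second option named in this comment, narrowing the try block to the native lookups, might look like the sketch below. It is an assumption-laden illustration: `validateContextWindow` is a hypothetical function, and `contextSize` and `countTokens` are lambda stand-ins for `model_n_ctx()` and `tokenize()`.

```kotlin
/**
 * Hypothetical sketch: checks that the prompt plus the requested output fits
 * the context window. Only lookup failures are re-labelled as validation errors.
 */
fun validateContextWindow(
    message: String,
    maxOutputTokens: Int,
    contextSize: () -> Int,
    countTokens: (String) -> Int,
) {
    // Only the lookups are wrapped, so only their failures get the generic message.
    val (ctx, tokens) = try {
        contextSize() to countTokens(message)
    } catch (e: Exception) {
        throw IllegalStateException("Failed to validate message length: ${e.message}", e)
    }

    // Thrown outside the try block, so the specific message reaches the caller intact.
    check(tokens + maxOutputTokens <= ctx) {
        "Message is too long for the model's context window. " +
            "Message requires $tokens tokens plus $maxOutputTokens for output, " +
            "but context is only $ctx tokens."
    }
}
```

Compared with rethrowing every `IllegalStateException`, this variant also keeps an `IllegalStateException` raised by a failing lookup wrapped in the generic message, so the two error classes stay distinguishable.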
    } catch (e: IllegalStateException) {
        if (e.message?.contains("embedding model") == true) {
            log.error("Cannot use embedding model for chat: {}", displayName, e)
            throw IllegalArgumentException(
Uncaught exception reintroduces the crash this PR fixes (on other load paths).
After this change initModelFromFile now throws IllegalArgumentException — both for the embedding-model path here and for validateModelFormat rejections (rethrown at line 358). Previously it caught everything and returned false.
Only AiSettingsViewModel was updated to catch it. The other three callers still treat the result as a plain Boolean and have no try/catch, so the exception escapes into their coroutines:
- `agent/.../ChatViewModel.kt:466` (`autoLoadLocalModelIfNeeded`)
- `app/.../ChatViewModel.kt:161` (`getOrCreateRepository`)
- `LocalLlmRepositoryImpl.kt:73` (`loadModel`)
Scenario: a saved local-model path that resolves to an embedding/unsupported model (e.g. saved by a pre-fix build, or a .gguf embedding model) → on startup/chat-open the auto-load throws an uncaught IllegalArgumentException → app crash. That is exactly the ADFA-4388 crash class, just relocated to the auto-load path.
Fix: either wrap those three call sites in try/catch and surface the message, or have initModelFromFile return a sealed result (Loaded/Rejected(reason)/Failed) instead of throwing, so the contract is uniform for all callers.
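A minimal sketch of the sealed-result option, assuming nothing about the real `initModelFromFile` signature: the type `ModelLoadResult` and the functions `loadModelSafely` and `describe` are hypothetical, and the loader is modelled as a plain lambda that returns the loaded model name or throws.

```kotlin
/** Hypothetical result type; the PR as written throws IllegalArgumentException instead. */
sealed interface ModelLoadResult {
    data class Loaded(val modelName: String) : ModelLoadResult
    data class Rejected(val reason: String) : ModelLoadResult // e.g. embedding model, non-GGUF file
    data class Failed(val cause: Throwable) : ModelLoadResult // unexpected error
}

/** Wraps a throwing loader so callers receive a value instead of an exception. */
fun loadModelSafely(load: () -> String): ModelLoadResult =
    try {
        ModelLoadResult.Loaded(load())
    } catch (e: IllegalArgumentException) {
        ModelLoadResult.Rejected(e.message ?: "Unsupported model")
    } catch (e: Exception) {
        ModelLoadResult.Failed(e)
    }

/** Example caller: the `when` must cover every variant, so no outcome is silently ignored. */
fun describe(result: ModelLoadResult): String = when (result) {
    is ModelLoadResult.Loaded -> "Loaded ${result.modelName}"
    is ModelLoadResult.Rejected -> "Rejected: ${result.reason}"
    is ModelLoadResult.Failed -> "Failed: ${result.cause.message}"
}
```

The appeal of this design is that a caller that forgets the rejection case fails to compile rather than crashing at runtime. If the real loader is a suspending function, the generic `catch (e: Exception)` would also need to let `CancellationException` propagate.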
    log.info("Model validation passed - {} tokens processed", testResult)

    kv_cache_clear(context)
Dry-run leaves native KV-reuse state inconsistent → can corrupt the first real message.
The dry-run completion_init("Hi", …) populates the native g_cached_tokens prefix (set in llama-android.cpp ~line 793). kv_cache_clear(context) here clears the llama KV cache but does not reset g_cached_tokens, breaking the invariant that g_cached_tokens reflects what is actually resident in the KV cache.
On the first real user message, completion_init’s reuse path (llama-android.cpp:769-808, g_kv_cache_reuse defaults true) sees g_cached_tokens non-empty. If the message tokenizes to a sequence sharing the full "Hi" prefix (realistic with formatChat=false — e.g. user types "Hi"/"Hi there", both start BOS,Hi), it sets reuse=true and skips decoding those prefix positions (lines 804-807), assuming they are in the KV cache — but they were just cleared. Decode then attends over missing KV entries → corrupted/garbage first response.
Fix options:
- After the dry run, clear the cached-token bookkeeping too (add a JNI call that does `g_cached_tokens.clear()` under `g_globals_mutex`, not just `kv_cache_clear`), or
- Better, drop the dry run entirely: `new_context` already rejects embedding models authoritatively via `pooling_type` (`llama-android.cpp` ~line 369), so the dry run validates nothing new while adding this state hazard plus a full decode to every load.
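To make the invariant concrete, here is a deliberately simplified Kotlin model of prefix reuse. It is not the native code: `PrefixCacheModel` and all of its members are hypothetical, tokens are plain integers, and the real reuse condition in `llama-android.cpp` may differ from the common-prefix rule used here.

```kotlin
/**
 * Simplified model: [cachedTokens] is the bookkeeping that claims which tokens
 * are cached, [resident] is what the cache actually holds.
 */
class PrefixCacheModel {
    private val cachedTokens = mutableListOf<Int>()
    private val resident = mutableListOf<Int>()

    /** Decodes [tokens], skipping the prefix the bookkeeping claims is cached. Returns how many were decoded. */
    @Synchronized
    fun decode(tokens: List<Int>): Int {
        // Length of the common prefix between the request and the bookkeeping.
        val shared = tokens.zip(cachedTokens).takeWhile { (a, b) -> a == b }.size
        // Keep at most `shared` resident entries, then decode only the remainder.
        while (resident.size > shared) resident.removeAt(resident.lastIndex)
        resident += tokens.drop(shared)
        cachedTokens.clear()
        cachedTokens += tokens
        return tokens.size - shared
    }

    /** Models the hazard: the cache is emptied but the bookkeeping is left alone. */
    @Synchronized fun clearKvOnly() { resident.clear() }

    /** Models the first fix option: both are reset together. */
    @Synchronized fun clearAll() { resident.clear(); cachedTokens.clear() }

    /** True when the bookkeeping matches what is actually cached. */
    @Synchronized fun isConsistent(): Boolean = resident == cachedTokens
}
```

In this model, calling `clearKvOnly()` after a dry run and then decoding a message that shares the dry-run prefix skips tokens that are no longer resident, so `isConsistent()` returns false. Calling `clearAll()` instead forces a full decode and keeps the two in step.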
Description
Crash Before Fix:
FATAL EXCEPTION: Llm-RunLoop
SIGABRT
decode: cannot decode batches with this context (calling encode() instead)
Error Message After Fix:
The selected model 'all-MiniLM-L6-v2-ggml-model-f16.gguf' is an embedding model
designed for semantic search and similarity tasks. It cannot be used for chat or
text generation.
Please select a chat/instruct model instead (e.g., models with 'chat', 'instruct', 'conversational' in their name).
Details
telegram-cloud-document-1-5022011912393590501.mp4
Ticket
ADFA-4388
Observation
This fix was developed based on consistent crash reproduction with embedding models (e.g., all-MiniLM-L6-v2). The crash occurred 100% of the time when users selected embedding models for chat, resulting in SIGABRT in the Llm-RunLoop thread.
With the multi-layer approach, the app now correctly identifies incompatible model types and guides users to select appropriate chat/instruct models, eliminating the crash entirely.