CortexOS supports multiple LLM providers with a unified interface:
- Groq (Default) - Fast inference with Llama models
- Claude - Anthropic's Claude 3.5 Sonnet
- Mistral - Mistral Large
- Gemini - Google's Gemini 1.5 Pro
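
For reference, the unified shapes used in the examples below look roughly like this (a sketch inferred from this document's own examples; the actual exports of `./llm/manager` may differ):

```typescript
// Hypothetical type sketch inferred from the usage examples in this section;
// the real definitions in './llm/manager' may differ.
interface ChatMessage {
  role: 'system' | 'user' | 'assistant';
  content: string;
}

interface GenerateOptions {
  provider?: 'groq' | 'claude' | 'mistral' | 'gemini'; // falls back to DEFAULT_LLM_PROVIDER
  model?: string;       // override the provider's default model
  temperature?: number;
  maxTokens?: number;
}

interface LLMResponse {
  content: string;
  usage: {
    promptTokens: number;
    completionTokens: number;
    totalTokens: number;
  };
}
```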
Add API keys to `.env`:

```env
# Choose default provider
DEFAULT_LLM_PROVIDER=groq

# Add at least one API key
GROQ_API_KEY=your_groq_api_key
ANTHROPIC_API_KEY=your_claude_api_key
MISTRAL_API_KEY=your_mistral_api_key
GEMINI_API_KEY=your_gemini_api_key
```

Basic usage with the default provider:

```typescript
import { llmManager } from './llm/manager';

// Use default provider
const response = await llmManager.generate([
  { role: 'user', content: 'Hello!' }
]);
```

To target a specific provider:

```typescript
// Use specific provider
const response = await llmManager.generate(
  [{ role: 'user', content: 'Hello!' }],
  { provider: 'claude' }
);
```

For structured output, use JSON mode:

```typescript
const response = await llmManager.generateJSON([
  { role: 'user', content: 'Return user data as JSON' }
]);
const data = JSON.parse(response.content);
```

Generation options can be tuned per call:

```typescript
const response = await llmManager.generate(
  [{ role: 'user', content: 'Write a story' }],
  {
    provider: 'gemini',
    temperature: 0.9,
    maxTokens: 4096,
    model: 'gemini-1.5-pro'
  }
);
```

Default models:

- Groq: llama-3.1-70b-versatile
- Claude: claude-3-5-sonnet-20241022
- Mistral: mistral-large-latest
- Gemini: gemini-1.5-pro
Provider strengths:

- Groq:
  - Fastest inference
  - JSON mode support
  - Cost-effective
  - Llama 3.1 models
- Claude:
  - Best reasoning
  - Long context (200K tokens)
  - System prompts
  - JSON via instruction
- Mistral:
  - European provider
  - JSON mode support
  - Multilingual
  - Code generation
- Gemini:
  - Google's latest
  - Native JSON mode
  - Multimodal support
  - Long context
Embeddings are handled by Pinecone's inference API (`text-embedding-3-small`), so no OpenAI API key is needed for embeddings when using Pinecone.
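
If you want to call the embedding endpoint directly, a minimal sketch might look like the following. This assumes the official `@pinecone-database/pinecone` client's inference API; the model name is taken from this section, and the exact call shape should be checked against your installed SDK version.

```typescript
// Sketch only: assumes the @pinecone-database/pinecone SDK's inference API.
// Model name taken from this section; verify availability on your Pinecone plan.
import { Pinecone } from '@pinecone-database/pinecone';

const pc = new Pinecone({ apiKey: process.env.PINECONE_API_KEY! });

const embeddings = await pc.inference.embed(
  'text-embedding-3-small',
  ['Some text to embed'],
  { inputType: 'passage' }
);
```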
To see which providers are configured:

```typescript
const providers = llmManager.listAvailableProviders();
console.log(providers); // ['groq', 'claude', 'mistral', 'gemini']
```

Handle errors from unconfigured providers or failed API calls:

```typescript
try {
  const response = await llmManager.generate(
    [{ role: 'user', content: 'Hello' }],
    { provider: 'claude' }
  );
} catch (error) {
  // Provider not configured or API error
  console.error(error.message);
}
```

All providers are mocked in tests, so no API keys are needed to run the test suite.
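
For example, a unit test might stub the manager like this (a hypothetical sketch using Vitest; the repository's actual test setup may differ):

```typescript
// Hypothetical Vitest sketch; the project's real mocking setup may differ.
import { vi, test, expect } from 'vitest';
import { llmManager } from './llm/manager';

vi.spyOn(llmManager, 'generate').mockResolvedValue({
  content: 'mocked reply',
  usage: { promptTokens: 1, completionTokens: 2, totalTokens: 3 },
});

test('generates without real API keys', async () => {
  const response = await llmManager.generate([{ role: 'user', content: 'Hi' }]);
  expect(response.content).toBe('mocked reply');
});
```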
Provider selection guidelines (a routing sketch follows this list):

- Use Groq for fast, cheap inference
- Use Claude for complex reasoning
- Use Mistral for European compliance
- Use Gemini for multimodal tasks
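
One way to encode these guidelines is a small routing helper (hypothetical, built only on the options documented above; not part of CortexOS):

```typescript
// Hypothetical routing helper; not part of the CortexOS API.
import { llmManager } from './llm/manager';

type Task = 'chat' | 'reasoning' | 'compliance' | 'multimodal';

const providerForTask: Record<Task, string> = {
  chat: 'groq',          // fast, cheap inference
  reasoning: 'claude',   // complex reasoning
  compliance: 'mistral', // European provider
  multimodal: 'gemini',  // multimodal tasks
};

async function generateForTask(task: Task, content: string) {
  return llmManager.generate(
    [{ role: 'user', content }],
    { provider: providerForTask[task] }
  );
}
```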
All providers return token usage:

```typescript
const response = await llmManager.generate([
  { role: 'user', content: 'Hello!' }
]);
console.log(response.usage);
// {
//   promptTokens: 10,
//   completionTokens: 50,
//   totalTokens: 60
// }
```
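
Because every response carries the same `usage` shape, aggregating token counts across calls is straightforward. A minimal sketch (the tracker below is illustrative, not part of the API):

```typescript
// Illustrative token accounting on top of the documented usage field.
import { llmManager } from './llm/manager';

let totalTokens = 0;

async function trackedGenerate(messages: { role: string; content: string }[]) {
  const response = await llmManager.generate(messages);
  totalTokens += response.usage.totalTokens;
  return response;
}
```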