Switch between Claude, GPT-4o, local Llama on Apple Silicon, and Apple's Foundation Models — by changing one line of code.
Conduit is a unified Swift 6.2 SDK for LLM inference across local and cloud providers. Every provider conforms to `TextGenerator`, giving you a single API surface whether you're running Claude in the cloud, GPT-4o via OpenRouter, Llama on-device with MLX, or Apple's built-in Foundation Models. Actors and `Sendable` types give you compile-time data-race safety with zero extra work.
- Quick Demo
- Feature Matrix
- Installation
- Quick Start
- Providers
- Streaming
- Structured Output with @Generable
- Tool Calling
- ChatSession
- Provider Swap in One Line
- On-Device & Privacy
- Model Management
- Generation Config
- Design Philosophy
- Documentation
- Contributing
- License
## Quick Demo

```swift
import Conduit

// Cloud — Anthropic
let provider = AnthropicProvider(apiKey: "sk-ant-...")
let response = try await provider.generate(
    "Explain async/await in Swift",
    model: .claudeSonnet45,
    config: .default
)

// Swap to local MLX — same call, zero rewrite
// let provider = MLXProvider()
// let response = try await provider.generate("Explain async/await in Swift", model: .llama3_2_1B, config: .default)
```

## Feature Matrix

| Capability | MLX | HuggingFace | Anthropic | Kimi | MiniMax | OpenAI | Foundation Models |
|---|---|---|---|---|---|---|---|
| Text Generation | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| Streaming | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| Structured Output | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| Tool Calling | — | — | ✓ | — | — | ✓ | — |
| Vision | — | — | ✓ | — | — | ✓ | — |
| Extended Thinking | — | — | ✓ | — | — | — | — |
| Embeddings | — | ✓ | — | — | — | ✓ | — |
| Transcription | — | ✓ | — | — | — | ✓ | — |
| Image Generation | — | ✓ | — | — | — | ✓ | — |
| Token Counting | ✓ | — | — | — | — | ✓* | — |
| Offline | ✓ | — | — | — | — | —** | ✓ |
| Privacy | ✓ | — | — | — | — | —** | ✓ |
\*Estimated token counting
\*\*Offline/privacy available when using the Ollama local endpoint
## Installation

Add Conduit to your `Package.swift`:

```swift
dependencies: [
    .package(url: "https://github.com/christopherkarani/Conduit", from: "1.0.0")
]
```

Then add `"Conduit"` to your target's dependencies.
Conduit uses Swift package traits for optional heavyweight dependencies. Enable only what you need:

```swift
// On-device MLX inference (Apple Silicon only)
.package(url: "https://github.com/christopherkarani/Conduit", from: "1.0.0", traits: ["MLX"])
```

Note: Without any traits, all cloud providers (Anthropic, OpenAI, HuggingFace, Kimi, MiniMax) are available. MLX requires the trait because it links against Apple Silicon Metal libraries.
| Platform | Status | Available Providers |
|---|---|---|
| macOS 14+ | Full | All providers |
| iOS 17+ | Full | All providers |
| visionOS 1+ | Full | All providers |
| Linux | Partial | Anthropic, Kimi, MiniMax, OpenAI, HuggingFace |
MLX runs on Apple Silicon only. Foundation Models requires iOS 26+ / macOS 26+. Linux builds exclude both by default.
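Because the available providers differ by platform, a build-time switch keeps one codebase compiling everywhere. A minimal sketch (the fallback choice is illustrative; MLX requires the trait and Apple Silicon):

```swift
import Conduit

#if os(Linux)
// Linux builds exclude MLX and Foundation Models, so use a cloud provider.
let provider = AnthropicProvider(apiKey: "sk-ant-...")
#else
// Apple platforms: on-device inference with MLX (requires the MLX trait).
let provider = MLXProvider()
#endif
```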
## Quick Start

Cloud (Anthropic):

```swift
import Conduit

let provider = AnthropicProvider(apiKey: "sk-ant-...")
let response = try await provider.generate(
    "What are actors in Swift?",
    model: .claudeSonnet45,
    config: .default
)
print(response)
```

Local (MLX):

```swift
import Conduit

let provider = MLXProvider()
let response = try await provider.generate(
    "What are actors in Swift?",
    model: .llama3_2_1B,
    config: .default
)
print(response)
```

Streaming:

```swift
import Conduit

let provider = AnthropicProvider(apiKey: "sk-ant-...")
for try await chunk in provider.stream(
    "Write a haiku about Swift concurrency",
    model: .claudeSonnet45,
    config: .default
) {
    print(chunk, terminator: "")
}
```

## Providers

### MLX

Local inference on Apple Silicon. Zero network traffic, complete privacy.
Best for: Privacy-sensitive apps, offline functionality, consistent latency
```swift
// Default configuration
let provider = MLXProvider()

// Optimized presets
let provider = MLXProvider(configuration: .m1Optimized)
let provider = MLXProvider(configuration: .highPerformance)

// Full control
let config = MLXConfiguration.default
    .memoryLimit(.gigabytes(8))
    .withQuantizedKVCache(bits: 4)
let provider = MLXProvider(configuration: config)
```

Configuration Presets:
| Preset | Memory | Use Case |
|---|---|---|
| `.default` | Auto | Balanced performance |
| `.m1Optimized` | 6 GB | M1 MacBooks, base iPads |
| `.mProOptimized` | 12 GB | M1/M2 Pro, Max chips |
| `.memoryEfficient` | 4 GB | Constrained devices |
| `.highPerformance` | 16+ GB | M2/M3 Max, Ultra |
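Which preset to use follows directly from device memory; a small heuristic sketch (the thresholds are illustrative, mirroring the table above):

```swift
import Foundation

// Pick an MLX preset from the device's physical RAM (illustrative thresholds).
let ramGB = Double(ProcessInfo.processInfo.physicalMemory) / 1_073_741_824
let mlxConfig: MLXConfiguration = switch ramGB {
case ..<8: .memoryEfficient
case ..<16: .m1Optimized
case ..<24: .mProOptimized
default: .highPerformance
}
let provider = MLXProvider(configuration: mlxConfig)
```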
Warmup for fast first response:
```swift
let provider = MLXProvider()
try await provider.warmUp(model: .llama3_2_1B, maxTokens: 5)

// First response is now fast (~100-300ms instead of ~2-4s)
let response = try await provider.generate("Hello", model: .llama3_2_1B)
```

### HuggingFace

Cloud inference via the HuggingFace Inference API. Access hundreds of models.
Best for: Large models, embeddings, transcription, image generation, model variety
```swift
// Auto-detects HF_TOKEN from environment
let provider = HuggingFaceProvider()

// Explicit token
let provider = HuggingFaceProvider(token: "hf_...")
```

Embeddings:
```swift
let embedding = try await provider.embed(
    "Conduit makes LLM inference easy",
    model: .huggingFace("sentence-transformers/all-MiniLM-L6-v2")
)
let similarity = embedding.cosineSimilarity(with: otherEmbedding)
```
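Embeddings plus `cosineSimilarity` are enough for a tiny semantic search; a minimal sketch reusing the calls above (the corpus and query are illustrative):

```swift
// Score a few documents against a query by cosine similarity (illustrative corpus).
let query = try await provider.embed(
    "How does Swift prevent data races?",
    model: .huggingFace("sentence-transformers/all-MiniLM-L6-v2")
)
for doc in ["Actors isolate mutable state", "SwiftUI renders views", "MLX runs models on-device"] {
    let emb = try await provider.embed(doc, model: .huggingFace("sentence-transformers/all-MiniLM-L6-v2"))
    print(doc, query.cosineSimilarity(with: emb))
}
```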
Image Generation:

```swift
let result = try await provider.textToImage(
    "A cat wearing a top hat, digital art",
    model: .huggingFace("stabilityai/stable-diffusion-3"),
    config: .highQuality.width(1024).height(768)
)
result.image // SwiftUI Image, ready to display
try result.save(to: URL.documentsDirectory.appending(path: "image.png"))
```

### Foundation Models

System-integrated on-device AI. Zero setup, managed by the OS.
```swift
if #available(iOS 26.0, *) {
    let provider = FoundationModelsProvider()
    let response = try await provider.generate(
        "What can you help me with?",
        model: .foundationModels,
        config: .default
    )
}
```

### Anthropic

First-class support for Anthropic's Claude models.
Best for: Advanced reasoning, vision, extended thinking, production applications
```swift
let provider = AnthropicProvider(apiKey: "sk-ant-...")

// Text generation
let response = try await provider.generate(
    "Explain quantum computing",
    model: .claudeSonnet45,
    config: .default.maxTokens(500)
)

// Streaming
for try await chunk in provider.stream("Write a poem about Swift", model: .claude3Haiku, config: .default) {
    print(chunk, terminator: "")
}
```

Available Models:
| Model | ID | Best For |
|---|---|---|
| Claude Opus 4.5 | `.claudeOpus45` | Most capable, complex reasoning |
| Claude Sonnet 4.5 | `.claudeSonnet45` | Balanced performance and speed |
| Claude 3.5 Sonnet | `.claude35Sonnet` | Fast, high-quality responses |
| Claude 3 Haiku | `.claude3Haiku` | Fastest, most cost-effective |
Vision:
```swift
let messages = Messages {
    Message.user([
        .text("What's in this image?"),
        .image(base64Data: imageData, mimeType: "image/jpeg")
    ])
}
let result = try await provider.generate(messages: messages, model: .claudeSonnet45, config: .default)
```

Extended Thinking:
```swift
var config = AnthropicConfiguration.standard(apiKey: "sk-ant-...")
config.thinkingConfig = .standard
let provider = AnthropicProvider(configuration: config)

let result = try await provider.generate(
    "Solve this complex problem...",
    model: .claudeOpus45,
    config: .default
)
```

Get your API key at: https://console.anthropic.com/
### Kimi (Moonshot)

Dedicated support for Moonshot's Kimi models with 256K context windows.
Best for: Long context tasks, coding, reasoning, document analysis
```swift
let provider = KimiProvider(apiKey: "sk-moonshot-...")
let response = try await provider.generate(
    "Summarize this 100-page document...",
    model: .kimiK2_5,
    config: .default
)
```

Available Models:
| Model | ID | Context |
|---|---|---|
| Kimi K2.5 | `.kimiK2_5` | 256K |
| Kimi K2 | `.kimiK2` | 256K |
| Kimi K1.5 | `.kimiK1_5` | 256K |
Get your API key at: https://platform.moonshot.cn/
### MiniMax

Support for MiniMax models, compatible with both OpenAI and Anthropic wire formats.
```swift
let provider = MiniMaxProvider(apiKey: "your-minimax-key")
let response = try await provider.generate(
    "Hello",
    model: .abab65Chat,
    config: .default
)
```

### OpenAI (and Compatible Endpoints)

Works with OpenAI, OpenRouter, Ollama, Azure, and any OpenAI-compatible endpoint.
Supported backends:
- OpenAI — Official GPT-4, DALL-E, Whisper APIs
- OpenRouter — 200+ models from OpenAI, Anthropic, Google, Meta, and more
- Ollama — Local inference server (offline / privacy)
- Azure OpenAI — Microsoft's enterprise OpenAI service
- Custom — Any OpenAI-compatible endpoint (a sketch follows the model table below)
OpenAI (official):
```swift
let provider = OpenAIProvider(apiKey: "sk-...")
let response = try await provider.generate("Hello", model: .gpt4o, config: .default)
```

OpenRouter:
```swift
// Simple
let provider = OpenAIProvider(openRouterKey: "sk-or-...")
let response = try await provider.generate(
    "Hello",
    model: .openRouter("anthropic/claude-3-opus"),
    config: .default
)

// Static factory methods
let provider = OpenAIProvider.forClaude(apiKey: "sk-or-...") // Optimized for Claude
let provider = OpenAIProvider.fastest(apiKey: "sk-or-...")   // Latency-optimized routing
```

Ollama (local):
```swift
// Install: curl -fsSL https://ollama.com/install.sh | sh && ollama pull llama3.2
let provider = OpenAIProvider(endpoint: .ollama())
let response = try await provider.generate(
    "Hello from local inference!",
    model: .ollama("llama3.2"),
    config: .default
)
```

Azure:
```swift
let provider = OpenAIProvider(
    endpoint: .azure(resource: "my-resource", deployment: "gpt-4", apiVersion: "2024-02-15-preview"),
    apiKey: "azure-key"
)
```

Available OpenAI Models:
| Model | ID | Best For |
|---|---|---|
| GPT-4o | `.gpt4o` | Latest multimodal flagship |
| GPT-4o Mini | `.gpt4oMini` | Fast, cost-effective |
| o1 | `.o1` | Complex reasoning |
| o3 Mini | `.o3Mini` | Fast reasoning |
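For the custom backend, pointing the provider at any OpenAI-compatible base URL should be all that is needed. A hedged sketch: the `.custom` endpoint case and the `.custom` model ID below are assumptions, so check the Providers guide for the exact spelling:

```swift
// Hypothetical self-hosted gateway; the endpoint and model cases are assumptions.
let provider = OpenAIProvider(
    endpoint: .custom(baseURL: URL(string: "https://llm.example.internal/v1")!),
    apiKey: "gateway-key"
)
let response = try await provider.generate("Hello", model: .custom("my-model"), config: .default)
```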
## Streaming

Real-time token streaming with `AsyncSequence`:
```swift
// Simple text streaming
for try await text in provider.stream("Tell me a joke", model: .llama3_2_1B, config: .default) {
    print(text, terminator: "")
}

// With metadata (tokens per second, finish reason)
let stream = provider.streamWithMetadata(
    messages: messages,
    model: .llama3_2_1B,
    config: .default
)
for try await chunk in stream {
    print(chunk.text, terminator: "")
    if let tokensPerSecond = chunk.tokensPerSecond {
        print("Speed: \(tokensPerSecond) tok/s")
    }
    if let reason = chunk.finishReason {
        print("\nFinished: \(reason)")
    }
}
```

## Structured Output with @Generable

This is Conduit's most differentiated feature. The `@Generable` macro synthesizes a complete type-safe structured output pipeline at compile time — no runtime JSON parsing, no manual schema writing.
Define your type:
```swift
import Conduit

@Generable
struct MovieReview {
    @Guide(description: "Film title")
    let title: String

    @Guide(description: "Rating from 1 to 10", .range(1...10))
    let rating: Int

    @Guide(description: "Brief summary of the film", .maxLength(200))
    let summary: String

    @Guide(description: "Would you recommend this film?")
    let recommended: Bool
}
```

The macro synthesizes:

- `MovieReview.generationSchema` — the JSON schema to send to the provider
- `MovieReview.PartiallyGenerated` — a mirror type with all-optional fields for streaming
- `init(_ generatedContent: GeneratedContent)` — decoding from the model's response
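For intuition, the mirror type keeps every field but makes each one optional, roughly like this (illustrative, not the literal macro expansion):

```swift
// Approximate shape of MovieReview.PartiallyGenerated (illustrative only):
struct PartiallyGenerated {
    let title: String?
    let rating: Int?
    let summary: String?
    let recommended: Bool?
}
```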
Generate with schema enforcement:
```swift
let provider = AnthropicProvider(apiKey: "sk-ant-...")
let config = GenerateConfig.default
    .responseFormat(.jsonSchema(name: "MovieReview", schema: MovieReview.generationSchema))

let result = try await provider.generate(
    messages: [.user("Review the film Inception")],
    model: .claudeSonnet45,
    config: config
)

// result.text is JSON validated against the schema
// {"title": "Inception", "rating": 9, "summary": "A mind-bending thriller..."}
let review = try MovieReview(GeneratedContent(jsonString: result.text))
print(review.title)  // "Inception"
print(review.rating) // 9
```

Nested types and enums:
```swift
@Generable
enum Sentiment {
    case positive
    case neutral
    case negative
}

@Generable
struct ProductAnalysis {
    @Guide(description: "Product name")
    let name: String

    @Guide(description: "Sentiment of the review")
    let sentiment: Sentiment

    @Guide(description: "Key strengths", .count(3))
    let strengths: [String]
}
```

## Tool Calling

Define type-safe tools using `@Generable` arguments:
```swift
struct WeatherTool: AITool {
    @Generable
    struct Arguments {
        @Guide(description: "The city to get weather for")
        let location: String
        @Guide(description: "Unit: celsius or fahrenheit")
        let unit: String
    }

    var name: String { "get_weather" }
    var description: String { "Get current weather for a location" }

    func call(arguments: Arguments) async throws -> String {
        // Call your weather API
        return "72°F, sunny in \(arguments.location)"
    }
}

// Use with a provider
let tool = WeatherTool()
let result = try await provider.generate(
    messages: [.user("What's the weather in San Francisco?")],
    model: .claudeSonnet45,
    config: .default.tools([tool])
)
```

## ChatSession

Stateful conversation management with automatic history tracking, tool execution, and SwiftUI integration.
```swift
let provider = AnthropicProvider(apiKey: "sk-ant-...")
let session = ChatSession(
    provider: provider,
    model: .claudeSonnet45,
    config: .default
)

// Set system prompt
session.setSystemPrompt("You are a helpful Swift coding assistant.")

// Send messages — history is managed automatically
let response = try await session.send("What are actors in Swift?")
let followUp = try await session.send("Show me a real example.")
```

Streaming in a session:
```swift
let stream = session.stream("Write a sorting algorithm in Swift")
for try await token in stream {
    print(token, terminator: "")
}
```

Eager warmup for fast first-message latency:
```swift
// Pays 1-2s warmup cost at init → first message is ~100-300ms instead of ~2-4s
let session = try await ChatSession(
    provider: MLXProvider(),
    model: .llama3_2_1B,
    warmup: .eager
)
```

Tool execution loop:
```swift
let executor = ToolExecutor(tools: [WeatherTool(), CalendarTool()])
session.toolExecutor = executor
session.maxToolCallRounds = 8

// ChatSession automatically runs the tool loop until the model stops calling tools
let response = try await session.send("What's the weather and my schedule today?")
```

SwiftUI integration — `ChatSession` is `@Observable`:
```swift
struct ChatView: View {
    @State var session: ChatSession<AnthropicProvider>

    var body: some View {
        VStack {
            ForEach(session.messages) { message in
                MessageBubble(message: message)
            }
            if session.isGenerating {
                ProgressView("Generating...")
            }
        }
    }
}
```

History management:
```swift
session.clearHistory()          // Clear all messages (keeps system prompt)
session.undoLastExchange()      // Remove last user+assistant pair
session.injectHistory(messages) // Restore a saved conversation
await session.cancel()          // Cancel in-progress generation
```
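Combined with `injectHistory`, persisting a conversation is straightforward; a minimal sketch, assuming the message type (or your own snapshot of it) is `Codable`:

```swift
// Save the transcript to disk (assumes the message type is Codable; adapt if not).
let data = try JSONEncoder().encode(session.messages)
try data.write(to: URL.documentsDirectory.appending(path: "chat.json"))

// Later: restore it into a fresh session.
let savedData = try Data(contentsOf: URL.documentsDirectory.appending(path: "chat.json"))
let saved = try JSONDecoder().decode([Message].self, from: savedData)
session.injectHistory(saved)
```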
## Provider Swap in One Line

Every provider conforms to `TextGenerator`, so your prompt logic is completely provider-agnostic:

```swift
func run<P: TextGenerator>(provider: P, model: P.ModelID) async throws -> String {
    try await provider.generate(
        "Plan a three-day SwiftUI sprint with daily goals.",
        model: model,
        config: .creative
    )
}

// Run the same prompt across all your providers
let anthropic = AnthropicProvider(apiKey: "sk-ant-...")
let openRouter = OpenAIProvider.forOpenRouter(apiKey: "sk-or-...", preferring: [.anthropic, .openai])
let ollama = OpenAIProvider(endpoint: .ollama(), apiKey: nil)
let mlx = MLXProvider()

let claudePlan = try await run(provider: anthropic, model: .claudeOpus45)
let gptPlan = try await run(provider: openRouter, model: .openRouter("openai/gpt-4-turbo"))
let ollamaPlan = try await run(provider: ollama, model: .ollamaLlama32)
let localPlan = try await run(provider: mlx, model: .llama3_2_1B)
```

## On-Device & Privacy

### MLX

Run open-weight models entirely on-device. No network traffic, no data leaves the device.
```swift
// Enable MLX trait in Package.swift, then:
let provider = MLXProvider(configuration: .m1Optimized)
let response = try await provider.generate(
    "Summarize my private notes...",
    model: .llama3_2_1B,
    config: .default
)
```

### Foundation Models

Use the OS-managed on-device model. No API key, no model download.
```swift
if #available(iOS 26.0, *) {
    let provider = FoundationModelsProvider()
    let response = try await provider.generate(
        "Summarize this text",
        model: .foundationModels,
        config: .default
    )
}
```

### Ollama

Run any open-weight model on localhost with Ollama.

```bash
curl -fsSL https://ollama.com/install.sh | sh
ollama pull llama3.2
```

```swift
let provider = OpenAIProvider(endpoint: .ollama())
let response = try await provider.generate(
    "Hello from local inference!",
    model: .ollama("llama3.2"),
    config: .default
)
```

## Model Management

Download models from HuggingFace Hub for MLX inference with progress tracking.
```swift
let manager = ModelManager.shared

// Download with progress
let url = try await manager.download(.llama3_2_1B) { progress in
    print("Downloading: \(progress.percentComplete)%")
    if let speed = progress.formattedSpeed { print("Speed: \(speed)") }
    if let eta = progress.formattedETA { print("ETA: \(eta)") }
}

// Validate before downloading (checks MLX compatibility, estimates size)
let url = try await manager.downloadValidated(.llama3_2_1B) { progress in
    print("Progress: \(progress.percentComplete)%")
}
```

`DownloadTask` is `@Observable` — bind it directly to a `ProgressView`:
```swift
struct ModelDownloadView: View {
    @State private var downloadTask: DownloadTask?

    var body: some View {
        if let task = downloadTask {
            VStack {
                ProgressView(value: task.progress.fractionCompleted)
                Text("\(task.progress.percentComplete)%")
                if let speed = task.progress.formattedSpeed { Text(speed) }
                Button("Cancel") { task.cancel() }
            }
        } else {
            Button("Download Llama 3.2") {
                Task {
                    downloadTask = await ModelManager.shared.downloadTask(for: .llama3_2_1B)
                }
            }
        }
    }
}
```

Cache management:

```swift
let manager = ModelManager.shared
if await manager.isCached(.llama3_2_1B) { print("Model ready") }

let cached = try await manager.cachedModels()
for model in cached {
    print("\(model.identifier.displayName): \(model.size.formatted)")
}

// Evict least-recently-used models to fit storage limit
try await manager.evictToFit(maxSize: .gigabytes(30))

// Remove specific model or clear everything
try await manager.delete(.llama3_2_1B)
try await manager.clearCache()
```

Estimate size before downloading:

```swift
if let size = await manager.estimateDownloadSize(.llama3_2_1B) {
print("Download size: \(size.formatted)") // e.g., "2.1 GB"
}Browse the mlx-community on HuggingFace for 4-bit quantized models optimized for Apple Silicon.
Storage locations:

- MLX models: `~/Library/Caches/Conduit/Models/mlx/`
- HuggingFace models: `~/Library/Caches/Conduit/Models/huggingface/`
## Generation Config

Control generation with presets or a fluent API:

````swift
// Presets
.default   // temperature: 0.7, topP: 0.9, maxTokens: 1024
.creative  // temperature: 0.9, topP: 0.95, frequencyPenalty: 0.5
.precise   // temperature: 0.1, topP: 0.5, repetitionPenalty: 1.1
.code      // temperature: 0.2, topP: 0.9, stopSequences: ["```", "\n\n\n"]

// Fluent API
let config = GenerateConfig.default
    .temperature(0.8)
    .maxTokens(500)

// Full constructor
let config = GenerateConfig(
    temperature: 0.8,
    maxTokens: 500,
    topP: 0.9,
    stopSequences: ["END"]
)
````

## Design Philosophy

- Actors everywhere — All providers are actors, giving compile-time data-race safety via Swift 6.2 strict concurrency
- Explicit model selection — No magic auto-detection; you always know exactly which model is running
- Protocol-first — Everything conforms to `TextGenerator`, `EmbeddingGenerator`, or `ImageGenerator`, so your code stays provider-agnostic
- Sendable by default — All public types conform to `Sendable`, safe to pass across actor boundaries (see the sketch below)
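Because providers are actors and their results are `Sendable`, fanning a set of prompts out across tasks compiles cleanly under strict concurrency. A small sketch using the same `generate` call shown throughout:

```swift
let provider = AnthropicProvider(apiKey: "sk-ant-...")
let prompts = ["Define actor isolation", "Define Sendable", "Define structured concurrency"]

// The actor serializes access to its state; the task group runs requests concurrently.
let answers: [String] = try await withThrowingTaskGroup(of: String.self) { group in
    for prompt in prompts {
        group.addTask {
            try await provider.generate(prompt, model: .claudeSonnet45, config: .default)
        }
    }
    var collected: [String] = []
    for try await answer in group {
        collected.append(answer)
    }
    return collected
}
```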
## Documentation

| Guide | Description |
|---|---|
| Getting Started | Installation, setup, and first generation |
| Providers | Detailed guides for each provider |
| Structured Output | Type-safe responses with @Generable |
| Tool Calling | Define and execute LLM-invokable tools |
| Streaming | Real-time token streaming patterns |
| ChatSession | Stateful conversation management |
| Model Management | Download, cache, and manage models |
| Error Handling | Handle errors gracefully |
| Architecture | Design principles and internals |
## Contributing

- Fork the repository
- Create a feature branch: `git checkout -b feature/amazing-feature`
- Commit your changes with a descriptive message
- Push and open a Pull Request
Please ensure your code follows existing conventions, includes tests (Swift Testing framework), and maintains backward compatibility.
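A new test in the Swift Testing style might look like this (the asserted properties are illustrative):

```swift
import Testing
@testable import Conduit

@Test func fluentConfigSetsValues() {
    // Illustrative: check that the fluent API round-trips its values.
    let config = GenerateConfig.default
        .temperature(0.8)
        .maxTokens(500)
    #expect(config.temperature == 0.8)
    #expect(config.maxTokens == 500)
}
```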
- GitHub Discussions — Ask questions, share ideas
- GitHub Issues — Report bugs, request features
## License

MIT License — see LICENSE for details.
