Native Android Agent OS. Zero compromise. 100% Kotlin.
⚡️ Run autonomous LLM agents directly on-device with full Android Accessibility integration.
Fast, smart, and fully autonomous UI automation
Drop it into any app. Let the AI take the wheel.
MobClaw is the native Android port of the ZeroClaw agent operating system — bringing true autonomous UI interaction, screen reading, and zero-shot reasoning directly to Android devices using standard Accessibility Services.
- 🏎️ Native Kotlin Implementation: Built from the ground up for Android, no heavy web-views or external runtimes required.
- 👁️ Semantic Screen Reading: Automatically transforms Android's Accessibility node tree into semantic, LLM-friendly text summaries.
- ⚡ Real-Time Execution: Uses Android's `AccessibilityService.dispatchGesture` for instant, reliable UI interactions (clicks, scrolls, typing).
- 🧠 Pluggable LLM Providers: Supports Gemini out of the box and is easily extensible to any LLM provider.
- 🛡️ Robust Error Recovery: Built-in safeguards, auto-retry logic, and element ID resolution to handle dynamic Android UIs.
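The semantic screen reading described above boils down to a depth-first walk over Android's accessibility node tree. The sketch below is illustrative only; the `describeScreen` helper and the `n0`, `n1`, … ID scheme are assumptions, not MobClaw's actual API:

```kotlin
import android.view.accessibility.AccessibilityNodeInfo

// Illustrative sketch: flatten the accessibility tree into LLM-friendly
// lines like `n3 Button 'Wi-Fi' [clickable]`. Names and the node-ID
// scheme are assumptions; MobClaw's internal format may differ.
fun describeScreen(root: AccessibilityNodeInfo): String {
    val lines = mutableListOf<String>()
    var nextId = 0
    fun walk(node: AccessibilityNodeInfo) {
        val label = node.text ?: node.contentDescription
        // Only surface nodes an agent could meaningfully read or act on.
        if (label != null || node.isClickable) {
            val cls = node.className?.toString()?.substringAfterLast('.') ?: "View"
            val flags = if (node.isClickable) " [clickable]" else ""
            lines += "n${nextId++} $cls '$label'$flags"
        }
        for (i in 0 until node.childCount) {
            node.getChild(i)?.let { walk(it) }
        }
    }
    walk(root)
    return lines.joinToString("\n")
}
```

Keeping the summary short and ID-addressable is what lets the LLM refer back to elements (e.g., "click n3") without ever seeing raw coordinates.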
MobClaw is an Android library (`:mobclaw`) that you can embed into any host app (like the included `:app` test module).
Add it to your `settings.gradle.kts`:

```kotlin
include(":mobclaw")
```

Add it to your app's `build.gradle.kts`:

```kotlin
implementation(project(":mobclaw"))
```

Your app must declare an Accessibility Service to grant MobClaw screen control.
`AndroidManifest.xml`:

```xml
<service
    android:name="com.mobclaw.android.accessibility.MobClawAccessibilityService"
    android:permission="android.permission.BIND_ACCESSIBILITY_SERVICE"
    android:exported="false">
    <intent-filter>
        <action android:name="android.accessibilityservice.AccessibilityService" />
    </intent-filter>
    <meta-data
        android:name="android.accessibilityservice"
        android:resource="@xml/accessibility_service_config" />
</service>
```

Initialize the agent with your LLM provider and let it run tasks:
```kotlin
val provider = GeminiProvider(apiKey = "YOUR_GEMINI_API_KEY")
val agent = MobAgent.Builder()
    .provider(provider)
    .build()

// Run a task asynchronously
lifecycleScope.launch {
    val result = agent.execute("Open Settings and turn on Wi-Fi")
    println("Task finished: ${result.success} - ${result.message}")
}
```

MobClaw mirrors ZeroClaw's trait-driven architecture, adapted for Android:
- `LlmProvider`: Interface for LLM communication (e.g., `GeminiProvider`).
- `MobTool`: Agent capabilities (`ClickTool`, `ScrollTool`, `TypeTool`, `ScreenReadTool`).
- `ActionDispatcher`: Parses LLM outputs and matches them to tools via JSON or XML.
- `GestureEngine`: Translates semantic node IDs to physical `Path` gestures on the screen.
- `MobObserver`: Hooks for rendering overlays or logging (e.g., `OverlayObserver`).
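The exact `LlmProvider` contract isn't documented here, but a custom provider plugging into this architecture could look roughly like the sketch below; the interface shape, the `ChatMessage` type, and the tool-call JSON are assumptions based on the description above, not MobClaw's real API:

```kotlin
// Hypothetical sketch of a pluggable provider; MobClaw's actual
// LlmProvider interface may have a different signature.
interface LlmProvider {
    suspend fun complete(messages: List<ChatMessage>): String
}

data class ChatMessage(val role: String, val content: String)

// A trivial stand-in provider, useful for wiring tests: it ignores the
// conversation and immediately emits a `finish` tool call.
class EchoProvider : LlmProvider {
    override suspend fun complete(messages: List<ChatMessage>): String =
        """{"tool": "finish", "args": {"message": "done"}}"""
}
```

Because the provider is just an interface, swapping Gemini for a local or self-hosted model is a matter of implementing one suspend function.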
- Observe: The agent calls `screen_read` to dump the current Android UI hierarchy.
- Reason: The LLM parses the screen and decides what to do next based on your prompt.
- Act: The LLM issues a tool call (like `click(node_id="n5")`).
- Execute: MobClaw resolves "n5" to physical X/Y coordinates and dispatches a tap gesture via the Accessibility Service.
- Repeat: The loop continues until the LLM calls the `finish` tool.
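The Execute step, turning a resolved node position into a physical tap, maps onto Android's standard gesture API. This is a sketch of that mechanism, not MobClaw's internal code:

```kotlin
import android.accessibilityservice.AccessibilityService
import android.accessibilityservice.GestureDescription
import android.graphics.Path

// Sketch of the Execute step: dispatch a 50 ms tap at the given screen
// coordinates. Requires android:canPerformGestures="true" in the
// accessibility service configuration.
fun AccessibilityService.tapAt(x: Float, y: Float) {
    val path = Path().apply { moveTo(x, y) }
    val gesture = GestureDescription.Builder()
        .addStroke(GestureDescription.StrokeDescription(path, 0L, 50L))
        .build()
    dispatchGesture(gesture, /* callback = */ null, /* handler = */ null)
}
```

Scrolls and swipes use the same `StrokeDescription` mechanism with a longer `Path` and duration.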
This project is licensed under the MIT License.
