This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
Andclaw is an Android automation assistant powered by Large Language Models (LLMs). It uses Android Accessibility Services to "read" UI hierarchy and performs actions based on natural language commands. Supports remote control via Telegram Bot.
This is a Gradle-based Android project with multiple modules.
# Build the project
./gradlew build
# Install debug APK to connected device
./gradlew :app:installDebug
# Clean build
./gradlew clean
# Run tests
./gradlew test
# Run Android instrumented tests
./gradlew connectedAndroidTest-
Create
local.propertieswith your API keys:kimi_key=your_kimi_api_key tg_token=your_telegram_bot_token # Optional, for Telegram integration
-
The app requires these permissions at runtime:
- Accessibility Service: Must be manually enabled in Settings > Accessibility
- Display over other apps: For the emergency stop floating button
| Module | Purpose |
|---|---|
:app |
Main application module containing AI agent logic, UI, and accessibility service |
:model |
Kimi API client (KimiApiClient) using Anthropic messages format |
:mdm |
Mobile Device Management - Device Owner policies, Kiosk mode, silent app install |
:services |
Kotlin interface definitions for cross-module communication (NOT AIDL) |
:commonUtils |
Common Java/Kotlin utilities shared across modules |
:network |
Network utilities, Retrofit factory, and file download manager |
:setupdesign, :setupcompat, :setupdesign_strings |
Setup Wizard UI components (from Google testdpc) |
app → model, mdm, services, commonUtils
mdm → setupdesign, setupcompat, setupdesign_strings, services, commonUtils
setupdesign → setupcompat, setupdesign_strings
AgentController (app/.../AgentController.kt)
- Singleton
objectthat orchestrates the AI agent loop - Implements
ITgBridgeServiceandIAiConfigService - Manages LLM API calls via
Utils.callLLMWithHistory() - Handles Telegram bot integration for remote control (
TgBotClient) - Maintains conversation state in
MutableStateFlow<List<ChatMessage>> - Loop detection via action fingerprint tracking (
consecutiveSameCount)
AgentAccessibilityService (app/.../AgentAccessibilityService.kt)
- Android AccessibilityService providing screen perception and interaction
captureScreenHierarchy(): Captures UI hierarchy as text with element boundsclick(x, y): Simulates tap gestures usingdispatchGesture()swipe(),longPress(): Gesture simulationinputText(): Text injection (SET_TEXT → clipboard paste fallback)isWebViewContext(): Detects browser/WebView for screenshot-based analysiscaptureScreenshot(): Takes screenshot viatakeScreenshot()API (API 30+)globalAction(): System-level actions (back, home, recents, etc.)
Utils (app/.../Utils.kt)
buildAgentSystemPrompt(): Constructs the system prompt with all action type documentationcallLLMWithHistory(): Multi-provider LLM call supporting Kimi (Anthropic format) and OpenAI-compatible APIsparseAction(): Extracts JSON from LLM response, handles markdown fences and array wrapping- Supports multimodal input (screenshot base64) for both Kimi and OpenAI providers
DpmBridge (app/.../DpmBridge.kt)
- Bridges AI actions to Device Policy Manager operations
execute(): Dispatches 40+ DPM actions (lock, reboot, install/uninstall, permissions, kiosk, etc.)getSafetyLevel(): Classifies actions as SAFE / CONFIRM / DANGEROUS
TgBotClient (app/.../TgBotClient.kt)
- Telegram Bot API client using OkHttp
poll(): Long-pollinggetUpdatessend(),sendPhoto(),sendVideo(): Message/media delivery
The AI returns JSON actions with these types:
intent— Launch apps/activities with extrasclick— Tap at screen coordinates (x, y)swipe— Swipe from (x, y) to (endX, endY)long_press— Long press at coordinatestext_input— Type text into focused fieldsglobal_action— System actions (back, home, recents, notifications, quick_settings)screenshot— Capture screen and save to gallerydownload— Download file by URL via DownloadManagerwait— Wait for UI transition, then re-check screencamera— Take photo or record video (take_photo, start_video, stop_video)screen_record— Screen recording via MediaProjection (start_record, stop_record)volume— Volume control (set, adjust_up, adjust_down, mute, unmute, get)dpm— Device Policy Manager actions (Device Owner only)finish— Task completed
- User provides command (text via UI, or remotely via Telegram Bot)
AgentAccessibilityService.captureScreenHierarchy()captures current UI tree- If in WebView/browser context, also captures screenshot for visual analysis
- Prompt sent to LLM with: system prompt + last 12 messages history + screen data
- AI returns JSON action, parsed by
Utils.parseAction()intoAiAction - If parse fails, retries with correction prompt
handleAction()dispatches by type:intent:executeIntent()→ delay 3s → next stepclick/swipe/long_press/text_input/global_action/screenshot/download/camera/screen_record/volume:performConfirmedAction()→ delay 2.5s → next stepdpm:DpmBridge.execute()→ delay 2.5s → next stepwait: delay specified duration → next stepfinish: stops agent
- Loop detection: if same action fingerprint repeats 5 times, takes screenshot and retries with visual analysis (max 3 retries before stopping)
The :services module defines pure Kotlin interfaces (not AIDL) for cross-module decoupling:
| Interface | Purpose |
|---|---|
IAiConfigService |
AI provider config (provider, apiUrl, apiKey, model), Telegram chatId |
ITgBridgeService |
Telegram bridge start/stop |
IDeviceOwner |
Silent app install/uninstall, install permission check |
IAppInfoService |
App info (name, version, package, serial), bring-to-foreground |
IToaster |
Toast display |
ISocketService |
Socket service start/stop |
KimiApiClient (model/.../KimiApi.kt):
- Calls Kimi Coding API using Anthropic messages format (
/v1/messages) - Supports multimodal input (text + base64 images)
- Auth via
x-api-keyheader withanthropic-version: 2023-06-01 - Default model:
kimi-k2.5, default base URL:https://api.kimi.com/coding
The :mdm module provides Device Owner features:
- Silent app install/uninstall via
DeviceOwnerImpl+PackageInstallationUtils - Kiosk mode with
KioskModeHelper(user restrictions, ADB control, lock task) - Setup wizard UI via
SetupKioskModeActivity(device admin, network, AI config) - Device status monitoring via
DeviceStatusViewModel(network, screen on/off) - App lifecycle monitoring via
AppViewModule(PACKAGE_ADDED/REMOVED broadcasts) - DPM gateway abstraction via
DevicePolicyManagerGateway/DevicePolicyManagerGatewayImpl
See ACTIONS.md for complete capability matrix.
- Kotlin: 2.3.10
- AGP: 9.1.0
- JVM Target: 17
- minSdk: 31, targetSdk: 36, compileSdk: 36
- Key Libraries:
- ViewBinding + DataBinding (UI)
- OkHttp 5.3.2 (API calls)
- Gson 2.13.2 (JSON parsing)
- Koin 3.3.3 (DI)
- CameraX 1.4.1 (camera operations)
- Guava 30.1-jre (ListenableFuture for CameraX)
- Kotlin Coroutines + StateFlow (async, state management)
| File | Type | Purpose |
|---|---|---|
App.kt |
Application | Koin DI setup, Device Owner listener, permission auto-grant |
AgentController.kt |
Agent Core | LLM loop, Telegram bridge, action dispatch |
AgentAccessibilityService.kt |
AccessibilityService | Screen perception, gestures, text input, screenshot |
Utils.kt |
Utility | System prompt, LLM API calls, JSON parsing |
DpmBridge.kt |
Bridge | DPM action execution, safety classification |
TgBotClient.kt |
Network | Telegram Bot API client |
ChatHistoryActivity.kt |
Activity | Main chat UI with RecyclerView |
ActionTestActivity.kt |
Activity | Manual action testing (screenshot, gestures, DPM, etc.) |
CameraActivity.kt |
Activity | CameraX photo/video capture |
ScreenRecordActivity.kt |
Activity | MediaProjection permission request |
ScreenRecordService.kt |
Service | Foreground recording service (MediaRecorder + VirtualDisplay) |
Models.kt |
Data | AiAction, AgentUiState, ApiConfig, ChatMessage |
ChatAdapter.kt |
Adapter | Chat bubble RecyclerView adapter |
AppInfoService.kt |
Service Impl | IAppInfoService implementation for MDM module |
- Human-in-the-Loop: Actions dispatched via
performConfirmedAction()for confirmed execution - Loop Detection: Tracks action fingerprints; after 5 repeats, takes screenshot for visual retry; after 15 total repeats, stops agent
- Privacy Warning: Screen UI data and screenshots are sent to LLM providers
KIMI_KEY: Injected fromlocal.propertieskimi_keyTG_TOKEN: Injected fromlocal.propertiestg_token
Screen capture is implemented via:
ScreenRecordActivity: Transparent activity requesting MediaProjection permissionScreenRecordService: Foreground service usingMediaRecorder+VirtualDisplay- Output saved to
Movies/Andclaw/via MediaStore - Supports sending recorded video to Telegram
ApiConfig(
provider = "Kimi Code",
apiKey = BuildConfig.KIMI_KEY,
apiUrl = "https://api.kimi.com/coding",
model = "kimi-k2.5"
)Supports three providers via IAiConfigService.updateConfig():
- Kimi Code: Anthropic Messages format (
api.kimi.com/coding), API Key from Kimi Code console - Moonshot: OpenAI-compatible format (
api.moonshot.cn/v1), API Key from Moonshot platform - OpenAI: Any OpenAI-compatible API