Want AI to Tap Your Phone Automatically? Mobilerun Supports Android and iOS

A look at droidrun's open source Mobilerun: an LLM-agnostic mobile agent framework for Android and iOS devices, supporting CLI, Python API, local execution, and cloud device workflows.

Mobilerun is droidrun’s open-source framework for mobile device automation. Its goal is to let LLM agents control Android and iOS devices through natural language. It provides native mobile tools so that agents can inspect UI state, interpret screenshots, tap, swipe, type, and plan multi-step tasks, then return results through the CLI or the Python API.

The project’s positioning is clear: it does not bind itself to one model vendor, but works as the execution layer between mobile devices and agents. The README lists model sources including OpenAI, Anthropic, Gemini, Ollama, DeepSeek, OpenRouter, and OpenAI-compatible providers. For developers, this is more practical than a demo project that only supports one model.

What Problem It Solves

The hardest part of mobile automation is that many layers sit between a natural-language task and real device operation. The model needs to know which app is open, what controls are on the page, whether screenshots are needed for visual context, where to tap next, and how to continue after failure.

Mobilerun organizes these capabilities into a framework:

  • Run one-off natural-language tasks, inspect devices, replay macros, and debug flows through the CLI and TUI.
  • Build custom mobile automation workflows through the Python API.
  • Support Android and iOS. Android uses the Portal app and accessibility services; iOS follows a separate Portal flow.
  • Combine the accessibility tree with screenshots so the model can read both structured UI and visual context.
  • Support modes such as --vision, --vision-only, and --reasoning for tasks of different complexity.
  • Support structured output, app cards, custom tools, credentials, and execution trace tracking.

This makes Mobilerun feel more like a “mobile agent runtime” than a simple screenshot-to-LLM tap simulator.
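To make the "runtime" idea concrete, here is a minimal sketch of the observe, decide, act loop that such a framework has to run. It is not Mobilerun's API: `Observation`, `Action`, `FakeDevice`, `scripted_policy`, and `run_task` are all hypothetical names, the device is an in-memory stand-in, and the "model" is a hard-coded function where a real setup would call an LLM.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Observation:
    """What the agent sees on one step (hypothetical structure)."""
    app: str                            # foreground app identifier
    ui_tree: list[str]                  # flattened accessibility nodes, e.g. "button:Login"
    screenshot: Optional[bytes] = None  # filled only when visual context is needed


@dataclass
class Action:
    kind: str         # "tap", "type", "swipe", or "done"
    target: str = ""  # UI node to act on
    text: str = ""    # text to type, or the final answer for "done"


class FakeDevice:
    """In-memory stand-in for a phone so the loop runs without hardware."""

    def __init__(self) -> None:
        self.screen = "login"
        self.log: list[Action] = []

    def observe(self) -> Observation:
        trees = {"login": ["button:Login"], "home": ["text:Welcome"]}
        return Observation(app="com.example.app", ui_tree=trees[self.screen])

    def execute(self, action: Action) -> None:
        self.log.append(action)  # keep an execution trace
        if action.kind == "tap" and action.target == "button:Login":
            self.screen = "home"


def scripted_policy(task: str, obs: Observation) -> Action:
    """Stand-in for the LLM call: chooses the next action from the UI tree."""
    if "button:Login" in obs.ui_tree:
        return Action(kind="tap", target="button:Login")
    return Action(kind="done", text="logged in")


def run_task(
    task: str,
    device: FakeDevice,
    policy: Callable[[str, Observation], Action],
    max_steps: int = 10,
) -> str:
    """Observe, ask the policy for an action, execute it, repeat until done."""
    for _ in range(max_steps):
        obs = device.observe()
        action = policy(task, obs)
        if action.kind == "done":
            return action.text
        device.execute(action)
    raise RuntimeError(f"task not finished after {max_steps} steps")
```

Calling `run_task("log in", FakeDevice(), scripted_policy)` taps the login button once and then returns the policy's final answer. The step limit matters in practice: without it, a confused model can loop on the same screen indefinitely.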

Local Framework and Cloud Service

Mobilerun draws a clear line between the local framework and Mobilerun Cloud. The local framework is for developers who run agents on their own machines and devices and want stronger code-level control. Mobilerun Cloud targets hosted devices, a REST API, SDKs, and workflows at scale.

This layering matters. Many mobile automation scenarios begin as “help me run one task on a phone”, but once a team adopts them, device management, concurrency, logs, retries, permissions, and API calls all become concerns. Cloud does not replace the local framework; it moves device operations and workflow integration into backend services.

The README also distinguishes several types of cloud devices: user-owned hardware, hosted cloud phones, and hosted physical phones. The difference is not only cost; it also affects how an app’s risk-control systems treat the device, how much the device’s identity is trusted, and how stable tasks are. For e-commerce, social, finance, or local-service apps, real devices and virtual devices may behave very differently.

Why LLM-Agnostic Matters

Mobile GUI agents are still changing quickly, so it is hard to say which model will be best long term. Different tasks also need different model strengths: some rely more on visual understanding, some on long-horizon planning, some on tool use, and some on low-cost batch execution.

Mobilerun’s model-agnostic route separates device control, task execution, log tracing, and model choice. Developers can stabilize the device-side flow first, then switch models based on cost, accuracy, and latency.

This helps real deployment. Enterprises will not rewrite the device control layer just because one model demo looks good. It is more reasonable to keep a unified execution framework and treat the model as a replaceable component.
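The separation described above can be sketched in a few lines of Python. This is an illustration of the design idea, not Mobilerun's actual interface: `ChatModel`, `CannedModel`, `FirstElementModel`, and `DeviceExecutor` are hypothetical names, and the two "models" are toys standing in for real providers.

```python
from __future__ import annotations

from typing import Protocol


class ChatModel(Protocol):
    """The only thing the device layer needs from a model (hypothetical)."""

    def complete(self, prompt: str) -> str: ...


class CannedModel:
    """Toy backend that always returns a fixed reply."""

    def __init__(self, reply: str) -> None:
        self.reply = reply

    def complete(self, prompt: str) -> str:
        return self.reply


class FirstElementModel:
    """Toy backend that taps the first UI element listed in the prompt."""

    def complete(self, prompt: str) -> str:
        lines = prompt.splitlines()
        # Line 0 is the task; any following lines are UI elements.
        return f"tap {lines[1]}" if len(lines) > 1 else "done"


class DeviceExecutor:
    """Device-side layer: depends on the ChatModel protocol, not on a vendor."""

    def __init__(self, model: ChatModel) -> None:
        self.model = model
        self.trace: list[str] = []  # decisions, kept for later inspection

    def step(self, task: str, ui_elements: list[str]) -> str:
        prompt = "\n".join([task, *ui_elements])
        decision = self.model.complete(prompt)
        self.trace.append(decision)
        return decision
```

Swapping `DeviceExecutor(CannedModel("swipe up"))` for `DeviceExecutor(FirstElementModel())` changes the decisions but not a line of the executor, which is the property that lets a team compare models on cost, accuracy, and latency without touching the device-side flow.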

Suitable Scenarios

Mobilerun currently fits several needs:

  • Mobile app QA and regression testing.
  • Extracting data from native apps and returning structured results.
  • Automatically executing repetitive phone tasks.
  • Packaging natural-language mobile operation flows for non-technical users.
  • Running automation tasks across multiple devices.
  • Connecting schedules, notifications, or custom triggers to mobile workflows.

It is not yet a consumer-grade assistant that takes over your phone immediately after installation. Android requires ADB, developer options, USB debugging, and the Portal app; iOS has its own integration flow. To run reliably, you still need to configure a model, manage device state, handle permission popups, and recover from failed tasks.
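Failure recovery is the kind of glue code this implies. The sketch below shows one common pattern: retry a step a bounded number of times and run a recovery hook (for example, dismissing a popup) between attempts. All names here (`StepFailed`, `run_with_recovery`, `FlakyScreen`) are hypothetical and unrelated to Mobilerun's own error handling, which may work differently.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class StepFailed(Exception):
    """Raised by a step that could not complete (hypothetical error type)."""


def run_with_recovery(
    step: Callable[[], T],
    recover: Callable[[StepFailed], None],
    max_attempts: int = 3,
    base_delay: float = 0.0,
) -> T:
    """Run `step`; on failure call `recover`, wait, and try again."""
    last_error: Optional[StepFailed] = None
    for attempt in range(1, max_attempts + 1):
        try:
            return step()
        except StepFailed as err:
            last_error = err
            if attempt == max_attempts:
                break
            recover(err)  # e.g. dismiss a popup or reopen the app
            time.sleep(base_delay * 2 ** (attempt - 1))  # exponential backoff
    raise RuntimeError(f"gave up after {max_attempts} attempts") from last_error


class FlakyScreen:
    """Simulated screen where a permission popup blocks the first tap."""

    def __init__(self) -> None:
        self.popup_open = True
        self.dismissals = 0

    def tap_send(self) -> str:
        if self.popup_open:
            raise StepFailed("permission popup is covering the button")
        return "sent"

    def dismiss_popup(self, err: StepFailed) -> None:
        self.popup_open = False
        self.dismissals += 1
```

With `screen = FlakyScreen()`, the call `run_with_recovery(screen.tap_send, screen.dismiss_popup)` fails once, dismisses the popup, and succeeds on the second attempt. On a real device `base_delay` would be set above zero so the UI has time to settle between attempts.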

My Take

Mobilerun’s value is that it turns mobile device control into a programmable, observable, model-replaceable agent framework. It recognizes that mobile automation is not only a model problem, but a system problem involving models, devices, executors, logs, tools, and cloud infrastructure.

In the short term, it is suitable for developers building mobile automation prototypes and internal tools. In the long term, frameworks like this may become “AI workflow engines on phones”. If GUI agents are to enter real business use, projects that combine local execution, cloud devices, structured output, and traceability will become increasingly important.

Project link: droidrun/mobilerun
