I recently wrote up four mobile GUI agent projects back to back: MobiAgent, Mobile-Agent, Mobilerun, and mobile-use. All four are about “letting AI operate phones or mobile apps”, but they are positioned quite differently.
In short: MobiAgent is closer to a customizable research system for phone agents; Mobile-Agent is Tongyi Lab’s body of work around GUI agents; Mobilerun is more of a practical local/cloud mobile device control framework; and mobile-use emphasizes real app operation, task decomposition, data extraction, and AndroidWorld evaluation.
Basic Information Comparison
| Project | Site Article | GitHub | Main Positioning | Device/Platform | License | Best For |
|---|---|---|---|---|---|---|
| MobiAgent | Site intro | IPADS-SAI/MobiAgent | Customizable phone GUI agent system with models, runner, memory, acceleration, and evaluation | Mainly Android/Harmony phones | Apache-2.0 | Researchers and mobile agent experiment teams |
| Mobile-Agent | Site intro | X-PLUG/MobileAgent | Tongyi Lab GUI agent family covering mobile, desktop, browser, and tool use | Phones, PCs, web pages, cloud phones/cloud desktops | MIT | People tracking GUI agent technology paths |
| Mobilerun | Site intro | droidrun/mobilerun | LLM-agnostic mobile device agent framework with CLI, Python API, and cloud device workflows | Android, iOS, local devices, cloud devices | MIT | Developers, QA, and automation workflow teams |
| mobile-use | Site intro | minitap-ai/mobile-use | Operates real mobile apps through natural language, with task decomposition, structured extraction, and AndroidWorld focus | Android devices/emulators, iOS simulators | Apache-2.0 | People building mobile app agents, data extraction, and evaluations |
MobiAgent
MobiAgent comes from IPADS-SAI and is positioned as a customizable phone agent system. Rather than a standalone execution script, it brings the MobiMind model family, AgentRR action recording and replay, the MobiFlow benchmark, phone runners, data collection, and an Android app together in one system.
Its main strength is the completeness of the research system. MobiAgent cares about accuracy, efficiency, memory, and reusable action sequences in real phone tasks. The user profile memory, experience memory, action memory, and multi-task execution mentioned in the README all show that it is trying to handle long-horizon and repeated tasks.
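To make the idea of reusable action sequences concrete, here is a minimal sketch of an action memory that records the steps of a completed task and replays them when the same task comes up again. It is not MobiAgent's code or AgentRR's API: the `Action` and `ActionMemory` names, the exact-match task key, and the step schema are all hypothetical, and a real system would match tasks semantically and re-check the screen before each replayed step.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Action:
    """One GUI step (hypothetical schema, not MobiAgent's)."""
    kind: str        # "tap", "type", "swipe", ...
    target: str      # element description or coordinates
    text: str = ""   # text to type, if any


@dataclass
class ActionMemory:
    """Hypothetical record-and-replay cache keyed by a normalized task string."""
    traces: dict[str, list[Action]] = field(default_factory=dict)

    @staticmethod
    def _key(task: str) -> str:
        # Naive normalization (case and whitespace only); a real system
        # would match tasks semantically rather than by exact string.
        return " ".join(task.lower().split())

    def record(self, task: str, actions: list[Action]) -> None:
        # Store a copy of the successful action sequence for this task.
        self.traces[self._key(task)] = list(actions)

    def replay(self, task: str) -> list[Action] | None:
        # Return a copy of the stored trace, or None so the caller
        # falls back to step-by-step model inference.
        trace = self.traces.get(self._key(task))
        return list(trace) if trace is not None else None


if __name__ == "__main__":
    memory = ActionMemory()
    memory.record("Open Settings and check battery",
                  [Action("tap", "Settings"), Action("tap", "Battery")])
    # Same task with different case and spacing: the stored trace is reused.
    print(memory.replay("open settings  and check battery"))
    # Unseen task: None, so the caller asks the model instead.
    print(memory.replay("Send a message to Bob"))
```

Returning `None` for unseen tasks keeps the model in the loop, while repeated tasks can skip per-step inference; that is the general efficiency argument for record-and-replay.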
Its entry barrier is also relatively high. A full setup requires devices, ADB, model deployment, dependencies, and optional vector database and graph database configuration. It is better suited to research or engineering experiments than to an “install and use immediately” phone assistant for ordinary users.
Mobile-Agent
Mobile-Agent comes from X-PLUG/Tongyi Lab. The repository has grown from an early phone operation agent into a GUI agent family: Mobile-Agent-v1/v2/v3/v3.5, Mobile-Agent-E, PC-Agent, GUI-Critic-R1, UI-S1, GUI-Owl, ToolCUA, and more all sit on the same technical line.
Its defining feature is breadth. Mobile-Agent is not only about phones; it also covers desktop, browser, cloud phones, cloud desktops, GUI perception, grounding, error diagnosis, reinforcement learning, and GUI/tool path orchestration. The GUI-Owl model series makes it feel more like a cross-platform GUI agent foundation-model track than a single mobile automation project.
The weakness also comes from that breadth: the repository is more like a collection of research results, so users first need to decide which subproject, model, and scenario they actually want to run. It is good for tracking technical evolution and reproducing experiments, but it may not be the fastest choice for plugging into a business workflow.
Mobilerun
Mobilerun comes from droidrun and is more engineering-oriented: it lets LLM agents control Android and iOS devices through natural language. It provides CLI, TUI, Docker, Python API, portal-based control, vision mode, reasoning mode, structured output, custom tools, app cards, execution traces, and cloud device services.
Its most prominent qualities are model agnosticism and clear deployment options. Developers can connect OpenAI, Anthropic, Gemini, Ollama, DeepSeek, OpenRouter, or OpenAI-compatible providers, and they can run either the local framework or Mobilerun Cloud. For real teams, this separation between the device control layer and the model layer matters a lot: the model provider can be swapped without rewriting device control code, and vice versa.
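The value of separating the model layer from the device control layer is easiest to see as two interfaces plus a loop that depends only on them. The sketch below is a generic illustration, not Mobilerun's Python API: `ModelClient`, `Device`, `run_task`, the string-based actions, and the `"done"` stop signal are hypothetical, and the scripted model and fake device stand in for a real LLM provider and a real phone.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Protocol


class ModelClient(Protocol):
    """Any LLM backend: hosted API, local model, OpenAI-compatible server, etc."""
    def next_action(self, task: str, screen: str) -> str: ...


class Device(Protocol):
    """Device control layer: ADB, a portal app, or a cloud device session."""
    def screen_text(self) -> str: ...
    def perform(self, action: str) -> None: ...


def run_task(task: str, model: ModelClient, device: Device,
             max_steps: int = 10) -> list[str]:
    """Observe the screen, ask the model for one action, execute it, repeat.

    The loop depends only on the two protocols, so swapping the model
    provider does not touch device code, and vice versa.
    """
    history: list[str] = []
    for _ in range(max_steps):
        action = model.next_action(task, device.screen_text())
        if action == "done":  # hypothetical stop signal
            break
        device.perform(action)
        history.append(action)
    return history


@dataclass
class ScriptedModel:
    """Hypothetical stand-in for an LLM: returns pre-written actions."""
    script: list[str]

    def next_action(self, task: str, screen: str) -> str:
        return self.script.pop(0) if self.script else "done"


@dataclass
class FakeDevice:
    """Hypothetical stand-in for a phone: just tracks the last action."""
    screen: str = "home"

    def screen_text(self) -> str:
        return self.screen

    def perform(self, action: str) -> None:
        self.screen = f"after {action}"


if __name__ == "__main__":
    steps = run_task("check battery",
                     ScriptedModel(["tap Settings", "tap Battery"]),
                     FakeDevice())
    print(steps)  # ['tap Settings', 'tap Battery']
```

Replacing `ScriptedModel` with a client for any provider, or `FakeDevice` with an ADB or cloud-device backend, leaves `run_task` unchanged.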
It still faces the usual mobile automation barriers. On Android you need to enable developer options and USB debugging and install the Portal app; iOS has a separate setup flow; and complex tasks still have to handle permission popups, page changes, retries after failures, and log investigation. It is best suited to people willing to treat mobile agents as engineering components.
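Permission popups and failed steps are usually handled in the executor rather than left to the model. As a minimal sketch, assuming each step reports success as a boolean and that a permission dialog can be recognized from on-screen text, a step runner might dismiss popups, retry, and log each failure for later investigation. The function names, the `POPUP_MARKERS` strings, and the retry policy are hypothetical and not taken from any of the four projects.

```python
from __future__ import annotations

import logging
from collections.abc import Callable

logger = logging.getLogger("mobile_agent")

# Hypothetical text markers of a system permission dialog.
POPUP_MARKERS = ("Allow", "While using the app", "Don't allow")


def looks_like_permission_popup(screen: str) -> bool:
    """Crude check: does the visible UI text contain a popup marker?"""
    return any(marker in screen for marker in POPUP_MARKERS)


def run_step_with_retries(
    step: Callable[[], bool],
    read_screen: Callable[[], str],
    dismiss_popup: Callable[[], None],
    max_attempts: int = 3,
) -> bool:
    """Run one agent step, clearing permission popups and retrying on failure."""
    for attempt in range(1, max_attempts + 1):
        if looks_like_permission_popup(read_screen()):
            logger.info("attempt %d: permission popup detected, dismissing", attempt)
            dismiss_popup()
        if step():
            return True
        # Log every failure so traces can be investigated afterwards.
        logger.warning("attempt %d/%d failed", attempt, max_attempts)
    return False


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    screens = ["Allow Maps to access your location?", "Map view"]
    outcomes = iter([False, True])  # first attempt fails, second succeeds
    ok = run_step_with_retries(
        step=lambda: next(outcomes),
        read_screen=lambda: screens[0],
        dismiss_popup=lambda: screens.pop(0),
    )
    print(ok)  # True
```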
mobile-use
mobile-use comes from minitap-ai and aims to let AI agents use real Android and iOS apps. It supports natural-language control, UI-aware automation, data extraction, and different LLM configurations, and it emphasizes AndroidWorld benchmark performance. Its README also says the project is the first agentic framework to reach 100% on the AndroidWorld benchmark.
Its highlight is task decomposition and structured extraction. For example, finding unread email in Gmail and returning the sender and subject in a specified JSON format is much closer to real production needs than simply “opening Settings and checking the battery level”. It pushes mobile GUI agents from “can operate” toward “can organize information from apps”.
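On the consuming side, structured extraction only helps if the agent's answer actually matches the requested format. A minimal sketch, assuming the agent is asked to return a JSON array of objects with `sender` and `subject` fields (a hypothetical schema modeled on the Gmail example, not mobile-use's own output format), validates the answer with the standard library before anything downstream uses it.

```python
from __future__ import annotations

import json
from dataclasses import dataclass


@dataclass(frozen=True)
class UnreadEmail:
    sender: str
    subject: str


def parse_unread_emails(raw: str) -> list[UnreadEmail]:
    """Validate an agent's JSON answer against the requested shape.

    Expected (hypothetical) format: [{"sender": "...", "subject": "..."}, ...]
    Raises ValueError on drift; json.JSONDecodeError is itself a ValueError.
    """
    data = json.loads(raw)
    if not isinstance(data, list):
        raise ValueError("expected a JSON array")
    emails: list[UnreadEmail] = []
    for i, item in enumerate(data):
        if not isinstance(item, dict):
            raise ValueError(f"item {i} is not an object")
        sender, subject = item.get("sender"), item.get("subject")
        if not isinstance(sender, str) or not isinstance(subject, str):
            raise ValueError(f"item {i} needs string 'sender' and 'subject'")
        emails.append(UnreadEmail(sender, subject))
    return emails


if __name__ == "__main__":
    answer = '[{"sender": "alice@example.com", "subject": "Q3 report"}]'
    print(parse_unread_emails(answer))
    # [UnreadEmail(sender='alice@example.com', subject='Q3 report')]
```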
Its limitations are mainly device support and runtime environment. Android can use physical phones or emulators; iOS currently mainly supports simulators on macOS, while physical iOS devices are not yet supported. Docker quick start is also mainly aimed at Android. When evaluating it, first confirm whether the target device and app scenario are covered by the current execution path.
Feature Comparison
| Feature Dimension | MobiAgent | Mobile-Agent | Mobilerun | mobile-use |
|---|---|---|---|---|
| Natural-language tasks | Supported | Supported | Supported | Supported |
| Real phone operation | Strong, Android/Harmony oriented | Strong, includes mobile and cloud phones | Strong, Android/iOS | Strong on Android; iOS mainly via simulators |
| Desktop/browser expansion | Not the focus | Strong, includes PC-Agent, GUI-Owl, ToolCUA | Not the main positioning | Not the main positioning |
| Model layer | Includes MobiMind series | GUI-Owl and Mobile-Agent series | LLM-agnostic, connects many models | Configurable with multiple LLMs |
| Executor/runner | Strong, includes ADB runner and multi-task runner | Provided separately by subprojects | Strong, CLI/TUI/Python API/Docker | Source code, Docker, and platform entry points |
| Memory ability | User profile, experience, and action memory | v3/v3.5 emphasize memory and reflection | More about traces, logs, and engineering debugging | More about task decomposition and stateful execution |
| Evaluation | MobiFlow | Multiple paper/benchmark directions | Has benchmark result entry points | Strong AndroidWorld performance |
| Cloud devices | Not the main selling point | Supports cloud phone/cloud desktop experiences | Mobilerun Cloud is a focus | Has platform entry points |
| Structured output | Can be implemented through engineering flows | Depends on the subproject | Explicitly supported | Explicitly supported |
Strengths and Weaknesses
MobiAgent’s strength is system completeness. It is suitable for studying the closed loop of models, memory, acceleration, and evaluation for phone GUI agents. Its weakness is the long deployment chain, heavy engineering configuration, and relatively high onboarding cost for ordinary developers.
Mobile-Agent’s strength is the breadth of its technical path. It shows GUI agents evolving from phones to desktops, browsers, tool use, and foundation models. Its weakness is the complexity of the project family: if you want to deploy one specific scenario directly, you first have to narrow down which subproject, model, and setup to use.
Mobilerun’s strength is a clear engineering interface, model agnosticism, and explicit separation between local framework and cloud service. It is suitable for integrating mobile device automation into products or internal tools. Its weakness is that it still has to deal with mobile device permissions, environments, app state, and cloud cost.
mobile-use’s strength is its focus on real app usage, task decomposition, and structured data extraction. Its AndroidWorld focus also gives it a public benchmark to measure against. Its weakness is that physical iOS devices are not yet supported, and a complete setup still requires model, device, and runtime configuration.
Suggested Use Cases
If you want to research mobile agents, look first at MobiAgent and Mobile-Agent. The former focuses more on a closed loop for phone-side systems, while the latter is better for observing the cross-platform evolution of GUI agents.
If you want mobile app automation, QA, data extraction, or internal workflows, look first at Mobilerun and mobile-use. Mobilerun is more like a runtime framework that can plug into engineering systems, while mobile-use is better for validating natural-language app operation and structured extraction.
If you care about future personal-assistant forms, all four are worth tracking. MobiAgent represents systematic research on phone agents, Mobile-Agent represents the cross-platform GUI agent path, Mobilerun represents device-control infrastructure, and mobile-use represents real-app task decomposition and evaluation-driven development.
My Take
The differences between these four projects show that mobile GUI agents are no longer just about “letting a model look at screenshots and tap buttons”. The real questions have become: how models understand interfaces, how executors control devices reliably, how tasks are decomposed and evaluated, how cloud devices are managed, how results are returned in structured form, and how risks are constrained.
In the short term, the most realistic deployment scenarios are QA, data extraction, internal workflow automation, and controlled device pools. In the long run, whoever can stabilize device control, model capability, permission boundaries, log tracing, and user confirmation mechanisms will be closest to a truly usable mobile AI assistant.