MobiZen-GUI

VLM-based mobile automation framework — control Android devices via natural language.

Repo: https://github.com/alibaba/MobiZen-GUI

1. Environment Setup

1.1 Install ADB

CODEBLOCK0

1.2 Connect Device & Install ADBKeyboard

CODEBLOCK1
Then on device: Settings → System → Languages & Input → Virtual Keyboard → Enable ADBKeyboard.

1.3 Install Project

CODEBLOCK2

2. Quick Start (Config Only, No Code Changes)

Copy example config:

CODEBLOCK3

Only 3 fields need to be configured — api_key, base_url, model_name:

CODEBLOCK4

How to set these 3 fields: When the user asks to run a phone task but hasn't configured yet, the AI should ask the user to provide api_key, base_url, and model_name, then write them into my_config.yaml. The user can also manually edit the file. Any OpenAI-compatible API works.

Provider examples:

CODEBLOCK5

Run:

CODEBLOCK6

3. Configuration Reference

Field	Default	Description
INLINECODE7	INLINECODE8 (auto)	ADB device; null = first available
INLINECODE9

4. Advanced: Deploy MobiZen-GUI-4B Locally

For best results on Chinese mobile tasks, deploy the dedicated 4B model.

4.1 Download Model

CODEBLOCK7

Alternatively from ModelScope: https://modelscope.cn/models/GUIAgent/MobiZen-GUI-4B

4.2 Serve with vLLM

CODEBLOCK8

4.3 Point Config to Local Model

CODEBLOCK9

Then run as usual: python main.py --config my_config.yaml --instruction "..."

5. Customization (Requires Code Changes)

The framework uses a plugin architecture — three components can be swapped via config class paths:

Component	Role	Base Class	Default Implementation
MessageBuilder	Builds prompt + screenshot for model	INLINECODE36	INLINECODE37
ModelClient

5.1 Custom Model Client

For non-OpenAI-compatible APIs:

CODEBLOCK10

Config:
CODEBLOCK11

5.2 Custom Message Builder

To change the system prompt or how screenshots/history are formatted:

CODEBLOCK12

Config:
CODEBLOCK13

5.3 Custom Response Parser

To parse a different model output format:

CODEBLOCK14

Action dict format: {"arguments": {"action": "<type>", ...}} — supported types: click, long_press, swipe, type, system_button, wait, terminate.

Config:
CODEBLOCK15

5.4 Add New Action Type

1. Add _execute_<action>(self, args) method in INLINECODE51
Add dispatch branch in INLINECODE52
Update system prompt in INLINECODE53

6. Troubleshooting

- Device not found: Run adb devices — check USB/wireless connection
ADBKeyboard not working: Ensure enabled in device settings; test: INLINECODE55
Model connection error: Verify base_url + api_key; check network
Coordinate mismatch: Ensure model_type matches your model; check screen size: INLINECODE59
Duplicate action loop: Agent auto-stops after 5 identical actions; may indicate model confusion

技能名称：mobizen-gui

MobiZen-GUI

基于VLM的移动端自动化框架——通过自然语言控制Android设备。

仓库地址：https://github.com/alibaba/MobiZen-GUI

1. 环境配置

1.1 安装ADB

bash

macOS

brew install android-platform-tools

Linux

sudo apt-get install android-tools-adb

Windows：从 https://developer.android.com/studio/releases/platform-tools 下载

adb version # 验证安装

1.2 连接设备并安装ADBKeyboard

bash
adb devices # USB连接；或：adb tcpip 5555 && adb connect :5555
adb install ADBKeyboard.apk # 从 https://github.com/senzhk/ADBKeyBoard 下载

然后在设备上操作：设置 → 系统 → 语言与输入法 → 虚拟键盘 → 启用ADBKeyboard。

1.3 安装项目

bash
git clone https://github.com/alibaba/MobiZen-GUI.git && cd MobiZen-GUI
pip install -r requirements.txt # openai, pillow, pyyaml

2. 快速开始（仅需配置，无需修改代码）

复制示例配置文件：

bash
cp configexample.yaml myconfig.yaml

只需配置 3个字段 — apikey、baseurl、model_name：

yaml
api_key: your-api-key-here
base_url: https://api.openai.com/v1 # 模型端点地址
model_name: gpt-4o # 模型标识符

如何设置这3个字段：当用户要求执行手机任务但尚未配置时，AI应要求用户提供apikey、baseurl和modelname，然后将它们写入myconfig.yaml。用户也可以手动编辑该文件。任何兼容OpenAI的API均可使用。

提供商示例：

yaml

OpenAI

base_url: https://api.openai.com/v1
api_key: sk-...
model_name: gpt-4o

DeepSeek / Moonshot / 智谱AI 等

base_url: https://api.deepseek.com/v1 api_key: your-key model_name: deepseek-chat

Ollama（本地）

base_url: http://localhost:11434/v1 api_key: dummy model_name: llava

运行：

bash
python main.py --config my_config.yaml --instruction 打开微信并发送消息

3. 配置参考

字段	默认值	描述
deviceid	null（自动）	ADB设备；null表示第一个可用设备
apikey

4. 进阶：本地部署MobiZen-GUI-4B

为在中文移动端任务上获得最佳效果，请部署专用4B模型。

4.1 下载模型

bash
pip install -U huggingface_hub

中国镜像（可选）

export HF_ENDPOINT=https://hf-mirror.com
hf download alibabagroup/MobiZen-GUI-4B --local-dir ./MobiZen-GUI-4B

或从ModelScope下载：https://modelscope.cn/models/GUIAgent/MobiZen-GUI-4B

4.2 使用vLLM提供服务

bash
pip install vllm==0.11.0
vllm serve ./MobiZen-GUI-4B --host 0.0.0.0 --port 8000 --trust-remote-code

4.3 将配置指向本地模型

yaml
api_key: dummy
base_url: http://localhost:8000/v1
model_name: MobiZen-GUI-4B
model_type: qwen3vl

然后正常运行：python main.py --config my_config.yaml --instruction ...

5. 自定义（需要修改代码）

该框架采用插件架构——三个组件可通过配置类路径进行替换：

组件	角色	基类	默认实现
MessageBuilder	为模型构建提示词和截图	core.messagebuilders.base.BaseMessageBuilder	core.messagebuilders.qwen.QwenMessageBuilder
ModelClient

5.1 自定义模型客户端

适用于非OpenAI兼容的API：

python

core/modelclients/myclient.py

from .base import BaseModelClient

class MyClient(BaseModelClient):
def init(self, apikey: str, baseurl: str = None, model: str = , timeout: int = 60):
pass # 初始化客户端

def chat(self, messages, kwargs):
pass # 必须返回包含 .choices[0].message.content 的对象

配置：
yaml
modelclientclass: core.modelclients.myclient.MyClient
modelclientkwargs: {} # 传递给init的额外参数

5.2 自定义消息构建器

用于更改系统提示词或截图/历史记录的格式：

python

core/messagebuilders/mybuilder.py

from .base import BaseMessageBuilder
from utils.image import imagetodata_url

class MyBuilder(BaseMessageBuilder):
def buildsystemprompt(self, kwargs) -> str:
return 你的系统提示词

def buildmessages(self, instruction, currentscreenshot, history, kwargs):
return [{role: system, content: [...]}, {role: user, content: [...]}]

配置：
yaml
messagebuilderclass: core.messagebuilders.mybuilder.MyBuilder

5.3 自定义响应解析器

用于解析不同模型输出格式：

python

core/responseparsers/myparser.py

from .base import BaseResponseParser, ParsedResponse

class MyParser(BaseResponseParser):
def parse(self, response) -> ParsedResponse:
content = response.choices[0].message.content
# 将内容解析为结构化字段
return ParsedResponse(
thought=...,
summary=...,
action={arguments: {action: click, coordinate: [x, y]}},
subtask=...
)

动作字典格式：{arguments: {action: , ...}} — 支持的类型：click、longpress、swipe、type、systembutton、wait、terminate。

配置：
yaml
responseparserclass: core.responseparsers.myparser.MyParser

5.4 添加新动作类型

1. 在 core/executor/actionexecutor.py 中添加 execute(self, args) 方法
在 ActionExecutor.execute() 中添加分发分支
更新 QwenMessageBuilder.buildsystem_prompt() 中的系统提示词

6. 故障排除

- 设备未找到：运行 adb devices — 检查USB/无线连接
ADBKeyboard不工作：确保在设备设置中已启用；测试：adb shell am broadcast -a ADBINPUTTEXT --es msg test
模型连接错误：验证 baseurl + apikey；

mobizen-guiMobizen图形界面

mobizen-gui

MobiZen-GUI

1. Environment Setup

1.1 Install ADB

1.2 Connect Device & Install ADBKeyboard

1.3 Install Project

2. Quick Start (Config Only, No Code Changes)

3. Configuration Reference

4. Advanced: Deploy MobiZen-GUI-4B Locally

4.1 Download Model

4.2 Serve with vLLM

4.3 Point Config to Local Model

5. Customization (Requires Code Changes)

5.1 Custom Model Client

5.2 Custom Message Builder

5.3 Custom Response Parser

5.4 Add New Action Type

6. Troubleshooting

MobiZen-GUI

1. 环境配置

1.1 安装ADB

macOS

Linux

Windows：从 https://developer.android.com/studio/releases/platform-tools 下载

1.2 连接设备并安装ADBKeyboard

1.3 安装项目

2. 快速开始（仅需配置，无需修改代码）

OpenAI

DeepSeek / Moonshot / 智谱AI 等

Ollama（本地）

3. 配置参考

4. 进阶：本地部署MobiZen-GUI-4B

4.1 下载模型

中国镜像（可选）

4.2 使用vLLM提供服务

4.3 将配置指向本地模型

5. 自定义（需要修改代码）

5.1 自定义模型客户端

core/modelclients/myclient.py

5.2 自定义消息构建器

core/messagebuilders/mybuilder.py

5.3 自定义响应解析器

core/responseparsers/myparser.py

5.4 添加新动作类型

6. 故障排除

标签

通过对话安装

方式一：安装 SkillHub 和技能

方式二：设置 SkillHub 为优先技能安装源

通过命令行安装

下载

相关推荐

self-improvement

self-improvement

self-improvement

self-improvement