Expose devices as an MCP service

MCP (Model Context Protocol) is a protocol standard that lets AI models interact with external tools and capabilities.

Midscene provides MCP services that expose atomic operations in Midscene Agent (each Action in the Action Space) as MCP tools. Upper-layer Agents can use natural language to inspect the UI, precisely operate UI elements, and run automation tasks without needing to understand the underlying implementation.

Because Midscene Agent relies on a multimodal model, configure the environment variables required by Midscene inside the MCP service instead of reusing the upstream Agent's model configuration.

MCP tool list

  • Device connections (such as web_connect, ios_connect, android_connect, computer_connect): connect to target devices such as browsers, iOS devices, Android devices, or computer desktops
  • take_screenshot: get the latest screenshot
  • assert: assert a natural language statement against the current page/screen
  • Device actions: each Action in the Action Space, such as Tap and Scroll

View execution reports

After each interaction finishes, Midscene generates a task report. On macOS, for example, you can open it from the command line:

open report_file_name.html

The report includes detailed interaction information such as screenshots, operation logs, and error details to help with debugging and troubleshooting.

Configure MCP

Browser Bridge Mode

@midscene/web-bridge-mcp exposes the Chrome extension Bridge Mode as an MCP service.

Environment preparation

Follow Chrome Bridge Mode to make sure the browser extension starts correctly. We recommend enabling Background Bridge Mode, which keeps the connection running in the background without manual intervention, so it does not disconnect when you close the extension popup.

Background Bridge Mode

With background bridge mode enabled, the MCP service can connect at any time without user intervention. See Background Bridge Mode for details.

Configuration

Add the Midscene Web Bridge MCP server (@midscene/web-bridge-mcp) in your MCP client. For model configuration parameters, see Model strategy.

{
  "mcpServers": {
    "midscene-web": {
      "command": "npx",
      "args": ["-y", "@midscene/web-bridge-mcp"],
      "env": {
        "MIDSCENE_MODEL_BASE_URL": "replace with your model service URL",
        "MIDSCENE_MODEL_API_KEY": "replace with your API Key",
        "MIDSCENE_MODEL_NAME": "replace with your model name",
        "MIDSCENE_MODEL_FAMILY": "replace with your model family",
        "MCP_SERVER_REQUEST_TIMEOUT": "600000"
      }
    }
  }
}

iOS MCP service

Environment preparation

  • AI model service: Prepare an OpenAI API Key or another supported AI model service. See Model strategy for more details.
  • Device setup: Follow iOS Getting Started to configure WebDriverAgent, certificates, and device connections, and make sure WebDriverAgent is running. You can verify screenshots and basic operations in iOS Playground.

Configuration

Add the Midscene iOS MCP server (@midscene/ios-mcp) in your MCP client. For model configuration parameters, see Model strategy.

{
  "mcpServers": {
    "midscene-ios": {
      "command": "npx",
      "args": ["-y", "@midscene/ios-mcp"],
      "env": {
        "MIDSCENE_MODEL_BASE_URL": "replace with your model service URL",
        "MIDSCENE_MODEL_API_KEY": "replace with your API Key",
        "MIDSCENE_MODEL_NAME": "replace with your model name",
        "MIDSCENE_MODEL_FAMILY": "replace with your model family",
        "MCP_SERVER_REQUEST_TIMEOUT": "800000"
      }
    }
  }
}

Android MCP service

Environment preparation

  • AI model service: Prepare an OpenAI API Key or another supported AI model service. See Model strategy for more details.
  • Device setup: Follow Android Getting Started to configure adb and connect your device. Make sure the adb devices command lists the target device, then use Android Playground to verify screenshots and basic operations.

Configuration

Add the Midscene Android MCP server (@midscene/android-mcp) in your MCP client. For model configuration parameters, see Model strategy.

{
  "mcpServers": {
    "midscene-android": {
      "command": "npx",
      "args": ["-y", "@midscene/android-mcp"],
      "env": {
        "MIDSCENE_MODEL_BASE_URL": "replace with your model service URL",
        "MIDSCENE_MODEL_API_KEY": "replace with your API Key",
        "MIDSCENE_MODEL_NAME": "replace with your model name",
        "MIDSCENE_MODEL_FAMILY": "replace with your model family",
        "MCP_SERVER_REQUEST_TIMEOUT": "800000"
      }
    }
  }
}

Computer Desktop MCP service

@midscene/computer-mcp exposes the computer desktop automation capabilities as an MCP service, allowing AI to control your computer through mouse, keyboard, and screenshot operations.

Environment preparation

  • AI model service: Prepare an OpenAI API Key or another supported AI model service. See Model strategy for more details.
  • System permissions: On macOS, you need to grant accessibility and screen recording permissions to the terminal or application running the MCP service.

Configuration

Add the Midscene Computer MCP server (@midscene/computer-mcp) in your MCP client. For model configuration parameters, see Model strategy.

{
  "mcpServers": {
    "midscene-computer": {
      "command": "npx",
      "args": ["-y", "@midscene/computer-mcp"],
      "env": {
        "MIDSCENE_MODEL_BASE_URL": "replace with your model service URL",
        "MIDSCENE_MODEL_API_KEY": "replace with your API Key",
        "MIDSCENE_MODEL_NAME": "replace with your model name",
        "MIDSCENE_MODEL_FAMILY": "replace with your model family",
        "MCP_SERVER_REQUEST_TIMEOUT": "800000"
      }
    }
  }
}
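The four configurations above share one shape. They differ only in the server key, the package name, and the request timeout. If you generate your MCP client configuration from code, a small helper can build each entry. This is a minimal sketch: midsceneServerEntry is a hypothetical helper and is not part of Midscene. The environment variable names and timeout values come from the JSON above.

```typescript
// Model settings Midscene reads from the MCP server's environment (names from this page).
type MidsceneModelEnv = {
  MIDSCENE_MODEL_BASE_URL: string;
  MIDSCENE_MODEL_API_KEY: string;
  MIDSCENE_MODEL_NAME: string;
  MIDSCENE_MODEL_FAMILY: string;
};

interface McpServerEntry {
  command: string;
  args: string[];
  env: Record<string, string>;
}

// Hypothetical helper: builds one "mcpServers" entry like the JSON blocks above.
function midsceneServerEntry(
  packageName: string,
  model: MidsceneModelEnv,
  requestTimeoutMs: number,
): McpServerEntry {
  return {
    command: "npx",
    args: ["-y", packageName],
    env: {
      ...model,
      // Environment variable values are strings, so the timeout is stringified.
      MCP_SERVER_REQUEST_TIMEOUT: String(requestTimeoutMs),
    },
  };
}

const model: MidsceneModelEnv = {
  MIDSCENE_MODEL_BASE_URL: "replace with your model service URL",
  MIDSCENE_MODEL_API_KEY: "replace with your API Key",
  MIDSCENE_MODEL_NAME: "replace with your model name",
  MIDSCENE_MODEL_FAMILY: "replace with your model family",
};

const config = {
  mcpServers: {
    "midscene-web": midsceneServerEntry("@midscene/web-bridge-mcp", model, 600000),
    "midscene-ios": midsceneServerEntry("@midscene/ios-mcp", model, 800000),
    "midscene-android": midsceneServerEntry("@midscene/android-mcp", model, 800000),
    "midscene-computer": midsceneServerEntry("@midscene/computer-mcp", model, 800000),
  },
};

// Paste the printed JSON into your MCP client's configuration file.
console.log(JSON.stringify(config, null, 2));
```

Keep only the entries for the platforms you actually use. Most MCP clients expect a single mcpServers object.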

Improve precision (deep locate / deep think)

Two independent startup flags help when the MCP service struggles with a task:

  • --deep-locate — spends an extra round of visual reasoning to pinpoint the target element. Use it when actions tap or interact with the wrong spot (location drift / offset). It applies to every operation that locates an element (action tools such as Tap, Input, Scroll, and the locating that happens inside act).
  • --deep-think — plans the act tool with deeper reasoning (richer context and sub-goal decomposition). Use it for complex, multi-step act instructions. It only affects planning, so it has no effect on the single-step action tools.

Both trade a little speed for better results, and you can combine them. Once a flag is enabled, the relevant tools use it by default, so you don't have to pass a parameter on every call.

For an MCP server, add the flag(s) to args:

{
  "mcpServers": {
    "midscene-android": {
      "command": "npx",
      "args": ["-y", "@midscene/android-mcp", "--deep-locate", "--deep-think"],
      "env": {
        "MIDSCENE_MODEL_BASE_URL": "replace with your model service URL",
        "MIDSCENE_MODEL_API_KEY": "replace with your API Key",
        "MIDSCENE_MODEL_NAME": "replace with your model name"
      }
    }
  }
}
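The startup flags are plain extra entries in args after the package name. The following sketch assembles that array from booleans. The midsceneArgs name is hypothetical; only the two flags come from this page.

```typescript
// Startup flags documented above; both are optional and independent.
interface PrecisionFlags {
  deepLocate?: boolean;
  deepThink?: boolean;
}

// Hypothetical helper: builds the npx "args" array with the requested flags appended.
function midsceneArgs(packageName: string, flags: PrecisionFlags = {}): string[] {
  const args = ["-y", packageName];
  if (flags.deepLocate) args.push("--deep-locate");
  if (flags.deepThink) args.push("--deep-think");
  return args;
}

// Produces the same "args" array as the Android JSON example above.
const androidArgs = midsceneArgs("@midscene/android-mcp", { deepLocate: true, deepThink: true });
console.log(androidArgs);
```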

The same flags work for the Agent Skill / device CLIs (midscene-android, midscene-web, midscene-ios, etc.), since they share the same tool definitions:

midscene-android --deep-locate tap --locate "the login button"

A per-call value always wins over the startup flag: action tools accept locate.deepLocate, and the act tool accepts top-level deepLocate and deepThink booleans, so an individual call can opt in or out regardless of the startup setting.
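The precedence rule for the act tool can be modeled in a few lines. This sketch illustrates the documented behavior and is not Midscene's code. The type and function names are hypothetical. The top-level deepLocate and deepThink booleans on act calls come from the text above.

```typescript
// Values set at startup via --deep-locate / --deep-think.
interface StartupFlags {
  deepLocate: boolean;
  deepThink: boolean;
}

// Arguments for an act call: optional top-level deepLocate / deepThink booleans.
interface ActCallArgs {
  prompt: string;
  deepLocate?: boolean;
  deepThink?: boolean;
}

function effectiveActOptions(startup: StartupFlags, call: ActCallArgs): StartupFlags {
  return {
    // `??` falls back to the startup flag only when the call leaves the value undefined,
    // so an explicit `false` opts out even when the startup flag is on.
    deepLocate: call.deepLocate ?? startup.deepLocate,
    deepThink: call.deepThink ?? startup.deepThink,
  };
}

const startup: StartupFlags = { deepLocate: true, deepThink: true };
const opts = effectiveActOptions(startup, { prompt: "finish checkout", deepThink: false });
// opts → { deepLocate: true, deepThink: false }
console.log(opts);
```

Using ?? instead of || matters here. With ||, an explicit false in the call would silently fall back to the startup value.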

Configure Agent behavior per call

MCP tools and the device CLIs also accept common Agent behavior parameters. Use them when a target UI needs a longer settle time, a different act replanning limit, extra action context, or smaller screenshots. These parameters are the per-call form of the same Agent options documented in API reference (Common).

In the device CLIs, convert the API's camelCase option name to a kebab-case flag without a platform prefix, and pass it on each command that should use it. The generated CLI also accepts the original camelCase alias shown in --help, but this documentation uses kebab-case:

midscene-android tap --locate "the login button" --wait-after-action 800
midscene-web act --prompt "finish checkout" --replanning-cycle-limit 30
midscene-ios assert --prompt "the success message is visible" --screenshot-shrink-factor 2

Common flags:

  • --wait-after-action <ms> maps to waitAfterAction: wait time after each action execution. The default is 300.
  • --replanning-cycle-limit <n> maps to replanningCycleLimit: maximum number of aiAct replanning cycles.
  • --ai-act-context <text> maps to aiActContext: extra background knowledge for aiAct.
  • --screenshot-shrink-factor <n> maps to screenshotShrinkFactor: shrink screenshots before sending them to the AI model.
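The mapping from API option names to CLI flags is mechanical. This sketch converts camelCase to kebab-case and builds the extra arguments for a command. Both function names are hypothetical. The conversion assumes option names have no consecutive capital letters, which holds for the four options listed above.

```typescript
// camelCase API option -> kebab-case CLI flag, e.g. waitAfterAction -> --wait-after-action.
function toCliFlag(optionName: string): string {
  // Insert a hyphen before each uppercase letter and lowercase it.
  return "--" + optionName.replace(/[A-Z]/g, (c) => "-" + c.toLowerCase());
}

// Builds flag/value pairs in the order the options were given.
function agentOptionFlags(options: Record<string, string | number>): string[] {
  return Object.entries(options).flatMap(([name, value]) => [toCliFlag(name), String(value)]);
}

const extra = agentOptionFlags({ waitAfterAction: 800, replanningCycleLimit: 30 });
// extra → ["--wait-after-action", "800", "--replanning-cycle-limit", "30"]
console.log(extra);
```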

In MCP calls, keep the API camelCase name and put it under the platform namespace: android.waitAfterAction, harmony.waitAfterAction, ios.waitAfterAction, computer.waitAfterAction, or web.waitAfterAction. The same pattern applies to replanningCycleLimit, aiActContext, and screenshotShrinkFactor.
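For MCP calls, the option keeps its camelCase name and gains a platform prefix. This sketch assumes the namespaced names are passed as flat dotted keys, exactly as written above (for example "android.waitAfterAction"). That is an assumption. If your client shows a nested object in the tool's input schema, build { android: { waitAfterAction: 800 } } instead. The namespacedAgentArgs helper is hypothetical.

```typescript
type Platform = "android" | "harmony" | "ios" | "computer" | "web";

// Assumption: flat dotted keys such as "android.waitAfterAction" (see lead-in).
function namespacedAgentArgs(
  platform: Platform,
  options: Record<string, unknown>,
): Record<string, unknown> {
  const args: Record<string, unknown> = {};
  for (const [name, value] of Object.entries(options)) {
    args[`${platform}.${name}`] = value;
  }
  return args;
}

// Merge these into the other arguments of the tool call you are making.
const callArgs = namespacedAgentArgs("android", { waitAfterAction: 800, screenshotShrinkFactor: 2 });
// callArgs → { "android.waitAfterAction": 800, "android.screenshotShrinkFactor": 2 }
console.log(callArgs);
```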

These parameters are part of the Agent init args for that tool call. If a later call changes them, or omits them after setting them, Midscene rebuilds the Agent so the next call uses the new effective configuration. For Web tools, web.url opens or navigates to the URL each time it is supplied; omit it to keep using the current page.
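The rebuild rule works like a cache keyed by the effective init args. This sketch only illustrates that rule and is not Midscene's implementation. AgentCache and the factory passed to it are hypothetical stand-ins for real Agent construction.

```typescript
type InitArgs = Record<string, string | number | boolean>;

// Stable key for a set of init args, independent of property order.
function initArgsKey(args: InitArgs): string {
  return JSON.stringify(Object.keys(args).sort().map((k) => [k, args[k]]));
}

class AgentCache<A> {
  private current: { key: string; agent: A } | undefined = undefined;
  private readonly createAgent: (args: InitArgs) => A;
  builds = 0;

  constructor(createAgent: (args: InitArgs) => A) {
    this.createAgent = createAgent;
  }

  // Reuses the agent while the effective args are unchanged; otherwise rebuilds.
  // Omitting an arg that was set earlier also changes the key, so it rebuilds too.
  get(args: InitArgs): A {
    const key = initArgsKey(args);
    if (!this.current || this.current.key !== key) {
      this.current = { key, agent: this.createAgent(args) };
      this.builds += 1;
    }
    return this.current.agent;
  }
}

// Hypothetical factory standing in for creating a real Agent.
const cache = new AgentCache((args) => ({ createdWith: args }));
cache.get({ waitAfterAction: 800 }); // builds
cache.get({ waitAfterAction: 800 }); // reuses
cache.get({}); // omitted after being set: rebuilds
// cache.builds → 2
console.log(cache.builds);
```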

Implement your own MCP

If you want to integrate Midscene tools into your own MCP service, you can use the mcpKitForAgent function to get tool definitions and expose your own MCP service as needed.

The tools provided by mcpKitForAgent include screenshots and every Action in the Action Space.

Using mcpKitForAgent

The mcpKitForAgent function takes an Agent instance and returns an object containing a description and a list of tools:

import { mcpKitForAgent } from '@midscene/web/mcp-server';
import { Agent } from '@midscene/core/agent';

// Placeholder construction: in practice, create the Agent for the page or device you want to control
const agent = new Agent();
const { description, tools } = await mcpKitForAgent(agent);

// description - "Control the browser / device using natural language commands"
// tools - Tool[] - array of tool definitions

Platform support

Each platform provides its corresponding mcpKitForAgent function:

Web platform

import { mcpKitForAgent } from '@midscene/web/mcp-server';

iOS platform

import { mcpKitForAgent } from '@midscene/ios/mcp-server';

Android platform

import { mcpKitForAgent } from '@midscene/android/mcp-server';

Computer platform

import { mcpKitForAgent } from '@midscene/computer/mcp-server';
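A single custom server that supports several platforms can pick the module at runtime instead of importing every package up front. This is a sketch. The module specifiers are the four listed above, and loadMcpKitForAgent is a hypothetical helper. Only the packages for the platforms you actually load need to be installed.

```typescript
// Module specifiers from the list above, keyed by platform.
const MCP_KIT_MODULES = {
  web: "@midscene/web/mcp-server",
  ios: "@midscene/ios/mcp-server",
  android: "@midscene/android/mcp-server",
  computer: "@midscene/computer/mcp-server",
} as const;

type KitPlatform = keyof typeof MCP_KIT_MODULES;

// Loads mcpKitForAgent for the chosen platform with a dynamic import.
async function loadMcpKitForAgent(platform: KitPlatform) {
  const mod = await import(MCP_KIT_MODULES[platform]);
  return mod.mcpKitForAgent;
}
```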

Integrate into custom MCP service

You can integrate the obtained tools into your own MCP service:

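import { McpServer } from '@modelcontextprotocol/sdk/server/mcp.js';
import { StdioServerTransport } from '@modelcontextprotocol/sdk/server/stdio.js';
import { Agent } from '@midscene/core/agent';
import { mcpKitForAgent } from '@midscene/web/mcp-server';

// Placeholder construction: in practice, create the Agent for the page or device you want to control
const agent = new Agent();
const { description, tools } = await mcpKitForAgent(agent);
const server = new McpServer({
  name: 'my-custom-mcp',
  version: '1.0.0',
  description
});

// Register Midscene tools to your MCP service
for (const tool of tools) {
  server.tool(tool.name, tool.description, tool.schema, tool.handler);
}

// Serve over stdio so an MCP client can launch this script via "command" and "args"
await server.connect(new StdioServerTransport());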
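If your service should expose only some of Midscene's tools, you can filter the list before registering it. The sketch below uses a local KitTool interface that mirrors the fields used in the registration loop (name, description, schema, handler). That interface is an assumption for illustration, not Midscene's exported type, and selectTools is hypothetical.

```typescript
// Local stand-in for the tool shape used in the registration loop (assumed).
interface KitTool {
  name: string;
  description: string;
  schema: Record<string, unknown>;
  handler: (args: Record<string, unknown>) => Promise<unknown>;
}

// Keeps only tools whose names are on the allow-list, preserving their original order.
function selectTools(tools: KitTool[], allowed: readonly string[]): KitTool[] {
  const allow = new Set(allowed);
  return tools.filter((tool) => allow.has(tool.name));
}
```

In the registration loop, you would then iterate over selectTools(tools, ["take_screenshot", "assert"]) instead of tools.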