SuperBased's MCP server gives AI agents 72 tools to drive your screen directly. Click, type, scroll, drag, fill forms, handle dialogs, switch tabs and virtual desktops, click system tray icons, find images on screen, all under one approval. Humanization targets the input-trajectory and timing classifiers that CAPTCHAs use. Cross-platform: Windows and macOS.
Grouped by purpose. Each tool is callable over MCP stdio (Claude Code / Cursor / Windsurf / Cline / OpenCode / Zed / Copilot CLI) or HTTP (Codex). All tools normalize across Windows + macOS where the underlying OS supports them.
screenshot, capture_image, capture, gallery_image, window_list, display_list, find_image, capture_template
ai, ocr, compress_text, describe_frames, narrate
recording, sessions, export, diff, baseline
dictate, transcribe, dictation_history, stt_status
click, type, hotkey, scroll, drag, hover, pixel_color, mouse_position, wait, wait_for, locate, ui_dump, accessibility_tree
sequence, scroll_to, scroll_capture, find_title_bar_drag_region
ax_invoke, form_fill, dialog_handle, context_menu_select, drag_file
window_state, window_bounds, resize_window, focus_window, launch_app, open_url, tab_management, find_in_page
tray_click, virtual_desktop, doctor_gui_automation
dry_run, replay, undo_last
project, workspace_sync, tools
settings, presets, gallery, gallery_update, health, auth, license, ai_usage, redact, annotate, clipboard
Most agents lose half their tokens to single-step round-trips: click → approve → screenshot → approve → click → approve. superbased_sequence bundles N steps into one MCP call with a single approval, a single activation window, and screenshots returned inline. The whole flow completes in seconds with stable focus.
// Agent fills a login form, waits for the dashboard, captures the result.
// One approval. One activation. Stable focus. Inline screenshot at the end.
await superbased_sequence({
  steps: [
    { action: "click", label: "Email" }, // resolves via OCR + AX tree
    { action: "type", text: "agent@example.com", humanize: "human" },
    { action: "hotkey", combo: "Tab" },
    { action: "type", text: "********", humanize: "human" },
    { action: "click", label: "Sign In", modifiers: [] },
    { action: "wait_for", condition: { type: "window", title: "Dashboard" }, timeoutMs: 5000 },
    { action: "screenshot" } // MCP image content block, inline
  ],
  processName: "chrome",
  confirm: true,
  stopOnError: true
})
› ok: 7 steps, 3.4s total
› matchedWindow: "chrome.exe / Sign in - Acme - Google Chrome"
› screenshot: 1920×1080 PNG returned as MCP image content block
› audit-log: 7 NDJSON entries written to ~/.superbased/audit.log

// Replay the same trajectory byte-for-byte
await superbased_replay({ sessionId: "abc123", dryRun: false })
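The step shapes in the example above can be written down as types, which lets an agent author catch malformed sequences before sending them. This is a minimal sketch: the type names, buildSequence, and approvalsSaved are hypothetical helpers inferred from the example call, not SuperBased's published schema.

```typescript
// Hypothetical types inferred from the example call; not a published schema.
type Step =
  | { action: "click"; label: string; modifiers?: string[] }
  | { action: "type"; text: string; humanize?: string }
  | { action: "hotkey"; combo: string }
  | { action: "wait_for"; condition: { type: "window"; title: string }; timeoutMs: number }
  | { action: "screenshot" };

interface SequenceRequest {
  steps: Step[];
  processName: string;
  confirm: boolean;
  stopOnError: boolean;
}

// Builds the argument object for one bundled call; rejects empty sequences early.
function buildSequence(processName: string, steps: Step[]): SequenceRequest {
  if (steps.length === 0) {
    throw new Error("a sequence needs at least one step");
  }
  return { steps, processName, confirm: true, stopOnError: true };
}

// Approvals avoided compared with sending each step as its own MCP call.
function approvalsSaved(request: SequenceRequest): number {
  return request.steps.length - 1;
}

const loginFlow = buildSequence("chrome", [
  { action: "click", label: "Email" },
  { action: "type", text: "agent@example.com", humanize: "human" },
  { action: "hotkey", combo: "Tab" },
  { action: "type", text: "********", humanize: "human" },
  { action: "click", label: "Sign In", modifiers: [] },
  { action: "wait_for", condition: { type: "window", title: "Dashboard" }, timeoutMs: 5000 },
  { action: "screenshot" },
]);

console.log(approvalsSaved(loginFlow)); // 6
```

The resulting object would be passed as the argument to superbased_sequence. Only the step kinds that appear in the example are modeled here.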
Every write tool accepts a per-call humanize override. It replaces the cheap atomic-input path with an empirically calibrated humanization layer: Bezier-curved cursor approaches with sine-shaped velocity envelopes, gamma-distributed inter-key timing, Gaussian click-target jitter, and click and key hold variation. Active by default at 'light'.
Linear, machine-fast. Pre-v2.0 behavior. Use for non-adversarial automation where speed matters.
Modest curvature. Gaussian click jitter (1.5px). Gamma keystrokes. Click hold variation. Sine velocity envelope on cursor walks.
Realistic curves with overshoot. 3.0px click jitter. Pre-click tremor. Rare 2–4× micro-pauses. 45–95ms key holds.
Max curves + 40% overshoot. 4.5px jitter. Pre-click tremor (4 micro-moves). Typo + correct sequences. Inter-action catch-up pause.
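To make the cursor-side techniques concrete, here is a rough sketch of a Bezier-curved approach with a sine velocity envelope and Gaussian click jitter. It is an illustration under stated assumptions, not SuperBased's calibrated implementation: every function name, the curvature parameter, and the seeded generator are hypothetical, and overshoot, tremor, and key timing are left out.

```typescript
interface Point { x: number; y: number }

// Small seeded generator (Park-Miller) so the sketch is reproducible.
// Assumes a non-negative integer seed; returns values in (0, 1).
function seededRandom(seed: number): () => number {
  let state = (seed % 2147483646) + 1;
  return () => {
    state = (state * 48271) % 2147483647;
    return state / 2147483647;
  };
}

// Box-Muller transform: one zero-mean Gaussian sample with the given sigma.
function gaussian(rand: () => number, sigma: number): number {
  const u1 = rand();
  const u2 = rand();
  return sigma * Math.sqrt(-2 * Math.log(u1)) * Math.cos(2 * Math.PI * u2);
}

// Point on a cubic Bezier curve at parameter t in [0, 1].
function bezier(p0: Point, p1: Point, p2: Point, p3: Point, t: number): Point {
  const u = 1 - t;
  const a = u * u * u, b = 3 * u * u * t, c = 3 * u * t * t, d = t * t * t;
  return {
    x: a * p0.x + b * p1.x + c * p2.x + d * p3.x,
    y: a * p0.y + b * p1.y + c * p2.y + d * p3.y,
  };
}

// Samples a curved path from start to end. Speed follows sin(pi * s), so the
// cursor accelerates out of the start and decelerates into the target.
function cursorPath(start: Point, end: Point, steps: number,
                    curvature: number, rand: () => number): Point[] {
  const dx = end.x - start.x, dy = end.y - start.y;
  const bend = (rand() - 0.5) * 2 * curvature; // random side and size of the bow
  const c1 = { x: start.x + dx / 3 - dy * bend, y: start.y + dy / 3 + dx * bend };
  const c2 = { x: start.x + (2 * dx) / 3 - dy * bend, y: start.y + (2 * dy) / 3 + dx * bend };
  const path: Point[] = [];
  for (let i = 0; i <= steps; i++) {
    const s = i / steps;
    const t = (1 - Math.cos(Math.PI * s)) / 2; // integral of the sine envelope
    path.push(bezier(start, c1, c2, end, t));
  }
  return path;
}

const rand = seededRandom(42);
const button: Point = { x: 800, y: 450 };
// 1.5px jitter matches the smallest jitter figure quoted above.
const clickPoint: Point = {
  x: button.x + gaussian(rand, 1.5),
  y: button.y + gaussian(rand, 1.5),
};
const path = cursorPath({ x: 100, y: 100 }, clickPoint, 40, 0.15, rand);
console.log(path.length, path[path.length - 1]);
```

Mapping progress through (1 - cos(pi * s)) / 2 is what produces the sine-shaped speed profile: equal time steps give small moves near both ends and the largest moves mid-path.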
Three classes of detection are out of scope at the OS-input layer. OS-level synthetic-event flags (LLMHF_INJECTED on Win32, kCGEventSourceUserData on macOS) require a signed driver to bypass. Browser fingerprint (canvas, WebGL, JA3, CDP markers) requires a stealth-patched browser. Multi-signal session score (Cloudflare Turnstile, reCAPTCHA V3 Enterprise, Datadome) needs a residential IP + stealth browser to score low-risk. Pair humanization with the right stack for the threat model.
The superbased npm package, the same one that provides the desktop app's headless mode, exposes the full 72-tool MCP surface to any client over stdio or HTTP.
# 1. Install the npm package globally
$ npm install -g superbased

# 2. Sign in (optional; most tools work without an account)
$ superbased auth login

# 3. Wire MCP into your AI editor of choice. Claude Code:
$ claude mcp add superbased -- superbased mcp

# Or paste this into ~/.claude/settings.json (and equivalents for Cursor / Windsurf / Codex):
{
  "mcpServers": {
    "superbased": { "command": "superbased", "args": ["mcp"] }
  }
}

# 4. Opt into GUI automation (off by default; explicit user consent required)
$ superbased config set guiAutomation.enabled true

# 5. Verify the install with the hermetic doctor probe
$ superbased doctor
✓ health: ok
✓ auth: signed in as agent@example.com
✓ guiAutomation: enabled (master toggle ON)
✓ kill switch: Ctrl+Shift+Esc registered
✓ audit log: writing to ~/.superbased/audit.log
✓ 72 MCP tools registered, 13 resources available
GUI automation is the most privileged surface SuperBased exposes. The defaults err toward refusal, every action requires explicit consent at the right level, and there's a kill switch you can hit at any time.
guiAutomation.enabled defaults to false. The user explicitly opts in via superbased config set guiAutomation.enabled true or the GUI. Refused actions return {ok:false, error, hint} — never throw.
Per-action toggles fall into three groups: actions.safe (read-like: wait, mouse_position, ui_dump), actions.write (click, type, hotkey, scroll, drag, hover), and actions.destructive (drag_file, launch_app, window_state with action='close'). Each group is independently toggleable.
Every write call requires confirm: true unless the user explicitly disables the prompt via guiAutomation.requireConfirmFlag = false. Calls without confirm get refused with a hint.
Calls that target an app on the protected-apps blocklist (password managers, banking) are refused regardless of toggles. SuperBased self-targeting returns SELF_TARGET_REFUSED: agents cannot drive the SuperBased UI itself.
The kill-switch shortcut is configurable. Press it at any time to immediately abort all in-flight automation. Agents cannot suppress it. The shortcut is registered globally as long as guiAutomation.enabled is true.
Every action — accepted or refused — is appended to ~/.superbased/audit.log as one NDJSON line with sessionId, tool, args, result, timing, humanization params. Replay any range via superbased_replay.
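The layered checks described above can be pictured as a single gate function that returns a refusal object instead of throwing. The sketch below is illustrative only. GUI_AUTOMATION_DISABLED and SELF_TARGET_REFUSED are the codes quoted on this page; the other error codes, the config shape, the check order, and the choice to require confirm for destructive calls as well as write calls are assumptions.

```typescript
type Group = "safe" | "write" | "destructive";

// Hypothetical config shape, loosely mirroring the guiAutomation.* settings.
interface GuiConfig {
  enabled: boolean;
  requireConfirmFlag: boolean;
  actions: Record<Group, boolean>;
  protectedApps: string[];
}

interface ToolCall {
  tool: string;
  group: Group;
  processName: string;
  confirm?: boolean;
}

type GateResult = { ok: true } | { ok: false; error: string; hint: string };

function refuse(error: string, hint: string): GateResult {
  return { ok: false, error, hint };
}

// Checks run from the broadest gate to the narrowest. Never throws.
function gate(cfg: GuiConfig, call: ToolCall): GateResult {
  if (!cfg.enabled) {
    return refuse("GUI_AUTOMATION_DISABLED",
      "Run: superbased config set guiAutomation.enabled true");
  }
  if (call.processName === "superbased") {
    return refuse("SELF_TARGET_REFUSED", "Agents cannot drive the SuperBased UI");
  }
  if (cfg.protectedApps.indexOf(call.processName) !== -1) {
    // Assumed code name; the blocklist wins regardless of toggles.
    return refuse("PROTECTED_APP_REFUSED", "Target is on the protected-apps blocklist");
  }
  if (!cfg.actions[call.group]) {
    // Assumed code name.
    return refuse("ACTION_GROUP_DISABLED", "Enable actions." + call.group + " to allow this tool");
  }
  if (call.group !== "safe" && cfg.requireConfirmFlag && call.confirm !== true) {
    // Assumed code name.
    return refuse("CONFIRM_REQUIRED", "Pass confirm: true with this call");
  }
  return { ok: true };
}

const config: GuiConfig = {
  enabled: true,
  requireConfirmFlag: true,
  actions: { safe: true, write: true, destructive: false },
  protectedApps: ["1password", "keepass"],
};

console.log(gate(config, { tool: "click", group: "write", processName: "chrome" }));
console.log(gate(config, { tool: "click", group: "write", processName: "chrome", confirm: true }));
```

Returning a tagged result rather than throwing keeps every refusal machine-readable, so an agent can relay the hint to the user instead of retrying blindly.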
Each adapter normalizes through the same MCP surface. Code your agent once; it runs identically on Windows and macOS.
nut-js + native fallbacks (PowerShell + UIAutomationClient.dll for AX patterns, EnumWindows for popup detection, Shell_TrayWnd cross-process reads for tray, MOUSEEVENTF_HWHEEL for horizontal scroll).
nut-js CGEventPost + osascript System Events AX + Electron desktopCapturer + /usr/bin/open + @superbased/macos-ax binding. Form-fill, dialog-handle, virtual-desktop, find-in-page, tab-management all working via S9 + Track C.
Capture, OCR, gallery, recording, and dictation work today. GUI automation deferred — no platform adapter yet. PRs welcome via the open MCP server repo.
Computer Use and Operator are vendor-locked: they only work with Anthropic's or OpenAI's models, and their automation runs in the vendor's environment (a sandboxed VM you don't control). SuperBased runs locally on your real desktop, exposes the same surface to ANY MCP client (Claude Code, Cursor, Codex, Cline, OpenCode, Windsurf, Zed, Copilot CLI, or your own), uses the OS accessibility tree for reliable element resolution, and ships humanization that lets agents pass real-world CAPTCHAs. Different threat model, different deployment model.
Element resolution follows a three-tier reliability pyramid. The top tier is automationId / AXIdentifier (UIA AutomationId on Windows, AXIdentifier on macOS), which the app developer sets and which never moves. The middle tier is role + name via the accessibility tree. The bottom tier is an OCR-resolved label. Each tool falls through the tiers automatically. superbased_ax_invoke bypasses synthesized clicks entirely and invokes UIA patterns (Invoke, Toggle, SelectionItem.Select, Value.SetValue), the most reliable rung when the target supports it.
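The fall-through order can be sketched as a resolver that tries each tier in turn and reports which one matched. The data shapes and the resolve function below are hypothetical stand-ins for an accessibility-tree snapshot and OCR output. They illustrate the ordering only and are not SuperBased's internals.

```typescript
// Hypothetical snapshot shapes for the accessibility tree and OCR results.
interface UiElement { automationId?: string; role: string; name: string }
interface OcrHit { text: string; x: number; y: number }

interface Query { automationId?: string; role?: string; label: string }

type Resolution =
  | { tier: "automationId"; element: UiElement }
  | { tier: "roleName"; element: UiElement }
  | { tier: "ocr"; hit: OcrHit }
  | { tier: "none" };

function resolve(query: Query, tree: UiElement[], ocr: OcrHit[]): Resolution {
  // Tier 1: developer-assigned identifier, the most stable handle.
  if (query.automationId !== undefined) {
    for (const element of tree) {
      if (element.automationId === query.automationId) {
        return { tier: "automationId", element };
      }
    }
  }
  // Tier 2: role + name from the accessibility tree.
  for (const element of tree) {
    const roleMatches = query.role === undefined || element.role === query.role;
    if (roleMatches && element.name === query.label) {
      return { tier: "roleName", element };
    }
  }
  // Tier 3: OCR text on screen, matched case-insensitively.
  const wanted = query.label.toLowerCase();
  for (const hit of ocr) {
    if (hit.text.trim().toLowerCase() === wanted) {
      return { tier: "ocr", hit };
    }
  }
  return { tier: "none" };
}

const tree: UiElement[] = [
  { automationId: "loginBtn", role: "button", name: "Sign In" },
  { role: "edit", name: "Email" },
];
const ocr: OcrHit[] = [{ text: "Forgot password?", x: 640, y: 512 }];

console.log(resolve({ automationId: "loginBtn", label: "Sign In" }, tree, ocr).tier);
console.log(resolve({ role: "edit", label: "Email" }, tree, ocr).tier);
console.log(resolve({ label: "forgot password?" }, tree, ocr).tier);
```

Reporting the matched tier alongside the element gives the caller a rough confidence signal: an OCR match is more likely to drift between runs than an identifier match.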
Humanization defeats the input-trajectory and timing classifiers. reCAPTCHA V2 (the "I'm not a robot" checkbox) and similar puzzle-style CAPTCHAs (hCaptcha, GeeTest, KeyCAPTCHA, Turnstile) typically score the input layer alongside the browser fingerprint and session signals. Humanization fixes the input layer; you still need a stealth-patched browser and a residential IP for the others. The honest answer: 'paranoid' humanization + stealth browser + residential proxy gets you through most defenses; humanization alone on a default Chrome profile from a datacenter IP does not.
The agent's first GUI automation call returns {ok:false, error:"GUI_AUTOMATION_DISABLED", hint:"Run: superbased config set guiAutomation.enabled true"}. The user runs that command (or flips the toggle in Settings), and from then on the agent operates within the configured per-action toggles. Re-disable any time with the same command and false.
Ctrl+Shift+Esc is registered as a global hotkey via uiohook-napi. When pressed, it sets a global abort flag and emits a JS event that all in-flight humanization loops check between micro-steps. The longest the agent can ignore the abort is the duration of the current atomic OS call (typically <50ms). The audit log records the abort event with the action that was in progress.
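The abort behavior described above amounts to a cooperative cancellation loop: a shared flag is checked before each micro-step, so the only work that can still run after the hotkey fires is the step already in progress. The sketch below is a simplified, synchronous illustration. AbortFlag and runMicroSteps are hypothetical names, and the hotkey is simulated by tripping the flag from inside a step rather than through uiohook-napi.

```typescript
// Shared flag that a global hotkey handler would set.
class AbortFlag {
  private tripped = false;
  trip(): void { this.tripped = true; }
  isSet(): boolean { return this.tripped; }
}

interface RunResult { completed: number; aborted: boolean }

// Runs micro-steps in order, checking the flag before each one.
// A step that is already executing always finishes; nothing after it starts.
function runMicroSteps(steps: Array<() => void>, flag: AbortFlag): RunResult {
  let completed = 0;
  for (const step of steps) {
    if (flag.isSet()) {
      return { completed, aborted: true };
    }
    step();
    completed++;
  }
  return { completed, aborted: false };
}

const flag = new AbortFlag();
const moved: number[] = [];
const steps: Array<() => void> = [];
for (let i = 0; i < 5; i++) {
  steps.push(() => {
    moved.push(i);
    if (i === 2) flag.trip(); // simulate the kill switch firing during step 2
  });
}

const result = runMicroSteps(steps, flag);
console.log(result); // { completed: 3, aborted: true }
```

Steps 0 through 2 run, the flag trips during step 2, and steps 3 and 4 never start. That is why the worst-case abort latency is bounded by the duration of one atomic step.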
Most tools work offline. Capture, OCR (local Tesseract), GUI automation, recording, dictation (with local sherpa-onnx STT), and gallery all run without a network connection. AI vision tools (superbased_ai, superbased_describe_frames, superbased_narrate) require either a signed-in cloud account or an Ollama instance running locally (configure via ollama.routing.enabled = true). License validation has a 7-day offline grace period.
The agent-facing reference ships with the desktop app at desktop/SUPERBASED_SKILL.md. It covers every tool with parameters, return shapes, decision guides, error codes, and a full Common Workflows section (including CAPTCHA solving for reCAPTCHA V2/V3, hCaptcha, Cloudflare Turnstile, GeeTest, KeyCAPTCHA, MTCaptcha, click-sequence puzzles, rotation puzzles). It's also surfaced via the SuperBased MCP plugin for Claude Code, Cursor, Codex, and Copilot CLI.
SuperBased Agents drive your screen. SuperBased Observer watches what your AI tools are spending while they do it — per-model cost breakdowns, long-context-tier-aware pricing, waste detection, conversation compression. Free, open source, local-only.
Free to install. 72 MCP tools. Off by default. Cross-platform. Apache-2.0 plugins for every major AI editor.