Agent Touch Layer

Mobile browser and native app automation via ATL (iOS Simulator). Navigate, click, screenshot, and automate web and native app tasks on iPhone/iPad simulators.

Downloads: 1.3k
Stars: 0
Version: 0.1.0
general
Security check: passed
⚙️ Script

Skill Description


name: atl-browser
description: Mobile browser and native app automation via ATL (iOS Simulator). Navigate, click, screenshot, and automate web and native app tasks on iPhone/iPad simulators.
metadata:
  openclaw:
    emoji: "📱"
    requires:
      bins: ["xcrun", "xcodebuild", "curl"]
    install:
      - id: "atl-clone"
        kind: "shell"
        command: "git clone https://github.com/JordanCoin/Atl ~/Atl"
        label: "Clone ATL repository"
      - id: "atl-setup"
        kind: "shell"
        command: "~/.openclaw/skills/atl-browser/scripts/setup.sh"
        label: "Build and install ATL to simulator"

ATL: Agent Touch Layer

The automation layer between AI agents and iOS

ATL provides HTTP-based automation for iOS Simulator, covering both the browser (mobile Safari) and native apps. Think Playwright, but for mobile.

🔀 Two Servers: Browser & Native

ATL uses two separate servers for browser and native app automation:

| Server | Port | Use Case | Key Commands |
|---|---|---|---|
| Browser | 9222 | Web automation in mobile Safari | goto, markElements, clickMark, evaluate |
| Native | 9223 | iOS app automation (Settings, Contacts, any app) | openApp, snapshot, tapRef, find |

┌────────────────────────────┬────────────────────────────────┐
│  BROWSER SERVER (9222)     │     NATIVE SERVER (9223)       │
│  (mobile Safari/WebView)   │     (iOS apps via XCTest)      │
│                            │                                │
│  markElements + clickMark  │     snapshot + tapRef          │
│  CSS selectors             │     accessibility tree         │
│  DOM evaluation            │     element references         │
│  tap, swipe, screenshot    │     tap, swipe, screenshot     │
└────────────────────────────┴────────────────────────────────┘

Why two ports? Native app automation requires XCTest APIs (XCUIApplication, XCUIElement) which are only available in UI Test bundles. The native server runs as a UI Test that exposes an HTTP API.
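
Since every command has to reach the right server, it can help to keep the port choice in one place. The sketch below is illustrative only: atl_port and atl_url are hypothetical names, not part of ATL or its helper script. Only the port numbers and the /command path are taken from this document.

```shell
# Map an ATL mode to its port (ports as listed in the table above).
atl_port() {
  case "$1" in
    browser) echo 9222 ;;
    native)  echo 9223 ;;
    *) echo "unknown mode: $1" >&2; return 1 ;;
  esac
}

# Build the /command URL for a mode (hypothetical helper).
atl_url() {
  port=$(atl_port "$1") || return 1
  echo "http://localhost:${port}/command"
}

# Usage sketch (needs a running server):
# curl -s -X POST "$(atl_url native)" -d '{"method":"appState"}'
```

Failing loudly on an unknown mode is a deliberate choice here, so a typo does not silently send a native command to the browser server.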

Starting the Servers

# Browser server (starts automatically with AtlBrowser app)
xcrun simctl launch booted com.atl.browser
curl http://localhost:9222/ping  # → {"status":"ok"}

# Native server (run as UI Test)
cd ~/Atl/core/AtlBrowser
xcodebuild test -workspace AtlBrowser.xcworkspace \
  -scheme AtlBrowser \
  -destination 'id=<SIMULATOR_UDID>' \
  -only-testing:AtlBrowserUITests/NativeServer/testNativeServer &
  
# Wait for it to start, then:
curl http://localhost:9223/ping  # → {"status":"ok","mode":"native"}
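
The native server can take a while to come up because it starts as part of an xcodebuild test run, so polling /ping is usually more reliable than a fixed sleep. The following is a minimal sketch of a generic retry loop. wait_until is a hypothetical name, and the attempt count and delay in the usage line are arbitrary choices rather than values recommended by ATL.

```shell
# wait_until ATTEMPTS DELAY CMD [ARGS...]
# Runs CMD until it exits 0, sleeping DELAY seconds between attempts.
# Returns 0 on the first success, 1 if every attempt fails.
wait_until() {
  attempts=$1
  delay=$2
  shift 2
  i=0
  while [ "$i" -lt "$attempts" ]; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    i=$((i + 1))
    # Skip the sleep after the final failed attempt.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
  done
  return 1
}

# Usage sketch (needs curl and a starting server):
# wait_until 30 1 curl -sf http://localhost:9223/ping || echo "native server did not start"
```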

Quick Port Reference

| Task | Port | Example |
|---|---|---|
| Browse websites | 9222 | curl localhost:9222/command -d '{"method":"goto",...}' |
| Open native app | 9223 | curl localhost:9223/command -d '{"method":"openApp",...}' |
| Screenshot (browser) | 9222 | curl localhost:9222/command -d '{"method":"screenshot"}' |
| Screenshot (native) | 9223 | curl localhost:9223/command -d '{"method":"screenshot"}' |

📱 Native App Automation (Port 9223)

Native automation uses port 9223 and automates any iOS app using the accessibility tree: no DOM, no JavaScript, just direct element interaction.

Opening & Closing Apps

# Open an app by bundle ID
curl -s -X POST http://localhost:9223/command \
  -d '{"method":"openApp","params":{"bundleId":"com.apple.Preferences"}}'
# → {"success":true,"result":{"bundleId":"com.apple.Preferences","mode":"native","state":"running"}}

# Check current app state
curl -s -X POST http://localhost:9223/command \
  -d '{"method":"appState"}'
# → {"success":true,"result":{"mode":"native","bundleId":"com.apple.Preferences","state":"running"}}

# Close current app
curl -s -X POST http://localhost:9223/command \
  -d '{"method":"closeApp"}'
# → {"success":true,"result":{"closed":true}}
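
Each response shown above carries a "success" flag, so a script can stop early when a command fails instead of continuing against the wrong app state. This is a rough sketch using plain grep. atl_ok is a hypothetical helper, and it assumes the compact JSON form shown in these examples (no whitespace around the colon). The failure body in the comment is invented for illustration, since this document does not show what an error response looks like.

```shell
# Succeeds (exit 0) if the response body contains "success":true.
atl_ok() {
  printf '%s\n' "$1" | grep -q '"success":true'
}

# Usage sketch (needs a running native server):
# RESP=$(curl -s -X POST http://localhost:9223/command -d '{"method":"closeApp"}')
# atl_ok "$RESP" || echo "closeApp failed: $RESP" >&2
#
# A hypothetical failure body such as {"success":false} would not match.
```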

Common Bundle IDs

| App | Bundle ID |
|---|---|
| Settings | com.apple.Preferences |
| Contacts | com.apple.MobileAddressBook |
| Calculator | com.apple.calculator |
| Calendar | com.apple.mobilecal |
| Photos | com.apple.mobileslideshow |
| Notes | com.apple.mobilenotes |
| Reminders | com.apple.reminders |
| Clock | com.apple.mobiletimer |
| Maps | com.apple.Maps |
| Safari | com.apple.mobilesafari |

The snapshot Command

snapshot returns the accessibility tree: all visible elements with their properties and tappable references.

curl -s -X POST http://localhost:9223/command \
  -d '{"method":"snapshot","params":{"interactiveOnly":true}}' | jq '.result'

Example output:

{
  "count": 12,
  "elements": [
    {
      "ref": "e0",
      "type": "cell",
      "label": "Wi-Fi",
      "value": "MyNetwork",
      "identifier": "",
      "x": 0,
      "y": 142,
      "width": 393,
      "height": 44,
      "isHittable": true,
      "isEnabled": true
    },
    {
      "ref": "e1",
      "type": "cell",
      "label": "Bluetooth",
      "value": "On",
      "identifier": "",
      "x": 0,
      "y": 186,
      "width": 393,
      "height": 44,
      "isHittable": true,
      "isEnabled": true
    },
    {
      "ref": "e2",
      "type": "button",
      "label": "Back",
      "value": null,
      "identifier": "Back",
      "x": 0,
      "y": 44,
      "width": 80,
      "height": 44,
      "isHittable": true,
      "isEnabled": true
    }
  ]
}

Parameters:

  • interactiveOnly (bool, default: false): only return hittable elements
  • maxDepth (int, optional): limit tree traversal depth
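
To show how the two parameters combine, here is a small sketch that builds the snapshot request body with maxDepth left out unless it is supplied. snapshot_payload is a hypothetical helper name. The field names are the ones listed above, and the depth value in the usage line is arbitrary.

```shell
# snapshot_payload INTERACTIVE_ONLY [MAX_DEPTH]
# Prints a JSON body for the snapshot command.
snapshot_payload() {
  params="\"interactiveOnly\":$1"
  if [ -n "${2:-}" ]; then
    params="$params,\"maxDepth\":$2"
  fi
  printf '{"method":"snapshot","params":{%s}}\n' "$params"
}

# Usage sketch (needs a running native server):
# curl -s -X POST http://localhost:9223/command -d "$(snapshot_payload true 3)"
```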

The tapRef Command

Tap an element by its reference from the last snapshot:

# Take snapshot first
curl -s -X POST http://localhost:9223/command \
  -d '{"method":"snapshot","params":{"interactiveOnly":true}}'

# Tap element e0 (Wi-Fi cell from example above)
curl -s -X POST http://localhost:9223/command \
  -d '{"method":"tapRef","params":{"ref":"e0"}}'
# → {"success":true}
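
If a ref has gone stale, the frame values from the snapshot can be turned into a coordinate tap instead. The sketch below computes an element's center. It assumes x and y are the top-left corner of the frame, which is an inference from the example output (the Wi-Fi cell starts at x=0 with width 393) and not something this document states. element_center is a hypothetical name.

```shell
# element_center X Y WIDTH HEIGHT
# Prints "CX CY", the center point, using integer division.
element_center() {
  echo "$(( $1 + $3 / 2 )) $(( $2 + $4 / 2 ))"
}

# Usage sketch for the Wi-Fi cell from the snapshot example (x=0, y=142, 393x44):
# set -- $(element_center 0 142 393 44)
# curl -s -X POST http://localhost:9223/command \
#   -d "{\"method\":\"tap\",\"params\":{\"x\":$1,\"y\":$2}}"
```

tapRef remains the simpler option whenever the snapshot is fresh; this is only a fallback.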

The find Command

Find and interact with elements by text, with no need to parse the snapshot manually:

# Find and tap "Wi-Fi"
curl -s -X POST http://localhost:9223/command \
  -d '{"method":"find","params":{"text":"Wi-Fi","action":"tap"}}'
# → {"success":true,"result":{"found":true,"ref":"e0"}}

# Check if an element exists
curl -s -X POST http://localhost:9223/command \
  -d '{"method":"find","params":{"text":"Bluetooth","action":"exists"}}'
# → {"success":true,"result":{"found":true,"ref":"e1"}}

# Find and fill a text field
curl -s -X POST http://localhost:9223/command \
  -d '{"method":"find","params":{"text":"First name","action":"fill","value":"John"}}'

# Get element info without interacting
curl -s -X POST http://localhost:9223/command \
  -d '{"method":"find","params":{"text":"Cancel","action":"get"}}'
# → {"success":true,"result":{"found":true,"ref":"e5","element":{...}}}

Parameters:

  • text (string): text to search for (matches label, value, or identifier)
  • action (string): one of tap, fill, exists, get
  • value (string, optional): text to fill (required for action:"fill")
  • by (string, optional): narrow the search to label, value, identifier, type, or any (default)
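
Search text often comes from a variable, and a label containing a double quote would break a hand-written JSON body. The following sketch escapes the text before building a find request. json_escape and find_payload are hypothetical helpers, only backslashes and double quotes are handled (not newlines or other control characters), and the by parameter is left out for brevity.

```shell
# Escape backslashes and double quotes for use inside a JSON string.
json_escape() {
  printf '%s\n' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# find_payload TEXT ACTION [VALUE]
# Prints a JSON body for the find command; VALUE is only added when given.
find_payload() {
  text=$(json_escape "$1")
  if [ -n "${3:-}" ]; then
    value=$(json_escape "$3")
    printf '{"method":"find","params":{"text":"%s","action":"%s","value":"%s"}}\n' \
      "$text" "$2" "$value"
  else
    printf '{"method":"find","params":{"text":"%s","action":"%s"}}\n' "$text" "$2"
  fi
}

# Usage sketch (needs a running native server):
# curl -s -X POST http://localhost:9223/command -d "$(find_payload "First name" fill "John")"
```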

🔄 Native App Workflow Example

Here's a complete flow: open Settings, navigate to Wi-Fi, take a screenshot:

# 1. Open Settings app
curl -s -X POST http://localhost:9223/command \
  -d '{"method":"openApp","params":{"bundleId":"com.apple.Preferences"}}'

# 2. Wait for app to launch
sleep 1

# 3. Take snapshot to see available elements
curl -s -X POST http://localhost:9223/command \
  -d '{"method":"snapshot","params":{"interactiveOnly":true}}' | jq '.result.elements[:5]'

# 4. Find and tap Wi-Fi
curl -s -X POST http://localhost:9223/command \
  -d '{"method":"find","params":{"text":"Wi-Fi","action":"tap"}}'

# 5. Wait for navigation
sleep 0.5

# 6. Take screenshot of Wi-Fi settings
curl -s -X POST http://localhost:9223/command \
  -d '{"method":"screenshot"}' | jq -r '.result.data' | base64 -d > /tmp/wifi-settings.png

# 7. Navigate back (swipe right from left edge)
curl -s -X POST http://localhost:9223/command \
  -d '{"method":"swipe","params":{"direction":"right"}}'

# 8. Close the app
curl -s -X POST http://localhost:9223/command \
  -d '{"method":"closeApp"}'

Helper Script Version

source ~/.openclaw/skills/atl-browser/scripts/atl-helper.sh

atl_openapp "com.apple.Preferences"
sleep 1
atl_find "Wi-Fi" tap
sleep 0.5
atl_screenshot /tmp/wifi-settings.png
atl_swipe right
atl_closeapp

💡 Core Insight: Vision-Free Automation

ATL's killer feature is spatial understanding without vision models:

┌─────────────────────────────────────────────────────────────┐
│  markElements + captureForVision = COMPLETE PAGE KNOWLEDGE  │
└─────────────────────────────────────────────────────────────┘

1. markElements → Numbers every interactive element [1] [2] [3]
2. captureForVision → PDF with text layer + element coordinates
3. tap x=234 y=567 → Pixel-perfect touch at exact position

Why this matters:

  • No vision API calls: zero token cost for "seeing" the page
  • Faster: no round-trip to GPT-4V/Claude Vision
  • Deterministic: same page = same coordinates, every time
  • Reliable: pixel-perfect coordinates vs. vision interpretation

The Vision-Free Workflow

# 1. Mark elements (adds numbered labels + stores coordinates)
curl -s -X POST http://localhost:9222/command \
  -d '{"id":"1","method":"markElements","params":{}}'

# 2. Capture PDF with text layer (machine-readable, has coordinates)
curl -s -X POST http://localhost:9222/command \
  -d '{"id":"2","method":"captureForVision","params":{"savePath":"/tmp","name":"page"}}' \
  | jq -r '.result.path'
# → /tmp/page.pdf (text-selectable, contains element positions)

# 3. Get specific element's position by mark label
curl -s -X POST http://localhost:9222/command \
  -d '{"id":"3","method":"getMarkInfo","params":{"label":5}}' | jq '.result'
# → {"label":5, "tag":"button", "text":"Add to Cart", "x":187, "y":432, "width":120, "height":44}

# 4. Tap at exact coordinates
curl -s -X POST http://localhost:9222/command \
  -d '{"id":"4","method":"tap","params":{"x":187,"y":432}}'

The marks tell you WHERE everything is. The PDF tells you WHAT everything says. Together = full page understanding.

🎯 The Escalation Ladder

When automation gets stuck, escalate through these levels:

┌─────────────────────────────────────────────────────────────┐
│  Level 1: COORDINATES (fast, cheap, no API calls)           │
│  markElements → getMarkInfo → tap x,y                       │
│                                                             │
│  ↓ If stuck after 2-3 tries...                              │
│                                                             │
│  Level 2: VISION FALLBACK (screenshot to understand state)  │
│  screenshot → analyze UI → identify blockers (modals, etc)  │
│                                                             │
│  ↓ If still stuck...                                        │
│                                                             │
│  Level 3: JS INJECTION (direct DOM manipulation)            │
│  evaluate → dispatchEvent → force interactions              │
└─────────────────────────────────────────────────────────────┘

When to Escalate

| Symptom | Likely Cause | Action |
|---|---|---|
| Tap succeeds but nothing changes | Modal/overlay opened | Screenshot → find new button |
| Cart count doesn't update | Site needs login or has bot detection | Try JS click with events |
| Element not found after scroll | Marks are page-relative, not viewport | Use getBoundingClientRect via evaluate |
| Same error 3+ times | UI state changed unexpectedly | Screenshot to see actual state |
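
The ladder can be expressed as plain control flow: retry the cheap level a few times, then fall through to the more expensive ones. The sketch below is one possible shape, not ATL's own logic. escalate is a hypothetical function, and each level is passed in as the name of a function you write yourself that returns 0 once the step has actually succeeded (for example, after verifying the cart count changed).

```shell
# escalate TRIES LEVEL1 LEVEL2 LEVEL3
# Runs LEVEL1 up to TRIES times, then LEVEL2 once, then LEVEL3 once.
# Prints which level succeeded; returns 1 if none did.
escalate() {
  tries=$1
  l1=$2
  l2=$3
  l3=$4
  n=0
  while [ "$n" -lt "$tries" ]; do
    if "$l1"; then echo "level1"; return 0; fi
    n=$((n + 1))
  done
  if "$l2"; then echo "level2"; return 0; fi
  if "$l3"; then echo "level3"; return 0; fi
  echo "failed"
  return 1
}

# Usage sketch with hypothetical step functions:
# tap_by_coords()    { ...markElements, getMarkInfo, tap, then verify...; }
# screenshot_retry() { ...screenshot, dismiss the blocker, retry, verify...; }
# js_click()         { ...evaluate with dispatchEvent, then verify...; }
# escalate 3 tap_by_coords screenshot_retry js_click
```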

Real-World Pattern: E-commerce Checkout

# 1. Search and find product
atl_goto "https://store.com/search?q=headphones"
atl_mark

# 2. First, dismiss any modals/banners (ALWAYS DO THIS)
# Look for: close, dismiss, continue, accept, no thanks, got it
CLOSE=$(atl_find "close")
[ -n "$CLOSE" ] && atl_click "$CLOSE"

# 3. Find and click Add to Cart (only click if a label was found)
ATC=$(atl_find "Add to cart")
[ -n "$ATC" ] && atl_click "$ATC"

# 4. Wait, then CHECK if it worked
sleep 2
atl_screenshot /tmp/after-click.png

# 5. If cart didn't update, LOOK at the screenshot
# Maybe a "Choose options" modal opened - find the NEW Add to Cart button
# This is the vision fallback - you need to SEE what happened

Key Insight: Modals Change Everything

When you click "Add to cart" on sites like Target, Amazon, etc., they often:

  1. Open a "Choose options" modal (size, color, quantity)
  2. Show an upsell (protection plans, accessories)
  3. Display a confirmation with "View cart" or "Continue shopping"

Your original tap WORKED; you just can't see the result without a screenshot.

🚀 Quick Start (30 seconds)

# 1. Setup (boots sim, installs ATL)
~/.openclaw/skills/atl-browser/scripts/setup.sh

# 2. Navigate somewhere
curl -s -X POST http://localhost:9222/command \
  -d '{"id":"1","method":"goto","params":{"url":"https://example.com"}}'

# 3. Mark elements (shows [1], [2], [3] labels)
curl -s -X POST http://localhost:9222/command \
  -d '{"id":"2","method":"markElements","params":{}}'

# 4. Take screenshot
curl -s -X POST http://localhost:9222/command \
  -d '{"id":"3","method":"screenshot","params":{}}' | jq -r '.result.data' | base64 -d > /tmp/page.png

# 5. Click element [1]
curl -s -X POST http://localhost:9222/command \
  -d '{"id":"4","method":"clickMark","params":{"label":1}}'

Or use the helper functions:

source ~/.openclaw/skills/atl-browser/scripts/atl-helper.sh
atl_goto "https://example.com"
atl_mark
atl_screenshot /tmp/page.png
atl_click 1

Quick Reference

Base URL: http://localhost:9222

Common Commands

# Check if ATL is running
curl -s http://localhost:9222/ping

# Navigate to URL
curl -s -X POST http://localhost:9222/command \
  -H "Content-Type: application/json" \
  -d '{"id":"1","method":"goto","params":{"url":"https://example.com"}}'

# Wait for page ready
curl -s -X POST http://localhost:9222/command \
  -H "Content-Type: application/json" \
  -d '{"id":"2","method":"waitForReady","params":{"timeout":10}}'

# Take screenshot (returns base64 PNG)
curl -s -X POST http://localhost:9222/command \
  -H "Content-Type: application/json" \
  -d '{"id":"3","method":"screenshot","params":{}}' | jq -r '.result.data' | base64 -d > screenshot.png

# Mark interactive elements (shows numbered labels)
curl -s -X POST http://localhost:9222/command \
  -H "Content-Type: application/json" \
  -d '{"id":"4","method":"markElements","params":{}}'

# Click by mark label
curl -s -X POST http://localhost:9222/command \
  -H "Content-Type: application/json" \
  -d '{"id":"5","method":"clickMark","params":{"label":3}}'

# Scroll page
curl -s -X POST http://localhost:9222/command \
  -H "Content-Type: application/json" \
  -d '{"id":"6","method":"evaluate","params":{"script":"window.scrollBy(0, 500)"}}'

# Type text
curl -s -X POST http://localhost:9222/command \
  -H "Content-Type: application/json" \
  -d '{"id":"7","method":"type","params":{"text":"Hello world"}}'

# Click by CSS selector
curl -s -X POST http://localhost:9222/command \
  -H "Content-Type: application/json" \
  -d '{"id":"8","method":"click","params":{"selector":"button.submit"}}'

Setup (First Time)

1. Start Simulator

# Boot iPhone 17 simulator (or another device)
xcrun simctl boot "iPhone 17"

# Open Simulator app
open -a Simulator

2. Build & Install AtlBrowser

cd ~/Atl/core/AtlBrowser

# Build for simulator (RECOMMENDED: target by UDID)
# Why: name-based destinations can cause Xcode to pick an older iOS runtime (15/16)
# and fail if AtlBrowser has an iOS 17+ deployment target.
#
# 1) Find a suitable simulator UDID (iOS 17+):
#   xcrun simctl list devices available
#
# 2) Build targeting that UDID:
xcodebuild -workspace AtlBrowser.xcworkspace \
  -scheme AtlBrowser \
  -destination 'id=<SIM_UDID>' \
  -derivedDataPath /tmp/atl-dd \
  build

# Install to a specific simulator (preferred)
xcrun simctl install <SIM_UDID> \
  /tmp/atl-dd/Build/Products/Debug-iphonesimulator/AtlBrowser.app

# Launch the app
xcrun simctl launch <SIM_UDID> com.atl.browser

3. Verify Server

curl -s http://localhost:9222/ping
# Should return: {"status":"ok"}

All Available Methods

App Control (Native Mode)

| Method | Params | Mode | Description |
|---|---|---|---|
| openApp | {bundleId} | Any→Native | Open app, switch to native mode |
| closeApp | - | Native | Close current app, return to browser mode |
| appState | - | Any | Get current mode and bundleId |
| openBrowser | - | Native→Browser | Switch back to browser mode |

Native Accessibility

| Method | Params | Mode | Description |
|---|---|---|---|
| snapshot | {interactiveOnly?, maxDepth?} | Native | Get accessibility tree |
| tapRef | {ref} | Native | Tap element by ref (e.g., "e0") |
| find | {text, action, value?, by?} | Native | Find element and interact |
| fillRef | {ref, text} | Native | Tap element and type text |
| focusRef | {ref} | Native | Focus element without typing |

Navigation (Browser)

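| Method | Params | Mode | Description |
|---|---|---|---|
| goto | {url} | Browser | Navigate to URL |
| reload | - | Browser | Reload page |
| goBack | - | Browser | Go back |
| goForward | - | Browser | Go forward |
| getURL | - | Browser | Get current URL |
| getTitle | - | Browser | Get page title |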

Interactions (Browser)

| Method | Params | Mode | Description |
|---|---|---|---|
| click | {selector} | Browser | Click element |
| doubleClick | {selector} | Browser | Double-click |
| type | {text} | Both | Type text |
| fill | {selector, value} | Browser | Fill input field |
| press | {key} | Both | Press key |
| hover | {selector} | Browser | Hover over element |
| scrollIntoView | {selector} | Browser | Scroll to element |

Mark System (Browser)

| Method | Params | Mode | Description |
|---|---|---|---|
| markElements | - | Browser | Mark visible interactive elements |
| markAll | - | Browser | Mark ALL interactive elements |
| unmarkElements | - | Browser | Remove marks |
| clickMark | {label} | Browser | Click by label number |
| getMarkInfo | {label} | Browser | Get element info by label |

Screenshots & Capture

| Method | Params | Mode | Description |
|---|---|---|---|
| screenshot | {fullPage?, selector?} | Both | Take screenshot |
| captureForVision | {savePath?, name?} | Browser | Full page PDF |
| captureJPEG | {quality?, fullPage?} | Both | JPEG capture |
| captureLight | - | Browser | Text + interactives only |

Waiting (Browser)

| Method | Params | Mode | Description |
|---|---|---|---|
| waitForSelector | {selector, timeout?} | Browser | Wait for element |
| waitForNavigation | - | Browser | Wait for navigation |
| waitForReady | {timeout?, stabilityMs?} | Browser | Wait for page ready |
| waitForAny | {selectors, timeout?} | Browser | Wait for any selector |

JavaScript (Browser)

| Method | Params | Mode | Description |
|---|---|---|---|
| evaluate | {script} | Browser | Run JavaScript |
| querySelector | {selector} | Browser | Find element |
| querySelectorAll | {selector} | Browser | Find all elements |
| getDOMSnapshot | - | Browser | Get page HTML |

Cookies (Browser)

| Method | Params | Mode | Description |
|---|---|---|---|
| getCookies | - | Browser | Get all cookies |
| setCookies | {cookies} | Browser | Set cookies |
| deleteCookies | - | Browser | Delete all cookies |

Touch Gestures (Both Modes)

| Method | Params | Mode | Description |
|---|---|---|---|
| tap | {x, y} | Both | Tap at coordinates |
| longPress | {x, y, duration?} | Both | Long press (default 0.5s) |
| swipe | {direction} | Both | Swipe up/down/left/right |
| swipe | {fromX, fromY, toX, toY} | Both | Swipe between points |
| pinch | {scale, duration?} | Both | Pinch zoom (scale > 1 = zoom in) |

Swipe Examples

# Swipe up (scroll down)
curl -s -X POST http://localhost:9222/command \
  -d '{"id":"1","method":"swipe","params":{"direction":"up"}}'

# Swipe left (next page in carousel)
curl -s -X POST http://localhost:9222/command \
  -d '{"id":"2","method":"swipe","params":{"direction":"left","distance":400}}'

# Custom swipe path
curl -s -X POST http://localhost:9222/command \
  -d '{"id":"3","method":"swipe","params":{"fromX":200,"fromY":600,"toX":200,"toY":200}}'

# Long press for context menu
curl -s -X POST http://localhost:9222/command \
  -d '{"id":"4","method":"longPress","params":{"x":150,"y":300,"duration":1.0}}'

# Pinch to zoom in
curl -s -X POST http://localhost:9222/command \
  -d '{"id":"5","method":"pinch","params":{"scale":2.0}}'
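
When a directional swipe needs a precise path, the from/to form can be derived from a direction and a distance. The sketch below centers the swipe on the screen. swipe_coords is a hypothetical helper, the screen size is passed in by the caller (the 400x800 in the usage line is a made-up value, not a real device size), and how ATL itself interprets direction and distance internally is not documented here.

```shell
# swipe_coords DIRECTION DISTANCE SCREEN_W SCREEN_H
# Prints "fromX fromY toX toY" for a swipe centered on the screen.
# "up" means the finger moves toward the top, which scrolls content down.
swipe_coords() {
  cx=$(( $3 / 2 ))
  cy=$(( $4 / 2 ))
  half=$(( $2 / 2 ))
  case "$1" in
    up)    echo "$cx $((cy + half)) $cx $((cy - half))" ;;
    down)  echo "$cx $((cy - half)) $cx $((cy + half))" ;;
    left)  echo "$((cx + half)) $cy $((cx - half)) $cy" ;;
    right) echo "$((cx - half)) $cy $((cx + half)) $cy" ;;
    *) echo "unknown direction: $1" >&2; return 1 ;;
  esac
}

# Usage sketch (needs a running server):
# set -- $(swipe_coords up 400 400 800)
# curl -s -X POST http://localhost:9222/command \
#   -d "{\"method\":\"swipe\",\"params\":{\"fromX\":$1,\"fromY\":$2,\"toX\":$3,\"toY\":$4}}"
```

With those inputs the helper yields the same path as the custom swipe example above (200,600 to 200,200).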

Typical Workflow

# 1. Navigate to site
curl -s -X POST http://localhost:9222/command \
  -H "Content-Type: application/json" \
  -d '{"id":"1","method":"goto","params":{"url":"https://www.apple.com/shop"}}'

# 2. Wait for page to load
sleep 2
curl -s -X POST http://localhost:9222/command \
  -H "Content-Type: application/json" \
  -d '{"id":"2","method":"waitForReady","params":{"timeout":10}}'

# 3. Mark elements to see what's clickable
curl -s -X POST http://localhost:9222/command \
  -H "Content-Type: application/json" \
  -d '{"id":"3","method":"markElements","params":{}}'

# 4. Take screenshot to see the marks
curl -s -X POST http://localhost:9222/command \
  -H "Content-Type: application/json" \
  -d '{"id":"4","method":"screenshot","params":{}}' | jq -r '.result.data' | base64 -d > /tmp/page.png

# 5. Click a marked element (e.g., label 14)
curl -s -X POST http://localhost:9222/command \
  -H "Content-Type: application/json" \
  -d '{"id":"5","method":"clickMark","params":{"label":14}}'

# 6. Repeat as needed

Troubleshooting

Navigation not working (goto returns success but page doesn't change)

Known issue: goto command may return success without navigating. Use JS workaround:

# Instead of goto, use evaluate to navigate
curl -s -X POST http://localhost:9222/command -H "Content-Type: application/json" \
  -d '{"id":"1","method":"evaluate","params":{"script":"location.href = \"https://example.com\"; true"}}'

# Wait for page load
sleep 3
curl -s -X POST http://localhost:9222/command -H "Content-Type: application/json" \
  -d '{"id":"2","method":"waitForReady","params":{"timeout":10}}'

Server not responding

# Check that the app is installed (listapps shows installed apps, not running ones)
xcrun simctl listapps booted | grep atl

# Restart the app
xcrun simctl terminate booted com.atl.browser
xcrun simctl launch booted com.atl.browser

# Check logs
xcrun simctl spawn booted log show --predicate 'process == "AtlBrowser"' --last 1m

Need to rebuild (iOS version changes)

cd ~/Atl/core/AtlBrowser
xcodebuild -workspace AtlBrowser.xcworkspace -scheme AtlBrowser -sdk iphonesimulator build
xcrun simctl install booted ~/Library/Developer/Xcode/DerivedData/AtlBrowser-*/Build/Products/Debug-iphonesimulator/AtlBrowser.app
xcrun simctl launch booted com.atl.browser

Port 9222 in use

The ATL server runs inside the simulator app. If port 9222 is blocked, check for other processes:

lsof -i :9222

Best Practices

1. Clean UI Before Acting

Real users dismiss popups. You should too.

# Before any workflow, check for and dismiss:
# - Cookie consent banners
# - Newsletter popups  
# - Health/privacy consent modals
# - "Download our app" prompts
atl_mark
for KEYWORD in "close" "dismiss" "no thanks" "accept" "got it" "continue"; do
  LABEL=$(atl_find "$KEYWORD")
  [ -n "$LABEL" ] && atl_click "$LABEL" && sleep 1
done

2. Verify State After Actions

Don't assume. Confirm.

atl_click $ADD_TO_CART
sleep 2
# Check if cart updated
CART=$(atl_find "cart [1-9]")
if [ -z "$CART" ]; then
  # Didn't work - take screenshot to see why
  atl_screenshot /tmp/debug.png
  echo "Action may have opened a modal - check screenshot"
fi

3. Use Viewport Coordinates for Taps

Marks give page-relative coordinates. For tap to work, the element must be visible.

# Option A: Scroll element into view first
curl -s -X POST http://localhost:9222/command -H "Content-Type: application/json" \
  -d '{"id":"1","method":"evaluate","params":{"script":"document.querySelector(\"#my-button\").scrollIntoView()"}}'

# Option B: Get viewport-relative coords via JS
curl -s -X POST http://localhost:9222/command -H "Content-Type: application/json" \
  -d '{"id":"2","method":"evaluate","params":{"script":"var r = document.querySelector(\"#my-button\").getBoundingClientRect(); JSON.stringify({x: r.x + r.width/2, y: r.y + r.height/2})"}}'
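
Option B works because getBoundingClientRect already returns viewport-relative values. Going the other way, a page-relative coordinate can be converted by subtracting the scroll offset, and checked against the viewport height before tapping. This is a sketch under the assumption that you have read window.scrollY and window.innerHeight through evaluate and rounded them to integers. to_viewport_y and in_viewport are hypothetical names.

```shell
# to_viewport_y PAGE_Y SCROLL_Y
# Converts a page-relative Y coordinate to a viewport-relative one.
to_viewport_y() {
  echo $(( $1 - $2 ))
}

# in_viewport PAGE_Y SCROLL_Y VIEWPORT_H
# Succeeds if the point currently falls inside the visible viewport.
in_viewport() {
  vy=$(( $1 - $2 ))
  [ "$vy" -ge 0 ] && [ "$vy" -lt "$3" ]
}

# Usage sketch with made-up numbers: an element at page Y 1432,
# page scrolled by 1000, viewport 800 tall.
# if in_viewport 1432 1000 800; then
#   echo "tap at y=$(to_viewport_y 1432 1000)"
# else
#   echo "scroll the element into view first"
# fi
```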

4. Screenshot is Your Debugging Superpower

When in doubt, look.

atl_screenshot /tmp/current-state.png
# Then analyze with vision or just open the file

Notes

  • ATL runs inside the iOS Simulator, sharing the host's network
  • Port 9222 is the default (matches Chrome DevTools Protocol convention)
  • The mark system shows red numbered labels on interactive elements
  • Screenshots are PNG base64-encoded; use base64 -d to decode
  • iOS 26+ compatible (fixed NWListener binding issue)
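
For the decoding step mentioned in the notes, here is a small sketch that extracts the base64 payload without jq and decodes it with the long --decode option, which both GNU coreutils and macOS base64 accept. extract_data and decode_b64 are hypothetical helpers. The sed pattern assumes compact JSON with the payload under a "data" key, as the jq examples in this document imply.

```shell
# Read a JSON response on stdin and print the value of its "data" field.
extract_data() {
  sed -n 's/.*"data":"\([^"]*\)".*/\1/p'
}

# Decode base64 from stdin to stdout.
decode_b64() {
  base64 --decode
}

# Usage sketch (needs a running server):
# curl -s -X POST http://localhost:9222/command -d '{"method":"screenshot"}' \
#   | extract_data | decode_b64 > /tmp/page.png
```

jq remains the more robust choice when it is available, since a regular expression will not cope with pretty-printed or reordered JSON.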

Requirements

  • macOS with Xcode installed
  • iOS Simulator (comes with Xcode)
  • That's it!

Examples

See examples/ folder:

  • test-browse.sh - Quick bash test workflow

API Reference

For a machine-readable API spec, see openapi.yaml, which includes all commands, parameters, and response schemas.
