Real mouse/keyboard automation via xdotool. Useful for bypassing bot detection that blocks CDP clicks.
Setup:
- Browser runs an X server on display :1, with socat forwarding to TCP 6001
- Agent and browser containers share the desk-net Docker network
- Browser hostname: desk-browser (IP fallback: 172.18.0.2)
Usage:
export DISPLAY=desk-browser:1
xdotool getmouselocation
xdotool mousemove 500 300 click 1
xdotool key Return
xdotool type "hello world"
If DNS fails: The container may be disconnected from desk-net. A timer reconnects it every 5 min, or ask Brektimus to run /data/openclaw/agents/desk/setup-x11.sh
IP fallback: DISPLAY=172.18.0.2:1
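The hostname-then-IP fallback can be scripted rather than applied by hand. The following is a minimal sketch, not part of the existing setup: pick_display is a hypothetical helper name, and it assumes getent is available in the agent container to test whether desk-browser resolves.

```shell
# Sketch: choose the X display string, preferring the hostname.
# pick_display is a hypothetical helper; getent availability is an assumption.
pick_display() {
  local host="${1:-desk-browser}"        # browser hostname on desk-net
  local fallback_ip="${2:-172.18.0.2}"   # documented IP fallback
  local display_num="${3:-1}"            # X display number

  # getent exits non-zero when the name cannot be resolved.
  if getent hosts "$host" >/dev/null 2>&1; then
    printf '%s:%s\n' "$host" "$display_num"
  else
    printf '%s:%s\n' "$fallback_ip" "$display_num"
  fi
}
```

Usage: export DISPLAY="$(pick_display)" before running any xdotool command. If both forms fail to connect, the container is probably off desk-net and the reconnect steps above still apply.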
Take screenshots of the browser using ImageMagick's import command over X11.
Take a screenshot:
DISPLAY=desk-browser:1 import -window root /tmp/screen.png
cp /tmp/screen.png ./screen.png # Copy to workspace for reading
Read the screenshot:
read("screen.png") # Returns the image
Why X11 screenshots?
- CDP screenshots miss native browser popups (file dialogs, permission prompts)
- X11 captures the actual screen as seen in noVNC
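The capture and copy steps can be wrapped in one function so that a failed capture never leaves a stale file in the workspace. This is a sketch only: capture_screen is a hypothetical helper name, and the default paths and display are taken from the commands above.

```shell
# Sketch: capture the X root window and copy it into the workspace.
# capture_screen is a hypothetical helper, not an existing script.
capture_screen() {
  local dest="${1:-./screen.png}"        # where the agent will read the image
  local display="${2:-desk-browser:1}"   # X display of the browser container
  local tmp="/tmp/screen.png"

  # Grab the whole screen; stop here if the X connection fails.
  DISPLAY="$display" import -window root "$tmp" || return 1

  # Copy into the workspace only after a successful capture.
  cp "$tmp" "$dest"
}
```

Usage: capture_screen ./screen.png, then read("screen.png") as before. Pass 172.18.0.2:1 as the second argument when the hostname does not resolve.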
Find browser window:
DISPLAY=desk-browser:1 xdotool search --class "chromium"
# Returns window IDs (use the one with the larger geometry, usually the second)
Get window geometry:
DISPLAY=desk-browser:1 xdotool getwindowgeometry <window_id>
Resize window:
DISPLAY=desk-browser:1 xdotool windowsize <window_id> 1050 780
Note: getactivewindow doesn't work (minimal WM). Use search --class instead.
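Since search --class returns several IDs and the browser is the one with the larger geometry, the choice can be automated instead of relying on "usually the second". The sketch below assumes DISPLAY is already exported; largest_window is a hypothetical helper name. It relies on the --shell option of getwindowgeometry, which prints WIDTH= and HEIGHT= assignments suitable for eval.

```shell
# Sketch: print the ID of the largest window matching a class.
# largest_window is a hypothetical helper; DISPLAY must already be set.
largest_window() {
  local class="${1:-chromium}"
  local best_id="" best_area=0
  local id geom area WINDOW X Y WIDTH HEIGHT SCREEN

  for id in $(xdotool search --class "$class"); do
    # --shell emits WINDOW=, X=, Y=, WIDTH=, HEIGHT=, SCREEN= lines.
    geom="$(xdotool getwindowgeometry --shell "$id" 2>/dev/null)" || continue
    eval "$geom"
    area=$((WIDTH * HEIGHT))
    if [ "$area" -gt "$best_area" ]; then
      best_area=$area
      best_id=$id
    fi
  done

  # Non-zero exit status when no window matched.
  [ -n "$best_id" ] && printf '%s\n' "$best_id"
}
```

Usage: wid="$(largest_window)" && xdotool windowsize "$wid" 1050 780. Comparing by area is one reasonable reading of "larger geometry"; adjust it if the windows differ in some other way.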