Clicky

One-line summary: Open-source macOS menu-bar AI assistant that captures screenshots, responds by voice, and animates a cursor overlay that points at UI elements — a reference implementation of the screen-aware "point and ask" pattern.

What it is

A Swift/macOS menu-bar app (farzaa/clicky, MIT license, ~4,400 stars as of ingest). The user holds a global hotkey (Control + Option) and speaks a question. Clicky captures the audio plus a screenshot, sends them to Claude for reasoning, plays the response via ElevenLabs TTS, and can animate a floating cursor overlay that points at specific UI elements while explaining them.
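The turn described above can be sketched as a three-stage pipeline. This is a minimal illustration, not Clicky's actual code: `Turn`, `handle_turn`, and the three stage callables are hypothetical names, and each stage is injected as a plain function so that no vendor SDK call is assumed.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class Turn:
    """One push-to-talk interaction (hypothetical container, not Clicky's type)."""
    audio: bytes       # microphone audio recorded while the hotkey was held
    screenshot: bytes  # screen image captured for the same turn


def handle_turn(
    turn: Turn,
    transcribe: Callable[[bytes], str],   # assumed STT stage (AssemblyAI in Clicky)
    reason: Callable[[str, bytes], str],  # assumed vision-LLM stage (Claude in Clicky)
    speak: Callable[[str], None],         # assumed TTS stage (ElevenLabs in Clicky)
) -> str:
    """Run speech-to-text, then reasoning over text plus screenshot, then speech out."""
    question = transcribe(turn.audio)
    answer = reason(question, turn.screenshot)
    speak(answer)
    return answer
```

Because the stages are passed in, the same flow could be wired to different providers or to test stubs without changing the orchestration.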

The self-description: "an AI teacher that lives as a buddy next to your cursor."

Why it matters to patia

Clicky is not a competitor; it is a reference implementation of a pattern patia could eventually adopt for a v2+ feature. The failure mode it addresses is one that SMS and voice chat cannot easily fix: the user cannot describe what they are looking at. A senior who cannot find a setting, identify which button to press, or articulate what a dialog box is saying stays stuck when the helper is conversation-only. A screen-aware assistant closes that gap.

Whether this pattern meaningfully helps seniors specifically is an open question tracked at screen-aware-ai-for-seniors.

Key facts

License: MIT
Platform: macOS 14.2+ only; no Windows, Linux, or web version
Frontend: Swift (two transparent NSPanel windows — one control, one fullscreen cursor overlay)
Screen capture: ScreenCaptureKit; requires Screen Recording and Accessibility permissions
Speech-to-text: AssemblyAI (streaming)
Reasoning / vision: Claude (Anthropic); the model can emit directional tags like [POINT:x,y:label:screenN] that the client animates
Text-to-speech: ElevenLabs
Backend: Cloudflare Workers (free tier) as an API proxy
Interaction: push-to-talk hotkey; voice in, voice out, with visual pointer overlay
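The pointing tags mentioned under "Reasoning / vision" have to be separated from the spoken text before TTS. The sketch below shows one way a client might do that. The grammar is an assumption inferred only from the example `[POINT:x,y:label:screenN]` (integer coordinates, a label without colons, a numeric screen index); Clicky's real parser and coordinate space are not confirmed here, and `PointTag` and `extract_points` are hypothetical names.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Assumed tag shape, based only on the example [POINT:x,y:label:screenN].
POINT_TAG = re.compile(r"\[POINT:(\d+),(\d+):([^:\]]+):screen(\d+)\]")


@dataclass(frozen=True)
class PointTag:
    x: int
    y: int
    label: str
    screen: int


def extract_points(reply: str) -> tuple[str, list[PointTag]]:
    """Split a model reply into speakable text and a list of pointer targets."""
    points = [
        PointTag(int(m.group(1)), int(m.group(2)), m.group(3).strip(), int(m.group(4)))
        for m in POINT_TAG.finditer(reply)
    ]
    # Strip the tags so the TTS stage never reads them aloud, then collapse
    # the double spaces left behind where a tag was removed.
    spoken = re.sub(r"\s{2,}", " ", POINT_TAG.sub("", reply)).strip()
    return spoken, points
```

For example, `extract_points("Click the gear icon [POINT:120,48:Settings:screen1] to open settings.")` yields the sentence without the tag plus one `PointTag(120, 48, "Settings", 1)` that an overlay could animate toward.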

Strengths (from patia's perspective)

Demonstrates the pattern end-to-end — we can point at a working implementation, not a concept slide
Stack is portable — screenshot + STT + vision-capable LLM + TTS is reimplementable on web/PWA or native mobile
Open source — the [POINT:x,y:label:screenN] protocol for directing overlay gestures is a reusable idea
Integrates with tools patia already uses (Claude) or plans to evaluate (ElevenLabs)

Weaknesses (from patia's perspective)

macOS-only — most patia pilot seniors will be on iOS, Android, or Windows; the reference implementation does not run on the target platforms
Four API keys to wire (Anthropic, AssemblyAI, ElevenLabs, Cloudflare) — even with a proxy, this is infeasible for a non-technical user
Permission-gated by screen recording and accessibility access — dialogs that fraud-aware seniors are already primed to refuse (see senior-fraud-susceptibility)
Always-on menu-bar monitoring is the wrong privacy posture for a shame-aware product; a senior-facing version would likely need per-session explicit screen-share instead

Open questions

Does the [POINT:x,y] overlay reduce or amplify cognitive load for users with selective-attention deficits? (See risks summarized at screen-aware-ai-for-seniors.)
Is the right product form for patia seniors a Clicky-style always-on companion, or a bounded "help me now" mode the user explicitly opts into?
Could the same protocol work over a web session (browser extension + the browser's getDisplayMedia API) rather than requiring a native app?

Sources

2026-04-17-clicky-cursor-aware-ai-assistants

Related

elevenlabs — Clicky's TTS dependency; also a Phase 2 voice candidate for patia
screen-aware-ai-for-seniors — the open question this pattern raises
senior-tech-competitive-landscape — screen-aware AI is an adjacent emerging pattern, not a current category
senior-fraud-susceptibility — permission-dialog friction and honeypot risk
