Project · Agent tooling
Reusable Vision & Image Skill for Claude Code
Claude Code can read a screenshot and even rough out an image in code — but both are limited, and Gemini's vision and image generation go well beyond them. They just aren't at hand in the terminal, so each project that wants them re-wires the API from scratch. I built one skill that puts Gemini's see and draw behind a single command — a gateway any agent, skill, or workflow can build on, without touching the API again.
Claude, with Gemini's eyes and a paintbrush — one skill to see and to draw.
Gemini's eyes and hands — not yet in the terminal.
Claude Code can read an image and even rough one out in code — but only so far. It'll tell you a screenshot shows an error; it won't reliably lift every line of small text from it or judge whether a generated product shot matches the real garment down to the stitching, and its own image output stops at SVG and diagrams. Gemini's vision is far sharper at exactly that, and it renders photographic images Claude can't. Neither is a command away in the terminal, though — and calling Gemini ad hoc means every project re-implements the same client, copies the API key into one more place, and hardcodes model names that quietly break when the provider renames them. The capability isn't hard; owning it cleanly across a dozen projects — one place every agent, skill, and workflow can reach — is.
One gateway, in front of everything.
Every agent and skill calls the same small CLI instead of wiring up Gemini on its own. Behind it sits one client and one key — so vision and image generation are a command away, and the messy parts (auth, model names, errors) live in exactly one place.
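From the caller's side, shelling out to the gateway could look roughly like the sketch below. It assumes the CLI is installed on the PATH as `gemini`, that `see` takes images positionally, and that a non-zero exit code signals failure. The helper names `see_argv` and `call_gateway` are hypothetical and not part of the skill.

```python
import subprocess
from typing import Callable, Sequence

GATEWAY = "gemini"  # command name as used in this write-up's examples


def see_argv(images: Sequence[str], prompt: str) -> list[str]:
    """Build the argument vector for a `see` call (assumed flag layout)."""
    if not images:
        raise ValueError("see needs at least one image")
    return [GATEWAY, "see", *images, "--prompt", prompt]


def call_gateway(argv: Sequence[str], run: Callable = subprocess.run) -> str:
    """Run the gateway and return its plain-text answer.

    On a non-zero exit, raise with the CLI's own stderr so the caller
    can relay the real reason instead of guessing at it.
    """
    result = run(list(argv), capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError(
            f"gateway exited {result.returncode}: {result.stderr.strip()}"
        )
    return result.stdout.strip()
```

The `run` parameter exists only so the call can be exercised without the real binary; an agent would normally leave it at the default.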
See, draw, and a way to stay current.
Each is one subcommand with a clean input and output. The commands below are generic illustrations — the skill's own prompts stay inside it.
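One way a three-subcommand CLI of this shape might be declared is with `argparse`. This is a sketch under assumptions, not the skill's actual source: the `--prompt`, `--aspect`, `--out`, and `--filter` flags mirror the illustrative commands in this section, while `--ref` (for reference images) and the `parse` helper are hypothetical.

```python
import argparse

MAX_REFERENCES = 14  # this write-up mentions up to fourteen reference images


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="gemini")
    sub = parser.add_subparsers(dest="command", required=True)

    see = sub.add_parser("see", help="answer a question about one or more images")
    see.add_argument("images", nargs="+")
    see.add_argument("--prompt", required=True)

    draw = sub.add_parser("draw", help="render a prompt to a PNG on disk")
    draw.add_argument("prompt")
    draw.add_argument("--aspect", default=None)
    draw.add_argument("--out", required=True)
    # Hypothetical flag: repeat once per reference image.
    draw.add_argument("--ref", action="append", default=[])

    models = sub.add_parser("models", help="list models grouped by capability")
    models.add_argument("--filter", choices=["see", "draw"])
    return parser


def parse(argv):
    """Parse argv and enforce the reference-image limit for `draw`."""
    args = build_parser().parse_args(argv)
    if args.command == "draw" and len(args.ref) > MAX_REFERENCES:
        raise SystemExit(f"draw accepts at most {MAX_REFERENCES} reference images")
    return args
```

Keeping each subcommand's inputs this narrow is what lets an agent call it without reading any documentation beyond `--help`.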
Hand it one or more images and a question; it answers in plain text. Read a screenshot, pull text out of a picture, or check a generated image against the brief.
gemini see shot.png --prompt "What error is shown, and which field is highlighted?"
gemini-2.5-flash
Turn a prompt, plus up to fourteen reference images and a chosen aspect ratio, into a PNG on disk. Generate from scratch, or edit and recombine the references.
gemini draw "a minimalist ink-on-limestone lens icon" --aspect 1:1 --out icon.png
gemini-3.1-flash-image · pro option
List what the API actually offers, grouped by capability, so model names are looked up at runtime instead of guessed: when the provider renames one, nothing downstream breaks.
gemini models --filter draw
draw · see
What makes it a gateway, not just a script.
The two commands are the easy part. These are the pieces that make it safe for any agent to shell out to, again and again, without surprises.
A single google-genai client, with the API key resolved from the environment or a git-ignored file — never copied into each project.
Model names live as editable constants and can be overridden per call, so a provider rename is a one-line change instead of a code hunt.
API failures print verbatim with clear exit codes, so the calling agent relays the real reason rather than guessing at it.
The real key and the virtual environment are git-ignored; only a template and a dependency lockfile are committed.
A locked dependency set means any machine or agent gets the same environment from a single setup command.
The skill's usage contract requires the calling agent to open each generated image and confirm it matches the brief before calling the job done.
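Three of the pieces above (key resolution, overridable model names, and verbatim failures with exit codes) can be sketched in a few lines. Everything named here is an assumption for illustration: the `GEMINI_API_KEY` variable name, the `.env.local` file name, the exit-code values, and the helper functions. The model names are the ones shown in this write-up's examples.

```python
import os
import sys
from pathlib import Path

# Editable defaults; a provider rename becomes a one-line change here.
DEFAULT_MODELS = {"see": "gemini-2.5-flash", "draw": "gemini-3.1-flash-image"}

EXIT_OK, EXIT_CONFIG, EXIT_API = 0, 2, 3  # hypothetical exit-code scheme


class ConfigError(Exception):
    """Raised when local setup (for example the API key) is missing."""


def resolve_api_key(env=os.environ, key_file=Path(".env.local")):
    """Return the key from the environment, else from a git-ignored file."""
    key = env.get("GEMINI_API_KEY", "").strip()
    if key:
        return key
    if key_file.is_file():
        for line in key_file.read_text().splitlines():
            name, _, value = line.partition("=")
            if name.strip() == "GEMINI_API_KEY" and value.strip():
                return value.strip()
    raise ConfigError("GEMINI_API_KEY not found in environment or key file")


def pick_model(capability, override=None):
    """Use the per-call override when given, otherwise the editable default."""
    return override or DEFAULT_MODELS[capability]


def run_guarded(action, stderr=sys.stderr):
    """Run `action`; on failure print the error verbatim and return its exit code."""
    try:
        action()
        return EXIT_OK
    except ConfigError as exc:
        print(exc, file=stderr)
        return EXIT_CONFIG
    except Exception as exc:  # any API failure: relay the message unchanged
        print(exc, file=stderr)
        return EXIT_API
```

A CLI entry point would then end with something like `sys.exit(run_guarded(main))`, so the calling agent sees both the original error text and a distinct exit code.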
Build the gateway once; every agent after it inherits it.
The work of reaching Gemini cleanly — the client, the key, the model names, the error handling — gets done a single time, in a single place. After that, giving a new project vision or image generation isn't a task; it's a command it already knows. A small piece of tooling that quietly raises the ceiling on everything built next.
Already paying for itself.
Phone photos to studio-grade shots, every image checked against the real product — built entirely on this skill's see and draw.
The cover image on each case study across this portfolio came out of the same gateway: generated, checked, and dropped in. No stock photos, no design tool, no hand-off.
With draw and see behind one command, Claude generates an image, looks at the result itself, and keeps correcting until it matches the brief — image generation as a self-correcting loop, not a one-shot guess.
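The loop described here can be sketched as a small function that takes the two capabilities as callables. This is an illustration under assumptions rather than the skill's own logic: `draw` is assumed to return the path of the image it wrote, `see` to return plain text, and the "answer MATCH" convention and the `draw_until_match` name are hypothetical.

```python
from typing import Callable, Optional, Tuple


def draw_until_match(
    brief: str,
    draw: Callable[[str], str],      # prompt -> path of the generated image
    see: Callable[[str, str], str],  # (image path, question) -> text answer
    max_attempts: int = 3,
) -> Tuple[Optional[str], int]:
    """Generate, inspect, and retry until the image matches the brief.

    Returns (path, attempts) on success, or (None, max_attempts) if no
    attempt was accepted.
    """
    prompt = brief
    question = (
        f"Does this image match the brief {brief!r}? "
        "Answer MATCH, or list what is wrong."
    )
    for attempt in range(1, max_attempts + 1):
        path = draw(prompt)
        verdict = see(path, question).strip()
        if verdict.upper().startswith("MATCH"):
            return path, attempt
        # Feed the critique back into the next generation prompt.
        prompt = f"{brief}\nFix these problems from the previous attempt: {verdict}"
    return None, max_attempts
```

Capping the attempts keeps a brief that can never be satisfied from looping forever; the caller decides what to do when `None` comes back.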
Want a skill like this in your stack?
Tell me what you're building, and I'll come back with whether I can help and what a first step looks like.