What does Segment Anything do?

Segment Anything is an AI technique that extracts the full contour of an object in a photo from a single click. You tap inside the image — the tool produces a pixel-precise mask you can use to cut out the object or replace the background. The technique works for arbitrary content: people, animals, products, objects, plants, furniture.

Does the tool work offline and without signup?

Yes. The AI model is loaded once into the browser cache on first use (around 21 MB for the fast variant, around 106 MB for the accurate variant). All subsequent segmentations then run fully offline. No signup, no email address, no third-party cookies, no server upload.

What's the difference between Click, Box, and Refine modes?

**Click** is the default: a single tap on the object creates the selection. **Box** is useful when several similar objects sit next to each other (e.g. several chairs); a rectangle narrows down which object you mean. **Refine** is the power mode: plain clicks (shown in graphite) add regions to the mask, shift-clicks (shown in orange) remove regions. The mask updates live, typically in under 100 ms per click.

What are the three mask candidates below the image?

The model returns three candidate masks per click at different granularities; the candidate with the highest predicted IoU score (the model's own confidence estimate) is preselected. The alternatives often show meaningful sub-selections: for a portrait click, for instance, 'head only', 'head+shoulders' and 'whole person'. Tap a candidate to switch the active mask.

Why does the first click take longer than the next ones?

The tool uses a two-phase approach: for each new image, an image encoder runs once (around 2 to 6 seconds depending on device and model). The image representation is then cached in memory, and every subsequent click only runs the small mask decoder, typically in under 100 milliseconds. That's why refinement feels real-time.

How reliable is AI segmentation?

Very good for clearly defined objects in everyday photos such as people, animals, products, furniture, and vehicles. Trickier are transparent objects (glass, veils), fine hair or fur with background bleed-through, and reflections. Per EU AI Act Article 50, the tool surfaces a disclosure note above every result; review the mask carefully for unusual scenes or critical applications.

Segment Anything — Object cutout by click

What does this tool do?

Segment Anything extracts the full contour of an object from a single photo click. You tap inside the image, the tool computes a pixel-precise mask, and you get three outputs: the cutout object as PNG (transparent background), the pure black-and-white mask (for image editors), and the inverse mask (keep background, remove object). Everything happens directly in your browser via WebAssembly or WebGPU — no photo is sent to a server.

At the core sits a specialized neural network for prompt-based image segmentation. You give the model a “prompt” — a click point, a rectangle, or a combination of positive and negative points — and it returns the matching mask. This works for arbitrary objects: people, animals, furniture, products, plants, vehicles. It’s not restricted to a fixed class list like older approaches.

How does in-browser segmentation work?

The tool operates in two phases. In the analysis phase, an image encoder runs once over your photo and converts it into an internal representation (an embedding) that the selection step reuses. This phase takes around 2 to 6 seconds depending on device and model and runs once per image.

In the selection phase, a small mask decoder runs on every click. Because the encoder has already done its work and the representation is cached in memory, this phase is dramatically faster — typically under 100 milliseconds per click. Refinement feels like a live interaction: you tap, the mask updates, you tap again, the mask adapts.

This two-stage split is the central performance advantage over older tools that re-run the full model on every click. With those tools each click takes several seconds, which makes interactive refinement impractical.
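The two-phase split described above can be sketched as a small session object. This is a minimal illustration, not the tool's actual implementation: the `Encoder` and `Decoder` function types, the `Point` shape, and the class name `SegmentationSession` are all hypothetical and stand in for whatever model runtime is used.

```typescript
// Hypothetical types: the real encoder/decoder interfaces are not documented here.
type Embedding = Float32Array;
type Point = { x: number; y: number; positive: boolean };
type Encoder = (pixels: Uint8ClampedArray, width: number, height: number) => Embedding;
type Decoder = (embedding: Embedding, points: Point[]) => Uint8Array;

class SegmentationSession {
  private readonly encode: Encoder;
  private readonly decode: Decoder;
  private embedding: Embedding | null = null;

  constructor(encode: Encoder, decode: Decoder) {
    this.encode = encode;
    this.decode = decode;
  }

  // Analysis phase: run the expensive encoder once per image and keep the result in memory.
  loadImage(pixels: Uint8ClampedArray, width: number, height: number): void {
    this.embedding = this.encode(pixels, width, height);
  }

  // Selection phase: every click reuses the cached embedding and only runs the decoder.
  segment(points: Point[]): Uint8Array {
    if (this.embedding === null) {
      throw new Error("loadImage must be called before segment");
    }
    return this.decode(this.embedding, points);
  }
}
```

Calling `loadImage` again with a new photo replaces the cached embedding, which matches the "once per image" cost described above.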

What selection modes are available?

Click mode is the default. You tap the target object once, and the model finds the matching contour automatically. It works best for clearly defined objects with background contrast, such as people against a wall, products on a table, or animals in a landscape.

Box mode is useful when several similar objects sit next to each other. You drag a rectangle around the desired object, and the model knows exactly which one you mean. Classic example: photos of several people, where a single click would be ambiguous.

Refine mode is the power mode. Tap to add a positive point (included in the mask, drawn in the highlight color), shift-tap to add a negative point (excluded from the mask, drawn in the error color). With two or three additional points, complex selections — say “only the t-shirt, not the skin” — become precise.
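The three modes all reduce to the same kind of prompt: a list of coordinates plus one label per coordinate. The sketch below shows one way to build it. The label values (1 = positive point, 0 = negative point, 2 and 3 = top-left and bottom-right box corner) follow the convention of the original Segment Anything ONNX export; whether this tool's model uses the same convention is an assumption, and the names `Click`, `Box`, and `buildPrompt` are illustrative.

```typescript
// Illustrative types; coordinates are in image pixels.
type Click = { x: number; y: number; negative: boolean };
type Box = { x0: number; y0: number; x1: number; y1: number };

interface PromptTensors {
  coords: Float32Array; // flattened [x, y, x, y, ...]
  labels: Float32Array; // one label per coordinate pair
}

function buildPrompt(clicks: Click[], box?: Box): PromptTensors {
  const coords: number[] = [];
  const labels: number[] = [];

  // Refine mode: plain clicks are positive (1), shift-clicks are negative (0).
  for (const click of clicks) {
    coords.push(click.x, click.y);
    labels.push(click.negative ? 0 : 1);
  }

  // Box mode: encode the rectangle as two corner points.
  // Normalise first so the box may be dragged in any direction.
  if (box) {
    coords.push(Math.min(box.x0, box.x1), Math.min(box.y0, box.y1));
    labels.push(2); // assumed label for the top-left corner
    coords.push(Math.max(box.x0, box.x1), Math.max(box.y0, box.y1));
    labels.push(3); // assumed label for the bottom-right corner
  }

  return { coords: Float32Array.from(coords), labels: Float32Array.from(labels) };
}
```

Click mode is simply the case of one positive point and no box, so a single function can serve all three modes.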

What are the three mask candidates?

The model returns not one but three masks per prompt, at different granularities. The candidates are sorted by the model's predicted Intersection-over-Union (IoU) score, its own estimate of mask quality, and the top-ranked candidate is preselected. You can switch between the three without re-running anything.

In practice the three candidates often look like this: a portrait click yields “head only”, “head and shoulders”, and “whole person”. A click on a car yields “body only”, “car including windows”, and “car including ground shadow”. Having several candidates often saves a refinement click when the preselected granularity is not the one you intended.
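Ranking the candidates is a plain sort on the predicted score. The following sketch assumes each candidate carries its mask and a `predictedIou` number; both the `Candidate` type and the function name are hypothetical.

```typescript
// Hypothetical shape of one decoder output.
type Candidate = { mask: Uint8Array; predictedIou: number };

// Returns a new array sorted by predicted IoU, best first.
// The input array is left untouched so the decoder output can be reused.
function rankCandidates(candidates: Candidate[]): Candidate[] {
  return [...candidates].sort((a, b) => b.predictedIou - a.predictedIou);
}
```

The first element of the result is the one to preselect; switching to another candidate is then only a change of index, with no model call involved.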

What can I do with the mask?

The output is universal — the tool offers three download options:

Cutout PNG — the object on a transparent background. Ready to use for composites in Adobe Photoshop, product images on a marketplace, social-media graphics with transparent background, or as an overlay in video editing.
Mask PNG — black-and-white image, white = object, black = background. Input for your own workflows in Affinity Photo, GIMP as “load selection from mask”, or as alpha channel in Blender for 3D composites.
Inverse mask / inverse cutout — keep the background, remove the object. Practical for “remove person from photo” workflows in combination with content-aware fill in your image editor.

All outputs are lossless PNGs at your original input resolution. No hidden watermarks, no format conversions, no quality loss.
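The three outputs differ only in how the mask is applied to the pixel data. As a rough sketch, assuming the photo is available as RGBA bytes and the mask as one byte per pixel (non-zero = object), they could be derived as follows. The function and field names are illustrative, and PNG encoding itself is left out.

```typescript
interface Outputs {
  cutout: Uint8ClampedArray;  // object kept, background transparent
  mask: Uint8ClampedArray;    // white = object, black = background
  inverse: Uint8ClampedArray; // background kept, object transparent
}

function renderOutputs(rgba: Uint8ClampedArray, mask: Uint8Array): Outputs {
  if (rgba.length !== mask.length * 4) {
    throw new Error("mask must have exactly one entry per RGBA pixel");
  }
  const cutout = new Uint8ClampedArray(rgba);  // copies the pixel data
  const inverse = new Uint8ClampedArray(rgba); // second independent copy
  const maskImage = new Uint8ClampedArray(rgba.length);

  for (let i = 0; i < mask.length; i++) {
    const inside = mask[i] !== 0;
    const alpha = i * 4 + 3;
    if (inside) {
      inverse[alpha] = 0; // remove the object
    } else {
      cutout[alpha] = 0;  // remove the background
    }
    const value = inside ? 255 : 0;
    maskImage[i * 4] = value;
    maskImage[i * 4 + 1] = value;
    maskImage[i * 4 + 2] = value;
    maskImage[alpha] = 255; // the mask image itself is fully opaque
  }
  return { cutout, mask: maskImage, inverse };
}
```

Only the alpha channel changes in the cutout and the inverse, so the colour values of the kept pixels stay identical to the input.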

When does the tool deliver the best results?

Clearly defined objects with good contrast are the sweet spot. A person against a single-color wall, a product on a clean table, an animal in a typical landscape — here a single click is often enough for a print-ready result.

Trickier scenes work too but need refinement: for fine hair with background bleed-through, a refinement click on the hair tip usually fills in the missing contour. For a person holding an object in front of them (e.g. phone, glass), a negative click on the object separates them cleanly.

Hard cases: fully transparent objects (glass, water droplets), very fine detail (individual hair tips without contrast), reflections and mirror images, and images with low resolution (below 256×256). In these cases manual post-processing in an image editor makes sense — the tool mask is a good starting point, not a finished product.

Is my photo truly private?

Segmentation runs exclusively on your device. Neither the original nor the computed mask is sent to a server, stored, or analysed. There are no third-party cookies, no signup, and no tracking, not even anonymous usage analytics.

The only exception is the one-time model download on first use: the model file is fetched once from a public model registry. This request contains only the model file URL. No image data, no user IDs, no personally identifiable information is transmitted. After the first load, the model lives in the browser cache, and the registry is no longer contacted.

For sensitive material like product prototypes, confidential visuals, or unreleased footage, that’s the decisive advantage over cloud tools that must upload the file — at kittokit nobody but you sees the photo.

What does the EU AI Act require for AI-generated content?

Starting August 2026, EU AI Act Article 50 requires AI-generated content to be labelled as such. The tool therefore shows a fixed note above every result: “This selection was estimated by an AI model. Review for optical illusions or unusual scenes.” The note is mandatory and cannot be dismissed or hidden.

In practice that means: the mask is a suggestion, not a binding classification. For design purposes (composites, product images, social media) the accuracy is more than sufficient; for safety-critical applications (medical image analysis, legal identification, autonomous systems) a professional tool with classification warranties is required — not a browser-local AI estimate.

Frequently Asked Questions

The most-asked questions about usage, quality, and privacy:

How do I cut out an object with one click?

Upload your photo in the tool above. After around 2 to 6 seconds of analysis, tap once on the object. The mask appears immediately. Three candidates are available; the most likely one is preselected. Then download the mask PNG or the cutout PNG.

Does the tool work offline?

Yes. On first use the browser downloads the AI model once (around 21 MB fast, around 106 MB accurate). All subsequent segmentations run fully offline from the browser cache.

Which image formats can I upload?

Input: PNG, JPG, WebP, AVIF, and HEIC (iPhone photos). HEIC is decoded automatically. Output: lossless PNG (mask + cutout + inverse mask) at original resolution.

How long does a selection take?

Per image around 2 to 6 seconds for the one-time analysis phase, then under 100 milliseconds per refinement click.

Which image tools fit alongside this one?

More tools from the kittokit ecosystem that pair well with object segmentation:

Background remover — automatic cutout without click, ideal for portraits and products with clear backgrounds.
Photo to coloring page — line drawing instead of mask, for print workflows with coloring character.
Depth map — spatial depth instead of object mask, complements segmentation for 3D workflows.
Image upscaler — scale up source images before segmentation if the source is under 512×512.
Image format converter — convert masks or cutouts to other formats (lossless PNG to compact WebP).

How It Works

Pick a photo

Choose a mode and tap

Pick mask + download
