Runs locally · no upload

Segment Anything — object cutout by click

Click, drag a box, or add multiple refinement points — a specialized segmentation model finds the object contour and delivers mask, cutout, and inverse mask as lossless PNGs.

Pick an image or drop it here

PNG, JPG, WebP, AVIF or HEIC up to 20 MB

PNG · JPG · WEBP · AVIF · HEIC · HEIF

How It Works

  1. Pick a photo

    Drag and drop a file or pick one from your device. PNG, JPG, WebP, AVIF or HEIC up to 20 MB.

  2. Choose a mode and tap

    Click for simple selection, Box for a rectangle constraint, Refine for positive and negative points. The mask appears in under a second after the one-time image analysis.

  3. Pick a mask and download

    Pick the best of three candidates (confidence shown). Download the mask PNG, cutout PNG, or inverse mask.

Privacy

Processing runs exclusively on your device. Your photos never leave the browser, are never uploaded, and are discarded when you close the tab. GDPR-compliant — safe for product photos, business visuals, or confidential material.

One click on the target object is enough: after a one-time analysis of a few seconds, the mask appears almost instantly. The model runs entirely in your browser via WebGPU or WebAssembly. Refine the selection with additional points and download the mask or cutout as PNG. Everything happens on your device, and no photo leaves the browser tab.

How do you use this tool?

  1. Pick a photo or drop it here (PNG, JPG, WebP, AVIF or HEIC up to 20 MB)
  2. Pick a mode: Click (default), Box (rectangle constraint) or Refine (positive/negative points)
  3. Tap the object once — selection appears in under a second after the one-time image analysis
  4. Pick the best of three mask candidates (confidence shown per candidate)
  5. Download mask PNG, cutout PNG, or inverse mask — lossless, original resolution

What does this tool do?

Segment Anything extracts the full contour of an object from a single click on a photo. You tap inside the image, the tool computes a pixel-precise mask, and you get three outputs: the cutout object as PNG (transparent background), the pure black-and-white mask (for image editors), and the inverse mask (keep background, remove object). Everything happens directly in your browser via WebAssembly or WebGPU; no photo is sent to a server.

At the core sits a specialized neural network for prompt-based image segmentation. You give the model a “prompt” — a click point, a rectangle, or a combination of positive and negative points — and it returns the matching mask. This works for arbitrary objects: people, animals, furniture, products, plants, vehicles. It’s not restricted to a fixed class list like older approaches.

How does in-browser segmentation work?

The tool operates in two phases. In the analysis phase, an image encoder runs once over your photo and converts it into an internal feature representation that the subsequent selection step reuses. This phase takes around 2 to 6 seconds, depending on device and model, and runs only once per image.

In the selection phase, a small mask decoder runs on every click. Because the encoder has already done its work and the representation is cached in memory, this phase is dramatically faster — typically under 100 milliseconds per click. Refinement feels like a live interaction: you tap, the mask updates, you tap again, the mask adapts.

This two-stage split is the central performance advantage over older tools that re-run the full model on every click. There, each click takes several seconds, which makes interactive refinement impractical.
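
As a rough illustration of the two-phase split, here is a minimal Python sketch. It is not the tool's actual code: `TwoPhaseSegmenter`, `fake_encode`, and `fake_decode` are hypothetical names, and the stand-in functions only mimic the shape of an encoder and a decoder. The point it shows is that the expensive step runs once per image, while every click reuses the cached result.

```python
from typing import Callable, List, Optional, Tuple

Point = Tuple[int, int]
Embedding = List[float]


class TwoPhaseSegmenter:
    """Run a slow encoder once per image and a fast decoder per click."""

    def __init__(
        self,
        encode: Callable[[bytes], Embedding],
        decode: Callable[[Embedding, Point], str],
    ) -> None:
        self._encode = encode
        self._decode = decode
        self._embedding: Optional[Embedding] = None
        self.encoder_runs = 0

    def set_image(self, image: bytes) -> None:
        # Analysis phase: expensive, executed once per image.
        self._embedding = self._encode(image)
        self.encoder_runs += 1

    def click(self, point: Point) -> str:
        # Selection phase: cheap, reuses the cached representation.
        if self._embedding is None:
            raise RuntimeError("call set_image() before click()")
        return self._decode(self._embedding, point)


def fake_encode(image: bytes) -> Embedding:
    # Hypothetical stand-in for the image encoder.
    return [float(b) for b in image]


def fake_decode(embedding: Embedding, point: Point) -> str:
    # Hypothetical stand-in for the mask decoder.
    return f"mask for {point} from {len(embedding)} features"
```

With this structure, ten refinement clicks on the same photo cost one encoder run plus ten decoder runs, instead of ten full model runs.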

What selection modes are available?

Click mode is the default. You tap the target object once, and the model finds the matching contour automatically. It works best for clearly defined objects that contrast with the background: people against a wall, products on a table, animals in a landscape.

Box mode is useful when several similar objects sit next to each other. You drag a rectangle around the desired object, and the model knows exactly which one you mean. Classic example: photos of several people, where a single click would be ambiguous.

Refine mode is the power mode. Tap to add a positive point (included in the mask, drawn in the highlight color), shift-tap to add a negative point (excluded from the mask, drawn in the error color). With two or three additional points, complex selections — say “only the t-shirt, not the skin” — become precise.
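
The three modes can be thought of as different ways of filling in one prompt. The sketch below is an assumption about how such a prompt could be represented, not the tool's internal format: the `Prompt` class, its method names, and the label convention (1 = include, 0 = exclude) are chosen for illustration only.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Prompt:
    """Collects the points and the optional box that describe a selection."""

    points: List[Tuple[int, int]] = field(default_factory=list)
    labels: List[int] = field(default_factory=list)  # assumed: 1 = include, 0 = exclude
    box: Optional[Tuple[int, int, int, int]] = None  # (left, top, right, bottom)

    def tap(self, x: int, y: int, shift: bool = False) -> None:
        # A plain tap adds a positive point, a shift-tap a negative one.
        self.points.append((x, y))
        self.labels.append(0 if shift else 1)

    def drag_box(self, x0: int, y0: int, x1: int, y1: int) -> None:
        # Normalize so the box is valid whichever direction the user dragged.
        self.box = (min(x0, x1), min(y0, y1), max(x0, x1), max(y0, y1))
```

In this picture, Click mode is a prompt with a single positive point, Box mode sets only `box`, and Refine mode keeps appending positive and negative points to the same prompt.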

What are the three mask candidates?

The model returns not one but three masks per prompt, at different granularities. The candidates are sorted by predicted Intersection-over-Union (IoU) confidence — the most likely candidate is preselected. You can switch between the three without re-running anything.

In practice the three candidates often look like this: a portrait click yields “head only”, “head and shoulders”, and “whole person”. A click on a car yields “body only”, “car including windows”, and “car including ground shadow”. Having several candidates often saves a refinement click when the most obvious granularity isn’t the intended one.
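
To make the ranking concrete, here is a small Python sketch. The names `Candidate`, `iou`, and `rank_candidates` are hypothetical. `iou` shows what Intersection-over-Union measures for two masks; the model only predicts this value as a confidence score, since no reference mask exists at click time.

```python
from typing import List, NamedTuple, Sequence


class Candidate(NamedTuple):
    name: str             # illustrative label, e.g. "whole person"
    predicted_iou: float  # the model's own confidence estimate


def iou(mask_a: Sequence[int], mask_b: Sequence[int]) -> float:
    """Intersection-over-Union of two flat binary masks (1 = object)."""
    intersection = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    union = sum(1 for a, b in zip(mask_a, mask_b) if a or b)
    return intersection / union if union else 0.0


def rank_candidates(candidates: Sequence[Candidate]) -> List[Candidate]:
    """Sort best-first; index 0 is the candidate to preselect."""
    return sorted(candidates, key=lambda c: c.predicted_iou, reverse=True)
```

Because all three masks come back from the same decoder run, switching between the ranked candidates is only a change of index, which is why nothing needs to be re-run.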

What can I do with the mask?

The output is universal — the tool offers three download options:

  • Cutout PNG — the object on a transparent background. Ready to use for composites in Adobe Photoshop, product images on a marketplace, social-media graphics with transparent background, or as an overlay in video editing.
  • Mask PNG — black-and-white image, white = object, black = background. Input for your own workflows in Affinity Photo, GIMP as “load selection from mask”, or as alpha channel in Blender for 3D composites.
  • Inverse mask / inverse cutout — keep the background, remove the object. Practical for “remove person from photo” workflows in combination with content-aware fill in your image editor.

All outputs are lossless PNG in your original input resolution. No hidden watermarks, no rescaling, no quality loss.
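
How the three outputs relate to each other can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the tool's implementation: images are plain nested lists, the mask uses 255 for object and 0 for background, and the function names are hypothetical.

```python
from typing import List, Tuple

RGBA = Tuple[int, int, int, int]
Mask = List[List[int]]  # assumed: 255 = object, 0 = background


def invert_mask(mask: Mask) -> Mask:
    """Swap object and background."""
    return [[255 - value for value in row] for row in mask]


def cutout(pixels: List[List[RGBA]], mask: Mask) -> List[List[RGBA]]:
    """Keep colors unchanged and make everything outside the mask transparent."""
    return [
        [(r, g, b, a * m // 255) for (r, g, b, a), m in zip(pixel_row, mask_row)]
        for pixel_row, mask_row in zip(pixels, mask)
    ]


def inverse_cutout(pixels: List[List[RGBA]], mask: Mask) -> List[List[RGBA]]:
    """Keep the background and make the object transparent."""
    return cutout(pixels, invert_mask(mask))
```

The color values are never touched; only the alpha channel changes, which is why the cutout can keep the original resolution without quality loss.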

When does the tool deliver the best results?

Clearly defined objects with good contrast are the sweet spot. A person against a single-color wall, a product on a clean table, an animal in a typical landscape — here a single click is often enough for a print-ready result.

Trickier scenes work too but need refinement: for fine hair with background bleed-through, a refinement click on the hair tip usually fills in the missing contour. For a person holding an object in front of them (e.g. phone, glass), a negative click on the object separates them cleanly.

Hard cases: fully transparent objects (glass, water droplets), very fine detail (individual hair tips without contrast), reflections and mirror images, and images with low resolution (below 256×256). In these cases manual post-processing in an image editor makes sense — the tool mask is a good starting point, not a finished product.

Is my photo truly private?

Segmentation runs exclusively on your device. Neither the original nor the computed mask is sent to a server, stored, or analysed. There’s no third-party cookie banner, no signup, and no tracking — no anonymous usage analytics either.

The only exception is the one-time model download on first use: the model file is fetched once from a public model registry. This request contains only the model file URL. No image data, no user IDs, and no personally identifiable information are transmitted. After the first load, the model lives in the browser cache, and the registry is no longer contacted.

For sensitive material like product prototypes, confidential visuals, or unreleased footage, that’s the decisive advantage over cloud tools that must upload the file — at kittokit nobody but you sees the photo.

What does the EU AI Act require for AI-generated content?

Starting August 2026, EU AI Act Article 50 requires AI-generated content to be labelled as such. The tool therefore shows a fixed, non-dismissible note above every result: “This selection was estimated by an AI model. Review for optical illusions or unusual scenes.” This note is mandatory and cannot be hidden.

In practice that means: the mask is a suggestion, not a binding classification. For design purposes (composites, product images, social media) the accuracy is more than sufficient; for safety-critical applications (medical image analysis, legal identification, autonomous systems) a professional tool with classification warranties is required — not a browser-local AI estimate.

Frequently Asked Questions

The most-asked questions about usage, quality, and privacy:

How do I cut out an object with one click?

Load your photo into the tool above. After around 3 seconds of analysis, tap once on the object. The mask appears immediately. Three candidates are available; the most likely one is preselected. Download the mask PNG or cutout PNG.

Does the tool work offline?

Yes. On first use the browser downloads the AI model once (around 21 MB for the fast model, around 106 MB for the accurate one). All subsequent segmentations run fully offline from the browser cache.

Which image formats can I use?

Input: PNG, JPG, WebP, AVIF, and HEIC (iPhone photos). HEIC is decoded automatically. Output: lossless PNG (mask + cutout + inverse mask) at original resolution.
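
If you want to pre-check files before loading them, the stated limits can be expressed as a small Python sketch. The function name is hypothetical, and two details are assumptions: that `.jpeg` and `.heif` count as JPG and HEIC variants, and that "20 MB" means 20 × 1024 × 1024 bytes.

```python
from pathlib import PurePath

# Assumed extension list; .jpeg and .heif are treated as variants of JPG and HEIC.
ACCEPTED_EXTENSIONS = {".png", ".jpg", ".jpeg", ".webp", ".avif", ".heic", ".heif"}

# Assumption: "20 MB" is read as binary megabytes.
MAX_BYTES = 20 * 1024 * 1024


def is_accepted(filename: str, size_bytes: int) -> bool:
    """Check the file extension (case-insensitive) and the size limit."""
    extension = PurePath(filename).suffix.lower()
    return extension in ACCEPTED_EXTENSIONS and size_bytes <= MAX_BYTES
```

A check like this only looks at the file name and size; the actual decoding (including HEIC) still happens inside the tool.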

How long does a selection take?

Per image around 2 to 6 seconds for the one-time analysis phase, then under 100 milliseconds per refinement click.

Which image tools fit alongside this one?

More tools from the kittokit ecosystem that pair well with object segmentation:

  • Background remover — automatic cutout without click, ideal for portraits and products with clear backgrounds.
  • Photo to coloring page — line drawing instead of mask, for printable coloring pages.
  • Depth map — spatial depth instead of object mask, complements segmentation for 3D workflows.
  • Image upscaler — scale up source images before segmentation if the source is under 512×512.
  • Image format converter — convert masks or cutouts to other formats (lossless PNG to compact WebP).
