How Do You Use This Tool?
- Pick a photo or drop it into the zone (PNG, JPG, WebP, AVIF or HEIC up to 20 MB)
- Pick a model: Fast runs on every device; Sharp needs WebGPU for finer depth gradients
- Wait for the one-time model download in the background (~19 MB for Fast, ~50 MB for Sharp); the model is cached afterwards
- The depth map appears as a grayscale image next to the source
- Download as PNG, WebP or JPG — output keeps the original resolution
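The input limits and the model choice from the steps above can be sketched as two small checks. This is a minimal illustration, not the tool's actual code: the names `checkUpload`, `pickVariant`, `MAX_BYTES` and `ACCEPTED_TYPES` are hypothetical, the MIME strings are the usual ones for the listed formats, and reading "20 MB" as 20 × 1024 × 1024 bytes is an assumption.

```typescript
// Hypothetical helper names; only the limits come from the steps above.
type Variant = "fast" | "sharp";

// Assumption: "20 MB" is read as 20 MiB.
const MAX_BYTES = 20 * 1024 * 1024;

// Usual MIME types for PNG, JPG, WebP, AVIF and HEIC.
const ACCEPTED_TYPES = new Set<string>([
  "image/png",
  "image/jpeg",
  "image/webp",
  "image/avif",
  "image/heic",
]);

interface UploadCheck {
  ok: boolean;
  reason?: string;
}

// Reject files with an unsupported type or above the size limit.
function checkUpload(mimeType: string, sizeBytes: number): UploadCheck {
  if (!ACCEPTED_TYPES.has(mimeType.toLowerCase())) {
    return { ok: false, reason: `unsupported type: ${mimeType}` };
  }
  if (sizeBytes > MAX_BYTES) {
    return { ok: false, reason: "file exceeds 20 MB" };
  }
  return { ok: true };
}

// Sharp needs WebGPU; fall back to Fast, which runs on every device.
function pickVariant(requested: Variant, hasWebGPU: boolean): Variant {
  return requested === "sharp" && !hasWebGPU ? "fast" : requested;
}
```

In a browser, `hasWebGPU` could be derived from `"gpu" in navigator`, and the two arguments of `checkUpload` from a `File` object's `type` and `size` properties.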
What This Tool Does
This tool turns a single photo into a depth map: a grayscale image that encodes, per pixel, the estimated distance to the camera. Bright areas mean “near”, dark ones mean “far”. The computation runs entirely in your browser via WebAssembly or WebGPU, using a neural network trained for monocular depth estimation, that is, depth inference from a single still image with no stereo camera or depth sensor.
The output is a PNG (lossless), WebP (compact), or JPG (universal) file that any common image editor can read. Resolution and aspect ratio match the source; the tool automatically upsamples the internally computed map to the input size.
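The two post-processing steps described here, mapping depth to gray values and upsampling to the input size, might look roughly like the sketch below. It is an illustration under assumptions, not the tool's implementation: the function names are hypothetical, the raw model output is assumed to be a `Float32Array` in which larger values mean "closer", and bilinear interpolation is one plausible choice of upsampling filter.

```typescript
// Map raw relative depth values to 8-bit gray so that near = bright.
// Assumption: larger raw values mean "closer"; invert if the model outputs distance.
function depthToGray(raw: Float32Array): Uint8ClampedArray {
  let min = Infinity;
  let max = -Infinity;
  for (let i = 0; i < raw.length; i++) {
    if (raw[i] < min) min = raw[i];
    if (raw[i] > max) max = raw[i];
  }
  const range = max - min;
  const out = new Uint8ClampedArray(raw.length);
  for (let i = 0; i < raw.length; i++) {
    // A constant map has no depth information; render it as black.
    out[i] = range === 0 ? 0 : Math.round(((raw[i] - min) / range) * 255);
  }
  return out;
}

// Bilinear resize of a single-channel image from (sw x sh) to (dw x dh).
function resizeBilinear(
  src: Uint8ClampedArray,
  sw: number,
  sh: number,
  dw: number,
  dh: number,
): Uint8ClampedArray {
  const dst = new Uint8ClampedArray(dw * dh);
  for (let y = 0; y < dh; y++) {
    // Map the destination pixel centre into source coordinates, clamped to the image.
    const fy = Math.min(Math.max((y + 0.5) * (sh / dh) - 0.5, 0), sh - 1);
    const y0 = Math.floor(fy);
    const y1 = Math.min(y0 + 1, sh - 1);
    const wy = fy - y0;
    for (let x = 0; x < dw; x++) {
      const fx = Math.min(Math.max((x + 0.5) * (sw / dw) - 0.5, 0), sw - 1);
      const x0 = Math.floor(fx);
      const x1 = Math.min(x0 + 1, sw - 1);
      const wx = fx - x0;
      // Interpolate horizontally on both rows, then vertically between them.
      const top = src[y0 * sw + x0] * (1 - wx) + src[y0 * sw + x1] * wx;
      const bottom = src[y1 * sw + x0] * (1 - wx) + src[y1 * sw + x1] * wx;
      dst[y * dw + x] = Math.round(top * (1 - wy) + bottom * wy);
    }
  }
  return dst;
}
```

Because `depthToGray` rescales between the minimum and maximum of each map, the gray values only express which pixels are nearer or farther within one image; they carry no absolute distance.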
How Does AI Depth Estimation Work?
Estimating depth from a single image is a classic computer vision problem. Stereo methods need two captures from slightly different angles; time-of-flight sensors need dedicated hardware. With an ordinary snapshot, only a single 2D image is available — the computer has to reconstruct what’s in front and what’s behind from indirect cues.
The model relies on learned patterns for that: perspective foreshortening (parallel lines converge with depth), relative size (a person in the foreground appears larger than one in the distance), occlusion (an object in front of another is closer), texture gradients (structures get finer with distance), atmospheric scattering (distant objects lose contrast), and scene statistics learned from millions of training images. The result is relative depth information: you learn what is closer or farther, but not the absolute distance in meters.
The whole process runs in your browser. On first use the model is fetched once from a public model store (~19 MB for the fast variant, ~50 MB for the sharper one) and cached locally; after that the tool works offline. Each depth estimate then takes 3 to 15 seconds, depending on device and image size.
When Does It Produce Good Results?
Natural scenes with clear foreground/background structure are the sweet spot. Portraits, landscape shots, interiors, street scenes, architecture photos — anywhere the image shows a spatially structured composition the model produces clean depth maps. Product photos with clear background bokeh also work well.
Difficult cases fall into three categories:
- Flat, low-texture images: uniform walls, empty sky shots, monochrome backgrounds. Here the model lacks visual cues and the map becomes flat or noisy.
- Optical illusions and trompe-l’œil: intentional depth illusions in paintings, mirror reflections, and reflections in windows can confuse the model.
- Microscopic or astronomical images: microscope shots and astronomy photos lack the natural depth cues the model learned from its training data and produce unreliable estimates.
For everyday photography (smartphone shots, DSLR images, drone footage) the input matches the kind of imagery the model was trained on, and the results are usable for the typical applications.
What Can I Do With a Depth Map?
The map is a universal grayscale image that fits many workflows:
- Bokeh simulation and depth blur — in image editors like Adobe Photoshop, Affinity Photo or GIMP as a depth mask for selective blur, turning a smartphone snapshot into a shot with the professional shallow-depth-of-field look.
- Compositing between photo layers — separate foreground and background by depth mask, insert new objects spatially correctly, fake depth-of-field for stock photos.
- 3D modelling — input for Blender, Cinema 4D or other 3D software as a displacement or height map, generating a 3D terrain surface from a 2D photo.
- AR and VR effects — depth-based effects in Web AR implementations, parallax animations on websites, immersive image galleries.
- Education and research — depth maps as teaching material in computer vision courses, visualizing spatial structure in architectural photography.
The map is not suitable for autonomous vehicles, robotic manipulation, or medical depth measurement — those applications need calibrated sensors, not a relative AI estimate.
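As an illustration of the bokeh and compositing uses listed above, the sketch below derives a per-pixel blur radius and a foreground mask from the exported gray values. The function names, the linear falloff, and the `focus`, `maxRadius` and `threshold` parameters are assumptions for this example; image editors use their own, more elaborate lens-blur models.

```typescript
// Per-pixel blur radius for a simulated shallow depth of field.
// depth: 8-bit gray values from the exported map (0 = far, 255 = near).
// focus: gray value that should stay sharp (hypothetical parameter).
// maxRadius: radius, in pixels, for a pixel a full 255 gray levels away from focus.
function blurRadii(
  depth: Uint8ClampedArray,
  focus: number,
  maxRadius: number,
): Float32Array {
  const radii = new Float32Array(depth.length);
  for (let i = 0; i < depth.length; i++) {
    // Linear falloff: the farther a pixel is from the focus plane, the more blur.
    radii[i] = (Math.abs(depth[i] - focus) / 255) * maxRadius;
  }
  return radii;
}

// Binary foreground mask for compositing: 255 where the pixel is at least
// as near as the threshold, 0 elsewhere.
function foregroundMask(
  depth: Uint8ClampedArray,
  threshold: number,
): Uint8ClampedArray {
  const mask = new Uint8ClampedArray(depth.length);
  for (let i = 0; i < depth.length; i++) {
    mask[i] = depth[i] >= threshold ? 255 : 0;
  }
  return mask;
}
```

With `focus` set to the gray value of the subject, the subject gets a radius near zero and stays sharp while the background receives progressively stronger blur.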
Is My Photo Really Private?
Depth estimation happens entirely on your device. Neither the original nor the computed depth map is sent to any server, stored, or analyzed. There is no third-party cookie banner, no signup, and no tracking — not even anonymous usage analytics.
The single exception is the one-time model download on first visit: the model file is fetched once from a public model store. That request contains only the model file URL. No image data, no user IDs, and no other personal information is included. As with any web request, the model provider does see the IP address and user agent of the browser making the download. After the first fetch, the model lives in the browser cache and the model store is no longer contacted.
For sensitive material like product prototypes, confidential visuals, or unreleased shots, this is the deciding advantage over cloud tools that require uploading the file.
What Does the EU AI Act Require for AI-Generated Images?
Starting in August 2026, Article 50 of the EU AI Act requires AI-generated content to be labeled as such. The tool therefore shows a fixed, non-dismissible notice above every depth map: “This depth map was estimated by an AI model. Verify before use — AI models can misinterpret depth on optical illusions or unusual scenes.” This disclaimer is mandatory and cannot be turned off.
In practice, the map is a suggestion, not a binding measurement. For design uses (bokeh, compositing) the accuracy is more than sufficient; for safety-critical applications (autonomous systems, medical distance measurement, surveying) a calibrated sensor solution is mandatory.
Frequently Asked Questions
The most common questions about usage, quality, and privacy:
How do I make a depth map from a single photo?
Drop your photo into the tool above — the depth map is computed entirely in your browser by AI. The model estimates relative depth per pixel from image content. No stereo camera or depth sensor needed.
Does the tool work offline?
Yes. On the first visit, the browser downloads the selected AI model once (~19 MB for Fast, ~50 MB for Sharp). After that, every depth estimate runs fully offline from the browser cache.
Which image formats can I upload?
Input: PNG, JPG, WebP, AVIF and HEIC (iPhone photos). HEIC is automatically decoded before processing. Output: grayscale PNG, WebP or JPG.
How long does a depth estimate take?
After the one-time model download, an estimate typically takes 3 to 15 seconds — depending on device, selected variant, and image size.
Which Image Tools Are Related?
Other tools from the kittokit ecosystem that pair well with depth-map generation:
- Background Remover — AI cutout, often the prep step for depth-based compositing.
- Image Upscaler — upscale small input images first so the depth map gets more detail.
- Photo to Coloring Page — turn photos into line drawings, complementary to depth information.
- Image Format Converter — convert depth maps to other formats (PNG lossless to WebP compact).
- EXIF Viewer — read original metadata (camera, focal length, GPS) alongside the depth estimate.