Meta launched SAM 3D, a suite of AI models that reconstructs detailed 3D objects, scenes, and human bodies from single 2D photos.
Key Details:
From photos to explorable 3D - Upload a single image, select objects or people, and generate posed 3D models you can manipulate or view from different angles (see the sketch after this list). Meta's calling this "grounded 3D reconstruction" because it targets everyday physical-world scenes, not just synthetic or staged studio shots.
5:1 win rate vs. competitors - In head-to-head human preference tests, SAM 3D Objects beat existing methods by at least a 5:1 margin. The system reconstructs full textured meshes in seconds, fast enough for real-time robotics or VFX workflows.
Segment Anything Playground live - A public demo environment where anyone can try the models: upload your own photos, reconstruct people or objects, and export the results.
Training data advantage - SAM 3D Objects was trained on almost 1 million physical-world images with ~3.14 million model-in-the-loop meshes. SAM 3D Body used approximately 8 million images, including rare poses and occlusions. This is vastly more diverse than the isolated synthetic 3D assets most models train on.
Already in production - Facebook Marketplace's new "View in Room" feature uses SAM 3D to let buyers visualize furniture in their homes before purchase.
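To make the single-photo workflow above concrete, here's a minimal sketch of handling a reconstructed asset once you have one, assuming a GLB file already exported from SAM 3D or the Playground. The file name is illustrative, and the mesh handling uses the open-source trimesh library rather than anything SAM 3D-specific.

```python
# Minimal sketch: inspect, rotate, and re-export a reconstructed mesh.
# Assumes a GLB exported from SAM 3D / the Playground; file names are illustrative.
import numpy as np
import trimesh

# Load the reconstructed asset; force="mesh" flattens a GLB scene into a single mesh.
mesh = trimesh.load("reconstructed_couch.glb", force="mesh")

# Quick sanity checks before handing the asset to a DCC tool or game engine.
print(f"vertices: {len(mesh.vertices)}, faces: {len(mesh.faces)}")
print(f"bounding box extents (x, y, z): {mesh.bounding_box.extents}")

# "Manipulate or view from different angles": spin 90 degrees about the vertical axis.
spin = trimesh.transformations.rotation_matrix(angle=np.pi / 2, direction=[0, 1, 0])
mesh.apply_transform(spin)

# Export to OBJ for pipelines that don't ingest GLB directly.
mesh.export("reconstructed_couch.obj")
```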
The trade-offs: Resolution is moderate, so complex objects and full-body human reconstructions can lose fine detail. The model predicts objects individually, meaning it doesn't reason about physical interactions between multiple objects in a scene. Hand pose estimation works but doesn't beat specialized hand-only models.
For VFX, virtual production, or previsualization teams, the value is in speed and accessibility. This isn't photogrammetry-level fidelity, but it's a practical path from reference photos to manipulable 3D assets without scanning rigs or modeling labor. Meta is sharing the code, benchmarks, and MHR format, which should accelerate research and tooling around physical-world 3D capture.
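As a rough illustration of that reference-photo-to-asset path, the sketch below drops a reconstructed mesh into a bare previz scene, scales it to a plausible real-world size, and exports the result. Everything here is assumed for illustration (file names, a y-up scene, a 2.2 m target width), and it uses trimesh, not Meta's tooling or the actual "View in Room" implementation.

```python
# Rough previz sketch: place a reconstructed asset in a bare room scene.
# All names and dimensions are illustrative; this is not Meta's "View in Room" code.
import trimesh

# A 4 m x 4 m floor slab (y-up), plus the reconstructed asset.
floor = trimesh.creation.box(extents=[4.0, 0.05, 4.0])
couch = trimesh.load("reconstructed_couch.glb", force="mesh")

# Scale so the asset's longest side matches an assumed real-world width (~2.2 m couch).
target_width = 2.2
couch.apply_scale(target_width / max(couch.bounding_box.extents))

# Rest the asset on top of the floor and push it toward the back of the room.
floor_top = floor.bounds[1][1]          # highest y of the floor slab
lift = floor_top - couch.bounds[0][1]   # raise so the asset's lowest point touches the floor
couch.apply_translation([0.0, lift, -1.2])

# Compose and export the scene; scene.show() opens an interactive viewer instead.
scene = trimesh.Scene([floor, couch])
scene.export("previz_room.glb")
```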


