Meta, formerly known as Facebook, has introduced a new AI tool called the “Segment Anything Model,” or “SAM,” aimed at simplifying photo analysis for researchers and web developers. The tool lets users create “cutouts,” or segments, of any object in an image by clicking on a point or drawing a box around the item. These cutouts can support tasks such as editing photos, analyzing surveillance footage, and identifying the components of cells.
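The core idea, a model that turns a prompt (a clicked point or a box) into a per-object mask, can be illustrated with a small sketch. This is not SAM's internals, just a schematic of what a segmentation mask and a “cutout” are: the mask is a boolean array matching the image's height and width, and the cutout keeps only the masked pixels.

```python
import numpy as np

def apply_mask(image: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Zero out every pixel outside the mask, producing a cutout.

    image: (H, W, 3) array; mask: (H, W) boolean array, True on the object.
    """
    # Broadcast the (H, W) mask over the color channels.
    return np.where(mask[..., None], image, 0)

# Toy 4x4 RGB image, and a mask covering the top-left 2x2 block --
# standing in for the mask a model like SAM would return for a clicked point.
image = np.arange(4 * 4 * 3, dtype=np.uint8).reshape(4, 4, 3)
mask = np.zeros((4, 4), dtype=bool)
mask[:2, :2] = True

cutout = apply_mask(image, mask)
```

Inside the mask the original pixels survive; everywhere else the cutout is black. In practice, SAM produces the mask itself; downstream tools apply it much like this.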
One of SAM’s key advantages is its extensive training dataset, comprising 1.1 billion segmentation masks drawn from 11 million images licensed from an undisclosed photo company. Meta AI worked with 130 human annotators in Kenya to build the dataset, combining manual and automatic labeling.
While object recognition and computer vision technologies have existed for some time, Meta’s approach differs in pairing a foundation model with computer vision, a notable advancement in the field. Paul Powers, CEO and founder of Physna, a 3D object search engine, acknowledges the novelty of Meta’s approach, particularly the size of its training dataset.
“I wouldn’t say that this is a new area of technology. Object segmentation already exists so I wouldn’t say this is a new capability. Fundamentally, I think their approach of using foundational models is new and the size of the dataset they’re training on could be novel,” says Paul Powers.
Although object segmentation itself is not new, Meta has released these tools to the broader public to encourage users to build more specific applications in areas like biology and agriculture. The release fits Meta’s larger push into generative AI, including ads across Instagram and Facebook: CEO Mark Zuckerberg has assembled a dedicated product team focused on building generative AI tools.
The SAM tool caters to users who lack the AI infrastructure or data capacity to train their own image-segmentation models. Because it is browser-based and runs in real time, SAM is accessible to a wide audience without powerful GPU systems, opening up a range of edge use cases.
Still, a computer vision model trained solely on two-dimensional images has limitations. Detecting and selecting a remote control held upside down, for instance, would require training on different orientations of the object. Models based on 2D images may also struggle to identify non-standardized objects viewed through AR/VR headsets, or to detect partially occluded objects in public spaces when used by autonomous vehicle manufacturers.
Despite these limitations, Meta envisions several applications for its object detection tool, particularly in virtual reality spaces like its online VR game, Horizon Worlds. The tool could prove useful for “gaze-based” detection of objects through VR and AR headsets. The model can also extend to other domains, including underwater, microscopic, aerial, and agricultural imagery. The idea for a generalized image-segmentation model grew out of conversations with PhD researchers who needed object detection for specific research applications, such as counting and identifying trees to study wildfires in California.