Meta has announced the latest iteration of its Segment Anything Model (SAM). Segment Anything Model 2 (SAM 2) is the successor to SAM, which was released a little over a year ago. The tech giant says the new model supports object segmentation in both videos and images.
“We’re releasing SAM 2 under an Apache 2.0 license, so anyone can use it to build their own experiences. We’re also sharing SA-V, the dataset we used to build SAM 2 under a CC BY 4.0 license and releasing a web-based demo experience where everyone can try a version of our model in action,” Meta said in its official release.
What is SAM?
SAM comes out of Meta’s FAIR (Fundamental AI Research) lab and is seen as a major leap in computer vision. For the uninitiated, computer vision is a field of artificial intelligence that allows computers to interpret visual data such as images and videos. So in essence, SAM is an AI model that advances the state of computer vision. It is a state-of-the-art segmentation model known for performing complex image segmentation tasks with precision and versatility. For industries, SAM breaks visual data down into meaningful segments, enabling precise analysis and new applications.
When it comes to SAM 2, Meta claims the new model can identify the pixels in an image that correspond to an object of interest, which is one of the most fundamental tasks in computer vision. The previous version of SAM served as a foundation model for this task on images.
Meta has described SAM 2 as its first unified model for real-time, promptable object segmentation in images and videos, calling it a step change in the video segmentation experience. On accuracy, Meta says SAM 2 outperforms the original SAM on images and achieves better video segmentation performance than existing work. The new model can also segment any object in any video or image, including visual content it has never seen before, without custom adaptation.
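For a concrete sense of what "promptable" segmentation means in practice, here is a minimal sketch of a single-click image prompt. It is modeled on the open-source sam2 package Meta released alongside the model; the class name, checkpoint identifier, and call signatures shown are assumptions drawn from that repository and may differ between versions.

```python
# Illustrative sketch: prompt SAM 2 with one click and get a mask back.
# Names follow the public facebookresearch/sam2 repository (assumed).
import numpy as np
from PIL import Image
from sam2.sam2_image_predictor import SAM2ImagePredictor

# Load a pretrained SAM 2 checkpoint (checkpoint name is illustrative).
predictor = SAM2ImagePredictor.from_pretrained("facebook/sam2-hiera-large")

# Prepare the image the model should segment.
image = np.array(Image.open("photo.jpg").convert("RGB"))
predictor.set_image(image)

# A single foreground click (x, y) on the object of interest.
masks, scores, _ = predictor.predict(
    point_coords=np.array([[500, 375]]),
    point_labels=np.array([1]),  # 1 = foreground click, 0 = background
)

# Keep the highest-scoring mask: the pixels the model thinks belong to the object.
best_mask = masks[np.argmax(scores)]
```

The key idea is that the prompt is just a point (or a box), and the model returns the pixel mask for whatever object that prompt lands on.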
To put it simply, SAM 2 helps computers understand videos by identifying and following objects across frames. Imagine tracking a dog in a video: SAM 2 uses a simple signal, such as a click on the dog, to find it in one frame, then remembers it and keeps tracking it across subsequent frames, even if the animal is hidden for a while and reappears. SAM 2 is like a smart assistant that knows when the dog is occluded or mixed in with other objects in the scene. A tool like SAM 2 makes it easier and faster for people to create video annotations, detailed notes about everything that is happening in the video.
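The dog-tracking description above maps roughly onto the video predictor in the released code. The sketch below is illustrative only: the predictor class and the init_state, add_new_points_or_box, and propagate_in_video calls are assumptions based on the public sam2 repository, and the exact interface may vary by release.

```python
# Hedged sketch: click on an object once, then let SAM 2 track it through the video.
import numpy as np
import torch
from sam2.sam2_video_predictor import SAM2VideoPredictor

predictor = SAM2VideoPredictor.from_pretrained("facebook/sam2-hiera-large")

with torch.inference_mode():
    # Load the video (the released code expects a directory of frames or a video path).
    state = predictor.init_state("dog_video_frames/")

    # One foreground click on the dog in the first frame.
    predictor.add_new_points_or_box(
        inference_state=state,
        frame_idx=0,
        obj_id=1,                                     # our label for "the dog"
        points=np.array([[420, 260]], dtype=np.float32),
        labels=np.array([1], dtype=np.int32),         # 1 = foreground click
    )

    # The model's memory carries the object forward, including across occlusions.
    for frame_idx, obj_ids, mask_logits in predictor.propagate_in_video(state):
        masks = (mask_logits > 0.0).cpu().numpy()     # per-object masks for this frame
```

This is the workflow the annotation use case relies on: one prompt, then automatic masks for every remaining frame, which a human can correct where needed.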
A model like SAM 2 can have a broad range of use cases across industries. In film and media, it could help automate video editing and special effects, improving efficiency in post-production. In healthcare, it could help analyse surgical videos and diagnostic imaging to gain better insights. It could support threat detection in security and surveillance, and in retail it could enhance inventory management. In robotics, SAM 2 could be pathbreaking, allowing robots to navigate and interact with objects in their surroundings more efficiently.