- Introduction
- Basics of Image Segmentation
- Graph Cuts in Image Segmentation
- How GraphCut Works
- Additional Information
- Further Learning Resources
- Running Instructions
Image segmentation allows devices to identify individual items in images. Consequently, apps can change backgrounds or erase subjects because of this capability. This process in a computer allows machines to isolate different elements that exist in an image.
When people look at photos, they automatically segment people, objects, and backgrounds without even realizing it. Brains automatically detect edges between objects. Image segmentation allows computers to do the same so that they can segment couches from cats, people from sunsets, or tables from apples.
Glancing at a family photograph, the human vision instinctively recognizes individual family members, furniture pieces, pets, and other things in the photograph. The human vision recognizes those things naturally.
Computers view images differently. An image is just a collection of colored pixels to a computer. Image segmentation assists computers in making sense of these pixels by grouping them into significant areas. The process is like viewing a mosaic at different distances. Close up, only single pieces of the mosaic are observed. Middle distance, there are patterns. Distant, the whole picture is visible. Segmentation allows computers to view the big picture rather than single pixels.
Mathematically, image segmentation is a partition problem. Given an image represented as a function
-
$\bigcup_{i=1}^{n} S_i = I$ (complete coverage) -
$S_i \cap S_j = \emptyset$ for all$i \neq j$ (non-overlapping regions) - Each
$S_i$ is connected - Pixels within each
$S_i$ satisfy a predefined homogeneity criterion
The homogeneity criterion can be based on various features:
- Intensity: Using thresholding with a threshold
$T$ :
- Color
- Texture
Simple segmentation merely clumps pixels of the same color (like blue pixels that may be defining sky). More advanced techniques detect abrupt color transitions that define edges (like the edge of a person). The most advanced techniques identify objects by shape and context.
Graph cuts are a powerful image segmentation tool. A picture can be imagined as a neighborhood where there is a house for every pixel. Every house shares a common border with its neighboring houses (neighbor pixels). There are two community centers for the neighborhood - one for "foreground" people and another one for "background" people.
Mathematically, an image is represented as a weighted graph
- Vertices
$V$ correspond to pixels or superpixels - Edges
$E$ connect neighboring elements - Edge weights
$w(u,v)$ represent the dissimilarity between vertices
For pixels
Where
Each home has paths to both centers. More substantial paths are for stronger connections. More vibrant red houses may have stronger connections to the "foreground" center when trying to find a red apple in a picture.
The task is to determine where to place a wall in this neighborhood, splitting the two populations in front and rear. The objective is to not cut across main thoroughfares as much as possible. This wall is the border of the object and the background.
This approach has the following advantages:
- Examines the entire image as a whole rather than individual pixels
- Both pixel characteristics and neighboring pixel interactions are evaluated
- Allows people to contribute, such as labelling some areas as foreground.
GrabCut is an easy-to-use separator for image parts. Here's how it functions:
- Rectangle Selection: Users put a rectangle over the subject to be separated. The rectangle says "the subject is somewhere in this area." Everything outside of this rectangle automatically becomes the background.
- Color Analysis: The system processes color patterns in both possible foreground (within the rectangle) and certain background (beyond the rectangle). For instance, while choosing a golden retriever, the system can determine tan/gold colors as possible foreground and green grass colors as background.
Mathematically, GrabCut models pixel color distributions using Gaussian Mixture Models (GMMs):
Where:
-
$\pi_k$ ,$\mu_k$ , and$\Sigma_k$ are the weight, mean, and covariance of the$k$ -th Gaussian component -
$K$ is typically 5 for both foreground and background models -
Building a Connection Network: The system constructs a network of relationships between all the pixels with foreground or background probability based on color. Visually similar pixels link to one another.
-
Boundary Determination: It determines the most natural boundary line between the foreground and background. It detects the shortest path in the image that separates the object from the surrounding objects.
-
Iterative Refinement: Once estimated, the system improves its knowledge of foreground and background colors, and then recalculates. The process is repeated a number of times until generating optimal results.
The energy function combines color models with smoothness constraints:
Where:
-
$U$ is the data term based on color models
-
$V$ enforces spatial coherence:
GrabCut allows users to intervene when errors occur. If an area of the subject is not chosen, users can mark that region as clear foreground. Similarly, regions of the background that were incorrectly added can be designated as clear background. This combines human decisions with computer processing.
The cv::grabCut function implements the GrabCut image segmentation algorithm with the following parameters:
img: Input 8-bit 3-channel image.mask: Input/output 8-bit single-channel mask initialized by the function.rect: ROI containing a segmented object, used when mode isGC_INIT_WITH_RECT.bgdModelandfgdModel: Temporary arrays for the background and foreground models.iterCount: Number of iterations for the algorithm.mode: Operation mode, one of theGrabCutModes.
To deepen your understanding of image segmentation beyond GrabCut, consider exploring:
- Watershed Algorithm: A region-based approach that treats an image as a topographic surface
- Mean Shift Segmentation: A feature-space analysis technique for locating maxima of a density function
- K-means Clustering: A simple but effective method for partitioning pixels into k clusters
For practical implementation, the OpenCV library provides implementations of many segmentation algorithms:
cv::watershed()for watershed segmentation: Official-Linkcv::pyrMeanShiftFiltering()for mean shift segmentation: Official-Link- K-means clustering can be implemented using
cv::kmeans(): Offical-Link
Each approach has its strengths and weaknesses, making them suitable for different applications and image types. Refer Opencv official documentation for getting into this functionalities.
-
Navigate to ../Pixels_Seminar/4_cv_basics/9_image_segmentation
-
Open terminal in this folder
-
run
make cleanto clean out any previous builds -
run
make SRC=main.cpp link=src/segmentation.cppto build the executable -
run
./image_segmentation -
Using the Program
- After you run the program, a window will open showing the default image (
car.jpeg)
- Use your mouse to draw a rectangle around the object you want to segment:
- Left-click and hold to start drawing
- Drag to create the rectangle
- Release to finish drawing and press enter
- The program will automatically apply the GrabCut algorithm
- Results will be displayed in a new window
- Press any key to exit
- After you run the program, a window will open showing the default image (