Selective Search Algorithm For Object Detection Explained
Feb 4, 2019
Ayoub Benali Amjoud
3 minute read

Selective search is a region proposal algorithm used in object detection that combines the strengths of exhaustive search and segmentation. It is designed to be fast while achieving very high recall. It is based on hierarchical segmentation: it starts from small superpixels and merges them according to color, texture, size, and shape compatibility.
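To get a feel for the output before diving into the details, OpenCV's contrib package ships an implementation. A minimal sketch, assuming `opencv-contrib-python` is installed and an `input.jpg` exists locally:

```python
import cv2

img = cv2.imread("input.jpg")

# Selective search lives in the ximgproc contrib module of OpenCV.
ss = cv2.ximgproc.segmentation.createSelectiveSearchSegmentation()
ss.setBaseImage(img)
ss.switchToSelectiveSearchFast()   # trades some recall for speed

rects = ss.process()               # array of (x, y, w, h) proposals
print(f"{len(rects)} region proposals")

# Draw the first 100 proposals.
for (x, y, w, h) in rects[:100]:
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 1)
cv2.imwrite("proposals.jpg", img)
```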

How does selective search work?


Selective search begins by over-segmenting the image based on pixel intensity, using the graph-based segmentation method of Felzenszwalb and Huttenlocher.

An over-segmented image looks like this:

[Figure: an over-segmented image]

Step 1: Generate the starting sub-segmentation

Purpose: generate many initial regions, each of which is part of no more than one object, using the graph-based approach of Felzenszwalb and Huttenlocher.

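scikit-image provides this segmentation method directly. A minimal sketch of generating the starting regions (the parameter values are illustrative):

```python
from skimage import io, segmentation
from skimage.color import label2rgb

img = io.imread("input.jpg")

# Each pixel gets an integer segment label; smaller `scale` -> more, smaller segments.
labels = segmentation.felzenszwalb(img, scale=100, sigma=0.8, min_size=50)
print(f"{labels.max() + 1} initial segments")

# Visualise by painting each segment with its mean colour (assumes an 8-bit input image).
io.imsave("oversegmented.jpg", label2rgb(labels, img, kind="avg").astype("uint8"))
```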

Step 2: Recursively merge similar regions into larger ones

A greedy algorithm groups regions iteratively (a sketch follows this list):

1. From the list of regions, select the two that are most similar.
2. Merge them into a single, larger region.
3. Repeat until only one region remains.

The result is a hierarchy of increasingly large regions, which is exactly what we want.
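Here is a minimal, hypothetical sketch of this grouping loop in Python. The region representation and the `sim(a, b)` and `merge(a, b)` helpers are placeholders, and for brevity this sketch compares all pairs of regions rather than only neighbours as the real algorithm does:

```python
import itertools

def hierarchical_grouping(regions, sim, merge):
    """regions: {id: region}. sim(a, b) -> float, merge(a, b) -> new region."""
    proposals = list(regions.values())          # the initial regions count too
    # Similarities for every pair (the paper restricts this to neighbours).
    S = {frozenset(p): sim(regions[p[0]], regions[p[1]])
         for p in itertools.combinations(regions, 2)}
    next_id = max(regions) + 1
    while len(regions) > 1:
        i, j = max(S, key=S.get)                          # 1. most similar pair
        merged = merge(regions.pop(i), regions.pop(j))    # 2. merge them
        # Drop every similarity involving the two merged regions ...
        S = {p: v for p, v in S.items() if i not in p and j not in p}
        # ... and compute similarities between the new region and the rest.
        for k in regions:
            S[frozenset((next_id, k))] = sim(merged, regions[k])
        regions[next_id] = merged
        proposals.append(merged)    # every intermediate region is a proposal
        next_id += 1                # 3. repeat until one region remains
    return proposals
```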


Step 3: Use the generated regions to produce candidate object locations


At every iteration, larger segments are formed and added to the list of region proposals. Region proposals are therefore generated from smaller to larger segments in a bottom-up fashion. This is what is meant by computing “hierarchical” segmentations on top of the Felzenszwalb and Huttenlocher over-segmentation.

[Figure: the initial, central, and final steps of the hierarchical segmentation process]
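To make the bottom-up behaviour concrete, here is a toy run of the `hierarchical_grouping` sketch above, where a hypothetical “region” is reduced to its mean intensity:

```python
regions = {0: 0.10, 1: 0.15, 2: 0.80, 3: 0.85}   # four tiny fake regions
sim = lambda a, b: 1.0 - abs(a - b)              # closer intensities = more similar
merge = lambda a, b: (a + b) / 2.0               # merged region = mean intensity

proposals = hierarchical_grouping(regions, sim, merge)
print(proposals)   # 4 initial regions + 3 merges = 7 proposals, small to large
```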

Similarity metrics:

For two regions, selective search defines four similarity measures, based on color, texture, size, and shape compatibility:

Color Similarity:

A color histogram C with 25 bins is computed for each channel of region r, and the histograms of all channels are concatenated, giving a color descriptor of 25×3 = 75 dimensions. Similarity is then measured by histogram intersection:

$$s_{color}(r_i, r_j) = \sum_{k=1}^{n} \min(c_i^k, c_j^k)$$

where $c_i^k$ is the $k$-th bin of the (normalized) color histogram of region $r_i$ and $n = 75$.
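A sketch of this descriptor and of histogram intersection, assuming a region is given as an array of pixel values in [0, 255]:

```python
import numpy as np

def colour_descriptor(pixels):
    """pixels: (N, 3) array of a region's pixel values in [0, 255]."""
    hists = [np.histogram(pixels[:, c], bins=25, range=(0, 255))[0]
             for c in range(3)]
    h = np.concatenate(hists).astype(float)    # 25 x 3 = 75 dimensions
    return h / h.sum()                         # L1-normalise

def hist_intersection(h_i, h_j):
    """Sum of bin-wise minima of two normalised histograms."""
    return float(np.minimum(h_i, h_j).sum())
```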

Texture Similarity:

Textures are measured with a “HOG-like” feature (a sketch follows the equation below):

1. Extract Gaussian derivatives of the image in 8 orientations for each color channel.
2. Build a 10-bin histogram for every orientation and every color channel, giving a feature descriptor of 10×8×3 = 240 dimensions.

The texture similarity of two regions is again computed by histogram intersection:

$$s_{texture}(r_i, r_j) = \sum_{k=1}^{n} \min(t_i^k, t_j^k)$$

where $t_i^k$ is the $k$-th bin of the texture histogram of region $r_i$ and $n = 240$.
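A rough sketch of such a descriptor using SciPy's Gaussian derivative filters; the choice of `sigma` and the binning are illustrative assumptions, and the similarity itself reuses `hist_intersection` from above:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def texture_descriptor(img):
    """img: (H, W, 3) float array. Returns a 10 x 8 x 3 = 240-dim histogram."""
    feats = []
    for c in range(3):
        ch = img[:, :, c]
        dy = gaussian_filter(ch, sigma=1.0, order=(1, 0))   # derivative along y
        dx = gaussian_filter(ch, sigma=1.0, order=(0, 1))   # derivative along x
        for k in range(8):                                  # 8 orientations
            theta = k * np.pi / 8
            d = np.cos(theta) * dx + np.sin(theta) * dy     # directional derivative
            feats.append(np.histogram(d, bins=10)[0])
    f = np.concatenate(feats).astype(float)
    return f / f.sum()
```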

Size similarity:

Size similarity encourages smaller regions to merge early and guarantees that region proposals at all scales are formed in every part of the image. If this measure were not considered, a single region would swallow up all the smaller adjacent regions one by one, and region proposals at multiple scales would be generated only at that location. The size similarity is defined as follows:

$$s_{size}(r_i, r_j) = 1 - \frac{size(r_i) + size(r_j)}{size(im)}$$

where $size(im)$ is the size of the image in pixels.
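In code this is a one-liner over pixel counts (a sketch):

```python
def s_size(size_i, size_j, size_im):
    """size_* are pixel counts; near 1 for two tiny regions, near 0 for huge ones."""
    return 1.0 - (size_i + size_j) / size_im
```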

Shape Compatibility:


We also want merged regions to be cohesive. Shape compatibility measures how well two regions fit into each other: if one fills a gap in the other, they are good candidates for merging; if they barely touch, they should not be merged. It is defined as follows:

$$s_{fill}(r_i, r_j) = 1 - \frac{size(BB_{ij}) - size(r_i) - size(r_j)}{size(im)}$$

where $BB_{ij}$ is the tight bounding box around $r_i$ and $r_j$.
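A sketch, representing each region by its pixel count and its tight bounding box `(x1, y1, x2, y2)`:

```python
def s_fill(size_i, box_i, size_j, box_j, size_im):
    """How tightly the two regions fill their joint bounding box BB_ij."""
    x1 = min(box_i[0], box_j[0]); y1 = min(box_i[1], box_j[1])
    x2 = max(box_i[2], box_j[2]); y2 = max(box_i[3], box_j[3])
    size_bb = (x2 - x1) * (y2 - y1)
    return 1.0 - (size_bb - size_i - size_j) / size_im
```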

Final similarity metric:

The final similarity between two regions is a linear combination of the four metrics above:

$$s(r_i, r_j) = a_1 s_{color}(r_i, r_j) + a_2 s_{texture}(r_i, r_j) + a_3 s_{size}(r_i, r_j) + a_4 s_{fill}(r_i, r_j)$$

where $r_i$ and $r_j$ are two regions or segments in the image and each $a_k \in \{0, 1\}$ denotes whether the corresponding similarity measure is used or not. A diversified collection of region-merging strategies can then be built by considering different weighted combinations in various color spaces.
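Given the four measure values for a pair of regions, the combination is straightforward; a sketch, with each `a[k]` acting as an on/off switch:

```python
def final_similarity(s_colour, s_texture, s_size, s_fill, a=(1, 1, 1, 1)):
    """a[k] in {0, 1} switches the k-th measure on or off."""
    return (a[0] * s_colour + a[1] * s_texture
            + a[2] * s_size + a[3] * s_fill)
```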