Perceptual Video Coding Based on SSIM-Inspired Divisive Normalization

We propose a perceptual video coding framework based on the divisive normalization scheme, which has been found to be an effective approach to modeling the perceptual sensitivity of biological vision but has not been fully exploited in the context of video coding. At the macroblock (MB) level, we derive the normalization factors based on the structural similarity (SSIM) index in an attempt to transform the domain frame residuals to a perceptually uniform space. We further develop an MB level perceptual mode selection scheme and a frame level. The proposed method can achieve significant gain in terms of rate-SSIM performance and provide better visual quality.
However, existing video coding techniques typically use the sum of absolute differences (SAD) or the sum of squared differences (SSD) as the distortion model, and both measures have been widely criticized in the literature for their poor correspondence with perceptual quality.
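For concreteness, the two pixel-wise distortion measures named above can be sketched as follows. This is a minimal illustration rather than code from any particular encoder; the function names `sad` and `ssd` and the sample pixel values are hypothetical.

```python
def sad(original, reconstructed):
    """Sum of absolute differences between two equally sized pixel lists."""
    return sum(abs(o - r) for o, r in zip(original, reconstructed))


def ssd(original, reconstructed):
    """Sum of squared differences between two equally sized pixel lists."""
    return sum((o - r) ** 2 for o, r in zip(original, reconstructed))


# Hypothetical 4-pixel block and its reconstruction (illustrative values only).
block = [10, 20, 30, 40]
recon = [12, 18, 30, 44]
print(sad(block, recon), ssd(block, recon))  # prints "8 24"
```

Both measures depend only on the per-pixel error values, not on the content in which those errors occur.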
For many years, there have been numerous efforts to develop subjective-equivalent quality models that generate quality scores close to the opinions of human viewers, but this goal has not yet been achieved.
In recent years, the structural similarity (SSIM) index has become a popular image quality measure in various image and video processing areas because it offers a good compromise between quality evaluation accuracy and computational efficiency.
One major advantage of the SSIM index is that it is fully adaptive to the reference signal, and it therefore adapts automatically to the properties of the video content.
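The content-adaptive behavior can be illustrated with a small sketch. It assumes the standard SSIM definition with the commonly used constants K1 = 0.01, K2 = 0.03 and a dynamic range of L = 255, evaluated once over a whole block; practical implementations typically use local sliding windows instead. The function name `ssim_block` and the sample pixel values are hypothetical.

```python
def ssim_block(x, y, dynamic_range=255, k1=0.01, k2=0.03):
    """SSIM between two equally sized pixel lists, computed over the whole block."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Unbiased sample variances and covariance.
    var_x = sum((a - mean_x) ** 2 for a in x) / (n - 1)
    var_y = sum((b - mean_y) ** 2 for b in y) / (n - 1)
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
    # Stabilizing constants (assumed standard values).
    c1 = (k1 * dynamic_range) ** 2
    c2 = (k2 * dynamic_range) ** 2
    numerator = (2 * mean_x * mean_y + c1) * (2 * cov_xy + c2)
    denominator = (mean_x ** 2 + mean_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


# The same error pattern is added to a flat and to a textured reference block.
error = [5, -5, 5, -5, 5, -5, 5, -5]
flat_ref = [100] * 8
textured_ref = [20, 200, 40, 180, 60, 160, 80, 140]
flat_dist = [p + e for p, e in zip(flat_ref, error)]
textured_dist = [p + e for p, e in zip(textured_ref, error)]

ssim_flat = ssim_block(flat_ref, flat_dist)
ssim_textured = ssim_block(textured_ref, textured_dist)
print(ssim_flat, ssim_textured)
```

Both distorted blocks have the same sum of squared differences, yet the flat block receives a lower SSIM score than the textured one, because the variance terms scale the penalty according to the reference content.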