This blog post explores how watermarks embedded in digital photos survive editing and compression, and compares the technical characteristics of embedding them in the spatial domain and in the frequency domain.
Digital watermarking refers to the technology of invisibly embedding specific identifiers, or watermarks, into digital photos. The embedded identifier can be extracted through specific procedures and used as evidence to prove the photo’s copyright. Therefore, watermarking must ensure a certain level of robustness so that the watermark can be extracted in a form close to its original state, even if the original image undergoes editing such as rotation, cropping, or resizing, or is compressed. Furthermore, inserting a watermark must not alter the original data storage format, and invisibility must be maintained so that the embedded identifier is not easily detectable.
Digital photo data consists of brightness values for pixels arranged in a grid of rows and columns. Representing these brightness values as a two-dimensional array is called the spatial domain approach. In this representation, a watermark can be inserted by slightly adjusting the brightness of pixels in areas that are less noticeable to the human eye. For example, the image data of a specific trademark can be added to, or multiplied with, the pixel values in a designated area. Because the identifier is written directly into pixel values, insertion and extraction require relatively little computation and are fast. The drawback is that a watermark inserted this way is confined to a specific region, so it is easily damaged by simple image processing such as cropping, or by lossy data compression.
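The additive case can be sketched in a few lines. This is a minimal illustration, not a production scheme: the image is a synthetic 8x8 grayscale grid, the "trademark" is a 2x2 binary pattern, and the function names, the `strength` value, and the non-blind extraction (which compares against the original image) are all assumptions made for the example.

```python
# Minimal sketch of additive spatial-domain watermarking on a grayscale image
# stored as a list of rows (values 0-255). All names are illustrative.

def embed_spatial(image, mark, top, left, strength=4):
    """Add strength * bit to each pixel of the region that starts at (top, left)."""
    out = [row[:] for row in image]              # copy so the original is untouched
    for r, mark_row in enumerate(mark):
        for c, bit in enumerate(mark_row):
            value = out[top + r][left + c] + strength * bit
            out[top + r][left + c] = max(0, min(255, value))   # clamp to 8-bit range
    return out

def extract_spatial(marked, original, shape, top, left, strength=4):
    """Non-blind extraction: read each bit from the difference to the original."""
    rows, cols = shape
    return [
        [1 if marked[top + r][left + c] - original[top + r][left + c] >= strength / 2 else 0
         for c in range(cols)]
        for r in range(rows)
    ]

# Synthetic 8x8 image with brightness values between 100 and 119.
original = [[100 + ((r * 7 + c * 3) % 20) for c in range(8)] for r in range(8)]
mark = [[1, 0], [0, 1]]                          # tiny 2x2 binary "logo"

marked = embed_spatial(original, mark, top=6, left=6)
recovered = extract_spatial(marked, original, (2, 2), top=6, left=6)

# Cropping away the last two rows and columns removes the marked region entirely,
# so the cropped image is identical to the cropped original.
cropped = [row[:6] for row in marked[:6]]
cropped_original = [row[:6] for row in original[:6]]
survives_crop = cropped != cropped_original
```

Here `recovered` equals `mark` as long as the image is untouched, while `survives_crop` is `False`: because the watermark lives in one corner, a crop that removes that corner removes the watermark with it.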
This issue can be partially mitigated by using the frequency domain. Spatial frequency describes how often brightness varies per unit distance; it measures oscillation across position in the image, not over time. In a digital photo, the more frequently brightness changes along a given direction, the higher the spatial frequency in that direction, and it reaches its maximum when brightness alternates sharply between adjacent pixels. Applying this principle, a digital photo can be represented as a distribution of spatial frequencies over the horizontal and vertical directions. This distribution, stored as a two-dimensional array, is called the spatial frequency spectrum, and representing a photograph by its spectrum is known as the frequency domain approach. Photographic data in the spatial domain can be converted losslessly to the frequency domain through a mathematical transform such as the Fourier transform, and the inverse transform recovers the original data.
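To make spatial frequency concrete, the sketch below applies a one-dimensional discrete Fourier transform to a single row of pixels. It is a simplification: a real photo would use a two-dimensional transform over rows and columns, and practical code would use an FFT library rather than this direct summation. The two sample rows and the helper names are invented for the example.

```python
import cmath

def dft(signal):
    """Direct discrete Fourier transform of a 1D sequence."""
    n = len(signal)
    return [sum(signal[x] * cmath.exp(-2j * cmath.pi * k * x / n) for x in range(n))
            for k in range(n)]

def idft(spectrum):
    """Inverse transform; recovers the sequence that dft() was given."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * x / n) for k in range(n)) / n
            for x in range(n)]

def dominant_frequency(row):
    """Index (1..n/2) of the strongest frequency bin, ignoring the mean (bin 0)."""
    magnitudes = [abs(v) for v in dft(row)]
    return max(range(1, len(row) // 2 + 1), key=lambda k: magnitudes[k])

# One slow brightness cycle across the row: low spatial frequency.
smooth = [228, 199, 128, 57, 28, 57, 128, 199]
# Brightness flips at every pixel: the highest frequency an 8-pixel row can hold.
alternating = [0, 255, 0, 255, 0, 255, 0, 255]

low_bin = dominant_frequency(smooth)
high_bin = dominant_frequency(alternating)

# Forward then inverse transform gives back the original pixel values
# (up to floating-point rounding, removed here with round()).
restored = [round(v.real) for v in idft(dft(alternating))]
```

The smooth row peaks at bin 1 and the alternating row at bin 4, the top of the range for eight samples, and `restored` matches `alternating`, which is the lossless round trip described above.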
To insert a watermark in the frequency domain, the spatial domain data is first transformed into the frequency domain, the identifier data is inserted into a specific frequency band, and the result is transformed back to the spatial domain. Because every frequency component is built from all the pixels of the image, an identifier inserted into a specific band ends up dispersed across the entire image rather than stored in one region. A watermark inserted in this manner remains largely invisible to the human eye, and even after editing such as cropping, it can be partially restored from the identifier data that remains in the surviving areas. However, the forward and inverse transforms significantly increase the computational load of watermark insertion and extraction. Additionally, since the inserted identifier data appears as noise in the spatial domain, the entire image may look blurred or distorted if the watermark is embedded too strongly.
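The transform, insert, inverse-transform sequence can be sketched as follows. This example swaps in a discrete cosine transform (DCT) for the Fourier transform because it keeps all values real; the 8x8 block size, the coefficient position `(3, 2)`, the strength of 20, and every function name are assumptions chosen for illustration. Pixel values are kept as floats and are not rounded or clamped, which a real implementation would have to handle.

```python
import math

N = 8  # block size (assumed; 8x8 is only a convenient example)

# Orthonormal DCT-II basis functions: BASIS[k][x]
BASIS = [
    [math.sqrt((1 if k == 0 else 2) / N) * math.cos((2 * x + 1) * k * math.pi / (2 * N))
     for x in range(N)]
    for k in range(N)
]

def dct2(block):
    """Spatial domain -> frequency domain (u = vertical, v = horizontal frequency)."""
    return [[sum(BASIS[u][y] * BASIS[v][x] * block[y][x]
                 for y in range(N) for x in range(N))
             for v in range(N)]
            for u in range(N)]

def idct2(coeffs):
    """Frequency domain -> spatial domain (inverse of dct2)."""
    return [[sum(BASIS[u][y] * BASIS[v][x] * coeffs[u][v]
                 for u in range(N) for v in range(N))
             for x in range(N)]
            for y in range(N)]

def embed(block, u, v, strength):
    """Transform, add the identifier to one coefficient, transform back."""
    coeffs = dct2(block)
    coeffs[u][v] += strength
    return idct2(coeffs)

# Synthetic 8x8 block standing in for a photo.
block = [[100 + ((5 * y + 3 * x) % 16) for x in range(N)] for y in range(N)]
marked = embed(block, u=3, v=2, strength=20.0)

# How the single modified coefficient shows up in the spatial domain.
delta = [[marked[y][x] - block[y][x] for x in range(N)] for y in range(N)]
changed_pixels = sum(1 for row in delta for d in row if abs(d) > 1e-9)
max_change = max(abs(d) for row in delta for d in row)

# Non-blind extraction: compare the coefficient with the original's.
recovered = dct2(marked)[3][2] - dct2(block)[3][2]

# Share of the watermark's energy that sits in the top half of the block.
total_energy = sum(d * d for row in delta for d in row)
top_half_share = sum(d * d for row in delta[:N // 2] for d in row) / total_energy
```

Under these assumptions, changing one coefficient shifts all 64 pixels, each by fewer than 5 brightness levels, and the top half of the block carries half of the watermark's energy. That spread is why a crop weakens the watermark instead of deleting it, although detecting it in a cropped image additionally requires realigning the crop, which this sketch does not attempt.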
In typical photographs, most of the information humans perceive visually is concentrated in the low-frequency band, and viewers are relatively insensitive to high-frequency components. Therefore, even if watermarking introduces the same amount of noise in every frequency band, distortion in the high-frequency band is the least noticeable. However, most lossy image compression techniques reduce data size by discarding high-frequency components first, so watermarks embedded in the high-frequency band are particularly vulnerable to compression. For this reason, frequency domain watermarks are mostly embedded in the mid-frequency band, which is high enough to limit visible degradation and low enough to retain a certain level of robustness under compression. This choice is a compromise between photo quality and identifier preservation rather than an optimum for either.
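The band trade-off can be illustrated with a deliberately crude stand-in for lossy compression. The sketch below is not JPEG or any real codec: `crude_compress` simply zeroes every DCT coefficient whose indices satisfy `u + v >= cutoff`, and the cutoff, the coefficient positions, the strength, and the smooth synthetic block are all assumptions picked to make the effect easy to see.

```python
import math

N = 8  # block size (assumed)

# Orthonormal DCT-II basis functions: BASIS[k][x]
BASIS = [
    [math.sqrt((1 if k == 0 else 2) / N) * math.cos((2 * x + 1) * k * math.pi / (2 * N))
     for x in range(N)]
    for k in range(N)
]

def dct2(block):
    """Spatial domain -> frequency domain."""
    return [[sum(BASIS[u][y] * BASIS[v][x] * block[y][x]
                 for y in range(N) for x in range(N))
             for v in range(N)]
            for u in range(N)]

def idct2(coeffs):
    """Frequency domain -> spatial domain."""
    return [[sum(BASIS[u][y] * BASIS[v][x] * coeffs[u][v]
                 for u in range(N) for v in range(N))
             for x in range(N)]
            for y in range(N)]

def embed(block, u, v, strength=20.0):
    """Add the identifier to the coefficient at (u, v)."""
    coeffs = dct2(block)
    coeffs[u][v] += strength
    return idct2(coeffs)

def crude_compress(block, cutoff=8):
    """Hypothetical lossy step: discard every coefficient with u + v >= cutoff."""
    coeffs = dct2(block)
    kept = [[coeffs[u][v] if u + v < cutoff else 0.0 for v in range(N)]
            for u in range(N)]
    return idct2(kept)

def detect(received, original, u, v):
    """Non-blind readout: how much of the embedded strength is still at (u, v)."""
    return dct2(received)[u][v] - dct2(original)[u][v]

# Smooth synthetic block (a brightness gradient) with no fine detail of its own.
block = [[100 + 4 * x + 2 * y for x in range(N)] for y in range(N)]

MID, HIGH = (3, 2), (6, 6)   # example positions for the two bands

mid_after = detect(crude_compress(embed(block, *MID)), block, *MID)
high_after = detect(crude_compress(embed(block, *HIGH)), block, *HIGH)
```

In this setup `mid_after` stays at the embedded strength of 20 while `high_after` drops to zero, which mirrors the argument above. Real codecs quantize coefficients rather than cutting them off at a fixed index, so actual survival rates depend on the codec and its quality setting.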