Fundamentals of Image ProcessingScience and Technology
Continuous-domain, discrete-domain, and finite-size images
An image is a spatially varying signal where and are two spatial coordinates. The signal value at each spatial location can be either a scalar (e.g. light intensity for gray scale images) or a vector (e.g. 3 dimensional vector for RGB color images, or more general -dimensional vector for multispectral images). In the latter case, we could treat each vector component separately as a scalar image (referred to as a channel), sometimes after a certain transformation in the -dimensional vector space.
In digital image processing, images are discretized into samples at discrete spatial locations that are indexed by integer coordinates . Typically, a discrete-domain image is related to a continuous-domain image through the sampling operation
where and are sampling intervals in and dimensions, respectively. More general, a discrete-space image is obtained through the generalized-sampling operation
where is the point-spread function of the image sensor (e.g. a photometric sensor in a digital camera) at the location indexed by . Typically, point-spread functions at different locations are simply shifted versions of a single function as
and is called the sampling kernel.
Furthermore, a discrete image is often of finite size; for example . Then can also be treated as an matrix. The image sample and the corresponding location is often called a pixel, or picture element.
Fourier transforms and sampling theorem
It is often very effective, conceptually and computationally, to represent images in the frequency domain using the Fourier transform. For a continuous-domain image , its Fourier transform is defined as
Here, and denote frequency variables and they have reciprocal unit with and . For example, if the spatial coordinate has unit in , then the corresponding frequency variable has unit in . Under certain conditions, the image can be exactly recovered from its frequency-domain by the inverse Fourier transform
We denote this pair of signals related by the Fourier transform (FT) as
For a discrete image the discrete-space Fourier transform (DSFT) relation
is defined as
It is easy to see that is a periodic function
and thus we only need to consider the function in one period; e.g. with .
Theorem 1 (Sampling) Suppose that the discrete-domain image is related to the continuous-domain image through the sampling operation [link]. Then their Fourier transforms are related by
The summation on the right-hand side of [link] consists of and its translated copies in frequency by . These copies with are called alias terms. If is bandlimited such that
then these alias terms do not overlap with , and thus can be exactly recovered from simply by
Here the rectangular function is defined as
We will show later that [link] in the spatial domain is equivalent to
where the sinc function is defined as
For the discrete image of finite size with , we have the discrete Fourier transform (DFT) relation
which is defined as
Therefore the DFT maps an image in the spatial domain into an image in the frequency domain; both images can have complex values.
In summary, we have seen the following three Fourier transforms
Among these transforms, only the last one, the DFT, is computationally feasible (i.e. with summations of finite terms). Moreover, the DFT can be implemented efficiently with fast Fourier transform algorithms. In moving from the FT to the DSFT and then to the DFT, we first discretize the spatial domain and then the frequency domain. Therefore, it is important to understand [link] and [link] so that we can relate the computational results and images by the DFT to the frequency representation of the original image in the real world.
In a more abstract framework, we can view each image as a vector in an appropriate vector space (i.e. for continuous-domain, discrete-domain, or discrete-domain of finite support). The associate Fourier transform is a linear mapping or linear operator that maps a vector in the spatial domain into a vector in the frequency domain. We can express the Fourier transform and its inverse using the matrix-vector multiplication notation
In the vector-space framework, it is particularly useful to view the inverse Fourier transform as a basis expansion. For example, the inverse DFT in [link] can be written as
in which the image is expanded as a linear combination of basis images where
Other transforms such as the wavelet transform simply provide other basis expansions.
- Complete the proof of the sampling theorem.
- Suppose that is a bandlimited image with its Fourier transform for . This image is captured by an CCD array where sensors are placed on a rectangular grid with spacing and each sensor measures the integral of light intensity falling in an area of size on this grid. Derive a formula to recover exactly from these discrete CCD measurements.
- Prove that the inverse DFT perfectly recovers the image. That is, suppose that is given in [link] then show that the right-hand side of [link] indeed returns .
- An 3 in by 4 in photo is discretized by a 200 dpi (dots-per-inch) scanner and results in a 600 by 800 digital image. Assume that alias is negligible. Suppose that we want to filter out all spatial high frequency above 50 in both horizontal and vertical dimensions out of the image. One way is to zero-pad the image to the size of 1024 by 1024 so that we can use a fast Fourier transform (FFT) algorithm to compute the DFT of size 1024 by 1024 of the zero-padded image. Find the indexes of the DFT coefficients that need to be zero out before taking the inverse DFT to achieve the desired filtering effect.
The image (linear) filtering operation in the continuous-domain is defined by convolution
And similarly in the discrete-domain:
The two-dimensional signal or is called filter, mask, or point-spread function.
Example 1 (First-order derivatives) The first-order derivatives in the and directions of a discrete image can be approximated by finite differences
which are convolutions with the following filters
Here in the matrix form, row and column indexes correspond to (first) and (second) dimensions, respectively; and the sample in the box corresponds to the original (i.e. ).
Example 2 (The Laplacian and image sharping) The Laplacian of a two-dimensional signal is defined as
Extending the definition of the first-order derivatives above to the second-order derivatives, we have
It follows that the Laplacian can be computed as
which is convolution with the following filter
Sometimes, the Laplacian is extended by adding two more terms for two diagonal directions leading to the following filter
Since the Laplacian is a derivative operator, it highlights intensity discontinuities (or edges) in an image. We can sharpen an image by adding the negative of the Laplacian image to the original
Using the Laplacian filter [link] we can write the resulting sharpening operation as a convolution
with the following filter
Example 3 (Gausian smoothing filter) The two-dimensional Gaussian filter, which is often used for image smoothing, is defined as
The 2D Gaussian filter is separable, which means it is a product of 1D filters in each dimension
Discrete-domain Gaussian filters used in practice are sampled and truncated versions of the above continuous-domain filters.
Example 4 (Sobel edge detector) The Sobel edge detector is obtained by smoothing the image in the perpendicular direction before computing the directional derivatives. The associate Sobel edge detector filters are given by
Edges are detected as pixels where the magnitude of the gradient is above a certain threshold ; i.e.
Frequency response of a filter
A key result in signal and image processing is that convolution in the space domain becomes multiplication in the frequency domain
Therefore, the Fourier transform of the filter, called frequency response, indicates how certain frequency components of the input image are amplified or attenuated in the resulting filtered image .
However, multiplication in the DFT domain corresponds to circular convolution in the space domain
The circular convolution operation for images of size is defined as
where denotes modulo of .
Sometimes we obtain images that are contaminated by noise. Image denoising aims to remove or reduce noise present in the image. Some common image denoising methods are smooth filtering, median filtering, Wiener filtering, wavelet thresholding or shrinkage.
We now present the removal of additive noise using Wiener filtering; first in 1D and then extend to 2D. Let denote the original clean signal and denote the noise. The obtained noisy signal is then
The goal is to filter the obtained noisy signal with a linear filter so that the output is a an estimate of the clean signal . The Wiener filter minimizes the expectation of the squared error given by
Differentiating the above with respect to and setting the result to zero, we get
Therefore, the Wiener filter has to satisfies the following (called orthogonal condition)
If we assume that and are wide-sense stationary random processes, then so is given by [link]. That means their auto-correlation and cross-correlation functions only depend on the difference of sample indexes. For example, we can write
Since , we have
Therefore the orthogonal condition [link] for the Wiener filter can be rewritten as
Taking the DTFT of the above we obtain a closed form formula for the Wiener filter
where function (called power spectral density) is defined as the DTFT of the corresponding correlation function . For example, if is a white Gaussian random process with zero mean and variance , then and .
If furthermore, we assume that the clean signal and the noise are uncorrelated, and the noise has zero mean. Then
It follows that
Therefore, we can express the Wiener filter for denoising solely in terms of the power spectral densities of the clean signal and the noise as
From this, it is easy to see that the Wiener filter for image denoising is given by
Wavelet-based methods have established as a state-of-the-art denoising approach for additive white Gaussian noise (AWGN)
Thresholding can be either “soft thresholding” or “hard thresholding”. A hard thresholding estimator is implemented with
A soft thresholding estimator is implemented with
What is remarkable is such a simple wavelet thresholding algorithm achieves optimal performance in certain sense. The asymptotically optimal threshold (also referred to as “the universal threshold”) is for signals of length . In practice, lower threshold such as improves the MSE significantly.
There are several variations of the basic wavelet thresholding scheme that offer performance gains.
- Translation invariant thresholding. An improved thresholding estimator called “cycle-spinning” is calculated by averaging estimators for translated versions of the signal. The algorithm is equivalent to thesholding the non-subsampled or `à trous' wavelet transform.
- Spatially adaptive thresholding.
- Denoising based on statistical modeling of wavelet coefficients.
- Prove [link] and [link].
- Derive the denoising algorithm using Wiener filter when both and are white Gaussian random processes with zero mean and variances and , respectively.
- Suppose that you want to denoise an image that was contaminated by an additive white Gaussian noise. In the wavelet domain (via an orthornormal transform), the wavelet coefficient of the noisy image is related to the corresponding wavelet coefficient of the clean image by: where is Gaussian random variable with the following PDF: For typical natural images, can be modeled as a generalized Gaussian random variable with the following PDF: The parameters , and can be estimated beforehand. Derive a simple closed form formula for the maximum a posteriori estimate from : using , and .
Suppose that the obtained image is a noisy convoluted version of the original image
Inverse filtering and Wiener filtering
Here we assume that the convolution filter is known and fixed. Without noise, a simple way to recover from is by inverse filtering
However, in the presence of noise, then inverse filtering the obtained signal in [link] leads to
Hence noise might be “blew up” in the second term of the above equation near the frequency such that is close to zero.
Following the derivation of the Wiener filter for denoising in the last section, we obtain the Wiener filter for deconvolution as:
In the vector-space framework, the deconvolution problem can be written in the familiar matrix-vector equation
where is the column vector corresponding to the unknown image , is the column vector corresponding to the given filtering image , and is the matrix corresponding to convolution with the filter . For example, 1D filtering a signal by a filter can be expressed as
The matrix-vector formulation allows us to resort to a rich body of literature on solving linear inverse problems.
In the presence of additive noise, our problem becomes
where is a random vector. Given data , a common approach is to search for the maximum-likelihood (ML) solution:
If is a white Gaussian noise of zero mean and variance , then given , is also a Gaussian random vector of mean and variance . That is
In this case the ML solution of [link] is
which is also the least-squares (LS) solution of [link].
If prior knowledge of is known in the form of , then we can search for the maximum a priori (MAP) solution
The second term in [link] is often called penalty or regularization term.
If we again assume the noise is white Gaussian then the MAP solution becomes
which is also called a regularized or penalized LS solution of [link].
Consider the ML or LS solution in [link]. From linear algebra, we know that this solution can be obtained by the pseudo-inverse . However, in image processing applications, the size of is typically in the order of millions samples, and thus storing matrix of size and computing its pseudo-inverse are impractical. On the other hand, although is a big matrix, it has compact description (convolution with a filter ) and fast algorithms for computing multiplications by and (convolution with short FIR filters or using FFT). Under these circumstances, iterative methods offer a practical solution.
Iterative methods for linear inversion
Generally, in an iterative method, there is a sequence of iterations where each one would successively improve the solution and hopefully they would quickly converse to the true solution. To measure the improvement at each iteration, an objective function is used.
The ML/LS solution of [link] is the minimizer of the following objective function
The gradient of this objective function is
A direct solver would set which leads to the normal equation and the solution using the pseudo-inverse.
The search for the minimizer of is a multidimensional search problem. Therefore, at each iteration step we should limit the search space to a one-dimensional search. Specifically, assume that the current estimate for the solution. Then for the next step, we will fix a search direction and limit the search space along that direction:
where is chosen as the optimal step size
This optimal step size is the solution of the following equation
which can be easily derived to be
Therefore, with an initial guess and a sequence of search directions , the iterative search process is completely determined.
A common choice for search directions is to use negative of the gradient of at each step, or
This choice of directions leads to the steepest descent procedure. Instead of using the optimal step size at each iteration given in [link], which incurs some computation cost, a simple approach is to use a fixed step for all iteration. This leads to the following iteration which is known as Landweber
We want to examine the convergence property of the Landweber iteration. Rewrite [link] as
then we have
Therefore, the sequence converges if all the eigenvalues of has magnitude less than 1. This condition is equivalent to
Under this condition we have
which is, as expected, the same as the solution using the pseudo-inverse.
- Derive the Wiener for deconvolution given in [link].
- Adjoint of convolution Iterative methods often requires fast algorithm for computing multiplication with , or adjoint of .
- Suppose that is the circular convolution matrix of size for the circular convolution with a length real filter . Develop a MATLAB function for a fast implementation of using
- Suppose that is the linear convolution matrix of size for the linear convolution of a length real FIR filter with a length input signal. Suppose that is much greater than . Develop a MATLAB function for a fast implementation of using the
w = conv(u,v) convolves vectors u and v.
- Suppose that is the circular convolution matrix of size for the circular convolution with a length real filter . Develop a MATLAB function for a fast implementation of using
Interpolation can be considered as the inverse operation of sampling: it aims to reconstruct signal value at any locations given samples on a sampling grid. Image interpolation has numerous applications including general geometric transformations (image zooming, resizing, rotation, and warping) and digital-to-analog conversions.
We assume that the continuous-domain image at resolution has the following representation
Taking the Fourier transform of both sides of [link] we obtain the following equivalent representation in the frequency domain
For a fixed , the space of all possible functions that can be represented in the form [link] for some discrete-domain signal is a subspace, called shift-invariant subspace.
Three basic examples of shift-invariant subspaces are:
- For we have a subspace of piecewise-constant functions with pieces are rectangular regions . This corresponds to a zero-order-hold (ZOH) digital-to-analog (D/A) converter.
- For we have a subspace of bandlimited functions to frequencies . This corresponds to an ideal D/A converter. In this case, applying [link]-[link] we obtain [link].
- Between these two above extremes, if is a B-spline function then we have a subspace of spline functions.
We consider the generalized sampling scheme where sampled image is given by
Here we use the notation so that correlation can be written as a convolution.
Note that in the last two convolutions, the first one is in the discrete domain while the second one is in the continuous domain.
Now suppose that we are given sampled image at a lower resolution
Then similarly, we have
In words, sampled image is obtained from by filtering followed by downsampling.
- Given [link], prove [link].
- Prove [link]-[link].
- Suppose that the input image can be represented by a linear expansion
where is a chosen compactly supported basis function (e.g. B-spline).
Our imaging system over-samples the convolution of with a known and compactly supported filter and return discrete image
- Derive a fast algorithm to implement the “forward” linear operator that maps discrete image into discrete image .
- Derive a fast algorithm to implement , the adjoint operator of .
Image reconstruction from projections
In several important applications such as computer tomography (CT), we can collect projection data of an object and would like to reconstruct the internal view of the object. The projection data of an image is defined as
Theorem 2 (Projection-slice) Let be the one-dimensional Fourier transform of . Then
From the definition of the Fourier transform and projection we have
Apply the following change of variables (in fact a rotation)
to the last integral we have
Given take the inverse Fourier transform using polar coordinates to recover
then we have
General linear inversion
In more general setting, we have access to linear measurements of an unknown image and after discretization, image reconstruction amounts to solve the following linear inverse problem
where is a matrix representing the linear image-data relationship, is a column vector representing the unknown image, and is a column vector representing the collected data. We can use tools for solving linear inverse problems as described before.
- Suppose that you would like to reconstruct an unknown image from its given projection data. You planned to use the filtered back projection method, but forgot the ramp filtering step in [link] and instead back projected directly the projection data. Even worse, the acquired projection data were deleted! Study this problem in the continous-domain, show how can you recover the true image from the wrongly reconstructed image? What would be the major problem in this recovery and how can you overcome it?
- Projection and back projection Consider the two by two image Given the “row projections” and the “column projections” , can the elements of the image be recovered? Explain. Find the image that is obtained by back projecting row and column projections, and compare with .
- Regularization using SVD Consider the following regularized inverse problem for solving while enforcing the solution to be closed to a known vector . Using the SVD to diagonalize the above minimization problem and find the solution explicitly.