Fundamentals of Image Processing

Image Representation

Continuous-domain, discrete-domain, and finite-size images

An image is a spatially varying signal $s (x, y)$ where $x$ and $y$ are two spatial coordinates. The signal value $s (x, y)$ at each spatial location $(x, y)$ can be either a scalar (e.g. light intensity for gray scale images) or a vector (e.g. 3 dimensional vector for RGB color images, or more general $P$ -dimensional vector for multispectral images). In the latter case, we could treat each vector component separately as a scalar image (referred to as a channel), sometimes after a certain transformation in the $P$ -dimensional vector space.

In digital image processing, images are discretized into samples at discrete spatial locations that are indexed by integer coordinates $[m, n]$ . Typically, a discrete-domain image $s [m, n]$ is related to a continuous-domain image $s (x, y)$ through the sampling operation

s [m, n] = s (m Δ_{x}, n Δ_{y}),

where $Δ_{x}$ and $Δ_{y}$ are the sampling intervals in the $x$ and $y$ dimensions, respectively. More generally, a discrete-space image is obtained through the generalized-sampling operation

w [m, n] = \int_{- \infty}^{\infty} \int_{- \infty}^{\infty} s (x, y) φ_{m, n} (x, y) d x d y,

where $φ_{m, n} (x, y)$ is the point-spread function of the image sensor (e.g. a photometric sensor in a digital camera) at the location indexed by $(m, n)$ . Typically, point-spread functions at different locations are simply shifted versions of a single function as

φ_{m, n} (x, y) = φ (x - m Δ_{x}, y - n Δ_{y}),

and $φ (x, y)$ is called the sampling kernel.

Furthermore, a discrete image $s [m, n]$ is often of finite size; for example, $0 \leq m \leq M - 1, 0 \leq n \leq N - 1$ . Then $s [m, n]$ can also be treated as an $M \times N$ matrix. The image sample $s [m, n]$ together with its location $[m, n]$ is often called a pixel, or picture element.
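As a minimal sketch of the point-sampling relation $s [m, n] = s (m Δ_{x}, n Δ_{y})$ and of the matrix view of a finite-size image, the following pure-Python fragment samples a continuous-domain image on an $M \times N$ grid. The function names `sample_image` and `s_cont`, and the particular cosine pattern, are illustrative assumptions and not part of the original notes.

```python
import math

def sample_image(s, M, N, dx, dy):
    """Return the M x N matrix with entries s[m][n] = s(m*dx, n*dy)."""
    return [[s(m * dx, n * dy) for n in range(N)] for m in range(M)]

# Hypothetical continuous-domain image: a separable cosine pattern.
def s_cont(x, y):
    return math.cos(2 * math.pi * 0.5 * x) * math.cos(2 * math.pi * 0.25 * y)

# Sample with intervals dx = 0.5 and dy = 1.0 on a 4 x 3 grid.
img = sample_image(s_cont, M=4, N=3, dx=0.5, dy=1.0)
```

Here `img[m][n]` plays the role of the pixel value at location $[m, n]$.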

Fourier transforms and sampling theorem

It is often very effective, conceptually and computationally, to represent images in the frequency domain using the Fourier transform. For a continuous-domain image $s (x, y)$ , its Fourier transform is defined as

S (u, v) = \int_{- \infty}^{\infty} \int_{- \infty}^{\infty} s (x, y) e^{- j 2 π (x u + y v)} d x d y .

Here, $u$ and $v$ denote frequency variables, whose units are the reciprocals of those of $x$ and $y$ . For example, if the spatial coordinate $x$ is measured in $mm$ , then the corresponding frequency variable $u$ is measured in ${mm}^{- 1}$ . Under certain conditions, the image $s (x, y)$ can be exactly recovered from its frequency-domain representation $S (u, v)$ by the inverse Fourier transform

s (x, y) = \int_{- \infty}^{\infty} \int_{- \infty}^{\infty} S (u, v) e^{j 2 π (x u + y v)} d u d v .

We denote this pair of signals related by the Fourier transform (FT) as

s (x, y) \overset{FT}{⟷} S (u, v) .

For a discrete image $s [m, n]$ , the discrete-space Fourier transform (DSFT) relation

s [m, n] \overset{DSFT}{⟷} S_{d} (u, v)

is defined as

\begin{matrix} S_{d} (u, v) & = \sum_{m = - \infty}^{\infty} \sum_{n = - \infty}^{\infty} s [m, n] e^{- j 2 π (m u + n v)}, \\ s [m, n] & = \int_{- 1 / 2}^{1 / 2} \int_{- 1 / 2}^{1 / 2} S_{d} (u, v) e^{j 2 π (m u + n v)} d u d v . \end{matrix}

It is easy to see that $S_{d} (u, v)$ is a periodic function

S_{d} (u + k, v + l) = S_{d} (u, v), for all k, l \in Z,

and thus we only need to consider the function in one period; e.g. $S_{d} (u, v)$ with $| u | \leq 1 / 2, | v | \leq 1 / 2$ .

Theorem 1 (Sampling) Suppose that the discrete-domain image $s [m, n]$ is related to the continuous-domain image $s (x, y)$ through the sampling operation [link]. Then their Fourier transforms are related by

S_{d} (u, v) = \frac{1}{Δ_{x} Δ_{y}} \sum_{k \in Z} \sum_{l \in Z} S (\frac{u + k}{Δ_{x}}, \frac{v + l}{Δ_{y}}) .

(Sketch) One way to prove this is to express $s [m, n]$ using [link] by substituting $x = m Δ_{x}, y = n Δ_{y}$ and then “match” with the right-hand side of [link].

The summation on the right-hand side of [link] consists of $S (u / Δ_{x}, v / Δ_{y})$ and its translated copies in frequency by $(k, l)$ . These copies with $(k, l) \neq (0, 0)$ are called alias terms. If $s (x, y)$ is bandlimited such that

S (u, v) = 0, for | u | \geq 1 / (2 Δ_{x}), | v | \geq 1 / (2 Δ_{y}),

then these alias terms do not overlap with $S (u / Δ_{x}, v / Δ_{y})$ , and thus $S (u, v)$ can be exactly recovered from $S_{d} (u, v)$ simply by

S (u, v) = Δ_{x} Δ_{y} rect (Δ_{x} u) rect (Δ_{y} v) S_{d} (Δ_{x} u, Δ_{y} v) .

Here the rectangular function is defined as

rect (x) = \{\begin{matrix} 1 & if | x | \leq 1 / 2 \\ 0 & else. \end{matrix}

We will show later that [link] in the spatial domain is equivalent to

s (x, y) = \sum_{m \in Z} \sum_{n \in Z} s [m, n] sinc (x / Δ_{x} - m) sinc (y / Δ_{y} - n),

where the sinc function is defined as

sinc (x) = \frac{sin (π x)}{π x} .

For the discrete image $s [m, n]$ of finite size $M \times N$ with $0 \leq m \leq M - 1, 0 \leq n \leq N - 1$ , we have the discrete Fourier transform (DFT) relation

s [m, n] \overset{DFT}{⟷} S [k, l],

which is defined as

\begin{matrix} S [k, l] & = \sum_{m = 0}^{M - 1} \sum_{n = 0}^{N - 1} s [m, n] e^{- j 2 π (m k / M + n l / N)}, \\ s [m, n] & = \frac{1}{M N} \sum_{k = 0}^{M - 1} \sum_{l = 0}^{N - 1} S [k, l] e^{j 2 π (m k / M + n l / N)} . \end{matrix}

Therefore the DFT maps an $M \times N$ image in the spatial domain into an $M \times N$ image in the frequency domain; both images can have complex values.
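The DFT pair above can be evaluated directly from its definition. The following pure-Python sketch does so for a tiny image and checks that the inverse recovers the input. It has $O (M^{2} N^{2})$ cost and is meant only to illustrate the formulas (a practical implementation would use an FFT); the names `dft2` and `idft2` are assumptions made for this example.

```python
import cmath

def dft2(s):
    """Direct 2D DFT: S[k][l] = sum_{m,n} s[m][n] exp(-j2pi(mk/M + nl/N))."""
    M, N = len(s), len(s[0])
    return [[sum(s[m][n] * cmath.exp(-2j * cmath.pi * (m * k / M + n * l / N))
                 for m in range(M) for n in range(N))
             for l in range(N)] for k in range(M)]

def idft2(S):
    """Direct inverse 2D DFT, including the 1/(MN) normalization."""
    M, N = len(S), len(S[0])
    return [[sum(S[k][l] * cmath.exp(2j * cmath.pi * (m * k / M + n * l / N))
                 for k in range(M) for l in range(N)) / (M * N)
             for n in range(N)] for m in range(M)]

s = [[1, 2, 3], [4, 5, 6]]   # a 2 x 3 test image
S = dft2(s)                  # complex-valued 2 x 3 array
s_back = idft2(S)            # should match s up to rounding error
```

Note that `S[0][0]` is the sum of all pixel values, and that `s_back` is complex-valued with a negligible imaginary part.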

Relating [link] to [link], we see that if the $M \times N$ image $s [m, n]$ is zero padded outside its support $[0, M - 1] \times [0, N - 1]$ then $S [k, l]$ is a sampled image of $S_{d} (u, v)$ ,

S [k, l] = S_{d} (k / M, l / N) .
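One practical reading of this relation is that zero-padding an image before taking the DFT samples the same DSFT on a denser frequency grid. The sketch below checks this numerically for a small image padded to $4 \times 4$; the helper names `dsft` and `dft_coeff` and the padded size are assumptions chosen for illustration.

```python
import cmath

def dsft(s, u, v):
    """DSFT at (u, v) of an image that is zero outside its finite support."""
    return sum(s[m][n] * cmath.exp(-2j * cmath.pi * (m * u + n * v))
               for m in range(len(s)) for n in range(len(s[0])))

def dft_coeff(s, k, l):
    """Single DFT coefficient S[k, l] of a finite-size image."""
    M, N = len(s), len(s[0])
    return sum(s[m][n] * cmath.exp(-2j * cmath.pi * (m * k / M + n * l / N))
               for m in range(M) for n in range(N))

s = [[1, 0, 2], [3, 1, 0]]          # 2 x 3 image
P, Q = 4, 4                         # padded size (an arbitrary choice)
padded = [row + [0] * (Q - len(row)) for row in s]
padded += [[0] * Q for _ in range(P - len(s))]

# DFT of the padded image versus DSFT of the original sampled at (k/P, l/Q).
max_err = max(abs(dft_coeff(padded, k, l) - dsft(s, k / P, l / Q))
              for k in range(P) for l in range(Q))
```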

In summary, we have seen the following three Fourier transforms

\begin{matrix} continuous-domain & \overset{FT}{⟷} continuous-domain \\ discrete-domain & \overset{DSFT}{⟷} continuous-domain \\ discrete-domain & \overset{DFT}{⟷} discrete-domain \end{matrix}

Among these transforms, only the last one, the DFT, is computationally feasible (i.e. with summations of finite terms). Moreover, the DFT can be implemented efficiently with fast Fourier transform algorithms. In moving from the FT to the DSFT and then to the DFT, we first discretize the spatial domain and then the frequency domain. Therefore, it is important to understand [link] and [link] so that we can relate the computational results and images by the DFT to the frequency representation of the original image in the real world.

Vector-space framework

In a more abstract framework, we can view each image as a vector in an appropriate vector space (i.e. for continuous-domain, discrete-domain, or discrete-domain of finite support). The associated Fourier transform is a linear mapping or linear operator that maps a vector in the spatial domain into a vector in the frequency domain. We can express the Fourier transform and its inverse using the matrix-vector multiplication notation

\begin{matrix} Forward or Analysis: & S = F s, \\ Inverse or Synthesis: & s = F^{- 1} S, \end{matrix}

In the vector-space framework, it is particularly useful to view the inverse Fourier transform as a basis expansion. For example, the inverse DFT in [link] can be written as

s = \sum_{k = 0}^{M - 1} \sum_{l = 0}^{N - 1} S [k, l] f_{k, l},

in which the image $s$ is expanded as a linear combination of basis images $f_{k, l}$ where

f_{k, l} [m, n] = \frac{1}{M N} e^{j 2 π (m k / M + n l / N)} .

Other transforms such as the wavelet transform simply provide other basis expansions.

Problems

Complete the proof of the sampling theorem.
Suppose that $s (x, y)$ is a bandlimited image with its Fourier transform $S (f_{x}, f_{y}) = 0$ for $| f_{x} | \geq 1 / (2 Δ_{x}), | f_{y} | \geq 1 / (2 Δ_{y})$ . This image is captured by a CCD array in which the sensors are placed on a rectangular grid with spacing $Δ_{x} \times Δ_{y}$ and each sensor measures the integral of the light intensity falling in an area of size $Δ_{x} \times Δ_{y}$ on this grid. Derive a formula to recover $s (x, y)$ exactly from these discrete CCD measurements.
Prove that the inverse DFT perfectly recovers the image. That is, suppose that $S [k, l]$ is given in [link] then show that the right-hand side of [link] indeed returns $s [m, n]$ .
A 3 in by 4 in photo is discretized by a 200 dpi (dots-per-inch) scanner, resulting in a 600 by 800 digital image. Assume that aliasing is negligible. Suppose that we want to filter out of the image all spatial frequencies above 50 ${inch}^{- 1}$ in both the horizontal and vertical dimensions. One way is to zero-pad the image to the size of 1024 by 1024 so that we can use a fast Fourier transform (FFT) algorithm to compute the DFT of size 1024 by 1024 of the zero-padded image. Find the indexes of the DFT coefficients that need to be zeroed out before taking the inverse DFT to achieve the desired filtering effect.

Image Filtering

Convolution operations

The image (linear) filtering operation in the continuous-domain is defined by convolution

r (x, y) = (s * h) (x, y) = \int_{- \infty}^{\infty} \int_{- \infty}^{\infty} h (x^{'}, y^{'}) s (x - x^{'}, y - y^{'}) d x^{'} d y^{'} .

And similarly in the discrete-domain:

r [m, n] = (s * h) [m, n] = \sum_{m^{'} = - \infty}^{\infty} \sum_{n^{'} = - \infty}^{\infty} h [m^{'}, n^{'}] s [m - m^{'}, n - n^{'}] .

The two-dimensional signal $h (x, y)$ or $h [m, n]$ is called a filter, mask, or point-spread function.
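For finite-size arrays the double sum in the discrete-domain convolution has finitely many terms. The following sketch computes the full convolution under the assumption that both $s$ and $h$ are indexed from $[0, 0]$ and are zero outside their support; the function name `conv2d` is an illustrative choice.

```python
def conv2d(s, h):
    """Full 2D convolution r[m][n] = sum_{m',n'} h[m'][n'] s[m-m'][n-n'].

    Both inputs are lists of lists indexed from 0 and treated as zero
    outside their support, so the output has size (M+P-1) x (N+Q-1).
    """
    M, N = len(s), len(s[0])
    P, Q = len(h), len(h[0])
    r = [[0] * (N + Q - 1) for _ in range(M + P - 1)]
    for m in range(M):
        for n in range(N):
            for p in range(P):
                for q in range(Q):
                    # s[m][n] contributes to output sample [m+p][n+q]
                    r[m + p][n + q] += h[p][q] * s[m][n]
    return r

s = [[1, 2], [3, 4]]
h = [[1, 1], [1, 1]]     # 2 x 2 box filter (unnormalized)
r = conv2d(s, h)
```

Since convolution is commutative, `conv2d(h, s)` returns the same array.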

Examples

Example 1 (First-order derivatives) The first-order derivatives in the $x$ and $y$ directions of a discrete image $s [m, n]$ can be approximated by finite differences

\begin{matrix} \frac{\partial s}{\partial x} & \approx s [m + 1, n] - s [m, n] = (s * h_{x}^{(1)}) [m, n] \\ \frac{\partial s}{\partial y} & \approx s [m, n + 1] - s [m, n] = (s * h_{y}^{(1)}) [m, n], \end{matrix}

which are convolutions with the following filters

h_{x}^{(1)} = (\begin{matrix} 1 \\ \boxed{- 1} \end{matrix}), h_{y}^{(1)} = (\begin{matrix} 1 & \boxed{- 1} \end{matrix}) .

Here in the matrix form, row and column indexes correspond to the $x$ (first) and $y$ (second) dimensions, respectively, and the sample in the box corresponds to the origin (i.e. $(m, n) = (0, 0)$ ).

Example 2 (The Laplacian and image sharpening) The Laplacian of a two-dimensional signal $s (x, y)$ is defined as

Δ s = \frac{\partial^{2} s}{\partial x^{2}} + \frac{\partial^{2} s}{\partial y^{2}}

Extending the definition of the first-order derivatives above to the second-order derivatives, we have

\begin{matrix} \frac{\partial^{2} s}{\partial x^{2}} & = s [m + 1, n] - 2 s [m, n] + s [m - 1, n] \\ \frac{\partial^{2} s}{\partial y^{2}} & = s [m, n + 1] - 2 s [m, n] + s [m, n - 1] . \end{matrix}

It follows that the Laplacian can be computed as

\begin{matrix} Δ s [m, n] & = s [m + 1, n] + s [m - 1, n] + s [m, n + 1] + s [m, n - 1] - 4 s [m, n] \\ & = (s * h_{Δ}) [m, n], \end{matrix}

which is convolution with the following filter

h_{Δ} = (\begin{matrix} 0 & 1 & 0 \\ 1 & - 4 & 1 \\ 0 & 1 & 0 \end{matrix}) .

Sometimes, the Laplacian is extended by adding two more terms for two diagonal directions leading to the following filter

h_{Δ}^{'} = (\begin{matrix} 1 & 1 & 1 \\ 1 & - 8 & 1 \\ 1 & 1 & 1 \end{matrix}) .

Since the Laplacian is a derivative operator, it highlights intensity discontinuities (or edges) in an image. We can sharpen an image by adding the negative of the Laplacian image to the original

r [m, n] = s [m, n] - Δ s [m, n] .

Using the Laplacian filter [link] we can write the resulting sharpening operation as a convolution

r [m, n] = (s * h_{sharp}) [m, n],

with the following filter

h_{sharp} = (\begin{matrix} - 1 & - 1 & - 1 \\ - 1 & 9 & - 1 \\ - 1 & - 1 & - 1 \end{matrix}) .
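The entries of the sharpening filter can be checked by subtracting a Laplacian filter from the unit impulse. The sketch below does this with the extended (8-neighbour) Laplacian filter $h_{Δ}^{'}$ , which is the one whose result matches the filter printed above, and applies the filters at one interior pixel. The helper name `filter_at` is an assumption, and borders are ignored for brevity.

```python
# Extended Laplacian filter (with diagonal terms) and the unit impulse.
h_lap8 = [[1, 1, 1], [1, -8, 1], [1, 1, 1]]
delta = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]

# r = s - (s * h_lap8) = s * (delta - h_lap8)
h_sharp = [[delta[i][j] - h_lap8[i][j] for j in range(3)] for i in range(3)]

def filter_at(s, h, m, n):
    """(s * h)[m, n] for a 3 x 3 filter h whose center entry is the origin.

    Entry h[i][j] corresponds to offset (i - 1, j - 1); valid only for
    interior pixels of s.
    """
    return sum(h[i][j] * s[m - (i - 1)][n - (j - 1)]
               for i in range(3) for j in range(3))

flat = [[5] * 3 for _ in range(3)]          # constant image
dot = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]     # single bright pixel

flat_sharpened = filter_at(flat, h_sharp, 1, 1)   # constants are preserved
dot_sharpened = filter_at(dot, h_sharp, 1, 1)     # isolated detail is amplified
```

Because the entries of `h_sharp` sum to one, flat regions pass through unchanged while isolated details are boosted.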

Example 3 (Gaussian smoothing filter) The two-dimensional Gaussian filter, which is often used for image smoothing, is defined as

h_{Gauss}^{(2D)} (x, y) = \frac{1}{2 π σ^{2}} e^{- \frac{x^{2} + y^{2}}{2 σ^{2}}} .

The 2D Gaussian filter is separable, which means it is a product of 1D filters in each dimension

h_{Gauss}^{(2D)} (x, y) = h_{Gauss}^{(1D)} (x) h_{Gauss}^{(1D)} (y), where h_{Gauss}^{(1D)} (x) = \frac{1}{\sqrt{2 π} σ} e^{- \frac{x^{2}}{2 σ^{2}}} .

Discrete-domain Gaussian filters used in practice are sampled and truncated versions of the above continuous-domain filters.
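A minimal sketch of such a sampled and truncated filter is given below. It samples the 1D Gaussian at integer positions, truncates it to a chosen radius, and builds the 2D filter as an outer product using separability. Renormalizing the taps so that they sum to one (in place of the continuous-domain constant $1 / (\sqrt{2 π} σ)$ ) is a common practical choice assumed here, not something stated in the notes; the function names and the radius are likewise illustrative.

```python
import math

def gaussian_kernel_1d(sigma, radius):
    """Sampled, truncated 1D Gaussian on k = -radius..radius, summing to 1."""
    w = [math.exp(-(k * k) / (2 * sigma * sigma))
         for k in range(-radius, radius + 1)]
    total = sum(w)
    return [x / total for x in w]

def gaussian_kernel_2d(sigma, radius):
    """Separable 2D Gaussian: outer product of the 1D kernel with itself."""
    g = gaussian_kernel_1d(sigma, radius)
    return [[gx * gy for gy in g] for gx in g]

g1 = gaussian_kernel_1d(sigma=1.0, radius=3)   # 7 taps
g2 = gaussian_kernel_2d(sigma=1.0, radius=3)   # 7 x 7 taps
```

In practice, separability means an image can be smoothed with two 1D passes (one per dimension) instead of one 2D convolution.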

Example 4 (Sobel edge detector) The Sobel edge detector is obtained by smoothing the image in the perpendicular direction before computing the directional derivatives. The associated Sobel edge detector filters are given by

h_{x}^{(Sobel)} = (\begin{matrix} 1 & 2 & 1 \\ 0 & 0 & 0 \\ - 1 & - 2 & - 1 \end{matrix}), h_{y}^{(Sobel)} = (\begin{matrix} 1 & 0 & - 1 \\ 2 & 0 & - 2 \\ 1 & 0 & - 1 \end{matrix}) .

Edges are detected as pixels $[m, n]$ where the magnitude of the gradient, approximated here by the sum of the absolute values of the two directional derivatives, is above a certain threshold $T$ ; i.e.

| (s * h_{x}^{(Sobel)}) [m, n] | + | (s * h_{y}^{(Sobel)}) [m, n] | \geq T .
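The detection rule above can be sketched on a small synthetic image with a single step along the second dimension. The code below assumes that the center entry of each $3 \times 3$ filter is the origin and simply skips border pixels; the function names, the test image, and the threshold value are illustrative assumptions.

```python
H_X = [[1, 2, 1], [0, 0, 0], [-1, -2, -1]]   # Sobel filter, x direction
H_Y = [[1, 0, -1], [2, 0, -2], [1, 0, -1]]   # Sobel filter, y direction

def filter_at(s, h, m, n):
    """(s * h)[m, n] for a 3 x 3 filter h centered at the origin."""
    return sum(h[i][j] * s[m - (i - 1)][n - (j - 1)]
               for i in range(3) for j in range(3))

def sobel_edges(s, T):
    """Boolean edge map: |s * H_X| + |s * H_Y| >= T at interior pixels."""
    M, N = len(s), len(s[0])
    edges = [[False] * N for _ in range(M)]
    for m in range(1, M - 1):          # border pixels are left as False
        for n in range(1, N - 1):
            g = abs(filter_at(s, H_X, m, n)) + abs(filter_at(s, H_Y, m, n))
            edges[m][n] = g >= T
    return edges

# 5 x 5 image with a step between columns n = 1 and n = 2.
step = [[0, 0, 1, 1, 1] for _ in range(5)]
edges = sobel_edges(step, T=2)
```

For this image only the two columns adjacent to the step respond, since the image is constant along the first dimension.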

Frequency response of a filter

A key result in signal and image processing is that convolution in the space domain becomes multiplication in the frequency domain

\begin{matrix} r (x, y) = (s * h) (x, y) & \overset{FT}{⟷} R (u, v) = S (u, v) H (u, v), \\ r [m, n] = (s * h) [m, n] & \overset{DSFT}{⟷} R_{d} (u, v) = S_{d} (u, v) H_{d} (u, v) . \end{matrix}

Therefore, the Fourier transform $H (u, v)$ of the filter, called the frequency response, indicates how certain frequency components of the input image $s (x, y)$ are amplified or attenuated in the resulting filtered image $r (x, y)$ .

However, multiplication in the DFT domain corresponds to circular convolution in the space domain

r [m, n] = (s ⊛_{M, N} h) [m, n] \overset{DFT}{⟷} R [k, l] = S [k, l] H [k, l] .

The circular convolution operation for images of size $M \times N$ is defined as

(s ⊛_{M, N} h) [m, n] = \sum_{m^{'} = 0}^{M - 1} \sum_{n^{'} = 0}^{N - 1} h [m^{'}, n^{'}] s [{〈 m - m^{'} 〉}_{M}, {〈 n - n^{'} 〉}_{N}],

where ${〈 n 〉}_{N}$ denotes $n$ modulo $N$ .
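To keep the example short, the following sketch checks the DFT product property in one dimension; the 2D case follows the same pattern with a double sum. The function names `circ_conv` and `dft` and the test signals are assumptions made for this illustration.

```python
import cmath

def circ_conv(s, h):
    """Length-N circular convolution: r[n] = sum_k h[k] s[(n - k) mod N]."""
    N = len(s)
    return [sum(h[k] * s[(n - k) % N] for k in range(N)) for n in range(N)]

def dft(x):
    """Direct 1D DFT from the definition."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / N) for n in range(N))
            for k in range(N)]

s = [1, 2, 3, 4]
h = [1, -1, 0, 0]            # first difference, zero-padded to length 4
r = circ_conv(s, h)          # note the wrap-around in r[0] = s[0] - s[3]

R, S, H = dft(r), dft(s), dft(h)
max_err = max(abs(R[k] - S[k] * H[k]) for k in range(len(s)))
```

The wrap-around term in `r[0]` is exactly what distinguishes circular from linear convolution; zero-padding both signals to a sufficient length before taking the DFT avoids it.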

Problems

Prove the property [link].
Prove the property [link].
Find and sketch the frequency response of the Laplacian filter given in [link]. Is this a lowpass or highpass filter?
Develop and write a simple Matlab function to detect edges at $45^{\circ}$ in an image using the conv2 function.

Image Denoising

Sometimes we obtain images that are contaminated by noise. Image denoising aims to remove or reduce the noise present in an image. Some common image denoising methods are smoothing filtering, median filtering, Wiener filtering, and wavelet thresholding or shrinkage.

Wiener filter

We now present the removal of additive noise using Wiener filtering; first in 1D and then extend to 2D. Let $s [n]$ denote the original clean signal and $z [n]$ denote the noise. The obtained noisy signal is then

r [n] = s [n] + z [n] .

The goal is to filter the obtained noisy signal $r [n]$ with a linear filter $g [n]$ so that the output $\hat{s} [n] = (r * g) [n]$ is an estimate of the clean signal $s [n]$ . The Wiener filter $g [n]$ minimizes the expectation of the squared error given by

\begin{matrix} J & = E \{{(\hat{s} [n] - s [n])}^{2}\} \\ & = E \{{(\sum_{k} g [k] r [n - k] - s [n])}^{2}\} \end{matrix}

Differentiating the above with respect to $g [k]$ and setting the result to zero, we get

\frac{\partial J}{\partial g [k]} = 2 E \{(\hat{s} [n] - s [n]) r [n - k]\} = 0 .

Therefore, the Wiener filter $g [k]$ has to satisfy the following condition (called the orthogonality condition)

E \{\hat{s} [n] r [n - k]\} = E \{s [n] r [n - k]\} .

If we assume that $s [n]$ and $z [n]$ are wide-sense stationary random processes, then so is $r [n]$ given by [link]. That means their auto-correlation and cross-correlation functions only depend on the difference of sample indexes. For example, we can write

\begin{matrix} R_{s s} [k] & = E \{s [n] s [n - k]\}, \\ R_{s r} [k] & = E \{s [n] r [n - k]\} . \end{matrix}

Since $\hat{s} [n] = (r * g) [n]$ , we have

\begin{matrix} E \{\hat{s} [n] r [n - k]\} & = E \{\sum_{m} g [m] r [n - m] r [n - k]\} \\ & = \sum_{m} g [m] E \{r [n - m] r [n - k]\} \\ & = \sum_{m} g [m] R_{r r} [k - m] \\ & = (g * R_{r r}) [k] . \end{matrix}

Therefore the orthogonality condition [link] for the Wiener filter $g [k]$ can be rewritten as

(g * R_{r r}) [k] = R_{s r} [k] .

Taking the DTFT of the above we obtain a closed form formula for the Wiener filter

G (u) = \frac{P_{s r} (u)}{P_{r r} (u)},

where function $P$ (called power spectral density) is defined as the DTFT of the corresponding correlation function $R$ . For example, if $z [n]$ is a white Gaussian random process with zero mean and variance $σ^{2}$ , then $R_{z z} [k] = σ^{2} δ [k]$ and $P_{z z} (u) = σ^{2}$ .

If, furthermore, we assume that the clean signal $s [n]$ and the noise $z [n]$ are uncorrelated and that the noise $z [n]$ has zero mean, then

E \{s [n] z [k]\} = E \{s [n]\} E \{z [k]\} = 0 for all n, k .

It follows that

\begin{matrix} R_{s r} [k] & = R_{s s} [k], \\ R_{r r} [k] & = R_{s s} [k] + R_{z z} [k], \end{matrix}

Therefore, we can express the Wiener filter for denoising solely in terms of the power spectral densities of the clean signal $s [n]$ and the noise $z [n]$ as

G (u) = \frac{P_{s s} (u)}{P_{s s} (u) + P_{z z} (u)}

From this, it is easy to see that the Wiener filter for image denoising is given by

G (u, v) = \frac{P_{s s} (u, v)}{P_{s s} (u, v) + P_{z z} (u, v)}
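The denoising gain is a pointwise ratio of power spectral densities, so it is simple to evaluate once those densities are available on a frequency grid. The sketch below uses made-up PSD samples purely for illustration; the function name `wiener_gain` and the numbers are assumptions, and estimating the PSDs from data is outside the scope of this example.

```python
def wiener_gain(P_ss, P_zz):
    """Pointwise Wiener denoising gain G = P_ss / (P_ss + P_zz)."""
    return [ps / (ps + pz) for ps, pz in zip(P_ss, P_zz)]

# Hypothetical PSD samples on four frequencies, from low to high.
P_ss = [100.0, 10.0, 1.0, 0.1]   # signal power decays with frequency
P_zz = [1.0, 1.0, 1.0, 1.0]      # white noise with variance sigma^2 = 1

G = wiener_gain(P_ss, P_zz)
```

The gain is close to one where the signal dominates the noise and close to zero where the noise dominates, which is why the filter behaves like a lowpass filter for typical images.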

Wavelet denoising

Wavelet-based methods have been established as a state-of-the-art denoising approach for additive white Gaussian noise (AWGN). The basic idea is that, because wavelets have excellent approximation properties for typical signals that are piecewise smooth, most of the signal information in the wavelet domain is captured in a few significant wavelet coefficients. As a result, the wavelet coefficients of the original signal stand out from those of the noise (after an orthonormal transform, an AWGN becomes another AWGN of the same variance). Therefore, a simple thresholding in the wavelet domain can effectively remove the noise from the signal. [link] illustrates this basic concept of wavelet thresholding.

Another common type of noise is impulse noise, which is sometimes referred to as speckle or salt-and-pepper noise. For this type of noise, median filtering normally does a good job.

Thresholding can be either “soft thresholding” or “hard thresholding”. A hard thresholding estimator is implemented with

h_{T} (x) = \{\begin{matrix} x & if | x | > T, \\ 0 & if | x | \leq T . \end{matrix}

A soft thresholding estimator is implemented with

s_{T} (x) = \{\begin{matrix} x - T & if x > T, \\ x + T & if x < - T, \\ 0 & if | x | \leq T . \end{matrix}

Remarkably, such a simple wavelet thresholding algorithm achieves optimal performance in a certain sense. The asymptotically optimal threshold (also referred to as “the universal threshold”) is $T = σ \sqrt{2 {log}_{e} N}$ for signals of length $N$ . In practice, a lower threshold such as $T = 3 σ$ improves the MSE significantly.
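The two thresholding rules and the universal threshold can be sketched in a few lines. The code below applies them to a list of coefficients and leaves the wavelet transform itself out of scope; the soft rule takes its negative branch for $x < - T$ , and the function names and sample coefficients are illustrative assumptions.

```python
import math

def hard_threshold(x, T):
    """Keep x unchanged if |x| > T, otherwise set it to zero."""
    return x if abs(x) > T else 0.0

def soft_threshold(x, T):
    """Shrink x toward zero by T; values with |x| <= T become zero."""
    if x > T:
        return x - T
    if x < -T:
        return x + T
    return 0.0

def universal_threshold(sigma, N):
    """Universal threshold T = sigma * sqrt(2 ln N) for a length-N signal."""
    return sigma * math.sqrt(2 * math.log(N))

# Hypothetical noisy wavelet coefficients: two large ones carry the signal.
coeffs = [5.0, -0.3, 0.8, -4.0, 0.1]
T = 1.0
hard = [hard_threshold(c, T) for c in coeffs]
soft = [soft_threshold(c, T) for c in coeffs]
T_univ = universal_threshold(sigma=1.0, N=1024)
```

Hard thresholding keeps the surviving coefficients at their original values, while soft thresholding also shrinks them by $T$ .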

There are several variations of the basic wavelet thresholding scheme that offer performance gains.

Translation-invariant thresholding. An improved thresholding estimator called “cycle-spinning” is calculated by averaging estimators for translated versions of the signal. The algorithm is equivalent to thresholding the non-subsampled or “à trous” wavelet transform.
Spatially adaptive thresholding.
Denoising based on statistical modeling of wavelet coefficients.

Problems

Prove [link] and [link].
Derive the denoising algorithm using Wiener filter when both $s [n]$ and $z [n]$ are white Gaussian random processes with zero mean and variances $σ_{s}^{2}$ and $σ_{z}^{2}$ , respectively.
Suppose that you want to denoise an image that was contaminated by additive white Gaussian noise. In the wavelet domain (via an orthonormal transform), the wavelet coefficient of the noisy image $y$ is related to the corresponding wavelet coefficient of the clean image $x$ by:
$y = x + n,$
where $n$ is Gaussian random variable with the following PDF:
$p (n) = const \cdot e^{- \frac{n^{2}}{2 σ^{2}}} .$
For typical natural images, $x$ can be modeled as a generalized Gaussian random variable with the following PDF:
$p (x) = const \cdot e^{- {(\frac{| x |}{α})}^{β}} .$
The parameters $σ$ , $α$ and $β$ can be estimated beforehand. Derive a simple closed form formula for the maximum a posteriori estimate ${\hat{x}}_{MAP}$ from $y$ :
${\hat{x}}_{MAP} = arg max_{x} p (x | y),$
using $σ$ , $α$ and $β$ .

Image Deconvolution

Suppose that the obtained image is a noisy, convolved version of the original image

r [m, n] = (s * h) [m, n] + z [m, n] .

Inverse filtering and Wiener filtering

Here we assume that the convolution filter $h$ is known and fixed. Without noise, a simple way to recover $s$ from $r$ is by inverse filtering

S (u, v) = \frac{R (u, v)}{H (u, v)} = R (u, v) \underbrace{\frac{1}{H (u, v)}}_{G (u, v)} .

However, in the presence of noise, inverse filtering the obtained signal in [link] leads to

\frac{R (u, v)}{H (u, v)} = S (u, v) + \frac{Z (u, v)}{H (u, v)} .

Hence the noise may be “blown up” by the second term of the above equation near frequencies where $H (u, v)$ is close to zero.

Following the derivation of the Wiener filter for denoising in the last section, we obtain the Wiener filter for deconvolution as:

\begin{matrix} G (u, v) & = \frac{H^{*} (u, v) P_{s s} (u, v)}{{| H (u, v) |}^{2} P_{s s} (u, v) + P_{z z} (u, v)} \\ & = \frac{1}{H (u, v)} \frac{{| H (u, v) |}^{2}}{{| H (u, v) |}^{2} + P_{z z} (u, v) / P_{s s} (u, v)} . \end{matrix}
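To see how this filter tames the noise amplification of inverse filtering, the sketch below evaluates the gain at a single frequency for a few hypothetical values of $H$ , $P_{s s}$ and $P_{z z}$ . The function name `wiener_deconv_gain` and all numbers are assumptions chosen for illustration.

```python
def wiener_deconv_gain(H, P_ss, P_zz):
    """Wiener deconvolution gain at one frequency.

    H may be complex; P_ss and P_zz are nonnegative power densities.
    """
    return H.conjugate() * P_ss / (abs(H) ** 2 * P_ss + P_zz)

# Without noise the gain reduces to the inverse filter 1 / H.
H = 0.5 + 0.5j
g_noiseless = wiener_deconv_gain(H, P_ss=1.0, P_zz=0.0)

# Near a zero of H the inverse filter gain 1 / H = 100 would amplify noise,
# whereas the Wiener gain stays bounded.
g_small_H = wiener_deconv_gain(0.01, P_ss=1.0, P_zz=0.01)

# Where H vanishes, the Wiener filter simply outputs zero.
g_zero_H = wiener_deconv_gain(0.0, P_ss=1.0, P_zz=0.01)
```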

Vector-space approach

In the vector-space framework, the deconvolution problem $(h * s) [m, n] = r [m, n]$ can be written in the familiar matrix-vector equation

A x = b,

where $x$ is the column vector corresponding to the unknown image $s [m, n]$ , $b$ is the column vector corresponding to the given filtered image $r [m, n]$ , and $A$ is the matrix corresponding to convolution with the filter $h [m, n]$ . For example, 1D filtering of a signal ${\{s [n]\}}_{n = 0}^{2}$ by a filter ${\{h [n]\}}_{n = 0}^{1}$ can be expressed as

\underbrace{(\begin{matrix} h [0] & 0 & 0 \\ h [1] & h [0] & 0 \\ 0 & h [1] & h [0] \\ 0 & 0 & h [1] \end{matrix})}_{A} \underbrace{(\begin{matrix} s [0] \\ s [1] \\ s [2] \end{matrix})}_{x} = \underbrace{(\begin{matrix} r [0] \\ r [1] \\ r [2] \\ r [3] \end{matrix})}_{b}

The matrix-vector formulation allows us to resort to a rich body of literature on solving linear inverse problems.
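The structure of the matrix $A$ in the 1D example generalizes to any filter and signal length: entry $(i, j)$ is $h [i - j]$ when that index is valid and zero otherwise. The sketch below builds this matrix explicitly for small sizes and applies it to a signal; the function names `conv_matrix` and `matvec` and the numerical values are illustrative assumptions.

```python
def conv_matrix(h, L):
    """(L + N - 1) x L matrix of linear convolution with a length-N filter h."""
    N = len(h)
    return [[h[i - j] if 0 <= i - j < N else 0 for j in range(L)]
            for i in range(L + N - 1)]

def matvec(A, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

h = [1, 2]                   # h[0], h[1]
s = [3, 4, 5]                # s[0], s[1], s[2]
A = conv_matrix(h, len(s))   # 4 x 3 matrix, as in the example above
b = matvec(A, s)             # equals the linear convolution of s with h
```

For realistic image sizes one would never form $A$ explicitly; the point of the example is only to make the correspondence between convolution and matrix multiplication concrete.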

In the presence of additive noise, our problem becomes

A x + z = b,

where $z$ is a random vector. Given data $b$ , a common approach is to search for the maximum-likelihood (ML) solution:

x_{ML}^{*} = arg max_{x} Pr (b | x) .

If $z$ is white Gaussian noise of zero mean and variance $σ^{2}$ , then given $x$ , $b = A x + z$ is also a Gaussian random vector, with mean $A x$ and covariance $σ^{2} I$ . That is, with $d$ denoting the length of $b$ ,

Pr (b | x) = \frac{1}{{(2 π σ^{2})}^{d / 2}} e^{- \frac{{∥ b - A x ∥}_{2}^{2}}{2 σ^{2}}} .

In this case the ML solution of [link] is

x_{ML}^{*} = arg min_{x} {∥ A x - b ∥}_{2}^{2},

which is also the least-squares (LS) solution of [link].

If prior knowledge of $x$ is available in the form of $Pr (x)$ , then we can search for the maximum a posteriori (MAP) solution

x_{MAP}^{*} = arg max_{x} Pr (x | b) = arg max_{x} {Pr (b | x) Pr (x)} .

The second term in [link] is often called penalty or regularization term.

If we again assume the noise $z$ is white Gaussian then the MAP solution becomes

\begin{matrix} x_{MAP}^{*} & = arg min_{x} \{- log Pr (b | x) - log Pr (x)\} \\ & = arg min_{x} \{\frac{1}{2 σ^{2}} {∥ A x - b ∥}_{2}^{2} - log Pr (x)\}, \end{matrix}

which is also called a regularized or penalized LS solution of [link].

Consider the ML or LS solution in [link]. From linear algebra, we know that this solution can be obtained by the pseudo-inverse $x_{ML/LS}^{*} = A^{†} b = {(A^{T} A)}^{- 1} A^{T} b$ (assuming $A$ has full column rank). However, in image processing applications, the size of $x$ is typically on the order of millions of samples, and thus storing a matrix $A$ of size $\sim 10^{6} \times 10^{6}$ and computing its pseudo-inverse are impractical. On the other hand, although $A$ is a big matrix, it has a compact description (convolution with a filter $h [m, n]$ ) and there are fast algorithms for computing multiplications by $A$ and $A^{T}$ (convolution with short FIR filters or using the FFT). Under these circumstances, iterative methods offer a practical solution.

Iterative methods for linear inversion

Generally, an iterative method produces a sequence of iterates, each of which successively improves the solution, in the hope that they quickly converge to the true solution. To measure the improvement at each iteration, an objective function is used.

The ML/LS solution of [link] is the minimizer of the following objective function

f (x) = {∥ A x - b ∥}_{2}^{2} = {(A x - b)}^{T} (A x - b) = x^{T} A^{T} A x - 2 b^{T} A x + b^{T} b .

The gradient of this objective function is

∇ f (x) = 2 A^{T} A x - 2 A^{T} b .

A direct solver would set $∇ f (x) = 0$ , which leads to the normal equation $A^{T} A x = A^{T} b$ and to the solution using the pseudo-inverse.

The search for the minimizer of $f (x)$ is a multidimensional search problem. To simplify it, at each iteration step we limit the search space to a one-dimensional search. Specifically, assume that $x_{n}$ is the current estimate of the solution. Then for the next step, we fix a search direction $p_{n}$ and limit the search space to that direction:

x_{n + 1} = x_{n} + α_{n} p_{n},

where $α_{n} \in R$ is chosen as the optimal step size

α_{n} = arg min_{α} f (x_{n} + α p_{n}) .

This optimal step size $α_{n}$ is the solution of the following equation

\frac{d f}{d α} (x_{n} + α p_{n}) = 0

which can be easily derived to be

α_{n} = \frac{p_{n}^{T} A^{T} (b - A x_{n})}{p_{n}^{T} A^{T} A p_{n}} .

Therefore, with an initial guess $x_{0}$ and a sequence of search directions $p_{0}, p_{1}, ...$ , the iterative search process is completely determined.

A common choice for search directions is to use negative of the gradient of $f$ at each step, or

p_{n} = - ∇ f (x_{n}) = 2 A^{T} (b - A x_{n}) .

This choice of directions leads to the steepest descent procedure. Instead of using the optimal step size $α_{n}$ at each iteration given in [link], which incurs some computational cost, a simple approach is to use a fixed step size for all iterations. This leads to the following iteration, which is known as the Landweber iteration

x_{n + 1} = x_{n} + λ A^{T} (b - A x_{n}) .

We want to examine the convergence property of the Landweber iteration. Rewrite [link] as

x_{n + 1} = \underbrace{(I - λ A^{T} A)}_{C} x_{n} + \underbrace{λ A^{T} b}_{d},

then we have

x_{n} = C^{n} x_{0} + (I + C + C^{2} + ... + C^{n - 1}) d .

Therefore, the sequence $x_{n}$ converges if all the eigenvalues of $C = I - λ A^{T} A$ have magnitude less than 1. This condition is equivalent to

\begin{matrix} | 1 - λ λ_{i} | < 1 for all eigenvalues λ_{i} of A^{T} A \\ ⟺ & 0 < λ < \frac{2}{λ_{max} (A^{T} A)} . \end{matrix}

Under this condition we have

\begin{matrix} lim_{n \to \infty} C^{n} = 0 \\ lim_{n \to \infty} (I + C + C^{2} + ... + C^{n - 1}) = {(I - C)}^{- 1}, \end{matrix}

and thus

lim_{n \to \infty} x_{n} = {(A^{T} A)}^{- 1} A^{T} b,

which is, as expected, the same as the solution using the pseudo-inverse.
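A minimal sketch of the Landweber iteration on a tiny, noiseless 1D deconvolution problem is given below. The step size is set to the reciprocal of the largest absolute row sum of $A^{T} A$ , which is an upper bound on $λ_{max} (A^{T} A)$ and therefore satisfies the convergence condition above; this choice, the helper names, the iteration count, and the test data are all assumptions made for illustration. For simplicity the matrix is stored explicitly, whereas a practical implementation would apply $A$ and $A^{T}$ as convolutions.

```python
def matvec(A, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def landweber(A, b, lam, n_iter):
    """Iterate x <- x + lam * A^T (b - A x), starting from x = 0."""
    At = transpose(A)
    x = [0.0] * len(A[0])
    for _ in range(n_iter):
        residual = [bi - yi for bi, yi in zip(b, matvec(A, x))]
        correction = matvec(At, residual)
        x = [xi + lam * ci for xi, ci in zip(x, correction)]
    return x

# Convolution matrix of the filter h = [1, 2] acting on a length-3 signal.
A = [[1, 0, 0], [2, 1, 0], [0, 2, 1], [0, 0, 2]]
x_true = [3.0, 4.0, 5.0]
b = matvec(A, x_true)                      # noiseless data

# Safe fixed step: 1 / (max absolute row sum of A^T A) <= 1 / lambda_max.
AtA = [matvec(transpose(A), col) for col in transpose(A)]
lam = 1.0 / max(sum(abs(v) for v in row) for row in AtA)

x_est = landweber(A, b, lam, n_iter=300)
```

With noiseless data and a full-column-rank $A$ , the iterates approach the least-squares solution, which here coincides with `x_true`.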

Problems

Derive the Wiener filter for deconvolution given in [link].
Adjoint of convolution. Iterative methods often require fast algorithms for computing multiplication with $A^{T}$ , the adjoint of $A$ .
1. Suppose that $\tilde{C}$ is the circular convolution matrix of size $N \times N$ for the circular convolution with a length $N$ real filter $h = [h_{0}, h_{1}, ..., h_{N - 1}]$ . Develop a MATLAB function for a fast implementation of ${\tilde{C}}^{T} y$ using fft and ifft commands.
2. Suppose that $C$ is the linear convolution matrix of size $(L + N - 1) \times L$ for the linear convolution of a length $N$ real FIR filter $h = [h_{0}, h_{1}, ..., h_{N - 1}]$ with a length $L$ input signal. Suppose that $L$ is much greater than $N$ . Develop a MATLAB function for a fast implementation of $C^{T} y$ using the conv command, where w = conv(u,v) convolves vectors u and v.

Image Interpolation

Interpolation can be considered as the inverse operation of sampling: it aims to reconstruct the signal value at any location given samples on a sampling grid. Image interpolation has numerous applications including general geometric transformations (image zooming, resizing, rotation, and warping) and digital-to-analog conversions.

We assume that the continuous-domain image $s (x, y)$ at resolution $(Δ_{x}, Δ_{y})$ has the following representation

s (x, y) = \sum_{m = - \infty}^{\infty} \sum_{n = - \infty}^{\infty} c [m, n] ψ (x - m Δ_{x}, y - n Δ_{y}) .

Taking the Fourier transform of both sides of [link] we obtain the following equivalent representation in the frequency domain

S (u, v) = C_{d} (Δ_{x} u, Δ_{y} v) Ψ (u, v) .

For a fixed $ψ$ , the space of all possible functions $s (x, y)$ that can be represented in the form [link] for some discrete-domain signal $\{c [m, n]\}$ is a subspace, called a shift-invariant subspace.

Three basic examples of shift-invariant subspaces are:

For
$ψ (x, y) = \{\begin{matrix} 1 & if 0 \leq x < Δ_{x}, 0 \leq y < Δ_{y} \\ 0 & otherwise, \end{matrix}$
we have a subspace of piecewise-constant functions whose pieces are the rectangular regions $(x, y) \in [m Δ_{x}, (m + 1) Δ_{x}) \times [n Δ_{y}, (n + 1) Δ_{y})$ . This corresponds to a zero-order-hold (ZOH) digital-to-analog (D/A) converter.
For
$ψ (x, y) = sinc (x / Δ_{x}) sinc (y / Δ_{y})$
we have a subspace of functions bandlimited to frequencies $| u | \leq 1 / (2 Δ_{x}), | v | \leq 1 / (2 Δ_{y})$ . This corresponds to an ideal D/A converter. In this case, applying [link]-[link] we obtain [link].
Between these two extremes, if $ψ (x, y)$ is a B-spline function then we have a subspace of spline functions.

We consider the generalized sampling scheme where the sampled image is given by

\begin{matrix} w_{1} [m, n] & = \int_{- \infty}^{\infty} \int_{- \infty}^{\infty} s (x, y) φ_{1} (x - m Δ_{x}, y - n Δ_{y}) d x d y \\ & = (s * \bar{φ_{1}}) (m Δ_{x}, n Δ_{y}) . \end{matrix}

Here we use the notation $\bar{φ} (x, y) = φ (- x, - y)$ so that correlation can be written as a convolution.

If $s (x, y)$ has the form of [link] then substituting it to [link] we have

w_{1} [m, n] = (c * b_{1}) [m, n],

where

b_{1} [k, l] = (ψ * \bar{φ_{1}}) (k Δ_{x}, l Δ_{y}) .

Note that in the last two convolutions, the first one is in the discrete domain while the second one is in the continuous domain.

Given $ψ$ and $φ_{1}$ we can compute $b_{1}$ . Then, solving the deconvolution problem [link], we can recover the coefficients $c$ from the sampled data $w_{1}$ . Finally, using [link] we can compute the image $s (x, y)$ at any location $(x, y)$ .

Now suppose that we are given a sampled image at a lower resolution

w_{2} [m, n] = \int_{- \infty}^{\infty} \int_{- \infty}^{\infty} s (x, y) φ_{2} (x - 2 m Δ_{x}, y - 2 n Δ_{y}) d x d y = (s * \bar{φ_{2}}) (2 m Δ_{x}, 2 n Δ_{y}) .

Then similarly, we have

w_{2} [m, n] = (c * b_{2}) [2 m, 2 n],

where

b_{2} [k, l] = (ψ * \bar{φ_{2}}) (k Δ_{x}, l Δ_{y}) .

In words, sampled image $w_{2}$ is obtained from $c$ by filtering followed by downsampling.
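The "filter, then downsample" structure can be sketched in one dimension, where the relation reads $w_{2} [m] = (c * b_{2}) [2 m]$ . The example below assumes, for simplicity, that both sequences are finite and indexed from zero; the function names and the values of `c` and `b2` are made up for illustration and do not come from a particular choice of $ψ$ and $φ_{2}$ .

```python
def conv_full(c, b):
    """Full 1D linear convolution of two finite sequences indexed from 0."""
    r = [0] * (len(c) + len(b) - 1)
    for i, ci in enumerate(c):
        for j, bj in enumerate(b):
            r[i + j] += ci * bj
    return r

def filter_and_downsample(c, b, factor=2):
    """Return w[m] = (c * b)[factor * m]."""
    return conv_full(c, b)[::factor]

c = [1, 2, 3, 4]     # hypothetical expansion coefficients
b2 = [1, 1]          # hypothetical discrete filter
w2 = filter_and_downsample(c, b2)
```

The 2D case applies the same idea along both dimensions, keeping only the samples at $[2 m, 2 n]$ .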

Problems

Given [link], prove [link].
Prove [link]-[link].
Suppose that the input image $s (x, y)$ can be represented by a linear expansion
$s (x, y) = \sum_{m \in Z} \sum_{n \in Z} c [m, n] ψ (x - m Δ, y - n Δ),$
where $ψ$ is a chosen compactly supported basis function (e.g. a B-spline). Our imaging system over-samples the convolution of $s (x, y)$ with a known and compactly supported filter $φ (x, y)$ and returns the discrete image
$r [m, n] = (s * φ) (m Δ / 2, n Δ / 2), for m, n \in Z .$
1. Derive a fast algorithm to implement the “forward” linear operator $A$ that maps the discrete image $\{c [m, n]\}$ into the discrete image $\{r [m, n]\}$ .
2. Derive a fast algorithm to implement $A^{T}$ , the adjoint operator of $A$ .

Image Reconstruction

Image reconstruction from projections

In several important applications such as computed tomography (CT), we can collect projection data of an object and would like to reconstruct an internal view of that object. The projection data of an image $s (x, y)$ is defined as

\begin{matrix} p_{θ} (t) & = \int_{- \infty}^{\infty} s (t cos θ - r sin θ, t sin θ + r cos θ) d r \\ & = \int_{- \infty}^{\infty} \int_{- \infty}^{\infty} s (x, y) δ (t - x cos θ - y sin θ) d x d y . \end{matrix}

Theorem 2 (Projection-slice) Let $P_{θ} (f)$ be the one-dimensional Fourier transform of $p_{θ} (t)$ . Then

P_{θ} (f) = S (f cos θ, f sin θ) .

From the definition of the Fourier transform and projection we have

\begin{matrix} P_{θ} (f) & = \int_{- \infty}^{\infty} p_{θ} (t) e^{- j 2 π f t} d t \\ & = \int_{- \infty}^{\infty} \int_{- \infty}^{\infty} s (t cos θ - r sin θ, t sin θ + r cos θ) e^{- j 2 π f t} d r d t . \end{matrix}

Apply the following change of variables (in fact a rotation)

\left\{ \begin{matrix} x & = t cos θ - r sin θ \\ y & = t sin θ + r cos θ \end{matrix} \right. \quad \Longleftrightarrow \quad \left\{ \begin{matrix} t & = x cos θ + y sin θ \\ r & = - x sin θ + y cos θ \end{matrix} \right.

to the last integral. Since a rotation has unit Jacobian, $d r d t = d x d y$ , and we have

\begin{matrix} P_{θ} (f) & = \int_{- \infty}^{\infty} \int_{- \infty}^{\infty} s (x, y) e^{- j 2 π f (x cos θ + y sin θ)} d x d y \\ & = S (f cos θ, f sin θ) . \end{matrix}
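A discrete analogue of the theorem is easy to check numerically for the two axis-aligned directions. The sketch below assumes that array axis 0 plays the role of $x$ and axis 1 the role of $y$, and replaces the continuous Fourier transform by the DFT; for these two angles the identity then holds exactly, with no interpolation needed. The random test image is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(0)
s = rng.random((16, 12))   # axis 0 ~ x, axis 1 ~ y (assumed convention)

S = np.fft.fft2(s)         # 2-D DFT of the image

# theta = 0: integrate along y, so the projection is a function of x.
p0 = s.sum(axis=1)
P0 = np.fft.fft(p0)

# theta = pi/2: integrate along x, so the projection is a function of y.
p90 = s.sum(axis=0)
P90 = np.fft.fft(p90)

# Each 1-D transform equals the slice of the 2-D transform through the origin.
slice_ok_0 = np.allclose(P0, S[:, 0])
slice_ok_90 = np.allclose(P90, S[0, :])
```

For other angles the slice does not fall on the DFT grid, so a discrete check would need interpolation and would hold only approximately.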

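Given $S (f cos θ, f sin θ)$ , take the inverse Fourier transform in polar coordinates to recover $s (x, y)$ ; the angles $θ \in [π, 2 π)$ are then folded onto negative frequencies $f$ , which replaces $f$ by $| f |$ :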

\begin{matrix} s (x, y) & = \int_{0}^{2 π} \int_{0}^{\infty} S (f cos θ, f sin θ) e^{j 2 π (x f cos θ + y f sin θ)} f d f d θ \\ & = \int_{0}^{π} \int_{- \infty}^{\infty} S (f cos θ, f sin θ) e^{j 2 π f (x cos θ + y sin θ)} | f | d f d θ \\ & = \int_{0}^{π} [\int_{- \infty}^{\infty} P_{θ} (f) | f | e^{j 2 π f (x cos θ + y sin θ)} d f] d θ . \end{matrix}

Let

Q_{θ} (f) = P_{θ} (f) | f |,

and let $q_{θ} (t)$ be the inverse one-dimensional Fourier transform of $Q_{θ} (f)$ ; then we have

s (x, y) = \int_{0}^{π} q_{θ} (x cos θ + y sin θ) d θ .

From [link] we see that $q_{θ} (t)$ is obtained by filtering $p_{θ} (t)$ with a ramp filter whose frequency response is $| f |$ . The operation [link] backprojects the filtered projections to the spatial domain. This is therefore called the “filtered backprojection” algorithm for image reconstruction.
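The two steps, ramp filtering and backprojection, can be sketched as follows. This is a minimal illustration under stated assumptions rather than a production implementation: the ramp filter is applied with the FFT (hence circularly), the angles are uniformly spaced over $[0, π)$, and $q_{θ}$ is evaluated between samples by linear interpolation. The test object is an assumed Gaussian $s (x, y) = e^{- (x^{2} + y^{2}) / (2 σ^{2})}$, whose projection $σ \sqrt{2 π} e^{- t^{2} / (2 σ^{2})}$ is the same for every angle; all function and variable names are hypothetical.

```python
import numpy as np

def ramp_filter(p, dt):
    """Filter each projection (one row of p per angle) with frequency response |f|."""
    f = np.fft.fftfreq(p.shape[1], d=dt)
    return np.real(np.fft.ifft(np.fft.fft(p, axis=1) * np.abs(f), axis=1))

def backproject(q, thetas, t_axis, coords):
    """Approximate the integral over theta of q_theta(x cos(theta) + y sin(theta))."""
    x, y = np.meshgrid(coords, coords, indexing="ij")
    recon = np.zeros(x.shape)
    for q_theta, theta in zip(q, thetas):
        t = x * np.cos(theta) + y * np.sin(theta)
        # Linear interpolation of q_theta at the required positions t.
        recon += np.interp(t.ravel(), t_axis, q_theta, left=0.0, right=0.0).reshape(x.shape)
    return recon * (np.pi / len(thetas))   # Riemann sum with d(theta) = pi / (number of angles)

# Assumed test object: a Gaussian with analytically known projections.
sigma, dt = 4.0, 1.0
t_axis = (np.arange(128) - 64) * dt
thetas = np.linspace(0.0, np.pi, 90, endpoint=False)
p_single = sigma * np.sqrt(2 * np.pi) * np.exp(-t_axis**2 / (2 * sigma**2))
p = np.tile(p_single, (len(thetas), 1))    # same projection for every angle

coords = np.arange(-16, 17, dtype=float)   # reconstruction grid, centered at the origin
recon = backproject(ramp_filter(p, dt), thetas, t_axis, coords)
truth = np.exp(-(coords[:, None]**2 + coords[None, :]**2) / (2 * sigma**2))
```

Comparing `recon` with `truth` gives a quick sanity check. The discrete ramp filter sets the zero-frequency component to zero, so a small bias relative to the continuous formula is to be expected.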

General linear inversion

In a more general setting, we have access to linear measurements of an unknown image. After discretization, image reconstruction amounts to solving the following linear inverse problem

A x = b,

where $A$ is a matrix representing the linear image-data relationship, $x$ is a column vector representing the unknown image, and $b$ is a column vector representing the collected data. We can use tools for solving linear inverse problems as described before.
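As a small illustration of this formulation, the sketch below vectorizes an image into $x$ , simulates data $b = A x$ , and solves for $x$ in the least-squares sense. The random measurement matrix, the image size, and the use of `np.linalg.lstsq` are assumptions made for the example; they are not necessarily the tools referred to above.

```python
import numpy as np

rng = np.random.default_rng(0)

image = rng.random((4, 4))           # unknown image (used here only to simulate data)
x_true = image.ravel()               # column vector x of length 16

A = rng.standard_normal((24, 16))    # hypothetical measurement matrix: 24 measurements
b = A @ x_true                       # noiseless collected data

# Least-squares solution of A x = b.
x_hat, residuals, rank, sing_vals = np.linalg.lstsq(A, b, rcond=None)
image_hat = x_hat.reshape(image.shape)
```

Here $A$ has full column rank and the data are noiseless, so the image is recovered up to rounding error. With fewer measurements than unknowns, or with noisy data, some form of regularization is typically needed.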

Problems

Suppose that you would like to reconstruct an unknown image from its given projection data. You planned to use the filtered backprojection method, but forgot the ramp filtering step in [link] and instead backprojected the projection data directly. Even worse, the acquired projection data were deleted! Study this problem in the continuous domain and show how you can recover the true image from the wrongly reconstructed image. What would be the major problem in this recovery, and how can you overcome it?
Projection and backprojection. Consider the two-by-two image
$S = \left[ \begin{matrix} s_{11} & s_{12} \\ s_{21} & s_{22} \end{matrix} \right]$
Given the “row projections” $p_{j}^{R} = \sum_{i} s_{i j}$ and the “column projections” $p_{i}^{C} = \sum_{j} s_{i j}$ , can the elements of the image be recovered? Explain. Find the image $\hat{S}$ that is obtained by backprojecting the row and column projections, and compare $\hat{S}$ with $S$ .
Regularization using SVD. Consider the following regularized inverse problem
$x^{*} = \arg \min_{x} \{ {∥ A x - b ∥}_{2}^{2} + λ {∥ x - m ∥}_{2}^{2} \}$
for solving $A x = b$ while enforcing the solution $x$ to be close to a known vector $m$ . Use the SVD to diagonalize the above minimization problem and find the solution $x^{*}$ explicitly.
