Image Inpainting with Convex Optimization with Heuristics

2025-07-17

Image inpainting is the task of restoring missing or corrupted parts of an image. State of the art methods are largely generative methods that are pretrained on hundreds of thousands of images, and attempt to sample from the latent distribution of the image. Rather than taking the “deep” approach, we will explore methods that are entirely based on self-similarity measures. The methods that we will explore are completely portable: they don’t require any priors, and no pretrained model is necessary.

We pose the image inpainting problem as an optimization problem. To ensure that the problems are computationally tractable, we will aim for these problems to be entirely convex. Our exploration begins will begin with “classroom” examples of image inpainting with convex optimization, and we develop several heuristics for our problem, until we achieve visually satisfactory results.

We consider two kinds of image inpainting problems: random data loss, and rectangular box inpainting. Random data loss is representative of communication over a noisy channel, where we lose pixels 50% of the time. Rectangular box inpainting is a more explicit inpainting problem, where a user would choose boxes that they would like the be filled in with something else. This can be to erase unwanted features of an image.

We discover that within the image domain, first- and second-order differences are usually sparse. This is especially true for small holes (such as in data loss), or in large holes with simple surroundings (such as a street with a shadow).

Test Images

We consider the following test images that we will perform inpainting on: Lena Test Image Street Test Image

Section 1: Minimizing Total Variation

We begin by approaching the problem in the traditional, classroom way. This is by minimizing what’s called total variation.

Let \(U \in \mathbb{R}^{m \times n}\) be the matrix representing the reconstructed image. Then we define the total variation as

\[ \text{tv}(U) = \sum_{i=1}^{m-1} \sum_{j=1}^{n-1} \left\lVert \begin{matrix} U_{i+1,j} - U_{i,j} \\ U_{i,j+1} - U_{i,j} \end{matrix} \right\rVert. \]

The effect of this is to minimize any “large” deviation between known and unknown pixels. We can now solve this convex optimization problem. Let \( K \) be the set of \( (i, j) \) indices that are known. Call \( U^{\text{orig}} \) containing the known matrix values that do not need to be filled in. Then our optimization problem is

\[ \begin{align*} \text{min} & \quad \text{tv}(U) \\ \text{s.t.} & \quad U_{ij} = U_{ij}^{\text{orig}} \quad \forall (i, j) \in K \end{align*} \]

We can write this using CVXPY as follows:

 1def inpaint_tv_rgb(img):
 2    img_rgb = np.array(img).astype(np.float32) / 255.0
 3    H, W, C = img_rgb.shape
 4    mask = (np.sum(img_rgb, axis=2) > 0).astype(np.float32)
 5
 6    # --- Variables ---
 7    U = [cp.Variable((H, W)) for _ in range(C)]
 8
 9    # --- Objectives ---
10    total_loss = 0
11    constraints = []
12    for c in range(C):
13        total_loss += cp.tv(U[c])
14        constraints += [
15            cp.multiply(mask, U[c]) == cp.multiply(mask, img_rgb[:, :, c]),
16            U[c] >= 0, U[c] <= 1
17        ]
18
19    # --- Solve ---
20    prob = cp.Problem(cp.Minimize(total_loss), constraints)
21    prob.solve(solver=cp.MOSEK, verbose=False)
22
23    # --- Output ---
24    u_val = np.stack([np.clip(U[c].value, 0, 1) for c in range(C)], axis=2)
25    return u_val

The resulting image is Lena First Solution

We see quite good performance on small holes in the Lena image. Now we examine performance on the NYC image with loss, which has more structure. NYC First Solution

Here, we see that total variation is very effective at dealing with images with randomized loss, at the very least, in the visual sense. It would be difficult for someone to tell that the inpainted image was ever corrupted to begin with.

Now we examine how total variation inpainting deals with large corrupted blocks. NYC First Solution Block

Note that here, we see quite poor performance with large shadow regions. This is a clear consequence of us using a naive total variation approach. Regions with shadows create large variations, but these are expected variations, so in our next steps, we should figure out a way to punish bad variation, while encouraging “expected” variation.

Section 2: Approximating the Laplacian

We saw that, in our previous attempts, minimizing total variation is effective in mitigating random noise, but large, structural changes pose significant challenges. In order to try to fix this, we can add an additional term penalizing the Laplacian. Hopefully, this will be effective in capturing larger variations in the image.

We numerically approximate the Laplacian by calculating variations in the \(x\) direction and in the \(y\) direction.

\[ \left\lVert \nabla \right\rVert^2 = \left\lVert \nabla_x \right\rVert^2 + \left\lVert \nabla_y \right\rVert^2, \]

where

\[ \nabla_x^2 = U_{i-1,j} - 2U_{i,j} + U_{i+1,j}, \]

and

\[ \nabla_y^2 = U_{i,j-1} - 2U_{i,j} + U_{i,j+1}. \]

We add a hyperparameter, \(\gamma\), which we will use to tune how much we want to penalize the Laplacian. In CVXPY, we can write this as

 1def inpaint_tv_lap_rgb(img):
 2    img_rgb = np.array(img).astype(np.float32) / 255.0
 3    H, W, C = img_rgb.shape
 4    mask = (np.sum(img_rgb, axis=2) > 0).astype(np.float32)
 5
 6    # --- Variables ---
 7    U = [cp.Variable((H, W)) for _ in range(C)]
 8
 9    # --- Objectives ---
10    total_loss = 0
11    constraints = []
12
13    for c in range(C):
14        # --- Laplacian
15        gamma = 0.01
16        lap_x = U[c][:-2, 1:-1] - 2 * U[c][1:-1, 1:-1] + U[c][2:, 1:-1]
17        lap_y = U[c][1:-1, :-2] - 2 * U[c][1:-1, 1:-1] + U[c][1:-1, 2:]
18        lap = gamma * (cp.sum_squares(lap_x) + cp.sum_squares(lap_y))
19
20        total_loss += cp.tv(U[c]) + lap
21        constraints += [
22            cp.multiply(mask, U[c]) == cp.multiply(mask, img_rgb[:, :, c]),
23            U[c] >= 0, U[c] <= 1
24        ]
25
26    # --- Solve ---
27    prob = cp.Problem(cp.Minimize(total_loss), constraints)
28    prob.solve(solver=cp.MOSEK, verbose=False)
29
30    # --- Output ---
31    u_val = np.stack([np.clip(U[c].value, 0, 1) for c in range(C)], axis=2)
32    return u_val

We now examine the two NYC images: NYC Sol 2 NYC Sol 2 Block We see that, although the results didn’t get worse, they didn’t really get better, either.

Section 3: Total Generalized Variation

At this point, we try to introduce something called the Total Generalized Variation (TGV). This is an extension of TV, which attempts to balance first-order smoothness and second-order smoothness using the \(l_1\) norm.

Total Variation will, as much as possible, try to create a piecewise constant block. While this is effective for filling in small boxes in areas with little texture, we see that this approach fails when texture information is present. For example, with large shadows. The Laplacian will then again smooth out flat regions. This combination produces the smooth grey blocks that we see above.

Moreover, using the \(l_2\) norm for total variation and the Laplacian means that the optimizer will “spread” errors out over all coordinates, which creates a smoothing effect.

Total Generalized Variation, on the other hand, attempts to balance the first and second-order effects while also promoting sparsity, by using the \(l_1\) norm. The effect of using the \(l_1\) norm on first-order terms is that of allowing large jumps in one direction, which means that edges will be preserved. This is an assumption that the underlying structure is sparse.

If \(U\) is our image, then we’ll let \(W\) approximate \(W \approx \nabla U\). Then TGV can be written as

\[ \text{tgv}(U) = \alpha_1 \cdot \left\lVert \nabla U - W \right\rVert_1 + \alpha_2 \left\lVert \nabla W \right\rVert_1, \]

where \(\alpha_1\) and \(\alpha_2\) are tuning parameters.

Notice that minimizing the total generalized variation will promote small, sparse gradients, as well as small second-order effects. We implement the TGV based image-inpainting algorithm below.

 1def inpaint_tgv_rgb(img):
 2    img_rgb = np.array(img).astype(np.float32) / 255.0
 3    H, W, C = img_rgb.shape
 4    mask = (np.sum(img_rgb, axis=2) > 0).astype(np.float32)
 5
 6    # --- Parameters ---
 7    alpha1 = 2.5
 8    alpha2 = 6
 9
10    # --- Variables ---
11    U = [cp.Variable((H, W)) for _ in range(C)]
12
13    # --- Objectives ---
14    total_loss = 0
15    constraints = []
16
17    for c in range(C):
18        # --- First-order
19        DUx = U[c][1:, :] - U[c][:-1, :]
20        DUy = U[c][:, 1:] - U[c][:, :-1]
21        Wx = cp.Variable((H - 1, W))
22        Wy = cp.Variable((H, W - 1))
23
24        # --- Second-order
25        DWx = Wx[:, 1:] - Wx[:, :-1]
26        DWy = Wy[1:, :] - Wy[:-1, :]
27        tgv_first_order = cp.norm1(DUx - Wx) + cp.norm1(DUy - Wy)
28        tgv_second_order = cp.norm1(DWx) + cp.norm1(DWy)
29        tgv_total = alpha1 * tgv_first_order + alpha2 * tgv_second_order
30
31        # --- Total
32        total_loss += tgv_total
33
34        # --- Constraints
35        constraints += [
36            cp.multiply(mask, U[c]) == cp.multiply(mask, img_rgb[:, :, c]),
37            U[c] >= 0, U[c] <= 1
38        ]
39
40    # --- Solve ---
41    prob = cp.Problem(cp.Minimize(total_loss), constraints)
42    prob.solve(solver=cp.MOSEK, verbose=False)
43
44    # --- Output ---
45    u_val = np.stack([np.clip(U[c].value, 0, 1) for c in range(C)], axis=2)
46    return u_val

NYC Sol3 Block This is considerable improvement! We see that the shadows in the bottom left-hand corner is filled in, in such a way that preserves the shape of the shadow. That being said, we do see some vertical and horizontal artifacts. This is a consequence of the fact that we use the \(l_1\) norm in the \(x\) and \(y\) directions, which promotes sparsity precisely in these directions. If we were using the \(l_2\) norm as before, we would like be able to eliminate some of these artifacts

Section 4: Resolving Artifacts in TGV inpainting

In order to eliminate the vertical and horizontal artifacts, we add a small \(l_2\) total variation penalty. This will penalize the vertical and horizontal lines that we see above.

 1def inpaint_tgv_tv_rgb(img):
 2    img_rgb = np.array(img).astype(np.float32) / 255.0
 3    H, W, C = img_rgb.shape
 4    mask = (np.sum(img_rgb, axis=2) > 0).astype(np.float32)
 5
 6    # --- Parameters ---
 7    alpha1 = 2.5
 8    alpha2 = 6
 9    beta = 0.01
10
11    # --- Variables ---
12    U = [cp.Variable((H, W)) for _ in range(C)]
13
14    # --- Objectives ---
15    total_loss = 0
16    constraints = []
17
18    for c in range(C):
19        # --- (TGV) First-order
20        DUx = U[c][1:, :] - U[c][:-1, :]
21        DUy = U[c][:, 1:] - U[c][:, :-1]
22        Wx = cp.Variable((H - 1, W))
23        Wy = cp.Variable((H, W - 1))
24
25        # --- (TGV) Second-order
26        DWx = Wx[:, 1:] - Wx[:, :-1]
27        DWy = Wy[1:, :] - Wy[:-1, :]
28        tgv_first_order = cp.norm1(DUx - Wx) + cp.norm1(DUy - Wy)
29        tgv_second_order = cp.norm1(DWx) + cp.norm1(DWy)
30        tgv_total = alpha1 * tgv_first_order + alpha2 * tgv_second_order
31
32        # --- Total Variation
33        tv = beta * cp.tv(U[c])
34
35        # --- Total
36        total_loss += tgv_total + tv
37
38        # --- Constraints
39        constraints += [
40            cp.multiply(mask, U[c]) == cp.multiply(mask, img_rgb[:, :, c]),
41            U[c] >= 0, U[c] <= 1
42        ]
43    # --- Solve ---
44    prob = cp.Problem(cp.Minimize(total_loss), constraints)
45    prob.solve(solver=cp.MOSEK, verbose=False)
46
47    # --- Output ---
48    u_val = np.stack([np.clip(U[c].value, 0, 1) for c in range(C)], axis=2)
49    return u_val

NYC Block 4 Observe that adding a small total variation penalty has smoothed out the vertical and horizontal artifacts. Our final optimizer is thus

\[ \begin{align*} \text{min} & \quad \text{tgv}(U) + \beta \cdot \text{tv}(U) \\ \text{s.t.} & \quad U_{ij} = U_{ij}^{\text{orig}} \quad \forall (i, j) \in K \end{align*} \]

We can now test this inpainted on our remaining test images: NYC 4 Lena 4

Conclusion

We see that we’re able to perform image inpainting tasks that effectively preserve first-order effects (straight edges) and partially preserve second-order effects (textures). Moreover, we can accomplish this without any priors, using only local information within the image.

Since we use only local information (first and second order differences), large holes will pose significant challenges for us. Even small holes, if they contain significant texture information, will not adequately be filled. At that point, it would be more effective to use inpainting methods that take into account global information, like PatchMatch, or even generative methods, like GANs or stable diffusion.

That being said, large blocks with little texture information, or very small blocks where first and second order approximations are adequate will be filled in very effectively, as we have seen with the NYC holes and loss examples.