Abstract

Overexposure may occur in solar observation imaging when extremely violent solar bursts take place, which means that the signal intensity goes beyond the dynamic range of the imaging system of a telescope, resulting in loss of signal. For example, during a solar flare, the Atmospheric Imaging Assembly (AIA) of the Solar Dynamics Observatory (SDO) often records overexposed images/videos, resulting in loss of the fine structures of the flare. This paper makes an effort to retrieve/recover the missing information in overexposed regions by exploiting deep learning, whose powerful nonlinear representation makes it widely used in image reconstruction/restoration. First, a new model, namely, the mask-Pix2Pix network, is proposed for overexposure recovery. It is built on the well-known Pix2Pix network, a conditional generative adversarial network (cGAN). In addition, a hybrid loss function, including an adversarial loss, a masked L1 loss, and an edge mask loss/smoothness term, is designed to address the challenges of overexposure recovery relative to conventional image restoration. Moreover, a new overexposure database is established for training the proposed model. Extensive experimental results demonstrate that the proposed mask-Pix2Pix network can well recover the missing information in overexposed regions and outperforms state-of-the-art methods originally designed for image reconstruction tasks.

1. Introduction

Solar imaging can provide us with more information about solar activities than one-dimensional solar flux, promoting our ability to probe the mysteries of the Sun. In particular, instruments onboard satellites have advantages over ground-based instruments due to good seeing and nonocclusion. During the past few decades, a number of satellites carrying solar probing instruments have been launched, including Yohkoh [1], SOHO [2–4], TRACE [5], STEREO [6, 7], Hinode [8], and SDO [9–12]. These space instruments provide solar observations with unprecedented time cadence, high dynamic range, and high spatial/frequency resolution. For example, the Atmospheric Imaging Assembly (AIA) instrument onboard SDO images the solar atmosphere in 12 wavelengths every 12 seconds, with a spatial resolution of .

Nowadays, satellites carrying solar instruments can image the complete process of a solar activity with unprecedentedly high time cadence and spatial resolution. The recorded images and videos give scientists the opportunity to uncover the nature of solar activities, such as sunspots, filaments, coronal loops, flares, and coronal mass ejections (CMEs). However, for some extremely violent solar bursts, the signal intensity may go beyond the threshold of a telescope, leading to overexposure; consequently, information is lost in the captured images/videos. Although automatic exposure-control algorithms have been implemented to control the exposure time in case of flares (for example, AIA mitigates overexposure by alternating between short and long exposures), they cannot guarantee complete avoidance of overexposure. In addition, the time cadence would increase, so temporal resolution is compromised. To address overexposure, an iterative algorithm employing the PRiL approximation and the EM algorithm was proposed in [13]. Different from [13], we solve this issue via deep learning.

Recently, taking advantage of the powerful nonlinear representation capability of deep learning, many traditional image processing/reconstruction problems have achieved new breakthroughs, such as image denoising, super-resolution, and image inpainting. Therefore, in this work, we restore overexposed regions (OERs) by taking advantage of deep learning. Inspired by image inpainting, our task can be recast as a similar optimization problem: both aim to recover a missing region in an image using surrounding pixels, according to the continuity of the signal. However, our task poses at least three challenges relative to traditional image inpainting. First, fidelity of the signal is required, not just look-alikes, since the data concerned in our task are used for scientific purposes, for example, computing physical parameters. Second, larger OERs are involved in our task compared to typical image inpainting. Last, the more irregular shapes of OERs result in complicated boundaries connecting OERs and non-OERs (normal regions). Artificial edges along these boundaries should be suppressed to prevent unnatural changes of pixel value.

The existing image inpainting approaches can be divided into three classes: diffusion-based, patch-based, and learning-based. In our experience, neither diffusion-based nor patch-based methods produce effective results for OER recovery. Diffusion-based methods [14, 15] are highly restricted to locally available information, so they fail to recover meaningful structures of large missing regions, like the OERs in our task. Patch-based methods [16–19] assume that a missing region (OER) can find its most similar/relevant patches in the given image. This assumption, however, does not hold for OERs. The existing learning-based inpainting methods are not competent for our task for two reasons. First, some of them, such as [20–23] and [24], train their networks with fixed shapes and locations of missing regions, while OERs commonly exhibit irregular shapes and random locations. Second, although several studies address irregular missing regions in image inpainting, such as [25–28], they focus on generating visually coherent completions or semantically plausible results. In contrast, our task aims to restore missing OERs with high fidelity, not just visually plausible results.

To address the aforementioned challenges, we propose a learning-based model, namely, the mask-Pix2Pix network, to estimate the OERs of solar images. Our network is built on Pix2Pix [29], a general network for image-to-image translation using a conditional generative adversarial network (cGAN) [30]. We adopt architectures similar to Pix2Pix: a U-Net [31] generator and a PatchGAN [30] discriminator. The main improvements and contributions of this paper are as follows: (1) Unlike conventional Pix2Pix, our network utilizes Convolution-SwitchNorm-LReLU/ReLU [32] modules (LReLU for the encoder and ReLU for the decoder) rather than Convolution-BatchNorm-ReLU [33] ones. The former (i.e., switchable normalization) can switch between BatchNorm [33], LayerNorm [34], and InstanceNorm [35] by learning their importance weights in an end-to-end manner. The improved architecture boosts the robustness of the network. (2) Our objective function contains an adversarial cGAN loss, a masked L1 loss, and an edge mask loss/smoothness term. The adversarial cGAN loss can capture the full entropy of the conditional distribution it models and thereby produce highly realistic textures. The masked L1 loss computes the L1 loss only in masked regions (OERs), enforcing correctness at low frequencies and thus guaranteeing high-fidelity restoration of OERs. The edge mask loss smooths the edges of OERs and suppresses edge artifacts in the final restored image. (3) Additionally, a new overexposure database is established for training and testing the proposed mask-Pix2Pix network, with 13700 images collected from the Large-scale Solar Dynamics Observatory image database (LSDO) [36].

2. Background

2.1. Image Inpainting

Image inpainting is the process of reconstructing lost or deteriorated parts of an image. It remains theoretically and computationally challenging and is highly ill-posed. Existing image inpainting algorithms can be categorized into three groups: diffusion-based [14, 15], patch-based [16–19], and learning-based. The former two typically use differential operators or patch similarity to propagate information from normal regions into missing regions. They work well for stationary textures. However, diffusion-based methods are highly restricted to locally available information, so they usually fail to recover meaningful structures of large missing regions. Patch-based methods, such as Planar Structure Guidance (PSG) [18] and Statistics of Similar Patches (SSPs) [19], assume that missing regions can find their most similar/relevant patches in the normal regions of the given input image. This assumption does not hold for our task, where an OER probably has no similar patches.

Recently, deep learning-based approaches have emerged as a promising paradigm for image inpainting. A significant advantage over conventional methods lies in their ability to learn and understand the semantics of images with complicated scenes. Context-Encoder (CE) [21] introduces a deep adversarial model to predict plausible structures with a combined L2 pixel-wise reconstruction loss and adversarial loss. Although the adversarial loss improves the inpainting quality, the results exhibit blurriness and contain notable artifacts. To overcome the limitations of CE [21], Iizuka et al. [23] propose a novel architecture with three networks: the completion network (using dilated convolutions [20]) and the global and local context discriminators. However, training a model with two discriminators is time-consuming, and the output heavily relies on Poisson image blending [37] as a postprocessing step. The multiscale neural patch synthesis (MNPS) [22] approach contains a content network to learn the semantics and global structure of the image and a texture network that generates fine-detailed results by employing a pretrained VGG-19 [38] network. Liu et al. [25] introduce a partial convolution (PC) model for inpainting irregularly shaped holes, with an automatic mask-updating step to reduce artifacts. In PC [25], the convolution is masked and renormalized to be conditioned on only valid pixels. Both MNPS [22] and PC [25] add a TV loss term to encourage smoothness on the 1-pixel dilation of the hole boundaries. Therefore, they avoid postprocessing steps (such as Poisson image blending) to enforce texture coherency near the hole boundaries.

2.2. Image-to-Image Translation with Pix2Pix

Pix2Pix [29] utilizes a conditional generative adversarial network (cGAN) [30] to achieve image-to-image translation. Instead of a conventional encoder-decoder, the generator in Pix2Pix employs a U-Net [31] architecture, in which encoder layers and decoder layers are directly connected by “skip connections.” Since skip connections can shuttle low-level information (which is commonly shared between the input and output images) across the bottleneck of the encoder-decoder net, they effectively improve the performance of image-to-image translation. The synthesized images of the generator should be indistinguishable from real ones by an adversarially trained discriminator. In order to model high frequencies, the discriminator employs a convolutional PatchGAN classifier, i.e., it only penalizes structure at the scale of patches. In this way, the PatchGAN acts as a form of texture loss. To summarize, at least three factors enable the Pix2Pix network to outperform past works: the U-Net architecture of the generator, the convolutional PatchGAN classifier of the discriminator, and the cGAN training.
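To make the PatchGAN idea concrete, the following PyTorch sketch builds a patch-level classifier that maps a conditioned input to a map of realness scores in [0, 1]. The channel widths and the 2-channel input (condition plus image, concatenated) are our assumptions for single-channel solar images, not the exact configuration of Pix2Pix or of this paper.

```python
import torch.nn as nn

def patchgan_discriminator(cin=2, nf=64):
    """A PatchGAN-style classifier sketch: each output pixel scores
    the realness of one receptive-field patch of the input."""
    return nn.Sequential(
        nn.Conv2d(cin, nf, 4, 2, 1), nn.LeakyReLU(0.2),
        nn.Conv2d(nf, nf * 2, 4, 2, 1), nn.BatchNorm2d(nf * 2), nn.LeakyReLU(0.2),
        nn.Conv2d(nf * 2, nf * 4, 4, 2, 1), nn.BatchNorm2d(nf * 4), nn.LeakyReLU(0.2),
        nn.Conv2d(nf * 4, nf * 8, 4, 1, 1), nn.BatchNorm2d(nf * 8), nn.LeakyReLU(0.2),
        nn.Conv2d(nf * 8, 1, 4, 1, 1), nn.Sigmoid(),  # patch map of scores in [0, 1]
    )
```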

3. Proposed Method

3.1. Problem Description

The overexposure problem is illustrated in Figure 1, which involves a ground-truth image x, an overexposed image y, a binary mask map M (1 for OERs), and an edge mask map M_e. The mission of OER recovery is to recover the region of y labeled by M, while the region outside M remains unchanged. The OER and non-OER parts of y are given by y ⊙ M and y ⊙ (1 − M), respectively, where ⊙ is the element-wise product operator. Inspired by image inpainting, a neural network, specifically a GAN, can be employed to retrieve the missing region of y (i.e., y ⊙ M). In a GAN, a generator G is trained on pairs of ground-truth and degraded images. Then, the generator is applied to a degraded image to output a repaired one, i.e., x̂ = G(y).
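As a minimal sketch of this formulation (assuming PyTorch tensors and the notation above), the repaired image can be composited so that non-OER pixels of y stay untouched:

```python
import torch

def repair(G, y, M):
    """Composite the generator output with the input: only pixels inside
    the OER mask M (1 = overexposed) are replaced; non-OER pixels of y
    are kept unchanged. A sketch, not the authors' exact procedure."""
    with torch.no_grad():
        x_hat = G(y)
    return M * x_hat + (1 - M) * y
```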

For our task, the restored image x̂ = G(y) should meet the following criteria:
(a) It should have realistic textures as far as possible, i.e., be visually coherent and semantically plausible relative to the ground truth x.
(b) The restored region should have high fidelity relative to the corresponding region in the ground truth x.
(c) Artificial edges surrounding OERs should be avoided, i.e., the boundary between OERs and normal regions should transition smoothly.

To achieve criteria (a) and (b), the conventional Pix2Pix network is employed and modified by adding a masked L1 loss term. To address criterion (c), an edge mask loss/smoothness term is introduced, where the ground-truth edge mask M_e is given by the edge of the mask map M, as shown in Figure 1.

3.2. Solar OER Recovery Database

For training our model, an overexposure database is established. The raw data are cropped from LSDO [36]. The LSDO database consists of three parts: event records, corresponding images, and extracted image parameters. Event records encompass the list of solar events with generated and extracted attributes. Among these attributes, the bounding box attribute consists of four coordinate pixel values. LSDO has preprocessed all original polygon values of event records from the Helioprojective Coordinates (HPC) solar coordinate system into pixels. This preprocessing step is vital to make the polygons compatible with the high-resolution images in the database. More details of LSDO can be found in [36].

In this work, ground-truth images are cropped from Active Regions (ARs) of images in LSDO. The database is established as follows (see the code sketch below):
(1) According to the polygon values of ARs in LSDO, the corresponding AR square regions are cropped from non-overexposed AIA/SDO images, yielding the ground-truth images x.
(2) Scale x to the size of 256 × 256, and label OERs in x by threshold segmentation: pixel values (normalized to the range 0 to 1) over a threshold (e.g., 0.8) are set to 1. Thereby, we get the overexposed image y.
(3) Label the OERs in y as the mask map M, and dilate M with a circular kernel (radius 3). The dilated mask map is used to calculate the masked L1 loss in Subsection 3.4.
(4) Extract the edge of M, with an edge width of 5 pixels. This edge mask map M_e accounts for the edge mask loss/smoothness in Subsection 3.4.

Thereby, each sample in the database contains four parts: the real image x, the overexposed (fake) image y, the mask map M, and the edge mask map M_e.
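The following Python sketch illustrates steps (2)–(4) under our stated assumptions (OpenCV morphology; the function and parameter names are illustrative, not the authors' code):

```python
import cv2
import numpy as np

def synthesize_sample(ar_crop, threshold=0.8, dilate_radius=3, edge_width=5):
    """Build one training sample from a non-overexposed AR crop.
    `ar_crop` is a 256x256 float array with values normalized to [0, 1]."""
    x = ar_crop.astype(np.float32)                 # ground-truth image x
    mask = (x > threshold).astype(np.uint8)        # simulated OER mask M
    y = x.copy()
    y[mask == 1] = 1.0                             # saturate pixels above threshold
    # dilate M with a circular kernel of radius 3 (used by the masked L1 loss)
    disk = cv2.getStructuringElement(
        cv2.MORPH_ELLIPSE, (2 * dilate_radius + 1, 2 * dilate_radius + 1))
    mask_d = cv2.dilate(mask, disk)
    # edge mask M_e: a band of about `edge_width` pixels around the OER boundary
    band = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (edge_width, edge_width))
    edge = cv2.morphologyEx(mask_d, cv2.MORPH_GRADIENT, band)
    return x, y, mask_d, edge
```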

3.3. Network Architecture

The proposed mask-Pix2Pix network is composed of a generator and a discriminator. The generator employs a U-Net architecture, as shown in Figure 2. The U-Net is named after its shape, which looks like a “U.” It consists of an encoder of 8 layers and a decoder of 8 layers. The parameters of each layer are detailed in Table 1. In addition, skip connections are added between the encoder and decoder at the same depth, shown as dotted lines in Figure 2. Each skip connection simply concatenates a feature map of the encoder with that of the decoder at the same depth (e.g., the l-th encoder and decoder feature maps in Figure 2). This cross-layer connection reduces the semantic gap between corresponding layers of the encoder and decoder, which are otherwise far apart in the U-Net structure. The discriminator is a PatchGAN network, whose structure is given in Table 2. In a GAN framework, the discriminator judges “fake” instances from “real” ones. In our work, the output of the discriminator is a patch map whose pixel values range from 0 to 1, measuring how real the output is. In addition, we adopt Convolution-SwitchNorm-LReLU/ReLU [32] modules instead of the Convolution-BatchNorm-ReLU [33] modules of conventional Pix2Pix [29]. The former has been shown to be more robust [32].
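To make the module concrete, here is a minimal PyTorch sketch of switchable normalization and a Convolution-SwitchNorm-LReLU encoder block. It follows the idea of [32] but is our simplified rendering (e.g., it omits the moving-average batch statistics used at inference time), not the authors' implementation:

```python
import torch
import torch.nn as nn

class SwitchNorm2d(nn.Module):
    """Simplified switchable normalization: learns softmax weights to
    mix instance (IN), layer (LN), and batch (BN) statistics."""
    def __init__(self, num_features, eps=1e-5):
        super().__init__()
        self.gamma = nn.Parameter(torch.ones(1, num_features, 1, 1))
        self.beta = nn.Parameter(torch.zeros(1, num_features, 1, 1))
        self.mean_w = nn.Parameter(torch.ones(3))  # weights over {IN, LN, BN} means
        self.var_w = nn.Parameter(torch.ones(3))   # weights over {IN, LN, BN} variances
        self.eps = eps

    def forward(self, x):
        dims = [(2, 3), (1, 2, 3), (0, 2, 3)]      # IN, LN, BN reduction dims
        means = [x.mean(d, keepdim=True) for d in dims]
        varis = [x.var(d, keepdim=True, unbiased=False) for d in dims]
        mw = torch.softmax(self.mean_w, dim=0)
        vw = torch.softmax(self.var_w, dim=0)
        mean = sum(w * m for w, m in zip(mw, means))
        var = sum(w * v for w, v in zip(vw, varis))
        return self.gamma * (x - mean) / torch.sqrt(var + self.eps) + self.beta

def encoder_block(cin, cout):
    # Convolution-SwitchNorm-LReLU module (the decoder uses ReLU instead)
    return nn.Sequential(nn.Conv2d(cin, cout, 4, stride=2, padding=1),
                         SwitchNorm2d(cout), nn.LeakyReLU(0.2))
```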

3.4. Loss Functions

Designing the loss function is vital when modeling with machine learning. Our task is to recover the missing signal of OERs in an image, which mostly concerns the fidelity of the reconstructed signal and natural transitions of pixel values. It has something in common with image inpainting, but is not the same. Therefore, a new hybrid loss function is designed for the OER recovery task, as follows.

Given the ground-truth image x, the degraded image y, the initial binary mask M (1 for OERs), and its edge map M_e, as illustrated in Figure 1, the generator outputs G(y). For training the model, a hybrid loss function is defined, consisting of three components:

(1) An adversarial cGAN loss for high realism of the reconstructed image relative to the ground truth x, defined as

L_cGAN(G, D) = E_{x,y}[log D(y, x)] + E_y[log(1 − D(y, G(y)))],

where G and D represent the generator and discriminator, respectively. The objective of G is to minimize L_cGAN so that G(y) looks like a real rather than a fake image, while D pursues the maximum of L_cGAN to distinguish “fake” from “real” as far as possible. Figure 3 illustrates the adversarial cGAN training, where the discriminator performs two tasks: one is to discriminate whether (y, x) is a “real” pair; the other is to discriminate whether (y, G(y)) is “fake.” The generator G is trained to produce fakes that deceive the discriminator D, while D is trained to identify fakes as far as possible.

(2) An L1 loss of the masked region relative to the ground truth x, targeting accurate reconstruction of OERs, defined as

L_mask(G) = E_{x,y}[ ||M ⊙ (x − G(y))||_1 ].

(3) An L1 loss on the edge mask only, which smooths the edges of OERs to prevent artificial edges connecting OERs and non-OERs, defined as

L_edge(G) = E_{x,y}[ ||M_e ⊙ (x − G(y))||_1 ].

In conclusion, the final optimization objective is given by

G* = arg min_G max_D L_cGAN(G, D) + λ_1 L_mask(G) + λ_2 L_edge(G),

where λ_1 and λ_2 are the weights for combining the three loss components. They are set to 0.1 in our experiments.
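A minimal PyTorch sketch of the generator-side hybrid loss follows, assuming a sigmoid-output discriminator D that takes the condition y and an image concatenated along channels; the function name and the normalization by mask area are our choices, not the authors' code:

```python
import torch
import torch.nn.functional as F

def generator_loss(D, x, y, g_out, mask, edge, lam1=0.1, lam2=0.1):
    """x: ground truth, y: overexposed input, g_out = G(y),
    mask: dilated OER mask M, edge: edge mask M_e."""
    # adversarial term: D should score the pair (y, G(y)) as real
    pred_fake = D(torch.cat([y, g_out], dim=1))
    adv = F.binary_cross_entropy(pred_fake, torch.ones_like(pred_fake))
    # masked L1: pixel fidelity enforced only inside OERs
    l1_mask = (mask * (x - g_out).abs()).sum() / (mask.sum() + 1e-8)
    # edge mask loss: suppresses artificial edges along OER boundaries
    l1_edge = (edge * (x - g_out).abs()).sum() / (edge.sum() + 1e-8)
    return adv + lam1 * l1_mask + lam2 * l1_edge
```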

4. Experimental Results

To evaluate the proposed mask-Pix2Pix model, we compare it with state-of-the-art image inpainting algorithms. In addition, the contribution of each component of the loss function to the overall performance is evaluated. These two experiments are carried out on our synthetic overexposed images. Then, the pretrained mask-Pix2Pix model is evaluated on real solar images.

4.1. Implementation Details

We train our model on the overexposure database artificially built from LSDO, as described in Subsection 3.2. For training, the input image size is 256 × 256 and the mini-batch size is set to 16. We train our model with the ADAM optimizer [39]. The learning rate is initialized to 0.0002 and then halved every 100 epochs. We use PyTorch on an NVIDIA Tesla P100 for model training.
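Equivalently, under the stated settings, the optimizer configuration could look like this sketch (the stand-in modules are placeholders for the generator and discriminator of Subsection 3.3):

```python
import torch
import torch.nn as nn

# Stand-in modules; in practice these are the U-Net generator and
# the PatchGAN discriminator described in Subsection 3.3.
G = nn.Conv2d(1, 1, 3)
D = nn.Conv2d(2, 1, 3)

opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)  # initial learning rate 0.0002
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
# halve the learning rate every 100 epochs (call .step() once per epoch)
sched_g = torch.optim.lr_scheduler.StepLR(opt_g, step_size=100, gamma=0.5)
sched_d = torch.optim.lr_scheduler.StepLR(opt_d, step_size=100, gamma=0.5)
```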

4.2. Comparisons with State-of-the-Art Approaches

We compare the proposed mask-Pix2Pix with the patch-based Planar Structure Guidance (PSG) [18] and the learning-based Pix2Pix network [29]. Figure 4 shows comparisons on 3 samples, where it can be seen that the proposed mask-Pix2Pix generates images of significantly better quality than the two benchmarks. Among the three compared methods, PSG [18] is the worst; as shown in Figure 4, it does not work adequately for recovering OERs. Pix2Pix [29] may underestimate the missing regions, still leaving overexposure in the center of OERs after reconstruction. We quantitatively measure performance using the peak signal-to-noise ratio (PSNR) and the structural similarity index (SSIM) [40], given below each reconstructed image in Figure 4. The average PSNR and SSIM computed over the whole database (over 1600 samples) are listed in Table 3. The proposed model outperforms the two benchmarks significantly, with gains of up to 5 dB over Pix2Pix [29] and 15 dB over PSG [18].
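For reference, both metrics can be computed with scikit-image; this sketch uses random demo arrays in place of the ground-truth and restored images:

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

x = np.random.rand(256, 256).astype(np.float32)  # ground truth (demo data)
x_hat = np.clip(x + 0.01 * np.random.randn(256, 256).astype(np.float32), 0, 1)

psnr = peak_signal_noise_ratio(x, x_hat, data_range=1.0)
ssim = structural_similarity(x, x_hat, data_range=1.0)
print(f"PSNR = {psnr:.2f} dB, SSIM = {ssim:.4f}")
```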

4.3. Evaluation on Each Component of Loss Function

To evaluate the contribution of each loss component, we perform the following three tests:
(1) the adversarial loss and the conventional L1 loss on the whole image, i.e., L_cGAN + L1;
(2) the adversarial loss and the masked L1 loss, i.e., L_cGAN + L_mask;
(3) the adversarial loss, the masked L1 loss, and the edge mask loss/smoothness, i.e., L_cGAN + λ_1 L_mask + λ_2 L_edge.

Figure 5 illustrates the three groups of experiments with different training losses. Although the outputs of setting (1) have smooth boundaries between missing regions and normal regions, it fails to accurately estimate the intensity and content of the missing regions. Benefiting from the masked L1 loss, setting (2) obtains better intensity and content estimates for the missing regions; however, it yields artificial edges, as illustrated in the first and third images of Figure 5. The network trained with the full hybrid loss (3) outperforms the other two, successfully addressing all three criteria stated in Subsection 3.1.

Another improvement of the proposed model lies in replacing the Convolution-BatchNorm-ReLU [33] modules of Pix2Pix [29] with Convolution-SwitchNorm-LReLU/ReLU [32] ones, as witnessed by the second and third rows of Table 3. The second row lists the average PSNR and SSIM of Pix2Pix with Convolution-BatchNorm-ReLU [33], while the third row is for Pix2Pix with Convolution-SwitchNorm-LReLU/ReLU [32]. The third row is clearly better than the second with respect to both PSNR and SSIM, which demonstrates the effectiveness of Convolution-SwitchNorm-LReLU/ReLU.

4.4. Discussions and Testing on Real Solar Image

In a real AIA image, overexposed regions involve two effects: primary saturation (PS) and secondary saturation (SS) [13]. PS refers to the fact that, for intense incoming flux, CCD pixels lose their ability to accommodate additional charge. SS refers to the fact that PS causes charge to spill into neighboring pixels. Furthermore, overexposure is often accompanied by diffraction, creating a star-like pattern in the images. However, there are no ground-truth images paired with the overexposed ones. Therefore, we train the mask-Pix2Pix model on the synthesized database, in which saturation is simulated on images cropped from LSDO.

We further test our pretrained mask-Pix2Pix model on an image of the X8.2 flare of September 10, 2017. The results are illustrated in Figure 6. The test shows that the pretrained model recovers the intensity to some extent but fails to recover the fine structure of the flare.

5. Conclusions

We propose a mask-Pix2Pix network with a hybrid loss function to optimize the recovery of OERs in solar images. The proposed model can robustly handle OERs of various shapes, sizes, and locations in an image. In addition, the evaluation of each component of the loss function indicates that the two components added beyond the conventional cGAN loss contribute substantially to the proposed model. Moreover, the investigation of Convolution-SwitchNorm-LReLU/ReLU demonstrates that better performance can be achieved compared to conventional Convolution-BatchNorm-ReLU. However, one limitation of our model is that it may fail on highly textured areas and large OERs, as illustrated in Figure 7, which will be investigated in our future work.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request and will be published online in the future.

Conflicts of Interest

The authors declare that they have no conflicts of interest.