Machine learning-based estimation of spatial gene expression pattern during ESC-derived retinal organoid … – Nature.com


CNN architecture and dataset for estimating spatial gene expression patterns

Our model utilizes a CNN that takes a phase-contrast image as input and estimates a fluorescent image as output (Fig. 1A). The typical input to a CNN is a two-dimensional (2D) image, which is passed through several convolution layers, each followed by a nonlinear activation function. The trainable parameters are the weights of these convolution kernels and the biases. Our network has a U-Net-like architecture [27], i.e., an encoder-decoder structure with skip connections. The embedded features from the encoder are passed through the decoder, which consists of upsampling and convolution layers that increase the resolution of the intermediate feature maps to produce a fluorescent image as output.

In our model, ResNet50 [28] was used as the backbone of the encoder. The size of the input image to ResNet50 is 3 × H × W. To use the pre-trained ResNet50, gray-scale phase-contrast images were replicated along the channel axis to create three-channel images. At the first layer, a convolution with stride 2 is applied to the input image to generate features of size 64 × H/2 × W/2. ResNet50 has four residual blocks, and the sizes of their output features are 256 × H/4 × W/4, 512 × H/8 × W/8, 1024 × H/16 × W/16, and 2048 × H/32 × W/32, respectively. These features are concatenated into the decoder through the skip connections to exploit multi-scale information. The output of the decoder is a fluorescent image of size 1 × H/2 × W/2. Note that each convolution layer is followed by a batch normalization (BN) layer and a rectified linear unit (ReLU) activation function, except for the final convolution layer, which uses a sigmoid activation function to constrain the output values to the range between 0 and 1.
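As an illustration, the encoder-decoder wiring described above can be sketched as follows. This is a hypothetical, simplified stand-in, not the authors' implementation: plain convolution blocks replace the actual ResNet50 residual blocks, but the feature sizes at each stage, the skip connections, and the sigmoid output match the description in the text.

```python
import torch
import torch.nn as nn

def conv_bn_relu(cin, cout, stride=1):
    # Convolution followed by BN and ReLU, as described for all but the final layer.
    return nn.Sequential(
        nn.Conv2d(cin, cout, 3, stride=stride, padding=1, bias=False),
        nn.BatchNorm2d(cout),
        nn.ReLU(inplace=True),
    )

class PhaseToFluorNet(nn.Module):
    """Simplified U-Net-like sketch; channel widths mirror the ResNet50 stages."""
    def __init__(self):
        super().__init__()
        self.stem = conv_bn_relu(3, 64, stride=2)      # 64  x H/2  x W/2
        self.enc1 = conv_bn_relu(64, 256, stride=2)    # 256 x H/4  x W/4
        self.enc2 = conv_bn_relu(256, 512, stride=2)   # 512 x H/8  x W/8
        self.enc3 = conv_bn_relu(512, 1024, stride=2)  # 1024 x H/16 x W/16
        self.enc4 = conv_bn_relu(1024, 2048, stride=2) # 2048 x H/32 x W/32
        self.up = nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False)
        self.dec4 = conv_bn_relu(2048 + 1024, 1024)
        self.dec3 = conv_bn_relu(1024 + 512, 512)
        self.dec2 = conv_bn_relu(512 + 256, 256)
        self.dec1 = conv_bn_relu(256 + 64, 64)
        self.head = nn.Conv2d(64, 1, 1)                # final conv; sigmoid applied below

    def forward(self, x):
        # The grayscale input is assumed replicated to 3 channels before this point.
        s0 = self.stem(x)
        s1 = self.enc1(s0)
        s2 = self.enc2(s1)
        s3 = self.enc3(s2)
        s4 = self.enc4(s3)
        # Decoder: upsample, concatenate the matching encoder features, convolve.
        d = self.dec4(torch.cat([self.up(s4), s3], dim=1))
        d = self.dec3(torch.cat([self.up(d), s2], dim=1))
        d = self.dec2(torch.cat([self.up(d), s1], dim=1))
        d = self.dec1(torch.cat([self.up(d), s0], dim=1))
        return torch.sigmoid(self.head(d))             # 1 x H/2 x W/2, values in [0, 1]

net = PhaseToFluorNet().eval()
with torch.no_grad():
    out = net(torch.randn(1, 3, 64, 64))
print(out.shape)  # torch.Size([1, 1, 32, 32]) for a 64x64 input
```

In practice, the pre-trained ResNet50 stages would replace `enc1`–`enc4`; the sketch only demonstrates how the multi-scale skip connections and the half-resolution sigmoid output fit together.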

The network was optimized by minimizing the training loss computed between the output and the corresponding ground-truth fluorescent image. The training loss was a combination of mean squared error (MSE) and cosine similarity, the latter capturing structural patterns across the entire image.
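A minimal sketch of such a combined loss, assuming a simple weighted sum with a hypothetical weight `alpha` (the exact weighting scheme is not reproduced here):

```python
import numpy as np

def combined_loss(pred, target, alpha=0.5):
    """Weighted sum of MSE and (1 - cosine similarity); `alpha` is illustrative."""
    pred, target = pred.ravel().astype(float), target.ravel().astype(float)
    mse = np.mean((pred - target) ** 2)
    # Cosine similarity between the flattened images captures the global pattern,
    # independent of overall intensity scale.
    cos = np.dot(pred, target) / (np.linalg.norm(pred) * np.linalg.norm(target) + 1e-8)
    return alpha * mse + (1.0 - alpha) * (1.0 - cos)

y = np.array([[0.1, 0.9], [0.4, 0.6]])
print(combined_loss(y, y))  # ~0 for identical images
```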

To train, validate, and test our model, we cultured retinal organoids derived from mouse ESCs using the SFEBq method [10]. In this culture, a GFP gene was knocked in under the promoter of Rax, a master gene of retinal differentiation. Using this method, we obtained a dataset of paired phase-contrast and fluorescent images of Rax during retinal differentiation (Fig. 1B). Images were captured for 96 organoids at 4.5, 5, 6, 7, and 8 days after the start of SFEBq, and each sample was captured as 14 Z-stack images, yielding a total of 96 × 5 × 14 = 6720 image pairs. These image pairs were divided into 5880, 420, and 420 samples for training, validation, and test, respectively; 84, 6, and 6 organoids were used for training, validation, and test, so no organoid appears in more than one dataset. For data augmentation, we randomly flipped the input images vertically and horizontally during training. While the resolution of both phase-contrast and fluorescent images is 960 × 720, 512 × 512 regions where the organoids appear were extracted.
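The cropping and flip augmentation can be sketched as below; function and parameter names are ours, not the authors'. The key point is that the phase-contrast and fluorescent images of a pair must be cropped and flipped identically.

```python
import numpy as np

rng = np.random.default_rng(0)

def crop_and_flip(phase, fluor, top, left, size=512):
    """Crop a size x size region from a 960x720 frame, then apply the same
    random horizontal/vertical flips to both images of the pair."""
    p = phase[top:top + size, left:left + size]
    f = fluor[top:top + size, left:left + size]
    if rng.random() < 0.5:            # horizontal flip
        p, f = p[:, ::-1], f[:, ::-1]
    if rng.random() < 0.5:            # vertical flip
        p, f = p[::-1, :], f[::-1, :]
    return p, f

phase = np.zeros((720, 960), dtype=np.float32)  # placeholder phase-contrast image
fluor = np.zeros((720, 960), dtype=np.float32)  # placeholder fluorescent image
p, f = crop_and_flip(phase, fluor, top=100, left=200)
print(p.shape, f.shape)  # (512, 512) (512, 512)
```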

To demonstrate our model, we applied it to the 420 samples of the test data. The proposed model successfully estimated the spatial expression patterns of Rax from phase-contrast images during retinal organoid development (Fig. 2). During development, multiple optic vesicles form through large and complicated deformations (Fig. 2A). This process begins with a spherical embryoid body, portions of whose tissue surface evaginate outward to form hemispherical vesicles, i.e., optic vesicles. Importantly, the resulting morphology of retinal organoids, especially of the optic vesicles, varies widely [29]. This process is known to be governed by the expression of the Rax gene (Fig. 2B): Rax is gradually expressed in several parts of the tissue surface, the so-called eye field, where cells differentiate from neuroepithelium into several types of retinal cells.

Estimated spatial Rax expression patterns during retinal organoid development. (A) Phase-contrast images from day 4.5 to day 8. (B) Captured fluorescent images of Rax as ground truths. (C) Fluorescent images estimated with our model. (D) Error maps between captured and estimated images; the error metric is the squared error. The organoids in (A–D) are identical. Scale bars indicate 200 μm.

Our model successfully recapitulated the above features of Rax expression (Fig. 2C): the Rax intensity was relatively low and homogeneous at days 4.5, 5, and 6, and gradually increased around the evaginated tissue regions at days 7 and 8. Remarkably, the regions of high Rax expression were accurately estimated even in organoids with widely varying morphologies. On the other hand, as the Rax intensity increases, especially around the evaginated tissue regions, the error between the estimated and ground-truth images also increases with time (Fig. 2D).

To quantitatively evaluate the accuracy of the estimation, we statistically analyzed the estimation results at each stage. To clarify whether the model can estimate Rax intensity in samples with both high and low Rax expression, the samples were divided into two categories by the coefficient of variation of the foreground pixels in the fluorescent image at day 8 (Fig. 3A), and the samples in the two groups were labeled as positive and negative, respectively. For each category, the mean and coefficient of variation of the pixel values were calculated (Fig. 3B–E). In calculating these values, the phase-contrast images were binarized to obtain foreground and background masks; the statistics were then computed using only the foreground pixels and normalized by those of the background pixels.
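A sketch of the per-image statistics described above: binarize the phase-contrast image to get a foreground mask, then compute the mean and coefficient of variation (CV) of the fluorescent pixels inside that mask. The threshold value here is illustrative, and the background normalization step is omitted for brevity.

```python
import numpy as np

def foreground_stats(phase, fluor, thresh=0.2):
    """Mean and CV of fluorescent intensity over the foreground of `phase`."""
    mask = phase > thresh                  # crude binarization into a foreground mask
    vals = fluor[mask]
    mean = vals.mean()
    cv = vals.std() / (mean + 1e-8)        # coefficient of variation = std / mean
    return mean, cv

# Toy example: a 2x2 foreground square with graded fluorescence.
phase = np.zeros((4, 4)); phase[1:3, 1:3] = 1.0
fluor = np.zeros((4, 4)); fluor[1:3, 1:3] = [[0.2, 0.4], [0.6, 0.8]]
mean, cv = foreground_stats(phase, fluor)
print(mean, cv)  # mean 0.5; CV = sqrt(0.05)/0.5 ~ 0.447
```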

Statistical analysis of fluorescence at each developmental stage for positive and negative samples. (A) Histogram of the coefficient of variation of foreground pixel values in the fluorescent images at day 8. (B, C) Means of pixel values in positive and negative samples at each stage for ground-truth (green bars) and estimated fluorescent images (blue bars), respectively. (D, E) Coefficients of variation in positive samples at each stage for ground-truth (green bars) and estimated fluorescent images (red bars), respectively. (F, G) Plots of ground-truth versus estimated pixel values in positive and negative samples at each stage, respectively; errors are 0% and 25% on the solid and dotted black lines, respectively. Error bars in (B–E) indicate standard deviations.

Positive samples showed a gradual increase in mean intensity over the days (Fig. 3B). Negative samples, on the other hand, showed relatively low values from the beginning that did not change significantly over the days (Fig. 3C). Similarly, the coefficients of variation increased in the positive samples but not in the negative samples (Fig. 3D,E). These results indicate that the model successfully captures the features of the spatial Rax expression patterns during retinal organoid development, i.e., positive samples gradually increase their Rax expression and its heterogeneity, whereas negative samples do not. The intensity of the estimated images is relatively lower than that of the ground-truth images in the positive samples, and vice versa in the negative samples.

To clarify whether the model is capable of estimating intermediate values of Rax expression, we analyzed the correlations between ground-truth and estimated values of the foreground pixels at each stage (Fig. 3F,G). In the positive samples (Fig. 3F), the intensity distribution is initially concentrated at low intensities and gradually expands toward high intensities as the days progress, eventually spanning a wide range from low to high. In the negative samples, the intensity distribution is likewise initially concentrated at low intensities but does not expand as much as in the positive samples (Fig. 3G). These results indicate that the model estimated plausible values across the full range of pixel intensities, demonstrating its capability to infer intermediate levels of gene expression. Notably, predicting Rax expression at later stages, such as day 8 in our experiments, becomes easier for the model because of the organoids' characteristic morphologies, which provide features that the convolution operators can efficiently extract.
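One simple way to quantify the agreement plotted above is a Pearson correlation between ground-truth and estimated foreground intensities; the paper presents the relationship as scatter plots, so this coefficient is our illustrative summary, not the authors' metric.

```python
import numpy as np

def pixel_correlation(gt, est):
    """Pearson correlation between flattened ground-truth and estimated pixels."""
    return float(np.corrcoef(gt.ravel(), est.ravel())[0, 1])

gt = np.linspace(0.0, 1.0, 100)     # hypothetical ground-truth intensities
est = 0.9 * gt + 0.05               # a slightly compressed but well-ordered estimate
print(pixel_correlation(gt, est))   # ~1.0: perfectly linear relation
```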

To determine whether the estimated Rax expression patterns correspond to tissue morphologies, we quantified the spatial distribution of Rax intensity and the mean curvature along the tissue contour around each optic vesicle (Fig. 4). For this analysis, four typical optic vesicles were selected from the positive samples, and their curvature and Rax distribution were quantified. Tissue contours were extracted, and the curvature was calculated as the inverse of the radius of the circle passing through three points on the tissue contour. Moreover, the Rax intensity was measured as the average value along the depth direction from the tissue contour.
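The three-point curvature measurement can be sketched as follows: the circumradius R of the triangle formed by three contour points gives the local curvature as 1/R. The sign convention (convex vs. concave, i.e., distal tip vs. root) would additionally require the contour normal, which is omitted here.

```python
import numpy as np

def three_point_curvature(p1, p2, p3):
    """Unsigned curvature 1/R of the circle through three 2D points."""
    a = np.linalg.norm(p2 - p1)
    b = np.linalg.norm(p3 - p2)
    c = np.linalg.norm(p3 - p1)
    # Twice the signed triangle area from the cross product of two edges.
    cross = (p2[0] - p1[0]) * (p3[1] - p1[1]) - (p2[1] - p1[1]) * (p3[0] - p1[0])
    if abs(cross) < 1e-12:
        return 0.0                    # collinear points: a straight contour segment
    radius = a * b * c / (2.0 * abs(cross))   # circumradius R = abc / (4 * area)
    return 1.0 / radius

# Sanity check: three points on a unit circle should give curvature ~1.
ang = np.array([0.0, 0.5, 1.0])
pts = np.stack([np.cos(ang), np.sin(ang)], axis=1)
k = three_point_curvature(pts[0], pts[1], pts[2])
print(k)  # ~1.0
```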

Correlation analysis of spatial Rax expression patterns and optic-vesicle morphologies. (A) Phase-contrast images. (B) Captured fluorescent images of Rax as ground truths. (C) Fluorescent images estimated with our model. (D) Mean curvature as a function of the distance along the organoid contour. (E) Captured and estimated fluorescent intensities of Rax along the organoid contour. The organoids in (A–C) are identical and were captured on day 8. The mean curvatures and fluorescence in (D, E) are for the regions indicated by the red line starting from the red dot in (A). Scale bars indicate 200 μm.

Optic vesicles are hemispherical, with positive curvature at the distal portion and negative curvature at the root (Fig. 4A,D). The Rax intensity is continuously distributed around each vesicle, highest at the distal part and gradually decreasing toward the root (Fig. 4B,E). Furthermore, because the test images were taken with a conventional fluorescence microscope, structures above and below the focal plane are included in each image. Although some images therefore contain multiple overlapping vesicles (e.g., samples iii and iv), the model successfully estimated the Rax intensity of the overlapping regions as well.

MSE is commonly used as the loss for training regression models. In addition to MSE, our model also uses cosine similarity, which can capture structural patterns from the entire image. To analyze the effect of cosine similarity on the estimation accuracy, we tested the model with different weights for the two error metrics (Fig. 5). The trained models were evaluated with MSE on the test data for each day (Fig. 5A). The results demonstrated that cosine similarity improved the estimation accuracy at the early and intermediate stages, from day 4.5 to day 6. At these stages, the intensity in the differentiated region is weak, making it difficult for the network to capture structural patterns using MSE alone. Cosine similarity, on the other hand, enabled the network to learn the patterns from the weak intensities by computing the correlation between the normalized ground-truth and estimated images (Fig. 5B). Therefore, with appropriate weight balancing, our model can achieve the best estimates at different stages.

Effects of the balance of the training loss on estimation accuracy. (A) Mean squared error at each stage with different hyperparameters, where bold and underlined numbers indicate the best and second-best results on each day, respectively. (B) Examples of estimated fluorescent images at days 6 and 8 with different hyperparameters. The MSE of each estimated image is shown in the upper left; the results with the lowest MSEs are surrounded by red boxes. Scale bars indicate 200 μm.

