Predicting Solar Cell Efficiency with Machine Learning – Part 2: Regularization

In part 1, ANN and linear regression models were built with TensorFlow and Keras to predict the efficiency of crystalline Si solar cells as the thickness of the front silicon nitride anti-reflection layer changes.

As also discussed in part 1, the dev error is larger than the training error, which suggests a variance (overfitting) problem. This may be resolved with a larger data set, regularization and/or a better model.

Here I applied L2 regularization to the ANN model from part 1 to reduce the variance. However, this comes at the expense of accuracy on the training data – the infamous bias-variance tradeoff. Nevertheless, this can be further addressed with a larger training set, which will be shown in later posts.

Initialization

For this test, weight values were initialized with a fixed random seed as follows:

From the original model:

model.add(layers.Dense(30, input_dim=2, activation='relu')) 
model.add(layers.Dense(10, activation='relu'))
model.add(layers.Dense(1, activation='linear'))

Change to:

model.add(layers.Dense(30, input_dim=2, kernel_initializer=test_initializer, activation='relu'))
model.add(layers.Dense(10, kernel_initializer=test_initializer, activation='relu'))
model.add(layers.Dense(1, kernel_initializer=test_initializer, activation='linear'))

where

test_initializer = tf.keras.initializers.RandomNormal(mean=0.0, stddev=0.05, seed=101)
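For reference, below is a minimal, self-contained sketch of the initialized model. The Sequential wrapper, the Adam optimizer and the mean absolute error loss are assumptions based on the error metric used in part 1, not settings spelled out here.

import tensorflow as tf
from tensorflow.keras import layers

# Fixed-seed initializer so that runs with and without L2 regularization
# start from the same weights
test_initializer = tf.keras.initializers.RandomNormal(mean=0.0, stddev=0.05, seed=101)

model = tf.keras.Sequential()
model.add(layers.Dense(30, input_dim=2, kernel_initializer=test_initializer, activation='relu'))
model.add(layers.Dense(10, kernel_initializer=test_initializer, activation='relu'))
model.add(layers.Dense(1, kernel_initializer=test_initializer, activation='linear'))

# Optimizer and loss are assumed; part 1 reports mean absolute error
model.compile(optimizer='adam', loss='mae', metrics=['mae'])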

The training and dev errors become:

        Error
Train   0.0339
Dev     0.0900

L2 Regularization

L2 regularization is added to the hidden layers, but not to the output layer. This is because the output layer has a linear activation function with only one node, so the effect of L2 regularization there would not be as significant as on the densely connected hidden layers.

As shown in part 1, the neural network has 2 hidden layers. The first layer has 30 nodes, while the 2nd layer has 10 nodes.

L2 regularization strength is tuned to improve the variance. For this test, variance (%) is defined as:

variance = \dfrac{dev\ error - train\ error}{dev\ error} \times 100
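As a quick sanity check, this definition can be evaluated directly in Python; plugging in the unregularized errors reported above reproduces the ~62.3% figure quoted later.

def variance_pct(train_error, dev_error):
    # Gap between dev and train error, relative to the dev error, in percent
    return (dev_error - train_error) / dev_error * 100

print(variance_pct(0.0339, 0.0900))  # ~62.3 for the unregularized model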

To add L2 regularization to the original code, we first define the L2 strengths that we would like to test:

#L2 regularization parameter for hidden layer 1
reg_para_s_1 = [0, 0.01, 0.05, 0.1, 0.5, 1, 5]

#L2 regularization parameter for hidden layer 2
reg_para_s_2 = [0, 0.01, 0.05, 0.1, 0.5, 1, 5]

Then, for the 1st hidden layer, the code is changed from

model.add(layers.Dense(30, input_dim=2, kernel_initializer=test_initializer, activation='relu'))

to include an L2 regularizer (note that this requires from tensorflow.keras import regularizers):

model.add(layers.Dense(30, input_dim=2, kernel_regularizer=regularizers.l2(reg_para_1), kernel_initializer=test_initializer, activation='relu'))

Similarly, for hidden layer 2, from

model.add(layers.Dense(10, kernel_initializer=test_initializer, activation='relu'))

to

model.add(layers.Dense(10, kernel_regularizer=regularizers.l2(reg_para_2), kernel_initializer=test_initializer, activation='relu'))
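Putting the pieces together, a minimal sketch of the sweep over the two strength lists is shown below, reusing test_initializer and variance_pct from above. The build_model helper, the joint loop over both lists and the fit settings (epochs, batch size) are assumptions; X_train, y_train, X_dev and y_dev stand for the training and dev splits from part 1.

from tensorflow.keras import regularizers

def build_model(reg_para_1, reg_para_2):
    # Same architecture as before, with L2 applied to the two hidden layers only
    model = tf.keras.Sequential()
    model.add(layers.Dense(30, input_dim=2,
                           kernel_regularizer=regularizers.l2(reg_para_1),
                           kernel_initializer=test_initializer, activation='relu'))
    model.add(layers.Dense(10,
                           kernel_regularizer=regularizers.l2(reg_para_2),
                           kernel_initializer=test_initializer, activation='relu'))
    model.add(layers.Dense(1, kernel_initializer=test_initializer, activation='linear'))
    model.compile(optimizer='adam', loss='mae', metrics=['mae'])
    return model

results = []
for reg_para_1 in reg_para_s_1:
    for reg_para_2 in reg_para_s_2:
        model = build_model(reg_para_1, reg_para_2)
        model.fit(X_train, y_train, epochs=200, batch_size=16, verbose=0)  # settings assumed
        train_err = model.evaluate(X_train, y_train, verbose=0)[1]
        dev_err = model.evaluate(X_dev, y_dev, verbose=0)[1]
        results.append((reg_para_1, reg_para_2, train_err, dev_err,
                        variance_pct(train_err, dev_err)))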

Test results

Figure 1: Variance (%) with respect to L2 regularization strength for hidden layers 1 and 2
Figure 2: Mean absolute error of training data with respect to L2 regularization strength for hidden layers 1 and 2
Figure 3: Mean absolute error of development/validation data with respect to L2 regularization strength for hidden layers 1 and 2

As shown in Figure 1, variance decreases with increasing L2 regularization strength, as expected. In addition, a more pronounced improvement is observed for the 1st hidden layer than for the 2nd hidden layer.

This is likely because the 1st hidden layer has a larger number of nodes (30) than the 2nd hidden layer (10), so the weight decay from L2 regularization has a more significant effect on it.

As shown in Figures 2 and 3, both the train and dev errors increase with stronger L2 regularization. The criteria for acceptable performance after L2 regularization are set as follows (a small filtering sketch is given after the list):

  • Smaller variance than the original model
  • Train error < 0.1
  • Dev error < 0.1
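Applying these criteria to the swept results could then look like the following sketch, where results is the list collected in the sweep above and ~62.3% is the unregularized variance:

# Keep only configurations that reduce variance below the unregularized ~62.3%
# while keeping both errors under 0.1
baseline_variance = 62.3
candidates = [r for r in results
              if r[4] < baseline_variance and r[2] < 0.1 and r[3] < 0.1]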

The above criteria are roughly satisfied with the L2 regularization strengths presented in the table below. This improves the variance by about 20 percentage points, but at the cost of higher train and dev errors.

                      No regularization   With L2 regularization
Layer 1 L2 strength   0                   0.05
Layer 2 L2 strength   0                   0.05
Train error           0.0339              0.0593
Dev error             0.0900              0.0996
Variance              ~62.3%              ~40.5%

Source code and data file

The source code and data file are available on GitHub:

https://github.com/KengSiewChan/PVML