# Report

## Task 1

1) Why should the MinMaxScaler be fitted on the training data only? (0.5 point)

The validation data should not influence anything learned during training. To avoid leaking information from the validation set into the model, we fit the normalizer on the training set only.
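As a minimal sketch (assuming the training features are stored in an array named `X_train`, which is a placeholder name), the scaler would be fitted like this:

```python
from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler()
# Fit the scaler on the training features only, so its min/max statistics
# contain no information from the validation set.
X_train_scaled = scaler.fit_transform(X_train)
```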

2) Why is it crucial that the exact same scaler is used to transform the validation dataset? (0.5 point)

During training, the network gets used to seeing input features in a certain range and learns the patterns in the data through that lens. If we later use a different normalizer (e.g. one fitted on the validation dataset), the inputs will be on a different scale than the one the network was trained on, and its predictions will be unreliable. It is as if the network were trained to understand one “language”, which should therefore be kept the same at all times.
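Continuing the sketch above (with `X_val` as a placeholder name for the validation features), the validation data is transformed with the scaler that was already fitted on the training set, never with a newly fitted one:

```python
# Reuse the scaler fitted on the training data; do NOT call fit() again here.
X_val_scaled = scaler.transform(X_val)
```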

3) The train_model function tracks the MSE loss for both training and validation datasets. In a case of extreme overfitting, what would you expect to happen with the training loss? What about the validation loss? (1 point)

We would expect the training loss to keep decreasing, going all the way to zero in extreme cases. The validation loss should remain high throughout training, or first show a small decrease and then climb back up as the model over-specializes to the training data.

4) Looking at the loop in train_model, training progresses all the way until n_epochs is reached, regardless of how the validation loss is evolving. This makes the training prone to overfitting. Briefly explain in words how you could modify the code above in order to implement Early Stopping. (1 point)

To implement early stopping, we could keep track of the minimum val_loss seen so far during training. If there is no improvement for a set number of epochs (the patience), training can be stopped and the model with the minimum val_loss can be returned (see the sketch after the bonus answer below).

Bonus answer (+1pt): Right now the function does not evaluate the validation loss at every epoch, but only at intervals that depend on n_epochs. For early stopping to work optimally, val_loss would have to be computed after every epoch.
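A minimal sketch of this logic is shown below. It assumes a PyTorch-style model and two hypothetical helpers, `train_one_epoch` and `compute_val_loss`, standing in for the training and validation code already present in `train_model`; `model`, `train_loader`, `val_loader` and `n_epochs` are likewise assumed to come from the notebook's setup.

```python
import copy

best_val_loss = float("inf")
best_state = None
patience = 20                    # epochs to wait without improvement before stopping
epochs_without_improvement = 0

for epoch in range(n_epochs):
    train_one_epoch(model, train_loader)                 # hypothetical helper
    val_loss = compute_val_loss(model, val_loader)       # checked every epoch (bonus)

    if val_loss < best_val_loss:
        best_val_loss = val_loss
        best_state = copy.deepcopy(model.state_dict())   # remember the best model so far
        epochs_without_improvement = 0
    else:
        epochs_without_improvement += 1
        if epochs_without_improvement >= patience:
            break                # no improvement for `patience` epochs: stop early

# Restore the weights with the minimum validation loss seen during training.
model.load_state_dict(best_state)
```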

## Tasks 2 and 3

5) Look at how both loss curves behave in the plot above. Is the model overfitting? Is the network learning during the training process? Is there a need to use more epochs for this particular model? (0.5 point)

Both losses decrease with the epochs, so the model is learning something, and since the validation loss does not climb back up there is no sign of overfitting (as discussed in the next answer, the model is actually underfitting). Both curves quickly stabilize, so there is no need for more epochs in this case.

6) Look at the parity plots above for the one-feature model. We see that the model is not doing well. Is the model overfitting or underfitting? Why is this happening? Consider the plotted dataset at the top of the notebook to justify your answer. (1 point)

The model is underfitting, since both training and validation errors are quite high. Looking at the plotted dataset at the top, it is clear that rainfall alone cannot tell us much about flooding probability. Picking, for instance, rainfall_mm = 2000, we see that there are cities in the dataset with flooding probabilities ranging over the whole interval \([0,1]\). The main insight is that a single feature is not enough to uniquely identify a city.
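As an illustration of this point (assuming the dataset is loaded into a pandas DataFrame named `df` with columns `rainfall_mm` and `flood_probability`, which are placeholder names), one could inspect the spread of targets for cities with similar rainfall:

```python
# Cities with roughly the same rainfall can have very different flooding probabilities.
similar_rainfall = df[(df["rainfall_mm"] > 1900) & (df["rainfall_mm"] < 2100)]
print(similar_rainfall["flood_probability"].min(),
      similar_rainfall["flood_probability"].max())
```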

7) Are there cities for which even this model gives reasonable flooding probability predictions? Use your parity plots to motivate your answer. (0.5 point)

Yes, for some cities the predictions are still reasonable: their points lie close to the diagonal line of the parity plot, meaning the predicted flooding probability is close to the true one.

## Task 4

8) Looking at the new parity plots, what suggests this model performs better than the previous one? (0.5 point)

Most data points lie along the diagonal line, suggesting that errors are very low.

9) Comparing training and validation parity plots, is this new model suffering from overfitting? (1 point)

The model is not suffering from overfitting, as both training and validation parity plots behave in a similar way and show low errors.

## Task 5

10) Looking at all of your new plots, what are the signs this new model is suffering from overfitting? What is the root cause for the overfitting? (1.0 point)

The training loss is essentially zero; this can be seen both in the loss vs. epochs curve and in the training parity plot, where the points lie almost exactly on the diagonal. The validation loss, in contrast, is slightly increasing with the epochs, and there is significant scatter around the diagonal in the validation parity plot. We can also identify overfitting by noticing that the gap between training and validation losses widens with the epochs. The mere presence of a gap is not a problem in principle, but a gap that keeps widening is a sign of overfitting.

The root cause of overfitting here is having a very complex model combined with a very small training dataset.

## Task 6

11) Given a comprehensive list of layer sizes and numbers, we would in theory expect the top left region of the heatmap to have high validation errors. Why is that? (0.75 point)

The top left region of the heatmap should correspond to very small models (few hidden layers with few neurons each). Their validation errors will tend to be high because such low-capacity models underfit the data.

12) Following up on the previous question, we would also expect the bottom right region of the heatmap to have high validation errors since those models would suffer from overfitting. Why do we not see it happening here? Think about what changed between Task 5 and Task 6. (0.75 point)

The bottom right region of the heatmap should correspond to very large models. Their validation errors should be high due to overfitting, but we do not see this here because we are now training with a very large dataset again, in contrast to the small dataset used in Task 5. With enough training data, even high-capacity models are much less prone to overfitting.
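For illustration only, a heatmap like this could be assembled with a simple grid search over depth and width, as in the sketch below; `build_mlp` and `get_val_error` are hypothetical helpers standing in for the notebook's model-construction and training code.

```python
import numpy as np

layer_sizes = [2, 4, 8, 16, 32]   # heatmap columns: neurons per hidden layer
n_layers_list = [1, 2, 3, 4]      # heatmap rows: number of hidden layers

val_errors = np.zeros((len(n_layers_list), len(layer_sizes)))
for i, n_layers in enumerate(n_layers_list):
    for j, size in enumerate(layer_sizes):
        model = build_mlp(n_hidden_layers=n_layers, hidden_size=size)  # hypothetical
        val_errors[i, j] = get_val_error(model)                        # train and evaluate

# Small models (top left) tend to underfit; large models (bottom right)
# only overfit when the training dataset is small.
```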

By Iuri Rocha, Delft University of Technology. CC BY 4.0, more info on the Credits page of the Workbook.