Parametric distributions from empirical data

Introduction

Risk and reliability play a major role not only in engineering but also in insurance applications. In this group assignment, your team will derive some basic risk quantities from both parametric distributions and empirical data.

As always, let us start by importing a few necessary libraries.

import numpy as np
import matplotlib.pyplot as plt
import scipy.stats

# Default config for the plots
plt.rcParams["figure.figsize"] = (8, 4.5)
plt.rcParams["axes.grid"] = True

Initial calculations

A reliability engineer is hired by an insurance company for his knowledge of probability and statistics and his affinity for numerical reasoning. The company is entering a new market where it wants to introduce its products, and it starts with simple models to analyze this market. The senior actuary is sure that the hazard rate for the population is not constant. However, she wants to start by testing some simple models for the company's products.

To begin, she wants to know \(P(T>95)\), that is, the probability of surviving beyond 95 years of age, assuming a constant failure rate of \(\frac{1}{18}\) per year.
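As general background (a standard textbook relation, not part of the original assignment text), the survival probability follows from the hazard rate, written here as \(r(t)\), by integration. For a constant hazard rate \(\lambda\) the integral collapses to an exponential:

\[ P(T>t)=\exp\left(-\int_0^t r(s)\,ds\right), \qquad r(t)=\lambda \;\Rightarrow\; P(T>t)=e^{-\lambda t} \]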

Task 1:

Compute the probability \(P(T>95)\) assuming a constant failure rate of \(\frac{1}{18}\).

lambda_exp =### YOUR CODE HERE ###  # hazard rate for t in years
t_target =### YOUR CODE HERE ### # target age t in years

S95 =### YOUR CODE HERE ###
print(f"The probability of surviving 95 years assuming a constant rate is {100*S95:.2f}%.")

Task 2:

Compute the probability \(P(T>95)\) assuming the data follows Weibull survival times with \(α=1.2\) and \(λ=\frac{1}{20}\).

Hint: The Weibull distribution is available as scipy.stats.weibull_min. Check how the density is defined in scipy.stats and adapt the parameters if necessary.
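One way to check the parameterization is to compare SciPy's output against a closed form on made-up numbers. The sketch below uses illustrative values (`c_demo`, `scale_demo` and `t_demo` are hypothetical names and not the assignment's parameters) and relies on the documented form of `weibull_min`, whose survival function with shape `c` and a `scale` argument is \(\exp(-(t/\text{scale})^c)\). How `scale` relates to \(λ\) depends on the Weibull definition used in your course material, so treat that mapping as something to verify rather than assume.

```python
import numpy as np
import scipy.stats

# Illustrative values only (NOT the assignment's parameters)
c_demo = 2.0        # shape, passed to weibull_min as `c`
scale_demo = 10.0   # scale, in the same units as t
t_demo = np.array([1.0, 5.0, 20.0])

dist = scipy.stats.weibull_min(c_demo, scale=scale_demo)

# Survival function from SciPy versus the closed form exp(-(t/scale)^c)
sf_scipy = dist.sf(t_demo)
sf_closed = np.exp(-(t_demo / scale_demo) ** c_demo)

# Hazard rate as density over survival, versus (c/scale) * (t/scale)^(c-1)
hazard_scipy = dist.pdf(t_demo) / sf_scipy
hazard_closed = (c_demo / scale_demo) * (t_demo / scale_demo) ** (c_demo - 1)

print(np.allclose(sf_scipy, sf_closed), np.allclose(hazard_scipy, hazard_closed))
```

If both comparisons agree, the same pattern can be reused to test any candidate conversion from the course's \(λ\) to SciPy's `scale`.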

alpha =### YOUR CODE HERE ###            # shape
lambda_weib =### YOUR CODE HERE ###      # lambda
scale =### YOUR CODE HERE ###

S95_weib =### YOUR CODE HERE ###
print(f"The probability of surviving 95 years assuming a changing rate is {100*S95_weib:.2f}%.")

Task 3:

Plot both hazard rates for ages in the interval \([0,99]\) years.

### YOUR CODE HERE ###

Now we have data!

After some time, data becomes available to the team. The authorities provide it in the following table, where \(t\) is the age in years and \(L(t)\) is the number of individuals alive at age \(t\).

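| t | L(t) | t | L(t) | t | L(t) | t | L(t) |
|---|---|---|---|---|---|---|---|
| 0 | 1,023,102 | 15 | 962,270 | 50 | 810,900 | 85 | 78,221 |
| 1 | 1,000,000 | 20 | 951,483 | 55 | 754,191 | 90 | 21,577 |
| 2 | 994,230 | 25 | 939,197 | 60 | 677,771 | 95 | 3,011 |
| 3 | 990,114 | 30 | 924,609 | 65 | 577,822 | 99 | 125 |
| 4 | 986,767 | 35 | 906,554 | 70 | 454,548 | 100 | - |
| 5 | 983,817 | 40 | 883,342 | 75 | 315,982 | | |
| 10 | 971,804 | 45 | 852,554 | 80 | 181,765 | | |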

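The empirical density function may be estimated as:

\[ f(t)\approx \frac{\text{number of deaths during }[t,t+\Delta t)}{N \cdot \Delta t}=\frac{n(t+\Delta t)-n(t)}{N \cdot \Delta t} \]

Here \(N\) is the total population and \(n(t)\) is read as the cumulative number of deaths up to age \(t\), so that in terms of the table \(n(t+\Delta t)-n(t)=L(t)-L(t+\Delta t)\).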
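The estimator above can be sketched on a small made-up life table before applying it to the real data. Everything in this block is hypothetical (`age_demo`, `alive_demo` and the numbers themselves), and taking \(N\) as the count at the first tabulated age is an assumption you should check against your own definition of the total population.

```python
import numpy as np

# Hypothetical toy life table (NOT the assignment data)
age_demo = np.array([0.0, 10.0, 20.0, 40.0, 50.0])
alive_demo = np.array([1000.0, 900.0, 600.0, 100.0, 0.0])

N_demo = alive_demo[0]              # assumption: N is the initial population
dt_demo = np.diff(age_demo)         # interval widths, which may be unequal
deaths_demo = -np.diff(alive_demo)  # deaths in each interval [t, t + dt)

# One density value per interval, attached to the interval's left endpoint
f_demo = deaths_demo / (N_demo * dt_demo)

print(f_demo)                       # [0.01  0.03  0.025 0.01 ]
print(np.sum(f_demo * dt_demo))     # 1.0, since everyone has died by the last age
```

Because the intervals have different widths, dividing by each interval's own \(Δt\) matters. The estimate has one fewer entry than the age array, so a plot needs either the left endpoints or the interval midpoints on the horizontal axis.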

The data is provided below:

Age = np.array([0, 1, 2, 3, 4, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65,
                70, 75, 80, 85, 90, 95, 99, 100], dtype=float)

Lives = np.array([1023102, 1000000, 994230, 990114, 986767, 983817, 971804, 962270,
                 951483, 939197, 924609, 906554, 883342, 852554, 810900, 754191,
                 677771, 577822, 454548, 315982, 181765, 78221, 21577, 3011,
                 125, 0], dtype=float)

Task 4:

Estimate and plot \(f(t)\) based on the available data.

### YOUR CODE HERE ###

Now we are interested in the reliability function. Based on the empirical data, this function \(\bar{F}(t)\) can be estimated as the number of survivors at time \(t\) divided by the total population.
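A minimal sketch of this estimate on a made-up life table follows, together with a hazard-rate estimate using the standard definition \(r(t)=f(t)/\bar{F}(t)\). The names and numbers are hypothetical, and two choices are assumptions rather than part of the assignment: the total population is taken as the count at the first tabulated age, and the survival fraction is evaluated at the start of each interval.

```python
import numpy as np

# Hypothetical toy life table (NOT the assignment data)
age_demo = np.array([0.0, 10.0, 20.0, 40.0, 50.0])
alive_demo = np.array([1000.0, 900.0, 600.0, 100.0, 0.0])

N_demo = alive_demo[0]             # assumption: total population = first count
surv_demo = alive_demo / N_demo    # reliability at each tabulated age

# Interval density: deaths per interval divided by N and the interval width
f_demo = -np.diff(alive_demo) / (N_demo * np.diff(age_demo))

# Hazard: density divided by the survival fraction at the interval start.
# Dropping the last survival value avoids dividing by zero at the final age.
hazard_demo = f_demo / surv_demo[:-1]

print(surv_demo)
print(hazard_demo)
```

Evaluating the survival fraction at the interval midpoint or end instead would give somewhat different hazard values, especially over wide intervals, so it is worth stating which convention you use in the report.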

Task 5:

Estimate the reliability function \(\bar{F}(t)\) and the hazard rate \(r(t)\) based on the available data.

### YOUR CODE HERE ###

Task 6:

Plot the computed hazard rate \(r(t)\) against the hazard rates you plotted in Task 3.

### YOUR CODE HERE ###

Task 7:

Discuss the hazard rate functions for the constant failure rate model and the Weibull model in comparison with the empirical (data-based) estimate. What do you observe a) at the beginning of life and b) toward the end of life?

Task 8:

Plot the reliability function \(\bar{F}(t)\) against the reliability functions of the distributions in Task 1 and Task 2.

### YOUR CODE HERE ###

Task 9:

Discuss the reliability functions for the constant failure rate model and the Weibull model in comparison with the empirical (data-based) estimate.

It’s all about the money

Now that you have analyzed the data, let’s think about what our failure rate models imply for the pricing of the life insurance our company offers. Answer these questions in the report.

Task 10:

How should life insurance be priced by the company under the three different models? Consider insurance policies that change their premiums with age.

Task 11:

Think of other examples in engineering or everyday life and discuss what failure curves these examples follow. Explain why. Extra points for finding engineering examples of “bathtub” curves.

By Max Ramgraber, Renan Barros and Oswaldo Morales Napoles, Delft University of Technology. CC BY 4.0, more info on the Credits page of Workbook