Aggregate claim models with one-way and two-way dependence among individual claims

Motivated by some real-life correlations among insurance claims, we consider three aggregate claim models with dependence in this paper. Model one considers the dependence caused by a common index among indexed insurance benefits; model two takes into account the correlation arising from common fixed costs; model three covers both types of dependence. Two random variables, Y1 denoting a random index and Y2 denoting a random cost, form the core of the three dependence models, and detailed discussions are given on how the aggregate claims amount interacts with these sources of dependence. Theoretical results for these aggregate claim distributions are derived and algorithms for computational purposes are also provided. Some numerical results are presented for the compound Poisson case together with discussions and comparisons regarding the three dependence cases.


Introduction
One of the major research interests in actuarial science is modelling the aggregate claims amount for various lines of insurance business. The collective risk model is one well-known and widely used approach. A convenient yet unrealistic assumption is independence between the number of claims and the individual claim amounts, as well as among the individual claim amounts. To make collective risk models fit real insurance claims data better, researchers have been developing models with various types of dependence over the past few decades.
Early research outputs on dependence among numbers of claims include [3], who considered a general method of constructing a vector of p (p ≥ 2) dependent claim numbers from a vector of independent random variables, and derived formulas to compute the aggregate claim distribution for the book of p dependent classes of business. Motivated by these papers, researchers have developed collective risk models with dependence embedded in the number of claims, for example [6], [10], [13], [18], [19], [20] and [22]. Readers can also refer to Chapter XIII of [7] for an informative survey.
Dependence between claim arrivals and claim amounts is also among the focuses of researchers in this field, and various types of dependence models have been proposed and employed in risk theory; see [1], [8] and [9] for example.
Another type of dependence is grounded on individual claim amounts only. Multivariate distributions and copulas have both been employed to model such correlations in the literature. [12] and [14] studied the stop-loss premiums of insurance portfolios with dependent risks, and [11] examined stochastic bounds on sums of dependent risks. [2] used a simple idea of mixing to establish dependence among claim sizes and among claim inter-arrival times in collective risk models and obtained a number of explicit formulas for ruin probabilities and related quantities. More recently, [16] derived closed-form expressions for the probability distribution function of aggregated risks with the multivariate dependent Pareto type II distributions proposed by [4] and [5], which are widely used in insurance and risk analysis.
In this paper we shall examine the aggregate claim distributions when the individual claim amounts follow a few simple dependence structures. These dependence structures are motivated by two real-life insurance claim phenomena, namely indexed insurance benefits and fixed costs in claim settlements. One typical implicit index is inflation, which affects almost all kinds of insurance products. Inflation risk is one of the most common risks affecting many industries, in particular insurance business with long tails. Given a level of inflation, the individual and aggregate claims amounts can be modeled as usual by scaling. However, when addressing the risk of inflation, actuaries also need to take into account the randomness of future inflation rates. This randomness not only changes the overall scale of insurance claims, but also increases the overall variance of the aggregate claims. Therefore, when modeling the aggregate claims amount we should introduce a common multiplicative factor, which is a random variable itself, if the individual claims generated by an insurance portfolio refer explicitly or implicitly to a common economic index. Similarly, when settling claims there could be some common fixed costs that also vary from year to year, which leads to a common additive factor. Both phenomena bring dependence into the aggregate claim model, alone or together.
Our main aim in this paper is to illustrate the impact of the above-mentioned real-life dependence on aggregate claim distributions from a computational point of view, so we shall only adopt discrete settings for the random variables involved in our models. The continuous case is also interesting, but it would be much harder to develop recursive algorithms there than in the discrete case. In practice, discretization can be used to bridge the two cases.
This paper is organized as follows. Three dependence structures are defined in Section 2 with some preliminary results. Some computational aspects are discussed in Section 3, where algorithms are provided to calculate the aggregate claim distributions. In Section 4, some numerical examples are given with detailed discussions.

Three Dependence Structures
In this section we consider three types of dependence structures: one with a common multiplicative factor, one with a common additive factor and one with common multiplicative and additive factors.

A Multiplicative Dependence
The first dependence structure is built on a common multiplicative factor:

$$S_1 = \sum_{i=1}^{N} X_i Y_1, \tag{1}$$

where $N$ is a counting random variable (r.v.) denoting the number of claims with probability function (p.f.) $p_n$, $n \in \mathbb{N}$; $X_i$, $i = 1, 2, \ldots$, are independent and identically distributed (i.i.d.) integer-valued r.v.'s following a common p.f. $f(x)$, $x \in \mathbb{N}$; and $Y_1$ is a discrete r.v. with p.f. $g_1(y)$, $y \in A$, where $A$ is a discrete set of values in $\mathbb{R}^+$. When $N = 0$, $S_1 = 0$. Assume that $N$, $\{X_i\}_{i \in \mathbb{N}^+}$ and $Y_1$ are all independent of each other. For convenience, we denote by $X$ a generic r.v. of $\{X_i\}$. Further, we define $f(z) = \sum_{x=0}^{\infty} z^x f(x)$ to be the probability generating function (p.g.f.) of $X$. Similarly, $p(z)$ is the p.g.f. of $N$.
Model (1) is motivated by an insurance portfolio that pays indexed insurance benefits. When the individual indexed benefits $X_i Y_1$, $i = 1, 2, \ldots$, are linked to the same random index $Y_1$, the resulting dependence has the form of (1). In this context, $S_1$ denotes the aggregate indexed claims amount, and one can easily see that it can be rewritten as $S_1 = Y_1 Z$, where $Z := \sum_{i=1}^{N} X_i$ is the original (non-index-linked) aggregate claims amount with i.i.d. individual claims $X_i$, $i = 1, 2, \ldots$. Here $Y_1$ and $Z$ are independent of each other.
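Let $S_1$ and $Z$ have p.f.'s $h_1(s)$, $s \in B$, and $h_Z(x)$, $x \in \mathbb{N}$, respectively, where $B = \{xy : x \in \mathbb{N},\ y \in A\}$. Then we have:

$$E[S_1] = E[Y_1]\,E[Z] = E[Y_1]\,E[N]\,E[X],$$

$$\mathrm{Var}[S_1] = E[Y_1^2]\,\mathrm{Var}[Z] + \mathrm{Var}[Y_1]\,(E[Z])^2.$$

Regarding the level of dependence among the individual indexed claim amounts $X_i Y_1$, $i = 1, 2, \ldots$, measured by their covariances and correlation coefficients, we have, for $i, j \in \mathbb{N}^+$ and $i \neq j$,

$$\mathrm{Cov}(X_i Y_1, X_j Y_1) = (E[X])^2\,\mathrm{Var}[Y_1],$$

$$\rho_1 = \frac{CV_{Y_1}^2}{CV_X^2\,CV_{Y_1}^2 + CV_X^2 + CV_{Y_1}^2},$$

where $CV_X$ ($= \sqrt{\mathrm{Var}[X]}/E[X]$) and $CV_{Y_1}$ are the coefficients of variation of $X$ and $Y_1$ respectively, and $\rho_1$ is the correlation coefficient of each pair of individual indexed claim amounts.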
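We propose a general method for calculating the values of $h_1(s)$, $s \in B$. For $y \in A$, we define the set $\mathbb{N}_y = \{0, y, 2y, \ldots\}$ and the p.f. of $yX$,

$${}_1f_y(x) = f\!\left(\frac{x}{y}\right), \qquad x \in \mathbb{N}_y.$$

Let $\{{}_1h_Z(x; y)\}_{x \in \mathbb{N}_y,\, y \in A}$ denote the p.f. of the compound distribution consisting of $\{p_n\}$ and $\{{}_1f_y(x)\}$, so that ${}_1h_Z(s; y) = h_Z(s/y)$ when $s \in \mathbb{N}_y$. Then we have, for $s \in B$ and $s > 0$,

$$h_1(s) = \sum_{y \in A} g_1(y)\,{}_1h_Z(s; y) = \sum_{y \in A;\, y \le s} g_1(y)\,{}_1h_Z(s; y) \tag{5}$$

$$= \sum_{y \in A;\, y \le s;\, s \in \mathbb{N}_y} g_1(y)\,h_Z\!\left(\frac{s}{y}\right) \tag{6}$$

$$= \sum_{y \in A;\, y \le s} g_1(y)\,h_Z\!\left(\frac{s}{y}\right)\mathbf{1}\!\left\{\frac{s}{y} = \left\lfloor \frac{s}{y} \right\rfloor\right\}, \tag{7}$$

with $h_1(0) = h_Z(0)$, where $\lfloor x \rfloor$ is the floor function of $x$. The second equality in (5) holds because the largest value of $Y_1$ that does not void $\{Y_1 Z = s\}$ is $s$. The result (7) specifies more clearly how to judge whether or not $s \in \mathbb{N}_y$ when programming.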
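The test "is $s/y$ an integer?" in the last form of the formula for $h_1(s)$ is fragile in floating-point arithmetic. The following is a minimal Python sketch, not the authors' implementation, that uses exact rational arithmetic for it. The function name `h1_at`, the toy values of `h_z` and the probabilities in `g1` are illustrative assumptions; only the support $\{1.05, 1.1, 1.15\}$ is taken from the paper's own example in Section 3.1.

```python
from fractions import Fraction

def h1_at(s, h_z, g1):
    """P(S_1 = s) for S_1 = Y_1 * Z.

    s   : value of S_1, given as a Fraction, an int or a decimal string
    h_z : list with h_z[x] = P(Z = x) for x = 0, ..., len(h_z) - 1
    g1  : dict mapping each value y of Y_1 (a positive Fraction) to P(Y_1 = y)
    """
    s = Fraction(s)
    total = 0.0
    for y, prob_y in g1.items():
        q = s / y                      # exact ratio s / y
        # s is a multiple of y exactly when q is a non-negative integer
        if q.denominator == 1 and 0 <= q.numerator < len(h_z):
            total += prob_y * h_z[q.numerator]
    return total

# Hypothetical inputs, chosen only for illustration.
h_z = [(x + 1) / 465 for x in range(30)]          # toy p.f. of Z on 0..29
g1 = {Fraction("1.05"): 0.3, Fraction("1.1"): 0.4, Fraction("1.15"): 0.3}

print(h1_at("10.5", h_z, g1))   # only y = 1.05 contributes (Z = 10)
print(h1_at("23.1", h_z, g1))   # y = 1.05 (Z = 22) and y = 1.1 (Z = 21) contribute
```

Passing `s` as a string or `Fraction` rather than a float keeps values such as 23.1 exact, which is what makes the divisibility test reliable.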

An Additive Dependence
The second dependence structure is built on a common additive factor:

$$S_2 = \sum_{i=1}^{N} (X_i + Y_2), \tag{8}$$

where $N$ and $\{X_i\}$ are the same as in model (1), and $Y_2$ is an r.v. independent of $N$ and $\{X_i\}$ with p.f. $g_2(y)$, $y \in \mathbb{N}$. Note that $S_2 = Z + N Y_2$, but here $Z$ and $N Y_2$ are dependent on each other. Unlike model (1), model (8) is motivated by an insurance portfolio with fixed costs (such as paperwork, administration or transaction costs) on claim settlements. In a general setting, one could use an r.v. to denote this fixed cost of settling a claim, which varies from year to year. As settling each individual claim incurs the same fixed cost within the same time period, a dependence arises as in model (8). This is also why we assume that $Y_2$ only takes non-negative integer values.
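Similarly, we can obtain the following results based on model (8):

$$E[S_2] = E[N]\,(E[X] + E[Y_2]),$$

$$\mathrm{Var}[S_2] = E[N]\,\mathrm{Var}[X] + \mathrm{Var}[N]\,(E[X] + E[Y_2])^2 + E[N^2]\,\mathrm{Var}[Y_2].$$

Again, we consider the level of dependence among the individual claim sizes with fixed costs. For $i, j \in \mathbb{N}^+$ and $i \neq j$,

$$\mathrm{Cov}(X_i + Y_2, X_j + Y_2) = \mathrm{Var}[Y_2], \qquad \rho_2 = \frac{\mathrm{Var}[Y_2]}{\mathrm{Var}[X] + \mathrm{Var}[Y_2]}.$$

Let $h_2(s)$ be the p.f. of $S_2$. To calculate $h_2(s)$, we shall derive a general result based on model (8). Similar to ${}_1f_y(x)$, we define $g_2(n; y)$ as the p.f. of the r.v. $nY_2$, $n \in \mathbb{N}^+$. One can see that

$$g_2(n; y) = \begin{cases} g_2(y/n), & y/n \in \mathbb{N}, \\ 0, & \text{otherwise.} \end{cases}$$

In particular, $g_2(0; y)$ is a degenerate p.f. at the value 0. Then we have the following result, for $s \in \mathbb{N}^+$:

$$h_2(s) = \sum_{n=1}^{\infty} p_n\, f^{*n} * g_2(n; s), \tag{13}$$

where $f^{*n}(x)$ is the $n$-fold convolution of $f(x)$ and $f^{*n} * g_2(n; x)$ is the convolution of $f^{*n}(x)$ and $g_2(n; x)$. Also,

$$h_2(0) = p_0 + g_2(0) \sum_{n=1}^{\infty} p_n\,[f(0)]^n.$$

Alternatively, let ${}_2f_y(x)$ be the p.f. of $X + y$. Define the set $\mathbb{N}_{y+} = \{y, y+1, y+2, \ldots\}$, $y \in \mathbb{N}$; then we have

$${}_2f_y(x) = \begin{cases} f(x - y), & x \in \mathbb{N}_{y+}, \\ 0, & \text{otherwise.} \end{cases}$$

Let $\{{}_2h_Z(x; y)\}_{x \in \mathbb{N},\, y \in \mathbb{N}}$ denote the p.f. of the compound distribution consisting of $\{p_n\}$ and $\{{}_2f_y(x)\}$; clearly ${}_2h_Z(x; y) = 0$ for all $0 < x < y$. Then we have the second result regarding $h_2(s)$, for $s \in \mathbb{N}^+$,

$$h_2(s) = \sum_{y=0}^{s} g_2(y)\,{}_2h_Z(s; y). \tag{14}$$

Result (14) is of higher computational importance than (13) because its summation is finite.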
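The finite-sum representation of $h_2(s)$, i.e. mixing over $y$ the compound distributions built from the shifted severity $X + y$, can be sketched in Python as follows. This is a hedged illustration rather than the authors' code: it assumes a Poisson claim number (so that the standard Panjer recursion with $a = 0$, $b = \lambda$ applies), and the names `panjer_poisson` and `additive_pf` as well as all numerical inputs are hypothetical.

```python
import math

def panjer_poisson(lam, f, x_max):
    """P(sum of N i.i.d. claims = x), x = 0..x_max, for N ~ Poisson(lam).

    f is a list with f[k] = P(claim = k); Panjer's recursion with a = 0, b = lam.
    """
    h = [0.0] * (x_max + 1)
    h[0] = math.exp(-lam * (1.0 - f[0]))       # p.g.f. of N evaluated at f(0)
    for x in range(1, x_max + 1):
        acc = 0.0
        for k in range(1, min(x, len(f) - 1) + 1):
            acc += k * f[k] * h[x - k]
        h[x] = lam * acc / x
    return h

def additive_pf(lam, f, g2, s_max):
    """h_2(s), s = 0..s_max, for S_2 = sum_{i<=N} (X_i + Y_2).

    g2 is a dict mapping each non-negative integer y to P(Y_2 = y).
    """
    h2 = [0.0] * (s_max + 1)
    for y, prob_y in g2.items():
        f_shift = [0.0] * y + list(f)          # p.f. of X + y
        hz_y = panjer_poisson(lam, f_shift, s_max)
        for s in range(s_max + 1):
            h2[s] += prob_y * hz_y[s]
    return h2

# Hypothetical toy inputs: X on {0, 1, 2}, Y_2 on {0, 2}, Poisson mean 1.5.
h2 = additive_pf(1.5, [0.2, 0.5, 0.3], {0: 0.4, 2: 0.6}, 200)
print(sum(h2))   # total mass captured on 0..200
```

Only values of $y$ not exceeding $s$ contribute to $h_2(s)$ for $s > 0$, because the shifted compound p.f. vanishes on $0 < x < y$; the loop above relies on this implicitly rather than truncating the sum.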

A Two-way Dependence
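When both the random index and the random fixed costs are in place, we propose the third dependence structure as follows:

$$S_3 = \sum_{i=1}^{N} (X_i Y_1 + Y_2), \tag{15}$$

where $N$, $\{X_i\}_{i \in \mathbb{N}^+}$ and $Y_j$, $j = 1, 2$, have been defined in the above two models, and here they are all independent of each other. Under model (15), the individual claim sizes $X_i$, $i \in \mathbb{N}^+$, are affected not only by the common multiplicative factor $Y_1$ but also by the common additive factor $Y_2$, so it forms a two-way dependence structure. Clearly, $S_3$ is related to $Z$ through $S_3 = Y_1 Z + N Y_2$, and (1) and (8) are the two special cases obtained by taking $Y_2 \equiv 0$ and $Y_1 \equiv 1$ respectively. Firstly, we show some basic results:

$$E[S_3] = E[N]\,(E[X]\,E[Y_1] + E[Y_2]),$$

$$\mathrm{Var}[S_3] = E[N]\,\mathrm{Var}[X]\,E[Y_1^2] + \mathrm{Var}[N]\,(E[X]\,E[Y_1] + E[Y_2])^2 + E[N^2]\left((E[X])^2\,\mathrm{Var}[Y_1] + \mathrm{Var}[Y_2]\right).$$

For $i, j \in \mathbb{N}^+$ and $i \neq j$, we have

$$\mathrm{Cov}(X_i Y_1 + Y_2, X_j Y_1 + Y_2) = (E[X])^2\,\mathrm{Var}[Y_1] + \mathrm{Var}[Y_2], \qquad \rho_3 = \frac{(E[X])^2\,\mathrm{Var}[Y_1] + \mathrm{Var}[Y_2]}{\mathrm{Var}[X Y_1] + \mathrm{Var}[Y_2]},$$

where $\mathrm{Var}[X Y_1] = E[X^2]\,E[Y_1^2] - (E[X]\,E[Y_1])^2$.

Because of the two-way dependence in model (15), the value spaces of $Y_1$ and $Y_2$, i.e. the positive real-valued discrete set $A$ and the non-negative integer set $\mathbb{N}$, are mixed together, which complicates the computational aspect of the new model. The formulas provided for the previous two models, (6) and (14), cannot be employed directly, and more assumptions are needed for the purpose of simplification.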
To calculate the p.f. of $S_3$, denoted by $\{h_3(s)\}_{s \in B^*}$ where $B^*$ is the domain of $S_3$ and is yet to be determined, we restrict the domain of $Y_1$ to a set of positive rational numbers, denoted by $A_r \subseteq \mathbb{Q}^+$, where $\mathbb{Q}^+$ denotes the set of all positive rational numbers. In addition, we assume that $A_r$ has only a finite number of elements, denoted by $M$. Then we can express all its elements as fractions in simplest form, i.e. $A_r = \{\delta_i/\eta_i,\ i = 1, \ldots, M\}$ where $\delta_i, \eta_i \in \mathbb{N}^+$ are coprime. Conditional on $Y_1 = \delta_i/\eta_i$ and $Y_2 = y$, $y \in \mathbb{N}$, $i = 1, \ldots, M$, our model (15) becomes

$$S_3 = \sum_{j=1}^{N}\left(\frac{\delta_i}{\eta_i} X_j + y\right) = \frac{1}{\eta_i} \sum_{j=1}^{N} (\delta_i X_j + y\eta_i).$$

In this new expression of $S_3$, $\sum_{j=1}^{N} (\delta_i X_j + y\eta_i)$ is an integer-valued random variable, and so the domain of $S_3$, $B^*$, can be defined as

$$B^* = \left\{\frac{x}{\eta_i} : x \in \mathbb{N},\ i = 1, \ldots, M\right\}.$$

Note that the actual domain of $S_3$ is a subset of the set $B^*$ defined above, which therefore contains a number of elements with zero probability mass. However, these unnecessary elements can easily be eliminated when conducting numerical calculations.
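For $x \in \mathbb{N}$, we define the p.f. of $\delta_i X_j + y\eta_i$ by

$${}_3f_{\delta_i, \eta_i, y}(x) = \begin{cases} f\!\left(\dfrac{x - y\eta_i}{\delta_i}\right), & \dfrac{x - y\eta_i}{\delta_i} \in \mathbb{N}, \\ 0, & \text{otherwise.} \end{cases} \tag{20}$$

Clearly, ${}_3f_{1,1,y}(x) = {}_2f_y(x)$ and ${}_3f_{\delta,\eta,0}(x) = {}_1f_\delta(x)$. Let $\{{}_3h_Z(x; \delta_i, \eta_i, y)\}$, $x \in \mathbb{N}$, denote the p.f. of the compound distribution consisting of $\{p_n\}$ and $\{{}_3f_{\delta_i,\eta_i,y}(x)\}$. Then we have, for $s \in B^*$ and $s > 0$,

$$h_3(s) = \sum_{i=1}^{M}\ \sum_{y \in \mathbb{N};\, y \le s} g_1\!\left(\frac{\delta_i}{\eta_i}\right) g_2(y)\;{}_3h_Z(s\eta_i; \delta_i, \eta_i, y)\,\mathbf{1}\{s\eta_i \in \mathbb{N}\}, \tag{21}$$

with $h_3(0) = p_0 + g_2(0)\,[p(f(0)) - p_0]$.

Remark. An alternative structure of the two-way dependence can be defined as

$$S_3' = \sum_{i=1}^{N} (X_i + Y_2)\,Y_1,$$

which allows interaction between the factors $Y_1$ and $Y_2$. One interpretation of this slightly different dependence structure is that, in practice, the fixed costs are normally affected by inflation as well. This version is, however, easier to compute than (15), as one can deal with $Z + N Y_2$ first and then incorporate $Y_1$ in a second step. Therefore, we shall only discuss the version given in (15) in the rest of this paper.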
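To make the rescaling idea concrete, here is a minimal Python sketch of the two-way model under a Poisson claim number. It is an illustration under stated assumptions, not the authors' algorithm as published: for every pair $(\delta/\eta, y)$ it builds the integer-valued severity $\delta X + y\eta$, runs Panjer's recursion on it, and maps each integer outcome $x$ back to $s = x/\eta$. The names `panjer_poisson` and `two_way_pf` and every numerical input are hypothetical.

```python
import math
from fractions import Fraction

def panjer_poisson(lam, f, x_max):
    """Compound Poisson p.f. on 0..x_max via Panjer's recursion (a = 0, b = lam)."""
    h = [0.0] * (x_max + 1)
    h[0] = math.exp(-lam * (1.0 - f[0]))
    for x in range(1, x_max + 1):
        acc = 0.0
        for k in range(1, min(x, len(f) - 1) + 1):
            acc += k * f[k] * h[x - k]
        h[x] = lam * acc / x
    return h

def two_way_pf(lam, f, g1, g2, s_max):
    """Return {s: P(S_3 = s)} for all s <= s_max (s_max a positive integer).

    g1 maps Fractions delta/eta (values of Y_1) to probabilities,
    g2 maps non-negative integers y (values of Y_2) to probabilities.
    """
    h3 = {}
    for ratio, p1 in g1.items():
        delta, eta = ratio.numerator, ratio.denominator   # simplest form
        x_max = s_max * eta                               # integer scale of eta * S_3
        for y, p2 in g2.items():
            # p.f. of the integer-valued severity delta * X + y * eta
            f3 = [0.0] * (delta * (len(f) - 1) + y * eta + 1)
            for x, fx in enumerate(f):
                f3[delta * x + y * eta] = fx
            hz = panjer_poisson(lam, f3, x_max)
            for x, hx in enumerate(hz):
                if hx > 0.0:                              # skip zero-mass points of B*
                    s = Fraction(x, eta)
                    h3[s] = h3.get(s, 0.0) + p1 * p2 * hx
    return h3

# Hypothetical toy inputs for illustration only.
g1 = {Fraction(21, 20): 0.5, Fraction(23, 20): 0.5}
h3 = two_way_pf(2.0, [0.1, 0.6, 0.3], g1, {0: 0.5, 1: 0.5}, 60)
print(len(h3), sum(h3.values()))
```

Using `Fraction` keys means that the same value of $S_3$ reached through different $(Y_1, Y_2, Z)$ combinations is accumulated into one entry, and points of $B^*$ with zero mass are simply never stored.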

Computational Aspects of the Models
From the key results (6), (14) and (21) obtained in Section 2 we can see that, to calculate the aggregate claim distributions $h_i(s)$, $i = 1, 2, 3$, we first need to calculate the aggregate claim distributions $h_Z(x)$, ${}_2h_Z(x; y)$ and ${}_3h_Z(x; \delta_i, \eta_i, y)$, which are all compound distributions under the independence setting. In the actuarial literature, many papers discuss how to compute aggregate claim distributions under the independence setting, and recursive calculation is one typical approach. Based on the well-known Panjer recursion, researchers have developed a number of generalized recursive methods for various types of claim number distributions, for instance the (a, b, 0) class, the (a, b, 1) class and the (A, B, 0) class. Useful references include [15], [17], [21] and the references therein.
In the following, we shall discuss how to calculate the distributions $h_i(s)$, $i = 1, 2, 3$, using the (a, b, 0) class as an example, where

$$p_n = \left(a + \frac{b}{n}\right) p_{n-1}, \qquad n = 1, 2, \ldots$$

The only non-trivial distributions in this class are the Poisson, binomial and negative binomial distributions.
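As an illustration of the (a, b, 0) class, the Python sketch below lists the standard $(a, b, p_0)$ values for its three non-trivial members and rebuilds each p.f. from the recursion $p_n = (a + b/n)\,p_{n-1}$. The function names are hypothetical, and the negative binomial is parameterized by a size $r$ and a success probability $p$ with $P(N = n) = \binom{n+r-1}{n} p^r (1-p)^n$, which is an assumption about the convention intended in the text.

```python
import math

def ab0_poisson(lam):
    """(a, b, p0) for N ~ Poisson(lam)."""
    return 0.0, lam, math.exp(-lam)

def ab0_binomial(m, q):
    """(a, b, p0) for N ~ Binomial(m, q)."""
    return -q / (1.0 - q), (m + 1) * q / (1.0 - q), (1.0 - q) ** m

def ab0_negbin(r, p):
    """(a, b, p0) for N ~ negative binomial with size r and success probability p."""
    return 1.0 - p, (r - 1) * (1.0 - p), p ** r

def pf_from_ab0(a, b, p0, n_max):
    """Rebuild p_0, ..., p_{n_max} from p_n = (a + b / n) * p_{n-1}."""
    p = [p0]
    for n in range(1, n_max + 1):
        p.append((a + b / n) * p[-1])
    return p

# Example: first few Poisson(3) probabilities from the recursion.
print(pf_from_ab0(*ab0_poisson(3.0), 5))
```

Any of these triples can be fed into a Panjer-type recursion for the compound distribution; only the starting value $h_Z(0) = p(f(0))$ depends on the specific p.g.f.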

The Calculation of h 1 (s)
According to Panjer's recursion, $h_Z(x)$ satisfies

$$h_Z(x) = \frac{1}{1 - a f(0)} \sum_{k=1}^{x} \left(a + \frac{bk}{x}\right) f(k)\,h_Z(x - k), \qquad x = 1, 2, \ldots,$$

with $h_Z(0) = p(f(0))$. Bear in mind that when dealing with an infinite series, we often truncate the tail according to certain accuracy requirements. This remark applies throughout the rest of this paper.
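A minimal Python sketch of Panjer's recursion for a general (a, b, 0) claim number distribution is given below. The function name `panjer` and the toy inputs are hypothetical; the starting value $h_Z(0) = p(f(0))$ is passed in by the caller because it depends on the p.g.f. of the chosen claim number distribution.

```python
import math

def panjer(a, b, h0, f, x_max):
    """Compound p.f. h_Z(0..x_max) for an (a, b, 0) claim number distribution.

    a, b : parameters of the recursion p_n = (a + b / n) * p_{n-1}
    h0   : h_Z(0), i.e. the p.g.f. of N evaluated at f[0]
    f    : list with f[k] = P(X = k), k = 0, ..., len(f) - 1
    """
    h = [h0] + [0.0] * x_max
    c = 1.0 / (1.0 - a * f[0])
    for x in range(1, x_max + 1):
        acc = 0.0
        for k in range(1, min(x, len(f) - 1) + 1):
            acc += (a + b * k / x) * f[k] * h[x - k]
        h[x] = c * acc
    return h

# Compound Poisson toy example: lambda = 2, claims equal to 1 or 2 with equal probability.
lam, f = 2.0, [0.0, 0.5, 0.5]
h_poisson = panjer(0.0, lam, math.exp(-lam * (1.0 - f[0])), f, 60)

# Compound negative binomial toy example: size r = 2, probability p = 0.5.
r, p, f_nb = 2, 0.5, [0.2, 0.8]
h0_nb = (p / (1.0 - (1.0 - p) * f_nb[0])) ** r      # negative binomial p.g.f. at f(0)
h_negbin = panjer(1.0 - p, (r - 1) * (1.0 - p), h0_nb, f_nb, 60)

print(sum(h_poisson), sum(h_negbin))
```

The inner loop stops at `len(f) - 1`, so severities with finite support cost $O(x_{\max} \cdot |\mathrm{supp}\, f|)$ operations rather than $O(x_{\max}^2)$.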
To calculate $h_1(s)$, $s \in B$, from (6) we can see that, for any given $y \in A$ with $y \le s$, we need to check whether $s$ is a multiple of $y$, i.e. whether $s \in \mathbb{N}_y$. If so, the probability $h_Z(s/y)$ associated with this $y$ value enters $h_1(s)$ and can be determined using Panjer's recursion. Quite often, the number of relevant $y$ values for each $s \in B$ is very small, so the additional computation needed to go from $h_Z$ to $h_1$ is modest. For example, if $A = \{1.05, 1.1, 1.15\}$, then for $s = 10.5$ and $s = 23.1$ we have
• $h_1(10.5) = h_Z(10)\,g_1(1.05)$; and
• $h_1(23.1) = h_Z(22)\,g_1(1.05) + h_Z(21)\,g_1(1.1)$.
More importantly, the computation of $h_1(s)$ does not require values of $h_Z(k)$ for $k > s/y^*$, where $y^*$ is the minimum realization of $Y_1$. Also, when $Y_1 \equiv 1$, $h_1(s) = h_Z(s)$ for $s \in \mathbb{N}$. A suggested algorithm to calculate $h_1(s)$ is given below:
Step 1. Use $\mathbb{N}$ and the given domain $A$ to generate the domain $B$. There should be no duplicate values in $B$.
Step 2. Calculate and store $h_Z(x)$, $x \in \mathbb{N}$, using an appropriate method. In this case, Panjer's recursion is an obvious choice.
Note that to obtain H 1 (s), s ∈ N, we need to calculate

The Calculation of h 2 (s)
The calculation of $h_2(s)$ is based on ${}_2h_Z(x; y)$, which involves a modified individual claim amount distribution ${}_2f_y(x)$.
This is in effect a shift of $f$, obtained by letting ${}_2f_y(x) = f(x - y)$ for $x \in \mathbb{N}_{y+}$. When $\{p_n\}$ belongs to the (a, b, 0) class and $y \in \mathbb{N}^+$, ${}_2h_Z(x; y)$ satisfies

$${}_2h_Z(x; y) = \sum_{k=y}^{x} \left(a + \frac{bk}{x}\right) f(k - y)\;{}_2h_Z(x - k; y), \qquad x \ge y,$$

with ${}_2h_Z(0; y) = p_0$ and ${}_2h_Z(x; y) = 0$ for $x = 1, \ldots, y - 1$. Further, when $y = 0$, ${}_2h_Z(x; 0) = h_Z(x)$. Assume that $Y_2$ takes values in $D \subseteq \mathbb{N}$. When $D$ has only a limited number of elements, a reasonable algorithm is given below:
Step 1. For each $y \in D$, generate ${}_2f_y(x)$, $x \in \mathbb{N}$.
Step 2. Calculate and store ${}_2h_Z(x; y)$, $x \in \mathbb{N}$, using Panjer's recursion. Note that we need to store ${}_2h_Z(x; y)$, $x \in \mathbb{N}$, for every $y \in D$.
Step 3.For each s ∈ B, calculate

The Calculation of h 3 (s)
The calculation of $h_3(s)$ is much more complicated, as it combines the above two cases. As in Model 2, we assume that $Y_2$ takes values in $D \subseteq \mathbb{N}$, where $D$ has only a limited number of elements. According to (20) and (21) we propose an algorithm as follows:
Step 1. For the given domain of $Y_1$, $A_r$, determine $\delta_i$ and $\eta_i$ for all $i = 1, \ldots, M$.

Numerical Studies
In this section, we provide some numerical examples to demonstrate the impact of the three dependence structures proposed in Section 2 on the aggregate claim distributions. To make the numerical comparisons meaningful, we impose a condition of fairness, i.e. the aggregate claim models under discussion all have equal means. In the following, we shall consider four aggregate claim models, $S_i$, $i = 0, 1, 2, 3$, where $S_1$, $S_2$, $S_3$ have been defined in Section 2 and $S_0 = Z$ is the aggregate claims amount with i.i.d. individual claims. We first make the following assumptions:
• the i.i.d. random variables $\{X_i\}$ in all models follow a geometric distribution. Let $q_i$ be the geometric parameter in model $S_i$, $i = 0, 1, 2, 3$, and let $q_0 = 109/110$, $q_1 = q_2 = 0.99$, $q_3 = 0.989$.
Under the above assumptions one can easily verify that $E[S_i] = 1100$, $i = 0, 1, 2, 3$. As mentioned in Section 2.1, $Y_1$ is likely to be some real-life index when considering indexed insurance benefits, so we suggest a distribution with finite support for $Y_1$. Bear in mind that here we are considering the aggregate claims amount for a given time period, so the distribution of the index may change over time. Also, Section 2.2 gave one motivation for $Y_2$, i.e. the random fixed cost on claim settlements for an insurance portfolio. Again, a distribution with finite support should be a reasonable assumption. To better demonstrate the impact of different levels of dependence on the aggregate claim distributions, we consider four cases, referred to as Cases 1 to 4.
• First of all, the non-smooth graphs of $h_1$ and $h_3$ look irregular compared with those of $h_2$. This is a consequence of the multiplicative factor $Y_1$, as different combinations of $Z$ and $Y_1$ values can yield the same value of $S_1$. An example has been given in Section 3.1. Therefore, the probability masses of $Z$ are reallocated under the framework of $S_1$ or $S_3$, which results in the non-smooth dot plots of $h_1$ and $h_3$. By contrast, $Y_2$ has only a shifting effect on the aggregate claim distributions, so it does not affect the smoothness of the graphs of $h_2$.
• The impact of the multiplicative factor $Y_1$ on the shape of the p.f.'s is vertical. For Cases 3 and 4, which have more variable $Y_1$ values, the p.f.'s of $S_1$ and $S_3$ change considerably in the vertical direction compared with the other two cases. Also, together with the additive factor $Y_2$, the p.f. of $S_3$ fluctuates so much that the probability masses of $h_3$ almost fill certain regions. The number of values that $S_3$ can take given $S_3 \le s$ is much bigger than the maximum value $s$.
• No matter how much the p.f. curves fluctuate, the c.d.f.'s are all smooth functions. Under the presumed numerical cases and at the scale of the figures, the differences among the four c.d.f.'s are not significant in Figures 1-4.
• To better show the relationships among the four c.d.f.'s, we show some detailed graphs for Case 1 and Case 4, which are the cases with the biggest difference, in Figures 5 and 6 respectively. One can see that for smaller aggregate claims amounts, under our model assumptions, the independent model $S_0$ tends to underestimate the associated probabilities if the multiplicative dependence exists; it tends to overestimate the associated probabilities if either the additive or the two-way dependence exists. Figures 5 and 6 also show that the impact of the additive factor $Y_2$ is more obvious when the multiplicative factor $Y_1$ has a small variance, as the curves for $S_0$ and $S_1$ (without additive factor) are more separated from those for $S_2$ and $S_3$ (with additive factor) in Case 1 than in Case 4.
• Figure 6 shows the same trends as Figure 5. Due to the increased variance of $Y_1$, its impact on the c.d.f.'s is more easily seen.

Example 2. To extend our findings from the previous compound Poisson case, we explore the compound negative binomial case in the following. Assume that $N$ follows a negative binomial distribution with size 2 and probability 1/6. It can be verified that $E[N] = 10$ and $V[N] = 60$. Based on the algorithms given in Section 3, we calculate the p.f.'s of $S_i$, $i = 0, \ldots, 3$, i.e. $h_Z$ and $h_i$, $i = 1, 2, 3$, in R, using Panjer's recursion. Table 2 summarises the empirical results regarding the aggregate claim amounts $S_i$, $i = 0, \ldots, 3$, under the four cases specified in Example 1. One can see that the aggregate claims amounts $S_i$, $i = 1, 2, 3$, have much higher variances under the compound negative binomial assumption than under the compound Poisson assumption, but the correlation among claim sizes is not affected by the assumption on the number of claims, because the number of claims and the individual claim sizes are independent. For the purpose of illustration, we present some graphs for Cases 1 and 4 only.
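The moments $E[N] = 10$ and $V[N] = 60$ quoted for Example 2 can be checked with exact arithmetic. The short Python sketch below assumes the parameterization in which a negative binomial with size $r$ and probability $p$ has mean $r(1-p)/p$ and variance $r(1-p)/p^2$; the function name `negbin_moments` is hypothetical.

```python
from fractions import Fraction

def negbin_moments(size, prob):
    """Mean and variance of a negative binomial with the given size and success probability."""
    mean = size * (1 - prob) / prob
    var = size * (1 - prob) / prob ** 2
    return mean, var

mean_n, var_n = negbin_moments(2, Fraction(1, 6))
print(mean_n, var_n)   # exact rational results
```

The variance is six times the mean here, whereas a Poisson claim number with the same mean would have variance 10, which is consistent with the much higher variances reported for the compound negative binomial case.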

Table 1. Some preliminary results

The details of how to conduct the computational tasks have been given in Section 3, so we do not repeat them here. Based on the calculated results, we draw Figures 1-6, which visually present the p.f. and cumulative distribution function (c.d.f.) of $S_i$, $i = 0, \ldots, 3$, under the above four presumed cases. According to the descriptive statistics summarised in Table 1, one can see that the multiplicative factor $Y_1$ has higher variation in Cases 3 and 4, and the additive factor $Y_2$ has higher variation in Cases 2 and 4. Overall, the variances of $S_i$, $i = 1, 2, 3$, are the lowest in Case 1 and the highest in Case 4. One can see from the figures that:

Table 2. Some preliminary results