System Maintenance Using Several Imperfect Repairs Before a Perfect Repair

Allowing several imperfect repairs before a perfect repair can lead to a highly reliable and efficient system by reducing repair time and repair cost. Assuming exponential lifetime and exponential repair time, we determine the optimal probability p of choosing a perfect repair over an imperfect repair after each failure. Based on either the limiting availability or the limiting average repair cost per unit time, we determine the optimal number of imperfect repairs before conducting a perfect repair.


Introduction
Imperfect repair has been a popular maintenance strategy since the 1980s. See [2], [3], [6], [7], [8], [17], [18], etc. A perfect repair is certainly desirable because it returns the failed system to a perfect state, "as good as new." But a perfect repair is usually costly and time consuming. On the other hand, an imperfect repair is a more economical and faster way to revive the system to a functioning, though weaker, state. The state is weaker because an imperfect repair shrinks the next lifetime and/or extends the next repair time. A balanced strategy, which utilizes the advantages of both perfect repair and imperfect repair, is vital to maintaining a reliable and efficient system.
A well-studied method of combining perfect repair and imperfect repair is to permit the option to conduct a perfect repair on a failed system with probability p, and an imperfect repair with probability (1 − p). This is the so-called (p, q) rule or BP model, first proposed by Brown and Proschan ([7]) in 1986. Later, Block et al. ([5]) generalized the (p, q) model to incorporate the age t of the component, so that the probabilities of perfect and imperfect repair are state-dependent, resulting in the (p(t), q(t)) model. Shaked and Shanthikumar ([22]) considered the multivariate case of the (p, q) model. Later came the [p(n, t), q(n, t), s(n, t)] model ([14]), which also depends on n, the number of failures since replacement, and allows a third possibility: that the repair is unsuccessful. These models have since been studied extensively, combined with various factors and extended to more complicated systems ([9], [10], [11], [16], [19], [20], [21], [22], [25], etc.). In these models, the probability p is fixed. Lim et al. ([12]) proposed the Bayesian imperfect repair model, in which p is not fixed but is a random variable with a prior distribution. In practice, however, it is hard to pre-specify p, and there has not been an efficient way to determine p.
With this in mind, we are motivated to obtain a calculable formula for p to help us decide which repair mode to choose when the system fails. In the literature, it has been popular to use cost and reliability measures such as availability to optimize system maintenance models; see [23] and [24] for a literature review of various system optimization models. In this paper, we focus on the (p, q) model for a one-unit system having an exponential lifetime and an exponential repair time. The goal is to determine p based on two criteria: the limiting availability and the limiting average cost per unit time.
There have been many studies on system maintenance with options of imperfect repairs before a perfect repair. For example, Biswas et al. [4] investigated a periodically inspected system that is maintained through a fixed number of imperfect repairs before it is replaced or perfectly repaired. Wang and Pham [26] considered imperfect repairs after the first few (a predetermined number of) failures, and then they permitted a perfect repair done in a negligible amount of time (or they replaced the system with a brand new one). Badie and Berrade [1] designed a bivariate policy that takes into account the inspection times along with the allowed number of failures before a perfect repair. They obtained the optimal time interval between successive periodic inspections as well as the optimal number of failures prior to the final perfect repair.
We, on the other hand, determine the optimal probability p of choosing a perfect repair over an imperfect repair after each system failure. Thus, in our formulation, the number of imperfect repairs is endogenous to the problem (and not exogenously given). Indeed, our optimal choice of p after each failure, based on either of the two criteria mentioned above, leads to determining the optimal number of imperfect repairs before conducting a perfect repair. Moreover, imitating reality, we do not treat any repair time as negligible.
We begin with a list of notation. In Section 2, we formulate the problem, specifying the assumptions and detailing the stochastic process. Section 3 derives the formulas for the limiting availability, and maximizes it by determining the probability of choosing a perfect repair after each failure. Specifically, Proposition 3.1 states the conditions for choosing a perfect repair over an imperfect repair after each failure. Section 4 solves the optimization problem of determining the number of imperfect repairs before a perfect repair, either to maximize the limiting availability or to minimize the limiting average repair cost per unit time. Section 5 gives numerical results illustrating the optimization procedure. Section 6 concludes the paper.
$K$: maximum number of imperfect repairs permitted before a perfect repair
$k$: the ordinal number of the permitted imperfect repair, $k = 1, 2, \ldots, K$
$K^*$: optimal number of imperfect repairs to maximize limiting availability
$K^\#$: optimal number of imperfect repairs to minimize limiting average repair cost per unit time
$\lambda_k$: expected lifetime after $(k-1)$ imperfect repairs and before the $k$-th failure
$\mu_k$: expected perfect repair time after the $k$-th failure
$\nu_k$: expected imperfect repair time after the $k$-th failure
$p_k$: probability of choosing a perfect repair after the $k$-th failure
$b_0$: overhead cost per unit time for a perfect repair
$b_1$: increment in perfect repair cost per unit time after each failure
$B_k$: cost of perfect repair per unit time after the $k$-th failure; assume $B_k = b_0 + b_1 k$
$c_0$: overhead cost per unit time for an imperfect repair
$c_1$: increment in imperfect repair cost per unit time after each failure
$C_k$: cost of imperfect repair per unit time after the $k$-th failure; assume $C_k = c_0 + c_1 k$
$N$: state of the system at time $t = 0$, when the system is brand new
$P_k$: state of the system undergoing a perfect repair after the $k$-th failure
$I_k$: state of the system undergoing an imperfect repair after the $k$-th failure
$U_k$: state of the system operating after the $k$-th imperfect repair

Statement of the problem
We consider a one-unit system whose lifetime has an exp(λ 1 ) distribution, where λ 1 denotes the mean lifetime, and 1/λ 1 denotes the constant failure rate. When the system fails for the first time, we have a choice to carry out either a perfect repair (with probability p 1 ), or an imperfect repair (with probability 1 − p 1 ). We want to determine the optimal value of p 1 . Following the first system failure, if a perfect repair is undertaken, then after an exp(µ 1 ) perfect repair time, the system becomes as good as new. On the other hand, following the first failure, if an imperfect repair is performed, then after an exp(ν 1 ) imperfect repair time, the system becomes functional, but weaker; and it operates for an exp(λ 2 ) duration (which we call its second lifetime) until it fails again. At that time, we must decide between a perfect repair (with probability p 2 ) and an imperfect repair (with probability 1 − p 2 ). Then this pattern repeats (we allow at most K imperfect repairs), until eventually a perfect repair is chosen, following which the system becomes as good as new (the original state). This epoch is called the renewal time. The duration between successive renewal times is called a cycle time. The stochastic behavior of the system within each cycle is exactly the same. Thus, we model the stochastic behavior of the system as a renewal process.
A perfect repair is completely thorough, in the sense that it restores the system to a functionally brand-new state. However, a perfect repair takes much longer than an imperfect repair; that is, $\mu_1 \gg \nu_1$. Also, the cost of a perfect repair per unit time, in terms of the actual cost to repair and the loss of revenue due to stoppage of production while under repair, is much higher than that of an imperfect repair; that is, $B_1 \gg C_1$. On the other hand, though an imperfect repair is much faster and costs less, after an imperfect repair the system will not function for the same duration as a new system. The imperfectly repaired unit will have a shorter lifetime overall. Moreover, after an imperfect repair, the next time the system fails, a repair (perfect or imperfect, whichever we decide to do) will take longer on average than the corresponding repair after each previous failure.
If an imperfect repair has been chosen each time after the first $(k-1)$ failures (where $k = 1, 2, \ldots$), then the $k$-th lifetime of the system is $\exp(\lambda_k)$, where $\lambda_k = \alpha^{k-1}\lambda_1$, with $0 < \alpha < 1$ being the shrinkage factor for lifetime. After the $k$-th failure, if a perfect repair is chosen (with probability $p_k$), then the perfect repair time is $\exp(\mu_k)$, where $\mu_k = \beta^{k-1}\mu_1$, with $\beta > 1$ being the expansion factor for perfect repair time. On the other hand, after the $k$-th failure, if an imperfect repair is chosen (with probability $1 - p_k$), then the imperfect repair time is $\exp(\nu_k)$, where $\nu_k = \gamma^{k-1}\nu_1$, with $\gamma > 1$ being the expansion factor for imperfect repair time. And so on, until eventually a perfect repair is chosen, at the completion of which the system is restored to the brand-new state. More generally, our method works under milder conditions than these geometric relations.
We first determine the optimal value of $p_k$, the probability of choosing a perfect repair over an imperfect repair after the $k$-th failure, given that $(k-1)$ imperfect repairs have already been chosen since the last perfect repair. The optimal value is chosen with two criteria in mind: maximizing the limiting availability, and minimizing the limiting average repair cost per unit time. It turns out that, if we wish to maximize the limiting availability, then the optimal value of $p_k$ is 0 for $k = 1, 2, \ldots, K^*$, and 1 for $k = K^*+1, K^*+2, \ldots, K$. Likewise, if we wish to minimize the limiting average repair cost per unit time, then the optimal value of $p_k$ is 0 for $k = 1, 2, \ldots, K^\#$, and 1 for $k = K^\#+1, K^\#+2, \ldots, K$. Thus, we endogenously determine the optimal number of imperfect repairs ($K^*$ or $K^\#$, depending on the criterion we choose to optimize) before conducting a perfect repair.
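The geometric relations above can be tabulated with a few lines of Python. This is only an illustrative sketch: the function name `geometric_means` and all numerical values (λ_1 = 100, µ_1 = 20, ν_1 = 4, α = 0.9, β = 1.05, γ = 1.1) are hypothetical and are not taken from the paper.

```python
def geometric_means(lam1, mu1, nu1, alpha, beta, gamma, n):
    """Return (lam, mu, nu); entry k-1 of each list is the mean for index k."""
    # Lifetimes shrink by the factor alpha after each imperfect repair.
    lam = [lam1 * alpha ** (k - 1) for k in range(1, n + 1)]
    # Perfect repair times expand by the factor beta.
    mu = [mu1 * beta ** (k - 1) for k in range(1, n + 1)]
    # Imperfect repair times expand by the factor gamma.
    nu = [nu1 * gamma ** (k - 1) for k in range(1, n + 1)]
    return lam, mu, nu


# Hypothetical parameter values, for illustration only.
lam, mu, nu = geometric_means(100.0, 20.0, 4.0, 0.9, 1.05, 1.1, 5)
for k in range(5):
    print(k + 1, round(lam[k], 3), round(mu[k], 3), round(nu[k], 3))
```

With α < 1 and β, γ > 1, the printed lifetimes decrease while both repair-time columns increase, which is the qualitative behavior the model assumes.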
Choosing $p_k$ to maximize limiting availability
Figure 1. Transition diagram when at most K = 1 imperfect repair is permitted
When $K = 1$, only one imperfect repair is allowed. We can choose to forgo this option and conduct a perfect repair after the very first failure. Or, we can choose to exercise the option, and carry out an imperfect repair after the first failure. Thereafter, we must necessarily conduct a perfect repair after the second failure. Suppose that we assign a probability $p_1$ of choosing a perfect repair and a probability $(1 - p_1)$ of choosing an imperfect repair, after the first failure. We must determine an optimal value of $p_1$ to maximize the limiting availability. Let us, therefore, express the limiting availability as a function of $p_1$.
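When $K = 1$, the stochastic process has five states: $N$, $P_1$, $I_1$, $U_1$ and $P_2$. See Figure 1. Let us denote the steady state probabilities as $P_N$, $P_{P_1}$, $P_{I_1}$, $P_{U_1}$ and $P_{P_2}$. Let $P$ be the vector of these steady state probabilities; that is, $P = (P_N, P_{P_1}, P_{I_1}, P_{U_1}, P_{P_2})'$. To find $P$, we solve the state equations $M \cdot P = 0$ and $P'\mathbf{1} = 1$, where $\mathbf{1}$ is a vector with all entries one, and $M$ is the transition rate matrix
$$M = \begin{pmatrix} -1/\lambda_1 & 1/\mu_1 & 0 & 0 & 1/\mu_2 \\ p_1/\lambda_1 & -1/\mu_1 & 0 & 0 & 0 \\ (1-p_1)/\lambda_1 & 0 & -1/\nu_1 & 0 & 0 \\ 0 & 0 & 1/\nu_1 & -1/\lambda_2 & 0 \\ 0 & 0 & 0 & 1/\lambda_2 & -1/\mu_2 \end{pmatrix}.$$
We express each steady state probability as a multiple of $P_N$ as follows:
$$P_{P_1} = p_1\frac{\mu_1}{\lambda_1}P_N, \quad P_{I_1} = (1-p_1)\frac{\nu_1}{\lambda_1}P_N, \quad P_{U_1} = (1-p_1)\frac{\lambda_2}{\lambda_1}P_N, \quad P_{P_2} = (1-p_1)\frac{\mu_2}{\lambda_1}P_N.$$
Substituting these multiples in $P'\mathbf{1} = 1$, we obtain
$$P_N = \frac{\lambda_1}{\lambda_1 + p_1\mu_1 + (1-p_1)(\nu_1 + \lambda_2 + \mu_2)}.$$
Hence, the limiting system availability $A_1(p_1) = P_N + P_{U_1}$ simplifies to
$$A_1(p_1) = \frac{\lambda_1 + (1-p_1)\lambda_2}{\lambda_1 + p_1\mu_1 + (1-p_1)(\nu_1 + \lambda_2 + \mu_2)}.$$
As $p_1$ varies between 0 and 1, $A_1(p_1)$ takes values between $(\lambda_1 + \lambda_2)/(\lambda_1 + \nu_1 + \lambda_2 + \mu_2)$ and $\lambda_1/(\lambda_1 + \mu_1)$. Furthermore, $A_1$ is a decreasing, a constant, or an increasing function of $p_1$ according as
$$\frac{\lambda_1}{\lambda_1 + \mu_1} \;<,\ =,\ \text{or}\ >\; \frac{\lambda_1 + \lambda_2}{\lambda_1 + \lambda_2 + \nu_1 + \mu_2}. \tag{3}$$
Thus, if "greater" holds in (3), we always choose a perfect repair; and thereafter the system becomes as good as new. On the other hand, if "less" holds in (3), we never choose a perfect repair after the first failure; we must always choose an imperfect repair. When equality holds in (3), we are indifferent between perfect and imperfect repairs after the first failure.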
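The steady state solution for $K = 1$ can be checked numerically. The sketch below assumes the transitions described for Figure 1 ($N$ to $P_1$ with probability $p_1$ or to $I_1$ with probability $1 - p_1$, then $I_1$ to $U_1$ to $P_2$, with $P_1$ and $P_2$ returning to $N$), builds the steady state probabilities as expected time spent in each state per cycle, and verifies that the net probability flow into every state vanishes. The function names and parameter values are hypothetical; exact rational arithmetic is used so that the residuals are exactly zero rather than merely small.

```python
from fractions import Fraction as F


def steady_state_k1(lam1, lam2, mu1, mu2, nu1, p1):
    """Steady-state probabilities (N, P1, I1, U1, P2) for the K = 1 chain.

    Each probability is proportional to the expected time spent in that
    state per renewal cycle.
    """
    weights = [
        lam1,             # N : one first lifetime per cycle
        p1 * mu1,         # P1: perfect repair after the first failure
        (1 - p1) * nu1,   # I1: imperfect repair after the first failure
        (1 - p1) * lam2,  # U1: second lifetime
        (1 - p1) * mu2,   # P2: perfect repair after the second failure
    ]
    total = sum(weights)
    return [w / total for w in weights]


def balance_residuals(probs, lam1, lam2, mu1, mu2, nu1, p1):
    """Net probability flow into each state; every entry should be zero."""
    pn, pp1, pi1, pu1, pp2 = probs
    return [
        pp1 / mu1 + pp2 / mu2 - pn / lam1,   # state N
        p1 * pn / lam1 - pp1 / mu1,          # state P1
        (1 - p1) * pn / lam1 - pi1 / nu1,    # state I1
        pi1 / nu1 - pu1 / lam2,              # state U1
        pu1 / lam2 - pp2 / mu2,              # state P2
    ]


# Hypothetical values: lam1, lam2, mu1, mu2, nu1, p1.
args = (F(100), F(90), F(20), F(21), F(4), F(1, 4))
probs = steady_state_k1(*args)
availability = probs[0] + probs[3]  # system is up in states N and U1
print("A_1(p_1) =", availability)
print("residuals =", balance_residuals(probs, *args))
```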
Interpretation of Condition (3): From Figure 1, we note that if we choose a perfect repair after the first failure, then between successive renewal times, the mean system up time (MSUT) is λ 1 , the mean system down time (MSDT) is µ 1 , and the limiting availability is given by the left-hand side of (3). On the other hand, if we choose an imperfect repair after the first failure followed by a perfect repair after the second failure, then between successive renewal times, the MSUT is (λ 1 + λ 2 ), the MSDT is (ν 1 + µ 2 ), and the limiting availability is given by the right hand side of (3). Hence, by comparing the two sides of (3) we can determine which type of repair is preferable after the first failure.
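The interpretation in terms of mean up time and mean down time per cycle can be illustrated by simulation. The following sketch draws exponential lifetimes and repair times for many renewal cycles under a fixed policy and compares the simulated fraction of up time with MSUT/(MSUT + MSDT). The function name, the seed, the number of cycles and the parameter values are all hypothetical choices for this example.

```python
import random


def simulate_availability(lams, nus, mu_last, cycles, seed=0):
    """Estimate the limiting availability by simulating renewal cycles.

    lams    : mean lifetimes within one cycle (m + 1 entries)
    nus     : mean imperfect repair times within one cycle (m entries)
    mu_last : mean duration of the perfect repair that ends the cycle
    """
    rng = random.Random(seed)
    up = down = 0.0
    for _ in range(cycles):
        for mean in lams:
            up += rng.expovariate(1.0 / mean)    # expovariate takes a rate
        for mean in nus:
            down += rng.expovariate(1.0 / mean)
        down += rng.expovariate(1.0 / mu_last)
    return up / (up + down)


# Hypothetical means: lam1 = 100, lam2 = 90, mu1 = 20, mu2 = 21, nu1 = 4.
perfect_first = simulate_availability([100.0], [], 20.0, 20000)
imperfect_first = simulate_availability([100.0, 90.0], [4.0], 21.0, 20000)
print("perfect after 1st failure :", perfect_first, "vs", 100 / 120)
print("imperfect, then perfect   :", imperfect_first, "vs", 190 / 215)
```

For these hypothetical values the second policy has the larger ratio, so "less" would hold in (3) and an imperfect repair would be preferred after the first failure.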
Next, suppose that $K = 2$; that is, we are allowed to carry out at most two imperfect repairs, and after the third failure we must do a perfect repair. Of course, if "greater" holds in Condition (3), we surely do a perfect repair after the first failure. Thereafter, the system is renewed. Therefore, suppose that "less" holds in Condition (3). Then we surely carry out an imperfect repair after the first failure. The imperfectly repaired unit will operate for an $\exp(\lambda_2)$ duration until it fails again. What should we do after this second failure: a perfect repair or an imperfect repair? We consider these two options, assigning probability $p_2$ to a perfect repair and probability $(1 - p_2)$ to an imperfect repair, which lead to the state transition diagram given in Figure 2. The steady state probabilities of the seven states are denoted by $P_N$, $P_{I_1}$, $P_{U_1}$, $P_{P_2}$, $P_{I_2}$, $P_{U_2}$ and $P_{P_3}$, with $P$ denoting their vector.
Figure 2. Transition diagram when at most K = 2 imperfect repairs are permitted
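For $K = 2$, with the states ordered as $(N, I_1, U_1, P_2, I_2, U_2, P_3)$, the rate matrix is
$$M = \begin{pmatrix} -1/\lambda_1 & 0 & 0 & 1/\mu_2 & 0 & 0 & 1/\mu_3 \\ 1/\lambda_1 & -1/\nu_1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1/\nu_1 & -1/\lambda_2 & 0 & 0 & 0 & 0 \\ 0 & 0 & p_2/\lambda_2 & -1/\mu_2 & 0 & 0 & 0 \\ 0 & 0 & (1-p_2)/\lambda_2 & 0 & -1/\nu_2 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1/\nu_2 & -1/\lambda_3 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1/\lambda_3 & -1/\mu_3 \end{pmatrix}.$$
As in the case of $K = 1$, to solve the steady state equations $M \cdot P = 0$ and $P'\mathbf{1} = 1$, we express each steady state probability as a known multiple of $P_N$:
$$P_{I_1} = \frac{\nu_1}{\lambda_1}P_N, \quad P_{U_1} = \frac{\lambda_2}{\lambda_1}P_N, \quad P_{P_2} = p_2\frac{\mu_2}{\lambda_1}P_N, \quad P_{I_2} = (1-p_2)\frac{\nu_2}{\lambda_1}P_N, \quad P_{U_2} = (1-p_2)\frac{\lambda_3}{\lambda_1}P_N, \quad P_{P_3} = (1-p_2)\frac{\mu_3}{\lambda_1}P_N.$$
Again, from $P'\mathbf{1} = 1$, we obtain
$$P_N = \frac{\lambda_1}{\lambda_1 + \nu_1 + \lambda_2 + p_2\mu_2 + (1-p_2)(\nu_2 + \lambda_3 + \mu_3)}.$$
We have already seen that if an imperfect repair is followed by a perfect repair, then between successive renewal times, the MSUT is $\Lambda_2 = \lambda_1 + \lambda_2$ and the MSDT is $\nu_1 + \mu_2 = \tilde{N}_2$, say. Likewise, if two successive imperfect repairs are followed by a perfect repair, then between successive renewal times, the MSUT is $\Lambda_3 = \lambda_1 + \lambda_2 + \lambda_3$ and the MSDT is $\nu_1 + \nu_2 + \mu_3 = \tilde{N}_3$, say. Using this notation, the limiting system availability $A_2(p_2) = P_N + P_{U_1} + P_{U_2}$ is
$$A_2(p_2) = \frac{p_2\Lambda_2 + (1-p_2)\Lambda_3}{p_2(\Lambda_2 + \tilde{N}_2) + (1-p_2)(\Lambda_3 + \tilde{N}_3)}.$$
Stat., Optim. Inf. Comput., Vol. 9, March 2021. H. SMITHSON AND J. SARKAR
As $p_2$ varies between 0 and 1, $A_2(p_2)$ varies between $\Lambda_3/(\Lambda_3 + \tilde{N}_3)$ and $\Lambda_2/(\Lambda_2 + \tilde{N}_2)$. Also, $A_2$ is a decreasing, a constant, or an increasing function of $p_2$ according as
$$\frac{\Lambda_2}{\Lambda_2 + \tilde{N}_2} \;<,\ =,\ \text{or}\ >\; \frac{\Lambda_3}{\Lambda_3 + \tilde{N}_3}. \tag{6}$$
Interpretation of Condition (6): Suppose that we have already chosen an imperfect repair after the first failure; and we are contemplating whether to perform a perfect repair right after the second failure, or to do an imperfect repair after the second failure followed by a perfect repair after the third failure. In view of Figure 2, the two sides of (6) give the limiting availabilities under these two choices. Hence, by comparing the two sides of (6), we can determine which type of repair is preferable after the second failure, if an imperfect repair is done after the first failure and a perfect repair must be done after the third failure.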
Notice the similarities between corresponding formulas when $K = 1$ and $K = 2$. Next, we extend the results to any arbitrary positive integer $K$, which denotes the maximum number of imperfect repairs permitted before a perfect repair. See Figure 3 for the transition diagram. Let us also extend the notation for cumulative lifetimes and repair times: If $(k-1)$ imperfect repairs are followed by a perfect repair, then the expected cumulative lifetime and the expected cumulative repair time are
$$\Lambda_k = \sum_{j=1}^{k}\lambda_j, \qquad \tilde{N}_k = \sum_{j=1}^{k-1}\nu_j + \mu_k.$$
Along similar lines of reasoning, but omitting the details, we establish Proposition 3.1, which generalizes the three results: (1) the limiting availability if, given $(k-1)$ imperfect repairs, we choose to conduct a perfect repair after the $k$-th failure with probability $p_k$; (2) the condition under which, given $(k-1)$ imperfect repairs, it is preferable to conduct another imperfect repair after the $k$-th failure; and (3) the maximum value of the limiting availability if we choose $p_k$ optimally.
Proposition 3.1. Let $K$ denote the maximum number of imperfect repairs allowed. For $k = 1, 2, \ldots, K$, having chosen $(k-1)$ imperfect repairs in succession, suppose that after the $k$-th failure, a perfect repair will be chosen with probability $p_k$ and an imperfect repair with probability $(1 - p_k)$, with the understanding that if an imperfect repair is chosen, then after the $(k+1)$-st failure, surely a perfect repair will be chosen. Then the system limiting availability is
$$A_k(p_k) = \frac{p_k\Lambda_k + (1-p_k)\Lambda_{k+1}}{p_k(\Lambda_k + \tilde{N}_k) + (1-p_k)(\Lambda_{k+1} + \tilde{N}_{k+1})}, \tag{9}$$
which is a decreasing, a constant, or an increasing function of $p_k$ according as
$$\frac{\Lambda_k}{\Lambda_k + \tilde{N}_k} \;<,\ =,\ \text{or}\ >\; \frac{\Lambda_{k+1}}{\Lambda_{k+1} + \tilde{N}_{k+1}}. \tag{10}$$
Accordingly, $A_k(p_k)$ attains its maximum value $\Lambda_{k+1}/(\Lambda_{k+1} + \tilde{N}_{k+1})$ at $p_k = 0$ if "$<$" holds in (10); $\Lambda_k/(\Lambda_k + \tilde{N}_k)$ at any $p_k \in [0, 1]$ if "$=$" holds in (10); and $\Lambda_k/(\Lambda_k + \tilde{N}_k)$ at $p_k = 1$ if "$>$" holds in (10).
Note that in case an inequality holds in Condition (10), the optimum $p_k$ is a corner solution (either $p_k = 0$, or $p_k = 1$), and in case an equality holds, we can choose either one of the two corner solutions to maximize $A_k$.
Interpretation of Condition (10): Suppose that we have already chosen imperfect repairs after the first (k − 1) failures; and we are contemplating whether to perform a perfect repair after the k-th failure, or perform an imperfect repair after the k-th failure followed by a perfect repair after the (k + 1)-st failure. Then the two sides of (10) give the limiting availabilities under these two choices.
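The comparison described in this interpretation is easy to evaluate numerically. The sketch below assumes, in line with the $K = 1$ and $K = 2$ cases, that the expected cumulative lifetime is the sum of the first $k$ mean lifetimes and that the expected cumulative repair time is the sum of the first $(k-1)$ mean imperfect repair times plus the mean perfect repair time after the $k$-th failure. The function names and the numerical values are hypothetical.

```python
def cumulative_means(lam, mu, nu, k):
    """Return (Lambda_k, N_k) when (k - 1) imperfect repairs precede a
    perfect repair. Lists are 0-based: lam[0] holds lambda_1, and so on."""
    big_lambda = sum(lam[:k])
    big_n = sum(nu[:k - 1]) + mu[k - 1]
    return big_lambda, big_n


def availability(lam, mu, nu, k, p):
    """Limiting availability when a perfect repair is chosen with
    probability p after the k-th failure (and surely after the next one)."""
    l0, n0 = cumulative_means(lam, mu, nu, k)
    l1, n1 = cumulative_means(lam, mu, nu, k + 1)
    return (p * l0 + (1 - p) * l1) / (p * (l0 + n0) + (1 - p) * (l1 + n1))


def best_p(lam, mu, nu, k):
    """Corner solution: 1 if a perfect repair after the k-th failure is at
    least as good as one more imperfect repair, otherwise 0."""
    stop_now = availability(lam, mu, nu, k, 1)
    one_more = availability(lam, mu, nu, k, 0)
    return 1 if stop_now >= one_more else 0


# Hypothetical means for k = 1, 2, 3 (nu is needed only for k = 1, 2).
lam = [100.0, 90.0, 81.0]
mu = [20.0, 21.0, 22.05]
nu = [4.0, 4.4]
for k in (1, 2):
    print(k, availability(lam, mu, nu, k, 1), availability(lam, mu, nu, k, 0),
          best_p(lam, mu, nu, k))
```

Because the availability is a ratio of two functions that are linear in $p$, it is monotone in $p$, so comparing the two endpoints is enough to find the maximizer.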
The next corollary establishes that, under a mild condition on the perfect repair times (satisfied, in particular, by the geometric perfect repair times), the choice between imperfect and perfect repairs after successive failures is self-correcting. That is, if after the k-th failure it is optimal to choose a perfect repair, but we erroneously choose to carry out an imperfect repair, then after the (k + 1)-st failure the optimal choice will be a perfect repair. Equivalently, if after the k-th failure it is optimal to choose an imperfect repair, then it must be true that after each previous failure the optimal choice was an imperfect repair.
Next, since $\nu_k$ is increasing in $k$ and assumption (11) holds, we have Hence, from (13), we have whence, by dividendo, we have $\Lambda_{k+1}$. This completes the proof of (12).

The optimum number of imperfect repairs
In Section 3, Proposition 3.1 solves the optimal choice of $p_k$ that maximizes $A_k$ for $k = 1, 2, \ldots, K$. If it turns out optimal to choose $p_k = 0$ for all $k = 1, 2, \ldots, K$, then we conduct $K$ imperfect repairs followed by a perfect repair after the $(K+1)$-st failure. The transition diagram is given in Figure 4; and from (9), the limiting availability in this case is
$$\frac{\Lambda_{K+1}}{\Lambda_{K+1} + \tilde{N}_{K+1}}. \tag{14}$$
In Proposition 3.1 we established that in order to maximize the limiting availability $A_k$, the optimum choice of the probability $p_k$ of conducting a perfect repair after the $k$-th failure is always a corner solution (either $p_k = 0$, or $p_k = 1$); and in Corollary 3.2 we showed that the $p_k$'s are non-decreasing. We simply note down $K^*$, the smallest $k$ such that $p_k = 1$. This $K^*$ is the optimal number of imperfect repairs such that $A_{K^*}$ is the largest among $\{A_1, A_2, \ldots, A_K\}$. Thus, we have solved the problem of determining endogenously the optimal number of imperfect repairs that must be performed before performing a perfect repair in order to maximize the limiting availability.
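The search for the stopping point can be sketched as a loop that applies the comparison of Proposition 3.1 after each successive failure and stops at the first failure where a perfect repair is at least as good. This is a minimal sketch under the assumption that the expected cycle up time and down time are the cumulative sums described earlier; the function names and test values are hypothetical, and the function reports the number of imperfect repairs actually carried out before the perfect repair.

```python
def cycle_availability(lam, mu, nu, k):
    """Availability when (k - 1) imperfect repairs precede a perfect repair
    after the k-th failure. Lists are 0-based: lam[0] is lambda_1."""
    up = sum(lam[:k])
    down = sum(nu[:k - 1]) + mu[k - 1]
    return up / (up + down)


def optimal_imperfect_repairs(lam, mu, nu, K):
    """Number of imperfect repairs made before the first failure at which a
    perfect repair is (weakly) preferred; at most K.

    Requires len(lam) >= K + 1, len(mu) >= K + 1 and len(nu) >= K.
    """
    for k in range(1, K + 1):
        # ">" or "=" in the comparison: stop and repair perfectly now.
        if cycle_availability(lam, mu, nu, k) >= cycle_availability(lam, mu, nu, k + 1):
            return k - 1
    return K  # imperfect repairs were preferred after every permitted failure


# Hypothetical means, for illustration only.
lam = [10.0, 10.0, 1.0]
mu = [10.0, 10.0, 10.0]
nu = [1.0, 1.0]
print(optimal_imperfect_repairs(lam, mu, nu, K=2))
```

The early exit relies on the self-correcting property stated in the corollary: once a perfect repair is preferred, it remains preferred after later failures, so there is no need to look further.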
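In Section 2, we assumed the following relations among the parameters:
$$\lambda_k = \alpha^{k-1}\lambda_1, \qquad \mu_k = \beta^{k-1}\mu_1, \qquad \nu_k = \gamma^{k-1}\nu_1, \qquad 0 < \alpha < 1, \quad \beta > 1, \quad \gamma > 1.$$
In other words, the lifetimes shrink and the repair times expand geometrically as $k$ increases. We assume further that $\beta < \gamma$; that is, after several imperfect repairs when the system fails again, the next perfect repair time does not increase as fast as the next imperfect repair time. Under these assumptions, the cumulative quantities have the closed forms
$$\Lambda_k = \lambda_1\frac{1 - \alpha^k}{1 - \alpha}, \qquad \tilde{N}_k = \nu_1\frac{\gamma^{k-1} - 1}{\gamma - 1} + \beta^{k-1}\mu_1,$$
and we can rewrite the results in Proposition 3.1: The system limiting availability is
$$A_k(p_k) = \frac{p_k\Lambda_k + (1-p_k)\Lambda_{k+1}}{p_k(\Lambda_k + \tilde{N}_k) + (1-p_k)(\Lambda_{k+1} + \tilde{N}_{k+1})},$$
which is a decreasing, a constant, or an increasing function of $p_k$ according as
$$\frac{\Lambda_k}{\Lambda_k + \tilde{N}_k} \;<,\ =,\ \text{or}\ >\; \frac{\Lambda_{k+1}}{\Lambda_{k+1} + \tilde{N}_{k+1}}. \tag{15}$$
According as the three cases in (15) hold, $A_k(p_k)$ attains a maximum value of $\Lambda_{k+1}/(\Lambda_{k+1} + \tilde{N}_{k+1})$ at $p_k = 0$ if "$<$" holds in (15); $\Lambda_k/(\Lambda_k + \tilde{N}_k)$ at any $p_k \in [0, 1]$ if "$=$" holds in (15); and $\Lambda_k/(\Lambda_k + \tilde{N}_k)$ at $p_k = 1$ if "$>$" holds in (15).
Suppose that the cost per unit time of a perfect repair (and that of an imperfect repair) increases linearly depending on the number of imperfect repairs already made on a brand-new unit. To elaborate, for $k = 1, 2, \ldots, K$, suppose that $(k-1)$ imperfect repairs have been made on the unit. Then after the $k$-th failure, the cost per unit time of a perfect repair is $B_k = b_0 + b_1 k$; and that of an imperfect repair is $C_k = c_0 + c_1 k$, where $b_0, b_1, c_0, c_1 > 0$ are known parameters.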
Suppose now that our criterion for optimization is to minimize the limiting average repair cost per unit time. Along similar lines of reasoning as in Section 3, we can determine how to choose optimally between a perfect and an imperfect repair after the $k$-th failure, if we have already chosen an imperfect repair after each of $(k-1)$ failures and, in case an imperfect repair is chosen, we will follow it up with a perfect repair after the $(k+1)$-st failure. We simply compare $\xi_k$ and $\xi_{k+1}$, the average repair cost per unit time between successive renewal times under these two options; and choose the type of repair that attains $\min\{\xi_k, \xi_{k+1}\}$, where
$$\xi_k = \frac{\sum_{j=1}^{k-1} C_j\nu_j + B_k\mu_k}{\Lambda_k + \tilde{N}_k}. \tag{16}$$
Let $K^\#$ denote that value of $k$ ($k = 1, 2, \ldots, K$) such that $\xi_{K^\#}$ achieves the minimum value among $\{\xi_1, \xi_2, \ldots, \xi_{K+1}\}$. Thus, if our objective is to minimize the limiting average repair cost per unit time (or equivalently, to maximize the limiting average profit per unit time), then we must conduct $K^\#$ imperfect repairs followed by a perfect repair after the $(K^\# + 1)$-st failure.
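The cost criterion can be sketched in the same way. The code below assumes that the average repair cost per unit time is the expected repair cost accumulated in one cycle divided by the expected cycle length, with the linear cost rates $B_k = b_0 + b_1 k$ and $C_k = c_0 + c_1 k$ stated above; this renewal-reward form is an assumption of the sketch, and the function names and numerical values are hypothetical.

```python
def average_cost(lam, mu, nu, k, b0, b1, c0, c1):
    """Average repair cost per unit time when (k - 1) imperfect repairs
    precede a perfect repair after the k-th failure (0-based lists)."""
    # Imperfect repair after the j-th failure: rate C_j for mean time nu_j.
    imperfect = sum((c0 + c1 * j) * nu[j - 1] for j in range(1, k))
    # Perfect repair after the k-th failure: rate B_k for mean time mu_k.
    perfect = (b0 + b1 * k) * mu[k - 1]
    cycle = sum(lam[:k]) + sum(nu[:k - 1]) + mu[k - 1]
    return (imperfect + perfect) / cycle


def cheapest_k(lam, mu, nu, K, b0, b1, c0, c1):
    """Return (k, costs): the index k in 1..K+1 with the smallest average
    cost, and the dictionary of all computed costs."""
    costs = {k: average_cost(lam, mu, nu, k, b0, b1, c0, c1)
             for k in range(1, K + 2)}
    return min(costs, key=costs.get), costs


# Hypothetical values, for illustration only.
best, costs = cheapest_k([10.0, 10.0], [10.0, 10.0], [1.0], 1, 1.0, 1.0, 1.0, 1.0)
print(best, costs)
```

Unlike the availability search, this sketch scans every candidate rather than stopping early, since no monotonicity of the cost sequence is assumed here.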
When K # and K * turn out to be equal (or nearly equal), we optimize (or nearly optimize) both the desirable objectives of maximizing the limiting availability and minimizing the limiting average repair cost per unit time. However, oftentimes (depending on the values of the parameters) the optimal number of imperfect repairs under these two criteria may differ, causing us to look for a compromise.

Numerical results
We compute both the limiting availability and the average repair cost per unit time, under different parameter values, to determine the number of imperfect repairs before the ultimate perfect repair. We choose the following values of the parameters:
We first evaluate Condition (10) after the $k$-th failure, $k = 1, 2, \ldots$. Recall that if "$<$" holds in Condition (10), then we choose an imperfect repair after the $k$-th failure in order to maximize the limiting availability. We present the results in Table 1. For $\nu_1 = 10$, we choose imperfect repairs after each of the first three failures; but after the fourth failure, we choose a perfect repair. Similarly, for $\nu_1 = 4$, we choose imperfect repairs after each of the first 5 failures, and a perfect repair after the sixth failure; and for $\nu_1 = 1$, we choose imperfect repairs after the first 8 failures, and a perfect repair after the ninth failure.
Next, we calculate the limiting availability using (14), and the average cost per unit time using (16), and plot these quantities in Figure 5. For $\nu_1 = 10$, we note that $K^* = 4 = K^\#$. Thus, there is no conflict between the two desirable criteria: maximizing the limiting availability and minimizing the average cost per unit time. However, when $\nu_1 = 4$, we have $K^* = 6 \neq 5 = K^\#$; the maximal limiting availability is $A_6 = .870$, which is .230% more than $A_4 = .868$; and the average cost per unit time is $\xi_6 = 6.628$, which is 1.554% higher than $\xi_5 = 6.525$. In this case, we may choose $K^* = 6$ imperfect repairs before a perfect repair, if the 1.554% increase in limiting average repair cost per unit time is within the budget; otherwise, we may choose $K^\# = 5$ imperfect repairs before a perfect repair, if we are willing to sacrifice .230% of the limiting availability. Similarly, when $\nu_1 = 1$, we have $K^* = 9 \neq 6 = K^\#$; the maximal limiting availability is $A_9 = .911$, which is .878% more than $A_6 = .903$; and the limiting average cost per unit time is $\xi_9 = 6.570$, which is 6.682% higher than
Table 1. To determine the number of imperfect repairs before a perfect repair, either maximize the limiting availability A

Conclusion
In an attempt to devise an efficient maintenance policy that will increase limiting availability and reduce repair cost, we permit system maintenance using a relatively quick and inexpensive imperfect repair after a few failures, followed by an eventual perfect repair that brings the system back to "as good as new." First, we address the problem of determining the probability of choosing a perfect repair over an imperfect repair to maximize the limiting availability (or minimize the average repair cost per unit time). We show that the optimal probability is 0 after the first few failures and 1 after each additional failure. We give a straightforward condition whose evaluation determines these optimal probabilities. Furthermore, we show that if the condition indicates that a perfect repair is optimal, but we mistakenly conduct an imperfect repair, then after the next failure, the condition will again indicate that a perfect repair is optimal. Thus, we can determine endogenously the number of failures after which imperfect repairs must be done, followed by a perfect repair after the next failure, to either maximize the limiting availability or minimize the average cost per unit time. It would be ideal if these two optimization criteria led to the same or nearly the same number of imperfect repairs. Otherwise, we have to compromise either on the limiting average repair cost or on the limiting availability.
Throughout the paper we have assumed that both lifetimes and repair times are exponentially distributed. It is imperative that a future study incorporate more general lifetime and/or repair time distributions.