Survival analysis is a branch of statistics for analyzing the expected duration of time until one or more events happen, for example, death in biological organisms and failure in mechanical systems. This topic is called reliability theory or reliability analysis in engineering, duration analysis or duration modelling in economics, and event history analysis in sociology.
Survival analysis attempts to answer questions such as: what is the proportion of a population which will survive past a certain time? Of those that survive, at what rate will they die or fail? Can multiple causes of death or failure be taken into account? How do particular circumstances or characteristics increase or decrease the probability of survival?
To answer such questions, it is necessary to define "lifetime". In the case of biological survival, death is unambiguous, but for mechanical reliability, failure may not be well defined, for there may well be mechanical systems in which failure is partial, a matter of degree, or not otherwise localized in time. Even in biological problems, some events (for example, heart attack or other organ failure) may have the same ambiguity. The theory outlined below assumes well-defined events at specific times; other cases may be better treated by models which explicitly account for ambiguous events.
More generally, survival analysis involves the modelling of time to event data; in this context, death or failure is considered an “event” in the survival analysis literature – traditionally only a single event occurs for each subject, after which the organism or mechanism is dead or broken. Recurring event or repeated event models relax that assumption. The study of recurring events is relevant in systems reliability, and in many areas of social sciences and medical research.
Introduction to survival analysis
Survival analysis is used in several ways:
- To describe the survival times of members of a group
  - Life tables
  - Kaplan-Meier curves
  - Survival function
  - Hazard function
- To compare the survival times of two or more groups
  - Log-rank test
- To describe the effect of categorical or quantitative variables on survival
  - Cox proportional hazards regression
  - Parametric survival models
  - Survival trees
  - Survival random forests
Definitions of common terms in survival analysis
The following terms are commonly used in survival analyses.
Event: Death, disease occurrence, disease recurrence, recovery, or other experience of interest
Time: The time from the beginning of an observation period (such as surgery or beginning treatment) to (i) an event, or (ii) end of the study, or (iii) loss of contact or withdrawal from the study.
Censoring / Censored observation: If a subject does not have an event during the observation time, they are described as censored. The subject is censored in the sense that nothing is observed or known about that subject after the time of censoring. A censored subject may or may not have an event after the end of observation time.
Survival function S(t): The probability that a subject survives longer than time t.
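Because censored subjects contribute information only up to their censoring time, the survival function is usually estimated with a method that handles censoring explicitly. The sketch below implements the Kaplan-Meier (product-limit) estimator listed earlier, assuming right-censored data supplied as two parallel lists; the function name `kaplan_meier` and the 1/0 event coding are illustrative choices, not the interface of any particular library.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Kaplan-Meier (product-limit) estimate of S(t) from right-censored data.

    times  -- follow-up time for each subject
    events -- 1 if the event was observed at that time, 0 if the subject was censored
    Returns a list of (t, S_hat(t)) pairs, one per distinct event time.
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    deaths = Counter(t for t, e in zip(times, events) if e)  # events per time
    leaving = Counter(times)  # subjects leaving the risk set (events + censored)
    at_risk = len(times)
    surv = 1.0
    curve = []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d:
            # Conditional probability of surviving past t, given at risk at t.
            surv *= 1.0 - d / at_risk
            curve.append((t, surv))
        # Subjects censored at t count as at risk at t, then leave the risk set.
        at_risk -= leaving[t]
    return curve


curve = kaplan_meier([3, 5, 5, 8, 10, 12], [1, 1, 0, 1, 0, 1])
# [(3, 0.833...), (5, 0.666...), (8, 0.444...), (12, 0.0)]
```

The censored subjects at t = 5 and t = 10 never cause a drop in the curve, but they shrink the risk set used at later event times.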
The object of primary interest is the survival function, conventionally denoted S, which is defined as
$$S(t) = \Pr(T > t)$$
where t is some time, T is a random variable denoting the time of death, and “Pr” stands for probability. That is, the survival function is the probability that the time of death is later than some specified time t. The survival function is also called the survivor function or survivorship function in problems of biological survival, and the reliability function in mechanical survival problems. In the latter case, the reliability function is denoted R(t).
Usually one assumes S(0) = 1, although it could be less than 1 if there is the possibility of immediate death or failure.
The survival function must be non-increasing: S(u) ≤ S(t) if u ≥ t. This property follows directly because T>u implies T>t. This reflects the notion that survival to a later age is only possible if all younger ages are attained. Given this property, the lifetime distribution function and event density (F and f below) are well-defined.
The survival function is usually assumed to approach zero as age increases without bound, i.e., S(t) → 0 as t → ∞, although the limit could be greater than zero if eternal life is possible. For instance, we could apply survival analysis to a mixture of stable and unstable carbon isotopes; unstable isotopes would decay sooner or later, but the stable isotopes would last indefinitely.
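The isotope example can be made concrete with a simple mixture model. The sketch below assumes, purely for illustration, that a fraction `stable_fraction` of the population never fails and the rest has exponential lifetimes with rate `decay_rate`; both parameter names are hypothetical. It checks the properties discussed above: S(0) = 1, S is non-increasing, and the limit is the stable fraction rather than zero.

```python
import math


def mixture_survival(t, stable_fraction, decay_rate):
    """S(t) for a population in which a fraction of members never fails.

    Hypothetical model: unstable members have exponential lifetimes with rate
    `decay_rate`, stable members last forever, so S(t) -> stable_fraction.
    """
    if t < 0:
        raise ValueError("t must be non-negative")
    return stable_fraction + (1.0 - stable_fraction) * math.exp(-decay_rate * t)


grid = [0.5 * i for i in range(200)]
values = [mixture_survival(t, 0.25, 1.2) for t in grid]
assert values[0] == 1.0                                  # S(0) = 1
assert all(a >= b for a, b in zip(values, values[1:]))   # non-increasing
print(mixture_survival(1e3, 0.25, 1.2))                  # 0.25, not 0
```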
Lifetime distribution function and event density
Related quantities are defined in terms of the survival function.
The lifetime distribution function, conventionally denoted F, is defined as the complement of the survival function,
$$F(t) = \Pr(T \le t) = 1 - S(t).$$
If F is differentiable, then its derivative, which is the density function of the lifetime distribution, is conventionally denoted f:
$$f(t) = F'(t) = \frac{d}{dt}F(t).$$
The function f is sometimes called the event density; it is the rate of death or failure events per unit time.
The survival function can be expressed in terms of the probability distribution and probability density functions:
$$S(t) = \Pr(T > t) = \int_t^{\infty} f(u)\,du = 1 - F(t).$$
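Similarly, a survival event density function can be defined as
$$s(t) = S'(t) = \frac{d}{dt}S(t) = \frac{d}{dt}\int_t^{\infty} f(u)\,du = \frac{d}{dt}\left[1 - F(t)\right] = -f(t).$$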
In other fields, such as statistical physics, the survival event density function is known as the first passage time density.
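The relations F = 1 − S and f = F′ = −S′ can be checked numerically for any concrete lifetime model. The sketch below assumes a Weibull distribution with shape `K` and scale `SCALE` purely as an example; the helper `derivative` is a hypothetical central finite-difference routine, not a library function.

```python
import math

# Illustrative lifetime model: a Weibull distribution with shape K and scale SCALE.
K, SCALE = 1.5, 2.0


def S(t):
    """Survival function S(t) = exp(-(t/scale)^k)."""
    return math.exp(-((t / SCALE) ** K))


def F(t):
    """Lifetime distribution function, the complement of S."""
    return 1.0 - S(t)


def f(t):
    """Event density f(t) = F'(t) for the Weibull model."""
    return (K / SCALE) * (t / SCALE) ** (K - 1) * S(t)


def derivative(g, t, h=1e-6):
    """Central finite-difference approximation of g'(t)."""
    return (g(t + h) - g(t - h)) / (2.0 * h)


t = 1.3
print(f(t), derivative(F, t), -derivative(S, t))  # three nearly equal values
```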
Hazard function and cumulative hazard function
The hazard function, conventionally denoted λ, is defined as the event rate at time t conditional on survival until time t or later (that is, T ≥ t). Suppose that an item has survived for a time t and we want the probability that it will not survive for an additional time dt. That is, consider the limit
$$\lambda(t) = \lim_{dt \to 0} \frac{\Pr(t \le T < t + dt)}{dt \cdot S(t)} = \frac{f(t)}{S(t)} = -\frac{S'(t)}{S(t)}.$$
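For a continuous lifetime, the conditional probability of failing in [t, t + dt) given survival to t is (S(t) − S(t + dt)) / S(t); dividing by a small dt should approach f(t)/S(t), which also equals −d/dt ln S(t). The sketch below compares these three forms, again assuming an illustrative Weibull model; the function names are hypothetical.

```python
import math

# Illustrative Weibull model (shape K, scale SCALE); an assumption for the demo.
K, SCALE = 1.5, 2.0


def S(t):
    return math.exp(-((t / SCALE) ** K))


def f(t):
    return (K / SCALE) * (t / SCALE) ** (K - 1) * S(t)


def hazard_limit(t, dt=1e-7):
    """Pr(t <= T < t + dt | T >= t) / dt for a small, fixed dt."""
    return (S(t) - S(t + dt)) / (S(t) * dt)


def hazard_ratio(t):
    """lambda(t) = f(t) / S(t)."""
    return f(t) / S(t)


def hazard_log_derivative(t, h=1e-6):
    """-d/dt ln S(t), approximated by central differences."""
    return -(math.log(S(t + h)) - math.log(S(t - h))) / (2.0 * h)


t = 1.3
print(hazard_limit(t), hazard_ratio(t), hazard_log_derivative(t))
```

For this model all three agree with the closed form (K/SCALE)(t/SCALE)^(K−1), an increasing hazard because K > 1.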
Force of mortality is a synonym of hazard function which is used particularly in demography and actuarial science, where it is denoted by μ. The term hazard rate is another synonym.
The force of mortality of the survival function is defined as
$$\mu(x) = -\frac{d}{dx}\ln S(x) = \frac{f(x)}{S(x)},$$
where f is the probability density function of the lifetime distribution. The force of mortality is also called the force of failure.
In actuarial science, the hazard rate is the rate of death for lives aged x. For a life aged x, the force of mortality t years later is the force of mortality for an (x + t)-year-old. The hazard rate is also called the failure rate. Hazard rate and failure rate are names used in reliability theory.
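Any function λ is a hazard function if and only if it satisfies the following properties:
- λ(t) ≥ 0 for all t ≥ 0;
- its integral over [0, ∞) is infinite: $\int_0^{\infty} \lambda(t)\,dt = \infty$.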
In fact, the hazard rate is usually more informative about the underlying mechanism of failure than the other representatives of a lifetime distribution.
The hazard function must be non-negative, λ(t) ≥ 0, and its integral over [0, ∞) must be infinite, but it is not otherwise constrained; it may be increasing or decreasing, non-monotonic, or discontinuous. An example is the bathtub curve hazard function, which is large for small values of t, decreases to some minimum, and thereafter increases again; this can model the tendency of some mechanical systems either to fail soon after they are put into operation or much later, as the system ages.
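One way to obtain a bathtub shape, offered here only as an illustration, is to add a decreasing Weibull hazard (shape below 1, early failures) to an increasing one (shape above 1, wear-out). The constants `K_EARLY`, `SCALE_EARLY`, `K_LATE` and `SCALE_LATE` are hypothetical values chosen for the demo. The sketch locates the minimum of the hazard on a grid and builds S(t) from the closed-form cumulative hazard.

```python
import math

# Illustrative bathtub hazard: decreasing Weibull hazard (early failures)
# plus increasing Weibull hazard (wear-out). Parameter values are hypothetical.
K_EARLY, SCALE_EARLY = 0.5, 10.0
K_LATE, SCALE_LATE = 3.0, 5.0


def hazard(t):
    early = (K_EARLY / SCALE_EARLY) * (t / SCALE_EARLY) ** (K_EARLY - 1)
    late = (K_LATE / SCALE_LATE) * (t / SCALE_LATE) ** (K_LATE - 1)
    return early + late


def cumulative_hazard(t):
    # Closed-form integral of hazard from 0 to t; it grows without bound.
    return (t / SCALE_EARLY) ** K_EARLY + (t / SCALE_LATE) ** K_LATE


def survival(t):
    return math.exp(-cumulative_hazard(t))


grid = [0.05 * i for i in range(1, 201)]  # t in (0, 10]
values = [hazard(t) for t in grid]
t_min = grid[values.index(min(values))]
print(f"hazard is lowest near t = {t_min:.2f}")
```

With these values the hazard is high near t = 0, bottoms out a little above t = 1.2, and then rises steeply, which is the bathtub pattern described above.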
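The hazard function can alternatively be represented in terms of the cumulative hazard function, conventionally denoted Λ:
$$\Lambda(t) = -\ln S(t),$$
so transposing signs and exponentiating gives
$$S(t) = \exp(-\Lambda(t)),$$
or differentiating (with the chain rule) gives
$$\frac{d}{dt}\Lambda(t) = -\frac{S'(t)}{S(t)} = \lambda(t).$$
The name “cumulative hazard function” derives from the fact that
$$\Lambda(t) = \int_0^t \lambda(u)\,du,$$
which is the “accumulation” of the hazard over time.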
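Since Λ(t) is the integral of λ from 0 to t and S(t) = exp(−Λ(t)), any hazard function can be turned into a survival function by numerical integration. The sketch below uses the trapezoidal rule; the helper names and the `steps` parameter are illustrative assumptions. It is checked against a linearly increasing hazard, for which the exact answer is known.

```python
import math


def cumulative_hazard(hazard, t, steps=10_000):
    """Approximate Lambda(t) = integral_0^t hazard(u) du with the trapezoidal rule."""
    h = t / steps
    total = 0.5 * (hazard(0.0) + hazard(t))
    total += sum(hazard(i * h) for i in range(1, steps))
    return total * h


def survival_from_hazard(hazard, t, steps=10_000):
    """S(t) = exp(-Lambda(t))."""
    return math.exp(-cumulative_hazard(hazard, t, steps))


# Example: linearly increasing hazard lambda(t) = 2t / scale**2 (a Weibull with shape 2).
scale = 3.0


def linear_hazard(u):
    return 2.0 * u / scale ** 2


t = 4.0
approx = survival_from_hazard(linear_hazard, t)
exact = math.exp(-((t / scale) ** 2))
```

The trapezoidal rule is exact for a linear integrand, so `approx` and `exact` differ only by rounding; for curved hazards a finer grid would be needed.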
From the definition of Λ(t), we see that it increases without bound as t tends to infinity (assuming that S(t) tends to zero). This implies that λ(t) must not decrease too quickly, since, by definition, the cumulative hazard has to diverge. For example, exp(−t) is not the hazard function of any survival distribution, because its integral converges to 1.
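Working the exp(−t) example through the identity S(t) = exp(−Λ(t)) shows concretely what goes wrong: the cumulative hazard stays bounded, so S(t) levels off at e^{-1} instead of tending to zero.

```latex
\Lambda(t) = \int_0^t e^{-u}\,du = 1 - e^{-t}
\quad\Longrightarrow\quad
S(t) = \exp\!\left(-(1 - e^{-t})\right) \xrightarrow[t \to \infty]{} e^{-1} \approx 0.368
```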
Quantities derived from the survival distribution
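Future lifetime at a given time t0 is the time remaining until death, given survival to age t0. Thus, it is T − t0 in the present notation. The expected future lifetime is the expected value of future lifetime. The probability of death at or before age t0 + t, given survival until age t0, is just
$$\Pr(T \le t_0 + t \mid T > t_0) = \frac{\Pr(t_0 < T \le t_0 + t)}{\Pr(T > t_0)} = \frac{F(t_0 + t) - F(t_0)}{S(t_0)}.$$
Therefore, the probability density of future lifetime is
$$\frac{d}{dt}\,\frac{F(t_0 + t) - F(t_0)}{S(t_0)} = \frac{f(t_0 + t)}{S(t_0)},$$
and the expected future lifetime is
$$\frac{1}{S(t_0)}\int_0^{\infty} t\,f(t_0 + t)\,dt = \frac{1}{S(t_0)}\int_{t_0}^{\infty} S(t)\,dt,$$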
where the second expression is obtained using integration by parts.
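The second form, the integral of S from t0 onward divided by S(t0), is convenient for computation. The sketch below evaluates it with the trapezoidal rule, truncating the infinite integral at a caller-chosen `upper` limit (an assumption: S(upper) must be negligible). The exponential case illustrates the memoryless property, where the mean residual lifetime is 1/rate at every age; the Gaussian-tail survival function exp(−t²), chosen only as an example with increasing hazard, shows it shrinking with age.

```python
import math


def mean_residual_life(survival, t0, upper, steps=100_000):
    """(1 / S(t0)) * integral_{t0}^{upper} S(t) dt, via the trapezoidal rule.

    `upper` truncates the infinite integral; the caller must choose it so that
    S(upper) is negligible.
    """
    h = (upper - t0) / steps
    area = 0.5 * (survival(t0) + survival(upper))
    area += sum(survival(t0 + i * h) for i in range(1, steps))
    return area * h / survival(t0)


rate = 0.5


def exponential(t):
    return math.exp(-rate * t)


def increasing_hazard(t):
    return math.exp(-t * t)  # hazard 2t grows with age


print(mean_residual_life(exponential, 0.0, 80.0))   # about 1/rate = 2
print(mean_residual_life(exponential, 5.0, 85.0))   # still about 2 (memoryless)
print(mean_residual_life(increasing_hazard, 0.0, 10.0),
      mean_residual_life(increasing_hazard, 2.0, 12.0))  # second value is smaller
```

At t0 = 0 the first call returns the expected lifetime, which in reliability terms is the mean time to failure.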
For t0=0, that is, at birth, this reduces to the expected lifetime.
In reliability problems, the expected lifetime is called the mean time to failure, and the expected future lifetime is called the mean residual lifetime.
As the probability of an individual surviving until age t or later is S(t), by definition, the expected number of survivors at age t out of an initial population of n newborns is n × S(t), assuming the same survival function for all individuals. Thus the expected proportion of survivors is S(t). If the survival of different individuals is independent, the number of survivors at age t has a binomial distribution with parameters n and S(t), and the variance of the proportion of survivors is S(t) × (1-S(t))/n.
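The binomial formulas can be checked by simulation. The sketch below computes n × S(t) and S(t)(1 − S(t))/n for example values n = 200 and S(t) = 0.75 (chosen only for illustration), then simulates many independent cohorts with a fixed seed and compares the empirical mean and variance of the survivor proportion.

```python
import random


def survivor_stats(n, s_t):
    """Expected count, and variance of the proportion, of survivors out of n
    independent individuals who each survive past t with probability s_t."""
    return n * s_t, s_t * (1.0 - s_t) / n


n, s_t = 200, 0.75
expected_count, var_prop = survivor_stats(n, s_t)  # 150.0 and 0.0009375

# Monte Carlo check with a fixed seed: simulate many cohorts of n newborns.
rng = random.Random(0)
props = []
for _ in range(5_000):
    survivors = sum(rng.random() < s_t for _ in range(n))
    props.append(survivors / n)
mean_p = sum(props) / len(props)
var_p = sum((p - mean_p) ** 2 for p in props) / (len(props) - 1)
```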
The age at which a specified proportion of survivors remain can be found by solving the equation S(t) = q for t, where q is the quantile in question. Typically one is interested in the median lifetime, for which q = 1/2, or other quantiles such as q = 0.90 or q = 0.99.
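When S(t) = q has no closed-form solution, a root finder works, because S is non-increasing. The sketch below uses bisection and assumes S is continuous with S(0) > q; the function name and the cap of 200 doublings on the search interval are illustrative choices. For the exponential distribution the answer is known, t = −ln(q)/rate, so the median is ln 2 / rate.

```python
import math


def survival_quantile(survival, q, lo=0.0, hi=1.0, tol=1e-10):
    """Solve S(t) = q for t by bisection, assuming S is continuous and non-increasing.

    `hi` is doubled (at most 200 times) until S(hi) <= q; a ValueError is raised
    if S never falls to q, e.g. when the limit of S(t) exceeds q.
    """
    if not 0.0 < q < 1.0:
        raise ValueError("q must lie strictly between 0 and 1")
    for _ in range(200):
        if survival(hi) <= q:
            break
        hi *= 2.0
    else:
        raise ValueError("S(t) never falls to q on the search range")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if survival(mid) > q:
            lo = mid  # still more than a fraction q alive: answer lies later
        else:
            hi = mid
    return 0.5 * (lo + hi)


rate = 0.1
median = survival_quantile(lambda t: math.exp(-rate * t), 0.5)
# Closed form for the exponential: t = -ln(q) / rate, i.e. ln(2) / 0.1, about 6.93.
```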
One can also make more complex inferences from the survival distribution. In mechanical reliability problems, one can bring cost (or, more generally, utility) into consideration, and thus solve problems concerning repair or replacement. This leads to the study of renewal theory and reliability theory of ageing and longevity.