In Survival Analysis we are interested in understanding the risk of an event happening at a particular point in time, where time is a continuous variable.
For example, consider the event of a neuron firing: we denote the time of firing by $X$ and time in general by $t$.
The hazard function, which is a function of time, is defined as:
\[h(t) = \lim_{\Delta t\to0}\frac{P(t<X<t+\Delta t \mid X>t)}{\Delta t}\]
We condition on $X>t$ because the probability should only count cases in which the event has not yet occurred by time $t$.
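To make the definition concrete, here is a minimal simulation sketch that estimates the ratio inside the limit for a small but finite $\Delta t$. It assumes, purely for illustration, that the firing time $X$ is exponentially distributed with rate `lam`; that distribution, the parameter values, and the helper name `estimate_hazard` are not part of the original text.

```python
import random

def estimate_hazard(samples, t, dt):
    """Estimate P(t < X < t + dt | X > t) / dt from simulated event times."""
    # Keep only the cases where the event has not occurred by time t.
    survivors = [x for x in samples if x > t]
    # Among those, count the events that fall inside (t, t + dt).
    in_window = sum(1 for x in survivors if x < t + dt)
    return in_window / len(survivors) / dt

# Assumed setup: exponential firing times with rate lam (illustrative only).
lam = 2.0
rng = random.Random(0)  # fixed seed so the run is reproducible
samples = [rng.expovariate(lam) for _ in range(200_000)]

estimate = estimate_hazard(samples, t=0.5, dt=0.05)
print(f"estimated hazard at t=0.5: {estimate:.3f} (rate used to simulate: {lam})")
```

For an exponential distribution the hazard is constant and equal to the rate, so the estimate should land near `lam`, slightly below it because $\Delta t$ is finite rather than infinitesimal.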
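Can we rewrite $h(t)$ in a more convenient form? Applying the definition of conditional probability gives:
\[\begin{aligned} h(t) &= \lim_{\Delta t\to0}\frac{P(t<X<t+\Delta t \mid X>t)}{\Delta t}\\ &= \lim_{\Delta t\to0}\frac{P(t<X<t+\Delta t,\,X>t)}{P(X>t)\,\Delta t} \end{aligned}\]
The event $(t<X<t+\Delta t)$ is a subset of the event $(X>t)$, so the joint probability in the numerator reduces to $P(t<X<t+\Delta t)$: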
    O---------------------- { X > t }
    |       o-------------- { t < X < t+Δt }
    |       |
----.-------.----.---------
    t       X    t+Δt
$P(X>t)$ is called the survival function and is just $1$ minus the cumulative distribution function (CDF):
\[P(X>t) = 1-F(t)=1-\int_{t_0}^{t}p(s)\,ds\]
The remaining factor is, by definition, the derivative of the CDF, which is the probability density function (PDF) at time $t$:
\[\begin{aligned} \lim_{\Delta t\to0}\frac{P(t<X<t+\Delta t)}{\Delta t} &= \lim_{\Delta t\to0}\frac{P(X<t+\Delta t)-P(X<t)}{\Delta t}\\ &= \lim_{\Delta t\to0}\frac{F(t+\Delta t)-F(t)}{\Delta t}=p(t) \end{aligned}\]
So, finally, we can rewrite the hazard function as:
\[h(t) = \frac{p(t)}{1-F(t)} = \frac{p(t)}{1-\int_{t_0}^{t}p(s)\,ds}\]
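As a quick sanity check of this result, the sketch below computes the hazard as the PDF divided by one minus the CDF and compares it with a known closed form. It assumes a Weibull distribution with $t_0 = 0$, chosen only as an example; the function names and the `shape` and `scale` values are illustrative and not part of the original text.

```python
import math

def weibull_pdf(t, shape, scale):
    """PDF p(t) of a Weibull distribution (assumed example distribution)."""
    z = t / scale
    return (shape / scale) * z ** (shape - 1) * math.exp(-(z ** shape))

def weibull_cdf(t, shape, scale):
    """CDF F(t) of the same Weibull distribution."""
    return 1.0 - math.exp(-((t / scale) ** shape))

def hazard(t, pdf, cdf):
    """h(t) = p(t) / (1 - F(t)), i.e. the PDF divided by the survival function."""
    return pdf(t) / (1.0 - cdf(t))

shape, scale = 1.5, 2.0  # illustrative parameter values
h_at_1 = hazard(
    1.0,
    lambda t: weibull_pdf(t, shape, scale),
    lambda t: weibull_cdf(t, shape, scale),
)

# Known closed-form Weibull hazard: (shape / scale) * (t / scale) ** (shape - 1)
closed_form = (shape / scale) * (1.0 / scale) ** (shape - 1)
print(h_at_1, closed_form)
```

With `shape = 1` the Weibull reduces to an exponential distribution, and the same `hazard` function returns the constant `1 / scale` for every `t`.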