Continuous Random Variables¶
We spent the last two days thinking about discrete probability distributions. In particular, these included random variables that count things (what do the binomial, geometric, and Poisson random variables count?).
However you can probably give some examples of numerical random variables which are not discrete.
Note the thing that should give us pause, and explains why we have to introduce these random variables in a slightly different way from how we introduced discrete random variables:
-
For a discrete random variable, it makes sense to ask \(P(Y = r)\) for some \(r\). I.e. there will be some \(r\) such that this number is non-zero. In fact we took this as our starting point for all of the discrete random distributions we have talked about.
-
For a continuous random variable: \(P(Y=x)\) has us a little bit nervous. If \(Y\) is truly continuous then near \(x\) there are infinitely many (uncountably infinitely many) values that are also possible. They can't all have a non-zero value without us having trouble adding them all up, and at the same time if only a discrete set of them are non-zero so that we can add them up then what we have is a discrete variable. In other words, we have a bit of a paradox. Luckily Calculus, as it usually does, gives us the language to talk about continuous things as generalizations of discrete things.
First we need to make our definition precise: How will we recognize a continuous random variable? (if you were in my class last summer you might recall our sandwich activity).
One picture that should have you thinking from earlier this week is the graphs we made of the Cumulative Distribution Function for our random variables; here is the one for the binomial random variable:
n <- 10
p <- 0.38
r <- c(-1:(50 * n)) / 50
plot(r, pbinom(r, n, p), type = "l")
Consider instead what this graph would need to look like for a random variable that can take any value, rather than just integer values.
Recall the properties of our CDF \(F(x) = P(Y \leq x)\):
-
\(\lim_{x\to -\infty} F(x) = 0\)
-
\(0 \leq F(x) \leq 1\) for all x.
-
\(F(x)\) is a non-decreasing function of \(x\): given \(x_1 < x_2\) then \(F(x_1) \leq F(x_2)\).
-
\(\lim_{x\to \infty} F(x) = 1\)
A Continuous Random Variable is one for which the cumulative distribution function: \(F(x) = P(Y \leq x)\) is a continuous function.
Note that we can't say a discrete random variable is one where the CDF is not continuous, as a CDF could have both continuous and non-continuous parts. There is a whole theory of decomposing a general random variable into discrete and continuous components that is beyond what we want to do for this class.
The next thing to note is that for our discrete random variables, the jump for the steps in the CDF is the probability distribution for that value.
For a continuous random variable then, the probability that the variable achieves any specific value must be 0 because otherwise \(P(Y = x) \) would be the size of a jump discontinuity in the CDF.
Probability Density Function¶
So we don't have a distribution in the sense that we do for discrete random variables, however note that for continuous functions the idea which captures how much the function is increasing (the size of the steps) is given by the derivative. This leads to:
The Probability Density Function of a continuous random variable with CDF \(F(x) = P(Y \leq x)\) is given by:
\[ f(x) = \frac{d}{dx} F(x) \]
where the derivative exists.
Note a few consequences:
-
\( f(x) \geq 0 \)
-
\( \int_{-\infty}^\infty f(x) dx = 1\)
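As a quick numerical sketch of the definition (using the builtin normal distribution functions dnorm and pnorm that appear later in this chapter; the test point 0.7 and step size are arbitrary choices), a finite-difference derivative of the CDF should agree with the density:

```r
# Check f(x) = dF/dx numerically: a centered finite difference of the
# CDF (pnorm) should agree with the density (dnorm).
x <- 0.7      # an arbitrary test point
h <- 1e-6     # small step for the finite difference
(pnorm(x + h) - pnorm(x - h)) / (2 * h)   # numerical derivative of F
dnorm(x)                                  # the density at x
```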
Why is this called the density function?¶
Let's assume that \(f(x)\) exists everywhere. Then the Fundamental Theorem of Calculus implies that:
\[ F(x) = P(Y \leq x) = \int_{-\infty}^x f(t) dt \]
Suppose we wanted to know \(P(a \leq Y \leq b)\)? On the one hand we compute this from the CDF:
\[ P( a \leq Y \leq b) = F(b) - F(a) \]
However we can rewrite this in terms of the PDF using the algebra of integrals:
\[ P(a \leq Y \leq b) = \int_a^b f(x) dx \]
I.e. the area under \(f(x)\) over a region of \(x\)-space gives the probability that \(Y\) lies in that region. Note some consequences:
-
This is another way of thinking about why \(P(Y = x) = 0 \): It corresponds to an integral over a single point.
I think of this as density in the same sense as we would use in Physics if the density is a proportion of the whole (i.e. we choose units so the total mass is 1). The integral of the density function for an object gives the mass of that object, center of mass and mean are the same thing, and the variance we are using corresponds to moments - describing how much the mass of an object is spread from its center.
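To make the interval computation concrete, here is a quick sketch using R's builtin punif, the CDF of the uniform distribution we meet in the next section (the endpoints 0.3 and 0.7 are arbitrary choices):

```r
# P(0.3 <= Y <= 0.7) for Y uniform on [0, 1], computed as F(b) - F(a)
a <- 0.3
b <- 0.7
punif(b) - punif(a)   # 0.4, the length of the interval
```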
Uniform Distribution¶
For our first example consider the cumulative distribution function:
\[\begin{split} F(x) = \left\{ \begin{matrix} 0 & x < 0 \\ x & 0 \leq x \leq 1 \\ 1 & x > 1 \end{matrix} \right\} \end{split}\]
To plot this, we first need to define the function. We will go over this in class, but this is the syntax for defining a function in R and, because the function is piecewise, also the syntax for if statements. Note this is the most complicated programming we will do, and you can just copy this function and modify it. A couple of notes about using this in R: from the notebook you need to execute this command with the cursor at the very top, as working from the middle will only run the middle block of code. The other note is that I want to write the function so that it handles columns the same way the builtin functions we have already met do. That means I need to assume the input is a list of values, and the function needs to return a list of values.
# We need to write functions so they take a column of input values
# and return a column of output values.
# This will make what we do next easier.
F <- function(x) {
  result <- c()
  for (k in x) {
    # The actual function - note the if statements we need
    # because of the piecewise nature
    if (k < 0) {
      result <- c(result, 0)
    } else if (k > 1) {
      result <- c(result, 1)
    } else {
      result <- c(result, k)
    }
  }
  result
}

# testing that it works
F(c(-0.25, 0.25, 3))
- 0
- 0.25
- 1
With that defined we can now plot the function.
n <- 100
x <- c((-2 * n):(2 * n)) / n
plot(x, F(x), type = "l")
This is our cumulative distribution. Without even computing anything we can tell that the support of our density is on the interval \([0, 1]\). Note that we have corners in the CDF so there are two places where the density is not defined - your instincts from Calculus are correct that not having a density at a discrete set of values is not a big deal. You may recall from MATH 534 that there are some particularly nasty functions, but while they are theoretically interesting as CDFs they do not correspond to random variables we see often (if at all).
We can compute the probability density function by differentiating this one. We get a piecewise defined function:
\[\begin{split} f(x) = \left\{ \begin{matrix} 0 & x < 0 \\ 1 & 0 \leq x \leq 1 \\ 0 & x > 1 \end{matrix} \right\} \end{split}\]
This is an example of a uniform continuous distribution. On the support of the variable, the probability that \(Y\) is in an interval is proportional to the length of that interval. It is tempting to say that each value of \(Y\) is equally likely; however, while true, this is meaningless - for a continuous random variable the probability of any single value occurring is zero, hence they are all the same regardless of what the density is. For continuous random variables we can only talk in terms of (a) the probability density or (b) the probability of the variable occurring in a set. This distinction is what separates those who have been carefully learning probability and statistics from those who tried to cut some corners or are getting sloppy.
General Uniform Distributions¶
Find the constant \(C\) such that \(f(x)\) below is a valid PDF:
\[\begin{split} f(x) = \left\{ \begin{matrix} 0 & x < a \\ C & a \leq x \leq b \\ 0 & x > b \end{matrix} \right\} \end{split}\]
If you translate this question into Physics: Find \(C\) such that the total mass of the object with density \(f(x)\) is 1, you will not be surprised by the answer.
-
Make a prediction about the Expected Value of the uniform random variable on \([a, b]\).
Simulating continuous random variables¶
Following our pattern for discrete random variables, I would like to first simulate this random variable so we can experiment and see what we think the Expected Value and Variance are. However, simulating continuous random variables is somewhat harder than simulating discrete ones. For example, with discrete ones we could even simulate them mechanically by filling a bag with chips with different markings on them.
Luckily, R comes with a uniform random variable simulator already prepared for us. It uses technology similar to that hiding behind the sample command we have already used for discrete variables. The command is runif: r is the prefix and unif is the name of the distribution - hence we could simulate our Poisson variable, for example, using rpois.
result <- runif(50, 0, 1)
result
- 0.44196895044297
- 0.639591481303796
- 0.156283399788663
- 0.524530499242246
- 0.794476743787527
- … (45 more values)
and maybe you can see that we might have a problem. Our sample is just that: a sample. It is only a loose approximation of the distribution, and in fact if all we knew was the sample we might conclude something very different about our distribution. Addressing this discrepancy, or maybe deficiency, is going to be our goal for the next few weeks.
Expected Value¶
In any case, we can use our sample to estimate the expected value of the distribution - intuitively, we expect, or maybe hope, that the mean of the sample is close to the expected value of the distribution. We will show that this is precisely what is happening later, but for now let us allow ourselves to be naive.
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.0006468 0.3340208 0.5637350 0.5266541 0.7283778 0.9644583
You might try increasing the size of the sample we take and check what happens to our summary statistics.
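For example, a sketch of that experiment (the sample size and seed below are arbitrary choices; the seed just makes the run reproducible):

```r
# A much larger sample: the summary statistics should settle near the
# theoretical values (mean 0.5, quartiles 0.25 and 0.75)
set.seed(1)
big_sample <- runif(100000, 0, 1)
summary(big_sample)
mean(big_sample)   # should be close to 0.5
```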
Exact Value¶
We can find the exact value similarly to how we did it in the discrete case. Recall that for a discrete random variable \(Y\) the expected value is computed exactly by finding the sum (if we can find it):
\[ E(Y) = \sum r P(Y=r) \]
In the continuous case with a random variable \(Y\) with PDF \(f(x)\) we appeal to our intuition from Calculus:
\[ E(Y) = \sum x P(\mbox{$Y$ is in a small neighborhood near $x$}) = \sum x f(x) \Delta x\]
and in the limit as the neighborhood sizes \(\Delta x \to 0\) we get (again no surprises here if you use your intuition that sums become integrals):
\[ E(Y) = \int x f(x) dx \]
Where the integral is over the support of the random variable.
For \(Y\) the uniform random variable on the interval \([a, b]\):
\[ E(Y) = \int_a^b x \frac{1}{b-a} dx = \frac{1}{2(b-a)} x^2 \bigg|_a^b = \frac{b^2 - a^2}{2 (b-a)} = \frac{b+a}{2} \]
Which, surprise surprise, is the midpoint of the interval \([a, b]\).
Variance¶
The estimated variance from our sample can be found using var:
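A minimal sketch (the sample is regenerated here so the block is self-contained; the seed is an arbitrary choice for reproducibility):

```r
# Sample variance of a uniform sample on [0, 1];
# an estimate of 1/12, about 0.083
set.seed(1)
result <- runif(50, 0, 1)
var(result)
```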
Again, this is actually just an estimate of the variance of the random variable. You should check what happens as the size of the sample is changed.
Exact Value¶
Again we can find the exact value using some algebra:
\[ V(Y) = E(Y^2) - \frac{(b+a)^2}{4} \]
and an integral:
\[ E(Y^2) = \int_a^b x^2 \frac{1}{b-a} dx = \frac{1}{3(b-a)} x^3 \bigg|_a^b = \frac{b^3 - a^3}{3 (b-a)} \]
I would just use Wolfram Alpha to simplify this, but you could also factor the cubic using the binomial theorem:
\[ E(Y^2) = \frac{b^2 + ab + a^2}{3} \]
(again one reason I like to teach this class to our MA program is all of these exercises from Calculus get used).
Then we combine this with the expected value to get:
\[ V(Y) = \frac{b^2 + ab + a^2}{3} - \frac{b^2 + 2 ab + a^2}{4} = \frac{b^2 - 2 ab + a^2}{12} = \frac{(b-a)^2}{12} \]
For the interval \([0, 1]\) we have \(V(Y) = \frac{1}{12}\).
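We can also check the formula \((b-a)^2/12\) against a large simulated sample; the interval \([2, 5]\), the sample size, and the seed below are arbitrary choices:

```r
# Compare the sample variance with (b - a)^2 / 12 on the interval [2, 5],
# where the theoretical variance is 9 / 12 = 0.75
set.seed(2)
sample_25 <- runif(100000, 2, 5)
var(sample_25)    # should be close to 0.75
(5 - 2)^2 / 12
```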
Normal Distributions¶
The uniform distribution is a nice simple first example. However, our next example forms the basis for a large portion of the results we will work with. The Normal Distribution will turn out to be a universal structure that shows up in many instances.
The Probability Density Function for the normal distribution is the shape classically known as the Bell Curve. It is given by the function:
\[ f(x) = \frac{1}{\sigma \sqrt{2\pi}} e^{-(x - \mu)^2 / (2 \sigma^2) } \]
Note that the distribution has two free parameters (suggestively written as) \(\mu\) and \(\sigma^2\).
A graph of the PDF, using the builtin expression for the function dnorm:
n <- 100
mu <- 1
sigma <- 0.5
x <- c((-4 * n):(4 * n)) / n
v <- dnorm(x, mu, sigma)
plot(x, v, type = 'l')
Adjust the \(\mu\) and \(\sigma\) in this code block until you have a hypothesis about what the role is that they play in the shape of the distribution.
Note that the density never actually reaches 0, though it gets very close as \(x\to \pm \infty\). The support of this random variable is all real numbers.
The cumulative distribution function is defined to be:
\[ F(x) = \int_{-\infty}^x \frac{1}{\sigma \sqrt{2\pi}} e^{- (t-\mu)^2 / (2 \sigma^2) } dt \]
Note that while this is an unbounded integral, it is one that will have no problems converging. That said, it is also an integrand for which none of the rules you learned in Calculus 2 work. We are left with numerical approximations, and in our case we can just use the builtin function pnorm in R.
n <- 100
mu <- 1
sigma <- 0.5
x <- c((-4 * n):(4 * n)) / n
v <- pnorm(x, mu, sigma)
plot(x, v, type = 'l')
Sampling from the Normal Distribution¶
We will spend a large portion of later chapters asking what happens when we sample from a normal distribution, but R provides a builtin function that does it for us, so let's go ahead and use that. Though it is worth pausing here to discuss how one would implement such a thing, as it is instructive:
-
Suppose that we have a good pseudo-random number generator that will give us a random variable \(Q\) uniformly distributed on the interval \([0, 1]\). If we generate a value \(Q = t\) and then take \(F^{-1}(t) = x\) (applying the inverse of the cumulative distribution function), this gives us an \(x\) that we can use as a sample from the random variable with cumulative distribution \(F(x)\).
-
Note that because \(F(x)\) is increasing (strictly increasing wherever the density is positive), it is one-to-one there and the inverse function exists. Though we may not have a nice formula for it - for example, we do not have a nice formula for the inverse in the case of the normal distribution.
-
Note that where \(F(x)\) is steepest, which corresponds to where \(f(x)\) is largest, the inverse function will be least steep. Therefore a large interval of \(Q\) values is compressed into a small interval of \(Y\) values - and vice versa - which is exactly how the samples end up concentrated where the density is large.
Of course it is all moot as we do not have an exact formula for \(F(x)\) for the normal distribution and therefore we do not have an exact formula for the inverse of the CDF. What we do have though is a builtin sampling function rnorm.
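Although there is no closed formula, R does provide a numerical inverse of the normal CDF as qnorm, so the inverse-CDF scheme above can at least be sketched (the seed, sample size, and parameter values below are arbitrary choices):

```r
# Inverse-CDF sampling sketch: push uniform samples through qnorm,
# the numerical inverse of pnorm. rnorm is the builtin equivalent.
set.seed(3)
q <- runif(1000, 0, 1)              # uniform samples on [0, 1]
y <- qnorm(q, mean = 1, sd = 0.5)   # F^{-1}(q): normal(1, 0.5) samples
mean(y)   # should be close to 1
sd(y)     # should be close to 0.5
```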
Again note how consistent R is - the r prefix means sample from the variable.
result <- rnorm(50, 1, 0.5)
result
- 1.10302382699968
- 0.915567411252461
- 1.34078913440361
- 1.35518332983056
- 1.27732379421102
- … (45 more values)
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.04418 0.74709 1.09790 1.06394 1.36137 2.04862
Take the time to adjust the size of the sample and see what we get.
Examples¶
Total Probability¶
We should check that the total probability of this density is 1. That is in fact what the fraction in the density, \(\frac{1}{\sigma \sqrt{2\pi}}\), is for. It has always fascinated me that \(\pi\) shows up here! Unfortunately explaining why is beyond the scope of our class, but it is a fascinating application of the Fourier Transformation.
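We can at least verify the total probability numerically with R's builtin integrate function (the parameter values below are arbitrary choices):

```r
# Numerically integrate the normal density over the whole real line;
# the result should be 1 (up to the quadrature error integrate reports)
integrate(dnorm, -Inf, Inf, mean = 1, sd = 0.5)
```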
Probability of \(Y\) in an Interval¶
Suppose we have the normal random variable \(Y\) with mean 0 and variance 1. Find the probability that \(Y\) is in the interval \([0, \infty)\).
We compute this from the integral of the density over this region:
\[ P( 0 \leq Y ) = \int_0^\infty \frac{1}{\sqrt{2\pi}} e^{-x^2 / 2 } dx \]
We can ask Wolfram Alpha or use the symmetry to argue we get 0.5.
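The same answer falls out of the builtin CDF:

```r
# P(Y >= 0) = 1 - F(0) for the standard normal
1 - pnorm(0, mean = 0, sd = 1)   # 0.5, by symmetry
```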
Expected Value of a Function¶
The expected value of a function of our normal variable with mean 0 and variance 1 is found by integrating that function against the PDF for the distribution. For example to find
\[ E( e^{x} ) \]
we need to compute:
\[ \int_{-\infty}^\infty e^x \frac{1}{\sqrt{2\pi}} e^{-x^2 / 2 } dx \]
These are not typically easy integrals to compute but we can approximate the value numerically or use Wolfram Alpha: 1.64872.
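Here is one sketch of the numerical approximation, a Monte Carlo estimate (the seed and sample size are arbitrary choices):

```r
# Monte Carlo estimate of E(e^X) for X standard normal:
# average e^x over a large sample; the exact value is e^(1/2)
set.seed(4)
x <- rnorm(100000, 0, 1)
mean(exp(x))   # should be close to exp(1/2), about 1.64872
```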
Find an Interval with given probability¶
Find an interval \([a, b]\) such that the normal random variable with mean 0 and variance 1 has probability 0.95 of being found in the interval.
To answer this one it is worth using the cumulative distribution function defined in R, pnorm.
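One sketch of an answer: split the leftover probability 0.05 evenly between the two tails and use qnorm, R's inverse of pnorm, to find the endpoints:

```r
# A symmetric interval with probability 0.95: put 0.025 in each tail
a <- qnorm(0.025, mean = 0, sd = 1)   # about -1.96
b <- qnorm(0.975, mean = 0, sd = 1)   # about  1.96
c(a, b)
pnorm(b) - pnorm(a)   # check: 0.95
```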
Expected Value and Variance of Normal Random Variables¶
As you might expect, the parameters \(\mu\) and \(\sigma^2\) are so named because for the normal random variable they are precisely the mean and variance.
\[E(Y) = \int_{-\infty}^\infty x \frac{1}{\sigma \sqrt{2\pi}} e^{-(x-\mu)^2 / (2\sigma^2) } dx = \mu \]
Showing this exactly is beyond the scope of the class, really the two best ways to do it use Complex Analysis or the Fourier Transform.
The variance should be computed directly (i.e. not using our algebra trick):
\[ V(Y) = E( (Y-\mu)^2 ) = \int_{-\infty}^\infty (x-\mu)^2 \frac{1}{\sigma \sqrt{2\pi}} e^{-(x-\mu)^2 / (2\sigma^2) } dx = \sigma^2 \]
Note that immediately a change of variables can be made that eliminates \(\mu\) - hence \(V(Y)\) must, even before we compute it, depend only on \(\sigma^2\).
Source: http://virgilpierce.org/MATH_550/_build/html/05-Continuous.html