So I'm faced with a much bigger stat problem, but all I need to know to continue with my proof is:
- what are the assumptions needed to get the standard error of a proportion?
- how do you approximate the standard error of a proportion given p (proportion) and n (number of samples)?
I found a site that says:
SEp = sqrt[ p * ( 1 - p ) / n ]
but I'm not quite sure how they got it.
Anyone willing to explain to me how to estimate it?
Thanks in advance.
The text below answers your questions. I listed the "assumptions" that I needed to use when I took Statistics (some are kinda stupid, oh well), and also answered your second question.
First let's be clear about what a proportion is*. Suppose you have a yes/no question. The proportion is the probability of the answer being yes.
Example: You may want to know the proportion of people who like ice cream (the yes/no question is, do you like ice cream?).
To estimate a proportion** you take a sample of responses (each yes/no response is a trial), and find the fraction of yes in this sample. For this to work well you obviously need these assumptions:
1. MUST BE A RANDOM SAMPLE - you can't have bias... like you can't just choose people who are buying ice cream, or you'll probably get an estimate that's too high
2. TRIALS MUST BE INDEPENDENT - previous responses can't influence the next response... the point here is the probability of each trial resulting in "yes" must stay the same
Example: Suppose p = 60% of people like ice cream. You ask n RANDOM people if they like ice cream. The fraction that answer "yes" is your estimated proportion.
For each trial to be totally independent, you have to be allowed to choose the same person again. But we usually don't do that, and it turns out it doesn't matter much if you choose "without replacement" as long as the population is more than 10 times the number of trials (10 is arbitrary). This means we have another assumption:
3. 10 PERCENT CONDITION: If your sampling procedure means you don't choose anyone twice (which is usually the case), the sample must be less than 10 percent of the population.
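To get a feel for why sampling without replacement "doesn't matter much" for small samples, here is a rough sketch. It uses the finite population correction factor sqrt[ (N - n) / (N - 1) ], a standard textbook result that is not part of the post above; the function name and the example numbers are made up for illustration.

```python
from math import sqrt

def finite_population_factor(population, sample):
    """Factor that shrinks the standard deviation when sampling
    without replacement (assumed textbook formula, not from the post)."""
    if population < 2 or not 0 < sample <= population:
        raise ValueError("need 0 < sample <= population and population >= 2")
    return sqrt((population - sample) / (population - 1))

# Sample is 10% of the population: the factor stays close to 1.
factor_10_percent = finite_population_factor(10000, 1000)

# Sample is 50% of the population: the factor is far from 1.
factor_50_percent = finite_population_factor(10000, 5000)

print(factor_10_percent, factor_50_percent)
```

When the sample is under 10 percent of the population the factor stays above roughly 0.95, so ignoring it changes the standard deviation by only a few percent.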
* p stands for proportion
** p with a ^ on top stands for estimated proportion
The difference between your estimate and the actual proportion is clearly on average zero. But your estimate could be higher or lower; it can vary. As you know, standard deviation measures how much something varies:
Standard deviation of (estimated proportion - actual proportion) = sqrt[ p * ( 1 - p ) / n ]
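If you want to see where the formula comes from: count each "yes" as 1 and each "no" as 0. One response then has variance p * (1 - p), the sum of n independent responses has variance n * p * (1 - p), and dividing the sum by n divides the variance by n^2, which leaves p * (1 - p) / n. The sketch below checks this numerically by listing every possible number of "yes" answers; p = 0.6 and n = 50 are just example values, and the function name is made up.

```python
from math import comb, sqrt

def sd_of_estimate(p, n):
    """Exact standard deviation of the sample fraction, found by
    enumerating every possible count of "yes" answers (0..n)."""
    # Binomial probability of getting exactly k "yes" answers out of n.
    probs = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
    mean = sum(prob * (k / n) for k, prob in enumerate(probs))
    variance = sum(prob * (k / n - mean) ** 2 for k, prob in enumerate(probs))
    return sqrt(variance)

p, n = 0.6, 50  # example values only
exact = sd_of_estimate(p, n)
formula = sqrt(p * (1 - p) / n)
print(exact, formula)  # the two numbers should agree up to rounding error
```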
(Note: The above follows from the definition of standard deviation, applied to n independent yes/no trials that each have variance p * (1 - p).) The problem is you don't know the actual proportion, p, so you don't know the standard deviation. To approximate it, we plug in the estimate instead:
Standard error = sqrt[ X * ( 1 - X ) / n ]
where X = estimated proportion. (There's the answer to your second question.)
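As a quick sketch, the standard error formula looks like this in Python. The function name and the example numbers (60 "yes" answers out of 100) are made up for illustration.

```python
from math import sqrt

def standard_error(x, n):
    """Standard error of a proportion: sqrt[ X * (1 - X) / n ],
    where x is the estimated proportion and n is the sample size."""
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0 <= x <= 1:
        raise ValueError("x must be between 0 and 1")
    return sqrt(x * (1 - x) / n)

# Example: 60 out of 100 people said "yes", so X = 0.6.
se = standard_error(0.6, 100)
print(se)  # roughly 0.049
```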
USING THE STANDARD ERROR
In order to use the standard error for stuff like confidence intervals you need to know HOW an estimate can vary. It turns out that if you have a large enough sample, the estimate will follow (about) a normal distribution. Here's what we usually say is "large enough":
4. ASSUMING NORMALITY - If n*(1-X) and n*X both exceed 10, the normal distribution is a good model for how much an estimate can vary.
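Putting the pieces together, here is a rough sketch of checking assumption 4 and then building a confidence interval. The multiplier z = 1.96 (the usual value for a 95% interval under the normal model) is an assumption not stated above, and the function names and example numbers are made up.

```python
from math import sqrt

def normal_ok(x, n, threshold=10):
    """Assumption 4: both n*X and n*(1-X) must exceed the threshold."""
    return n * x > threshold and n * (1 - x) > threshold

def confidence_interval(x, n, z=1.96):
    """Estimate plus/minus z standard errors.
    z = 1.96 is the usual 95% choice (an assumption, not from the post)."""
    se = sqrt(x * (1 - x) / n)
    return x - z * se, x + z * se

x, n = 0.6, 100  # example: 60 "yes" answers out of 100
ok = normal_ok(x, n)
low, high = confidence_interval(x, n)
print(ok, low, high)  # True, then roughly 0.504 and 0.696
```

With a smaller sample, say n = 20 and X = 0.6, n*(1-X) is only 8, so the check fails and the normal model should not be trusted.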