Confidence Interval For My Breakfast Choices?
Every now and then I’m going to spice up your life with a little statistics, because everyone needs a little excitement from time to time, right? I won’t get crazy and nerdy with it, but I realized that when I bring up studies I oftentimes mention statistical terms that might need some explanation, depending on where you are in training/practice.
First off, I apologize for leaving you without a post for the last couple of weeks. I’m really not sure how you’ve been able to function during this time. For full disclosure, Twin Cities Tox got a vision upgrade, but it was done using the non-flap all laser PRK version rather than Lasik (google PRK if you desire further details; the interweb is back open today and ready to inform you on searches like that) so my visual recovery time has taken a while.
Now that we have that out of the way, let’s move on to the meat of the discussion, which is a basic explanation of confidence intervals. These are being used and requested more and more often by some of the prominent journals (Annals of Emergency Medicine, for example), and you should have some understanding of them. A confidence interval is basically a range of values, computed from sampled data, built so that we have a stated level of certainty (95% is most common in our scenario) that the range captures the true underlying parameter. It is truly an assessment of sampling, and it projects the reliability of that sampling by saying “this is the interval computed from the sampled data; if the study were repeated many times and an interval computed each time, about 95% of those intervals would contain the true value.” In its most basic form, the main thing in a study that affects a confidence interval is the number of samples, with a higher number of samples giving a better (narrower) CI. Really, the spread of the data matters as well, but ignore that for now.
If I told you that in general I have smoothies for 71% of my breakfasts and bowls of oatmeal for 29% of my breakfasts, you would probably just say “ok,” because you don’t give a shit what I eat for breakfast. If I were presenting that breakfast breakdown as a study, however, you would have to ask more about my sampling of that data in order to verify it. This also applies if you’re imagining a disturbing scenario where you do give a shit what I eat for breakfast, or, even more disturbing, if you’re not imagining it and really do care.
Here are two versions of sampling, which drastically change the 95% CI of the data (these are crude but serviceable calculations; everyone but statisticians should be fine with them).
In the first scenario I tell you that I got that breakfast data from sampling one week, or 7 mornings. I had 5 days of smoothies and 2 days of bowls of oatmeal. That data gets presented like this: Smoothies 71% (95% CI 30-95); Oatmeal 29% (95% CI 5-70).
In the second scenario I tell you I’ve been sampling for a year: 365 mornings, with 259 smoothies and 106 bowls of oatmeal. Now I get to report: Smoothies 71% (95% CI 66-76); Oatmeal 29% (95% CI 24-34).
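If you want to play with numbers like these yourself, here’s a minimal Python sketch using the Wilson score interval for a proportion. That’s just one of several standard formulas, so the numbers it spits out differ slightly from the ones above; different CI methods disagree a bit, especially at small sample sizes.

```python
import math

def wilson_ci(successes, n, z=1.96):
    """Wilson score confidence interval for a proportion (z=1.96 gives ~95%)."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    margin = (z / denom) * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return center - margin, center + margin

# One week of breakfasts: 5 smoothies out of 7 mornings -> a wide interval
lo, hi = wilson_ci(5, 7)
print(f"week: 71% smoothies, 95% CI {lo:.0%}-{hi:.0%}")

# One year: 259 smoothies out of 365 mornings -> a much narrower interval
lo, hi = wilson_ci(259, 365)
print(f"year: 71% smoothies, 95% CI {lo:.0%}-{hi:.0%}")
```

Same point estimate (71%) both times; only the sample size changed, and that’s what squeezes the interval.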
In the second scenario my sampling is much more acceptable, and potentially predictive, because the intervals are narrower and don’t overlap. If I’m looking at just the week of sampling, I can report what happened in raw terms (5 smoothies, 2 bowls of oatmeal, for 71% and 29% respectively), but I can’t say anything about what I do in general, outside that sample. That’s what the CI lets me do.
Make any sense? I hope so, because I’m going to hit on CI in the next couple days with a study example.