For the past week I’ve been pondering a probability question while late-night nursing. As with many probability questions, this one is open for debate. Although Sir Monkeypants has convinced me that he has the right solution, I thought I’d still throw it open for comments.
Say you have 10 items and one is special (i.e. you have 10 cups, and one has a marble in it). You start picking cups at random. What are the chances that the special one (the “prize”, so to speak) will be the last one you choose?
Sir Monkeypants doesn’t know this, but I started thinking about this problem due to something that happened a couple of weeks ago on The Amazing Race. There were 220 hay bales in a field. 20 had a clue in them, and the other 200 were empty. The contestants had to roll out bales until they found a clue. One girl wound up rolling out over 100 bales and still didn’t find a clue. I thought this was so statistically improbable as to be nearly impossible, but I can’t figure out how to mathimatically prove it. I’ve assumed that:
* 8 other teams found clues, so 8 of the clues are gone from the subset
* each other team took about 4 tries to find the clue, so 8×3 = 24 of the empty ones are gone too
That means out of 12 clues and 176 empty bales, she opened over 100 and still didn’t find a clue. What are the chances of that?
I think the answer is (n-1)/n * (n-2)/(n-1) * (n-3)/(n-2) * … * 1/2 * 1/1.
Simplified, this is just (n-1)! / n!.
Let’s do the example with n = 10 items. You are repeatedly trying to draw the non-special item until the end, when you are trying to select the special item. The first time you draw an item you have a 9/10 probability of selecting a non-special item. The next time you draw an item you have a 8/9 probability (since there are only 9 items left and 8 non-special ones). This continues until you have 2 items left and 1 is special. At this point there is a 1/2 probability of selecting the non-special item. Finally there is only one item left and you have a 1/1 of selecting the special item.
(10-1)!/10! = 9/10 * 8/9 * 7/8 * … * 1/2 * 1 = 0.1.
So, with 10 items, this scenario will happen 10% of the time.
I don’t think the chances of it happening are that remote. For the cups problem, my thought is that you’ll still get the “prize” cup last 10% of the time.
Hehe…looks like we posted at the exact same time :).
Those were my thoughts too.
For the haybale case, I think to go 100 picks without getting one, it would be:
( 176! / (176-101)! )
—————————–
( (176+12)! / (176+12-101)! )
Which is about 1/200. Unlikely, but not totally out there.
Now for your second question. There are 176 bales and 12 bales with clues. That means the first time you draw there will be a 164/176 chance of getting something a non-clue bale. The second time you draw there will be a 163/175.
The one hundredth time you will draw one of 100 non-clue bales out of a possible 112 bales.
This amounts to:
164/176 * 163/175 * … * 100/112
You can express this in factorials:
(164!/99!) / (176!/111!) = 0.0031.
So, this would happen roughly 3 times every 1000. Your instincts were right… this was pretty rare.
Hehe… we seem to be repeating eachother at the same time. Ya, I worked out the hay problem below, too. We used slightly different numbers but the same method of solving.
Okay .. I think you subtracted off the clue bales twice .. 200 bales without clues of which 24 have been unrolled by other teams=176 without clues. 20 bales with clues with 8 found by other teams leaves 12 remaining bales with clues. Total bales would be 176+12=188.
Ignoring this small oversight I still think that the number is much much smaller than the 3/1000 that you came up with … I’m going to go with your bale numbers for continuity.
The number of ways to choose 100 loosing bales out of the 164 empty bales is 164 choose 100. Typing this into google gives you a big ass number: 2.7*10^46.
The number of ways to choose 100 bales from the total number of bales is simply 176 choose 100. Google again helps us out telling us that this is 1.1*10^51.
So I think the answer is then 2.4*10-5.
Skewing the number of bales up by 12 for each section makes this number: 1.1*10^51/1.5*10^55 = 7.1*10-5.
Of course really we should all be waiting for capnplanet to be weighing in ..
I agree with this.
The chance of getting the “special” cup on the last try is the same as getting it on the first try (although by time you -reach- the last try, your chances have jumped to 100% obviously.)
Yes, I agree with this. Your formula is the same as mine, above, where I just did the arithmetic wrong. It winds up being about 1 in 14000, not 1 in 20000 as I said above.
This will give it too:
88 choose 12
============
188 choose 12
Mmm…. you only have 65 terms there instead of 100. Should be
(164!/64!)/(176!/76!) = 2.47E-5
which is about 1 in 40,0000.
Curiously, Google calculator can only do 170!, though it can handle 176 choose 100 (which is how I got the answer).
Anyway, sirmonkeypants has already set the record straight I think.
Ah, well there’s no need, since you’ve got it exactly right. (Two semesters of being a statistics TA has me pretty convinced that you’re right). And you’re right about the counting too – there are 176 empty bales and 12 bales with clues, so the correct answer is (176!/76!)/(188!/88!) = 7.2E-5.
I prefer to think of this the way first explained it: using the “old” numbers, on your first try you have a 164/176 chance of not finding a clue, on the second you have 163/175, etc., so the chance of missing 100 times is
(164/176)(163/175)…(65/77) = (164!/64!)/(176!/76!)
which is the same as ‘s answer (164 choose 100)/(176 choose 100) if you remember the formulas for n choose m and do a bit of simplification.
And as for ‘s original question, the analysis looks right too. By “brute force”, the answer is
(9/10)*(8/9)*…*(1/2) = 1/10. Of course when the answer simplifies so nicely I always expect there’s a better way of seeing the answer, and as points out, the chance of getting the prize cup last is the same as the chance of getting it first — 1/10. Why? You’re basically considering all permutations of the integers 1 through 10 (i.e. all ways of ordering them), and asking what fraction of them have (say) 1 in the last position. There’s nothing “special” about the last position, so 1/10 of them should have 1 in the 10th position, 1/10 should have 1 in the 4th position, etc. etc.
Another statistic that comes to mind here is what is the average number of bales that one should expect to turn over before finding a clue?
I think I know how to answer the question, but I’ll leave it open to discussion if anyone’s interested…
Considering sirmonkeypants had to set me straight several times on the original question, I doubt I can answer this one, although I will think about it. If no one answers for a few days, definitely post the right answer so I don’t have to lose sleep at night!
OK, I guess no one’s taking the bait, so I’ll put my $0.02 in. I haven’t solved this problem to my complete satisfaction, but I can at least answer the question I asked.
The original question about the bales was “what is the chance that at least 100 bales would be turned over before a clue was found?”. A very slight twist on this question is “what is the chance that the first clue appears on the 101st bale?”. To get this, we simply multiply the answer to the first question by the probability of getting a clue given the number bales and clues left. We’ll call this p(100); in general p(k) is the probability that a player turns over k bales before finding a clue under the (k+1)st bale.
The average number of bales one expects to turn over before finding a clue is just the sum of all possible values of k, each weighted by the probability of that value appearing. At the start of the game, there are 200 “clueless” bales, so k can be any integer between 0 and 200. This is called the expected value of k, E(k). So in symbols, E(k) = Σk k p(k)
Well, it’s nice to write that out, but what is it? We already know what p(k) is – it’s a fairly complicated expression involving 4 factorials or two binomial coefficients (binomial coefficient = “n choose m”), multiplied by another term for the probability of finding a clue exactly on step k+1. So at first I didn’t really have any hope of finding a closed-form expression for the summation. Instead I just wrote a little C++ program to calculate it. It works out to be about 9.5. (This is for the very beginning of the game, when there are 220 bales and 20 clues.)
So does this number make sense? I did a couple of other tests to make sure the numbers coming out of my program made sense. For one thing, the sum of p(k) for all the values of k should be 1, and that worked. I also checked my values of p(k) against the calculations that everyone made to find the probability of turning over 100 bales without finding a clue, starting with either 176 or 188 bales and 12 clues, and those numbers were good too. So I think the program is correct.
So what do we make of this number 9.5? Suppose instead that the players weren’t allowed to “remember” which bales were turned over, so that the numbers didn’t change at each step. That is, there are always 220 bales and 20 clues, so the probability is always 1 in 11 of finding a clue. Then as it turns out, the expected value is 10. (In general, with M clues and N bales with no clues, the expected value is N/M.) With “memory”, the advantage improves slightly; with each bale turned over, the chances of finding a clue improve slightly, so one expects that it will be a little faster than if you weren’t allowed to remember which bales you’d looked under already.
Given the value 9.5 for the real problem, though, it’s kind of surprising that many of the groups only had to turn over 4 bales or so – another statistical improbability! (My program tells me p(4) is about 6.3%.) Maybe the bales were arranged in some way that gave away the location of the clues? Maybe the team that took 100 tries didn’t see this, or were misled somehow? Maybe the whole thing is fixed????
Anyway, as you can see I’ve spent probably more time that I should have thinking about all of this. Actually what I’ve just discussed didn’t take that long. However, the problem stuck in my mind, and I decided to see if I could find a closed form answer for E(k) (i.e. a simple function of the input parameters). Well, I started whipping out recurrence relations and generating functions, differential equations and integrating factors, and I finally got a simple DE for the generating function which in principle I can solve, but in practice seemed like it wouldn’t help me much. This isn’t really my area of expertise; even if I had the generating function I’m not entirely sure how to find E(k) using it. There are some good sections of Knuth’s “The Art of Computer Programming” on this subject, but now I think I really have spent enough time on this.
(And to finish, an almost completely unrelated trivia question: what is the maximum length (in characters) of a reply post on LiveJournal? I found out the answer before editing this post slightly…)
It’s 5:30am here and I’ve been up with The First Lady for ages now, so I just came downstairs to answer your post, and here you’ve already posted! Ah well.
sirmonkeypants and I have discussed at length over the past two days and we were about 3/4 of the way to your solution. We knew we needed a weighted average and that the goal was to sum up each (number of tries) x (probability). We were just looking for a forumlaic way of finding the sum, rather than having to calculate each probability by hand (or, using a computer program). Sounds like that was pretty tough so I’m glad I didn’t actually break out the pencil and paper and try it myself!
As for the 4 bale thing, that is totally my own assumption. They didn’t show enough on the actual program itself to guess at how many attempts each team took — the only concrete information I had from the show is that no team found the clue on the very first try. I figured that giving each team an average of 4 bales was a little on the light side, but erred on the side of safety. On the show, when they came out to stop the 100+ team, that team indicated that there were “only a few” bales left, period. That implies that the other teams took, I’m guessing, closer to 10 tries each to find a clue.
I think given the statistical improbability of having to open 100 and not finding a clue, combined with the fact that there were only a few left at the end, implies that the girl involved actually did open at least a few containing a clue, but for some reason her technique of searching the hay did not lead her to find the pertinent piece of paper. It was pretty sad to watch — she was a super nice girl and we really liked her, plus she was at the hay bale thing for over 8 hours, which must have been so frustrating.
And how long can a live journal post be? I had no idea there was a limit!
Oh well – I’m sorry I spoiled it for you, but it sounds like you knew as much as I did anyway. Kudos!
As for the 100 bales, you are probably right – she must have just missed the clues on some of the bales. Reality is always more complicated than the model! But I’m astonished that she was there more than 8 hours… man, that’s perseverance! Did she get eliminated?
The Amazing Race has been on my radar for a few years, but I’ve never quite gotten hooked. I happened to catch the last half of this year’s opener, though, and I was so entertained I vowed I’d watch it all this year. However, for some reason I never got around to adding it to my TiVo list, and now I’ve missed several episodes. Probably it’s because I already have too much stuff queued up that I don’t have time to watch…
Apparently comments can’t exceed 4300 characters. I expect posts are exempt from this restriction though.