The circle is the total number of students who take the exam, the sectors are the three professors, and the small inner circle is the students who fail. We will call the number who fail $f_x$ and the number who pass $!f_x$. The subscript $x$ indicates which professor the exam was taken with. Let $F$ be the event that a student fails and $!F$ the event that a student doesn't fail.

The whole circle represents $(F + !F)$; call it 1 (normalized). The whole green sector, i.e. $A$, is the number of students who take the exam times the probability of $A$, that is $(F + !F) \cdot P(A) = P(A)$.

The darker green region, i.e. $F \cap A$, is the number of students who take the exam with $A$ and fail, i.e. $P(A) \cdot \dfrac{f_A}{f_A + !f_A}$. This clearly depends on the number of students who take the exam with $A$ (or, equivalently, on the number of days $A$ is examining, i.e. $P(A)$).

We know that if the examiner is $A$, then 12% fail. What this means is that $f_A/(f_A + !f_A) = 0.12$. This is not the same as $P(F \cap A)$. In fact, this information (the fraction of students who fail with professor $A$) doesn't depend on the number of students $A$ is examining. Therefore, if we want to express it in terms of $F \cap A$, we have $0.12 = P(F \cap A)/P(A)$, which is exactly the definition of $P(F \mid A)$.
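A quick numeric check of this distinction, using made-up counts (only the 12% figure comes from the problem): changing how many students $A$ examines changes $P(F \cap A)$, but the fraction failing with $A$, $P(F \mid A)$, stays at 0.12. The helper `region_probs` is hypothetical, written just for this illustration.

```python
from fractions import Fraction

def region_probs(f_a, nf_a, total):
    """Return (P(A), P(F ∩ A), P(F|A)) from hypothetical counts.

    f_a   -- students examined by A who fail
    nf_a  -- students examined by A who pass
    total -- all students taking the exam (the whole circle)
    """
    p_a = Fraction(f_a + nf_a, total)   # size of A's sector
    p_f_and_a = Fraction(f_a, total)    # darker region F ∩ A
    p_f_given_a = p_f_and_a / p_a       # equals f_a / (f_a + nf_a)
    return p_a, p_f_and_a, p_f_given_a

# Hypothetical: A examines 100 of 1000 students and 12 of them fail.
small = region_probs(12, 88, 1000)
# Hypothetical: A examines twice as many students, same failure rate.
large = region_probs(24, 176, 1000)

print(small)  # (Fraction(1, 10), Fraction(3, 250), Fraction(3, 25))
print(large)  # (Fraction(1, 5), Fraction(3, 125), Fraction(3, 25))
```

$P(F \cap A)$ doubles along with $P(A)$, while the ratio $P(F \cap A)/P(A)$ is $3/25 = 0.12$ in both cases.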

Another way of looking at it is to consider the inner circle, i.e. $P(F)$. Since $A$, $B$, $C$ form a partition,

$$P(F) = P(F \cap A) + P(F \cap B) + P(F \cap C).$$

That is,

$$P(F) = P(F \mid A)\,P(A) + P(F \mid B)\,P(B) + P(F \mid C)\,P(C).$$

This is **not** $P(F \mid A) + P(F \mid B) + P(F \mid C)$.
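The difference between the two sums is easy to see numerically. In the sketch below only $P(F \mid A) = 0.12$ comes from the problem; the failure rates for $B$ and $C$ and the share of students each professor examines are hypothetical placeholders.

```python
from fractions import Fraction

# Hypothetical inputs: only P(F|A) = 0.12 is taken from the problem.
p_examiner = {"A": Fraction(1, 2), "B": Fraction(3, 10), "C": Fraction(1, 5)}           # P(X)
p_fail_given = {"A": Fraction(12, 100), "B": Fraction(7, 100), "C": Fraction(5, 100)}  # P(F|X)

assert sum(p_examiner.values()) == 1  # A, B, C partition the students

# Law of total probability: P(F) = sum over X of P(F|X) * P(X)
p_fail = sum(p_fail_given[x] * p_examiner[x] for x in p_examiner)

# The wrong version: just adding up the conditional failure rates
wrong = sum(p_fail_given.values())

print(float(p_fail), float(wrong))  # 0.091 0.24
```

The correct value is a weighted average of the three rates, so it always lies between the smallest and largest of them; the plain sum does not.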

If we wrongly take 12% as $P(F \cap A)$, then the total number of students who fail no longer depends on the events $A$, $B$, $C$ happening, i.e. even if professor $A$ doesn't turn up at all, the same number of students will fail (even though the percentage of students failing with professors $B$ and $C$ is lower!). This is incorrect.
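The sketch below makes this concrete with hypothetical numbers (again, only the 12% for $A$ comes from the problem). It removes professor $A$ entirely and hands $A$'s students to $B$ and $C$. The hypothetical helper `p_fail_total` applies the law of total probability; `p_fail_misread` treats each rate as $P(F \cap X)$, so the examiners' shares never enter.

```python
from fractions import Fraction

# Hypothetical failure rates P(F|X); only the 12% for A comes from the problem.
P_FAIL_GIVEN = {"A": Fraction(12, 100), "B": Fraction(7, 100), "C": Fraction(5, 100)}

def p_fail_total(p_examiner):
    """Correct: P(F) = sum of P(F|X) * P(X) over the partition."""
    return sum(P_FAIL_GIVEN[x] * p for x, p in p_examiner.items())

def p_fail_misread(p_examiner):
    """Wrong: treats each rate as P(F ∩ X), ignoring the shares P(X)."""
    return sum(P_FAIL_GIVEN[x] for x in p_examiner)

with_a = {"A": Fraction(1, 2), "B": Fraction(3, 10), "C": Fraction(1, 5)}
without_a = {"A": Fraction(0), "B": Fraction(3, 5), "C": Fraction(2, 5)}  # A never turns up

print(p_fail_total(with_a), p_fail_total(without_a))      # 91/1000 31/500
print(p_fail_misread(with_a), p_fail_misread(without_a))  # 6/25 6/25
```

With the correct formula the failure probability drops once the strictest examiner is gone; the misreading predicts the same failure probability either way.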

Let's now consider an event $D$, the event that exactly one student fails. The total probability of $D$ is

$$P(D) = P(D \mid A)\,P(A) + P(D \mid B)\,P(B) + P(D \mid C)\,P(C).$$

Since $D$ is a special case of $F$, $D$ is a subset of $F$.

We know that $P(F \mid A) = 0.12$, so if $n$ students take the test with $A$ (and fail independently of one another), the probability that exactly one of them fails is $\binom{n}{1}(0.12)(1-0.12)^{n-1} = n\,(0.12)(0.88)^{n-1}$; for example, $3\,(0.12)(0.88)^{2}$ when $n = 3$. This, again, does not depend on $A$ happening, because it is "given" that $A$ has happened. In other words, it is **not** $P(D \cap A)$; it is $P(D \cap A)/P(A) = P(D \mid A)$.
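The binomial count behind $P(D \mid A)$ can be checked directly. This is a minimal sketch, assuming the $n$ students examined by $A$ fail independently, each with probability 0.12; the helper names `p_exactly_one` and `p_exactly_one_brute` are hypothetical. It compares the closed form $\binom{n}{1}p(1-p)^{n-1}$ with a brute-force sum over every pass/fail outcome.

```python
from fractions import Fraction
from itertools import product
from math import comb

def p_exactly_one(n, p):
    """P(exactly one of n independent students fails), each failing with prob. p."""
    return comb(n, 1) * p * (1 - p) ** (n - 1)  # comb(n, 1) == n

def p_exactly_one_brute(n, p):
    """Same probability by enumerating every pass/fail outcome (1 = fail)."""
    total = Fraction(0)
    for outcome in product((0, 1), repeat=n):
        if sum(outcome) == 1:
            prob = Fraction(1)
            for failed in outcome:
                prob *= p if failed else 1 - p
            total += prob
    return total

p_a = Fraction(12, 100)  # P(F|A) from the problem
for n in (1, 2, 3, 5):
    assert p_exactly_one(n, p_a) == p_exactly_one_brute(n, p_a)
    assert p_exactly_one(n, p_a) == n * p_a * (1 - p_a) ** (n - 1)

print(float(p_exactly_one(3, p_a)))  # 0.278784, i.e. 3 * 0.12 * 0.88**2
```

Enumerating outcomes confirms the coefficient is $\binom{n}{1} = n$: there are exactly $n$ ways to choose which single student fails.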

EDIT: damn, I got the nCr bit wrong. Thanks Daniel for telling me.