Of Correlation and Causation

To survive, we all need to know what causes things to occur.

We need this information to avoid bad outcomes and achieve good ones. For example, if we don’t accept that drinking foul water causes stomach upsets, then we are likely to suffer, if not die.

However, teasing out what causes what is subtle and prone to error.

Correlation is when two variables are in some way related to one another, and this is the easier of the two to spot. Causation is when one variable causes another, and this is altogether trickier to ascertain.

Let us say that we know two things are completely related, or as we might say, wherever we find A we find B and vice versa. More formally, A is 100% correlated with B. However, does A cause B or vice versa, and are these the only two options?

  • A may cause B; that is, the presence of A is sufficient to cause B.
  • B may cause A; that is, the presence of B is sufficient to cause A.
  • A and B may both be related to another variable, C, that causes both.

An example of the last possibility is that as ice cream sales increase, the rate of drowning deaths increases sharply too. Does ice cream consumption therefore cause drowning? Or perhaps swimming encourages ice cream consumption? In reality, both are strongly correlated with good weather: when the weather is good, ice cream sales increase and more people swim, so more drowning deaths occur. The more likely explanation is that good weather causes both.
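The ice cream example can be sketched as a small simulation. This is only an illustration under invented assumptions: the temperature range, the multipliers and the noise levels are made up, and the names pearson and residuals are hypothetical helpers, not taken from any data set or library. Temperature is made to drive both ice cream sales and drownings, and neither of the two affects the other.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def residuals(ys, xs):
    """What is left of ys after subtracting a best-fit straight line on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    return [y - (mean_y + slope * (x - mean_x)) for x, y in zip(xs, ys)]


rng = random.Random(0)  # fixed seed so that runs are repeatable

# Invented daily figures: temperature (C) influences both quantities,
# but ice cream sales and drownings never influence each other.
temperature = [rng.uniform(10, 35) for _ in range(2000)]
ice_cream_sales = [20 * t + rng.gauss(0, 60) for t in temperature]
drownings = [0.3 * t + rng.gauss(0, 1.5) for t in temperature]

# Correlation between the two effects, ignoring temperature.
raw = pearson(ice_cream_sales, drownings)

# The same correlation once the part explained by temperature is removed.
adjusted = pearson(
    residuals(ice_cream_sales, temperature),
    residuals(drownings, temperature),
)

print(f"raw correlation: {raw:.2f}")
print(f"after removing temperature: {adjusted:.2f}")
```

In this made-up setting the raw correlation comes out strongly positive, while the correlation that remains after accounting for temperature is close to zero. That is the pattern we would expect when C causes both A and B. With real data the common cause is rarely known in advance, so a check like this can only suggest, not prove, where the causation lies.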

We normally take a difference in timing as good evidence of causation. For example, if A is observed before B, then it is difficult to see how B could cause A. Even so, this does not imply that A is necessarily the cause of B.

Moreover, in cases of less than 100% correlation, if A is always present when B occurs but A can also occur without B, then the explanation that A causes B is more difficult to sustain, because the presence of A is evidently not sufficient to produce B.

It is wise to be careful when assuming there is a causal relationship between two correlated variables. It is often not as simple as it may appear.

