Simpson’s paradox arises in statistics when groups of unequal size are pooled into a single data set: a result that holds within each group can weaken, disappear, or reverse in the combined data.
Edward H. Simpson was the first to describe this phenomenon in a technical paper, in 1951, but the statisticians Karl Pearson (in 1899) and Udny Yule (in 1903) had mentioned similar effects earlier.
Simpson’s paradox is frequently encountered in social science and medical studies, where frequency data are easily over-interpreted. The confusion is resolved once the underlying associations among the variables are brought into consideration.
The significance of Simpson’s paradox shows up in decision making, where it poses an intriguing dilemma: which data should be consulted when choosing an action, the aggregated or the partitioned? Either can be the right choice, depending on the story behind the data; each account of how the data arose dictates its own choice.
It is a conventional rule that the bigger the data set, the more reliable the conclusion drawn from it. Simpson’s paradox, however, shows that great care is needed when merging small data sets into a large one, because the large data set can sometimes point to the opposite conclusion from the smaller ones.
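The reversal described above can be checked with simple arithmetic. The following is a minimal sketch in Python; the options A and B, the group names, and all counts are illustrative assumptions chosen for the demonstration, not data from this text.

```python
from fractions import Fraction

# Illustrative (successes, trials) counts for two options, A and B,
# split into two groups of unequal size. These numbers are assumptions
# chosen for the demonstration, not data from the text.
counts = {
    "group 1": {"A": (81, 87), "B": (234, 270)},
    "group 2": {"A": (192, 263), "B": (55, 80)},
}

def rate(successes, trials):
    """Exact success rate as a fraction."""
    return Fraction(successes, trials)

def pooled_rate(option):
    """Success rate after merging all groups into one data set."""
    successes = sum(group[option][0] for group in counts.values())
    trials = sum(group[option][1] for group in counts.values())
    return rate(successes, trials)

# Partitioned view: compare A and B inside each group.
a_better_in_every_group = all(
    rate(*group["A"]) > rate(*group["B"]) for group in counts.values()
)

# Aggregated view: compare A and B after pooling.
b_better_when_pooled = pooled_rate("B") > pooled_rate("A")

for name, group in counts.items():
    print(name, {opt: round(float(rate(*group[opt])), 3) for opt in group})
print("pooled", {opt: round(float(pooled_rate(opt)), 3) for opt in ("A", "B")})
print("A better in every group:", a_better_in_every_group)
print("B better when pooled:", b_better_when_pooled)
```

With these counts, A has the higher success rate in both groups, yet B has the higher rate once the groups are pooled, because most of A's trials fall in the group where success is less common for either option.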
Generally, Simpson’s paradox is not really a problem if the relevant variables are identified and controlled before the experiment.
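When the grouping variable is identified only after the data are collected, one common way to control for it in the analysis is to compare rates that are weighted by a shared group mix (direct standardization). The sketch below is one possible illustration, not a method prescribed by this text; the names `strata`, `crude_rate`, and `standardized_rate` and all counts are hypothetical.

```python
from fractions import Fraction

# Hypothetical (successes, trials) counts per stratum and option.
# All numbers are assumptions used only for illustration.
strata = {
    "group 1": {"A": (81, 87), "B": (234, 270)},
    "group 2": {"A": (192, 263), "B": (55, 80)},
}

def crude_rate(option):
    """Rate from the pooled data, ignoring the grouping variable."""
    successes = sum(st[option][0] for st in strata.values())
    trials = sum(st[option][1] for st in strata.values())
    return Fraction(successes, trials)

def standardized_rate(option):
    """Weight each stratum's rate by that stratum's share of all trials,
    so both options are compared on the same mix of groups."""
    total = sum(n for st in strata.values() for (_, n) in st.values())
    adjusted = Fraction(0)
    for st in strata.values():
        stratum_size = sum(n for (_, n) in st.values())
        successes, trials = st[option]
        adjusted += Fraction(stratum_size, total) * Fraction(successes, trials)
    return adjusted

for option in ("A", "B"):
    print(
        option,
        "crude:", round(float(crude_rate(option)), 3),
        "standardized:", round(float(standardized_rate(option)), 3),
    )
```

In this sketch the crude comparison favors B, while the standardized comparison favors A, which agrees with the within-group results.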
Also called: Yule–Simpson effect
See also:
• Prosecutor’s fallacy
• Ecological fallacy