Eighty years ago, on December 7, 1941, what was then the “Great Empire of Japan” attacked Pearl Harbor, the most important United States naval base in the Pacific, without warning and in the middle of diplomatic negotiations. The next day, at President Roosevelt's request, the United States declared war on the Japanese Empire. This humiliation changed not only the course of the war but also that of diagnostic testing.
How could such an attack go undetected? One hypothesis is the lack of statistical tools for distinguishing a real threat from a false alarm caused by radar “noise”. For this reason, the US began working on a binary classifier to tell the two apart: a method that assigns each signal a probability, between 0 and 1, of being a real attack. It also needed a tool to decide the probability threshold above which a signal would be classified as a real attack rather than a false alarm. A conservative choice (a high threshold, such as 0.8) would have labeled very few signals as real attacks, whereas an alarmist one (a low threshold, such as 0.2) would have treated most signals as actual attacks, whether or not they were.
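To make the idea of a threshold concrete, here is a minimal Python sketch. The six scores are hypothetical probabilities invented for illustration (not historical radar data), and the function name `classify` is our own: a signal is labeled a real attack when its probability is at or above the chosen threshold.

```python
def classify(scores, threshold):
    """Label each score True (real attack) if it is >= threshold, else False (false alarm)."""
    return [s >= threshold for s in scores]

# Hypothetical probabilities that some classifier assigned to six radar signals.
scores = [0.05, 0.15, 0.35, 0.55, 0.75, 0.90]

conservative = classify(scores, 0.8)  # high threshold: few signals flagged
alarmist = classify(scores, 0.2)      # low threshold: most signals flagged

print(sum(conservative), "of", len(scores), "flagged at 0.8")  # 1 of 6 flagged at 0.8
print(sum(alarmist), "of", len(scores), "flagged at 0.2")      # 4 of 6 flagged at 0.2
```

The classifier itself never changes here; only the threshold does, which is exactly the choice the ROC curve is meant to lay out.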
There are infinitely many possible thresholds, and therefore infinitely many options. To visualize all of them at once, the US Army developed what is known as the Receiver Operating Characteristic curve, or ROC curve. For a given classifier, this tool plots, across every possible threshold, the proportion of false threats classified as real (the false positive rate) against the sensitivity, or true positive rate (the proportion of real attacks classified as such).
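The construction of the curve can be sketched in a few lines of Python. This is an illustrative sketch, not the historical procedure: the function `roc_points` and the scores and labels are hypothetical. With a finite set of signals, only the distinct score values change the classification, so the empirical curve is a staircase through finitely many (false positive rate, true positive rate) points.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) pairs, one per distinct threshold, from strictest to most lenient.

    labels[i] is True for a real attack and False for a false threat.
    A signal counts as positive when its score >= threshold.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    # A threshold above every score classifies nothing as positive: the point (0, 0).
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and not y)
        points.append((fp / negatives, tp / positives))
    return points

scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]           # hypothetical classifier outputs
labels = [True, True, False, True, False, False]  # hypothetical ground truth
for fpr, tpr in roc_points(scores, labels):
    print(f"FPR={fpr:.3f}  TPR={tpr:.3f}")
```

The lowest threshold classifies every signal as positive, so the curve always ends at (1, 1).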
In the example illustrated in the image, the red threshold classifies all 15 real attacks as positive (100% sensitivity), along with 24 of the 35 false threats (a false positive rate of 68.6%), while the orange threshold classifies as positive only seven of the 15 real attacks (46.7%) and eight of the 35 false threats (22.9%). The two thresholds correspond to two particular points on the ROC curve, with coordinates (false positive rate, true positive rate) of (0.686, 1) and (0.229, 0.467). Depending on the rates of true and false positives you are willing to accept, you will choose one point or another on the curve, each associated with a specific threshold.
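The two points can be checked with simple arithmetic. The counts below come from the example in the text; the helper name `roc_point` is our own.

```python
def roc_point(tp, positives, fp, negatives):
    """Return (false positive rate, true positive rate) for one threshold."""
    return fp / negatives, tp / positives

REAL_ATTACKS = 15
FALSE_THREATS = 35

red = roc_point(tp=15, positives=REAL_ATTACKS, fp=24, negatives=FALSE_THREATS)
orange = roc_point(tp=7, positives=REAL_ATTACKS, fp=8, negatives=FALSE_THREATS)

print(f"red:    ({red[0]:.3f}, {red[1]:.3f})")        # red:    (0.686, 1.000)
print(f"orange: ({orange[0]:.3f}, {orange[1]:.3f})")  # orange: (0.229, 0.467)
```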
Because the ROC curve shows how a classifier performs across every possible threshold, it is an effective tool for comparing different methods: it is enough to compare the area under the curve obtained with each one. If a method assigns a higher probability to every real attack than to any false threat, so that some threshold separates the two groups perfectly, the area under its curve will be 1. The smaller the area, the less able the classifier is to distinguish real attacks from false threats; an area of 0.5, for example, corresponds to a method that classifies completely at random.
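A minimal sketch of the area under the curve, assuming hypothetical data. Instead of integrating the staircase directly, it uses the equivalent pairwise (Mann–Whitney) formulation: the area equals the probability that a randomly chosen real attack receives a higher score than a randomly chosen false threat, with ties counting one half. The function name `auc` is our own.

```python
import random

def auc(scores, labels):
    """Area under the empirical ROC curve via the pairwise (Mann-Whitney) formula:
    the fraction of (positive, negative) pairs in which the positive scores higher,
    counting ties as one half."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

labels = [True] * 5 + [False] * 5

# Perfect separation: every real attack outscores every false threat.
perfect = [0.9, 0.85, 0.8, 0.75, 0.7, 0.4, 0.3, 0.2, 0.1, 0.05]
print(auc(perfect, labels))  # 1.0

# Scores unrelated to the labels: the area should land near 0.5.
rng = random.Random(0)
noise_labels = [rng.random() < 0.5 for _ in range(1000)]
noise_scores = [rng.random() for _ in range(1000)]
print(round(auc(noise_scores, noise_labels), 2))
```

The double loop is quadratic and only meant for small illustrations; libraries typically compute the same quantity from sorted scores.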
Years later, radiologist Lee B. Lusted used this idea to study the diagnostic power of radiography for detecting pulmonary tuberculosis. In 1971 Lusted recruited 10 radiophysicists and gave them chest X-rays from 14,000 patients. Their task was to decide whether each patient had tuberculosis, and the probability of being ill was estimated as the proportion of experts who judged the patient to be sick. In a few cases all the experts agreed, because it was obvious that the patient was healthy or sick, but in most there was no unanimity. What proportion of opinions was needed to diagnose the disease? Was half enough, or was a broader consensus necessary? Lusted revived the ROC curve to visualize the options offered by the different thresholds, which allowed him to obtain the overall predictive power of radiography as a test and to compare it with other techniques.
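Here the threshold is simply a minimum number of expert votes. The sketch below uses invented data (12 hypothetical patients, not Lusted's) and a hypothetical helper `rates_at` to compare “half is enough” with a stricter consensus. With 10 experts the estimated probability can only take the values 0/10, 1/10, ..., 10/10, so only a handful of thresholds lead to different diagnoses.

```python
# Hypothetical data (invented for illustration): for each patient, how many of the
# 10 experts read the X-ray as tuberculosis, and whether the patient was actually sick.
N_EXPERTS = 10
votes = [10, 9, 7, 6, 5, 4, 8, 5, 3, 2, 1, 0]
sick = [True] * 5 + [False] * 7

def rates_at(min_votes):
    """Return (sensitivity, false positive rate) when a patient is diagnosed as sick
    if at least `min_votes` of the N_EXPERTS experts said so."""
    diagnosed = [v >= min_votes for v in votes]
    true_pos = sum(d and s for d, s in zip(diagnosed, sick))
    false_pos = sum(d and not s for d, s in zip(diagnosed, sick))
    n_sick = sum(sick)
    return true_pos / n_sick, false_pos / (len(sick) - n_sick)

for min_votes in (5, 8):  # "half is enough" versus "a broader consensus"
    sensitivity, fpr = rates_at(min_votes)
    print(f"at least {min_votes}/{N_EXPERTS}: sensitivity={sensitivity:.2f}, FPR={fpr:.2f}")
```

In this toy data, demanding a broader consensus lowers the false positive rate but also misses more sick patients, which is the trade-off each point on the ROC curve summarizes.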
Curiously, at around the same time, the first home pregnancy test, devised by Margaret Crane, went on sale in Canada. In this case, the criterion was the level of the hormone human chorionic gonadotropin (hCG) in urine samples. Above what value should a result be considered positive (pregnant)? Levels above 30 mIU/mL? Above 50? Of the infinitely many thresholds that the ROC curve lets us visualize, the one that best separates pregnant from non-pregnant women usually lies between 20 and 35 mIU/mL. At that threshold, the test has a sensitivity and a specificity (defined as one minus the false positive rate) close to 100%.
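The text does not say which criterion defines the “best” threshold. One common choice, assumed here purely for illustration, is Youden's J statistic (sensitivity + specificity − 1), maximized over candidate thresholds. The hCG readings and the helper names `sens_spec` and `youden_best` are hypothetical, not clinical data.

```python
# Hypothetical hCG readings in mIU/mL (invented for illustration, not clinical data).
pregnant = [28, 45, 60, 120, 250, 400]
not_pregnant = [2, 5, 8, 12, 18, 22]

def sens_spec(threshold):
    """Positive (pregnant) when hCG >= threshold. Returns (sensitivity, specificity)."""
    sensitivity = sum(v >= threshold for v in pregnant) / len(pregnant)
    false_positive_rate = sum(v >= threshold for v in not_pregnant) / len(not_pregnant)
    return sensitivity, 1 - false_positive_rate  # specificity = 1 - FPR

def youden_best(candidates):
    """Return the candidate threshold that maximizes Youden's J = sensitivity + specificity - 1."""
    return max(candidates, key=lambda t: sum(sens_spec(t)) - 1)

candidates = [10, 20, 25, 30, 50]
best = youden_best(candidates)
print(best, sens_spec(best))  # 25 (1.0, 1.0)
```

Real data overlap, so no threshold is perfect; Youden's J weights false positives and false negatives equally, and other criteria may be preferred when one kind of error is costlier.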
The ROC curve has also been key for the now-familiar PCR tests (qRT-PCR) used to detect SARS-CoV-2. In these tests, the virus's genetic material (RNA) is extracted, converted into DNA (reverse transcription) and amplified through successive replication cycles in the presence of a fluorescent probe. The more cycles, the more copies are made and the more fluorescence is emitted, and it is this fluorescence that determines whether the result is positive or negative. Each fluorescence threshold yields a true positive rate and a false positive rate (the proportions of infected and uninfected people, respectively, whose PCR comes out positive). This makes it possible to compare tests from different manufacturers by the area under their ROC curves and to choose, for each one, the point on the curve whose true and false positive rates are considered appropriate. The tests marketed in Spain have very high specificity (99.9%, that is, a 0.1% false positive rate) and high sensitivity (80-95%).
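To see what these percentages mean in practice, the sketch below turns the quoted specificity and sensitivity range into expected counts. The rates come from the text; the group sizes of 10,000 infected and 10,000 uninfected people are arbitrary, chosen only to make the counts easy to read, and `expected_counts` is a hypothetical helper.

```python
def expected_counts(n_infected, n_uninfected, sensitivity, specificity):
    """Expected test outcomes for groups whose infection status is known."""
    true_pos = n_infected * sensitivity
    false_neg = n_infected - true_pos
    false_pos = n_uninfected * (1 - specificity)  # false positive rate = 1 - specificity
    true_neg = n_uninfected - false_pos
    return true_pos, false_neg, false_pos, true_neg

SPECIFICITY = 0.999  # figure quoted in the text
for sensitivity in (0.80, 0.95):  # ends of the quoted sensitivity range
    tp, fn, fp, tn = expected_counts(10_000, 10_000, sensitivity, SPECIFICITY)
    print(f"sensitivity {sensitivity:.0%}: {tp:.0f} detected, {fn:.0f} missed, {fp:.0f} false alarms")
```

These are counts among people whose status is already known; the chance that a particular positive result is correct also depends on how common the infection is in the tested population.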
All of this is the legacy of a battle that changed the course of biostatistics.
Javier Álvarez Liebana is a science communicator (@dadosdelaplace), holds a PhD in Statistics and is an Assistant Professor at the Complutense University of Madrid.
Joaquin Martinez Minaya holds a PhD in Statistics and is an Assistant Professor at the Polytechnic University of Valencia.
Coffee and Theorems is a section devoted to mathematics and the environment in which it is created, coordinated by the Institute of Mathematical Sciences (ICMAT). In it, researchers and members of the institute describe the latest advances in the discipline, share points of contact between mathematics and other social and cultural expressions, and remember those who shaped its development and knew how to transform coffee into theorems. The name evokes the definition by the Hungarian mathematician Alfred Rényi: “A mathematician is a machine that transforms coffee into theorems.”
Editing and coordination: Ágata A. Timón G Longoria (ICMAT).
You can follow MATTER on Facebook, Twitter and Instagram, or sign up to receive our weekly newsletter.