When one is dealing with a disease that is at a low frequency in the population and one has a large number of people to test, it is natural to do group testing. A fixed number of samples, say 10, are mixed together. If the combined sample is negative, we know all the individuals are. But if a group tests positive then all the samples in the group have to be retested individually.
If the groups are too small then not much work is saved. If the groups are too large then there are too many positive group tests. To find the optimal group size, suppose there are a total of N individuals, the group size is k, and 1% of the population has the disease. The number of group tests that must be performed is N/k. The probability a group tests positive is k/100. If this happens then we need k more tests. Thus we want to minimize
(N/k)( 1 + k2/100) = N/k + Nk/100
Differentiating we want –N/k2 + N/100=0 or k = 10. In the concrete case N = 1000, the number of tests is 200.
Note: the probability a group test is positive is p = 1 – (1 – 1/100)k but this makes the optimization very messy. When k=10, 1 + kp = 1.956, so the answer does not change by very much.
Recent work reported on in Nature on July 10, 2020 shows that the number of tests needed can be reduced substantially if the individuals are divided into groups in two different ways for group testing before one has to begin testing individuals. To visualize the set-up consider a k by k matrix with one individual in each cell. We will group test the rows and group test the columns . An individual who tests negative in either test can be eliminated. The number of k by k squares is N/k2. For each square there are 2k tests that are always performed. Each of the k2 individuals in the square have their group test positive twice with probability (k/100)2. These events are NOT independent, but that does not matter in computing the expected number of tests
(N/ k2)(2k + k4/10,000) = 2N/k + N k2/10,000
Differentiating we want –2N/k2 + 2Nk/10,000 = 0 or k = (10,000)1/3 = 21.54. In the concrete case N=1000 the expected number of tests is 139.
Practical Considerations:
One could do fewer tests by eliminating the negative rows before testing the columns, but the algorithm used here allows all the tests to be done at once, avoiding the need to wait for the first round results to come back before the second round is done.
Larger group sizes will make it harder to detect the virus if only one individual in the group. The Nature article, Sigrum Smola of the Saarland University Medical Center in Homburg has been is quoted as saying he doesn’t recommend grouping more than 30 individuals in one test. Others claim that it is possible to identify the virus when there is one positive individual out of 100.
Ignoring the extra work in creating the group samples, the method described above reduces the cost of test by 86%. The price of $9 per test quoted in the article would be reduced to $1.26, so this could save a considerable amount of money for a university that has to test 6000 undergraduates several times in one semester.
In May, officials in Wuhan used a method of this type to test 2.3 million samples in two weeks.
References
Mutesa, L et al (2020) A strategy for finding people infected with SARS-CoV-2: optimizing pooled testing at low prevalence arXiv: 2004.14934
Malliapaty, Smriti (2020) The mathematical strategy that could transform coronavirus testing. Nature News July 10. https://www-nature-com/articles/d41586-020-02053-6