Math Vacation: Estimating Populations by the Capture-Recapture Method

Wednesday, February 2, 2022

Estimating Populations by the Capture-Recapture Method

The capture-recapture method is used to estimate population sizes when it is impractical to physically count each member of the population, for instance when estimating the population of a species in a large national park. The method is also known as the capture-mark-recapture method. I learned about this method in Kit Yates' book; see Math Vacation: Book Review: The Math of Life & Death by Kit Yates (jamesmacmath.blogspot.com). This method may not be practical for some species. For instance, consider taking a census of worms, where marking them may be difficult. A worm farm census was completed by a different sampling method, described in this prior post: worm census.

The method is simple. First, a researcher captures a live sample of the target species. Each captured animal is marked using a tag, a collar or other method, and is then released. At a future date, the researcher returns to the same area and again captures a sample of the target species. During this visit, the researcher counts how many animals in the second sample had also been marked in the first sample (repeated captures).

The ratio of the animals marked (m) in the first visit to the total population (N) should be approximately equal to the ratio of repeated captures (r) to the size of the second sample (n):

$$\frac{m}{N} \approx \frac{r}{n}$$

Therefore, N can be approximated by:

$$N \approx \frac{m \, n}{r}$$
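As a minimal sketch, the approximation N ≈ m·n/r can be written as a small Python function. The function name `estimate_population` and the numbers in the usage line are made up for illustration and are not from the experiment in this post.

```python
def estimate_population(m: int, n: int, r: int) -> float:
    """Estimate the population N from a capture-recapture survey.

    m: animals marked on the first visit
    n: animals captured on the second visit
    r: animals in the second sample that were already marked (repeats)
    """
    if r <= 0:
        # With no repeats the ratio r/n is zero and N cannot be estimated.
        raise ValueError("need at least one repeat (r > 0) to form an estimate")
    if r > min(m, n):
        raise ValueError("repeats cannot exceed either sample size")
    return m * n / r


# Illustrative numbers only: 50 marked, 40 caught later, 10 of them marked.
print(estimate_population(m=50, n=40, r=10))  # 200.0
```

The guard on r reflects the fact that the formula divides by the number of repeats, so a second sample with no marked animals gives no estimate at all.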

The method is subject to a number of assumptions. First, one needs to assume that the animals released after the first sample have sufficient time to mix with the full herd. Also, one has to assume the samples are representative of the herd. For example, if the captured animals just happen to be the animals that are most easily captured, then those specific animals are more likely to be recaptured, the number of repeats will be inflated, and the ratio above will not hold.
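The effect of the "easily captured animals" problem can be illustrated with a rough simulation. This is only a sketch under assumptions of my own choosing: a population of 300, each animal caught independently on each visit with its own capture probability, and the specific probabilities shown below. The helper name `median_estimate` is hypothetical.

```python
import random
import statistics


def median_estimate(capture_probs, trials=2000, seed=0):
    """Median of m*n/r over many simulated two-visit surveys.

    capture_probs[i] is the (assumed) chance that animal i is caught
    on any one visit; the same chance applies to both visits.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(trials):
        first = [rng.random() < p for p in capture_probs]   # marked on visit 1
        second = [rng.random() < p for p in capture_probs]  # caught on visit 2
        m = sum(first)
        n = sum(second)
        r = sum(a and b for a, b in zip(first, second))     # caught both times
        if r > 0:  # skip surveys with no repeats, where the estimate is undefined
            estimates.append(m * n / r)
    return statistics.median(estimates)


N = 300
equal = [0.175] * N                               # every animal equally catchable
uneven = [0.30] * (N // 2) + [0.05] * (N // 2)    # half easy, half hard to catch

print("equal catchability: ", round(median_estimate(equal)))
print("uneven catchability:", round(median_estimate(uneven)))
```

Both populations have the same average capture probability, but in the uneven case the easy-to-catch animals dominate both samples, so repeats are more common than the ratio assumes and the estimate falls well below the true 300.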

I tried a tabletop experiment using acorns. I picked up a large number of acorns fallen from a tree in my yard. They are pictured below.

I took 30 of the acorns and marked them by cutting off a small portion of the acorns (m=30) - unmarked and marked specimens shown below.

Unmarked acorns:

Marked acorns:

I returned the 30 marked acorns to the original group and scrambled all of them. Then, I scooped out a cup of the acorns to represent my second sample. I counted the total number in the sample (n=47) and the number within this sample that were marked, that is, acorns from the original 30. In this case there were 7 repeats (r=7). The estimated population of all acorns is therefore:

$$N \approx \frac{m \, n}{r} = \frac{30 \times 47}{7} \approx 201$$

I repeated this twice. After returning the prior sample and remixing, I had a scoop with n=42 and r=4 (repeats). This estimated N to be 315. After returning the sample and remixing again, the next sample had n=44 and r=3, estimating N to be 440.
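The arithmetic for the three scoops can be checked with a few lines of Python. This is just the formula m·n/r applied to the counts reported above; the variable names are my own.

```python
m = 30  # acorns marked before the first scoop

# (n, r) for each scoop: sample size and number of marked acorns found
scoops = [(47, 7), (42, 4), (44, 3)]

estimates = [m * n / r for n, r in scoops]

for (n, r), est in zip(scoops, estimates):
    print(f"n={n}, r={r}: N is about {est:.0f}")
```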

The actual count of the full group of acorns was 300. The middle estimate (315) was very close, while the other two estimates (201 and 440) were about 33% lower and 47% higher than the actual count. For samples (m or n) that are roughly 10% to 15% of the size of the population (N), one should expect errors of this magnitude.
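To get a feel for how much the estimate can swing at these sample sizes, one can simulate many scoops from a population like the acorn pile. The sketch below assumes N=300, m=30 and a fixed scoop size of n=44 (the real scoops varied slightly), and draws each scoop at random without replacement. The function name `simulate_repeats` is hypothetical.

```python
import random
from collections import Counter


def simulate_repeats(N=300, m=30, n=44, trials=10_000, seed=1):
    """Count how often each number of repeats r shows up in random scoops."""
    rng = random.Random(seed)
    population = [True] * m + [False] * (N - m)  # True marks a marked acorn
    counts = Counter()
    for _ in range(trials):
        scoop = rng.sample(population, n)        # scoop without replacement
        counts[sum(scoop)] += 1                  # number of marked acorns found
    return counts


N, m, n, trials = 300, 30, 44, 10_000
counts = simulate_repeats(N, m, n, trials)

for r in sorted(counts):
    estimate = "undefined" if r == 0 else f"{m * n / r:.0f}"
    print(f"r={r:2d}: {counts[r] / trials:6.1%} of scoops, estimated N = {estimate}")
```

Because r is a small whole number here, each possible value of r maps to a very different estimate, which is consistent with the spread seen in the three real scoops.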

In understanding possible errors, I like looking at extreme cases. Consider the case when, in the first sample, the researcher actually captures the full population (so m=N). When returning, no matter what new sample size (n) is captured, every animal in the second sample will be marked, so n=r.

In the equation above, if n=r then the estimate of N will always be m, which is the actual population N because, in this extreme case, the researcher marked the full population. Another extreme is when N is very large compared to the sample size m. In this case it is possible that the researcher finds no marked animals (r=0) in the second sample (n). This would leave N undefined, as the equation would involve division by zero. The researcher would recognize this problem and return to increase the second sample until at least one repeat was found. With r=1, the estimate of N would be very large. At this larger sample size, n, the researcher might just as easily have found 2 repeats. The estimate of N would then be half as large, so a difference of just 1 repeat in the sample leads to a 50% to 100% change in the estimate of N.
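These extreme cases are easy to check numerically. The sketch below uses the same formula, N ≈ m·n/r, with illustrative values of my own (m=30 and n=44 for the sparse case, m=300 for the full-population case).

```python
def estimate(m, n, r):
    """Capture-recapture estimate N = m * n / r."""
    return m * n / r


# Extreme 1: the whole population was marked (m = N = 300), so r = n.
print(estimate(m=300, n=44, r=44))   # 300.0, the estimate equals m

# Extreme 2: no marked animals in the second sample.
try:
    estimate(m=30, n=44, r=0)
except ZeroDivisionError:
    print("r = 0: estimate is undefined")

# Extreme 3: very few repeats, where one extra repeat changes everything.
one_repeat = estimate(m=30, n=44, r=1)
two_repeats = estimate(m=30, n=44, r=2)
print(one_repeat, two_repeats)       # 1320.0 660.0
print(f"1 -> 2 repeats: {(two_repeats - one_repeat) / one_repeat:+.0%}")
print(f"2 -> 1 repeats: {(one_repeat - two_repeats) / two_repeats:+.0%}")
```

Going from one repeat to two halves the estimate (a 50% drop), while going from two repeats to one doubles it (a 100% rise), which is where the 50% to 100% range comes from.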

Some resources on the capture-recapture method are given below:

Mark and recapture - Wikipedia

A Review of Capture-recapture Methods and Its Possibilities in Ophthalmology and Vision Sciences (tandfonline.com)

Spatial Capture-Recapture Models to Estimate Abundance and Density of Animal Populations | U.S. Geological Survey (usgs.gov)

https://www.newscientist.com/article/mg26635490-700-the-maths-hack-that-can-help-you-count-things/