Of Mice and Men and Round Numbers

The Garden of Earthly Delights (central panel, detail) Hieronymus Bosch, c. 1490–1510

What does it mean for a chemical to be safe?

The chemical industry as we know it is largely a child of the Second World War. The capacities built for the war effort (synthetic organic chemistry at industrial scale, nerve-agent research that shared its origins with pesticide research, polymer and solvent production for materiel) went looking for civilian applications. In the decades since, well over a hundred thousand new substances have been introduced into commerce.

These chemicals are omnipresent in our lives. They go onto fields and into our food, into furniture, packaging, plumbing and clothing, and eventually into rivers, groundwater, blood, and breastmilk. When the CDC samples the blood and urine of a representative slice of the population, it routinely detects hundreds of synthetic chemicals; for many of them, nearly everyone tested carries a measurable amount.

One might assume that some serious effort has gone into checking whether these chemicals threaten human health. In reality, the National Research Council found decades ago that of the tens of thousands of chemicals in commerce, only a handful had been seriously tested; for the great majority, the data essential to a health-hazard assessment were simply lacking. Since then, the situation has not dramatically changed.

Even those existing evaluations are mostly indirect. Deliberately dosing human beings with substances we suspect might harm them is prohibited by the Nuremberg Code of 1947. What we call "human evidence" in toxicology therefore comes mostly from accidents, occupational exposures, and natural experiments — meaning populations harmed first and studied after.

Given this, how do we decide whether a chemical with a new use case is safe? What data do regulators and risk assessors use in their deliberations?

The vast majority of what is officially known about a chemical's toxicity comes from animal, usually rodent, experiments. Depending on the study, exposure may be acute (a single dose or short window), subchronic, or chronic (spread out in time). The endpoints examined range from acute lethality to reproductive, developmental, and carcinogenic effects. At the end of an experiment, the animals' organs are harvested and studied for tumors and other anomalies. By comparison with a control group, the No Observed Adverse Effect Level (NOAEL) is derived, which is (at least in theory) the dose that does not cause any harm, according to a given experiment and a given measured endpoint.

But how can an experiment on a handful of rodents license a decision about exposing billions of people? How does one extrapolate across the species barrier, and from a small sample to a big fraction of the entire human population? 

You might think that some sophisticated apparatus bridges the gap...

The answer is almost insultingly simple. We divide by a hundred. This essay is about that one hundred.

Pasteur Institute, rats for experimentation (1913)

A three-page editorial: 10x10=100

The factor of 100 can be traced to a three-page editorial from 1954 by Arnold Lehman and O. Garth Fitzhugh, who were both in the U.S. FDA's Division of Pharmacology. They published "100-Fold Margin of Safety" in the Quarterly Bulletin of the Association of Food and Drug Officials. It has no references; it reads like a memo.

The piece proposes that a chemical's chronic no-effect dose in animals, divided by 100, is a reasonable approximation of a no-effect dose in humans.

The first 10 comes from interspecies variability. The editorial offers two empirical comparisons:

"Man can ingest 1 part per million of fluorine in his daily diet without harmful effects, whereas the rat can tolerate about 10 parts per million. In experiments on subacute toxicity man begins to show signs of intolerance to arsenic at about 30 parts per million in the diet, whereas the dog can tolerate 127 parts per million. In other words, man is about 10 times as sensitive to poisons as the rat, and somewhat more than 4 times as sensitive as the dog."

The second 10 comes from intraspecies variability: 

"It has been estimated that a sick individual may be as much as 10 times more susceptible to toxic substances than an individual in good health."

Combined, they give a factor of 100.
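As a rough illustration of the arithmetic, the sketch below recomputes the two species ratios from the figures quoted in the editorial and applies the combined factor to an animal no-effect dose. The function name and the 50 mg/kg/day NOAEL are hypothetical, chosen only to show the division; neither comes from the 1954 piece.

```python
def safe_dose_from_noael(noael_mg_per_kg_day, interspecies=10.0, intraspecies=10.0):
    """Divide an animal NOAEL by the combined (interspecies x intraspecies) factor."""
    return noael_mg_per_kg_day / (interspecies * intraspecies)

# Ratios implied by the two comparisons quoted in the editorial.
rat_to_human_fluorine = 10 / 1    # rat tolerates 10 ppm, man 1 ppm
dog_to_human_arsenic = 127 / 30   # dog tolerates 127 ppm, man about 30 ppm

# Hypothetical NOAEL, not taken from any real study.
example_noael = 50.0  # mg/kg/day

print(f"fluorine ratio (rat/human): {rat_to_human_fluorine:.1f}")
print(f"arsenic ratio (dog/human):  {dog_to_human_arsenic:.2f}")
print(f"'safe' human dose: {safe_dose_from_noael(example_noael)} mg/kg/day")
```

Note that only one of the two quoted ratios is actually close to ten; the arsenic comparison gives a little over four.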

This number has since spread across regulatory environments and frameworks. It is also present in Monsanto's assessments of glyphosate safety that we have written about previously.

Because there are no references in this memo, one has to dig to figure out where these suspiciously round numbers came from.

The first ten: animal-to-human factor 

The 1954 paper's empirical justification rests on two chemicals – arsenic and fluorine. The choices have not aged well. Both have since become exhibits in the case against the framework they were supposed to underwrite.

Arsenic

The 127 ppm dog figure could have originated in a paper from their own division. In 1938, Calvery, Laug & Morris published "The chronic effects on dogs of feeding diets containing lead acetate, lead arsenate, and arsenic trioxide in varying concentrations," in the Journal of Pharmacology and Experimental Therapeutics. Herbert Calvery had been the head of FDA's Division of Pharmacology; Lehman succeeded him in 1946. The 1938 paper was an in-house dog exposure study from the same lab that wrote the 1954 memo. Its purpose, originally, was to evaluate the safety of lead arsenate pesticide residues on apples.

The 1938 paper explicitly distinguishes acute from chronic toxicity:

"It is clearly recognized that the question is not one of acute toxicity, but one concerning chronic toxicity. The fact that both forms of toxicity can be caused by both arsenic and lead was recognized even in ancient times."

This distinction will turn out to be important.

The 30 ppm human figure appears to come from a pharmacology manual. Torald Sollmann's A Manual of Pharmacology, the standard thousand-page reference text of the era, has a chapter on inorganic arsenic. It describes a dosing strategy: begin at 5 mg of arsenic trioxide three times daily and increase the dose "until some local manifestation of the arsenic appears – usually diarrhea, colicky pains or conjunctivitis or swelling of the eyelids." These were therapeutic dosages, and the treatment was meant to produce those side effects, at which point the convention was to retreat, pause, and resume. The number was never tied to lifelong chronic exposure of a healthy population, but rather to the treatment of a range of conditions – asthma, chorea, psoriasis, syphilis, the leukemias. Pushed to that intolerance point, the daily dose reaches the low tens of milligrams – and at roughly 40 mg of arsenic trioxide a day, spread over three pounds of food, it works out to about 29 ppm, broadly matching the figure in the 1954 memo.
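The conversion can be checked in a few lines. This is a sketch under the assumptions stated in the text (about 40 mg of arsenic trioxide a day, spread over three pounds of food); the function name is ours, and ppm is taken to mean milligrams of compound per kilogram of food.

```python
GRAMS_PER_POUND = 453.59237  # exact definition of the avoirdupois pound

def dietary_ppm(dose_mg_per_day, food_lb_per_day):
    """Concentration in the diet, in mg per kg of food (ppm by mass)."""
    food_kg = food_lb_per_day * GRAMS_PER_POUND / 1000.0
    return dose_mg_per_day / food_kg

# Assumed inputs from the text: ~40 mg arsenic trioxide/day, 3 lb food/day.
ppm = dietary_ppm(40.0, 3.0)
print(f"{ppm:.1f} ppm")  # about 29 ppm, expressed as arsenic trioxide
```

The result is sensitive to both assumptions: a lighter diet or a higher daily dose moves the figure well away from 30.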

What have we learned since then about arsenic exposure?

In 1958, the city of Antofagasta in northern Chile switched its municipal water supply to a source containing about 0.87 ppm of arsenic (roughly eighty-seven times today's WHO drinking-water guideline of 0.01 mg/L). Approximately 130,000 people drank from that supply for twelve years, until a treatment plant was installed in 1970. Cancer mortality in Antofagasta peaked roughly two to four decades after the arsenic exposure ended. Children exposed in utero or in early childhood developed cancers in their thirties, forties, and fifties at rates several times higher than otherwise comparable populations. Cardiovascular disease, peripheral vascular disease, and reduced birth weight followed the same exposure curve, on the same multi-decade timescale. The doses that produced delayed lethal cancers were doses that would have shown no "signs of intolerance" at the time. The Lehman–Fitzhugh framework, applied to Antofagasta's water before 1970, would have called it safe.

In Bangladesh, between 35 and 77 million people have been drinking groundwater contaminated by naturally occurring arsenic since the 1990s, with skin lesions, cardiovascular disease, neuropathy, diabetes, and a suite of cancers appearing on the same multi-decade lag. In Taiwan's southwestern coastal villages, artesian wells produced "blackfoot disease" – peripheral vascular gangrene – from the 1950s on. In Japan, the 1955 Morinaga dry-milk incident exposed more than twelve thousand infants to arsenic-contaminated milk powder; long-term follow-up over six decades shows elevated cancer mortality, neuropsychological deficits, and shorter stature. In Manchester and Salford in 1900, arsenic-contaminated sulfuric acid used to make brewing sugar poisoned roughly six thousand drinkers, with chronic peripheral neuritis as the dominant lasting effect.

Arsenic is an acute poison, but it is also a chronic one, and its long-term toxic effects bear almost no relation to the "signs of intolerance" the 1954 paper cited.

Even if the absolute numbers we’ve been relying on for safe exposure are wrong, the ratio between species might still be accurate. Or close enough. Here, too, arsenic is instructive – it shows how mechanistically different the reactions of different species can be.

In the 1938 study, it was noted that "the storage or retention of arsenic by the dogs... was much less than that of the rats." The reason has since become clear. The liver methylates arsenic – attaches methyl groups to convert it into mono- and dimethylated forms – and these are excreted in the urine over a few days. This works much the same in humans, mice, and dogs. Rats, however, have a kind of red-cell trap. As a result, the arsenic is not cleared in the urine within days but accumulates, bound inside the blood cells, for the lifespan of the cell. So rats are more sensitive than dogs or humans.

Two lessons can be extracted from arsenic.

First, species can react in very different and unexpected ways, and a single interspecies ratio cannot capture such mechanistic differences.

Second, chronic exposure is different from acute exposure. But we already knew that from "ancient times".

Fluoride

Our second example is fluoride, much in the news of late. 

Lehman and Fitzhugh claim that humans tolerate 1 ppm in the diet and the rat tolerates 10, a factor of 10.

From "Classification of Mottled Enamel Diagnosis" by H. Trendley Dean (1962)

The 1 ppm human figure comes, almost certainly, from H. Trendley Dean's epidemiology in the late 1930s and early 1940s. Dean identified 1 ppm not as a toxic threshold but as an optimum – the water concentration at which caries protection was nearly maximal while dental fluorosis (i.e. adverse effects from excess fluoride exposure) stayed, in his phrase, “of no public health significance”. It was a public-health compromise between benefit and harm, the point where the two curves crossed.

Figure showing the onset of mottled enamel condition at around 1 ppm. From "Chronic Endemic Dental Fluorosis (Mottled Enamel)" by H. Trendley Dean (1962)

The 10 ppm rat figure probably traces to Floyd DeEds and John O. Thomas's 1934 paper "Comparative chronic toxicities of fluorine compounds," in the Proceedings of the Society for Experimental Biology and Medicine. DeEds and Thomas reported that fluoride produced chronic intoxication in rats – bleached incisors, the early sign of dental fluorosis – at "as low as 14 to 16 parts per million" of dietary fluorine.

What is noticeable here is that the human figure is not the dose at which toxicity begins, but the optimum Dean deduced from his cost-benefit estimate. And the toxicity dose for rats is just about one order of magnitude away. This is a notably narrow window between benefit and harm. It has been recognized by many international organizations, such as WHO and the European Commission's Scientific Committee on Health and Environmental Risks (EU SCHER), which use "narrow margin" language. Today, the European Food Safety Authority’s (EFSA) caries-preventive dose is ~0.05–0.07 mg/kg/day, which is close to the fluorosis-causing dose of >0.1 mg/kg/day: a margin of no more than about a factor of 2.
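Taking the quoted EFSA figures at face value, the margin can be computed directly and set against the default factor of 100. This is a minimal sketch: the constant and function names are ours, and treating 0.1 mg/kg/day as the threshold itself (the text gives "greater than 0.1") is a simplifying assumption.

```python
# Figures as quoted in the text, in mg of fluoride per kg body weight per day.
CARIES_PREVENTIVE_RANGE = (0.05, 0.07)
FLUOROSIS_THRESHOLD = 0.1   # text says ">0.1"; using 0.1 is our simplification
DEFAULT_SAFETY_FACTOR = 100

def margin(harm_dose, benefit_dose):
    """Ratio between the dose linked to harm and the dose taken for benefit."""
    return harm_dose / benefit_dose

widest = margin(FLUOROSIS_THRESHOLD, CARIES_PREVENTIVE_RANGE[0])
narrowest = margin(FLUOROSIS_THRESHOLD, CARIES_PREVENTIVE_RANGE[1])

print(f"margin between {narrowest:.2f}x and {widest:.2f}x")
print(f"default factor is {DEFAULT_SAFETY_FACTOR / widest:.0f} times the widest margin")
```

Under these assumptions the margin sits between roughly 1.4 and 2, some fifty times narrower than the hundredfold cushion the framework presumes.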

The United States has fluoridated public water near Dean's 1 ppm for eighty years, on the premise that mild mottling at that level was cosmetic, not a health matter. Over that time, dental fluorosis in twelve-to-fifteen-year-olds climbed from 23 percent in the late 1980s to 65 percent by 2011–12, with the moderate-to-severe category rising from about 1 percent to 30 percent. The endpoint Dean used to define "no public health significance" is now significant for the majority of American teenagers. And that is only the mild form: endemic skeletal fluorosis, the crippling kind (bone deformity, joint stiffness, peripheral neuropathy), affects tens of millions in India, China, and the East African Rift, appearing in field studies at drinking-water levels starting around 1–3 ppm.

Dental health is only one of many measurable endpoints. Fluoride's effects are not confined to tooth enamel. Prospective birth cohorts in Mexico and Canada link prenatal fluoride to lower childhood IQ at exposures near the 0.7 mg/L fluoridation target; the U.S. National Toxicology Program's 2024 monograph concluded with "moderate confidence" that fluoride exposures above the WHO's 1.5 mg/L guideline are "consistently associated with lower IQ in children"; and a 2025 meta-analysis of 74 studies found an inverse association robust to the highest-quality subset. Nor has the question been quietly settled: a federal court found in 2024 that fluoridation at 0.7 mg/L "poses an unreasonable risk of reduced IQ in children" and ordered EPA rulemaking, a ruling a Ninth Circuit panel vacated in 2026 on procedural grounds without touching the science.

What about this neurological endpoint in rodents?

Here the rat is uninformative. The National Toxicology Program's study found no effect on learning or memory in rats at 10–20 ppm in drinking water; the same paper's literature review found deficits only at ≥100 ppm, and those were likely confounded by general toxicity. Taken at face value, that gap reads as an interspecies uncertainty of a factor of a hundred. But the comparison is obviously incoherent: a population-level downward shift in human IQ is not the same kind of thing as neurotoxicity in a rat dosed to near systemic poisoning.

On one endpoint the interspecies difference can be around 10, while on another endpoint, arguably more important, that gap may be both wider and harder to define and measure.

The second ten: human-variability factor

In the 1954 paper, the human-variability factor appears in a single sentence. A sick individual, the authors write, "may be as much as 10 times more susceptible" to toxic substances than a healthy one. Again, no source.

Variability between people has been acknowledged for a long time. Even in their fluoride work in 1943, Dean & Arnold wrote that the “same amount of fluorine that causes a mild toxic reaction in one individual may cause a severe reaction in another”. That variability can be widened further by cooking patterns and other behaviors, yet another factor the “safety factor” does not account for.

Different values have been suggested and used for this safety (or, as it is now more frequently called, “uncertainty”) factor.

In some particular cases, very specific values, such as 42, have been suggested (no relation to The Hitchhiker's Guide to the Galaxy). Defining a factor means assuming some underlying smooth distribution of responses to all toxins across the population. In this particular case, 42 is taken to account for 99% of the variation. But what about the 1% at the tails of the distribution? Applied to billions of people, that is still a sizable number of individuals.
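To put a rough number on that tail, the sketch below counts how many people fall outside a factor that covers a given share of the population. The world population of about 8 billion and the premise that everyone is exposed are our assumptions, used only to give an upper bound; the function name is hypothetical.

```python
def people_outside_coverage(population, coverage):
    """Number of people not covered by a factor that accounts for `coverage`."""
    return population * (1.0 - coverage)

WORLD_POPULATION = 8_000_000_000  # rough assumption, for scale only

for coverage in (0.95, 0.99, 0.999):
    uncovered = people_outside_coverage(WORLD_POPULATION, coverage)
    print(f"coverage {coverage:.1%}: about {uncovered / 1e6:.0f} million people outside")
```

At 99% coverage, the uncovered remainder is on the order of 80 million people, more than the population of most countries.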

As is increasingly well established, human variability is not a smooth distribution that can be described by a mean and dispersion. At least four distinct phenomena are involved (see below for specific examples), none of which reduces cleanly to a multiplier:

  • Genetics. Even among healthy people, an ordinary dose turns into a harmful one for someone carrying a particular allele.
  • Biological sex. Sex influences the genetic architecture of nearly every human trait and disease. Clinical dosing calibrated largely on male physiology produced dosing errors and adverse events that, in some cases, prompted market withdrawals only decades later.
  • Life stage. An infant, a pregnant adult, and an eighty-year-old do not share one metabolism or one susceptibility.
  • Accumulated burden. Human bodies carry mixtures of chemical and non-chemical stressors, all of which affect an individual's response to an exposure.

And these factors interact non-linearly, compounding one another in ways no single multiplier can capture.

Why human variability resists a single factor

Below are a few examples that show why one-size-fits-all exposure calculations built on fixed, rounded factors cannot capture the diversity of the human population.

Genetics. Even among healthy people, an inherited difference can turn a dietary or a therapeutic exposure into a source of harm.

  • G6PD deficiency — affects hundreds of millions, concentrated in Mediterranean, African, Middle Eastern, and South Asian populations. Eating fava beans, harmless for others, can trigger acute hemolytic anemia. Severity tracks the specific deficiency allele.
  • Hereditary fructose intolerance — repeated exposure to fructose, readily metabolized by the rest of the population, causes progressive liver and kidney damage.
  • Hereditary hemochromatosis — people who inherit two copies of the variant accumulate iron from an ordinary diet until, untreated, it damages liver, heart, and pancreas over decades; the same diet leaves others unaffected.
  • ALDH2 variant — common in East Asian populations; slows clearance of acetaldehyde, alcohol's toxic and carcinogenic intermediate, substantially raising esophageal cancer risk under chronic exposure that those with normal ALDH2 activity tolerate.
  • Warfarin sensitivity — variants in VKORC1 and CYP2C9, whose frequencies differ sharply across ancestral populations, can make a standard anticoagulant dose ineffective for one person and a bleeding hazard for another.

Biological sex. Biomedicine has long treated women as smaller men. However, we increasingly learn that sex differences in physiology, metabolism, and the genetic architecture of complex traits are pervasive, and none reduces to a difference in body size. From 1977 to 1993 the FDA excluded women of childbearing potential from early-phase trials; the resulting body of approved drugs and dosing was calibrated largely to male physiology, and the miscalibrations surfaced only later.

  • Zolpidem — women clear the sleep drug more slowly. In 2013, more than two decades after approval, the FDA halved their recommended dose because morning-after blood levels were high enough in women to impair driving.
  • Sex-skewed withdrawals — of ten drugs withdrawn from the US market between 1997 and 2000, eight posed greater risks to women, several via a cardiac arrhythmia (torsades de pointes) to which female physiology is more susceptible.
  • Alcohol — adjusted for body weight, it produces higher blood concentrations in women and drives liver disease at lower cumulative exposures.

Life stage. The same external dose meets a physiologically different organism at each point in a lifespan — different absorption, different clearance, different vulnerability.

  • Thalidomide — marketed as a safe sedative and given for morning sickness; during a narrow window of early gestation it caused severe limb and organ malformations, while producing no comparable malformations in the adult taking it. Susceptibility was defined almost entirely by developmental timing.
  • Lead — children absorb several times the fraction of ingested lead that adults do, and the developing nervous system is far more vulnerable; blood-lead levels that are subclinical in adults track measurable IQ loss in children, and health agencies now recognize no safe threshold in childhood.
  • Chloramphenicol ("gray baby syndrome") — newborns lack the mature liver enzymes (glucuronidation) needed to clear it, so a dose tolerated by an older child or adult accumulates to cardiovascular collapse.
  • The aging body — declining kidney and liver clearance, a higher fat-to-water ratio that prolongs the action of fat-soluble drugs, and polypharmacy combine so that standard adult doses can become toxic. The Beers Criteria exist precisely to flag medications whose risks outweigh their benefits in older adults — drugs to avoid, use cautiously, or dose-adjust.

Accumulated burden. Risk assessment evaluates one compound at a time; real bodies carry mixtures of chemicals and non-chemical stressors at once, and these interact.

  • Mixtures below individual no-effect levels — combinations of chemicals, each held below its own no-effect level, can together produce a clear combined effect, shown in vitro for weak estrogenic compounds and in vivo for estrogenic and thyroid-disrupting mixtures in rodents. The single-compound no-effect level — the very quantity the safety factor is applied to — does not by itself predict the mixture.
  • Potentiation — two compounds can be far more toxic together than the sum of their separate toxicities when one disables the body's means of detoxifying the other. The classic case is the insecticide malathion, sharply potentiated by a second organophosphate (e.g. EPN) that inhibits the carboxylesterases that would otherwise break malathion down. The original study is co-authored by the same Fitzhugh in 1957.
  • Nutritional status — a non-chemical condition that changes chemical dose: iron- and calcium-deficient children absorb more ingested lead than iron-replete children, so the same environmental lead produces a higher internal dose in the malnourished child.
  • The regulatory gap — regulators have partly acknowledged this. The US Food Quality Protection Act (1996) requires EPA to assess the cumulative effect of pesticides sharing a "common mechanism of toxicity." But the requirement stops at the edge of a known shared mechanism, while biomonitoring detects hundreds of unrelated chemicals in the same person simultaneously — the real cumulative exposure is never assessed as a whole.
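The mixture point can be made concrete with a small sketch. It assumes dose addition, one common modelling convention for chemicals that act in a similar way, in which each dose is expressed as a fraction of that chemical's own no-effect level and the fractions are summed. The five chemicals and their no-effect levels are entirely hypothetical, and nothing here describes how any particular regulator performs the calculation.

```python
def hazard_index(doses, no_effect_levels):
    """Sum of dose / no-effect-level ratios, under a dose-addition assumption."""
    return sum(d / n for d, n in zip(doses, no_effect_levels))

# Hypothetical mixture: five chemicals, each at half of its own no-effect level.
no_effect_levels = [1.0, 4.0, 10.0, 0.2, 50.0]   # arbitrary units, made up
doses = [0.5 * n for n in no_effect_levels]

each_below_own_level = all(d < n for d, n in zip(doses, no_effect_levels))
index = hazard_index(doses, no_effect_levels)

print(f"every chemical below its own no-effect level: {each_below_own_level}")
print(f"combined index: {index:.1f} (above 1 means the mixture exceeds a no-effect dose)")
```

Each compound passes its individual test, yet under this assumption the combination amounts to two and a half times a no-effect dose. Potentiation, where one compound blocks the detoxification of another, would push the combined effect higher than simple addition predicts.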

Despite our new mechanistic understanding of biological processes across chemicals and across human conditions, the “historical” factor of 100 is still used.

In assessments of glyphosate, for example, this factor of 100 – there called the uncertainty factor (UF) – is everywhere, often with the same 10x10 justification.

And the list goes on.

In their own words

We doubt that Lehman and Fitzhugh acted with ill intent. For what was known at the time, a factor of one hundred could have been a reasonable first approximation. The authors were candid in conceding that there were "no scientific or mathematical means by which we can arrive at an absolute value." Their reasoning was aimed at striking a balance: the factor is, in their words, "high enough to reduce the hazard of food additives to a minimum and at the same time low enough to allow the use of some chemicals which are necessary in food production or processing."

That last clause reflects the environment these scientists were reacting to – one in which chemicals had already been introduced at scale, and in which the assumption that they were a necessity, prior to rigorous scientific assessment of their risks, had been widely internalized. The authors’ role, therefore, was not to interrogate that premise but to keep pace with it: to evaluate, as best they could, a food supply that was already being modified with chemicals faster than those could be tested.

Twenty-six years later, when the FDA's Oral History Project brought the retired Division of Pharmacology staff back together, the same men were more forthcoming than the 1954 note had been. The publicly available transcript of the meeting shows two things. First, it shows how these scientists lived through the suppression of the arsenic research discussed above. Edwin Laug, co-author of the 1938 study quoted above, described it as "pressure from the growers": after a shipment of apples was seized for excessive chemical residues, the orchardist "complained to his Congressman," who then wrote into the agency's appropriation that none of its funds could be used "for the study of toxicity of lead and arsenic." The laboratory work, Laug recalled, stopped, and the long-term lead studies and two-year rat experiments were "all terminated" partway through their course. According to a footnote in the paper, “as a result of this Congressional action, all of our animals were killed and all laboratory experiments ended by June 30, 1937”. The results survived at all only because, as he put it, "we did salvage enough of the data to be able to publish it later." The arsenic evidence, whose weaknesses we noted earlier, had on top of that been actively suppressed.

As for the factor of one hundred, Fitzhugh recalled that it "was very much controversial as far as the industry was concerned." Asked by the interviewer how that opposition had shown itself – whether manufacturers came in "pounding on your desks" or attacked the agency in the trade press – Fitzhugh explained that requiring "such a large factor of safety" in food had "many times seemed to be unreasonable" to industry, which "raised questions about it, of course, at meetings of all kinds."

As for how the number itself had been reached, Bert Vos remarked that one hundred "seemed so extravagant that you couldn't be wrong by worse than that." Geoffrey Woodard was blunter still: "I think we maybe got the figure first, then justified it secondly, didn't we?" Fitzhugh, the surviving co-author, agreed that this was "part of it."

And on what the figure had become in the decades since, Woodard said it was by now "currently gospel in toxicology... just like reading straight out of the scriptures."

FDA's Oral History Project (1980)