What the Carbon Markets Couldn't Hold
I.
For nearly two decades, voluntary carbon markets were treated as the working answer. Funds built portfolios around them. Foundations channeled grants through them. Corporations made net-zero commitments backed by them. Governments referenced them in policy as evidence that the market mechanism could handle what regulation had not. People who came up in environmental finance during this period learned to read the apparatus (additionality assessments, permanence projections, leakage calculations, registry balances) the way an earlier generation had learned to read balance sheets, and with roughly the same confidence that the numbers meant what they said.
Then, over a relatively short period, the apparatus came apart from the outcomes it was supposed to track. A joint investigation by The Guardian, Die Zeit, and SourceMaterial, published in January 2023, found that more than ninety percent of the Verra rainforest carbon offsets it analyzed had produced no meaningful emissions reductions. Peer-reviewed studies found that carbon sequestration had been systematically overestimated, with protocol-reported figures in some cases multiples of what independent measurement supported. Field measurements found that permanence claims were being undermined by fires, by changes in land use, and by political shifts the original analyses hadn’t priced. The people who had built careers around this apparatus, who had spent years reading the numbers and had structured funds around the commitments those numbers implied, found themselves holding tools that no longer worked. They could see the mismatch. They couldn’t yet say what to do about it.
This is not primarily an essay about what went wrong with carbon markets. The failures are documented, the investigations published, the papers peer-reviewed. What interests me more is the experience those failures left behind: watching a trusted apparatus lose its connection to outcomes, and not yet having language for why. That is what this essay traces. A framework that serious people built in good faith, applied with care and rigor, stops producing the outcomes it was designed to produce. Not because the people were careless, not because the intent was wrong, but because assumptions baked into the design turned out not to hold under the conditions the framework was eventually asked to work in. The carbon markets are the case. The pattern is what travels.
II.
The original design intent was serious, and the problem being solved was real. The theoretical case for pricing carbon emissions was well-grounded: if atmospheric carbon imposes costs that markets don’t price, a mechanism that prices them is a rational response. The cap-and-trade approach had produced measurable results when applied to sulfur dioxide under the 1990 Clean Air Act Amendments, reducing acid rain more cost-effectively than prior regulatory approaches had achieved. The Kyoto Protocol’s Clean Development Mechanism, established in 1997, applied the same logic to greenhouse gases, allowing developed countries to meet reduction commitments partly by financing mitigation projects in the developing world. The voluntary carbon market grew alongside compliance markets (the EU Emissions Trading System, California’s cap-and-trade program) as corporations and institutions sought to address emissions that existing regulatory frameworks hadn’t yet reached.
By 2021 the voluntary carbon market was a roughly two-billion-dollar global system. Verra’s Verified Carbon Standard was the dominant registry. Projects ranged from avoided deforestation in the Amazon to cookstove programs in sub-Saharan Africa to industrial gas capture in South Asia. Each project issued credits representing a metric ton of carbon dioxide equivalent avoided or removed; buyers purchased those credits against their own emissions, on the shared understanding that the credit represented a real-world reduction that would not have occurred otherwise. Reasonable people, looking at the scale of the system and the rigor of the certification apparatus, concluded this was a workable mechanism and invested accordingly. That conclusion was not unreasonable given the information available at the time.
This essay is primarily concerned with the voluntary market: the self-certified, largely unregulated system that grew alongside the compliance markets. The EU ETS and California’s compliance program operate under different structural conditions, including mandatory participation, regulatory enforcement, and iterative government reform. They have different track records and have been substantially revised in response to early design failures. The voluntary market, whose structural properties and failure modes are the subject of what follows, is a distinct mechanism with distinct analytical problems.
III.
The failures, when they came, were not random. The more carefully one looks at the mechanism, the more predictable each failure appears from the structure itself. Not execution failures, not bad actors making unusually bad choices, but structural failures built into the design whether or not anyone saw them coming.
There are four of them, and they deserve to be named carefully.
Additionality. A carbon credit is supposed to represent an emissions reduction that would not have occurred in the absence of the market, such as a forest protected that would otherwise have been cleared, or a stove installed that would not otherwise have been affordable. The technical term for this condition is additionality: the credit is only real if the reduction was additional to what would have happened anyway.
The structural problem is that additionality is a counterfactual. You cannot directly observe what would have happened in the world without the project; you can only model it. Those models, developed by verification consultants retained and paid by project developers and submitted to registries for approval, required assumptions about baseline deforestation rates, alternative land use, and whether the financial incentive was actually necessary to change behavior. The relationship between the party being audited and the party conducting the audit created predictable selection pressure on the model’s conclusions. The January 2023 investigation by The Guardian, Die Zeit, and SourceMaterial analyzed Verra’s REDD+ tropical forest protection credits, the most scrutinized and most contested category within the voluntary market, and found that more than ninety percent of those credits produced no meaningful emissions reductions. Credits issued for other project types, including certain industrial gas and methane capture projects, have different verification architectures and different track records. The structural problem in the additionality mechanism (the auditor-paid-by-audited relationship generating predictable overestimation) shows up across categories, but its severity varies with how directly counterfactual the baseline modeling must be. Forest protection is the category where the counterfactual is hardest to establish and where the gap between claimed and actual additionality has been most thoroughly documented.
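The arithmetic behind that overestimation is simple enough to sketch. What follows is an illustration only: the function name and every figure in it are hypothetical, chosen to show how the credited quantity moves with the modeled baseline, and none of it is drawn from a Verra methodology or an actual project.

```python
def credited_tonnes(baseline_loss_ha, observed_loss_ha, tco2e_per_ha):
    """Credits = (modeled baseline forest loss - observed loss) * carbon density.

    The baseline is the counterfactual: hectares the model says would have
    been cleared without the project. It is never observed directly.
    """
    avoided_ha = max(baseline_loss_ha - observed_loss_ha, 0.0)
    return avoided_ha * tco2e_per_ha


# Hypothetical project: 100 ha actually cleared, 400 tCO2e stored per hectare.
observed_loss = 100.0
carbon_density = 400.0

# Baseline as modeled for the project versus a lower baseline, such as one
# drawn from comparable unprotected land (both numbers are made up).
claimed = credited_tonnes(500.0, observed_loss, carbon_density)
reassessed = credited_tonnes(200.0, observed_loss, carbon_density)

# Share of issued credits that the lower baseline does not support.
overstated_share = 1 - reassessed / claimed
```

Nothing about the forest changes between the two calls. Only the assumed counterfactual does, and in this made-up case three quarters of the credits depend on it.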
Permanence. A credit representing a ton of carbon sequestered in a forest assumes that the carbon stays there for the period over which the credit is claimed, typically decades. The structural problem is that forest carbon is not permanent. Fire, disease, land-use change, and political instability can release stored carbon quickly. The California compliance market, which issued credits against forest carbon stocks, discovered this directly in 2021, when wildfires burned through forest areas covered by the market’s protocol. The California Air Resources Board maintained a buffer pool as a reserve against precisely this contingency, but the model behind it underestimated how quickly climate-driven fire conditions would outpace the permanence assumptions built into the protocol. The buffer pool was depleted to levels that called the market’s integrity into question. The risk had been anticipated; the reserve had been calibrated to a prior rate of change.
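A toy model shows why a reserve sized for one loss rate fails under another. This is a sketch under loudly stated assumptions: the function, the contribution rate, and both loss rates are hypothetical, and the model is far simpler than the California protocol's actual buffer accounting (it does not, for instance, retire credits once they are reversed).

```python
def buffer_balance(issued_per_year, contribution_rate, loss_rate, years):
    """Year-end buffer pool balances under a constant annual reversal rate.

    Each year a fixed share of new issuance is set aside in the buffer, and
    a fixed share of all credits issued so far is reversed (for example by
    fire) and must be covered from the buffer.
    """
    outstanding = 0.0
    balance = 0.0
    history = []
    for _ in range(years):
        outstanding += issued_per_year
        balance += issued_per_year * contribution_rate
        balance -= outstanding * loss_rate
        history.append(balance)
    return history


# Hypothetical: 100 credits issued per year, 10 percent set aside.
as_designed = buffer_balance(100.0, 0.10, 0.005, 10)  # loss rate the design assumed
as_observed = buffer_balance(100.0, 0.10, 0.02, 10)   # loss rate four times higher
```

Under the assumed rate the reserve keeps growing through the ten years modeled. Under the higher rate it peaks early, is exhausted by year nine, and goes negative in year ten, even though the contribution rule never changed.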
Leakage. Protecting one area of forest from logging should not simply shift the logging to an adjacent area. The technical term for this displacement is leakage, and the accounting systems were supposed to measure and adjust for it. The structural problem is that leakage is genuinely difficult to measure across boundaries and across time. The convention that developed in most verification protocols was to measure leakage within the project boundary and in a nearby reference region, then apply a deduction factor. Research published in the Proceedings of the National Academy of Sciences by West and colleagues in 2020, examining REDD+ projects in the Brazilian Amazon, found that additionality had been substantially overestimated, with leakage at larger geographic scales than the protocols measured as a significant contributor. The specific magnitude of the overestimation has been contested in subsequent exchanges, with Verra and independent researchers disputing aspects of the methodology. But the directional finding has proven durable across multiple independent assessments: leakage measured at the scale of displacement consistently exceeds leakage measured at the scale of the project boundary. The measurement was technically defensible at the scale at which it was conducted; it was incomplete at the scale at which the actual displacement was occurring.
Double-counting. A single ton of avoided emissions should not be claimed twice: once by the country hosting the project toward its national climate commitments, and again by the corporate buyer toward its net-zero commitment. The voluntary carbon market’s registry architecture was not designed to prevent this. Different jurisdictions maintained separate ledgers; reconciliation between national accounts and voluntary-market credits was retroactive at best and absent in many cases. The double-counting problem occupied international climate negotiations for nearly a decade: Article 6 of the Paris Agreement, governing how carbon credits could be transferred across national borders without double-counting, was the last major unresolved issue at COP25 in 2019. The core Article 6 rulebook was adopted at COP26 in 2021, and the remaining rules for the Article 6.4 mechanism were finalized at COP29 in 2024, establishing a UN-supervised mechanism with clearer corresponding adjustment requirements. Whether that architecture proves adequate is a legitimate ongoing debate; what the long negotiation record documents is that the original voluntary market design had not solved the problem, and that solving it required a level of intergovernmental coordination the market itself could not produce.
IV.
The harder question, the one worth dwelling on, is not why the mechanisms failed; the failure modes described above are clear enough in retrospect. The more interesting question is why competent, analytically rigorous people couldn’t see the failures coming, and why the system continued attracting serious institutional investment even as warning signs accumulated. Both questions have answers. They are different answers, and both matter.
None of what follows means that abstraction itself is avoidable. Large-scale governance systems necessarily simplify the realities they coordinate: financial accounting abstracts firms, GDP abstracts economies, insurance abstracts mortality risk. The SO₂ cap-and-trade program worked, and it is worth understanding precisely why. SO₂ from one smokestack is genuinely fungible with SO₂ from another, and the program was mandatory, with regulatory enforcement and government-backed integrity mechanisms. The voluntary carbon market lacked both properties: the fungibility of the underlying good was weaker, and the verification architecture was self-certified rather than regulated. Its structural vulnerability ran through both channels simultaneously. The issue was not that it abstracted ecological systems. It was that the specific abstractions required for liquidity and fungibility appear to have exceeded the tolerances within which ecological measurement remained reliable, and the verification architecture, lacking regulatory backing, could not compensate for that gap.
Three assumptions were doing the most load-bearing work.
Commensurability: the assumption that one ton of CO₂ equivalent avoided in a Brazilian rainforest is interchangeable with one ton avoided in a Kenyan mangrove, or in an American industrial facility. This assumption is required for the market to function; a market in which every unit is unique and incomparable is not a market in the sense the mechanism needed. But commensurability is an artifact of the measurement system, not a property of carbon in ecological systems. A ton of carbon stored in an old-growth forest involves a very different set of ecological functions, permanence risks, and co-benefits than a ton avoided through industrial gas capture. The measurement system created the appearance of equivalence; it did not create equivalence.
Fungibility: the assumption that credits can substitute for each other without meaningful loss. Fungibility follows from commensurability: if all tons are equivalent, then any ton can stand in for any other. The mechanism required this for the market to be liquid. But the underlying goods, forests in particular, are not fungible in any sense that matters ecologically or, it turned out, financially. The specific characteristics of a specific forest in a specific place determined whether the permanence and additionality claims held. Treating that specificity as irrelevant to the credit’s value was a convention that served the market’s liquidity, not a description of the underlying reality.
Measurability: the assumption that the emissions reductions produced by a project could be accurately quantified using the available verification methods. The entire mechanism depends on this: a market in units that cannot be accurately measured is a market in fiction. The assumption was reasonable given the state of measurement science when the mechanism was designed. It became increasingly untenable as remote sensing, satellite monitoring, and peer-reviewed study produced measurements that diverged significantly from what the verification protocols had found.
There is one piece of writing from this period worth sitting with carefully. In 2006, before the public scandals had broken and before the peer-reviewed evidence had accumulated, Larry Lohmann made an argument that has since proven structurally precise. Writing from a skeptical position on carbon commodification, he argued that the commodification of carbon required treating an ecological process as a fungible unit, and that the market would generate verification mechanisms unable to support that treatment. The critical framing doesn’t diminish the predictive accuracy; if anything, it sharpens it. The problems were in the design. Verification consultants, registries, project developers, and corporate buyers all operated within those design constraints; the apparatus could get more elaborate without getting more accurate, and it did. Beyond a certain threshold, more elaborate verification increases procedural complexity without resolving the underlying mismatch between ecological specificity and the standardization the market required.
The second answer, why the system persisted even as warning signs accumulated, is less epistemological and more institutional. Carbon markets solved problems beyond emissions reduction itself. They translated climate obligation into measurable financial instruments. They allowed corporations to demonstrate action within existing accounting frameworks. They gave governments a partial mechanism where direct regulation was politically difficult to achieve. They provided capital markets with climate-compatible assets legible to existing investment mandates. None of this made the underlying measurements more accurate. But it created structural incentives, distributed across the chain from buyer to developer to auditor to registry, that worked against aggressive scrutiny of the measurement failures. When an instrument is simultaneously analytically fragile and institutionally useful, and when every party in the chain benefits from the instrument continuing to function, the process of registering the fragility tends to be slower than external analysis would suggest. The reason is not bad faith. It is that the institutional architecture selected for attention to function rather than to accuracy.
V.
The three assumptions that undermined voluntary carbon markets (commensurability, fungibility, measurability) are not unique to carbon. What I find striking, looking across the adjacent domains where environmental markets are now being built, is how directly these same assumptions are being carried forward. They are not being adjusted in light of what failed in carbon. They are being transported wholesale.
Biodiversity credits are the most direct replication. The mechanism is structurally identical: units of biodiversity loss are measured, offset by units of biodiversity preservation elsewhere, and traded in a market that channels capital to conservation. England’s Biodiversity Net Gain requirement, in force since 2024, made this mechanism mandatory for new development. Research published in Nature in 2023 by zu Ermgassen and colleagues, reviewing the global implementation record of biodiversity offset programs, found the same pattern. The commensurability assumption, that an acre of wetland in one jurisdiction is exchangeable for an acre of woodland in another, is even harder to defend ecologically than the carbon-market equivalent, and the additionality problem is, if anything, more intractable. The mechanism is being built with full knowledge of the carbon market’s failures, with adjustments at the margins, but without engaging the underlying assumptions that produced those failures.
Water markets have different origins; water trading has a longer history in the American West and Australia than in the environmental-finance space. But the same structural tensions appear when water trading is applied to ecosystem-services goals. The Murray-Darling Basin water market in Australia, the most extensively developed water-trading system in the world, was the subject of a Royal Commission in 2019 whose findings documented dysfunction that followed recognizable patterns. The findings named measurement failures, leakage between areas the market treated as equivalent, and governance failures that compounded the measurement problems. The basin’s ecological condition continued to deteriorate through the period in which the trading system was in operation. Water in a specific place, at a specific time, serving specific ecological functions, is not fungible with water elsewhere; treating it as fungible produced the predictable consequences.
Payment for environmental services, meaning direct payments to landholders or communities for maintaining ecosystem functions, has performed better than commodity markets in cases where the payment relationship is direct and the arrangement is durable. Costa Rica’s Pago por Servicios Ambientales (PSA) program, in continuous operation since 1996 and supported by substantial peer-reviewed evaluation, has produced meaningful conservation outcomes in a way that market-intermediated offset mechanisms have not. The features that appear to matter are a direct relationship between payer and steward, place-specific measurement calibrated to the actual ecological function being maintained, and an arrangement designed for durability rather than for liquidity. It would be a serious mistake to treat PSA as a template for global carbon markets. It operates across roughly one million hectares in a country with unusually stable property rights, direct government administration, and political conditions that are not representative of the jurisdictions where large-scale carbon markets operate. What the Costa Rica case offers is not a scalable alternative but something more limited and more useful: evidence that certain design features (direct relationship, place-specificity, durability over liquidity) correlate with better outcomes at the scale where they can be tested. Whether those features can be preserved at larger scales, and under what governance conditions, is among the genuinely open questions.
What the survey of adjacent mechanisms suggests, taken together, is that the voluntary carbon market failures were not anomalies or isolated errors made in one domain that can be corrected by designing better mechanisms in the next. The same assumptions are being carried into new domains, adjusted at the edges, with notably limited engagement with why the edges needed adjusting. This is what happens when a framework’s blind spots are structural rather than incidental: the framework’s users see the symptoms, address them case by case, and miss the pattern connecting the symptoms because the pattern is in the premises, not in the execution.
VI.
What this leaves us with is not the new playbook. The new playbook does not exist yet. Anyone selling it with confidence is selling something else.
What we have is a more precise version of the question: which environmental goods can be abstracted to the tolerances commodity markets require, and what institutional arrangements are capable of handling the ones that cannot. The answer is not obviously “no markets, ever.” The SO₂ precedent holds, compliance-market reforms in the EU ETS have produced measurable industrial emissions reductions, and the Article 6.4 architecture now emerging may address some of what the voluntary market’s design left unresolved. The work of determining where the threshold runs, which goods support the required abstractions, at what scale, under what verification architecture, is genuinely underway in peer-reviewed literature and in the legal and financial experimentation that is mostly not yet ready to be claimed as a framework.
People who have built careers in this space are doing that work. Others are applying the same analytical lens in water, biodiversity, and ecosystem services. The work is being done piece by piece, often in private, by people who are mostly too close to the previous framework’s failure to trust a new set of assumptions prematurely.
That is not a failure mode. It is what serious work looks like in the period after the previous answer has stopped working and before the next one is ready. What is available, for now, is the discipline of articulating the failure precisely: naming the specific assumptions that didn’t hold, watching where those same assumptions are being carried into new domains, and refusing the temptation to replace one set of overconfident answers with another. The voluntary carbon markets couldn’t hold the analytical weight placed on them. The question of what can hold that weight, at scale, over time, across the range of ecological goods that need institutional stewardship, remains genuinely open. Engaging it seriously, slowly and carefully and in pieces, is the work that matters now.



