Safety Without Friction: Why “Responsible AI” Fails Under Commercial Incentives

The Comfort of Moral Seriousness

Over the past several years, a new genre of writing has emerged alongside the rapid development of large-scale AI systems: the AI safety essay. These essays are typically long-form, reflective, and morally serious. They acknowledge uncertainty, emphasise responsibility, and signal an awareness of historical precedent—often invoking nuclear power, climate change, or other dual-use technologies as cautionary parallels. As a genre, they are thoughtful, articulate, and frequently well-intentioned... the tragic irony of my participating in the very genre I'm about to criticise is not lost on me!

They are also increasingly influential. Such essays shape public discourse, inform policymakers, reassure customers, and pre‑empt criticism by demonstrating that risks are being taken seriously at the highest levels of leadership. In doing so, they help establish a narrative in which responsibility is framed primarily as a matter of foresight, reflection, and stated concern.

The problem is not that this moral seriousness is insincere. The problem is that it is often mistaken for constraint.

In complex sociotechnical systems, risk is not primarily governed by what organisations say they care about, but by what their structures, incentives, and defaults allow—or reward—people to do. Tone does not limit behaviour. Intent does not enforce boundaries. Reflection does not substitute for mechanisms. A system can be designed by people who are deeply aware of its dangers and still produce outcomes that systematically amplify those dangers if its incentives point in that direction.

This distinction matters because modern AI systems are no longer speculative research artifacts. They are deployed products, integrated into workflows, APIs, and markets. Once systems operate at scale, safety is no longer an abstract property of models or a future research objective; it is an emergent property of defaults, interfaces, pricing, and responsibility allocation. In that environment, the most consequential decisions are rarely philosophical. They are commercial.

This essay advances a simple thesis: the dominant risks associated with advanced AI systems arise not from a lack of moral awareness or safety discourse, but from incentive structures that reward unsafe‑by‑default designs while externalising the cost of safety. Where safeguards introduce friction, slow adoption, or complicate integration, they are treated as optional. Where responsibility can be shifted to users, developers, or downstream actors, it often is.

The result is a recurring pattern: systems framed as powerful and potentially dangerous are nonetheless shipped with permissive defaults, reversible safeguards, and ambiguous accountability—accompanied by extensive discussion of how carefully these risks have been considered. This combination is not accidental. It is the predictable outcome of aligning safety primarily with rhetoric and research, while aligning product success with speed, capability, and ease of use.

Understanding this gap—between moral seriousness and operational constraint—is essential. Without it, debates about alignment, agency, and long-term risk are liable to miss the most immediate and tractable source of harm: the incentives that shape what is built, what is shipped, and what is left to someone else to worry about.

On Institutional Positioning and Incentives

Any serious discussion of AI safety must begin by acknowledging standpoint. Safety is not evaluated from a neutral vantage point; it is interpreted through the incentives, constraints, and trade‑offs of the institutions doing the evaluating. What risks appear salient, which mitigations seem practical, and what costs are considered acceptable all depend on where one sits in the system.

This essay is written from the perspective of a team building Worka.ai: a safe‑by‑default AI product designed around a WASM‑based, sandboxed execution model. That positioning matters. It shapes not only our conclusions, but the kinds of trade‑offs we confront daily. We are not approaching safety as an abstract research problem or a post‑hoc policy layer. We are approaching it as an architectural and commercial choice that must survive contact with real users, real workflows, and real market pressure.

Safety as a Commercial Decision

Choosing to build a safe‑by‑default system is not a moral flourish; it is an economic commitment. In Worka’s case, this has meant designing around constrained execution environments, explicit capability boundaries, and sandboxed agents whose powers are limited by default rather than expanded opportunistically. A WASM‑first architecture is not the easiest path. It introduces complexity, restricts ambient authority, and forecloses entire classes of shortcuts that would make rapid iteration easier.

Those constraints are not accidental. They exist precisely because experience shows that safety mechanisms added later—after systems are already powerful, integrated, and relied upon—are fragile, optional, and routinely bypassed. By contrast, safety that is embedded at the execution layer is difficult to remove without deliberate effort. That difficulty is the point.
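
To make this concrete, consider a minimal sketch of a default-deny capability model. The types and names below are hypothetical illustrations, not Worka's actual API, but they capture the shape of the idea: agents begin with no authority, and every power is an explicit, inspectable grant enforced by the runtime rather than by policy.

```typescript
// Hypothetical default-deny capability model. Illustrative only: agents
// start with zero authority, and every power is an explicit grant.

type Capability =
  | { kind: "net"; hosts: string[] }
  | { kind: "fs"; path: string; mode: "read" | "write" };

class Sandbox {
  private grants: Capability[] = [];

  grant(cap: Capability): void {
    this.grants.push(cap);
  }

  private canFetch(host: string): boolean {
    return this.grants.some(
      (c) => c.kind === "net" && c.hosts.includes(host)
    );
  }

  async fetch(url: URL): Promise<Response> {
    if (!this.canFetch(url.hostname)) {
      // Removing this check means changing the runtime itself,
      // not flipping a configuration flag.
      throw new Error(`no network capability for ${url.hostname}`);
    }
    return fetch(url);
  }
}
```

An agent that needs to call one API is granted exactly that host and nothing else; widening its reach requires a deliberate, visible act.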

The Real Cost of Safety

This approach carries real costs. Safe‑by‑default systems impose friction where permissive systems do not. They slow down experimentation, complicate APIs, and force explicit design decisions where implicit behaviour would be cheaper. They reduce the surface area for “just ship it” solutions and make certain classes of product promises harder to sustain.

In commercial terms, this translates to adoption drag. Users must understand constraints and developers must work within them. Features that could be exposed immediately must instead be mediated, scoped, or delayed. These are not theoretical downsides; they are competitive disadvantages in markets that reward speed, flexibility, and minimal onboarding friction.

It is precisely because these costs are real that incentive alignment matters. When safety competes directly with growth, ease of use, or revenue, it will lose unless it is structurally enforced. This is the context in which claims about optional safeguards, user‑removable controls, or downstream responsibility must be evaluated.

Why This Perspective Sharpens the Critique

This standpoint does not soften the critique that follows—it sharpens it. Arguments that rely on voluntary restraint, good intentions, or post‑deployment vigilance appear plausible until one has attempted to ship a system where safety is non‑optional. From that vantage point, it becomes clear how often safety rhetoric abstracts away the hardest part of the problem: absorbing the cost of constraint upfront rather than deferring it downstream.

The critique offered here is therefore not directed at motives. It is directed at mechanisms. If safety measures are described in ways that make them cheap to remove, easy to bypass, or someone else’s responsibility to implement, then their persistence should not be assumed—regardless of how seriously they are discussed.

Scope and Limits

This argument is not a claim of moral superiority, nor an assertion that any single architecture or product has “solved” AI safety. Worka’s approach reflects one set of trade‑offs, informed by a particular product philosophy and threat model. Other systems will make different choices, and some risks discussed elsewhere may lie outside the scope of this analysis.

What this essay does claim is narrower and more concrete: safety claims that do not survive incentive pressure are not safety guarantees. Any framework for responsible AI that ignores the commercial and architectural forces shaping real systems will systematically overestimate the protection offered by good intentions alone.

The sections that follow examine how this mismatch between stated concern and operational reality manifests—and why it remains the most immediate source of avoidable risk in deployed AI systems.

The Adolescence Metaphor as Accountability Evasion

Metaphors are not decorative. In institutional discourse, they do real work. They shape how responsibility is allocated, how urgency is perceived, and which failures are treated as understandable versus unacceptable. In discussions of advanced AI, the metaphor of adolescence has become particularly influential—and particularly problematic.

Adolescence as Inevitability

To describe a technology as being in its “adolescence” is to frame its risks as the natural byproduct of growth. Adolescence implies impulsiveness, experimentation, incomplete judgment, and mistakes that are unfortunate but expected. Crucially, it suggests that turbulence is a phase to be endured rather than a condition to be actively constrained.

Applied to AI systems, this framing subtly shifts responsibility. Harms are no longer the result of specific design choices, deployment decisions, or incentive structures. They become the predictable growing pains of a powerful technology maturing faster than society can adapt. In this view, missteps are regrettable but unavoidable—something to manage around rather than prevent through stricter limits.

This reframing matters because inevitability is a powerful moral solvent. If instability is natural, then restraint begins to look naive. If accidents are developmental, then accountability softens into patience.

Foresight Versus Developmental Excuses

The adolescence metaphor also introduces a deeper contradiction. The same institutions invoking it often emphasise their exceptional foresight: their awareness of historical precedent, their investment in safety research, their seriousness about long-term risk. They claim to see the dangers early and clearly—sometimes earlier than anyone else.

One cannot coherently hold both positions at once.

Either the risks are genuinely foreseeable, in which case they are the product of choices made with eyes open; or they are the unpredictable excesses of an immature system, in which case claims of unusual foresight are overstated. Framing harms as developmental while simultaneously asserting superior understanding allows institutions to claim credit for caution while deflecting responsibility for outcomes.

This tension is not merely rhetorical. It has practical consequences. If leaders believe that instability is inherent to the phase rather than contingent on design, then pressure to embed hard constraints weakens. Why over-constrain a system if its rough edges are simply part of growing up?

Why Metaphors Matter

Institutional metaphors shape policy, product decisions, and public expectations. Describing AI as adolescent invites tolerance for risk in precisely the period when deployment is accelerating and stakes are increasing. It normalises the idea that safety failures are transitional rather than structural.

More grounded metaphors exist. One might compare advanced AI to critical infrastructure, where immature systems are not excused for failure but kept offline until they meet strict standards. Or to financial instruments, where early volatility is not romanticised but tightly regulated because downstream consequences are severe.

The choice of metaphor signals the choice of responsibility model. Adolescence implies indulgence and eventual maturity. Infrastructure implies duty, constraint, and liability.

In the context of deployed AI systems, the latter framing is the more accurate—and the more honest. The risks at issue are not the misadventures of a teenager finding its way. They are the foreseeable consequences of deploying powerful systems into environments shaped by strong commercial incentives and weak structural limits.

Treating those risks as developmental inevitabilities does not make them less real. It merely delays the moment at which institutions must fully own the outcomes of their decisions.

Alignment as Abstraction: When Safety Stops at the Whitepaper

Much of the contemporary AI safety discourse treats alignment as a property of models: something to be trained, tuned, measured, and iteratively improved through research. Alignment, in this framing, lives inside the system—encoded in weights, objectives, and internal representations. Progress is reported in papers, benchmarks, and technical blog posts. Failures are framed as gaps in understanding or incomplete optimisation.

This abstraction is convenient. It allows safety to be discussed without meaningfully constraining deployment.

Alignment as an Internal Property

When alignment is framed primarily as an internal model attribute, responsibility subtly shifts away from institutions and toward artifacts. The question becomes whether a system has the right values or reasoning patterns, rather than whether the organisation deploying it has imposed the right limits on what it can do. Safety is treated as something to be improved over time, alongside capability, rather than something that must be satisfied before capability is exposed.

This framing also encourages deferral. If alignment is a research problem, then shortcomings are expected. If it is probabilistic and emergent, then failures can be attributed to residual uncertainty. The system is “mostly aligned,” “improving,” or “better than before”—all claims that may be true while remaining insufficient for safe deployment.

The Missing Layer: Products and APIs

What is notably absent from this discourse is alignment at the product and API layer. Real-world impact is not determined by a model's internal dispositions alone, but by how its capabilities are packaged, exposed, and defaulted. APIs grant authority. Interfaces imply permission. Defaults define normal use. Look no further than the Facebook Cambridge Analytica scandal of the mid-2010s for a concrete example of what permissive platform defaults enable. Defaults matter. Yet, as we have seen, Facebook today is both larger and more restricted than it has ever been, which suggests that safety restrictions are not the brake on scale they are so often assumed to be.

When alignment discussions fail to address these layers, they leave untouched the mechanisms that most directly shape behaviour at scale. A model may be carefully trained, yet deployed behind an API that assumes trust, offers broad ambient authority, and places the burden of constraint on downstream users. In such cases, alignment exists on paper while risk is realised in practice.

This gap is not accidental. Product- and API-level constraints are where safety most directly collides with commercial incentives. They introduce friction, reduce flexibility, and complicate integration. As a result, they are often framed as optional, configurable, or the responsibility of someone else to implement correctly.

Alignment That Constrains

Real alignment is not merely descriptive; it is restrictive. It shows up as limits that cannot be bypassed casually, as defaults that resist misuse, and as architectures that make certain classes of harm difficult rather than merely discouraged. Alignment that survives contact with reality constrains behaviour even when doing so is inconvenient.

By contrast, alignment that exists primarily in documentation, research agendas, or public statements is fragile. It depends on continuous vigilance, perfect downstream implementation, and sustained good faith across many actors. History offers little reason to believe such conditions will reliably hold.

If alignment is to be taken seriously as a safety concept, it must be evaluated where it matters most: in the concrete affordances systems provide and the constraints they enforce. Anything less risks turning alignment into a reassuring abstraction—one that signals concern without materially reducing risk.

In deployed AI systems, the question is not whether alignment has been thoughtfully articulated. It is whether it has been operationalised in ways that meaningfully limit what systems can do by default. Where it has not, alignment has stopped at the whitepaper.

Safety Theatre vs. Safety Engineering

As AI systems have moved from research environments into production, a familiar pattern has emerged: safety measures are most often implemented where they are cheapest, easiest to reverse, and least disruptive to core capabilities. This pattern produces the appearance of responsibility without imposing meaningful constraint. The result is safety theatre—visible, legible, and reassuring—standing in for safety engineering that actually changes system behaviour.

Reversible, Low-Cost Safeguards

The dominant safety mechanisms in deployed AI systems tend to share a common trait: reversibility. They can be added late in the development process, tuned without architectural change, and removed with minimal impact on performance or adoption. From an organisational perspective, this makes them attractive. They allow teams to demonstrate responsiveness to risk while preserving optionality.

Reversible safeguards are not useless. They can reduce harm at the margins and provide valuable signals. But they are structurally weak. Because they do not fundamentally alter what a system is capable of doing, they rely on continuous enforcement, perfect configuration, and sustained institutional will. When incentives shift—as they inevitably do—these controls are the first to be relaxed.

Classifiers, Policies, and Optional Controls

Content classifiers, moderation layers, and policy-based restrictions are the most visible examples of this approach. They operate by inspecting inputs or outputs and intervening when certain thresholds are crossed. Importantly, they sit around the system rather than within its execution model.

This placement matters. Classifiers can be disabled, bypassed, or scoped down when they interfere with desired use cases. Policy layers can be loosened in response to customer demand or competitive pressure. Optional controls can be left unconfigured by default and justified as “available” rather than enforced.
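
To see why placement matters, consider how a perimeter safeguard typically looks in code. This is a deliberately simplified, hypothetical sketch: the model call itself is unconstrained, and the entire safeguard hangs off a configuration flag that any integrator can switch off.

```typescript
// Hypothetical perimeter moderation layer. The safeguard wraps the system
// rather than living inside it, so disabling it is a one-line change.

interface ModerationConfig {
  enabled: boolean;   // often left to the integrator to set
  threshold: number;  // tunable under commercial pressure
}

async function complete(
  prompt: string,
  config: ModerationConfig,
  model: (p: string) => Promise<string>,
  classify: (text: string) => Promise<number> // returns a risk score
): Promise<string> {
  const output = await model(prompt);
  if (config.enabled && (await classify(output)) > config.threshold) {
    return "[blocked by policy]";
  }
  // With `enabled: false`, the system's full capability is exposed. The
  // safeguard was never part of what the system is, only of how it was
  // configured on a given day.
  return output;
}
```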

The economic logic is straightforward: these mechanisms minimise friction. They preserve a permissive core while offering a safety narrative that can be adjusted as needed. From the outside, this looks like careful governance. From the inside, it is often an optimisation problem—how to reduce risk signals without constraining capability. For a business, this makes perfect sense; if you think it does not, you misunderstand what a valuable business exists to do.

The Missing Layer: Structural Constraints

What is largely absent from mainstream safety discourse is the layer that is hardest to change and therefore most effective: structural and capability-based constraints. These are limits embedded in architecture rather than applied at the perimeter. They restrict what agents can do by default, not just what they are allowed to say.

Structural constraints take many forms: sandboxed execution environments, explicit capability grants, least-privilege models, and architectures that eliminate ambient authority. Unlike policy layers, these constraints cannot be silently relaxed without deliberate redesign. They impose costs upfront and resist erosion over time.
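
The structural alternative looks different in kind, not just in degree. The following hypothetical sketch uses an object-capability style: the agent's only powers are the ones handed to it as arguments, so there is no ambient authority to abuse, and widening its reach requires changing the code's shape rather than a setting.

```typescript
// Hypothetical object-capability sketch: authority is passed in explicitly,
// so entire classes of harm are unrepresentable rather than discouraged.

interface ReadOnlyStore {
  get(key: string): Promise<string | undefined>;
}

async function runAgent(
  task: string,
  store: ReadOnlyStore // the only effectful power this agent holds
): Promise<string> {
  const context = (await store.get(task)) ?? "";
  // The agent can compute over `context`, but it holds no handle to the
  // network, the filesystem, or a writable store. Relaxing that requires
  // a deliberate redesign of this signature, visible in code review.
  return `summary of ${task}: ${context.slice(0, 80)}`;
}
```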

The relative scarcity of such measures is not accidental. Structural safety is expensive. It slows development, complicates APIs, and forecloses entire categories of opportunistic features. It is also difficult to market, because its success is measured in what does not happen...and this is critically important.

Safety engineering, properly understood, is the practice of absorbing these costs early in order to prevent larger failures later. Safety theatre, by contrast, optimises for appearance and flexibility. In a competitive environment that rewards rapid capability expansion, the latter will always be easier.

The distinction matters. Systems protected primarily by reversible safeguards may look responsible, but they remain fragile under pressure. Systems shaped by structural constraints behave differently—not because they are more virtuous, but because they are harder to misuse. In high-stakes domains, that difference is the difference between safety as a promise and safety as a property.

Defaults Are Destiny: How Product Choices Shape Risk at Scale

In complex systems, defaults matter more than intentions. They shape behaviour quietly, persistently, and at scale. Most users do not reconfigure systems extensively; most developers do not override defaults unless forced to. What ships as the “normal” path becomes the dominant path, and over time it defines what is considered acceptable, expected, and safe.

In this sense, defaults are not a secondary design concern. They are the primary mechanism through which risk is distributed.

Secure-by-Default as a Mature Industry Norm

In many mature technical domains, secure-by-default is no longer controversial. Network services default to encrypted connections. Browsers warn aggressively on invalid certificates. Operating systems sandbox applications and require explicit permission for sensitive capabilities. Cloud platforms assume least privilege and force operators to opt into broader access deliberately.

These defaults exist not because misuse is impossible, but because it is costly. They raise the baseline level of safety by ensuring that insecure configurations require explicit action, justification, and accountability. Importantly, they also shape norms: developers come to expect friction around dangerous operations, and tooling evolves accordingly.
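
The pattern is easy to state in code. As a hypothetical sketch, a secure-by-default client makes the safe path the zero-effort path and forces the unsafe path through an explicit, greppable opt-out:

```typescript
// Hypothetical secure-by-default connection options: encryption is not a
// choice the caller makes; plaintext is, and it leaves a trace.

interface ClientOptions {
  dangerouslyDisableTls?: boolean; // naming telegraphs the risk
}

function connectionUrl(host: string, opts: ClientOptions = {}): string {
  if (opts.dangerouslyDisableTls) {
    console.warn(`plaintext connection to ${host}: explicit opt-out recorded`);
    return `http://${host}`;
  }
  return `https://${host}`;
}
```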

Crucially, these defaults were not free. They slowed early adoption, complicated onboarding, and generated resistance from users accustomed to permissive systems. They persisted only because the industries involved accepted that the long-term cost of insecurity was higher than the short-term cost of friction.

Unsafe-by-Default as Commercial Optimisation

By contrast, many AI systems are shipped unsafe-by-default—not accidentally, but deliberately. Permissive defaults reduce onboarding friction, shorten time-to-value, and broaden the range of immediately accessible use cases. They make systems feel powerful and flexible from first contact, which is commercially attractive in competitive markets.

In this model, safety is something users are expected to add later: by configuring filters, implementing guardrails, or writing additional logic around the system. Responsibility is deferred downstream, where it is fragmented across thousands of developers and organisations, each with different incentives and levels of expertise.

This is not framed as negligence. It is framed as empowerment. Users are given “full control,” and safety features are described as available rather than enforced. But the effect is predictable: the path of least resistance becomes the path of least safety, and risk accumulates where oversight is weakest.

Defaults, Not Intentions

The difference between these approaches cannot be explained by stated values alone. Many teams deploying unsafe-by-default systems articulate sincere concern about risk. The divergence arises from incentives. Defaults that maximise adoption and minimise friction align cleanly with commercial success. Defaults that impose constraint do not.

Over time, intentions recede and defaults persist.

This is why debates about AI safety that focus on ethics, alignment statements, or long-term intentions often miss the most immediate lever for risk reduction. What matters most is not what systems are meant to do in ideal hands, but what they do by default in ordinary ones.

At scale, defaults determine outcomes. They encode priorities more honestly than mission statements ever will. Any serious attempt to reduce systemic risk in AI must therefore be judged less by what safeguards exist in theory, and more by what protections users encounter without asking for them.

In the end, destiny is not written in intentions. It is shipped in defaults.

Delegating Risk: The Myth of User Responsibility

A recurring theme in the deployment of powerful AI systems is the invocation of user responsibility. Providers supply flexible tools, extensive documentation, and optional safety features, then assert that it is up to developers and end users to apply them correctly. In low-stakes domains, this approach can be reasonable. In high-stakes systems described as posing existential or civilisational risk, it is deeply incoherent.

Externalising Safety Downstream

Delegating safety to users is attractive because it minimises friction at the point of sale. Providers can ship powerful, permissive systems quickly while framing themselves as enablers rather than gatekeepers. Responsibility is dispersed across a vast downstream ecosystem: startups, internal tooling teams, individual developers, and non-technical users adapting systems to local needs.

This dispersion has predictable effects. Safety practices become uneven. Incentives diverge. The actors closest to the risk often have the least leverage, the least information, and the least capacity to absorb failure. Meanwhile, the institutions with the greatest ability to impose structural constraints have already opted not to.

From an institutional perspective, this is not presented as abdication but as respect for user autonomy. Yet autonomy without guardrails is indistinguishable from abandonment when the underlying system is complex, opaque, and rapidly evolving.

Existential Risk and Delegated Control

The contradiction becomes stark when juxtaposed with claims about the severity of AI risk. Systems are described as potentially transformative, destabilising, or even existential—capable of reshaping labor markets, concentrating power, or escaping meaningful oversight. These claims are used to justify extraordinary attention, urgency, and investment.

And yet, the proposed mitigation is often ordinary delegation.

If the risks are truly as profound as described, then assigning primary responsibility for safety to end users is not merely insufficient; it is indefensible. One does not mitigate nuclear risk by asking operators to “be careful,” nor systemic financial risk by trusting every market participant to self-regulate. In every other high-risk domain, responsibility concentrates upstream, where leverage and visibility are greatest.

The persistence of delegated control suggests one of two possibilities: either the risks are overstated, or the mitigation strategy is misaligned with the stated threat model. The extraordinary threat and the ordinary mitigation cannot both be taken at face value.

Why Responsibility Does Not Scale

Responsibility scales poorly downstream because incentives fragment faster than accountability can follow. Each additional layer—API consumer, application developer, organisational integrator—introduces new pressures to prioritise speed, cost, and local optimisation over systemic safety. Even well-intentioned actors face competitive and organisational constraints that make sustained vigilance unlikely.

Moreover, downstream actors cannot redesign core architectures. They cannot change defaults they do not control, remove ambient authority they did not grant, or impose constraints the system was not built to support. They can only patch around the edges, hoping their local fixes hold under conditions they do not fully understand.

Delegating risk in this way does not eliminate it. It multiplies it.

A safety strategy that relies primarily on user responsibility is therefore not a strategy at all. It is a transfer of liability disguised as empowerment. In systems described as powerful enough to warrant existential concern, such transfers should be treated not as pragmatic compromises, but as warning signs of unresolved incentive misalignment.

The Incentive Gradient: Why Capability Always Wins

In competitive technology markets, progress is not neutral. It follows gradients—directions in which effort reliably produces reward. In AI, the steepest and most reliable gradient points toward capability. Systems that can do more, faster, and with less human involvement command attention, investment, and adoption. Safety, by contrast, occupies a far flatter slope.

This asymmetry is not incidental. It is structural.

Capability as Differentiator and Valuation Driver

Capability is easy to demonstrate and easy to price. It can be shown in benchmarks, demos, and product launches. It translates directly into marketing claims and investor narratives. More capable systems promise automation, leverage, and competitive advantage—outcomes that are legible to buyers and funders alike.

As a result, incremental capability gains compound. Each improvement attracts more users, more capital, and more data, which in turn enable further improvements. This feedback loop rewards speed and ambition. It also creates pressure to expose new capabilities as quickly and broadly as possible, lest competitors do so first.

Within this environment, restraint is costly. Capabilities that exist but are not exposed do not generate revenue. Capabilities that are gated, scoped, or slowed are harder to sell. Even when teams recognise the risks, the incentive gradient continues to pull toward release. If you need evidence of this, look no further than Google. Google developed the transformer architecture underpinning modern AI and kept the resulting technology in internal testing for years. It attempted a safety-first approach, but also published enough for the work to be reproduced externally. Enter OpenAI. Today's Google is very different to the one of four years ago; it is arguably now selling the best "frontier" LLM available. Safety is expensive, and safety is a choice. Safety, unfortunately, is not aligned with the feedback loop that currently drives growth.

Why Safety Does Not Compound

Safety improvements behave differently. They rarely compound in the same way capabilities do. Adding a safeguard does not automatically make the next safeguard cheaper or more effective. In many cases, it does the opposite: each additional constraint introduces complexity, edge cases, and maintenance burden.

Moreover, safety is difficult to demonstrate. Its success is measured in the absence of incidents, which makes it hard to market and easy to discount. A prevented failure generates no headline. A constrained system does not demo as well as an unconstrained one.

This creates a persistent lag. Safety work is often framed as necessary but secondary—something to be addressed after core capabilities are established, or once the market position is secure. By the time risks become obvious, systems are already entrenched, and retrofitting constraints becomes politically and technically difficult.

The One-Way Ratchet

Taken together, these dynamics produce a one-way ratchet. Capabilities are exposed quickly and widely, because doing so is rewarded. Safety measures are added incrementally, because doing so is costly. When trade-offs arise, capability almost always wins.

Importantly, this ratchet operates even in organisations that take safety seriously. Good intentions do not neutralise incentive gradients. Without structural counterweights—hard constraints, enforced defaults, and governance mechanisms that can say no—the path of least resistance leads toward escalation.

This is why appeals to balance or responsibility, while well-meaning, are insufficient on their own. As long as capability advancement is the primary driver of value, and safety remains a non-compounding cost centre, the system will drift in a predictable direction.

Understanding this gradient is essential. It explains why risks recur across organisations, products, and generations of models. It also clarifies why meaningful safety progress requires more than better alignment research or clearer policy statements. It requires mechanisms that can oppose the gradient—mechanisms that make restraint durable even when capability is the easier choice.

Rhetoric as Risk: Public Claims and Market Conditioning

In discussions of AI risk, rhetoric is often treated as incidental—an imperfect translation of technical realities for public consumption. In practice, public claims made by influential actors are not merely descriptive. They are performative. They shape expectations, behaviour, and market dynamics in ways that directly affect how systems are adopted and used.

When the technology in question is already powerful and rapidly spreading, rhetoric itself becomes a risk vector.

Displacement Narratives as Accelerants

Claims that AI systems will soon replace large classes of skilled workers—software engineers in particular—are often presented as sober forecasts. Framed this way, they appear neutral: uncomfortable truths spoken plainly. But in market contexts, such statements do not function as detached observations. They act as accelerants.

Displacement narratives create urgency. They encourage organisations to adopt early to avoid falling behind, workers to reorient their careers under pressure, and investors to double down on systems framed as inevitable. The more authoritative the speaker, the stronger this effect becomes.

Crucially, these narratives collapse uncertainty. They present contested futures as foregone conclusions, reducing the perceived legitimacy of slower, more cautious approaches. In doing so, they narrow the space for deliberation precisely when it is most needed.

I take particular issue with Anthropic in this regard, as they are one of the most vocal actors here, if not the most vocal. It irks me not because they do it; I'm trying to start a business myself, so I understand the motivation. It is the blatant hypocrisy and contradiction of it all: if the technology is what you claim, why can't you build a simple TUI with all the GPUs at your disposal? Why do you need over 2,000 employees? This rhetoric misinforms the less technical and drives boardroom decisions that affect the lives of many, only for reality to fall short of the outlandish claims (think of what Klarna did, and has quietly been undoing).

Rhetoric, Adoption, and Dependency

Public claims about labor replacement also reshape dependency structures. When AI systems are framed as imminent substitutes rather than complementary tools, organizations are incentivized to reorganize around them quickly. Skills atrophy. Redundancy is stripped out. Human fallback paths disappear.

This dynamic increases systemic risk. The faster dependency grows, the higher the cost of failure becomes. Yet this effect is rarely acknowledged in safety discourse, which tends to focus on technical misuse rather than socio-economic lock-in.

Importantly, this is not an argument against discussing impact. It is an argument about timing and framing. When claims about displacement outpace operational reliability and safety maturity, they function less as warnings and more as pressure.

Rhetoric as Deployment

For companies building and selling AI systems, public statements by leadership are part of the deployment surface. They influence customer behaviour, regulatory posture, and internal priorities. Treating them as separate from product decisions creates a false boundary.

If leaders claim that systems are capable of replacing core human roles, then those systems will be used—and trusted—in ways that reflect that claim, regardless of whether tooling, safeguards, or reliability justify it. Responsibility for the resulting outcomes cannot be disclaimed as mere misinterpretation.

In high-stakes domains, rhetoric should therefore be evaluated with the same scrutiny as APIs or defaults. It is another interface through which power is exercised and risk is propagated.

To describe rhetoric as harmless commentary is to ignore how markets actually function. In practice, claims about capability and inevitability shape behavior long before the technology itself is ready to bear the weight placed upon it. When that happens, the risk is no longer hypothetical. It has already been deployed.

When Claims Outpace Competence: An Operational Reality Check

Extraordinary claims demand extraordinary execution. When organizations assert that AI systems are approaching or surpassing human-level performance in complex professional domains, those claims implicitly set expectations about reliability, robustness, and operational maturity. In this context, the quality of developer tooling, APIs, and supporting infrastructure is not a secondary concern—it is a signal of epistemic credibility.

When that execution lags behind the rhetoric, the gap itself becomes a source of risk.

Transformative Claims, Ordinary Tooling

Claims that AI systems can replace or significantly outperform skilled professionals suggest a high degree of internal coherence and external usability. If systems are genuinely capable of taking on such roles, then the surrounding tooling should reflect that maturity: interfaces should be clear, abstractions coherent, failure modes predictable, and integration paths well thought through.

In practice, this is often not the case. Developer tools are frequently brittle, inconsistently designed, and under-documented. APIs change rapidly without clear migration paths. Basic ergonomics—error reporting, debugging, composability—lag far behind the capabilities being advertised.

This mismatch matters. It suggests that the systems are being positioned as more reliable and autonomous than the operational environment actually supports. When users are encouraged to treat AI outputs as substitutes for human judgment while simultaneously compensating for fragile tooling, risk increases rather than decreases.

Execution as Evidence

In safety-critical domains, execution quality is evidence. Aviation software, medical devices, and financial infrastructure are judged not only by what they claim to do, but by how rigorously they are built, tested, and maintained. Sloppy execution undermines trust precisely because it signals a lack of internal discipline.

The same standard should apply to AI systems making transformative claims. If organizations struggle to deliver stable tooling for developers, it raises legitimate questions about their readiness to manage far more complex sociotechnical risks. This is not a moral indictment; it is an operational one.

Execution discipline matters because it reflects how an organization handles uncertainty. Teams that anticipate failure build systems that surface errors clearly, constrain damage, and degrade gracefully. Teams that assume success tend to rely on optimistic narratives and reactive fixes.

Overconfidence as a Risk Factor

Overconfidence is itself a form of misalignment. When claims about capability race ahead of demonstrable operational competence, they create conditions in which systems are trusted beyond their actual reliability. Users adapt behavior accordingly, delegating judgment, removing safeguards, and reorganizing workflows around expectations that may not hold.

This dynamic is particularly dangerous in systems that are still evolving rapidly. Early failures are interpreted as temporary glitches rather than structural warnings. Each success reinforces confidence, while near-misses are quietly absorbed.

A sober assessment of AI risk must therefore include not just what systems can do in controlled demonstrations, but how well they are engineered in everyday use. When competence lags behind claims, restraint is not pessimism—it is prudence.

The question is not whether ambitious visions are wrong in principle. It is whether institutions have earned the trust implied by the futures they are publicly describing. Where execution falls short, scaling back claims is not a communications failure. It is a safety measure.

“We’re Thinking About It”: Intent as a Substitute for Constraint

As AI systems scale into widespread use, a familiar reassurance accompanies their deployment: we are thinking carefully about the risks. This framing emphasises reflection, deliberation, and good faith. It signals awareness and responsibility. Yet in high-stakes systems, intent is not a control mechanism. When contemplation substitutes for constraint, safety becomes retrospective rather than preventative.

Post-Deployment Contemplation

Much contemporary safety discourse unfolds after systems are already deployed. Risks are acknowledged in essays, interviews, and research agendas while products continue to ship, APIs continue to expand, and adoption accelerates. Safety is framed as an ongoing process—something to be refined in parallel with use.

This sequencing matters. Once systems are integrated into workflows, the cost of imposing new constraints rises sharply. Backwards compatibility becomes a justification for permissiveness. Customers resist changes that limit functionality they have come to rely on. What could have been a default becomes an opt-in. What could have been prohibited becomes merely discouraged.

Post-deployment contemplation may be sincere, but it is structurally weak. It operates in the shadow of sunk costs and established dependencies. At that point, reflection competes not only with commercial incentives, but with inertia.

When Delay Equals Absence

In safety-critical domains, delayed safeguards are often equivalent to missing ones. A brake installed after a vehicle has entered traffic does not reduce the risk of the journey already underway. Similarly, safety mechanisms introduced only after misuse or failure has been observed are necessarily reactive. They respond to known problems while leaving unknown ones untouched.

This is particularly concerning in systems described as unprecedented in scope or impact. If risks are novel and poorly understood, then deferring safeguards until after deployment guarantees that the first line of defense will be real-world harm.

The common rejoinder—that it is impossible to anticipate every risk in advance—is true but incomplete. The impossibility of perfect foresight is not an argument against constraint; it is an argument for conservatism. Where uncertainty is high, the burden of proof should fall on those seeking to expand capability, not on those urging restraint.

The Limits of Good Faith

Good faith is not a substitute for structure. Institutions can act with sincere concern and still produce unsafe outcomes if their systems are not designed to resist pressure. Appeals to trust place the burden on outsiders to believe that vigilance will be maintained indefinitely, even as incentives evolve and leadership changes.

History suggests otherwise. High-stakes failures rarely arise from malice. They arise from drift: small compromises accumulating under pressure, justified by urgency, competition, or optimism. Without hard boundaries, good intentions erode.

In this context, repeated assurances of careful thought should be treated cautiously. They may indicate awareness, but they do not indicate control. Safety that depends on perpetual attentiveness is fragile by design.

In deployed AI systems, the relevant question is therefore not whether risks are being considered, but whether they have been made expensive to ignore. Where intent is allowed to stand in for constraint, safety exists only as long as circumstances remain favourable. That is not a margin robust enough for systems operating at scale.

Anticipated Objections and Why They Fail

Critiques of incentive misalignment in AI safety tend to attract a familiar set of objections. These arguments are not frivolous; they often sound pragmatic, experience‑based, and reasonable in isolation. The problem is that none of them address the core issue raised throughout this essay: systems behave according to incentives, not intentions, and rhetorical defenses do not alter those incentives.

“Flexibility Requires Unsafe Defaults”

This objection frames permissive defaults as a necessary condition for innovation. Developers, it is argued, need freedom to experiment, adapt systems to diverse use cases, and move quickly. Safety constraints, in this view, are acceptable only if they can be removed or bypassed without friction.

The flaw here is not the desire for flexibility, but the assumption that flexibility and safety are opposites. In mature technical domains, flexibility is achieved within constrained environments, not by eliminating constraints altogether. Capability‑scoped permissions, sandboxed execution, and explicit opt‑in expansion of authority all preserve flexibility while preventing accidental or casual misuse.

Unsafe defaults are not a prerequisite for innovation; they are a shortcut that shifts cost away from the platform and onto downstream actors. What is being defended is not flexibility per se, but convenience.

“Bad Actors Will Always Find a Way”

This argument appeals to futility. Because perfect prevention is impossible, the reasoning goes, strong safeguards are of limited value. Determined adversaries will bypass them regardless, so imposing constraints primarily inconveniences legitimate users.

This misunderstands the purpose of safety engineering. The goal is not absolute prevention, but cost shaping. Effective safeguards raise the effort required to misuse systems, reduce the blast radius of failures, and prevent casual or accidental harm. Most real‑world damage does not come from highly resourced adversaries; it comes from ordinary users operating under default conditions.

Designing for inevitability guarantees exposure. Designing for resistance changes outcomes, even when it does not eliminate risk entirely. The futility argument is hardly different from saying "people die anyway, so we might as well legalise murder".

“This Is Better Than Competitors”

Comparative safety arguments lower the bar rather than raising it. Being marginally better than peers does not address whether a system is safe enough for the risks it introduces. In domains where failures scale rapidly and globally, relative improvement is not an adequate standard.

More importantly, competition is itself one of the forces driving unsafe defaults. Invoking competitors as justification implicitly concedes the incentive problem rather than solving it. If market pressure makes meaningful safeguards untenable, that is evidence of misalignment—not an excuse for perpetuating it.

“Regulation Will Catch Up”

Deferring responsibility to future regulation externalises accountability in time rather than space. While regulation is important, it is slow, uneven, and reactive. By the time rules are formalised, systems are often already entrenched, and the most permissive defaults have shaped usage patterns and expectations.

Moreover, companies building and deploying AI systems are not passive subjects of regulation. They actively shape standards, norms, and policy agendas. Waiting for external constraint while accelerating internal deployment is not neutrality; it is strategic delay.

Why None of These Resolve the Core Issue

Each of these objections sidesteps the same underlying problem: incentives that reward rapid capability exposure and penalise structural restraint. Flexibility arguments ignore who pays the cost. Futility arguments ignore how risk actually manifests. Comparative arguments normalise escalation. Regulatory deferral postpones accountability.

None of them change the fact that, absent enforced constraints, safety measures will erode under pressure. None of them realign incentives such that doing the safe thing is also the easy thing.

Until that misalignment is addressed, these objections function less as defenses of safety and more as explanations for why it remains optional.

Institutional Alignment Is the Real Alignment Problem

Discussions of AI alignment often focus on the moral or cognitive properties of systems themselves: whether models understand human values, reason correctly about consequences, or internalise appropriate constraints. While these questions are not meaningless, they are increasingly misdirected. In deployed systems, the dominant source of misalignment is not the model. It is the institution.

How Moral Framing Obscures Incentives

By framing alignment as a problem of AI morality—what the system “believes,” “wants,” or is “trying” to do—attention is drawn away from the structures that actually determine outcomes. Institutions decide what gets built, what gets shipped, what defaults apply, and who bears responsibility when things go wrong. Models execute within those decisions; they do not originate them.

This moral framing is comforting. It suggests that risk can be addressed through better training, better objectives, or more sophisticated internal reasoning. It allows organisations to present alignment as a technical frontier rather than an organisational constraint. But it also obscures the most tractable lever for safety: changing the incentives that govern deployment.

When systems behave in risky ways, the question is rarely whether the model lacked the right values. It is whether the institution exposed capabilities it was not prepared to govern.

The Risks of Anthropomorphising Systems

Anthropomorphism compounds this confusion. When AI systems are described as having intentions, agency, or moral compasses, responsibility subtly shifts away from the humans and organisations operating them. Failures become attributed to emergent behaviour rather than foreseeable design choices.

This shift has practical consequences. It encourages investment in ever more elaborate alignment techniques while leaving defaults permissive and architectures unconstrained. It frames harm as a failure of understanding rather than a failure of governance.

In reality, AI systems do not negotiate trade-offs or respond to incentives. Organisations do. Models do not decide when to ship, how much friction to impose, or which safeguards are optional. Treating systems as moral actors distracts from the fact that the most consequential decisions are made well before any model generates an output.

Aligning Organisations

If alignment is to mean anything operational, it must apply to institutions as much as—or more than—to models. An aligned organisation is one whose incentives, governance structures, and product decisions make unsafe behaviour difficult and costly, even when doing so conflicts with short-term advantage.

This kind of alignment is unglamorous. It involves saying no to features that would increase adoption but expand risk. It involves enforcing defaults that slow users down. It involves designing systems that resist misuse rather than trusting downstream actors to compensate.

Crucially, it also involves aligning internal success metrics with safety outcomes, not just capability milestones. As long as teams are rewarded primarily for shipping faster and doing more, alignment will remain aspirational.

The real alignment problem, then, is not whether AI systems can be taught to behave responsibly in the abstract. It is whether the organisations building them are willing to bind themselves to constraints that persist under pressure. Until that question is answered, progress on model-level alignment will continue to outpace progress on actual safety.

What Responsible AI Would Actually Look Like

Critique without construction risks becoming merely oppositional. If responsibility is to be more than a rhetorical posture, it must be expressed in concrete design choices that reshape incentives and constrain behaviour by default. Fortunately, many of the elements of genuinely responsible AI are neither speculative nor exotic. They are well understood—just rarely adopted consistently, because they impose real costs.

Safe-by-Default Agentic Systems

A responsible AI system begins from the assumption that agency is dangerous unless explicitly bounded. Agentic behaviour should therefore be earned, not assumed. Systems should default to minimal authority, narrow scope, and constrained execution environments, expanding only when users deliberately and explicitly grant additional capabilities.

In practice, this means architectures where agents cannot freely access networks, files, credentials, or external tools unless those capabilities are declared, reviewed, and enabled. Sandboxed execution—such as WASM-based runtimes—ensures that even when agents reason autonomously, their effects remain contained. This shifts safety from policy enforcement to architectural fact.

Safe-by-default does not eliminate agency; it disciplines it. It ensures that autonomy grows through intentional design rather than accidental exposure.

Capability-Scoped Authorisation and Explicit Permissions

Responsible systems make authority legible. Every meaningful capability—sending messages, executing code, controlling browsers, accessing secrets—should be gated behind explicit, scope-limited permissions. These permissions should be composable, inspectable, and revocable.

Crucially, permission models should be capability-based rather than identity-based. Instead of trusting an agent wholesale, systems should grant narrowly defined powers tied to specific tasks and contexts. This mirrors best practices in operating systems and distributed systems, where least privilege is enforced not by trust but by structure.

When permissions are explicit, misuse becomes diagnosable rather than mysterious. Failures can be traced to granted authority rather than inferred intent.
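
What capability-scoped authorisation might look like in practice is sketched below. The grant shape is hypothetical, but the properties are the point: authority is tied to a task, bounded in time, inspectable, and revocable, so "why was this allowed?" always has a concrete answer.

```typescript
// Hypothetical capability grant: narrow, task-tied, expiring, revocable.

interface Grant {
  capability: "send_email" | "exec_code" | "read_secret";
  scope: string;     // e.g. a single mailbox, repo, or secret name
  task: string;      // the task this grant was issued for
  expiresAt: number; // grants decay instead of persisting
  revoked: boolean;
}

function authorise(
  grants: Grant[],
  capability: Grant["capability"],
  scope: string
): Grant {
  const live = grants.find(
    (g) =>
      g.capability === capability &&
      g.scope === scope &&
      !g.revoked &&
      g.expiresAt > Date.now()
  );
  if (!live) {
    // Failure is diagnosable: no grant, a revoked grant, or an expired
    // grant. No inference about the agent's "intent" is required.
    throw new Error(`no live grant for ${capability} on ${scope}`);
  }
  return live;
}
```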

Friction as a Deliberate Design Choice

In many AI products, friction is treated as a defect to be minimised. Responsible systems treat friction as a tool. Where actions are irreversible, high-impact, or poorly understood, slowing users down is not paternalism—it is risk management.

Deliberate friction can take many forms: confirmation steps for expanding authority, visible warnings when enabling dangerous capabilities, or defaults that require configuration before use. These moments force reflection at exactly the points where incentives would otherwise encourage haste.

Importantly, friction also creates accountability. It distinguishes deliberate choices from accidental ones, making responsibility traceable rather than diffuse.
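
A friction point can be as small as a confirmation gate that records its own outcome. The sketch below is hypothetical, but it shows the two properties that matter: the action is slowed exactly once, at the moment of expansion, and the decision is attributable afterwards.

```typescript
// Hypothetical confirmation gate for irreversible, high-impact actions.
// The goal is not to block the action but to make it deliberate and traceable.

async function confirmDangerous(
  action: string,
  ask: (msg: string) => Promise<boolean>, // a UI dialog, CLI prompt, etc.
  log: (entry: string) => void
): Promise<boolean> {
  const approved = await ask(
    `"${action}" is irreversible and high-impact. Proceed?`
  );
  log(
    `${new Date().toISOString()} ${action}: ${approved ? "approved" : "declined"}`
  );
  return approved;
}
```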

Making Unsafe Configurations Visible, Auditable, and Costly

Unsafe configurations should not be silent. When users disable safeguards, broaden permissions, or bypass constraints, those actions should be visible and recorded. Auditability changes behaviour. It makes risk legible to operators, reviewers, and—in regulated environments—external stakeholders.

Moreover, unsafe configurations should carry explicit cost. That cost may be operational, reputational, or procedural, but it should be real. Systems that allow users to remove protections effortlessly invite normalisation of risk. Systems that require justification, logging, and review discourage casual erosion.
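
Concretely, that might look like the following hypothetical sketch, where disabling a protection is possible but never silent, never anonymous, and never free of justification:

```typescript
// Hypothetical audit trail for safeguard overrides. The override succeeds
// only after a justified, attributable record exists.

interface OverrideRecord {
  safeguard: string;
  actor: string;
  justification: string; // required, not optional metadata
  timestamp: string;
}

const auditLog: OverrideRecord[] = [];

function disableSafeguard(
  safeguard: string,
  actor: string,
  justification: string
): void {
  if (!justification.trim()) {
    throw new Error("safeguard overrides require a written justification");
  }
  auditLog.push({
    safeguard,
    actor,
    justification,
    timestamp: new Date().toISOString(),
  });
  // ...the actual disabling happens only after the record exists.
}
```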

Responsible AI is not defined by the absence of danger, but by how systems behave when danger is present. Designs that surface risk, slow its expansion, and attach consequences to unsafe choices do not rely on perpetual vigilance or good faith. They encode responsibility into the system itself.

This is what it looks like when safety is treated not as an aspiration, but as an engineering discipline.

Conclusion: Safety That Survives Contact With Reality

The recurring theme throughout contemporary AI safety discourse is moral seriousness. Leaders speak earnestly about responsibility, long-term risk, and the need for caution. These signals matter—but they are not enough. In complex systems operating at scale, seriousness of intent does not translate automatically into safety of outcome.

What ultimately governs behaviour is not what institutions believe, but what their systems permit.

Moral framing without constraint creates a fragile form of safety—one that holds only so long as incentives remain aligned with caution. History suggests this is a narrow window. As competition intensifies and capabilities expand, pressure mounts to relax safeguards, defer hard decisions, and externalise responsibility. In those moments, intent offers little resistance.

This is why alignment that vanishes under pressure is not alignment at all. Alignment that depends on continuous restraint, perfect downstream implementation, or perpetual good faith is indistinguishable from alignment that never existed. Real alignment persists when it is inconvenient—when it constrains action even as incentives pull in the opposite direction.

Until incentives change, safety rhetoric will remain aspirational. Essays, policies, and research agendas can articulate ideals, but they cannot substitute for architectures, defaults, and governance mechanisms that make unsafe behaviour difficult by design. Where safety competes directly with growth, speed, or market share, it will lose unless it is structurally enforced.

The central risk, then, is not that AI technology is immature or adolescent. It is that institutions deploying it are behaving as though responsibility can be postponed—managed later through reflection, iteration, or goodwill. In high-stakes systems, delay is not neutrality. It is a decision with consequences.

Safety that survives contact with reality is not accidental. It is built into systems, absorbed into incentives, and enforced by defaults that resist erosion. Anything less is a promise that holds only until it is tested. And at the scale AI now operates, that test has already begun.


Now... if you got this far and are thinking to yourself, "what the hell is he talking about?": this essay is a response to Dario Amodei's "The Adolescence of Technology". I did consider titling it "Sorry, Dario", but I thought that too on the nose.
