AI Safety Paradox
Under several reasonable assumptions, superintelligence will actually favor defenders in the attacker-defender asymmetries that arise in biological or cyber warfare.
As the marginal cost of intelligence goes way down, many more attack vectors can be found via red-teaming, and systems can be hardened or inoculated until all relevant attack vectors are covered. While this used to be impossible for most complex systems, superintelligence will make it much more viable.
Assumptions:
The marginal cost of intelligence is going down.
There is a large but finite number of attack vectors N_d for a given system to be defended against.
Attackers have a budget B_a and defenders have a budget B_d.
There is a cost c to defend against or exploit an attack vector, and it is much smaller than the total budget on each side (c << B_a and c << B_d), so in principle many attacks can be found and addressed.
This cost is largely driven by the cost of intelligence.
Initial Intuition:
The cost of one attack is much smaller than the budget on each side. Attackers only have to be right once, whereas defenders have to be right every time, at a total cost of roughly c * N_d.
Hence, many people currently assume the attacker-defender asymmetry exists and only gets worse as intelligence becomes cheaper.
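To make this intuition explicit in the notation of the assumptions above (the formalization is mine; nothing in the argument depends on it), the two sides face very different affordability conditions:

```latex
\documentclass{article}
\usepackage{amsmath}
\begin{document}
% Naive asymmetry: the attacker only needs to afford one exploit,
% the defender needs to afford covering all N_d vectors.
\[
\underbrace{B_a \ge c}_{\text{attacker can afford one exploit}}
\qquad \text{vs.} \qquad
\underbrace{B_d \ge c \, N_d}_{\text{defender can afford to cover every vector}}
\]
Since $c \ll B_a$, the attacker's condition is trivially met, while the
defender's condition has historically failed for complex systems because
$c \, N_d \gg B_d$.
\end{document}
```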
Paradox:
What will likely happen instead is the following. If you divide the total defender budget by c, you get the number of threat vectors you can currently afford to defend against proactively: N = B_d/c.
As c -> 0, N = B_d/c eventually reaches and exceeds N_d. In other words, at that point we can inoculate a system and defend against all potential attacks. The lower c, the more defenses can be built and the more we can harden a system.
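Here is a minimal numerical sketch of that crossover. The budget, the number of attack vectors, and the sequence of falling costs are all made-up placeholder values, not estimates:

```python
# Sketch: fraction of attack vectors a defender can proactively cover
# as the per-vector cost of intelligence c falls. All numbers are illustrative.

B_d = 1e9        # hypothetical defender budget
N_d = 40_000     # hypothetical finite number of relevant attack vectors

def coverage(c: float) -> float:
    """Fraction of the N_d vectors the defender can afford to cover."""
    n_affordable = B_d / c           # N = B_d / c from the argument above
    return min(1.0, n_affordable / N_d)

# Sweep c downward: as intelligence gets cheaper, coverage approaches 100%.
for c in [1e8, 1e6, 1e5, 1e4, 1e2]:
    print(f"c = {c:>12,.0f}  ->  coverage = {coverage(c):6.1%}")
```

Once B_d / c exceeds N_d, further cost reductions no longer buy more coverage; the system is fully inoculated.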
Examples:
Biological Defense: An ASI could model every possible mutation of pathogens that can enter the human body and preemptively develop countermeasures—vaccines, treatments, or containment strategies—before an outbreak occurs.
Cybersecurity: In the realm of digital systems, an ASI could simulate all conceivable attack strategies against a network, identify bugs and vulnerabilities in real time, and patch them instantaneously.
Additional Notes:
If attackers and defenders have access to the same open source intelligence or a similar level of reasoning, it will be even easier to predict potential attack vectors and then secure systems proactively. Open source makes it much more likely this paradox will play out benevolently.
However, if attackers can reduce their costs massively and defenders cannot, this argument breaks down.
In most cases, we can assume that B_d > B_a: the budget for defense is higher than for attack, e.g., more money is available to save humanity from deadly viruses than is being spent on creating deadly viruses.
It is also fair to assume that the cost of an attack is lower than that of a defense. This doesn't matter in the limit, though, if both costs go down towards zero; the short derivation after these notes spells this out.
If one assumes there are infinitely many attack vectors for a network or the human body, the paradox cannot play out well. But I'd argue that each has a finite limit, e.g.:
While the maximum number of theoretical viruses reaches into the millions, viable variations are limited to approximately 40,000 due to structural constraints, binding requirements, and evolutionary stability factors (according to a ydc compute agent analysis). https://you.com/search?q=How+many+viruses+could+there+be+that+can+attack+the+human+body+given+the+number+of+binding+sites+or...&cid=c1_0e4c322b-dd52-457b-bd3a-e35cc411b5e3
In p(doom) sci-fi stories, fiction authors sometimes assume near-magical capabilities, like a virus that can lie dormant for years until it reaches all humans, stays completely undetected by everyone, and then overnight gets triggered to kill everybody instantly. If such a near-magical AI were available, I'd have used it to create a near-magical vaccine against all such viruses, given the cost is very low. Near-magical capabilities are generally highly unrealistic.
This is somewhat connected to the Jevons paradox (https://www.socher.org/thoughts/jevons-paradox-of-llms).
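To spell out the note above about attack being cheaper than defense (the cost ratio k is my own notation, not something the argument requires): suppose each defense costs k times as much as each attack, with k > 1 held fixed as intelligence gets cheaper.

```latex
\documentclass{article}
\usepackage{amsmath}
\begin{document}
% c_a: per-vector cost of an attack; c_d = k * c_a: per-vector cost of a defense,
% with a fixed ratio k > 1 (defense is more expensive per vector).
\[
N \;=\; \frac{B_d}{c_d} \;=\; \frac{B_d}{k \, c_a}
\qquad \Longrightarrow \qquad
N \to \infty \;>\; N_d \quad \text{as } c_a \to 0 .
\]
% As long as the cost ratio k stays bounded while the cost of intelligence falls,
% the defender still eventually covers all N_d vectors.
\end{document}
```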
I look forward to well-reasoned arguments about where this reasoning might be wrong.