Cryptographic and auxiliary approaches relevant for AI safety
[This is part 4 of a 5-part sequence on security and cryptography areas relevant for AI safety, published and linked here a few days apart.]
Apart from the security considerations for AI safety discussed in earlier parts of this sequence, are there tools and lessons from cryptography that could provide useful starting points for new approaches to AI safety?
Cryptographic approaches and auxiliary techniques for AI safety and privacy
In Secure, privacy-preserving and federated machine learning in medical imaging, Georgios Kaissis and co-authors introduce the terms ‘secure AI’ and ‘privacy-preserving AI’ to refer, respectively, to methods that protect algorithms and methods that enable data processing without revealing the data itself. To achieve sovereignty over the input data and algorithms, and to ensure the integrity of the computational process and its results, they propose techniques that resist identity or membership inference, feature/attribute re-derivation, and data theft.
For instance, federated learning is a decentralized approach to machine learning that distributes copies of the algorithm to nodes where the data is stored, allowing local training while retaining data ownership. However, it alone is not enough to guarantee security and privacy, so federated learning should be paired with other measures.
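To make the pattern concrete, here is a minimal sketch of federated averaging on synthetic data, assuming a simple linear model and NumPy. It is a toy illustration of the idea that only model updates, never the raw data, leave each node, not the pipeline described by Kaissis et al.

import numpy as np

rng = np.random.default_rng(0)

# Toy setup: three hospitals each hold private data that never leaves the node.
true_w = np.array([2.0, -1.0])
local_datasets = []
for _ in range(3):
    X = rng.normal(size=(100, 2))
    y = X @ true_w + rng.normal(scale=0.1, size=100)
    local_datasets.append((X, y))

def local_update(w, X, y, lr=0.1, epochs=5):
    """Gradient descent on one node's private data; only the updated weights are returned."""
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w = w - lr * grad
    return w

# Federated averaging: the coordinator only ever sees weight vectors, not data.
w_global = np.zeros(2)
for _ in range(10):
    local_weights = [local_update(w_global.copy(), X, y) for X, y in local_datasets]
    w_global = np.mean(local_weights, axis=0)

print("recovered weights:", w_global)  # close to true_w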
Differential privacy helps resist re-identification attacks, for instance by shuffling input data and/or adding noise to the dataset or to computation results, but may result in a reduction in algorithm utility, especially in areas with little data. Homomorphic encryption allows computation on encrypted data while preserving structure, but can be computationally expensive. Secure multi-party computation enables joint computation over private inputs, but requires continuous data transfer and online availability. Secure hardware implementations provide hardware-level privacy guarantees.
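To illustrate the differential privacy point, here is a minimal sketch of the Laplace mechanism applied to a counting query; the epsilon values are arbitrary, and the output makes the utility trade-off mentioned above visible.

import numpy as np

rng = np.random.default_rng(1)

def dp_count(values, predicate, epsilon):
    """Differentially private count: a counting query has sensitivity 1,
    so adding Laplace noise with scale 1/epsilon satisfies epsilon-DP."""
    true_count = sum(1 for v in values if predicate(v))
    return true_count + rng.laplace(loc=0.0, scale=1.0 / epsilon)

ages = rng.integers(18, 90, size=10_000)
over_65 = lambda a: a >= 65

for eps in (0.1, 1.0, 10.0):
    noisy = dp_count(ages, over_65, eps)
    print(f"epsilon={eps:>4}: noisy count = {noisy:8.1f}")
# Smaller epsilon means stronger privacy but noisier answers;
# this is the utility loss noted above, and it hurts most on small datasets.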
As deep learning is increasingly implemented at the hardware level, privacy guarantees built directly into hardware are likely to become more important. While these methods all have their limitations, they offer a promising avenue for novel approaches to computation on large datasets while maintaining privacy.
These techniques are especially useful in areas that traditional AI struggles with. They could unlock entirely new approaches to using data, for instance on problems that require sensitive data such as individuals’ health or financial information. Because these techniques allow actors to compute collaboratively on data without sharing the data itself, they may favor collaboration over competition on certain problems. Finally, some of the newly developed approaches may have generally safety-enhancing properties.
In Building Safe AI, Andrew Trask focuses on the potential of cryptographic approaches for AI safety. He suggests using homomorphic encryption to fully encrypt a neural network. This would have two valuable properties: first, the intelligence of the network would be safeguarded against theft, enabling valuable AI to be trained in insecure environments; second, the network could only generate encrypted predictions, which cannot impact the outside world without a secret key. This creates a useful power imbalance between humans and AI: if the AI is homomorphically encrypted, then from its perspective the outside world appears homomorphically encrypted. The human who controls the secret key could unlock the AI itself or merely individual predictions the AI makes, the latter being the safer alternative.
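Trask's post walks through an integer-vector homomorphic encryption scheme; as a simpler stand-in, the sketch below uses textbook Paillier encryption (additively homomorphic only, with toy parameters that are nowhere near secure) to show the property he is after: an encrypted linear model can score plaintext inputs, but the resulting predictions are unreadable without the secret key.

import math, random

def paillier_keygen(p=293, q=433):
    """Textbook Paillier with tiny demo primes, illustrative only, not secure."""
    n, n2 = p * q, (p * q) ** 2
    g = n + 1
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow((pow(g, lam, n2) - 1) // n, -1, n)        # modular inverse (Python 3.8+)
    return (n, g), (lam, mu, n)

def encrypt(pub, m):
    n, g = pub
    n2 = n * n
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(g, m % n, n2) * pow(r, n, n2)) % n2

def decrypt(priv, c):
    lam, mu, n = priv
    n2 = n * n
    return ((pow(c, lam, n2) - 1) // n) * mu % n

pub, priv = paillier_keygen()

# The model owner encrypts the weights of a toy linear scorer.
weights = [3, 1, 2]
enc_weights = [encrypt(pub, w) for w in weights]

# Anyone can compute an encrypted score on a plaintext input:
# E(w1)^x1 * E(w2)^x2 * ... = E(w1*x1 + w2*x2 + ...)   (additive homomorphism)
x = [2, 0, 5]
n2 = pub[0] ** 2
enc_score = 1
for ew, xi in zip(enc_weights, x):
    enc_score = (enc_score * pow(ew, xi, n2)) % n2

# Only the secret-key holder can read the prediction.
print(decrypt(priv, enc_score))  # 3*2 + 1*0 + 2*5 = 16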
There remain plenty of open challenges for the techniques discussed, including decentralized data storage, functional encryption, adversarial testing, and the computational bottlenecks that still make these approaches prohibitively slow and expensive.
Structured transparency techniques for AI governance
In Exploring the Relevance of Data Privacy-Enhancing Technologies for AI Governance Use Cases, Emma Bluemke and co-authors extend the use of privacy-preserving technologies to AI governance. As AI becomes more capable, it would be useful to have mechanisms for external review that don’t proliferate capabilities or proprietary information. Privacy-preserving technologies can help with these superficially conflicting goals by enabling ‘structured transparency’, the goal of allowing “the appropriate use of information while avoiding its inappropriate use”.
They show how such technologies could support domain-specific audit processes for consumer-facing AI applications, or allow for collective governance of AI models when multiple parties share creation costs, for instance by pooling data sets, computational resources, or AI research talent.
Longer term, these techniques may support the creation of ‘global regulatory markets’, a governance proposal introduced by Jack Clark and Gillian Hadfield in Regulatory Markets for AI Safety to help regulatory systems keep up with the rapid pace of AI development. These markets could rely on privacy-preserving digital networks that evaluate models locally and share only the evaluation results with evaluators. Evaluators could then verify whether models meet agreed-on standards for specific use cases without needing to know the intricacies of the model. On the front end, model users could check whether their models meet the required standards for the application they are building.
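As a rough sketch of the intended information flow, the following toy ‘evaluation attestation’ commits to a model with a hash and releases only an aggregate result. All names, thresholds, and the toy model are invented for illustration, and a real deployment would need secure hardware or cryptographic proofs rather than trusting the model owner to run the evaluation honestly.

import hashlib
import json

# Hypothetical standard: the model must refuse at least 95% of a set of
# disallowed prompts. Names and thresholds here are illustrative assumptions.
STANDARD = {"name": "refusal-benchmark-v0", "min_refusal_rate": 0.95}

def commit(model_bytes: bytes) -> str:
    """Commit to the model without revealing it (hash of the serialized weights)."""
    return hashlib.sha256(model_bytes).hexdigest()

def evaluate_locally(model, prompts) -> float:
    """Runs inside the model owner's (or a trusted enclave's) environment;
    only the aggregate score leaves it."""
    refusals = sum(1 for p in prompts if model(p) == "refuse")
    return refusals / len(prompts)

def attestation(model, model_bytes, prompts):
    score = evaluate_locally(model, prompts)
    return {
        "standard": STANDARD["name"],
        "model_commitment": commit(model_bytes),
        "refusal_rate": score,
        "passes": score >= STANDARD["min_refusal_rate"],
    }

# Toy "model" standing in for a real system.
toy_model = lambda prompt: "refuse" if "disallowed" in prompt else "comply"
prompts = [f"disallowed request {i}" for i in range(19)] + ["benign request"]

report = attestation(toy_model, b"serialized-weights-placeholder", prompts)
print(json.dumps(report, indent=2))
# The evaluator sees only this report, never the weights or the raw outputs.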
Rather than treating an individual approach as a silver bullet, the authors suggest that interoperability across approaches is needed to avoid partial solutions, underscoring the significance of an ecosystem approach to AI safety/security.
Smart contracts and blockchains for AI safety and principal-agent design
Are there any analogies from cryptocommerce that could be useful for designing secure AI environments?
In A Tour of Emerging Cryptographic Technologies, Ben Garfinkel points out that AI safety and smart contract design both need to solve unintended behavior in complex environments. Even if they behave appropriately in a limited set of environments, both may cause accidents when deployed in a wider range of real-world environments. For AI, such accidents look like the 2010 ‘flash crash’, in which automated trading systems briefly wiped out around a trillion dollars of stock market value. For smart contracts, the fact that they are generally inflexible once deployed can lead to consequences such as the hack of the $150 million DAO, which prompted an early fork of Ethereum. By monitoring failures in the smart contracting ecosystem, we may be able to collect useful lessons for future AI failure cases.
Can blockchains alone provide any useful starting points for AI safety? In Incomplete Contracts and AI Alignment, Gillian Hadfield and Dylan Hadfield-Menell explore similarities and differences between human and AI cooperation. For human cooperation, a completely specified contract could, in theory, perfectly implement the desired behavior of all parties. In reality, humans cannot anticipate the optimal action in every possible state of the world the contract will unfold in without incurring prohibitive costs when drafting the contract itself. Instead, real-world contracting is often supported by external formal and informal structures, such as law and culture, which provide the implied terms that fill the gaps when necessary.
In Welcome New Players, we suggest that informal cooperation mechanisms, such as human signaling behaviors, work because humans have a bounded ability to fake their true motivations. But AI systems can already deceive humans, and future AIs may develop deceptive capabilities that we cannot detect.
The blockchain ecosystem offers potentially useful analogies in terms of information transparency and incorruptibility. It provides credibility that entities operate based on their visible programs. In theory, an AI system running on a blockchain could be transparent, with its internal workings verifiable by anyone. It may be possible to divide some human-AI interactions into encapsulated parts with the ability to fake, and transparent parts on a public blockchain with no ability to fake. Such mixed arrangements may replicate stabilizing aspects of the human world, including norms, signaling, and the limited ability to fake.
There may be some parallels between designing reliable human-AI ecosystems and designing reliable smart contract and blockchain systems. In practice, though, blockchains and the smart contracts running on them are subject to efficiency and scalability constraints, making it unlikely that any non-trivial AI system could actually be run as a smart contract. Nevertheless, the example reiterates the importance of designing integrated security networks that span multiple systems, requiring checks and balances that account for interactions between humans and AI systems.
Computational and ML marketplaces
In Markets and Computation, Mark S. Miller and Eric Drexler explore how market mechanisms might be applied to organizing computation in large systems. They show how, in human society, market mechanisms such as trade and price signals compose local decisions and knowledge from diverse parties into globally effective behavior. Today’s society already relies on human-to-human and human-to-computer interactions, and it may be possible to extend today’s market mechanisms to include increasingly advanced AIs. Humans could employ AIs via pricing and trade systems to meet their needs and assess their success. If AI systems can evaluate success independently, they may in turn employ humans to solve problems that demand human knowledge. By pooling knowledge from a sea of human and computational objects, a higher collective intelligence may be achieved.
There are a few areas where centralized AI systems are at a disadvantage compared to more decentralized AI designs. The cryptocommerce ecosystem is developing some approaches in this space, but here I am much less certain they will lead to safety-enhancing features:
In The Long-Tail Problem in AI, and How Autonomous Markets Can Solve It, Ali Yahya points out that, thanks to economies of scale, AI systems are very good at large-scale data collection. But as AI applications become more ambitious and neural networks grow deeper, finding data for less common edge cases becomes harder. He thinks crypto systems may hold a competitive advantage in solving these cases through economic rewards that incentivize individuals to bring their hard-to-get local knowledge to the problems.
It may be possible to embed a neural network, such as an image classifier, into a smart contract. To collect the required training data, including exotic outlier information, a two-sided market can be created using smart contracts. On the supply side, people with access to prized data can be paid to contribute it to the neural net, while on the demand side, developers can pay a fee to use the net as an API. He hopes that such markets can make the knowledge and resources needed to innovate in AI more accessible.
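The contract logic amounts to a two-sided escrow around the model. The sketch below is a plain-Python stand-in for what would run on-chain (an actual version would be a smart contract, for instance in Solidity, with token payments); all names, prices, and the trivial ‘model’ are invented for illustration.

class DataMarket:
    """Toy two-sided market around a model: suppliers are paid for training data,
    consumers pay per prediction. A real version would be an on-chain contract."""

    def __init__(self, price_per_query=2, reward_per_example=1):
        self.price_per_query = price_per_query
        self.reward_per_example = reward_per_example
        self.balances = {}        # address -> tokens earned or spent
        self.training_data = []   # contributed examples (supply side)
        self.treasury = 0         # fees collected from consumers

    def contribute(self, supplier, example, label):
        """Supply side: reward a contributor for a (rare) labeled example."""
        self.training_data.append((example, label))
        self.balances[supplier] = self.balances.get(supplier, 0) + self.reward_per_example
        self.treasury -= self.reward_per_example

    def predict(self, consumer, example):
        """Demand side: charge a fee, then answer with a trivial nearest-label rule
        standing in for the embedded neural network."""
        self.treasury += self.price_per_query
        self.balances[consumer] = self.balances.get(consumer, 0) - self.price_per_query
        if not self.training_data:
            return None
        closest = min(self.training_data, key=lambda d: abs(d[0] - example))
        return closest[1]

market = DataMarket()
market.contribute("alice", 0.9, "edge-case")
market.contribute("bob", 0.1, "common-case")
print(market.predict("app-developer", 0.85))   # "edge-case"
print(market.balances, market.treasury)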
Similarly, in Blockchain-based Machine Learning Marketplaces, Fred Ehrsam combines crypto systems for incentivizing local data with private AI techniques. He suggests combining private machine learning, which allows training on sensitive data, with blockchain-based incentives that attract better data and models. As a result, marketplaces may become possible where individuals can sell their private data while retaining its privacy, and developers can use incentives to attract better data for their algorithms.
Given the risks associated with unchecked open-sourcing of AI capabilities, it is unclear whether such highly decentralized systems would be safety-enhancing, not least if they run on immutable blockchains, which could lead to unstoppable runaway dynamics. In a comment, Davidad proposes that if bookkeeping about rights and royalties ran on smart contracts while training remained centralized, this model would be less prone to leaking capabilities while still enabling fair compensation for individuals who provide training data.
A potential benefit of such designs is that crypto networks already encourage users to collaborate on and verify the decentralized computer they collectively secure, leading to a system of checks and balances. In addition, by encouraging specialization and diversity, such systems could hedge against the risk of a singleton AGI takeover, which may suffer from single points of failure or lead to value lock-in. Either way, given that these decentralized approaches are actively being explored, it could be useful to follow and learn from their development.
[This is part 4 of a 5-part sequence on security and cryptography areas relevant for AI safety. Part 5 gives a summary of next steps, including career and research directions.]