Here’s a ton of vaguely interesting sounding papers on my semanticscholar feed today—many of these are not on my mainline but are very interesting hunchbuilding about how to make cooperative systems—sorry about the formatting, I didn’t want to spend time format fixing, hence why this is in shortform. I read the abstracts, nothing more.
As usual with my paper list posts: you’re gonna want tools to keep track of big lists of papers to make use of this! see also my other posts for various times I’ve mentioned such tools eg semanticscholar’s recommender (which you use by adding papers to folders—it’s not on by default if you don’t have an account, and I don’t mean their search bar), or bring your own.
The complexity of designing reward functions has been a major obstacle to the wide application of deep reinforcement learning (RL) techniques. Describing an agent’s desired behaviors and properties can be difficult, even for experts. A new paradigm called reinforcement learning from human preferences (or preference-based RL) has emerged as a promising solution, in which reward functions are learned from human preference labels among behavior trajectories. However, existing methods for preference-based RL are limited by the need for accurate oracle preference labels. This paper addresses this limitation by developing a method for crowd-sourcing preference labels and learning from diverse human preferences. The key idea is to stabilize reward learning through regularization and correction in a latent space. To ensure temporal consistency, a strong constraint is imposed on the reward model that forces its latent space to be close to the prior distribution. Additionally, a confidence-based reward model ensembling method is designed to generate more stable and reliable predictions. The proposed method is tested on a variety of tasks in DMcontrol and Meta-world and has shown consistent and significant improvements over existing preference-based RL algorithms when learning from diverse feedback, paving the way for real-world applications of RL methods.
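The confidence-based ensembling is the piece that’s easiest to picture; a toy sketch of one way it could work, going only off the abstract (the weighting rule here is my guess, not theirs):

```python
import numpy as np

def ensemble_preference(prob_a_preferred):
    """Combine per-member predictions that trajectory A beats trajectory B.
    Members sitting near 0.5 ("don't know") get little weight.
    Illustrative sketch only; the paper's actual weighting may differ."""
    p = np.asarray(prob_a_preferred, dtype=float)
    confidence = np.abs(p - 0.5) + 1e-6       # distance from maximal uncertainty
    weights = confidence / confidence.sum()
    return float(np.dot(weights, p))

# an unsure ensemble member barely moves the combined prediction
print(ensemble_preference([0.9, 0.85, 0.55]))   # ~0.86
```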
Behavioral scientists have classically documented aversion to algorithmic decision aids, from simple linear models to AI. Sentiment, however, is changing and possibly accelerating AI helper usage. AI assistance is, arguably, most valuable when humans must make complex choices. We argue that classic experimental methods used to study heuristics and biases are insufficient for studying complex choices made with AI helpers. We adapted an experimental paradigm designed for studying complex choices in such contexts. We show that framing and anchoring effects impact how people work with an AI helper and are predictive of choice outcomes. The evidence suggests that some participants, particularly those in a loss frame, put too much faith in the AI helper and experienced worse choice outcomes by doing so. The paradigm also generates computational modeling-friendly data allowing future studies of human-AI decision making.
Reward functions are notoriously difficult to specify, especially for tasks with complex goals. Reward learning approaches attempt to infer reward functions from human feedback and preferences. Prior works on reward learning have mainly focused on the performance of policies trained alongside the reward function. This practice, however, may fail to detect learned rewards that are not capable of training new policies from scratch and thus do not capture the intended behavior. Our work focuses on demonstrating and studying the causes of these relearning failures in the domain of preference-based reward learning. We demonstrate with experiments in tabular and continuous control environments that the severity of relearning failures can be sensitive to changes in reward model design and the trajectory dataset composition. Based on our findings, we emphasize the need for more retraining-based evaluations in the literature.
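The retraining-based evaluation they’re asking for is simple to write down as a scaffold; every helper here is a hypothetical callable supplied by the user, this is just the shape of the protocol:

```python
def retraining_based_eval(pref_dataset, train_reward, train_policy, true_return):
    """Sketch of a retraining-based evaluation for a learned reward model.
    All four arguments are hypothetical callables, not any specific codebase.

    1. Fit a reward model to the preference dataset.
    2. Train a *fresh* policy from scratch against the learned reward only
       (no warm start, no data shared with the reward-fitting run).
    3. Score that policy under the true task reward it never saw.
    A reward that only worked for the policy trained alongside it shows up
    here as a low score."""
    reward_model = train_reward(pref_dataset)
    fresh_policy = train_policy(reward_model)
    return true_return(fresh_policy)
```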
Existing algorithms for ensuring fairness in AI use a single-shot training strategy, where an AI model is trained on an annotated training dataset with sensitive attributes and then fielded for utilization. This training strategy is effective in problems with stationary distributions, where both training and testing data are drawn from the same distribution. However, it is vulnerable with respect to distributional shifts in the input space that may occur after the initial training phase. As a result, the time-dependent nature of data can introduce biases into the model predictions. Model retraining from scratch using a new annotated dataset is a naive solution that is expensive and time-consuming. We develop an algorithm to adapt a fair model to remain fair under domain shift using solely new unannotated data points. We recast this learning setting as an unsupervised domain adaptation problem. Our algorithm is based on updating the model such that the internal representation of data remains unbiased despite distributional shifts in the input space. We provide extensive empirical validation on three widely employed fairness datasets to demonstrate the effectiveness of our algorithm.
Deep neural networks have seen enormous success in various real-world applications. Beyond their predictions as point estimates, increasing attention has been focused on quantifying the uncertainty of their predictions. In this review, we show that the uncertainty of deep neural networks is not only important in a sense of interpretability and transparency, but also crucial in further advancing their performance, particularly in learning systems seeking robustness and efficiency. We will generalize the definition of the uncertainty of deep neural networks to any number or vector that is associated with an input or an input-label pair, and catalog existing methods on “mining” such uncertainty from a deep model. We will include those methods from the classic field of uncertainty quantification as well as those methods that are specific to deep neural networks. We then show a wide spectrum of applications of such generalized uncertainty in realistic learning tasks including robust learning such as noisy learning, adversarially robust learning; data-efficient learning such as semi-supervised and weakly-supervised learning; and model-efficient learning such as model compression and knowledge distillation.
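For the “mining uncertainty from a deep model” framing, the simplest concrete instance is ensemble disagreement (a standard technique, not something specific to this review):

```python
import numpy as np

def ensemble_uncertainty(member_probs):
    """Predictive mean plus two simple uncertainty scores from an ensemble.
    member_probs: list of per-member class-probability vectors for one input.
    Generic sketch, not taken from the review."""
    probs = np.stack(member_probs)                           # (members, classes)
    mean = probs.mean(axis=0)
    entropy = float(-np.sum(mean * np.log(mean + 1e-12)))    # total predictive uncertainty
    disagreement = float(probs.std(axis=0).mean())           # spread across members
    return mean, entropy, disagreement

members = [np.array([0.7, 0.2, 0.1]),
           np.array([0.4, 0.5, 0.1]),
           np.array([0.6, 0.3, 0.1])]
print(ensemble_uncertainty(members))
```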
Practical uses of Artificial Intelligence (AI) in the real world have demonstrated the importance of embedding moral choices into intelligent agents. They have also highlighted that defining top-down ethical constraints on AI according to any one type of morality is extremely challenging and can pose risks. A bottom-up learning approach may be more appropriate for studying and developing ethical behavior in AI agents. In particular, we believe that an interesting and insightful starting point is the analysis of emergent behavior of Reinforcement Learning (RL) agents that act according to a predefined set of moral rewards in social dilemmas. In this work, we present a systematic analysis of the choices made by intrinsically-motivated RL agents whose rewards are based on moral theories. We aim to design reward structures that are simplified yet representative of a set of key ethical systems. Therefore, we first define moral reward functions that distinguish between consequence- and norm-based agents, between morality based on societal norms or internal virtues, and between single- and mixed-virtue (e.g., multi-objective) methodologies. Then, we evaluate our approach by modeling repeated dyadic interactions between learning moral agents in three iterated social dilemma games (Prisoner’s Dilemma, Volunteer’s Dilemma and Stag Hunt). We analyze the impact of different types of morality on the emergence of cooperation, defection or exploitation, and the corresponding social outcomes. Finally, we discuss the implications of these findings for the development of moral agents in artificial and mixed human-AI societies.
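To make the consequentialist vs. norm-based distinction concrete, here’s a toy encoding of “moral reward functions” for a one-shot prisoner’s dilemma (my own made-up numbers and rules, not the paper’s definitions):

```python
# Row player's payoff in a one-shot prisoner's dilemma: (my action, opponent's action)
PD = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

def selfish_reward(me, other):
    return PD[(me, other)]

def consequentialist_reward(me, other):
    # utilitarian flavour: value the sum of both players' payoffs
    return PD[(me, other)] + PD[(other, me)]

def norm_based_reward(me, other, penalty=4):
    # keep your own payoff, but pay a fixed price for defecting on a cooperator
    return PD[(me, other)] - (penalty if (me, other) == ("D", "C") else 0)

for me in "CD":
    for other in "CD":
        print(me, other, selfish_reward(me, other),
              consequentialist_reward(me, other), norm_based_reward(me, other))
```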
Neural sequence generation models are known to “hallucinate”, by producing outputs that are unrelated to the source text. These hallucinations are potentially harmful, yet it remains unclear in what conditions they arise and how to mitigate their impact. In this work, we first identify internal model symptoms of hallucinations by analyzing the relative token contributions to the generation in contrastive hallucinated vs. non-hallucinated outputs generated via source perturbations. We then show that these symptoms are reliable indicators of natural hallucinations, by using them to design a lightweight hallucination detector which outperforms both model-free baselines and strong classifiers based on quality estimation or large pre-trained models on manually annotated English-Chinese and German-English translation test beds.
Neural networks drive the success of natural language processing. A fundamental property of natural languages is their compositional structure, allowing us to describe new meanings systematically. However, neural networks notoriously struggle with systematic generalization and do not necessarily benefit from a compositional structure in emergent communication simulations. Here, we test how neural networks compare to humans in learning and generalizing a new language. We do this by closely replicating an artificial language learning study (conducted originally with human participants) and evaluating the memorization and generalization capabilities of deep neural networks with respect to the degree of structure in the input language. Our results show striking similarities between humans and deep neural networks: More structured linguistic input leads to more systematic generalization and better convergence between humans and neural network agents and between different neural agents. We then replicate this structure bias found in humans and our recurrent neural networks with a Transformer-based large language model (GPT-3), showing a similar benefit for structured linguistic input regarding generalization systematicity and memorization errors. These findings show that the underlying structure of languages is crucial for systematic generalization. Due to the correlation between community size and linguistic structure in natural languages, our findings underscore the challenge of automated processing of low-resource languages. Nevertheless, the similarity between humans and machines opens new avenues for language evolution research.
In multi-agent environments in which coordination is desirable, the history of play often causes lock-in at sub-optimal outcomes. Notoriously, technologies with significant environmental footprint or high social cost persist despite the successful development of more environmentally friendly and/or socially efficient alternatives. The displacement of the status quo is hindered by entrenched economic interests and network effects. To exacerbate matters, the standard mechanism design approaches based on centralized authorities with the capacity to use preferential subsidies to effectively dictate system outcomes are not always applicable to modern decentralised economies. What other types of mechanisms are feasible? In this paper, we develop and analyze a mechanism which induces transitions from inefficient lock-ins to superior alternatives. This mechanism does not exogenously favor one option over another – instead, the phase transition emerges endogenously via a standard evolutionary learning model, Q-learning, where agents trade off exploration and exploitation. Exerting the same transient influence to both the efficient and inefficient technologies encourages exploration and results in irreversible phase transitions and permanent stabilization of the efficient one. On a technical level, our work is based on bifurcation and catastrophe theory, a branch of mathematics that deals with changes in the number and stability properties of equilibria. Critically, our analysis is shown to be structurally robust to significant and even adversarially chosen perturbations to the parameters of both our game and our behavioral model.
The increasing complexity of AI systems has led to the growth of the field of explainable AI (XAI), which aims to provide explanations and justifications for the outputs of AI algorithms. These methods mainly focus on feature importance and identifying changes that can be made to achieve a desired outcome. Researchers have identified desired properties for XAI methods, such as plausibility, sparsity, causality, low run-time, etc. The objective of this study is to conduct a review of existing XAI research and present a classification of XAI methods. The study also aims to connect XAI users with the appropriate method and relate desired properties to current XAI approaches. The outcome of this study will be a clear strategy that outlines how to choose the right XAI method for a particular goal and user and provide a personalized explanation for users.
Lipschitz bounded neural networks are certifiably robust and have a good trade-off between clean and certified accuracy. Existing Lipschitz bounding methods train from scratch and are limited to moderately sized networks (<6M parameters). They require a fair amount of hyper-parameter tuning and are computationally prohibitive for large networks like Vision Transformers (5M to 660M parameters). Obtaining certified robustness of transformers is not feasible due to the non-scalability and inflexibility of the current methods. This work presents CertViT, a two-step proximal-projection method to achieve certified robustness from pre-trained weights. The proximal step tries to lower the Lipschitz bound and the projection step tries to maintain the clean accuracy of pre-trained weights. We show that CertViT networks have better certified accuracy than state-of-the-art Lipschitz trained networks. We apply CertViT on several variants of pre-trained vision transformers and show adversarial robustness using standard attacks. Code: https://github.com/sagarverma/transformer-lipschitz
Learning from raw high dimensional data via interaction with a given environment has been effectively achieved through the utilization of deep neural networks. Yet the observed degradation in policy performance caused by imperceptible worst-case policy dependent translations along high sensitivity directions (i.e. adversarial perturbations) raises concerns on the robustness of deep reinforcement learning policies. In our paper, we show that these high sensitivity directions do not lie only along particular worst-case directions, but rather are more abundant in the deep neural policy landscape and can be found via more natural means in a black-box setting. Furthermore, we show that vanilla training techniques intriguingly result in learning more robust policies compared to the policies learnt via the state-of-the-art adversarial training techniques. We believe our work lays out intriguing properties of the deep reinforcement learning policy manifold and our results can help to build robust and generalizable deep reinforcement learning policies.
Cost functions are commonly employed in Safe Deep Reinforcement Learning (DRL). However, the cost is typically encoded as an indicator function due to the difficulty of quantifying the risk of policy decisions in the state space. Such an encoding requires the agent to visit numerous unsafe states to learn a cost-value function to drive the learning process toward safety, which increases the number of unsafe interactions and decreases sample efficiency. In this paper, we investigate an alternative approach that uses domain knowledge to quantify the risk in the proximity of such states by defining a violation metric. This metric is computed by verifying task-level properties, shaped as input-output conditions, and it is used as a penalty to bias the policy away from unsafe states without learning an additional value function. We investigate the benefits of using the violation metric in standard Safe DRL benchmarks and robotic mapless navigation tasks. The navigation experiments bridge the gap between Safe DRL and robotics, introducing a framework that allows rapid testing on real robots. Our experiments show that policies trained with the violation penalty achieve higher performance over Safe DRL baselines and significantly reduce the number of visited unsafe states.
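The shaping idea is compact enough to sketch; the property checks below are hypothetical stand-ins for the task-level input-output conditions the abstract mentions, not the paper’s framework:

```python
def shaped_reward(env_reward, state, action, properties, weight=1.0):
    """Penalize the environment reward by a violation count (illustrative sketch).

    properties: list of callables, each returning True if a task-level safety
                property holds for (state, action)."""
    violations = sum(1 for prop in properties if not prop(state, action))
    return env_reward - weight * violations

# hypothetical mapless-navigation-style properties, invented for illustration
keeps_clearance = lambda s, a: s["min_lidar"] > 0.2   # stay away from obstacles
under_speed_cap = lambda s, a: abs(a["v"]) < 1.0      # respect a velocity limit

state, action = {"min_lidar": 0.15}, {"v": 0.5}
print(shaped_reward(1.0, state, action, [keeps_clearance, under_speed_cap]))  # 1.0 - 1 = 0.0
```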
Research in Fairness, Accountability, Transparency, and Ethics (FATE) has established many sources and forms of algorithmic harm, in domains as diverse as health care, finance, policing, and recommendations. Much work remains to be done to mitigate the serious harms of these systems, particularly those disproportionately affecting marginalized communities. Despite these ongoing harms, new systems are being developed and deployed which threaten the perpetuation of the same harms and the creation of novel ones. In response, the FATE community has emphasized the importance of anticipating harms. Our work focuses on the anticipation of harms from increasingly agentic systems. Rather than providing a definition of agency as a binary property, we identify 4 key characteristics which, particularly in combination, tend to increase the agency of a given algorithmic system: underspecification, directness of impact, goal-directedness, and long-term planning. We also discuss important harms which arise from increasing agency—notably, these include systemic and/or long-range impacts, often on marginalized stakeholders. We emphasize that recognizing agency of algorithmic systems does not absolve or shift the human responsibility for algorithmic harms. Rather, we use the term agency to highlight the increasingly evident fact that ML systems are not fully under human control. Our work explores increasingly agentic algorithmic systems in three parts. First, we explain the notion of an increase in agency for algorithmic systems in the context of diverse perspectives on agency across disciplines. Second, we argue for the need to anticipate harms from increasingly agentic systems. Third, we discuss important harms from increasingly agentic systems and ways forward for addressing them. We conclude by reflecting on implications of our work for anticipating algorithmic harms from emerging systems.
Regulation, legal liabilities, and societal concerns challenge the adoption of AI in safety and security-critical applications. One of the key concerns is that adversaries can cause harm by manipulating model predictions without being detected. Regulation hence demands an assessment of the risk of damage caused by adversaries. Yet, there is no method to translate this high-level demand into actionable metrics that quantify the risk of damage. In this article, we propose a method to model and statistically estimate the probability of damage arising from adversarial attacks. We show that our proposed estimator is statistically consistent and unbiased. In experiments, we demonstrate that the estimation results of our method have a clear and actionable interpretation and outperform conventional metrics. We then show how operators can use the estimation results to reliably select the model with the lowest risk.
Pretrained large language models (LLMs) are becoming increasingly powerful and ubiquitous in mainstream applications such as being a personal assistant, a dialogue model, etc. As these models become proficient in deducing user preferences and offering tailored assistance, there is an increasing concern about the ability of these models to influence, modify and in the extreme case manipulate user preference adversarially. The issue of lack of interpretability in these models in adversarial settings remains largely unsolved. This work tries to study adversarial behavior in user preferences from the lens of attention probing, red teaming and white-box analysis. Specifically, it provides a bird’s eye view of existing literature, offers red teaming samples for dialogue models like ChatGPT and GODEL and probes the attention mechanism in the latter for non-adversarial and adversarial settings.
Human behaviors are often subject to conformity, but little research attention has been paid to social dilemmas in which players are assumed to only pursue the maximization of their payoffs. The present study proposed a generalized prisoner’s dilemma model in a signed network considering conformity. Simulation shows that conformity helps promote the imitation of cooperative behavior when positive edges dominate the network, while negative edges may impede conformity from fostering cooperation. The logic of homophily and xenophobia allows for the coexistence of cooperators and defectors and guides the evolution toward the equality of the two strategies. We also find that cooperation prevails when individuals have a higher probability of adjusting their relation signs, but conformity may mediate the effect of network adaptation. From a population-wide view, network adaptation and conformity are capable of forming the structures of attractors or repellers.
Explainable Artificial Intelligence (XAI) techniques are frequently required by users in many AI systems with the goal of understanding complex models, their associated predictions, and gaining trust. While suitable for some specific tasks during development, their adoption by organisations to enhance trust in machine learning systems has unintended consequences. In this paper we discuss XAI’s limitations in deployment and conclude that transparency along with rigorous validation are better suited to gaining trust in AI systems.
There is a recent trend of applying multi-agent reinforcement learning (MARL) to train an agent that can cooperate with humans in a zero-shot fashion without using any human data. The typical workflow is to first repeatedly run self-play (SP) to build a policy pool and then train the final adaptive policy against this pool. A crucial limitation of this framework is that every policy in the pool is optimized w.r.t. the environment reward function, which implicitly assumes that the testing partners of the adaptive policy will be precisely optimizing the same reward function as well. However, human objectives are often substantially biased according to their own preferences, which can differ greatly from the environment reward. We propose a more general framework, Hidden-Utility Self-Play (HSP), which explicitly models human biases as hidden reward functions in the self-play objective. By approximating the reward space as linear functions, HSP adopts an effective technique to generate an augmented policy pool with biased policies. We evaluate HSP on the Overcooked benchmark. Empirical results show that our HSP method produces higher rewards than baselines when cooperating with learned human models, manually scripted policies, and real humans. The HSP policy is also rated as the most assistive policy based on human feedback.
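The “approximate the reward space as linear functions” step seems easy to sketch: sample random weightings over hand-picked event features to get a pool of differently-biased partner rewards. The feature names below are invented by me for illustration, not taken from the paper:

```python
import numpy as np

# hypothetical event features one might count in an Overcooked-style episode
FEATURES = ["soups_delivered", "onions_chopped", "plates_held", "distance_moved"]

def sample_hidden_utility(rng):
    """Draw one 'biased' linear reward over event counts (illustrative sketch)."""
    w = rng.normal(size=len(FEATURES))
    return lambda event_counts: float(np.dot(w, event_counts))

rng = np.random.default_rng(0)
pool = [sample_hidden_utility(rng) for _ in range(8)]   # one partner policy per sampled reward
print(pool[0](np.array([2, 5, 3, 40.0])))
```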
On the Lipschitz Constant of Deep Networks and Double Descent
Matteo Gamba, Hossein Azizpour, Marten Bjorkman
Computer Science
ArXiv
28 January 2023
Existing bounds on the generalization error of deep networks assume some form of smooth or bounded dependence on the input variable, falling short of investigating the mechanisms controlling such factors in practice. In this work, we present an extensive experimental study of the empirical Lipschitz constant of deep networks undergoing double descent, and highlight non-monotonic trends strongly correlating with the test error. Building a connection between parameter-space and input-space gradients for SGD around a critical point, we isolate two important factors—namely loss landscape curvature and distance of parameters from initialization—respectively controlling optimization dynamics around a critical point and bounding model function complexity, even beyond the training data. Our study presents novel insights on implicit regularization via overparameterization, and effective model complexity for networks trained in practice.
Efficiency in Collective Decision-Making via Quadratic Transfers
Jon X. Eguia, Nicole Immorlica, S. Lalley, Katrina Ligett, Glen Weyl, Dimitrios Xefteris
Economics
15 January 2023
Consider the following collective choice problem: a group of budget constrained agents must choose one of several alternatives. Is there a budget balanced mechanism that: i) does not depend on the specific characteristics of the group, ii) does not require unaffordable transfers, and iii) implements utilitarianism if the agents’ preferences are quasilinear and their private information? We study the following procedure: every agent can express any intensity of support or opposition to each alternative, by transferring to the rest of the agents wealth equal to the square of the intensity expressed; and the outcome is determined by the sums of the expressed intensities. We prove that as the group grows large, in every equilibrium of this quadratic-transfers mechanism, each agent’s transfer converges to zero, and the probability that the efficient outcome is chosen converges to one.
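The mechanism is concrete enough to simulate in a few lines. A minimal sketch (mine, not the authors’): each agent reports a signed intensity, pays the square of it, the payment is split evenly among the other agents, and the alternative with the larger summed intensity wins:

```python
import numpy as np

def quadratic_transfers_round(intensities):
    """One round of the quadratic-transfers mechanism for a binary choice
    (illustrative sketch, not the authors' code).

    intensities: signed reports, positive = support alternative A,
                 negative = support alternative B."""
    intensities = np.asarray(intensities, dtype=float)
    n = len(intensities)
    payments = intensities ** 2                       # each agent pays the square of her report
    received = (payments.sum() - payments) / (n - 1)  # split evenly among the *other* agents
    net_transfer = received - payments                # budget balanced: sums to zero
    outcome = "A" if intensities.sum() >= 0 else "B"
    return outcome, net_transfer

# three mild supporters of A against one strong opponent
outcome, transfers = quadratic_transfers_round([0.2, 0.3, 0.1, -0.5])
print(outcome, transfers, transfers.sum())            # transfers sum to ~0
```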
Eco-evolutionary Dynamics of Non-episodic Neuroevolution in Large Multi-agent Environments
Hamon Gautier, Eleni Nisioti, Clément Moulin-Frier
Biology, Computer Science
ArXiv
18 February 2023
Neuroevolution (NE) has recently proven a competitive alternative to learning by gradient descent in reinforcement learning tasks. However, the majority of NE methods and associated simulation environments differ crucially from biological evolution: the environment is reset to initial conditions at the end of each generation, whereas natural environments are continuously modified by their inhabitants; agents reproduce based on their ability to maximize rewards within a population, while biological organisms reproduce and die based on internal physiological variables that depend on their resource consumption; simulation environments are primarily single-agent while the biological world is inherently multi-agent and evolves alongside the population. In this work we present a method for continuously evolving adaptive agents without any environment or population reset. The environment is a large grid world with complex spatiotemporal resource generation, containing many agents that are each controlled by an evolvable recurrent neural network and locally reproduce based on their internal physiology. The entire system is implemented in JAX, allowing very fast simulation on a GPU. We show that NE can operate in an ecologically-valid non-episodic multi-agent setting, finding sustainable collective foraging strategies in the presence of a complex interplay between ecological and evolutionary dynamics.
Games with possibly naive present-biased players
M. Haan, D. Hauck
Economics
Theory and Decision
17 February 2023
We propose a solution concept for games that are played among players with present-biased preferences that are possibly naive about their own, or about their opponent’s future time inconsistency. Our perception-perfect outcome essentially requires each player to take an action consistent with the subgame perfect equilibrium, given her perceptions concerning future types, and under the assumption that other present and future players have the same perceptions. Applications include a common pool problem and Rubinstein bargaining. When players are naive about their own time inconsistency and sophisticated about their opponent’s, the common pool problem is exacerbated, and Rubinstein bargaining breaks down completely.
The Effects of Time Preferences on Cooperation: Experimental Evidence from Infinitely Repeated Games
Jeongbin Kim
Economics
American Economic Journal: Microeconomics
1 February 2023
This paper studies the effects of time preferences on cooperation in an infinitely repeated prisoner’s dilemma game experiment. Subjects play repeated games in the lab, all decisions at once, but stage game payoffs are paid over an extended period of time. Changing the time window of stage game payoffs (weekly or monthly) varies discount factors, and a delay for the first-stage game payoffs eliminates/weakens present bias. First, subjects with weekly payments cooperate more than subjects with monthly payments—higher discount factors promote greater cooperation. Second, the rate of cooperation is higher when there is a delay—present bias reduces cooperation. (JEL C72, C73, D91)
Safety Guarantees in Multi-agent Learning via Trapping Regions
A. Czechowski, F. Oliehoek
Computer Science
ArXiv
27 February 2023
One of the main challenges of multi-agent learning lies in establishing convergence of the algorithms, as, in general, a collection of individual, self-serving agents is not guaranteed to converge with their joint policy when learning concurrently. This is in stark contrast to most single-agent environments, and sets a prohibitive barrier for deployment in practical applications, as it induces uncertainty in the long-term behavior of the system. In this work, we propose to apply the concept of trapping regions, known from qualitative theory of dynamical systems, to create safety sets in the joint strategy space for decentralized learning. Upon verification of the direction of learning dynamics, the resulting trajectories are guaranteed not to escape such sets during the learning process. As a result, it is ensured that, despite the uncertainty over convergence of the applied algorithms, learning will never form hazardous joint strategy combinations. We introduce a binary partitioning algorithm for verification of trapping regions in systems with known learning dynamics, and a heuristic sampling algorithm for scenarios where learning dynamics are not known. In addition, via a fixed point argument, we show the existence of a learning equilibrium within a trapping region. We demonstrate the applications to a regularized version of Dirac Generative Adversarial Network, a four-intersection traffic control scenario run in the state-of-the-art open-source microscopic traffic simulator SUMO, and a mathematical model of economic competition.
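The core idea is checkable in a toy setting: if the known learning dynamics point inward everywhere on the boundary of a candidate region, trajectories started inside can never leave. A crude 2-D sketch (not the paper’s binary-partitioning algorithm):

```python
import numpy as np

def points_inward(dynamics, box, samples_per_edge=200):
    """Crude check that a 2-D vector field points into an axis-aligned box
    everywhere on its boundary. Illustrative sketch only, not the paper's
    binary-partitioning algorithm."""
    (x_lo, x_hi), (y_lo, y_hi) = box
    for x in np.linspace(x_lo, x_hi, samples_per_edge):   # bottom and top edges
        if dynamics(x, y_lo)[1] <= 0 or dynamics(x, y_hi)[1] >= 0:
            return False
    for y in np.linspace(y_lo, y_hi, samples_per_edge):   # left and right edges
        if dynamics(x_lo, y)[0] <= 0 or dynamics(x_hi, y)[0] >= 0:
            return False
    return True

# example: learning dynamics that contract toward the joint strategy (0.5, 0.5)
contracting = lambda x, y: (0.5 - x, 0.5 - y)
print(points_inward(contracting, ((0.2, 0.8), (0.2, 0.8))))   # True, so this box is a trapping region
```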
Detecting Information Relays in Deep Neural Networks
A. Hintze, C. Adami
Computer Science
Entropy
3 January 2023
Deep learning of artificial neural networks (ANNs) is creating highly functional processes that are, unfortunately, nearly as hard to interpret as their biological counterparts. Identification of functional modules in natural brains plays an important role in cognitive science and neuroscience alike, and can be carried out using a wide range of technologies such as fMRI, EEG/ERP, MEG, or calcium imaging. However, we do not have such robust methods at our disposal when it comes to understanding functional modules in artificial neural networks. Ideally, understanding which parts of an artificial neural network perform what function might help us to address a number of vexing problems in ANN research, such as catastrophic forgetting and overfitting. Furthermore, revealing a network’s modularity could improve our trust in them by making these black boxes more transparent. Here, we introduce a new information-theoretic concept that proves useful in understanding and analyzing a network’s functional modularity: the relay information I_R. The relay information measures how much information groups of neurons that participate in a particular function (modules) relay from inputs to outputs. Combined with a greedy search algorithm, relay information can be used to identify computational modules in neural networks. We also show that the functionality of modules correlates with the amount of relay information they carry.
Scaling in Depth: Unlocking Robustness Certification on ImageNet
Kaiqin Hu, Andy Zou, Zifan Wang, Klas Leino, Matt Fredrikson
Computer Science
ArXiv
29 January 2023
Notwithstanding the promise of Lipschitz-based approaches to deterministically train and certify robust deep networks, the state-of-the-art results only make successful use of feed-forward Convolutional Networks (ConvNets) on low-dimensional data, e.g. CIFAR-10. Because ConvNets often suffer from vanishing gradients when going deep, large-scale datasets with many classes, e.g., ImageNet, have remained out of practical reach. This paper investigates ways to scale up certifiably robust training to Residual Networks (ResNets). First, we introduce the Linear ResNet (LiResNet) architecture, which utilizes a new residual block designed to facilitate tighter Lipschitz bounds compared to a conventional residual block. Second, we introduce Efficient Margin MAximization (EMMA), a loss function that stabilizes robust training by simultaneously penalizing worst-case adversarial examples from all classes. Combining LiResNet and EMMA, we achieve new state-of-the-art robust accuracy on CIFAR-10/100 and Tiny-ImageNet under ℓ2-norm-bounded perturbations. Moreover, for the first time, we are able to scale up deterministic robustness guarantees to ImageNet, bringing hope to the possibility of applying deterministic certification to real-world applications. We release our code on GitHub: https://github.com/klasleino/gloro
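For orientation, here is the generic textbook machinery such certificates are built from, not the LiResNet construction itself: the Lipschitz constant of a feed-forward composition is at most the product of per-layer spectral norms, and in one common formulation a prediction is certifiably robust at radius eps if its top-two logit margin exceeds sqrt(2)·L·eps (the exact constant varies across formulations):

```python
import numpy as np

def lipschitz_upper_bound(weight_matrices):
    """Product of per-layer spectral norms: a (loose) global Lipschitz upper bound
    for a plain feed-forward net with 1-Lipschitz activations.
    Generic bound, not the LiResNet construction."""
    return float(np.prod([np.linalg.svd(W, compute_uv=False)[0] for W in weight_matrices]))

def certified_at(logits, lip_bound, eps):
    """Margin check in the style of Lipschitz-margin certification: if the top-two
    logit gap exceeds sqrt(2) * L * eps, no l2 perturbation of norm <= eps can flip
    the prediction (the constant depends on the formulation)."""
    top, runner_up = np.sort(logits)[-1], np.sort(logits)[-2]
    return bool(top - runner_up > np.sqrt(2) * lip_bound * eps)

# toy two-layer net with small random weights (purely illustrative)
rng = np.random.default_rng(0)
Ws = [rng.normal(size=(8, 16)) * 0.1, rng.normal(size=(4, 8)) * 0.1]
L = lipschitz_upper_bound(Ws)
print(L, certified_at(np.array([2.0, 0.3, -1.0, 0.1]), L, eps=0.5))
```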
IQ-Flow: Mechanism Design for Inducing Cooperative Behavior to Self-Interested Agents in Sequential Social Dilemmas
Bengisu Guresti, Abdullah Vanlioglu, N. K. Ure
Economics
ArXiv
28 February 2023
Achieving and maintaining cooperation between agents to accomplish a common objective is one of the central goals of Multi-Agent Reinforcement Learning (MARL). Nevertheless in many real-world scenarios, separately trained and specialized agents are deployed into a shared environment, or the environment requires multiple objectives to be achieved by different coexisting parties. These variations among specialties and objectives are likely to cause mixed motives that eventually result in a social dilemma where all the parties are at a loss. In order to resolve this issue, we propose the Incentive Q-Flow (IQ-Flow) algorithm, which modifies the system’s reward setup with an incentive regulator agent such that the cooperative policy also corresponds to the self-interested policy for the agents. Unlike the existing methods that learn to incentivize self-interested agents, IQ-Flow does not make any assumptions about agents’ policies or learning algorithms, which enables the generalization of the developed framework to a wider array of applications. IQ-Flow performs an offline evaluation of the optimality of the learned policies using the data provided by other agents to determine cooperative and self-interested policies. Next, IQ-Flow uses meta-gradient learning to estimate how policy evaluation changes according to given incentives and modifies the incentive such that the greedy policy for cooperative objective and self-interested objective yield the same actions. We present the operational characteristics of IQ-Flow in Iterated Matrix Games. We demonstrate that IQ-Flow outperforms the state-of-the-art incentive design algorithm in Escape Room and 2-Player Cleanup environments. We further demonstrate that the pretrained IQ-Flow mechanism significantly outperforms the performance of the shared reward setup in the 2-Player Cleanup environment.
Private Blotto: Viewpoint Competition with Polarized Agents
Kate Donahue, J. Kleinberg
Economics
ArXiv
27 February 2023
Colonel Blotto games are one of the oldest settings in game theory, originally proposed over a century ago in Borel 1921. However, they were originally designed to model two centrally-controlled armies competing over zero-sum “fronts”, a specific scenario with limited modern-day application. In this work, we propose and study Private Blotto games, a variant connected to crowdsourcing and social media. One key difference in Private Blotto is that individual agents act independently, without being coordinated by a central “Colonel”. This model naturally arises from scenarios such as activist groups competing over multiple issues, partisan fund-raisers competing over elections in multiple states, or politically-biased social media users labeling news articles as misinformation. In this work, we completely characterize the Nash Stability of the Private Blotto game. Specifically, we show that the outcome function has a critical impact on the outcome of the game: we study whether a front is won by majority rule (median outcome) or a smoother outcome taking into account all agents (mean outcome). We study how this impacts the amount of “misallocated effort”, or agents whose choices don’t influence the final outcome. In general, mean outcome ensures that, if a stable arrangement exists, agents are close to evenly spaced across fronts, minimizing misallocated effort. However, mean outcome functions also have chaotic patterns as to when stable arrangements do and do not exist. For median outcome, we exactly characterize when a stable arrangement exists, but show that this outcome function frequently results in extremely unbalanced allocation of agents across fronts.
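The mean vs. median distinction is easy to make concrete (my own toy illustration): with agents of stance +1 or -1 placed on fronts, a median rule hands each front to whichever side holds the local majority, while a mean rule produces a graded outcome that every agent on the front shifts a little:

```python
import statistics

def front_outcomes(placements, rule="median"):
    """Outcome of each front under a median (majority-like) or mean rule.
    placements: dict mapping front -> list of agent stances (e.g. +1 or -1).
    Illustrative sketch of the two outcome functions the abstract contrasts."""
    agg = statistics.median if rule == "median" else statistics.mean
    return {front: agg(stances) for front, stances in placements.items()}

placements = {"front_1": [+1, +1, -1], "front_2": [+1, -1, -1, -1]}
print(front_outcomes(placements, "median"))   # {'front_1': 1, 'front_2': -1.0}
print(front_outcomes(placements, "mean"))     # {'front_1': 0.333..., 'front_2': -0.5}
```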
Natural selection for collective action
Benjamin Allen, Abdur-Rahman Khwaja, J. L. Donahue, Cassidy Lattanzio, Y. Dementieva, C. Sample
Biology
28 February 2023
Collective action—behavior that arises from the combined actions of multiple individuals—is observed across living beings. The question of how and why collective action evolves has profound implications for behavioral ecology, multicellularity, and human society. Collective action is challenging to model mathematically, due to nonlinear fitness effects and the consequences of spatial, group, and/or family relationships. We derive a simple condition for collective action to be favored by natural selection. A collective’s effect on the fitness of each individual is weighted by the relatedness between them, using a new measure of collective relatedness. If selection is weak, this condition can be evaluated using coalescent theory. More generally, our result applies to any synergistic social behavior, in spatial, group, and/or family-structured populations. We use this result to obtain conditions for the evolution of collective help among diploid siblings, subcommunities of a network, and hyperedges of a hypergraph. We also obtain a condition for which of two strategies is favored in a game between siblings, cousins, or other relatives. Our work provides a rigorous basis for extending the notion of “actor”, in the study of social behavior, from individuals to collectives.
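Schematically (my paraphrase from the abstract alone, in the spirit of Hamilton’s rule, not the paper’s exact statement), the condition should look something like: collective action by a set $S$ of actors is favoured when the relatedness-weighted sum of its fitness effects is positive,

$$\sum_i r_{S,i}\,\Delta w_i > 0,$$

where $\Delta w_i$ is the collective’s effect on individual $i$’s fitness and $r_{S,i}$ is their new measure of relatedness between the collective $S$ and individual $i$.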
Systematic Rectification of Language Models via Dead-end Analysis
Mengyao Cao, Mehdi Fatemi, J. C. Cheung, S. Shabanian
Computer Science
ArXiv
27 February 2023
With adversarial or otherwise normal prompts, existing large language models (LLMs) can be pushed to generate toxic discourses. One way to reduce the risk of LLMs generating undesired discourses is to alter the training of the LLM. This can be very restrictive due to demanding computation requirements. Other methods rely on rule-based or prompt-based token elimination, which are limited as they dismiss future tokens and the overall meaning of the complete discourse. Here, we center detoxification on the probability that the finished discourse is ultimately considered toxic. That is, at each point, we advise against token selections proportional to how likely a finished text from this point is to be toxic. To this end, we formally extend the dead-end theory from the recent reinforcement learning (RL) literature to also cover uncertain outcomes. Our approach, called rectification, utilizes a separate but significantly smaller model for detoxification, which can be applied to diverse LLMs as long as they share the same vocabulary. Importantly, our method does not require access to the internal representations of the LLM, but only the token probability distribution at each decoding step. This is crucial as many LLMs today are hosted in servers and only accessible through APIs. When applied to various LLMs, including GPT-3, our approach significantly improves the generated discourse compared to the base LLMs and other techniques in terms of both the overall language and detoxification performance.
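Reading just the abstract, the decoding-time picture I get is roughly the sketch below: a separate small model estimates how likely a completion from the current point is to end up toxic, and candidate tokens are penalized in proportion to that estimate. The interface and the penalty rule are assumptions on my part.

```python
# Assumed sketch of decoding-time rectification: downweight next-token candidates
# in proportion to how likely the finished text would be judged toxic.
import torch

def rectified_next_token(lm_logits, prefix_ids, toxic_value_fn, alpha=5.0, top_k=50):
    """lm_logits: [vocab_size] next-token logits from the base LLM.
    toxic_value_fn(prefix_ids, token_id): estimated P(finished text is toxic)
        if token_id is appended (the separate, smaller detox model)."""
    top = torch.topk(lm_logits, top_k)
    penalties = torch.tensor([toxic_value_fn(prefix_ids, t.item()) for t in top.indices])
    adjusted = top.values - alpha * penalties     # advise against risky tokens
    idx = torch.multinomial(torch.softmax(adjusted, dim=0), 1).item()
    return top.indices[idx].item()
```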
How does HCI Understand Human Autonomy and Agency?
Dan Bennett, Oussama Metatla, A. Roudaut, Elisa D. Mekler
Psychology
ArXiv
29 January 2023
Human agency and autonomy have always been fundamental concepts in HCI. New developments, including ubiquitous AI and the growing integration of technologies into our lives, make these issues ever pressing, as technologies increase their ability to influence our behaviours and values. However, in HCI, understandings of autonomy and agency remain ambiguous. Both concepts are used to describe a wide range of phenomena pertaining to sense-of-control, material independence, and identity. It is unclear to what degree these understandings are compatible, and how they support the development of research programs and practical interventions. We address this by reviewing 30 years of HCI research on autonomy and agency to identify current understandings, open issues, and future directions. From this analysis, we identify ethical issues, and outline key themes to guide future work. We also articulate avenues for advancing clarity and specificity around these concepts, and for coordinating integrative work across different HCI communities.
Aligning Robot and Human Representations
Andreea Bobu, Andi Peng, Pulkit Agrawal, J. Shah, A. Dragan
Computer Science
ArXiv
3 February 2023
To act in the world, robots rely on a representation of salient task aspects: for example, to carry a cup of coffee, a robot must consider movement efficiency and cup orientation in its behaviour. However, if we want robots to act for and with people, their representations must not be just functional but also reflective of what humans care about, i.e. their representations must be aligned with humans’. In this survey, we pose that current reward and imitation learning approaches suffer from representation misalignment, where the robot’s learned representation does not capture the human’s representation. We suggest that because humans will be the ultimate evaluator of robot performance in the world, it is critical that we explicitly focus our efforts on aligning learned task representations with humans, in addition to learning the downstream task. We advocate that current representation learning approaches in robotics should be studied from the perspective of how well they accomplish the objective of representation alignment. To do so, we mathematically define the problem, identify its key desiderata, and situate current robot learning methods within this formalism. We conclude the survey by suggesting future directions for exploring open challenges.
Reinforcement Learning from Diverse Human Preferences
Wanqi Xue, Bo An, Shuicheng Yan, Zhongwen Xu
Computer Science
ArXiv
27 January 2023
The complexity of designing reward functions has been a major obstacle to the wide application of deep reinforcement learning (RL) techniques. Describing an agent’s desired behaviors and properties can be difficult, even for experts. A new paradigm called reinforcement learning from human preferences (or preference-based RL) has emerged as a promising solution, in which reward functions are learned from human preference labels among behavior trajectories. However, existing methods for preference-based RL are limited by the need for accurate oracle preference labels. This paper addresses this limitation by developing a method for crowd-sourcing preference labels and learning from diverse human preferences. The key idea is to stabilize reward learning through regularization and correction in a latent space. To ensure temporal consistency, a strong constraint is imposed on the reward model that forces its latent space to be close to the prior distribution. Additionally, a confidence-based reward model ensembling method is designed to generate more stable and reliable predictions. The proposed method is tested on a variety of tasks in DMControl and Meta-world and has shown consistent and significant improvements over existing preference-based RL algorithms when learning from diverse feedback, paving the way for real-world applications of RL methods.
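The confidence-based ensembling bit sounds simple enough to sketch; the shrink-when-they-disagree weighting below is my guess at what is meant, not the paper's actual rule.

```python
# Assumed sketch of confidence-based reward-model ensembling.
import torch

def ensembled_reward(reward_models, segment, beta=1.0):
    preds = torch.stack([rm(segment) for rm in reward_models])   # [n_models]
    confidence = torch.exp(-beta * preds.std())   # low confidence when the models disagree
    return confidence * preds.mean()              # shrink unreliable reward estimates
```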
The Role of Heuristics and Biases During Complex Choices with an AI Teammate
Nikolos Gurney, John H. Miller, David V. Pynadath
Biology
ArXiv
14 January 2023
Behavioral scientists have classically documented aversion to algorithmic decision aids, from simple linear models to AI. Sentiment, however, is changing and possibly accelerating AI helper usage. AI assistance is, arguably, most valuable when humans must make complex choices. We argue that classic experimental methods used to study heuristics and biases are insufficient for studying complex choices made with AI helpers. We adapted an experimental paradigm designed for studying complex choices in such contexts. We show that framing and anchoring effects impact how people work with an AI helper and are predictive of choice outcomes. The evidence suggests that some participants, particularly those in a loss frame, put too much faith in the AI helper and experienced worse choice outcomes by doing so. The paradigm also generates computational modeling-friendly data allowing future studies of human-AI decision making.
On The Fragility of Learned Reward Functions
Lev McKinney, Yawen Duan, David Krueger, A. Gleave
Computer Science, Psychology
ArXiv
9 January 2023
Reward functions are notoriously difficult to specify, especially for tasks with complex goals. Reward learning approaches attempt to infer reward functions from human feedback and preferences. Prior works on reward learning have mainly focused on the performance of policies trained alongside the reward function. This practice, however, may fail to detect learned rewards that are not capable of training new policies from scratch and thus do not capture the intended behavior. Our work focuses on demonstrating and studying the causes of these relearning failures in the domain of preference-based reward learning. We demonstrate with experiments in tabular and continuous control environments that the severity of relearning failures can be sensitive to changes in reward model design and the trajectory dataset composition. Based on our findings, we emphasize the need for more retraining-based evaluations in the literature.
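The retraining-based evaluation being argued for is basically a protocol, which I'd write down as (placeholder function names are mine):

```python
# Assumed sketch of a retraining-based evaluation of a learned reward model.
def relearning_evaluation(learned_reward, true_reward, env, train_policy, evaluate_return):
    """Train a *fresh* policy from scratch on the frozen learned reward,
    then score that policy under the true task reward."""
    fresh_policy = train_policy(env, reward_fn=learned_reward)
    return evaluate_return(fresh_policy, env, reward_fn=true_reward)
```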
Preserving Fairness in AI under Domain Shift
Serban Stan, Mohammad Rostami
Computer Science
ArXiv
29 January 2023
Existing algorithms for ensuring fairness in AI use a single-shot training strategy, where an AI model is trained on an annotated training dataset with sensitive attributes and then fielded for utilization. This training strategy is effective in problems with stationary distributions, where both training and testing data are drawn from the same distribution. However, it is vulnerable with respect to distributional shifts in the input space that may occur after the initial training phase. As a result, the time-dependent nature of data can introduce biases into the model predictions. Model retraining from scratch using a new annotated dataset is a naive solution that is expensive and time-consuming. We develop an algorithm to adapt a fair model to remain fair under domain shift using solely new unannotated data points. We recast this learning setting as an unsupervised domain adaptation problem. Our algorithm is based on updating the model such that the internal representation of data remains unbiased despite distributional shifts in the input space. We provide extensive empirical validation on three widely employed fairness datasets to demonstrate the effectiveness of our algorithm.
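From the abstract alone, the adaptation step sounds like: keep the encoder's internal representation of new unannotated data close to the (already fair) source representation. The simple moment-matching loss below is my stand-in for whatever alignment objective they actually use.

```python
# Assumed sketch of aligning internal representations across a domain shift.
import torch

def representation_alignment_loss(encoder, x_source, x_target):
    z_s, z_t = encoder(x_source), encoder(x_target)      # [batch, dim] embeddings
    mean_gap = (z_s.mean(0) - z_t.mean(0)).pow(2).sum()
    var_gap = (z_s.var(0) - z_t.var(0)).pow(2).sum()
    return mean_gap + var_gap
```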
Generalized Uncertainty of Deep Neural Networks: Taxonomy and Applications
Chengyu Dong
Computer Science
ArXiv
2 February 2023
Deep neural networks have seen enormous success in various real-world applications. Beyond their predictions as point estimates, increasing attention has been focused on quantifying the uncertainty of their predictions. In this review, we show that the uncertainty of deep neural networks is not only important in a sense of interpretability and transparency, but also crucial in further advancing their performance, particularly in learning systems seeking robustness and efficiency. We will generalize the definition of the uncertainty of deep neural networks to any number or vector that is associated with an input or an input-label pair, and catalog existing methods on “mining” such uncertainty from a deep model. We will include those methods from the classic field of uncertainty quantification as well as those methods that are specific to deep neural networks. We then show a wide spectrum of applications of such generalized uncertainty in realistic learning tasks including robust learning such as noisy learning, adversarially robust learning; data-efficient learning such as semi-supervised and weakly-supervised learning; and model-efficient learning such as model compression and knowledge distillation.
Modeling Moral Choices in Social Dilemmas with Multi-Agent Reinforcement Learning
Elizaveta Tennant, S. Hailes, Mirco Musolesi
Computer Science
ArXiv
20 January 2023
Practical uses of Artificial Intelligence (AI) in the real world have demonstrated the importance of embedding moral choices into intelligent agents. They have also highlighted that defining top-down ethical constraints on AI according to any one type of morality is extremely challenging and can pose risks. A bottom-up learning approach may be more appropriate for studying and developing ethical behavior in AI agents. In particular, we believe that an interesting and insightful starting point is the analysis of emergent behavior of Reinforcement Learning (RL) agents that act according to a predefined set of moral rewards in social dilemmas. In this work, we present a systematic analysis of the choices made by intrinsically-motivated RL agents whose rewards are based on moral theories. We aim to design reward structures that are simplified yet representative of a set of key ethical systems. Therefore, we first define moral reward functions that distinguish between consequence- and norm-based agents, between morality based on societal norms or internal virtues, and between single- and mixed-virtue (e.g., multi-objective) methodologies. Then, we evaluate our approach by modeling repeated dyadic interactions between learning moral agents in three iterated social dilemma games (Prisoner’s Dilemma, Volunteer’s Dilemma and Stag Hunt). We analyze the impact of different types of morality on the emergence of cooperation, defection or exploitation, and the corresponding social outcomes. Finally, we discuss the implications of these findings for the development of moral agents in artificial and mixed human-AI societies.
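The reward structures are easy to picture; here is a toy version of the consequence-based vs. norm-based distinction for the Prisoner's Dilemma. The payoff matrix and reward definitions are my illustrative choices, not the paper's exact specification.

```python
# Toy moral reward functions for the iterated Prisoner's Dilemma (assumed forms).
C, D = 0, 1
PAYOFF = {(C, C): (3, 3), (C, D): (0, 5), (D, C): (5, 0), (D, D): (1, 1)}

def consequentialist_reward(my_act, other_act):
    # scored by the collective outcome: the sum of both players' payoffs
    mine, theirs = PAYOFF[(my_act, other_act)]
    return mine + theirs

def norm_based_reward(my_act, other_act):
    # scored by adherence to the norm "cooperate", regardless of the outcome
    return 1.0 if my_act == C else -1.0
```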
Understanding and Detecting Hallucinations in Neural Machine Translation via Model Introspection
Weijia Xu, Sweta Agrawal, Eleftheria Briakou, Marianna J. Martindale, Marine Carpuat
Computer Science
ArXiv
18 January 2023
Neural sequence generation models are known to “hallucinate”, by producing outputs that are unrelated to the source text. These hallucinations are potentially harmful, yet it remains unclear in what conditions they arise and how to mitigate their impact. In this work, we first identify internal model symptoms of hallucinations by analyzing the relative token contributions to the generation in contrastive hallucinated vs. non-hallucinated outputs generated via source perturbations. We then show that these symptoms are reliable indicators of natural hallucinations, by using them to design a lightweight hallucination detector which outperforms both model-free baselines and strong classifiers based on quality estimation or large pre-trained models on manually annotated English-Chinese and German-English translation test beds.
What makes a language easy to deep-learn?
Lukas Galke, Yoav Ram, Limor Raviv
Computer Science
ArXiv
23 February 2023
Neural networks drive the success of natural language processing. A fundamental property of natural languages is their compositional structure, allowing us to describe new meanings systematically. However, neural networks notoriously struggle with systematic generalization and do not necessarily benefit from a compositional structure in emergent communication simulations. Here, we test how neural networks compare to humans in learning and generalizing a new language. We do this by closely replicating an artificial language learning study (conducted originally with human participants) and evaluating the memorization and generalization capabilities of deep neural networks with respect to the degree of structure in the input language. Our results show striking similarities between humans and deep neural networks: More structured linguistic input leads to more systematic generalization and better convergence between humans and neural network agents and between different neural agents. We then replicate this structure bias found in humans and our recurrent neural networks with a Transformer-based large language model (GPT-3), showing a similar benefit for structured linguistic input regarding generalization systematicity and memorization errors. These findings show that the underlying structure of languages is crucial for systematic generalization. Due to the correlation between community size and linguistic structure in natural languages, our findings underscore the challenge of automated processing of low-resource languages. Nevertheless, the similarity between humans and machines opens new avenues for language evolution research.
Catastrophe by Design in Population Games: A Mechanism to Destabilize Inefficient Locked-in Technologies
Stefanos Leonardos, Joseph Sakos, C. Courcoubetis, G. Piliouras
Economics
ACM Transactions on Economics and Computation
15 February 2023
In multi-agent environments in which coordination is desirable, the history of play often causes lock-in at sub-optimal outcomes. Notoriously, technologies with significant environmental footprint or high social cost persist despite the successful development of more environmentally friendly and/or socially efficient alternatives. The displacement of the status quo is hindered by entrenched economic interests and network effects. To exacerbate matters, the standard mechanism design approaches based on centralized authorities with the capacity to use preferential subsidies to effectively dictate system outcomes are not always applicable to modern decentralised economies. What other types of mechanisms are feasible? In this paper, we develop and analyze a mechanism which induces transitions from inefficient lock-ins to superior alternatives. This mechanism does not exogenously favor one option over another – instead, the phase transition emerges endogenously via a standard evolutionary learning model, Q-learning, where agents trade off exploration and exploitation. Exerting the same transient influence to both the efficient and inefficient technologies encourages exploration and results in irreversible phase transitions and permanent stabilization of the efficient one. On a technical level, our work is based on bifurcation and catastrophe theory, a branch of mathematics that deals with changes in the number and stability properties of equilibria. Critically, our analysis is shown to be structurally robust to significant and even adversarially chosen perturbations to the parameters of both our game and our behavioral model.
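Purely from the abstract, I picture the mechanism roughly like the toy below: a population of Q-learners with softmax exploration is locked in on an entrenched technology A, and a transient, symmetric boost to the exploration temperature tips it over to the better technology B. Payoffs, the temperature schedule, and the coordination payoff structure are my assumptions. Running it with and without the boost is the contrast that seems to matter: without the boost the population never leaves A.

```python
# Toy softmax Q-learning population escaping an inefficient lock-in via a transient
# exploration boost (illustrative assumptions throughout).
import numpy as np

def simulate(T=4000, n=500, payoff_A=1.0, payoff_B=1.5, lr=0.05, seed=0):
    rng = np.random.default_rng(seed)
    Q = np.zeros((n, 2))
    Q[:, 0] = 1.0                                  # everyone starts locked in on A
    share_B = []
    for t in range(T):
        temp = 2.0 if 1000 <= t < 1500 else 0.1    # transient, symmetric exploration boost
        probs = np.exp(Q / temp)
        probs /= probs.sum(axis=1, keepdims=True)
        acts = (rng.random(n) < probs[:, 1]).astype(int)
        frac_B = acts.mean()
        # coordination payoffs: each option pays in proportion to its adopters
        r = np.where(acts == 1, payoff_B * frac_B, payoff_A * (1 - frac_B))
        Q[np.arange(n), acts] += lr * (r - Q[np.arange(n), acts])
        share_B.append(frac_B)
    return share_B                                 # climbs during the boost, then locks onto B
```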
Who wants what and how: a Mapping Function for Explainable Artificial Intelligence
M. Hashemi
Computer Science
ArXiv
7 February 2023
The increasing complexity of AI systems has led to the growth of the field of explainable AI (XAI), which aims to provide explanations and justifications for the outputs of AI algorithms. These methods mainly focus on feature importance and identifying changes that can be made to achieve a desired outcome. Researchers have identified desired properties for XAI methods, such as plausibility, sparsity, causality, low run-time, etc. The objective of this study is to conduct a review of existing XAI research and present a classification of XAI methods. The study also aims to connect XAI users with the appropriate method and relate desired properties to current XAI approaches. The outcome of this study will be a clear strategy that outlines how to choose the right XAI method for a particular goal and user, and how to provide a personalized explanation for users.
CertViT: Certified Robustness of Pre-Trained Vision Transformers
K. Gupta, S. Verma
Computer Science
ArXiv
1 February 2023
Lipschitz bounded neural networks are certifiably robust and have a good trade-off between clean and certified accuracy. Existing Lipschitz bounding methods train from scratch and are limited to moderately sized networks (<6M parameters). They require a fair amount of hyper-parameter tuning and are computationally prohibitive for large networks like Vision Transformers (5M to 660M parameters). Obtaining certified robustness of transformers is not feasible due to the non-scalability and inflexibility of the current methods. This work presents CertViT, a two-step proximal-projection method to achieve certified robustness from pre-trained weights. The proximal step tries to lower the Lipschitz bound and the projection step tries to maintain the clean accuracy of pre-trained weights. We show that CertViT networks have better certified accuracy than state-of-the-art Lipschitz trained networks. We apply CertViT on several variants of pre-trained vision transformers and show adversarial robustness using standard attacks. Code: https://github.com/sagarverma/transformer-lipschitz
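My mental model of the two steps, from the abstract only (definitely not the actual CertViT procedure, just its shape): clip each layer's singular values to cap its Lipschitz constant, then pull back toward the pre-trained weights to protect clean accuracy.

```python
# Assumed sketch of a proximal-then-project step on a single weight matrix.
import torch

def proximal_projection_step(weight, pretrained, max_sigma=1.0, radius=0.1):
    # "proximal" step: cap singular values, which caps the layer's spectral norm
    U, S, Vh = torch.linalg.svd(weight, full_matrices=False)
    W = U @ torch.diag(torch.clamp(S, max=max_sigma)) @ Vh
    # "projection" step: stay within a small ball around the pre-trained weights
    delta = W - pretrained
    if delta.norm() > radius:
        W = pretrained + delta * (radius / delta.norm())
    return W
```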
Adversarial Robust Deep Reinforcement Learning Requires Redefining Robustness
Ezgi Korkmaz
Computer Science
ArXiv
17 January 2023
Learning from raw high dimensional data via interaction with a given environment has been effectively achieved through the utilization of deep neural networks. Yet the observed degradation in policy performance caused by imperceptible worst-case policy dependent translations along high sensitivity directions (i.e. adversarial perturbations) raises concerns on the robustness of deep reinforcement learning policies. In our paper, we show that these high sensitivity directions do not lie only along particular worst-case directions, but rather are more abundant in the deep neural policy landscape and can be found via more natural means in a black-box setting. Furthermore, we show that vanilla training techniques intriguingly result in learning more robust policies compared to the policies learnt via the state-of-the-art adversarial training techniques. We believe our work lays out intriguing properties of the deep reinforcement learning policy manifold and our results can help to build robust and generalizable deep reinforcement learning policies.
Safe Deep Reinforcement Learning by Verifying Task-Level Properties
Enrico Marchesini, Luca Marzari, A. Farinelli, Chris Amato
Computer Science
ArXiv
20 February 2023
Cost functions are commonly employed in Safe Deep Reinforcement Learning (DRL). However, the cost is typically encoded as an indicator function due to the difficulty of quantifying the risk of policy decisions in the state space. Such an encoding requires the agent to visit numerous unsafe states to learn a cost-value function to drive the learning process toward safety, hence increasing the number of unsafe interactions and decreasing sample efficiency. In this paper, we investigate an alternative approach that uses domain knowledge to quantify the risk in the proximity of such states by defining a violation metric. This metric is computed by verifying task-level properties, shaped as input-output conditions, and it is used as a penalty to bias the policy away from unsafe states without learning an additional value function. We investigate the benefits of using the violation metric in standard Safe DRL benchmarks and robotic mapless navigation tasks. The navigation experiments bridge the gap between Safe DRL and robotics, introducing a framework that allows rapid testing on real robots. Our experiments show that policies trained with the violation penalty achieve higher performance over Safe DRL baselines and significantly reduce the number of visited unsafe states.
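The penalty itself is straightforward to sketch; the property encoding and the penalty weight below are placeholders of mine rather than the paper's formulation.

```python
# Assumed sketch of penalizing a policy with a task-level violation metric.
def shaped_reward(env_reward, state, action, properties, penalty_weight=1.0):
    """properties: list of (precondition, is_safe) callables encoding input-output
    conditions; the violation metric here is simply the count of violated properties."""
    violations = sum(1 for precondition, is_safe in properties
                     if precondition(state) and not is_safe(state, action))
    return env_reward - penalty_weight * violations
```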
Harms from Increasingly Agentic Algorithmic Systems
Alan Chan, Rebecca Salganik, +19 authors Tegan Maharaj
Computer Science
ArXiv
20 February 2023
Research in Fairness, Accountability, Transparency, and Ethics (FATE) has established many sources and forms of algorithmic harm, in domains as diverse as health care, finance, policing, and recommendations. Much work remains to be done to mitigate the serious harms of these systems, particularly those disproportionately affecting marginalized communities. Despite these ongoing harms, new systems are being developed and deployed which threaten the perpetuation of the same harms and the creation of novel ones. In response, the FATE community has emphasized the importance of anticipating harms. Our work focuses on the anticipation of harms from increasingly agentic systems. Rather than providing a definition of agency as a binary property, we identify 4 key characteristics which, particularly in combination, tend to increase the agency of a given algorithmic system: underspecification, directness of impact, goal-directedness, and long-term planning. We also discuss important harms which arise from increasing agency—notably, these include systemic and/or long-range impacts, often on marginalized stakeholders. We emphasize that recognizing agency of algorithmic systems does not absolve or shift the human responsibility for algorithmic harms. Rather, we use the term agency to highlight the increasingly evident fact that ML systems are not fully under human control. Our work explores increasingly agentic algorithmic systems in three parts. First, we explain the notion of an increase in agency for algorithmic systems in the context of diverse perspectives on agency across disciplines. Second, we argue for the need to anticipate harms from increasingly agentic systems. Third, we discuss important harms from increasingly agentic systems and ways forward for addressing them. We conclude by reflecting on implications of our work for anticipating algorithmic harms from emerging systems.
Selecting Models based on the Risk of Damage Caused by Adversarial Attacks
Jona Klemenc, Holger Trittenbach
Computer Science
ArXiv
28 January 2023
Regulation, legal liabilities, and societal concerns challenge the adoption of AI in safety and security-critical applications. One of the key concerns is that adversaries can cause harm by manipulating model predictions without being detected. Regulation hence demands an assessment of the risk of damage caused by adversaries. Yet, there is no method to translate this high-level demand into actionable metrics that quantify the risk of damage. In this article, we propose a method to model and statistically estimate the probability of damage arising from adversarial attacks. We show that our proposed estimator is statistically consistent and unbiased. In experiments, we demonstrate that the estimation results of our method have a clear and actionable interpretation and outperform conventional metrics. We then show how operators can use the estimation results to reliably select the model with the lowest risk.
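From the abstract I'd expect the estimator to look something like a Monte Carlo estimate over the deployment distribution; the sketch below is my guess at the shape, not their actual (consistent, unbiased) estimator.

```python
# Assumed Monte Carlo sketch of estimating the probability of damage under attack.
def estimate_damage_probability(model, attack, causes_damage, samples):
    """samples: iterable of (x, y) drawn from the deployment distribution.
    attack(model, x, y): returns an adversarially manipulated input.
    causes_damage(y, y_pred): True if the resulting prediction would cause damage."""
    hits, n = 0, 0
    for x, y in samples:
        hits += bool(causes_damage(y, model(attack(model, x, y))))
        n += 1
    return hits / max(n, 1)
```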
Can Large Language Models Change User Preference Adversarially?
Varshini Subhash
Computer Science
ArXiv
5 January 2023
Pretrained large language models (LLMs) are becoming increasingly powerful and ubiquitous in mainstream applications such as being a personal assistant, a dialogue model, etc. As these models become proficient in deducing user preferences and offering tailored assistance, there is an increasing concern about the ability of these models to influence, modify and in the extreme case manipulate user preference adversarially. The issue of lack of interpretability in these models in adversarial settings remains largely unsolved. This work tries to study adversarial behavior in user preferences from the lens of attention probing, red teaming and white-box analysis. Specifically, it provides a bird’s eye view of existing literature, offers red teaming samples for dialogue models like ChatGPT and GODEL and probes the attention mechanism in the latter for non-adversarial and adversarial settings.
Conformity effect on the evolution of cooperation in signed networks.
Xiaochen He, Guangyu Li, Haifeng Du
Psychology
Chaos
1 February 2023
Human behaviors are often subject to conformity, but little research attention has been paid to social dilemmas in which players are assumed to only pursue the maximization of their payoffs. The present study proposed a generalized prisoner’s dilemma model in a signed network considering conformity. Simulation shows that conformity helps promote the imitation of cooperative behavior when positive edges dominate the network, while negative edges may impede conformity from fostering cooperation. The logic of homophily and xenophobia allows for the coexistence of cooperators and defectors and guides the evolution toward the equality of the two strategies. We also find that cooperation prevails when individuals have a higher probability of adjusting their relation signs, but conformity may mediate the effect of network adaptation. From a population-wide view, network adaptation and conformity are capable of forming the structures of attractors or repellers.
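A toy version of the update rule I imagine from the abstract, with the functional form entirely my own: agents either imitate a better-off positive neighbor, or conform by copying the majority behavior among positive neighbors and opposing that of negative ones.

```python
# Assumed toy strategy update mixing payoff imitation and conformity on a signed network.
import random

def update_strategy(i, strategies, payoffs, pos_nbrs, neg_nbrs, conformity=0.5):
    """strategies[j] in {0, 1} (defect / cooperate); pos_nbrs / neg_nbrs map
    each agent to its positively / negatively linked neighbors."""
    if pos_nbrs[i] and random.random() > conformity:
        # payoff-driven: imitate a better-performing positive neighbor
        j = random.choice(pos_nbrs[i])
        return strategies[j] if payoffs[j] > payoffs[i] else strategies[i]
    # conformity-driven: follow friends' majority behavior, oppose enemies'
    votes = sum(2 * strategies[j] - 1 for j in pos_nbrs[i])
    votes -= sum(2 * strategies[j] - 1 for j in neg_nbrs[i])
    if votes == 0:
        return strategies[i]
    return 1 if votes > 0 else 0
```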
Explainable AI does not provide the explanations end-users are asking for
Savio Rozario, George Cevora
Computer Science
ArXiv
25 January 2023
Explainable Artificial Intelligence (XAI) techniques are frequently required by users in many AI systems with the goal of understanding complex models, their associated predictions, and gaining trust. While suitable for some specific tasks during development, their adoption by organisations to enhance trust in machine learning systems has unintended consequences. In this paper we discuss XAI’s limitations in deployment and conclude that transparency and rigorous validation are better suited to gaining trust in AI systems.
Learning Zero-Shot Cooperation with Humans, Assuming Humans Are Biased
Chao Yu, Jiaxuan Gao, +5 authors Y. Wu
Computer Science
ArXiv
3 February 2023
There is a recent trend of applying multi-agent reinforcement learning (MARL) to train an agent that can cooperate with humans in a zero-shot fashion without using any human data. The typical workflow is to first repeatedly run self-play (SP) to build a policy pool and then train the final adaptive policy against this pool. A crucial limitation of this framework is that every policy in the pool is optimized w.r.t. the environment reward function, which implicitly assumes that the testing partners of the adaptive policy will be precisely optimizing the same reward function as well. However, human objectives are often substantially biased according to their own preferences, which can differ greatly from the environment reward. We propose a more general framework, Hidden-Utility Self-Play (HSP), which explicitly models human biases as hidden reward functions in the self-play objective. By approximating the reward space as linear functions, HSP adopts an effective technique to generate an augmented policy pool with biased policies. We evaluate HSP on the Overcooked benchmark. Empirical results show that our HSP method produces higher rewards than baselines when cooperating with learned human models, manually scripted policies, and real humans. The HSP policy is also rated as the most assistive policy based on human feedback.
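From the abstract, the biased partner pool seems to come from treating human bias as a hidden reward that is linear in some event features, with different weight vectors giving different biased partners. The feature handling below is made up for illustration.

```python
# Assumed sketch of sampling hidden linear "human bias" rewards for a biased policy pool.
import numpy as np

def sample_biased_reward(feature_fn, n_features, scale=1.0, rng=None):
    """feature_fn(state, action): length-n_features vector of event features."""
    rng = rng or np.random.default_rng()
    w = rng.uniform(-scale, scale, size=n_features)       # one hidden preference
    def hidden_utility(state, action, env_reward):
        return env_reward + float(w @ feature_fn(state, action))
    return hidden_utility

# Training one pool member against each sampled hidden_utility yields biased partners
# for the adaptive policy to practice with.
```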