Research
CHAI aims to reorient the foundations of AI research toward the development of provably beneficial systems. Currently, it is not possible to specify a formula for human values that we know would provably benefit humanity if it were installed as the objective of a powerful AI system; any initial formal specification of human values is bound to be wrong in important ways. This means we need some way to represent uncertainty in the objectives of AI systems. This way of formulating objectives stands in contrast to the standard model for AI, in which the AI system's objective is assumed to be known completely and correctly.
Therefore, much of CHAI's research to date has focused on developing and communicating a new model of AI development, in which AI systems are uncertain about their objectives and defer to humans in light of that uncertainty. However, our interests extend to a variety of other problems in the development of provably beneficial AI systems. Our areas of greatest focus so far have been the foundations of rational agency and causality, value alignment and inverse reinforcement learning, human-robot cooperation, multi-agent perspectives and applications, and models of bounded or imperfect rationality. Other areas of interest to our mission include adversarial training and testing for ML systems, various AI capabilities, topics in cognitive science, ethics for AI and AI development, robust inference and planning, security problems and solutions, and transparency and interpretability methods.
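To make the contrast with the standard model concrete, the short sketch below works through a toy decision in the spirit of the off-switch game (Hadfield-Menell et al., IJCAI 2017, listed under human-robot cooperation below). It is written for this overview rather than taken from any CHAI paper, and the belief distribution and numbers are hypothetical: a robot that is uncertain about how much the human values an action prefers to let the human decide, because deferring captures the upside of the action while avoiding its downside.

```python
# Toy sketch (hypothetical numbers; not code from any CHAI publication):
# a robot holds a Bayesian belief over the human's utility u for taking action A,
# and compares acting immediately, switching off, and deferring to the human.
import random

random.seed(0)
belief = [random.gauss(0.1, 1.0) for _ in range(100_000)]   # samples from the robot's belief over u

act_now = sum(belief) / len(belief)                         # E[u]: act on the point estimate
switch_off = 0.0                                            # do nothing
defer = sum(max(u, 0.0) for u in belief) / len(belief)      # E[max(u, 0)]: human permits A only if u > 0

print(f"act immediately: {act_now:+.3f}")
print(f"switch off:      {switch_off:+.3f}")
print(f"defer to human:  {defer:+.3f}")                     # largest of the three under uncertainty
```

Under the standard model, where the point estimate is treated as the true objective, the same comparison collapses to checking whether the estimate is positive, and the incentive to defer to the human disappears.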
In addition to purely academic work, CHAI strives to produce intellectual outputs for general audiences. We also advise governments and international organizations on policies relevant to ensuring that AI technologies benefit society, and offer insight on a variety of individual-scale and societal-scale risks from AI, such as those pertaining to autonomous weapons, the future of employment, and public health and safety.
Below is a list of CHAI's publications since we began operating in 2016. Many of our publications are collaborations with other AI research groups; we view collaborations as key to integrating our perspectives into mainstream AI research.
0. NOTE:
- Jonathan Stray. 2022. Risk Ratios. NICAR 2022
- J Stray. 2022. Better Conflict Bulletin.
- OA Dada, G Obaido, IT Sanusi, K Aruleba, AA Yunusa. 2022. Hidden Gold for IT Professionals, Educators, and Students: Insights From Stack Overflow Survey. IEEE Transactions on Computational Social Systems
- OA Dada, K Aruleba, AA Yunusa, IT Sanusi, G Obaido. 2022. Information Technology Roles and Their Most-Used Programming Languages.
1. Overviews
1.1. Books
- C Giffin, T Lombrozo. 2022. Mens Rea in Moral Judgment and Criminal Law. Oxford Academic
- Stuart Russell. 2021. Human-Compatible Artificial Intelligence. Human-Like Machine Intelligence
- Stuart Russell. 2020. Artificial Intelligence: A Modern Approach (Textbook, 4th Edition). Pearson
- Stuart Russell. 2019. Human Compatible: Artificial Intelligence and The Problem of Control. Penguin Random House
- Joseph Y. Halpern. 2016. Actual Causality (Book). MIT Press
1.2. Overviews of societal-scale risks from AI
- Yoshua Bengio, Geoffrey Hinton, Andrew Yao, Dawn Song, Pieter Abbeel, Yuval Noah Harari, Ya-Qin Zhang, Lan Xue, Shai Shalev-Shwartz, Gillian Hadfield, Jeff Clune, Tegan Maharaj, Frank Hutter, Atılım Günes Baydin, Sheila McIlraith, Qiqi Gao, Ashwin Acharya, David Krueger, Anca Dragan, Philip Torr, Stuart Russell, Daniel Kahneman, Jan Brauner, Sören Mindermann. 2023. Managing AI Risks in an Era of Rapid Progress. arXiv:2310.17688
- Erik Jones, Anca Dragan, Aditi Raghunathan, Jacob Steinhardt. 2023. Automatically Auditing Large Language Models via Discrete Optimization. arXiv:2303.04381
- Stuart Russell. 2022. The promises and perils of AI. interview by Kay Firth-Butterfield and Robin Pomeroy, World Economic Forum Radio Davos
- Stuart Russell. 2022. Is the rise of killer machines closer than we think?. interview by Damian Whitworth, The Times
- Stuart Russell. 2022. If we succeed. Daedalus
- Stuart Russell. 2022. The best of Radio Davos over the last year. World Economic Forum, August 4, 2022
- Stuart Russell. 2022. The Foundations of Artificial Intelligence. interview by Daniel Bashir, The Gradient podcast
- Stuart Russell. 2022. Politicians must prepare for AI or face the consequences. The House (magazine of the UK Houses of Parliament)
- Stuart Russell. 2022. AI experts are increasingly afraid of what they’re creating. by Kelsey Piper, Vox
- Stuart Russell. 2022. Rethinking the purpose of AI. interview by Mark Leonard, The World in 30 Minutes, European Council on Foreign Relations
- Stuart Russell. 2022. Are we living in an AGI World?. interview by Kay Firth-Butterfield, In AI We Trust? podcast
- Stuart Russell. 2022. Banning Lethal Autonomous Weapons: An Education. Issues in Science and Technology, XXXVIII(3)
- Stuart Russell. 2022. Why we need to regulate non-state use of arms. Global Agenda, World Economic Forum
- Stuart Russell. 2022. Lethal Autonomous Weapons. interview by Anna Höhn, Deutsche Welle
- Stuart Russell. 2022. Robotic Weapons Are Coming: What Should We Do About It. by Charlie Burton, Wired
- Stuart Russell. 2022. Microdrones: the AI assassins set to become weapons of mass destruction. interview by Henry Bodkin and Aisling O'Leary
- Stuart Russell. 2022. Defense Primer: U.S. Policy on Lethal Autonomous Weapon Systems. Congressional Research Service
- Stuart Russell. 2022. Israel’s Autonomous Urban Quadcopter Brings ‘Search & Attack In One’. by David Hambling, Forbes Magazine
- J Stray. 2022. Understanding Recommenders.
- J Stray. 2022. Democratic Control of Recommender Systems. Metagovernance seminar
- Dan Hendrycks. 2022. Natural Selection Favors AIs over Humans. arXiv
- Dan Hendrycks, Mantas Mazeika. 2022. X-Risk Analysis for AI Research. arXiv:2206.05862
- Anthony M. Barrett, Dan Hendrycks, Jessica Newman, Brandie Nonnecke. 2022. Actionable Guidance for High-Consequence AI Risk Management: Towards Standards Addressing AI Catastrophic Risks. arXiv:2206.08966
- Thomas Krendl Gilbert, S Dean, N Lambert, T Zick, A Snoswell. 2022. Reward Reports for Reinforcement Learning. Responsible Decision Making in Dynamic Environments workshop, ICML 2022
- Thomas Krendl Gilbert, Aaron J. Snoswell, Michael Dennis, Rowan McAllister, and Cathy Wu. 2022. Sociotechnical Specification for the Broader Impacts of Autonomous Vehicles. Fresh Perspectives on the Future of Autonomous Driving workshop, ICRA 2022
- Nathaniel Lubin, Thomas Krendl Gilbert. 2022. Social media is polluting society. Moderation alone won’t fix the problem. MIT Technology Review
- Thomas Krendl Gilbert, Sarah Dean, Tom Zick, Nathan Lambert. 2022. Choices, Risks, and Reward Reports: Charting Public Policy for Reinforcement Learning Systems. Center for Long-Term Cybersecurity Whitepaper Series
- McKane Andrus, Sarah Dean, Thomas Krendl Gilbert, Nathan Lambert, Tom Zick. 2021. AI Development for the Public Interest: From Abstraction Traps to Sociotechnical Risks.
- Simon Zhuang, Dylan Hadfield-Menell. 2020. Consequences of Misaligned AI. NeurIPS 2020
- Raja Chatila, Virginia Dignum, Michael Fisher, Fosca Giannotti, Katharina Morik, Stuart Russell, Karen Yeung. 2021. Trustworthy AI. Reflections on Artificial Intelligence for Humanity
- Stuart Russell. 2021. The history and future of AI. Oxford Review of Economic Policy
- Jonathan Stray. 2021. Beyond Engagement: Aligning Algorithmic Recommendations With Prosocial Goals.
- Dan Hendrycks, Nicholas Carlini, John Schulman, Jacob Steinhardt. 2021. Unsolved Problems in ML Safety.
- Andrew Critch, David Krueger. 2020. AI Research Considerations for Human Existential Safety (ARCHES). (Preprint)
- Olaf Groth, Mark Nitzberg. 2018. Solomon’s Code: Humanity in a World with Thinking Machines. Pegasus Books
- Stuart Russell. 2018. The new weapons of mass destruction?. The Security Times
- Stuart Russell. Artificial Intelligence and the Problem of Control. Perspectives on Digital Humanism
1.3. Overviews of beneficial AI applications
- Anca Dragan, Andrew Alleyne, Frank Allgöwer, Aaron Ames, Saurabh Amin, James Anderson, Anuradha Annaswamy, Panos Antsaklis, Neda Bagheri, Hamsa Balakrishnan, Bassam Bamieh, John Baras, Margret Bauer, Alexandre Bayen, Paul Bogdan, Steven Brunton, Francesco Bullo, Etienne Burdet, Joel Burdick, Laurent Burlion, Carlos Canudas de Wit, Ming Cao, Christos Cassandras, Aranya Chakrabortty, Giacomo Como, Marie Csete, Fabrizio Dabbene, Munther Dahleh, Amritam Das, Eyal Dassau, Claudio De Persis, Mario di Bernardo, Stefano Di Cairano, Dimos Dimarogonas, Florian Dörfler, John Doyle, Francis Doyle III, Magnus Egerstedt, Johan Eker, Sarah Fay, Dimitar Filev, Angela Fontan, Elisa Franco, Masayuki Fujita, Mario Garcia-Sanz, Dennice Gayme, WPMH Heemels, João Hespanha, Sandra Hirche, Anette Hosoi, Jonathan How, Gabriela Hug, Marija Ilić, Hideaki Ishii, Ali Jadbabaie, Matin Jafarian, Samuel Qing-Shan Jia, Tor Johansen, Karl Johansson, Dalton Jones, Mustafa Khammash, Pramod Khargonekar, Mykel Kochenderfer, Andreas Krause, Anthony Kuh, Dana Kulić, Françoise Lamnabhi-Lagarrigue, Naomi Leonard, Frederick Leve, Na Li, Steven Low, John Lygeros, Iven Mareels, Sonia Martinez, Nikolai Matni, Tommaso Menara, Katja Mombaur, Kevin Moore, Richard Murray, Toru Namerikawa, Angelia Nedich, Sandeep Neema, Mariana Netto, Timothy O’Leary, Marcia O’Malley, Lucy Pao, Antonis Papachristodoulou, George Pappas, Philip Paré, Thomas Parisini, Fabio Pasqualetti, Marco Pavone, Akshay Rajhans, Gireeja Ranade, Anders Rantzer, Lillian Ratliff, J Anthony Rossiter, Dorsa Sadigh, Tariq Samad, Henrik Sandberg, Sri Sarma, Luca Schenato, Jacquelien Scherpen, Angela Schoellig, Rodolphe Sepulchre, Jeff Shamma, Robert Shorten, Bruno Sinopoli, Koushil Sreenath, Jakob Stoustrup, Jing Sun, Paulo Tabuada, Emma Tegling, Dawn Tilbury, Claire Tomlin, Jana Tumova, Kevin Wise, Dan Work, Junaid Zafar, Melanie Zeilinger. 2023. Control for Societal-scale Challenges: Road Map 2030. IEEE Control Systems Society Publication, 2023
- Jonathan Stray. 2022. Designing Recommender Systems to Depolarize. First Monday
- Raphael Taiwo Aruleba, Tayo Alex Adekiya, Nimibofa Ayawei, George Obaido, Kehinde Aruleba, Ibomoiye Domor Mienye, Idowu Aruleba, Blessing Ogbuokiri. 2022. COVID-19 diagnosis: a review of rapid antigen, RT-PCR and artificial intelligence methods. Bioengineering 9 (4), 153
- Jocelyn Maclure, Stuart Russell. 2021. AI for Humanity: The Global Challenges. Reflections on Artificial Intelligence for Humanity
2. Core topics
- Andrew Critch, Stuart Russell. 2023. TASRA: A Taxonomy of Societal-Scale Risks from AI. arXiv
2.1. Foundations of rational agency & causality
- Hanlin Zhu, Baihe Huang, Stuart Russell. 2023. On Representation Complexity of Model-based and Model-free Reinforcement Learning. arXiv:2310.01706
- Carlos G Correa, Mark K Ho, Frederick Callaway, Nathaniel D Daw, Thomas L Griffiths. 2023. Humans decompose tasks by trading off utility and computational cost. PLOS Computational Biology
- Cameron Rouse Turner, Thomas Morgan, Tom Griffiths. 2023. The joint evolution of sensory systems and decision policy allows cognition. Proceedings of the Annual Meeting of the Cognitive Science Society
- Jian-Qiao Zhu, Adam Sanborn, Nick Chater, Tom Griffiths. 2023. Computation-Limited Bayesian Updating. Proceedings of the Annual Meeting of the Cognitive Science Society
- Hanlin Zhu, Ruosong Wang, Jason Lee. 2023. Provably Efficient Reinforcement Learning via Surprise Bound. International Conference on Artificial Intelligence and Statistics
- Hanlin Zhu, Amy Zhang. 2023. Provably Efficient Offline Goal-Conditioned Reinforcement Learning with General Function Approximation and Single-Policy Concentrability. arXiv:2302.03770
- Hanlin Zhu, Paria Rashidinejad, Jiantao Jiao. 2023. Importance Weighted Actor-Critic for Optimal Conservative Offline Reinforcement Learning. Advances in Neural Information Processing Systems (NeurIPS)
- Ted Moskovitz, Brendan O’Donoghue, Vivek Veeriah, Sebastian Flennerhag, Satinder Singh, Tom Zahavy. 2023. ReLOAD: Reinforcement learning with optimistic ascent-descent for last-iterate convergence in constrained MDPs. In Proc. 40th International Conference on Machine Learning
- Qi Zhang, Edmund H Durfee, Satinder Singh. 2023. Risk-aware analysis for interpretations of probabilistic achievement and maintenance commitments. Artificial Intelligence Volume 317
- Sander Beckers, Joseph Halpern, Christopher Hitchcock. 2023. Causal Models with Constraints. Conference on Causal Learning and Reasoning
- Sander Beckers, Hana Chockler, Joseph Y Halpern. 2023. Quantifying harm. Proceedings of the 32nd International Joint Conference on Artificial Intelligence (IJCAI 2023)
- Adam Bjorndahl, Joseph Y Halpern. 2023. Sequential Language-based Decisions. R. Verbrugge (Ed.): Theoretical Aspects of Rationality and Knowledge 2023 (TARK 2023)
- Oliver E Richardson, Joseph Y Halpern, Christopher De Sa. 2023. Inference for probabilistic dependency graphs. Proceedings of the 39th Conference on Uncertainty in Artificial Intelligence (UAI 2023)
- Cassidy Laidlaw, Stuart Russell, and Anca Dragan. 2023. Bridging RL Theory and Practice with the Effective Horizon. ICML 2023 Workshop on New Frontiers in Learning, Control, and Dynamical Systems; Proc. NeurIPS-23
- Paria Rashidinejad, Hanlin Zhu, Kunhe Yang, Stuart Russell, and Jiantao Jiao. 2023. Optimal Conservative Offline RL with General Function Approximation via Augmented Lagrangian. Published as a conference paper at ICLR 2023
- Weirui Ye, Pieter Abbeel, Yang Gao. 2022. Spending Thinking Time Wisely: Accelerating MCTS with Virtual Expansions. Neural Information Processing Systems (NeurIPS), 2022
- H. Geffner and R. Dechter. 2022. Probabilistic and Causal Inference: The Works of Judea Pearl. ACM Press, 2022
- J Halpern, M. Soloviev. 2022. Information acquisition under resource limitations in a noisy environment. Journal of the ACM
- J Halpern, H. Chockler. 2022. On testing for discrimination using causal models. Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence (AAAI-22)
- J Halpern, S. Peters. 2022. Reasoning about causal models with infinitely many variables. Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence (AAAI-22)
- T Blanchard, D Murray, T Lombrozo. 2022. Experiments on causal exclusion. Mind & Language 37 (5), 1067-1089
- N Vasil, T Lombrozo. 2022. Explanations and Causal Judgments Are Differentially Sensitive to Covariation and Mechanism Information. Frontiers in Psychology 13
- T Vrantsidis, T Lombrozo. 2022. Simplicity beyond probability: Simplicity’s role in evaluating explanations goes beyond providing cues to priors and likelihoods. Proceedings of the Annual Meeting of the Cognitive Science Society 44 (44)
- Smitha Milli, Luca Belli, Moritz Hardt. 2022. Causal Inference Struggles with Agency on Online Platforms. FAccT 2022
- Paria Rashidinejad, Banghua Zhu, Cong Ma, Jiantao Jiao, Stuart Russell. 2021. Bridging Offline Reinforcement Learning and Imitation Learning: A Tale of Pessimism.
- David Silver, Satinder Singh, Doina Precup, and Richard Sutton. 2021. Reward is Enough. Artificial Intelligence 2021
- David Abel, Will Dabney, Anna Harutyunyan, Mark K. Ho, Michael L. Littman, Doina Precup, and Satinder Singh. 2021. On the Expressivity of Markov Reward. NeurIPS 2021
- Christopher Grimm, Andre Barreto, Gregory Farquhar, David Silver, and Satinder Singh. 2021. Proper Value Equivalence.
- Tom Zahavy, Brendan O'Donoghue, Guillaume Desjardins, Satinder Singh. 2021. Reward is Enough for Convex MDPs.
- Theodore R. Sumers, Robert D. Hawkins, Mark K. Ho, Thomas L. Griffiths. 2021. Extending rational models of communication from beliefs to actions.
- Sander Beckers, Frederick Eberhardt, Joseph Y Halpern. 2020. Approximate Causal Abstractions. PMLR
- Dalal Alrajeh, Hana Chockler, Joseph Y Halpern. 2020. Combining experts’ causal judgments. AAAI; Elsevier
- Andrew Critch. 2019. A Parametric, Resource-Bounded Generalization of Löb’s Theorem, and a Robust Cooperation Criterion for Open-Source Game Theory. The Journal of Symbolic Logic, Cambridge University Press
- Joseph Y. Halpern, Evan Piermont. 2019. Partial Awareness. AAAI 2019
- Joseph Y. Halpern, Rafael Pass. 2019. A Conceptually Well-Founded Characterization of Iterated Admissibility Using an "All I Know" Operator. TARK 2019
- Sander Beckers, Frederick Eberhardt, Joseph Y. Halpern. 2019. Approximate Causal Abstraction. UAI 2019
- Sander Beckers, Joseph Y. Halpern. 2019. Abstracting causal models. AAAI 2019
- Joseph Y. Halpern. 2018. A Note on the Existence of Ratifiable Acts. Review of Symbolic Logic
- Meir Friedenberg, Joseph Y. Halpern. 2018. Combining the Causal Judgments of Experts with Possibly Different Focus Areas. International Conference on Principles of Knowledge Representation and Reasoning
- Gadi Aleksandrowicz, Hana Chockler, Joseph Y. Halpern, Alexander Ivrii. 2017. The Computational Complexity of Structure-Based Causality. JAIR
- Joseph Y. Halpern. 2016. Sufficient Conditions for Causality to be Transitive. Philosophy of Science, 83, 213--226
- Tom Everitt, Daniel Filan, Mayank Daswani, Marcus Hutter. 2016. Self-Modification of Policy and Utility Function in Rational Agents. AGI 2016
2.2. Value alignment and inverse reinforcement learning
- Sunayana Rane, Mark Ho, Ilia Sucholutsky, Thomas L. Griffiths. 2023. Concept Alignment as a Prerequisite for Value Alignment. arXiv:2310.20059
- Rachel Freedman, Justin Svegliato, Kyle Wray, Stuart Russell. 2023. Active teacher selection for reinforcement learning from human feedback. arXiv:2310.15288
- Stephen Casper, Xander Davies, Claudia Shi, Thomas Krendl Gilbert, Jérémy Scheurer, Javier Rando, Rachel Freedman, Tomasz Korbak, David Lindner, Pedro Freire, Tony Wang, Samuel Marks, Charbel-Raphaël Segerie, Micah Carroll, Andi Peng, Phillip Christoffersen, Mehul Damani, Stewart Slocum, Usman Anwar, Anand Siththaranjan, Max Nadeau, Eric J. Michaud, Jacob Pfau, Dmitrii Krasheninnikov, Xin Chen, Lauro Langosco, Peter Hase, Erdem Bıyık, Anca Dragan, David Krueger, Dorsa Sadigh, Dylan Hadfield-Menell. 2023. Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback. arXiv:2307.15217
- Joar Skalse, Lucy Farnik, Sumeet Ramesh Motwani, Erik Jenner, Adam Gleave, Alessandro Abate. 2023. STARC: A General Framework For Quantifying Differences Between Reward Functions. arXiv:2309.15257
- Manan Tomar, Dibya Ghosh, Vivek Myers, Anca Dragan, Matthew E. Taylor, Philip Bachman, Sergey Levine. 2023. Video-Guided Skill Discovery. ICML 2023 Workshop The Many Facets of Preference-Based Learning
- Jeremy Tien, Jerry Zhi-Yang He, Zackory Erickson, Anca D Dragan, Daniel S Brown. 2023. Causal Confusion and Reward Misidentification in Preference-Based Reward Learning. arXiv:2204.06601
- Khanh Nguyen. 2023. Language Models are Bounded Pragmatic Speakers: Understanding RLHF from a Bayesian Cognitive Modeling Perspective. In ToM workshop @ ICML, 2023
- Changyeon Kim, Younggyo Seo, Hao Liu, Lisa Lee, Jinwoo Shin, Honglak Lee, Kimin Lee. 2023. Guide Your Agent with Adaptive Multimodal Rewards. arXiv:2309.10790
- Alexander Pan, Jun Shern Chan, Andy Zou, Nathaniel Li, Steven Basart, Thomas Woodside, Hanlin Zhang, Scott Emmons, Dan Hendrycks. 2023. Do the Rewards Justify the Means? Measuring Trade-Offs Between Rewards and Ethical Behavior in the Machiavelli Benchmark. International Conference on Machine Learning
- Micah Carroll, Alan Chan, Henry Ashton, David Krueger. 2023. Characterizing Manipulation from AI Systems. EAAMO 2023
- Richard Ngo, Lawrence Chan, Sören Mindermann. 2023. The alignment problem from a deep learning perspective. arXiv:2209.00626
- Evan Hubinger, Adam Jermyn, Johannes Treutlein, Rubi Hudson, Kate Woolverton. 2023. Conditioning Predictive Models: Risks and Strategies. arXiv:2302.00805
- Ademi Adeniji, Amber Xie, Carmelo Sferrazza, Younggyo Seo, Stephen James, Pieter Abbeel. 2023. Language reward modulation for pretraining reinforcement learning. arXiv:2308.12270
- Ying Fan, Olivia Watkins, Yuqing Du, Hao Liu, Moonkyung Ryu, Craig Boutilier, Pieter Abbeel, Mohammad Ghavamzadeh, Kangwook Lee, Kimin Lee. 2023. DPOK: Reinforcement Learning for Fine-tuning Text-to-Image Diffusion Models. arXiv:2305.16381
- Alejandro Escontrela, Ademi Adeniji, Wilson Yan, Ajay Jain, Xue Bin Peng, Ken Goldberg, Youngwoon Lee, Danijar Hafner, Pieter Abbeel. 2023. Video Prediction Models as Rewards for Reinforcement Learning. arXiv:2305.14343
- Changyeon Kim, Jongjin Park, Jinwoo Shin, Honglak Lee, Pieter Abbeel, Kimin Lee. 2023. Preference transformer: Modeling human preferences using transformers for RL. arXiv:2303.00957
- Kimin Lee, Hao Liu, Moonkyung Ryu, Olivia Watkins, Yuqing Du, Craig Boutilier, Pieter Abbeel, Mohammad Ghavamzadeh, Shixiang Shane Gu. 2023. Aligning text-to-image models using human feedback. arXiv:2302.12192
- Ted Moskovitz, Aaditya K Singh, DJ Strouse, Tuomas Sandholm, Ruslan Salakhutdinov, Anca D Dragan, Stephen McAleer. 2023. Confronting Reward Model Overoptimization with Constrained RLHF. arXiv:2310.04373
- Cassidy Laidlaw, Shivam Singhal, Anca Dragan. 2023. Preventing Reward Hacking with Occupancy Measure Regularization. ICML Workshop on New Frontiers in Learning, Control, and Dynamical Systems
- Gaurav R Ghosal, Matthew Zurek, Daniel S Brown, Anca D Dragan. 2023. The Effect of Modeling Human Rationality Level on Learning Rewards from Multiple Feedback Types. In Proc. AAAI Conference on Artificial Intelligence
- Daniel Shin, Anca D Dragan, Daniel S Brown. 2023. Benchmarks and algorithms for offline preference-based reward learning. arXiv:2301.01392
- M Srivastava, E Biyik, S Mirchandani, N Goodman, D Sadigh. 2022. Assistive Teaching of Motor Control Tasks to Humans. arXiv preprint arXiv:2211.14003
- J. Lin, D. Fried, D. Klein, and A.D. Dragan. 2022. Inducing Structure in Reward Learning by Learning Features. International Journal of Robotics Research, 2022.
- Alejandro Escontrela, Xue Bin Peng, Wenhao Yu, Tingnan Zhang, Atil Iscen, Ken Goldberg, Pieter Abbeel. 2022. Adversarial Motion Priors Make Good Substitutes for Complex Reward Functions. IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
- Zhao Mandi, Fangchen Liu, Kimin Lee, Pieter Abbeel. 2022. Towards more Generalizable One-shot Visual Imitation Learning. IEEE International Conference on Robotics and Automation (ICRA)
- Xinran Liang, Katherine Shu, Kimin Lee, Pieter Abbeel. 2022. Reward Uncertainty for Exploration in Preference-based Reinforcement Learning. International Conference on Learning Representations (ICLR)
- Jongjin Park, Younggyo Seo, Jinwoo Shin, Honglak Lee, Pieter Abbeel, Kimin Lee. 2022. SURF: Semi-supervised Reward Learning with Data Augmentation for Feedback-efficient Preference-based Reinforcement Learning. International Conference on Learning Representations (ICLR)
- Jonathan Stray, Alon Halevy, Parisa Assar, Dylan Hadfield-Menell, Craig Boutilier, Amar Ashar, Lex Beattie, Michael Ekstrand, Claire Leibowicz, Connie Moon Sehat, Sara Johansen, Lianne Kerlin, David Vickrey, Spandana Singh, Sanne Vrijenhoek, Amy Zhang, McKane Andrus, Natali Helberger, Polina Proutskova, Tanushree Mitra, Nina Vasan. 2022. Building Human Values into Recommender Systems: An Interdisciplinary Synthesis.
- E Bıyık, DP Losey, M Palan, NC Landolfi, G Shevchuk, D Sadigh. 2022. Learning reward functions from diverse sources of human feedback: Optimally integrating demonstrations and preferences. The International Journal of Robotics Research 41 (1), 45-67
- V Myers, E Biyik, N Anari, D Sadigh. 2022. Learning multimodal rewards from rankings. Conference on Robot Learning, 342-352
- Erdem Bıyık, Aditi Talati, Dorsa Sadigh. 2022. APReL: A Library for Active Preference-based Reward Learning Algorithms. Proceedings of the 2022 ACM/IEEE International Conference on Human-Robot Interaction
- Mantas Mazeika, Eric Tang, Andy Zou, Steven Basart, Jun Shern Chan, Dawn Song, David Forsyth, Jacob Steinhardt, Dan Hendrycks. 2022. How Would The Viewer Feel? Estimating Wellbeing From Video Scenarios. NeurIPS 2022
- Donald Joseph Hejna III, Dorsa Sadigh. 2022. Few-Shot Preference Learning for Human-in-the-Loop RL. Proceedings of the 6th Conference on Robot Learning (CoRL), December 2022
- R Shah, V Varma, R Kumar, M Phuong, V Krakovna, J Uesato, Z Kenton. 2022. Goal Misgeneralization: Why Correct Specifications Aren’t Enough For Correct Goals. arXiv preprint arXiv:2210.01790
- Rohin Shah, Steven H Wang, Cody Wild, Stephanie Milani, Anssi Kanervisto, Vinicius G Goecks, Nicholas Waytowich, David Watkins-Valls, Bharat Prakash, Edmund Mills, Divyansh Garg, Alexander Fries, Alexandra Souly, Jun Shern Chan, Daniel del Castillo, Tom Lieberum. 2022. Retrospective on the 2021 MineRL BASALT Competition on Learning from Human Feedback. NeurIPS 2021 Competitions and Demonstrations Track, 259-272
- David Zhang, Micah Carroll, Andreea Bobu, Anca Dragan. 2022. Time-Efficient Reward Learning via Visually Assisted Cluster Ranking. Human-in-the-loop Learning (HILL) Workshop, NeurIPS 2022.
- Adam Gleave, Mohammad Taufeeque, Juan Rocamonde, Erik Jenner, Steven H. Wang, Sam Toyer, Maximilian Ernestus, Nora Belrose, Scott Emmons, Stuart Russell. 2022. imitation: Clean Imitation Learning Implementations. arXiv
- Erik Jenner, Herke Van Hoof, Adam Gleave. 2022. Calculus on MDPs: Potential Shaping as a Gradient. arXiv.
- Adam Gleave, Sam Toyer. 2022. A Primer on Maximum Causal Entropy Inverse Reinforcement Learning. arXiv
- Adam Gleave, Geoffrey Irving. 2022. Uncertainty Estimation for Language Reward Models. arXiv
- Joar Skalse, Matthew Farrugia-Roberts, Stuart Russell, Alessandro Abate, Adam Gleave. 2022. Invariance in Policy Optimisation and Partial Identifiability in Reward Learning. arXiv
- T Westenbroek, A Siththaranjan, M Sarwari, CJ Tomlin, S Sastry. 2022. On the computational consequences of cost function design in nonlinear optimal control. 2022 IEEE 61st Conference on Decision and Control (CDC), 7423-7430
- E Jenner, JMV Skalse, A Gleave. 2022. A general framework for reward function distances. NeurIPS ML Safety Workshop
- Daniel Fried, Dan Klein, Anca Dragan. 2022. Inferring Rewards from Language in Context. ACL 2022.
- Peter Barnett, Rachel Freedman, Justin Svegliato, Stuart Russell. 2022. Active Reward Learning from Multiple Teachers. SafeAI Workshop (at AAAI 2022)
- Micah Carroll, Dylan Hadfield-Menell, Stuart Russell, Anca Dragan. 2022. Estimating and Penalizing Induced Preference Shifts in Recommender Systems. ICML 2022
- E Bıyık, N Anari, D Sadigh. 2022. Batch Active Learning of Reward Functions from Human Preferences. ACM Transactions on Human-Robot Interaction (THRI)
- Adam Gleave, Michael Dennis, Shane Legg, Stuart Russell, Jan Leike. 2021. Quantifying Differences in Reward Functions. ICLR 2021
- Smitha Milli, Luca Belli, Moritz Hardt. 2021. From Optimizing Engagement to Measuring Value. FAccT 2021
- Vael Gates, Frederick Callaway, Mark K Ho, Tom Griffiths. 2021. A rational model of people’s inferences about others’ preferences based on response times.
- David Lindner, Rohin Shah, Pieter Abbeel, Anca Dragan. 2021. Learning What To Do by Simulating the Past. ICLR 2021
- Kush Bhatia, Peter L. Bartlett, Anca D. Dragan, Jacob Steinhardt. 2021. Agnostic Learning with Unknown Utilities. ITCS 2021
- Cassidy Laidlaw, Stuart Russell. 2021. Uncertain Decisions Facilitate Better Preference Learning. NeurIPS 2021
- Rohin Shah, Cody Wild, Steven H. Wang, Neel Alex, Brandon Houghton, William Guss, Sharada Mohanty, Anssi Kanervisto, Stephanie Milani, Nicholay Topin, Pieter Abbeel, Stuart Russell, Anca Dragan. 2021. The MineRL BASALT Competition on Learning from Human Feedback. NeurIPS 2021
- Kimin Lee, Laura Smith, Pieter Abbeel. 2021. PEBBLE: Feedback-Efficient Interactive Reinforcement Learning via Relabeling Experience and Unsupervised Pre-training. ICML 2021
- Xiaofei Wang, Kimin Lee, Kourosh Hakhamaneshi, Pieter Abbeel, Michael Laskin. 2021. Skill Preferences: Learning to Extract and Execute Robotic Skills from Human Feedback.
- Olivia Watkins, Abhishek Gupta, Trevor Darrell, Pieter Abbeel, Jacob Andreas . 2021. Teachable Reinforcement Learning via Advice Distillation. NeurIPS 2021
- Kimin Lee, Laura Smith, Anca Dragan, Pieter Abbeel. 2021. B-Pref: Benchmarking Preference-Based Reinforcement Learning. NeurIPS 2021
- Dylan P. Losey, Andrea Bajcsy, Marcia K. O’Malley, Anca D. Dragan. 2021. Physical interaction as communication: Learning robot objectives online from human corrections.
- Daniel S. Brown, Jordan Schneider, Anca D. Dragan, Scott Niekum. 2021. Value Alignment Verification. ICML 2021
- Avik Jain, Lawrence Chan, Daniel S. Brown, Anca D. Dragan. 2021. Optimal Cost Design for Model Predictive Control. L4DC 2021
- Andreea Bobu, Marius Wiggert, Claire Tomlin, Anca D. Dragan. 2021. Feature Expansive Reward Learning: Rethinking Human Input.
- Micah Carroll, Dylan Hadfield-Menell, Stuart Russell, Anca Dragan. 2021. Estimating and Penalizing Preference Shift in Recommender Systems. RecSys 2021
- Arnaud Fickinger, Samuel Cohen, Stuart Russell, Brandon Amos. 2021. Cross-Domain Imitation Learning via Optimal Transport.
- Dan Hendrycks, Mantas Mazeika, Andy Zou, Sahil Patel, Christine Zhu, Jesus Navarro, Dawn Song, Bo Li, Jacob Steinhardt. 2021. What Would Jiminy Cricket Do? Towards Agents That Behave Morally. NeurIPS 2021
- Xin Chen, Sam Toyer, Cody Wild, Scott Emmons, Ian Fischer, Kuang-Huei Lee, Neel Alex, Steven H Wang, Ping Luo, Stuart Russell, Pieter Abbeel, Rohin Shah. 2021. An Empirical Investigation of Representation Learning for Imitation. NeurIPS 2021
- Justin Svegliato, Samer B Nashed, Shlomo Zilberstein. 2021. Ethically compliant sequential decision making.
- Samer B Nashed, Justin Svegliato, Shlomo Zilberstein. 2021. Ethically compliant planning within moral communities.
- Justin Svegliato. 2021. Building efficient, reliable, and ethical autonomous systems.
- Zhao Mandi, Fangchen Liu, Kimin Lee, Pieter Abbeel. 2021. Towards More Generalizable One-shot Visual Imitation Learning.
- Sam Toyer, Rohin Shah, Andrew Critch, Stuart Russell. 2020. The MAGICAL Benchmark for Robust Imitation. NeurIPS 2020
- Alexander Matt Turner, Dylan Hadfield-Menell, Prasad Tadepalli. 2020. Conservative agency via attainable utility preservation. AIES 2020
- Andreea Bobu, Dexter R.R. Scobee, Jaime F. Fisac, S. Shankar Sastry, Anca D. Dragan. 2020. LESS is More: Rethinking Probabilistic Models of Human Behavior. HRI 2020
- Dylan Hadfield-Menell, Gillian K. Hadfield. 2020. Incomplete Contracting and AI Alignment. AIES 2020
- Gokul Swamy, Siddharth Reddy, Sergey Levine, Anca D. Dragan. 2020. Scaled Autonomy: Enabling Human Operators to Control Robot Fleets. ICRA 2020
- Rachel Freedman, Jana Schaich Borg, Walter Sinnott-Armstrong, John P. Dickerson, Vincent Conitzer. 2020. Adapting a kidney exchange algorithm to align with human values. Artificial Intelligence, 283
- Smitha Milli, Pieter Abbeel, Igor Mordatch. 2020. Interpretable and Pedagogical Examples. (Preprint)
- Eric J. Michaud, Adam Gleave, Stuart Russell. 2020. Understanding Learned Reward Functions. Deep RL Workshop, NeurIPS 2020
- Pedro Freire, Adam Gleave, Sam Toyer, Stuart Russell. 2020. DERAIL: Diagnostic Environments for Reward And Imitation Learning. Deep RL Workshop, NeurIPS 2020
- Rachel Freedman, Rohin Shah, Anca Dragan. 2020. Choice Set Misspecification in Reward Inference. IJCAI-PRICAI-20 Workshop on Artificial Intelligence Safety
- Rachel Freedman. 2020. Aligning with Heterogeneous Preferences for Kidney Exchange. IJCAI-PRICAI-20 Workshop on Artificial Intelligence Safety
- Michele Fedrizzi, Nino Civolani, Andrew Critch. 2020. Inconsistency evaluation in pairwise comparison using norm-based distances. Decisions in Economics and Finance
- Kush Bhatia, Ashwin Pananjady, Peter L. Bartlett, Anca D. Dragan, Martin J. Wainwright. 2020. Preference learning along multiple criteria: A game-theoretic perspective. NeurIPS 2020
- Siddharth Reddy, Anca D. Dragan, Sergey Levine. 2020. SQIL: Imitation Learning via Reinforcement Learning with Sparse Rewards. ICLR 2020
- Anna N. Rafferty, Rachel Jansen, Thomas L. Griffiths. 2020. Assessing Mathematics Misunderstandings via Bayesian Inverse Planning. Cognitive Science
- Jonathan Stray, Steven Adler, Dylan Hadfield-Menell. 2020. What are you optimizing for? Aligning Recommender Systems with Human Values. ICML 2020
- Theodore R. Sumers, Mark K. Ho, Robert D. Hawkins, Karthik Narasimhan, Thomas L. Griffiths. 2020. Learning Rewards from Linguistic Feedback.
- Andreea Bobu, Andrea Bajcsy, Jaime F. Fisac, Sampada Deglurkar, Anca D. Dragan. 2019. Quantifying Hypothesis Space Misspecification in Learning from Human-Robot Demonstrations and Physical Corrections. IEEE Transactions on Robotics
- Dmitrii Krasheninnikov, Rohin Shah, Herke van Hoof. 2019. Combining reward information from multiple sources. NeurIPS 2019 Learning with Rich Experience Workshop
- Donald J. Hejna III, Pieter Abbeel, Lerrel Pinto. 2019. Hierarchically Decoupled Imitation for Morphological Transfer. (Preprint)
- Dylan Hadfield-Menell, McKane Andrus, Gillian Hadfield. 2019. Legible Normativity for AI Alignment: The Value of Silly Rules. AIES 2019
- Hong Jun Jeon, Smitha Milli, Anca D. Dragan. 2019. Reward-rational (implicit) choice: A unifying formalism for reward learning. (Preprint)
- Jason Y. Zhang, Anca D. Dragan. 2019. Learning from Extrapolated Corrections. ICRA 2019
- Kelvin Xu, Ellis Ratner, Anca Dragan, Sergey Levine, Chelsea Finn. 2019. Learning a Prior over Intent via Meta-Inverse Reinforcement Learning. ICML 2019
- Lawrence Chan, Dylan Hadfield-Menell, Siddhartha Srinivasa, Anca Dragan. 2019. The Assistive Multi-Armed Bandit. HRI 2019
- Matthew Rahtz, James Fang, Anca D. Dragan, Dylan Hadfield-Menell. 2019. An Extensible Interactive Interface for Agent Design. ICML 2019 Human-in-the-Loop Learning Workshop
- Mayank Agrawal, Joshua C. Peterson, Thomas L. Griffiths. 2019. Using Machine Learning to Guide Cognitive Modeling: A Case Study in Moral Reasoning. CogSci 2019
- Ori Plonsky, Reut Apel, Eyal Ert, Moshe Tennenholtz, David Bourgin, Joshua C. Peterson, Daniel Reichman, Thomas L. Griffiths, Stuart J. Russell, Evan C. Carter, James F. Cavanagh, Ido Erev. 2019. Predicting human decisions with behavioral theories and machine learning. (Preprint)
- Rohin Shah, Dmitrii Krasheninnikov, Jordan Alexander, Pieter Abbeel, Anca Dragan. 2019. Preferences Implicit in the State of the World. ICLR 2019
- Rohin Shah, Noah Gundotra, Pieter Abbeel, Anca D. Dragan. 2019. On the Feasibility of Learning, Rather than Assuming, Human Biases for Reward Inference. ICML 2019
- Sandy H. Huang, Isabella Huang, Ravi Pandya, Anca D. Dragan. 2019. Nonverbal Robot Feedback for Human Teachers. CoRL 2019
- Siddharth Reddy, Anca D. Dragan, Sergey Levine, Shane Legg, Jan Leike. 2019. Learning Human Objectives by Evaluating Hypothetical Behavior. (Preprint)
- Smitha Milli, Anca D. Dragan. 2019. Literal or Pedagogic Human? Analyzing Human Model Misspecification in Objective Learning. UAI 2019
- Xue Bin Peng, Angjoo Kanazawa, Sam Toyer, Pieter Abbeel, Sergey Levine. 2019. Variational Discriminator Bottleneck: Improving Imitation Learning, Inverse RL, and GANs by Constraining Information Flow. ICLR 2019
- Aaron Tucker, Adam Gleave, Stuart Russell. 2018. Inverse reinforcement learning for video games. NeurIPS 2018 Deep RL Workshop
- Adam Gleave, Oliver Habryka. 2018. Multi-task Maximum Entropy Inverse Reinforcement Learning. ICML 2018 Goals RL Workshop
- Chandrayee Basu, Mukesh Singhal, Anca D. Dragan. 2018. Learning from Richer Human Guidance: Augmenting Comparison-Based Learning with Feature Queries. HRI 2018
- Chris Cundy, Daniel Filan. 2018. Exploring Hierarchy-Aware Inverse Reinforcement Learning. Unpublished (ICML 2018 Goals RL Workshop)
- Dhruv Malik, Malayandi Palaniappan, Jaime F. Fisac, Dylan Hadfield-Menell, Stuart Russell, Anca D. Dragan. 2018. An Efficient, Generalized Bellman Update For Cooperative Inverse Reinforcement Learning. ICML 2018
- Ellis Ratner, Dylan Hadfield-Menell, Anca D. Dragan. 2018. Simplifying Reward Design through Divide-and-Conquer. RSS 2018
- Nicholas C. Landolfi, Anca D. Dragan. 2018. Social Cohesion in Autonomous Driving. IROS 2018
- Sören Mindermann, Rohin Shah, Adam Gleave, Dylan Hadfield-Menell. 2018. Active Inverse Reward Design. ICML 2018 GoalsRL workshop
- Zeyu Zheng, Junhyuk Oh, Satinder Singh. 2018. On Learning Intrinsic Rewards for Policy Gradient Methods. NeurIPS 2018
- Dorsa Sadigh, Anca Dragan, S. Shankar Sastry, Sanjit Seshia. 2017. Active Preference-Based Learning of Reward Functions. RSS 2017
- Dylan Hadfield-Menell, Smitha Milli, Pieter Abbeel, Stuart Russell, Anca Dragan. 2017. Inverse Reward Design. NeurIPS 2017
- Jaime F. Fisac, Monica A. Gates, Jessica B. Hamrick, Chang Liu, Dylan Hadfield-Menell, Malayandi Palaniappan, Dhruv Malik, S. Shankar Sastry, Thomas L. Griffiths, Anca D. Dragan. 2017. Pragmatic-Pedagogic Value Alignment. ISRR 2017
- Kareem Amin, Nan Jiang, Satinder Singh. 2017. Repeated Inverse Reinforcement Learning. NIPS 2017
- Dylan Hadfield-Menell, Anca Dragan, Pieter Abbeel, Stuart Russell. 2016. Cooperative Inverse Reinforcement Learning. NeurIPS 2016
2.3. Human-robot cooperation
- Jessy Lin, Nicholas Tomlin, Jacob Andreas, Jason Eisner. 2023. Decision-Oriented Dialogue for Human-AI Collaboration. arXiv:2305.20076
- Mason Nakamura, Justin Svegliato, Samer B Nashed, Shlomo Zilberstein, Stuart Russell. 2023. Formal Composition of Robotic Systems as Contract Programs. IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
- Philipp Wu, Yide Shentu, Zhongke Yi, Xingyu Lin, Pieter Abbeel. 2023. GELLO: A General, Low-Cost, and Intuitive Teleoperation Framework for Robot Manipulators. arXiv:2309.13037
- Jensen Gao, Siddharth Reddy, Glen Berseth, Anca D Dragan, Sergey Levine. 2023. Bootstrapping Adaptive Human-Machine Interfaces with Offline Reinforcement Learning. arXiv:2309.03839
- Jerry Zhi-Yang He, Daniel S Brown, Zackory Erickson, Anca Dragan. 2023. Quantifying Assistive Robustness Via the Natural-Adversarial Frontier. arXiv:2310.10610
- Ran Tian, Masayoshi Tomizuka, Anca D Dragan, Andrea Bajcsy. 2023. Towards Modeling and Influencing the Dynamics of Human Learning. In Proc. 2023 ACM/IEEE International Conference on Human-Robot Interaction
- Joey Hong, Anca Dragan, Sergey Levine. 2023. Learning to Influence Human Behavior with Offline Reinforcement Learning. arXiv:2303.02265
- Andreea Bobu, Andi Peng, Pulkit Agrawal, Julie Shah, Anca D Dragan. 2023. Aligning Robot and Human Representations. arXiv:2302.01928
- J.Z.Y. He, A. Raghunathan, D.S. Brown, Z. Erickson, and A.D. Dragan. 2022. Learning Representations that Enable Generalization in Assistive Tasks. Conference on Robot Learning (CoRL), 2022.
- S. Reddy, S. Levine, and A.D. Dragan. 2022. First Contact: Unsupervised Human-Machine Co-Adaptation via Mutual Information Maximization. Neural Information Processing Systems (NeurIPS), 2022
- A. Sripathy, A. Bobu, Z. Li, K. Sreenath, D.S. Brown, and A.D. Dragan. 2022. Teaching Robots to Span the Space of Functional Expressive Motion. International Conference on Intelligent Robots and Systems (IROS), 2022
- R. Tian, L. Sun, A. Bajcsy, M. Tomizuka, and A.D. Dragan. 2022. Safety Assurances for Human-Robot Interaction via Confidence-aware Game-theoretic Human Models. International Conference on Robotics and Automation (ICRA), 2022.
- S. Chen*, J. Gao*, S. Reddy, G. Berseth, A.D. Dragan, and S. Levine. 2022. ASHA: Assistive Teleoperation via Human-in-the-Loop Reinforcement Learning. International Conference on Robotics and Automation (ICRA), 2022
- Ryan Hoque, Lawrence Yunliang Chen, Satvik Sharma, Karthik Dharmarajan, Brijen Thananjeyan, Pieter Abbeel, Ken Goldberg. 2022. Fleet-DAgger: Interactive Robot Fleet Learning with Scalable Human Supervision. Conference on Robot Learning (CoRL)
- Sarah Young, Jyothish Pari, Pieter Abbeel, Lerrel Pinto. 2022. Playful Interactions for Representation Learning. IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
- Kourosh Hakhamaneshi, Ruihan Zhao, Albert Zhan, Pieter Abbeel, Michael Laskin. 2022. Hierarchical Few-Shot Imitation with Skill Transition Models. International Conference on Learning Representations (ICLR)
- M Srivastava, E Biyik, S Mirchandani, N Goodman, D Sadigh. 2022. Assistive Teaching of Motor Control Tasks to Humans. NeurIPS 2022
- E Brockbank, H Wang, J Yang, S Mirchandani, E Bıyık, D Sadigh, JE Fan. 2022. How do people incorporate advice from artificial agents when making physical judgments? CogSci 2022
- E Bıyık. 2022. Learning Preferences for Interactive Autonomy. Stanford University
- E Bıyık. 2022. Learning from Humans for Adaptive Interaction. 2022 Pioneers Workshop at the 17th ACM/IEEE International Conference on Human-Robot Interaction
- Priya Sundaresan, Suneel Belkhale, Dorsa Sadigh . 2022. Learning Visuo-Haptic Skewering Strategies for Robot-Assisted Feeding. Proceedings of the 6th Conference on Robot Learning (CoRL), December 2022
- Kanishk Gandhi, Siddharth Karamcheti, Madeline Liao, Dorsa Sadigh. 2022. Eliciting Compatible Demonstrations for Multi-Human Imitation Learning. Proceedings of the 6th Conference on Robot Learning (CoRL), December 2022
- Jennifer Grannen*, Yilin Wu*, Suneel Belkhale, Dorsa Sadigh. 2022. Learning Bimanual Scooping Policies for Food Acquisition. Proceedings of the 6th Conference on Robot Learning (CoRL), December 2022
- Siddharth Karamcheti*, Raj Palleti*, Yuchen Cui, Percy Liang, Dorsa Sadigh. 2022. Shared Autonomy for Robotic Manipulation with Language Corrections. Workshop on Learning with Natural Language Supervision @ ACL, May 2022
- Suneel Belkhale, Ethan Kroll Gordon, Yuxiao Chen, Siddhartha Srinivasa, Tapomayukh Bhattacharjee, Dorsa Sadigh. 2022. Balancing Efficiency and Comfort in Robot-Assisted Bite Transfer. International Conference on Robotics and Automation (ICRA), May 2022
- Zhangjie Cao*, Zihan Wang*, Dorsa Sadigh. 2022. Learning from Imperfect Demonstrations via Adversarial Confidence Transfer. International Conference on Robotics and Automation (ICRA), May 2022
- Dylan Losey, Hong Jun Jeon, Mengxi Li, Krishnan Srinivasan, Ajay Mandlekar, Animesh Garg, Jeannette Bohg, Dorsa Sadigh. 2022. Learning Latent Actions to Control Assistive Robots. Journal of Autonomous Robots (AURO), 2022
- TR Sumers, RD Hawkins, MK Ho, TL Griffiths, D Hadfield-Menell. 2022. How to talk so your robot will learn: Instructions, descriptions, and pragmatics. arXiv preprint arXiv:2206.07870
- T Sumers, RD Hawkins, MK Ho, TL Griffiths, D Hadfield-Menell. 2022. How to talk so AI will learn: Instructions, descriptions, and autonomy. Advances in Neural Information Processing Systems
- H Hu, JF Fisac. 2022. Active uncertainty reduction for human-robot interaction: An implicit dual control approach. Algorithmic Foundations of Robotics XV: Proceedings of the Fifteenth Workshop on the Algorithmic Foundations of Robotics
- H Hu, K Nakamura, JF Fisac. 2022. SHARP: Shielding-aware robust planning for safe and efficient human-robot interaction. IEEE Robotics and Automation Letters 7 (2), 5591-5598
- Mesut Yang, Micah Carroll, Anca Dragan. 2022. Optimal Behavior Prior: Improving Human-AI Collaboration Through Generalizable Human Models. Human-in-the-loop Learning (HILL) Workshop, NeurIPS 2022
- Naman Shah, Pulkit Verma, Trevor Angle, Siddharth Srivastava. 2022. JEDAI: A System for Skill-Aligned Explainable Robot Planning. AAMAS 2022
- Mehdi Dadvar, Keyvan Majd, Elena Oikonomou, Georgios Fainekos, Siddharth Srivastava. 2022. Joint Communication and Motion Planning for Cobots.
- Paul Knott, Micah Carroll, Sam Devlin, Kamil Ciosek, Katja Hofmann, AD Dragan, Rohin Shah. 2021. Evaluating the Robustness of Collaborative Agents.
- Andrea Bajcsy, Somil Bansal, Ellis Ratner, Claire J. Tomlin, Anca D. Dragan. 2021. A Robust Control Framework for Human Motion Prediction. IEEE Robotics and Automation Letters
- Siddharth Srivastava. 2021. Unifying Principles and Metrics for Safe and Assistive AI. AAAI 2021
- Siddharth Reddy, Anca D. Dragan, Sergey Levine. 2021. Pragmatic Image Compression for Human-in-the-Loop Decision-Making.
- Liting Sun, Xiaogang Jia, Anca D. Dragan. 2021. On complementing end-to-end human behavior predictors with planning.
- Andrea Bajcsy, Anand Siththaranjan, Claire J. Tomlin, Anca D. Dragan. 2021. Analyzing Human Models that Adapt Online.
- Arjun Sripathy, Andreea Bobu, Daniel S. Brown, Anca D. Dragan. 2021. Dynamically Switching Human Prediction Models for Efficient Planning. ICRA 2021
- Matthew Zurek, Andreea Bobu, Daniel S. Brown, Anca D. Dragan. 2021. Situational Confidence Assistance for Lifelong Shared Autonomy. ICRA 2021
- Jensen Gao, Siddharth Reddy, Glen Berseth, Nicholas Hardy, Nikhilesh Natraj, Karunesh Ganguly, Anca Dragan, Sergey Levine. 2021. X2T: Training an X-to-Text Typing Interface with Online Learning from User Feedback. ICLR 2021
- Arnaud Fickinger, Simon Zhuang, Dylan Hadfield-Menell, Stuart Russell. 2020. Multi-Principal Assistance Games.
- David Fridovich-Keil, Ellis Ratner, Lasse Peters, Anca D. Dragan, Claire J. Tomlin. 2020. Efficient Iterative Linear-Quadratic Approximations for Nonlinear Multi-Player General-Sum Differential Games. ICRA 2020
- Somil Bansal, Andrea Bajcsy, Ellis Ratner, Anca D. Dragan, Claire J. Tomlin. 2020. A Hamilton-Jacobi Reachability-Based Framework for Predicting and Analyzing Human Motion for Safe Planning. ICRA 2020
- Vael Gates, Thomas L. Griffiths, Anca D. Dragan. 2020. How to Be Helpful to Multiple People at Once. Cognitive Science, 44(6)
- Rohin Shah, Pedro Freire, Neel Alex, Rachel Freedman, Dmitrii Krasheninnikov, Lawrence Chan, Michael D Dennis, Pieter Abbeel, Anca Dragan, Stuart Russell. 2020. Benefits of Assistance over Reward Learning. NeurIPS 2020 Workshop on Cooperative AI
- Arnaud Fickinger, Simon Zhuang, Andrew Critch, Dylan Hadfield-Menell, Stuart Russell. 2020. Multi-Principal Assistance Games: Definition and Collegial Mechanisms. Cooperative AI Workshop, NeurIPS 2020
- Yuqing Du, Stas Tiomkin, Emre Kiciman, Daniel Polani, Pieter Abbeel, Anca D. Dragan. 2020. AvE: Assistance via Empowerment. NeurIPS 2020
- Andrew Critch, Stuart Russell. 2019. Servant of Many Masters: Shifting priorities in Pareto-optimal sequential decision-making. AIES 2019
- Elis Stefansson, Jaime F. Fisac, Dorsa Sadigh, S. Shankar Sastry, Karl H. Johansson. 2019. Human-robot interaction for truck platooning using hierarchical dynamic games. European Control Conference 2019
- Micah Carroll, Rohin Shah, Mark Ho, Thomas Griffiths, Sanjit Seshia, Pieter Abbeel, Anca Dragan. 2019. On the Utility of Learning about Humans for Human-AI Coordination. NeurIPS 2019
- Rohan Choudhury, Gokul Swamy, Dylan Hadfield-Menell, Anca D. Dragan. 2019. On the Utility of Model Learning in HRI. HRI 2019
- Sarath Sreedharan, Siddharth Srivastava, David Smith, Subbarao Kambhampati. 2019. Why Can’t You Do That, HAL? Explaining Unsolvability of Planning Tasks. IJCAI 2019
- Shihui Li, Yi Wu, Xinyue Cui, Honghua Dong, Fei Fang, Stuart Russell. 2019. Robust multi-agent reinforcement learning via minimax deep deterministic policy gradient. AAAI 2019
- Aaron Bestick, Ravi Pandya, Ruzena Bajcsy, Anca D. Dragan. 2018. Learning Human Ergonomic Preferences for Handovers. ICRA 2018
- Allan Zhou, Anca D. Dragan. 2018. Cost Functions for Robot Motion Style. IROS 2018
- Andrea Bajcsy, Dylan P. Losey, Marcia K. O'Malley, Anca D. Dragan. 2018. Learning from Physical Human Corrections, One Feature at a Time. HRI 2018
- David Fridovich-Keil, Andrea Bajcsy, Jaime F. Fisac, Sylvia L. Herbert, Steven Wang, Anca D. Dragan, Claire J. Tomlin. 2018. Confidence-aware motion prediction for real-time collision avoidance. International Journal of Robotics Research
- Dorsa Sadigh, Nick Landolfi, Shankar S. Sastry, Sanjit A. Seshia, Anca D. Dragan. 2018. Planning for cars that coordinate with people: leveraging effects on human actions for planning and active information gathering over human internal state. Autonomous Robots
- Jaime F. Fisac, Andrea Bajcsy, Sylvia L. Herbert, David Fridovich-Keil, Steven Wang, Claire J. Tomlin, Anca D. Dragan. 2018. Probabilistically Safe Robot Planning with Confidence-Based Human Predictions. RSS 2018
- Liting Sun, Wei Zhan, Masayoshi Tomizuka, Anca D. Dragan. 2018. Courteous Autonomous Cars. IROS 2018
- Minae Kwon, Sandy H. Huang, Anca D. Dragan. 2018. Expressing Robot Incapability. HRI 2018
- Sandy H. Huang, Kush Bhatia, Pieter Abbeel, Anca D. Dragan. 2018. Establishing Appropriate Trust via Critical States. IROS 2018
- Shun Zhang, Edmund H. Durfee, Satinder P. Singh. 2018. Minimax-regret querying on side effects for safe optimality in factored Markov decision processes. IJCAI 2018
- Siddharth Reddy, Anca D. Dragan, Sergey Levine. 2018. Shared Autonomy via Deep Reinforcement Learning. RSS 2018
- Siddharth Reddy, Anca D. Dragan, Sergey Levine. 2018. Where Do You Think You’re Going?: Inferring Beliefs about Dynamics from Behavior. NeurIPS 2018
- Allan Zhou, Dylan Hadfield-Menell, Anusha Nagabaudi, Anca Dragan. 2017. Expressive Robot Motion Timing. HRI 2017
- Chandrayee Basu, Qian Yang, David Hungerman, Mukesh Singhal, Anca Dragan. 2017. Do You Want Your Autonomous Car to Drive Like You?. HRI 2017
- Chang Liu, Jessica B. Hamrick, Jaime F. Fisac, Anca D. Dragan, J. Karl Hedrick, S. Shankar Sastry, Thomas L. Griffiths. 2017. Goal Inference Improves Objective and Perceived Performance in Human-Robot Collaboration. AAMAS 2017
- Dylan Hadfield-Menell, Anca Dragan, Pieter Abbeel, Stuart Russell. 2017. The Off-Switch Game. IJCAI 2017
- Michael Laskey, Caleb Chuck, Jonathan Lee, Jeffrey Mahler, Sanjay Krishnan, Kevin Jamieson, Anca Dragan, Ken Goldberg. 2017. Comparing Human-Centric and Robot-Centric Sampling for Robot Deep Learning from Demonstrations. ICRA 2017
- Sandy H. Huang, David Held, Pieter Abbeel, Anca Dragan. 2017. Enabling Robots to Communicate their Objectives. RSS 2017
- Smitha Milli, Dylan Hadfield-Menell, Anca Dragan, Stuart Russell. 2017. Should Robots be Obedient? IJCAI 2017
- Aaron Bestick, Ruzena Bajcsy, Anca Dragan. 2016. Implicitly Assisting Humans to Choose Good Grasps in Robot to Human Handovers. 2016 International Symposium on Experimental Robotics
- Dorsa Sadigh, S. Shankar Sastry, Sanjit A. Seshia, Anca Dragan. 2016. Information Gathering Actions Over Human Internal State. IROS 2016
- Dorsa Sadigh, Shankar Sastry, Sanjit Seshia, Anca Dragan. 2016. Planning for Autonomous Cars that Leverage Effects on Human Actions. RSS 2016
- Jaime F. Fisac, Anayo K. Akametalu, Melanie N. Zeilinger, Shahab Kaynama, Jeremy Gillula, Claire J. Tomlin. 2016. A General Safety Framework for Learning-Based Control in Uncertain Robotic Systems. IEEE Transactions on Automatic Control
- Jaime F. Fisac, Chang Liu, Jessica B. Hamrick, S. Shankar Sastry, J. Karl Hedrick, Thomas L. Griffiths, Anca D. Dragan. 2016. Generating Plans that Predict Themselves. CDC 2016
- Negar Mehr, Roberto Horowitz, Anca Dragan. 2016. Inferring and Assisting with Constraints in Shared Autonomy. CDC 2016
2.4. Multi-agent perspectives and applications
- Kelsey Rebecca Allen, Franziska Brändle, Matthew Botvinick, Judith Fan, Samuel J Gershman, Thomas L Griffiths, Joshua Hartshorne, Tobias U Hauser, Mark K Ho, Joshua de Leeuw, Wei Ji Ma, Kou Murayama, Jonathan D Nelson, Bas van Opheusden, H Thomas Pouncy, Janet Rafner, Iyad Rahwan, Robb Rutledge, Jacob Sherson, Ozgur Simsek, Hugo Spiers, Christopher Summerfield, Mirko Thalmann, Natalia Velez, Andrew Watrous, Joshua Tenenbaum, Eric Schulz. 2023. Using Games to Understand the Mind. PsyArXiv - Preprint
- Johannes Treutlein. 2023. Modeling evidential cooperation in large worlds. arXiv:2307.04879
- Caspar Oesterheld, Johannes Treutlein, Emery Cooper, Rubi Hudson. 2023. Incentivizing honest performative predictions with proper scoring rules. Conference on Uncertainty in Artificial Intelligence
- Shangding Gu, Jakub Grudzien Kuba, Yuanpei Chen, Yali Du, Long Yang, Alois Knoll, Yaodong Yang. 2023. Safe multi-agent reinforcement learning for multi-robot control. Artificial Intelligence
- Yongzhao Wang, Michael P Wellman. 2023. Empirical Game-Theoretic Analysis for Mean Field Games. Proceedings of the 2023 International Conference on Autonomous Agents and Multiagent Systems
- Zun Li, Marc Lanctot, Kevin R McKee, Luke Marris, Ian Gemp, Daniel Hennes, Kate Larson, Yoram Bachrach, Michael P Wellman, Paul Muller. 2023. Search-Improved Game-Theoretic Multiagent Reinforcement Learning in General and Negotiation Games. Proceedings of the 2023 International Conference on Autonomous Agents and Multiagent Systems
- Max Olan Smith, Michael P Wellman. 2023. Co-Learning Empirical Games and World Models. arXiv:2305.14223
- Yongzhao Wang, Michael P Wellman. 2023. Regularization for Strategy Exploration in Empirical Game-Theoretic Analysis. arXiv:2302.04928
- Zun Li, Marc Lanctot, Kevin R McKee, Luke Marris, Ian Gemp, Daniel Hennes, Paul Muller, Kate Larson, Yoram Bachrach, Michael P Wellman. 2023. Combining Tree-Search, Generative Models, and Nash Bargaining Concepts in Game-Theoretic Reinforcement Learning. arXiv:2302.00797
- Max Olan Smith, Thomas Anthony, Michael P Wellman. 2023. Strategic Knowledge Transfer. Journal of Machine Learning Research
- Katherine Mayo, Shaily Fozdar, Michael P Wellman. 2023. Flagging Payments for Fraud Detection: A Strategic Agent-Based Model. 2023, Association for the Advancement of Artificial Intelligence
- Ivan Geffner, Joseph Y Halpern. 2023. Communication games, sequential equilibrium, and mediators. arXiv:2309.14618
- Ittai Abraham, Danny Dolev, Ittay Eyal, Joseph Y Halpern. 2023. Colordag: An incentive-compatible blockchain. arXiv:2308.11379
- Kaya Alpturer, Joseph Y Halpern, Ron van der Meyden. 2023. Optimal Eventual Byzantine Agreement Protocols with Omission Failures. Proceedings of the 2023 ACM Symposium on Principles of Distributed Computing
- Xinming Liu, Joseph Y Halpern. 2023. Strategic Play By Resource-Bounded Agents in Security Games. In Proc. 2023 International Conference on Autonomous Agents and Multiagent Systems
- Ivan Geffner, Joseph Y Halpern. 2023. Lower Bounds on Implementing Mediators in Asynchronous Systems with Rational and Malicious Agents. Journal of the ACM
- Meir Friedenberg, Joseph Y Halpern. 2023. Joint Behavior and Common Belief. R. Verbrugge (Ed.): Theoretical Aspects of Rationality and Knowledge 2023 (TARK 2023)
- Niko A Grupen, Michael Hanlon, Alexis Hao, Daniel D Lee, Bart Selman. 2023. Policy-Value Alignment and Robustness in Search-based Multi-Agent Learning. arXiv:2301.11857
- Niklas Lauffer, Ameesh Shah, Micah Carroll, Michael D Dennis, Stuart Russell. 2023. Who needs to know? Minimal knowledge for optimal coordination. In Proc. ICML-23, 2023
- NA Grupen, B Selman, DD Lee. 2022. Cooperative Multi-Agent Fairness and Equivariant Policies. AAAI Conference on Artificial Intelligence
- C Konicki, M Chakraborty, MP Wellman. 2022. Exploiting Extensive-Form Structure in Empirical Game-Theoretic Analysis. 18th Conference on Web and Internet Economics (WINE)
- Z Li, F Jia, A Mate, S Jabbari, M Chakraborty, M Tambe, and Y Vorobeychik. 2022. Solving Structured Hierarchical Games Using Differential Backward Induction. 38th Conference on Uncertainty in Artificial Intelligence (UAI)
- Y Wang, Q Ma, and MP Wellman. 2022. Evaluating Strategy Exploration in Empirical Game-Theoretic Analysis. 21st International Conference on Autonomous Agents and Multi-Agent Systems (AAMAS)
- Hardy, M. D., Krafft, P. M., Thompson, B., & Griffiths, T. L. 2022. Overcoming Individual Limitations Through Distributed Computation: Rational Information Accumulation in Multigenerational Populations. Topics in Cognitive Science, 14(3), 550–573
- Hawkins, R. D., Franke, M., Frank, M. C., Goldberg, A. E., Smith, K., Griffiths, T. L., & Goodman, N. D. 2022. From partners to populations: A hierarchical Bayesian account of coordination and convention. Psychological Review
- A Critch, M Dennis, S Russell. 2022. Cooperative and uncooperative institution designs: Surprises and problems in open-source game theory.
- S Emmons, C Oesterheld, A Critch, V Conitzer, S Russell. 2022. For learning in symmetric teams, local optima are global nash equilibria. International Conference on Machine Learning, 5924-5943
- E Biyik, A Lalitha, R Saha, A Goldsmith, D Sadigh. 2022. Partner-Aware Algorithms in Decentralized Cooperative Bandit Teams. Proceedings of the AAAI Conference on Artificial Intelligence 36 (9), 9296-9303
- Z Cao, E Biyik, G Rosman, D Sadigh. 2022. Leveraging Smooth Attention Prior for Multi-Agent Trajectory Prediction. 2022 International Conference on Robotics and Automation (ICRA), 10723-10730
- Andy Shih, Stefano Ermon, Dorsa Sadigh. 2022. Conditional Imitation Learning for Multi-Agent Games. 17th ACM/IEEE International Conference on Human-Robot Interaction (HRI), March 2022
- Bidipta Sarkar*, Aditi Talati*, Andy Shih*, Dorsa Sadigh. 2022. PantheonRL: A MARL Library for Dynamic Training Interactions. Proceedings of the 36th AAAI Conference on Artificial Intelligence (Demo Track), February 2022
- Shushman Choudhury, Jayesh Gupta, Mykel J. Kochenderfer, Dorsa Sadigh, Jeannette Bohg. 2022. Dynamic Multi-Robot Task Allocation under Uncertainty and Temporal Constraints. Journal of Autonomous Robots (AURO), 2022
- PJK Christoffersen, AA Haupt, D Hadfield-Menell. 2022. Get It in Writing: Formal Contracts Mitigate Social Dilemmas in Multi-Agent RL. arXiv preprint arXiv:2208.10469
- TR Sumers, RD Hawkins, MK Ho, TL Griffiths, D Hadfield-Menell. 2022. Linguistic communication as (inverse) reward design. arXiv preprint arXiv:2204.05091
- Raphael Köster, Dylan Hadfield-Menell, Richard Everett, Laura Weidinger, Gillian K Hadfield, Joel Z Leibo. 2022. Spurious normativity enhances learning of compliance and enforcement behavior in artificial agents. Proceedings of the National Academy of Sciences 119 (3), e2106028118
- DR Anthony, DP Nguyen, D Fridovich-Keil, JF Fisac. 2022. Back to the Future: Efficient, Time-Consistent Solutions in Reach-Avoid Games. 2022 International Conference on Robotics and Automation (ICRA), 6830-6836
- Pavel Czempin, Adam Gleave. 2022. Reducing Exploitability with Population Based Training. arXiv
- JG Kuba, X Feng, S Ding, H Dong, J Wang, Y Yang. 2022. Heterogeneous-agent mirror learning: A continuum of solutions to cooperative marl. arXiv preprint arXiv:2208.01682
- M Wen, JG Kuba, R Lin, W Zhang, Y Wen, J Wang, Y Yang. 2022. Multi-agent reinforcement learning is a sequence modeling problem. arXiv preprint arXiv:2205.14953
- Z Dou, JG Kuba, Y Yang. 2022. Understanding Value Decomposition Algorithms in Deep Cooperative Multi-Agent Reinforcement Learning. arXiv preprint arXiv:2202.04868
- Timon Willi*, Alistair Letcher*, Johannes Treutlein*, Jakob Foerster. 2022. COLA: Consistent Learning with Opponent-Learning Awareness. ICML 2022
- Caspar Oesterheld, Johannes Treutlein, Roger Grosse, Vincent Conitzer, Jakob Foerster. 2022. Similarity-based Cooperation. arXiv:2211.14468
- N. Lauffer, M. Ghasemi, A. Hashemi, Y. Savas, and U. Topcu. 2022. No-regret Learning in Dynamic Stackelberg Games. arXiv preprint 2022.
- Yuqing Du, Pieter Abbeel, Aditya Grover. 2022. It Takes Four to Tango: Multiagent Selfplay for Automatic Curriculum Generation. ICLR 2022
- Eladio Montero-Porras, Jelena Grujić, Elias Fernández Domingos & Tom Lenaerts. 2022. Inferring strategies from observations in long iterated Prisoner’s dilemma experiments. Scientific Reports volume 12, Article number 7589 (2022)
- Elias Fernández Domingos, Inês Terrucha, Rémi Suchon, Jelena Grujić, Juan C. Burguillo, Francisco C. Santos & Tom Lenaerts. 2022. Delegation to artificial agents fosters prosocial behaviors in the collective risk dilemma. Scientific reports volume 12, Article number: 8492 (2022)
- Xintong Wang, David M Pennock, Nikhil R Devanur, David M Rothschild, Biaoshuai Tao, Michael P Wellman. 2021. Designing a Combinatorial Financial Options Market.
- Charlotte Roman, Michael Dennis, Andrew Critch, Stuart Russell. 2021. Accumulating Risk Capital Through Investing in Cooperation. AAMAS 2021
- Scott Emmons, Caspar Oesterheld, Andrew Critch, Vince Conitzer, Stuart Russell. 2021. Symmetry, Equilibria, and Robustness in Common-Payoff Games. GAIW 2021
- Jonathan Stray. 2021. Designing Recommender Systems to Depolarize.
- Katherine Mayo, Shaily Fozdar, Michael P. Wellman. 2021. An Agent-Based Model of Strategic Adoption of Real-Time Payments.
- Max Olan Smith, Thomas Anthony, Michael P Wellman. 2021. Iterative Empirical Game Solving via Single Policy Best Response. ICLR 2021
- Xintong Wang, Christopher Hoang, Yevgeniy Vorobeychik, Michael P Wellman. 2021. Spoofing the Limit Order Book: A Strategic Agent-Based Analysis. Games 2021
- Yongzhao Wang, Qiurui Ma, Michael P Wellman. 2021. Evaluating Strategy Exploration in Empirical Game-Theoretic Analysis.
- Zun Li, Michael P Wellman. 2021. Evolution Strategies for Approximate Solution of Bayesian Games. AAAI 2021
- Katherine Mayo, Michael P Wellman. 2021. A Strategic Analysis of Portfolio Compression. AAMAS 2021
- Megan Shearer, David Byrd, Tucker Hybinette Balch, Michael P Wellman. 2021. Stability Effects of Arbitrage in Exchange Traded Funds: An Agent-Based Model. ICAIF 2021
- Stephen McAleer, John Lanier, Michael Dennis, Pierre Baldi, Roy Fox. 2021. Improving Social Welfare While Preserving Autonomy via a Pareto Mediator.
- Johannes Treutlein, Michael Dennis, Caspar Oesterheld, Jakob Foerster. 2021. A New Formalism, Method and Open Issues for Zero-Shot Coordination. PMLR 2021
- Jialu Bao, Kun He, Xiaodong Xin, Bart Selman, John E. Hopcroft. 2020. Hidden Community Detection on Two-layer Stochastic Models: a Theoretical Perspective. (Preprint, submitted to TAMC 2020)
- Raphael Köster, Dylan Hadfield-Menell, Gillian K. Hadfield, Joel Z. Leibo. 2020. Silly rules improve the capacity of agents to learn stable enforcement and compliance behaviors. AAMAS 2020
- Robert D. Hawkins, Noah D. Goodman, Adele E. Goldberg, Thomas L. Griffiths. 2020. Generalizing meanings from partners to populations: Hierarchical inference supports convention formation on networks. CogSci 2020
- Stefano V. Albrecht, Peter Stone, Michael P. Wellman. 2020. Special issue on autonomous agents modelling other agents: Guest editorial. Artificial Intelligence 285
- Valerio Capraro, Joseph Y Halpern. 2020. Translucent players: Explaining cooperative behavior in social dilemmas. Rationality and Society 31(4), 371-408
- Zun Li, Michael P. Wellman. 2020. Structure Learning for Approximate Solution of Many-Player Games. AAAI 2020
- Max Olan Smith, Thomas Anthony, Yongzhao Wang, Michael P Wellman. 2020. Learning to play against any mixture of opponents.
- Michael Chang, Sid Kaushik, S. Matthew Weinberg, Tom Griffiths, Sergey Levine. 2020. Decentralized Reinforcement Learning: Global Decision-Making via Local Economic Transactions. ICML 2020
- Qi Zhang, Edmund H. Durfee, Satinder Singh. 2020. Efficient Querying for Cooperative Probabilistic Commitments.
- Anagha Kulkarni, Siddharth Srivastava, Subbarao Kambhampati. 2019. A unified framework for planning in adversarial and cooperative environments. AAAI 2019
- Arunesh Sinha, Michael P. Wellman. 2019. Incentivizing Collaboration in a Competition. AAMAS 2019
- Ittai Abraham, Danny Dolev, Ivan Geffner, Joseph Y. Halpern. 2019. Implementing Mediators with Asynchronous Cheap Talk. PODC 2019
- Ittai Abraham, Danny Dolev, Joseph Y. Halpern. 2019. Distributed Protocols for Leader Election: A Game-Theoretic Perspective. ACM Transactions on Economics and Computation 7(1)
- Joseph Y. Halpern, Rafael Pass. 2019. Sequential equilibrium in computational games. ACM Transactions on Economics and Computation
- Joseph Y. Halpern, Rafael Pass, Daniel Reichman. 2019. On the Existence of Nash Equilibrium in Games with Resource-Bounded Players. SAGT 2019
- Joseph Y. Halpern, Rafael Pass, Lior Seeman. 2019. The truth behind the myth of the folk theorem. Games and Economic Behavior, 117
- Mark K. Ho, Joanna Korman, Thomas L. Griffiths. 2019. The Computational Structure of Unintentional Meaning. CogSci 2019
- Megan Shearer, Gabriel Rauterberg, Michael P. Wellman. 2019. An Agent-Based Model of Financial Benchmark Manipulation. ICML 2019
- Meir Friedenberg, Joseph Y. Halpern. 2019. Blameworthiness in Multi-Agent Settings. AAAI 2019
- Thanh H. Nguyen, Yongzhao Wang, Arunesh Sinha, Michael P. Wellman. 2019. Deception in finitely repeated security games. AAAI 2019
- Xintong Wang, Chris Hoang, Michael P. Wellman. 2019. Learning-Based Trading Strategies in the Face of Market Manipulation. ICML 2019 Workshop on AI in Finance
- Andrew Whalen, Thomas L. Griffiths, Daphna Buchsbaum. 2018. Sensitivity to Shared Information in Social Learning. Cognitive Science
- Bryce Wiedenbeck, Fengjun Yang, Michael P. Wellman. 2018. A Regression Approach for Modeling Games with Many Symmetric Players. AAAI 2018
- Joseph Y. Halpern, Rafael Pass. 2018. Game Theory with Translucent Players. International Journal of Game Theory
- Mason Wright and Michael P. Wellman. 2018. Evaluating the Stability of Non-Adaptive Trading in Continuous Double Auctions. AAMAS 2018
- Natasha Alechina, Joseph Y. Halpern, Ian A. Kash, Brian Logan. 2018. Incentive-Compatible Mechanisms for Norm Monitoring in Open Multi-Agent Systems. JAIR
- Nishant Desai, Andrew Critch, Stuart J. Russell. 2018. Negotiable Reinforcement Learning for Pareto Optimal Sequential Decision-Making. NeurIPS 2018
- Adam Bjorndahl, Joseph Y. Halpern, Rafael Pass. 2017. Reasoning about Rationality. Games and Economic Behavior 104, 146-164
- Joseph Y. Halpern, Rafael Pass, Lior Seeman. 2017. Computational Extensive-Form Games. EC 2016
- Michael Wellman, Eric Sodomka, Amy Greenwald. 2017. Self-confirming price-prediction strategies for simultaneous one-shot auctions. Games and Economic Behavior, 102, 339–372
- Natasha Alechina, Joseph Y. Halpern, Brian Logan. 2017. Causality, Responsibility and Blame in Team Plans. AAMAS 2017
- Joseph Y. Halpern, Xavier Vilaca. 2016. Rational Consensus (extended abstract). 2016 ACM Symposium on Principles of Distributed Computing
2.5. Models of bounded or imperfect rationality
- Ruiqi He, Carlos G Correa, Thomas L Griffiths, Mark K Ho. 2023. Structurally guided task decomposition in spatial navigation tasks. arXiv:2310.02221
- Declan Campbell, Sreejan Kumar, Tyler Giallanza, Jonathan D Cohen, Thomas L Griffiths. 2023. Relational Constraints On Neural Networks Reproduce Human Biases towards Abstract Geometric Regularity. arXiv:2309.17363
- Frederick Callaway, Thomas L Griffiths, Kenneth A Norman, Qiong Zhang. 2023. Optimal metacognitive control of memory recall. Psychological Review
- Nikolay Sukhov, Rachit Dubey, Annie Duke, Tom Griffiths. 2023. When to Keep Trying and When to Let Go: Benchmarking Optimal Quitting. PsyArXiv - Preprint
- Mark K Ho, Jonathan D Cohen, Tom Griffiths. 2023. Rational simplification and rigidity in human planning. PsyArXiv - Preprint
- David Abel, André Barreto, Hado van Hasselt, Benjamin Van Roy, Doina Precup, Satinder Singh. 2023. On the Convergence of Bounded Agents. arXiv:2307.11044
- Joseph Y Halpern, Aditya Saraf. 2023. Chunking Tasks for Present-Biased Agents. Proceedings of the 24th ACM Conference on Economics and Computation
- Cassidy Laidlaw and Stuart Russell. 2022. Uncertain Decisions Facilitate Better Preference Learning. Proceedings of NeurIPS-21
- K Oktar, T Lombrozo. 2022. Deciding to be authentic: Intuition is favored over deliberation when authenticity matters. Cognition 223, 105021
- D Kinney, T Lombrozo. 2022. Evaluations of Causal Claims Reflect a Trade-Off Between Informativeness and Compression. Proceedings of the Annual Meeting of the Cognitive Science Society 44 (44)
- Bai, X., Fiske, S. T., & Griffiths, T. L. 2022. Globally inaccurate stereotypes can result from locally adaptive exploration. Psychological Science 33(5) 671–684
- Callaway, F., Hardy, M., & Griffiths, T. 2022. Optimal nudging for cognitively bounded agents: A framework for modeling, predicting, and controlling the effects of choice architectures.
- Callaway, F., Griffiths, T. L., & Karreskog, G. 2022. Rational heuristics for one-shot games.
- Callaway, F., van Opheusden, B., Gul, S., Das, P., Krueger, P. M., Griffiths, T. L., & Lieder, F. 2022. Rational use of cognitive resources in human planning. Nature Human Behaviour, 6, 1–14
- Dasgupta, I., & Griffiths, T. L. 2022. Clustering and the efficient use of cognitive resources. Journal of Mathematical Psychology, 109, 102675
- Dubey, R., Griffiths, T. L., & Dayan, P. 2022. The pursuit of happiness: A reinforcement learning perspective on habituation and comparisons. PLoS Computational Biology, 18(8), e1010316
- Ho, M. K., Abel, D., Correa, C. G., Littman, M. L., Cohen, J. D., & Griffiths, T. L. 2022. People construct simplified mental representations to plan. Nature, 606(7912), 129-136
- Jain, Y. R., Callaway, F., Griffiths, T. L., Dayan, P., He, R., Krueger, P. M., & Lieder, F. 2022. A computational process-tracing method for measuring people’s planning strategies and how they change over time. Behavior Research Methods, 1-43
- Russek, E., Acosta-Kane, D., van Opheusden, B., Mattar, M. G., & Griffiths, T. 2022. Time spent thinking in online chess reflects the value of computation.
- Zhang, Q., Griffiths, T. L., & Norman, K. A. 2022. Optimal policies for free recall. Psychological Review
- Samer B. Nashed, Justin Svegliato, Abhinav Bhatia, Stuart Russell, Shlomo Zilberstein. 2022. Selecting the Partial State Abstractions of MDPs: A Metareasoning Approach with Deep Reinforcement Learning. IROS
- Connor Basich, Justin Svegliato, Kyle H. Wray, Stefan Witwicki, Joydeep Biswas, Shlomo Zilberstein. 2022. Competence-Aware Systems. AIJ
- M Curmei, AA Haupt, B Recht, D Hadfield-Menell. 2022. Towards Psychologically-Grounded Dynamic Preference Models. Proceedings of the 16th ACM Conference on Recommender Systems, 35-48
- Cassidy Laidlaw and Anca Dragan. 2022. The Boltzmann Policy Distribution: Accounting for Systematic Suboptimality in Human Models. ICLR 2022
- Justin Svegliato, Connor Basich, Sandhya Saisubramanian, Shlomo Zilberstein. 2022. Metareasoning for Safe Decision Making in Autonomous Systems. ICRA 2022
- Abhinav Bhatia, Justin Svegliato, Samer Nashed, Shlomo Zilberstein. 2022. Tuning the Hyperparameters of Anytime Planning: A Metareasoning Approach with Deep RL. ICAPS 2022
- Bill Thompson and Thomas L. Griffiths. 2021. Human biases limit cumulative innovation.
- Ruairidh M. Battleday, Joshua C. Peterson, and Thomas L. Griffiths. 2021. From convolutional neural networks to models of higher-level cognition (and back again).
- Thomas A. Langlois, Nori Jacoby, Jordan W. Suchow, and Thomas L. Griffiths. 2021. Serial reproduction reveals the geometry of visuospatial representations. PNAS 2021
- Samarie Wilson, Somya Arora, Qiong Zhang, Thomas L. Griffiths. 2021. A Rational Account of Anchor Effects in Hindsight Bias.
- Sreejan Kumar, Ishita Dasgupta, Jonathan D. Cohen, Nathaniel D. Daw, and Thomas L. Griffiths. 2021. Meta-Learning of Structured Task Distributions in Humans and Machines. ICLR 2021
- Frederick Callaway, Antonio Rangel, Thomas L. Griffiths. 2021. Fixation patterns in simple choice reflect optimal information sampling.
- Falk Lieder, Owen X. Chen, Paul M. Krueger, Thomas L. Griffiths. 2020. Cognitive prostheses for goal achievement. Nature Human Behaviour 3:1096–1106
- Falk Lieder, Thomas L. Griffiths. 2020. Advancing rational analysis to the algorithmic level. Behavioral and Brain Sciences, 43, E27
- Frederick Callaway, Antonio Rangel, Tom Griffiths. 2020. Fixation patterns in simple choice are consistent with optimal use of cognitive resources. (Preprint)
- Mark K. Ho, David Abel, Jonathan D. Cohen, Michael L. Littman, Thomas L. Griffiths. 2020. The Efficiency of Human Cognition Reflects Planned Information Processing. AAAI 2020
- Smitha Milli, Falk Lieder, Tom Griffiths. 2020. A Rational Reinterpretation of Dual-Process Theories. UAI 2020
- Joseph Y Halpern, Evan Piermont. 2020. Dynamic Awareness.
- Xinming Liu, Joseph Halpern. 2020. Bounded Rationality in Las Vegas: Probabilistic Finite Automata Play Multi-Armed Bandits. PMLR
- Ida Momennejad, Jarrod Lewis-Peacock, Kenneth A Norman, Jonathan D Cohen, Satinder Singh, Richard L Lewis. 2020. Rational use of episodic and working memory: A normative account of prospective memory. Neuropsychologia
- Qiong Zhang, Kenneth A. Norman, Tom Griffiths. 2020. The method of loci is an optimal policy for memory search. CogSci 2020
- Rachel Jansen, Anna N. Rafferty, Tom Griffiths. 2020. A rational model of sequential self-assessment. CogSci 2020
- Carlos G. Correa, Mark K. Ho, Frederick Callaway, Tom Griffiths. 2020. Resource-rational Task Decomposition to Minimize Planning Costs. CogSci 2020
- Mark K. Ho, David Abel, Jonathan D. Cohen, Michael L. Littman, Thomas L. Griffiths. 2020. People Do Not Just Plan, They Plan to Plan. AAAI 2020
- Falk Lieder, Thomas L. Griffiths. 2019. Resource-rational analysis: understanding human cognition as the optimal use of limited computational resources. Behavioral and Brain Sciences, 43, E1
- Frederick Callaway, Tom Griffiths. 2019. Attention in value-based choice as optimal sequential sampling. (Preprint)
- Joshua Peterson, David Bourgin, Daniel Reichman, Thomas Griffiths, Stuart Russell. 2019. Cognitive model priors for predicting human decisions. ICML 2019
- Mark K. Ho, David Abel, Tom Griffiths, Michael L. Littman. 2019. The Value of Abstraction. Current Opinion in Behavioral Sciences, 29:111-116
- Ruairidh M. Battleday, Joshua C. Peterson, Thomas L. Griffiths. 2019. Capturing human categorization of natural images at scale by combining deep networks and cognitive models. (Preprint)
- Thomas L. Griffiths, Frederick Callaway, Michael B. Chang, Erin Grant, Paul M. Krueger, Falk Lieder. 2019. Doing more with less: meta-reasoning and meta-learning in humans and machines. Current Opinion in Behavioral Sciences 29: 24-30
- Falk Lieder, Amitai Shenhav, Sebastian Musslick, Thomas L. Griffiths. 2018. Rational metareasoning and the plasticity of cognitive control. PLoS Comp. Biol.
- Falk Lieder, Thomas L. Griffiths, Ming Hsu. 2018. Overrepresentation of extreme events in decision making reflects rational use of cognitive resources. Psychological Review
- Falk Lieder, Thomas L. Griffiths, Quentin J. M. Huys, Noah D. Goodman. 2018. Empirical evidence for resource-rational anchoring and adjustment. Psychonomic Bulletin & Review
- Falk Lieder, Thomas L. Griffiths, Quentin J. M. Huys, Noah D. Goodman. 2018. The anchoring bias reflects rational use of cognitive resources. Psychonomic Bulletin & Review
- Joseph Y. Halpern, Lior Seeman. 2018. Is state-dependent valuation more adaptive than simpler rules?. Behavioural Processes
- Amitai Shenhav, Sebastian Musslick, Falk Lieder, Wouter Kool, Thomas L Griffiths, Jonathan D Cohen, Matthew M Botvinick. 2017. Toward a Rational and Mechanistic Account of Mental Effort. Annual Review of Neuroscience, 40, 99-124
- Falk Lieder, Paul Krueger, Tom Griffiths. 2017. An automatic method for discovering rational heuristics for risky choice. CogSci 2017
- Smitha Milli, Falk Lieder, Tom Griffiths. 2017. When Does Bounded-Optimal Metareasoning Favor Few Cognitive Systems?. AAAI 2017
- Owain Evans, Andreas Stuhlmüller, John Salvatier, Daniel Filan. 2017. Modeling Agents with Probabilistic Programs.
- Nan Rong, Joseph Y. Halpern, Ashutosh Saxena. 2016. MDPs with Unawareness in Robotics. UAI 2016
2.6. Models of human cognition
- Evan Russek, Frederick Callaway, Thomas L. Griffiths. 2023. Inverting cognitive models with machine learning to infer preferences from fixations. NeurIPS 2023 Workshop on Gaze Meets ML
- Ilia Sucholutsky, Lukas Muttenthaler, Adrian Weller, Andi Peng, Andreea Bobu, Been Kim, Bradley C. Love, Erin Grant, Iris Groen, Jascha Achterberg, Joshua B. Tenenbaum, Katherine M. Collins, Katherine L. Hermann, Kerem Oktar, Klaus Greff, Martin N. Hebart, Nori Jacoby, Qiuyi Zhang, Raja Marjieh, Robert Geirhos, Sherol Chen, Simon Kornblith, Sunayana Rane, Talia Konkle, Thomas P. O'Connell, Thomas Unterthiner, Andrew K. Lampinen, Klaus-Robert Müller, Mariya Toneva, Thomas L. Griffiths. 2023. Getting aligned on representational alignment. arXiv:2310.13018
- David B Kinney, Tania Lombrozo. 2023. Building Compressed Causal Models of the World. PsyArXiv - Preprint
- Tania Lombrozo, Emily G Liquin. 2023. Explanation Is Effective Because It Is Selective. Current Directions in Psychological Science
- Corey Cusimano, Tania Lombrozo. 2023. People recognize and condone their own morally motivated reasoning. Cognition
- Casey Lewry, George Tsai, Tania Lombrozo. 2023. Are ethical explanations explanatory? Meta-ethical beliefs shape judgments about explanations for social change. PsyArXiv
- Thalia Vrantsidis, Tania Lombrozo. 2023. The Edge of Ockham’s Razor: Examining Boundary Conditions on Preferences for Simpler Explanations. Proceedings of the Annual Meeting of the Cognitive Science Society
- David Kinney, Tania Lombrozo. 2023. Tell Me Your (Cognitive) Budget, and I’ll Tell You What You Value: Evidential Relationships Between Values, Data, and Generic Causal Claims about the Social World. Proceedings of the Annual Meeting of the Cognitive Science Society
- Casey Lewry, Sera Gorucu, Emily G Liquin, Tania Lombrozo. 2023. Minimally counterintuitive stimuli trigger greater curiosity than merely improbable stimuli. Cognition
- Daniel Reichman, Falk Lieder, David D Bourgin, Nimrod Talmon, Thomas L Griffiths. 2023. The Computational Challenges of Means Selection Problems: Network Structure of Goal Systems Predicts Human Performance. Cognitive Science
- Michael Y Li, Fred Callaway, William D Thompson, Ryan P Adams, Thomas L Griffiths. 2023. Learning to Learn Functions. Cognitive Science
- Feng Xia, Jianqiao Zhu, Tom Griffiths. 2023. Comparing Human Predictions from Expert Advice to On-line Optimization Algorithms. Proceedings of the Annual Meeting of the Cognitive Science Society
- Sunayana Rane, Mira L Nencheva, Zeyu Wang, Casey Lew-Williams, Olga Russakovsky, Tom Griffiths. 2023. Predicting word learning in children from the performance of computer vision systems. Proceedings of the Annual Meeting of the Cognitive Science Society
3. Other topics
3.1. Adversarial training and testing
- Sam Toyer, Olivia Watkins, Ethan Adrian Mendes, Justin Svegliato, Luke Bailey, Tiffany Wang, Isaac Ong, Karim Elmaaroufi, Pieter Abbeel, Trevor Darrell, Alan Ritter, Stuart Russell. 2023. Tensor Trust: Interpretable Prompt Injection Attacks from an Online Game. arXiv:2311.01011
- Tony Tong Wang, Adam Gleave, Tom Tseng, Kellin Pelrine, Nora Belrose, Joseph Miller, Michael D Dennis, Yawen Duan, Viktor Pogrebniak, Sergey Levine, Stuart Russell. 2023. Adversarial Policies Beat Superhuman Go AIs. arXiv:2211.00241
- Luke Bailey, Euan Ong, Stuart Russell, and Scott Emmons. 2023. Image Hijacks: Adversarial Images can Control Generative Models at Runtime. arXiv:2309.00236
- S Casper, K Hariharan, D Hadfield-Menell. 2022. Diagnostics for Deep Neural Networks with Automated Copy/Paste Attacks. NeurIPS ML Safety Workshop
- S Casper, D Hadfield-Menell, G Kreiman. 2022. White-Box Adversarial Policies in Deep Reinforcement Learning. arXiv preprint arXiv:2209.02167
- Tony Wang, Adam Gleave, Nora Belrose, Tom Tseng, Joseph Miller, Michael Dennis, Yawen Duan, Viktor Pogrebniak, Sergey Levine, Stuart Russell. 2022. Adversarial Policies Beat Professional-Level Go AIs. arXiv.
- Xue Bin Peng, Ze Ma, Pieter Abbeel, Sergey Levine, Angjoo Kanazawa. 2021. AMP: Adversarial Motion Priors for Stylized Physics-Based Character Control. ACM Transactions on Graphics
- Cassidy Laidlaw, Sahil Singla, Soheil Feizi. 2021. Perceptual Adversarial Robustness: Defense Against Unseen Threat Models. ICLR 2021
- Adam Gleave, Michael Dennis, Cody Wild, Neel Kant, Sergey Levine, Stuart Russell. 2020. Adversarial Policies: Attacking Deep Reinforcement Learning. ICLR 2020
- Albert Zhan, Stas Tiomkin, Pieter Abbeel. 2020. Preventing Imitation Learning with Adversarial Policy Ensembles. ICLR 2020
- Marc Khoury, Dylan Hadfield-Menell. 2020. On the Geometry of Adversarial Examples. (Preprint)
- Xintong Wang, Michael P Wellman. 2020. Market Manipulation: An Adversarial Learning Framework for Detection and Evasion. 29th International Joint Conference on Artificial Intelligence
- Marc Khoury, Dylan Hadfield-Menell. 2019. Adversarial Training with Voronoi Constraints. (Preprint)
- Dan Hendrycks, Kevin Zhao, Steven Basart, Jacob Steinhardt, Dawn Song. 2019. Natural Adversarial Examples. CVPR 2021
3.2. AI capabilities, uncategorized
- Joey Hong, Anca Dragan, Sergey Levine. 2023. Offline RL with Observation Histories: Analyzing and Improving Sample Complexity. arXiv:2310.20663
- Wilka Carvalho, Andre Saraiva, Angelos Filos, Andrew Kyle Lampinen, Loic Matthey, Richard L. Lewis, Honglak Lee, Satinder Singh, Danilo J. Rezende, Daniel Zoran. 2023. Combining Behaviors with the Successor Features Keyboard. arXiv:2310.15940
- Vint Lee, Pieter Abbeel, Youngwoon Lee. 2023. DreamSmooth: Improving Model-based Reinforcement Learning via Reward Smoothing. arXiv:2311.01450
- Hao Liu, Pieter Abbeel. 2023. Blockwise Parallel Transformers for Large Context Models. 37th Conference on Neural Information Processing Systems
- Carmelo Sferrazza, Younggyo Seo, Hao Liu, Youngwoon Lee, Pieter Abbeel. 2023. The Power of the Senses: Generalizable Manipulation from Vision and Touch through Masked Multimodal Learning. arXiv:2311.00924
- Sherry Yang, KwangHwan Cho, Amil Merchant, Pieter Abbeel, Dale Schuurmans, Igor Mordatch, Ekin Dogus Cubuk. 2023. Scalable Diffusion for Materials Generation. NeurIPS 2023 AI for Science Workshop
- Boyi Li, Philipp Wu, Pieter Abbeel, Jitendra Malik. 2023. Interactive Task Planning with Language Models. arXiv:2310.10645
- Yilun Du, Mengjiao Yang, Pete Florence, Fei Xia, Ayzaan Wahid, Brian Ichter, Pierre Sermanet, Tianhe Yu, Pieter Abbeel, Joshua B. Tenenbaum, Leslie Kaelbling, Andy Zeng, Jonathan Tompson. 2023. Video Language Planning. arXiv:2310.10625
- Hao Liu, Matei Zaharia, Pieter Abbeel. 2023. Exploration with Principles for Diverse AI Supervision. arXiv:2310.08899
- Weirui Ye, Yunsheng Zhang, Mengchen Wang, Shengjie Wang, Xianfan Gu, Pieter Abbeel, Yang Gao. 2023. Foundation Reinforcement Learning: towards Embodied Generalist Agents with Foundation Prior Assistance. arXiv:2310.02635
- Wilson Yan, Danijar Hafner, Stephen James, Pieter Abbeel. 2023. Temporally Consistent Video Transformer for Long-Term Video Prediction. arXiv:2210.02396
- Weirui Ye, Yunsheng Zhang, Pieter Abbeel, Yang Gao. 2023. Become a Proficient Player with Limited Data through Watching Pure Videos. The 11th International Conference on Learning Representations
- Mengjiao Yang, Yilun Du, Kamyar Ghasemipour, Jonathan Tompson, Dale Schuurmans, Pieter Abbeel. 2023. Learning Interactive Real-World Simulators. arXiv:2310.06114
- Lingjun Zhao, Khanh Nguyen, and Hal Daumé III. 2023. Define, Evaluate, and Improve Task-Oriented Cognitive Capabilities for Instruction Generation Models. In ACL Findings, 2023
- R Thomas McCoy, Shunyu Yao, Dan Friedman, Matthew Hardy, Thomas L Griffiths. 2023. Embers of Autoregression: Understanding Large Language Models Through the Problem They are Trained to Solve. arXiv:2309.13638
- Theodore Sumers, Shunyu Yao, Karthik Narasimhan, Thomas L Griffiths. 2023. Cognitive architectures for language agents. arXiv:2309.02427
- Rachit Dubey, Matthew Hardy, Tom Griffiths, Rahul Bhui. 2023. AI-generated visuals of car-free American cities help increase support for sustainable transport policies. PsyArXiv - Preprint
- Ilia Sucholutsky, Ruairidh M Battleday, Katherine M Collins, Raja Marjieh, Joshua Peterson, Pulkit Singh, Umang Bhatt, Nori Jacoby, Adrian Weller, Thomas L Griffiths. 2023. On the Informativeness of Supervision Signals. Conference Uncertainty in Artificial Intelligence
- Bhishma Dedhia, Michael Chang, Jake C Snell, Thomas L Griffiths, Niraj K Jha. 2023. Im-Promptu: In-Context Composition from Image Prompts. arXiv:2305.17262
- Minkyu Shin, Jin Kim, Bas van Opheusden, Thomas L Griffiths. 2023. Superhuman artificial intelligence can improve human decision-making by increasing novelty. Proceedings of the National Academy of Sciences
- Shunyu Yao, Dian Yu, Jeffrey Zhao, Izhak Shafran, Thomas L Griffiths, Yuan Cao, Karthik Narasimhan. 2023. Tree of Thoughts: Deliberate Problem Solving with Large Language Models. arXiv:2305.10601
- Joshua Peterson, Marina Mancoridis, Tom Griffiths. 2023. To each their own theory: Exploring the limits of individual differences in decisions under risk. Proceedings of the Annual Meeting of the Cognitive Science Society
- Natalia Vélez, Brian Christian, Mathew Hardy, Bill D Thompson, Thomas L Griffiths. 2023. How do Humans Overcome Individual Computational Limitations by Working Together? Cognitive Science
- Krishnamurthy Dj Dvijotham, Shayegan Omidshafiei, Kimin Lee, Katherine M Collins, Deepak Ramachandran, Adrian Weller, Mohammad Ghavamzadeh, Milad Nasr, Ying Fan, Jeremiah Zhe Liu. 2023. Algorithms for Optimal Adaptation of Diffusion Models to Reward Functions. ICML Workshop on New Frontiers in Learning, Control, and Dynamical Systems
- Kihyuk Sohn, Nataniel Ruiz, Kimin Lee, Daniel Castro Chin, Irina Blok, Huiwen Chang, Jarred Barber, Lu Jiang, Glenn Entis, Yuanzhen Li, Yuan Hao, Irfan Essa, Michael Rubinstein, Dilip Krishnan. 2023. StyleDrop: Text-to-Image Generation in Any Style. arXiv:2306.00983
- Megan Kinniment, Lucas Jun Koba Sato, Haoxing Du, Brian Goodrich, Max Hasin, Lawrence Chan, Luke Harold Miles, Tao R Lin, Hjalmar Wijk, Joel Burget, Aaron Ho, Elizabeth Barnes, Paul Christiano. 2023. Evaluating Language-Model Agents on Realistic Autonomous Tasks. Alignment Research Center, Evaluations Team
- Philippe Hansen-Estruch, Ilya Kostrikov, Michael Janner, Jakub Grudzien Kuba, Sergey Levine. 2023. Idql: Implicit q-learning as an actor-critic method with diffusion policies. arXiv:2304.10573
- Danqing Wang, Kevin Yang, Hanlin Zhu, Xiaomeng Yang, Andrew Cohen, Lei Li, Yuandong Tian. 2023. Learning Personalized Story Evaluation. arXiv:2310.03304
- George Obaido, Friday Joseph Agbo, Christine Alvarado, Solomon Sunday Oyelere. 2023. Analysis of Attrition Studies Within the Computer Sciences. IEEE Access
- Simphiwe M Simelane, Phumlani G Dlamini, Fadekemi J Osaye, George Obaido, Blessing Ogbukiri, Kehinde Aruleba, Cadavious M Jones, Chidozie W Chukwu, Oluwaseun F Egbelowo. 2023. Modeling the impact of public health education on tungiasis dynamics with saturated treatment: Insight through the Caputo fractional derivative. Mathematical Biosciences and Engineering
- M. Fishman, N. Kumar, C. Allen, N. Danas, M. Littman, S. Tellex, and G. Konidaris. 2023. Task Scoping: Generating Task-Specific Simplifications of Open-Scope Planning Problems. IJCAI Workshop on Bridging the Gap Between AI Planning and Reinforcement Learning
- Tom Zahavy, Vivek Veeriah, Shaobo Hou, Kevin Waugh, Matthew Lai, Edouard Leurent, Nenad Tomasev, Lisa Schut, Demis Hassabis, Satinder Singh. 2023. Diversifying AI: Towards Creative Chess with AlphaZero. arXiv:2308.09175
- David Abel, André Barreto, Benjamin Van Roy, Doina Precup, Hado van Hasselt, Satinder Singh. 2023. A Definition of Continual Reinforcement Learning. arXiv:2307.11046
- Robert Lange, Tom Schaul, Yutian Chen, Tom Zahavy, Valentin Dalibard, Chris Lu, Satinder Singh, Sebastian Flennerhag. 2023. Discovering evolution strategies via meta-black-box optimization. Proceedings of the Companion Conference on Genetic and Evolutionary Computation
- Jakob Bauer, Kate Baumli, Feryal Behbahani, Avishkar Bhoopchand, Nathalie Bradley-Schmieg, Michael Chang, Natalie Clay, Adrian Collister, Vibhavari Dasagi, Lucy Gonzalez, Karol Gregor, Edward Hughes, Sheleem Kashem, Maria Loks-Thompson, Hannah Openshaw, Jack Parker-Holder, Shreya Pathak, Nicolas Perez-Nieves, Nemanja Rakicevic, Tim Rocktäschel, Yannick Schroecker, Satinder Singh, Jakub Sygnowski, Karl Tuyls, Sarah York, Alexander Zacherl, Lei M Zhang. 2023. Human-Timescale Adaptation in an Open-Ended Task Space. In Proc. 40th International Conference on Machine Learning
- Chris Lu, Yannick Schroecker, Albert Gu, Emilio Parisotto, Jakob Foerster, Satinder Singh, Feryal Behbahani. 2023. Structured state space models for in-context reinforcement learning. arXiv:2303.03982
- Bernardo Avila Pires, Feryal Behbahani, Hubert Soyer, Kyriacos Nikiforou, Thomas Keck, Satinder Singh. 2023. Hierarchical Reinforcement Learning in Complex 3D Environments. arXiv:2302.14451
- Wilka Carvalho, Angelos Filos, Richard L Lewis, Satinder Singh. 2023. Composing task knowledge with modular successor feature approximators. arXiv:2301.12305
- Sebastian Flennerhag, Tom Zahavy, Brendan O'Donoghue, Hado van Hasselt, András György, Satinder Singh. 2023. Optimistic meta-gradients. arXiv:2301.03236
- Dieqiao Feng, Yuanqi Du, Carla P Gomes, Bart Selman. 2023. Weighted Sampling without Replacement for Deep Top-k Classification. In Proc. 40th International Conference on Machine Learning
- Hao Liu, Matei Zaharia, Pieter Abbeel. 2023. Ring Attention with Blockwise Transformers for Near-Infinite Context. arXiv:2310.01889
- Jiangliu Wang, Jianbo Jiao, Yibing Song, Stephen James, Zhan Tong, Chongjian Ge, Pieter Abbeel, Yun-Hui Liu. 2023. Speed Co-Augmentation for Unsupervised Audio-Visual Pre-training. arXiv:2309.13942
- Amber Xie, Youngwoon Lee, Pieter Abbeel, Stephen James. 2023. Language-Conditioned Path Planning. arXiv:2308.16893
- Kevin Zakka, Philipp Wu, Laura Smith, Nimrod Gileadi, Taylor Howell, Xue Bin Peng, Sumeet Singh, Yuval Tassa, Pete Florence, Andy Zeng, Pieter Abbeel. 2023. RoboPianist: Dexterous Piano Playing with Deep Reinforcement Learning. 7th Annual Conference on Robot Learning
- Hiroshi Yoshitake, Pieter Abbeel. 2023. The Impact of Overall Optimization on Warehouse Automation. arXiv:2308.06036
- Nikhil Mishra, Pieter Abbeel, Xi Chen, Maximilian Sieb. 2023. Convolutional Occupancy Models for Dense Packing of Complex, Novel Objects. arXiv:2308.00091
- Zhongyu Li, Xue Bin Peng, Pieter Abbeel, Sergey Levine, Glen Berseth, Koushil Sreenath. 2023. Robust and versatile bipedal jumping control through reinforcement learning. Robotics: Science and Systems XIX
- Xingyu Lin, John So, Sashwat Mahalingam, Fangchen Liu, Pieter Abbeel. 2023. SpawnNet: Learning Generalizable Visuomotor Skills from Pre-trained Networks. arXiv:2307.03567
- David Venuto, Sherry Yang, Pieter Abbeel, Doina Precup, Igor Mordatch, Ofir Nachum. 2023. Multi-environment pretraining enables transfer to action limited datasets. In Proc. 40th International Conference on Machine Learning
- Abdus Salam Azad, Izzeddin Gur, Jasper Emhoff, Nathaniel Alexis, Aleksandra Faust, Pieter Abbeel, Ion Stoica. 2023. CLUTR: curriculum learning via unsupervised task representation learning. In Proc. 40th International Conference on Machine Learning
- Joey Hejna, Pieter Abbeel, Lerrel Pinto. 2023. Improving long-horizon imitation through instruction prediction. In Proc. AAAI Conference on Artificial Intelligence
- Xinran Liang, Anthony Han, Wilson Yan, Aditi Raghunathan, Pieter Abbeel. 2023. ALP: Action-Aware Embodied Learning for Perception. arXiv:2306.10190
- Wilson Yan, Danijar Hafner, Stephen James, Pieter Abbeel. 2023. Temporally Consistent Transformers for Video Generation. In Proc. 40th International Conference on Machine Learning
- Mengjiao Yang, Yilun Du, Bo Dai, Dale Schuurmans, Joshua B Tenenbaum, Pieter Abbeel. 2023. Probabilistic Adaptation of Text-to-Video Models. arXiv:2306.01872
- Gaoyue Zhou, Victoria Dean, Mohan Kumar Srirama, Aravind Rajeswaran, Jyothish Pari, Kyle Hatch, Aryan Jain, Tianhe Yu, Pieter Abbeel, Lerrel Pinto, Chelsea Finn, Abhinav Gupta. 2023. Train Offline, Test Online: A Real Robot Learning Benchmark. arXiv:2306.00942
- Dongyoung Kim, Jinwoo Shin, Pieter Abbeel, Younggyo Seo. 2023. Accelerating Reinforcement Learning with Value-Conditional State Entropy Exploration. arXiv:2305.19476
- Hao Liu, Pieter Abbeel. 2023. Blockwise Parallel Transformer for Long Context Large Models. arXiv:2305.19370
- Hao Liu, Pieter Abbeel. 2023. Emergent agentic transformer from chain of hindsight experience. arXiv:2305.16554
- Arnav Gudibande, Eric Wallace, Charlie Snell, Xinyang Geng, Hao Liu, Pieter Abbeel, Sergey Levine, Dawn Song. 2023. The False Promise of Imitating Proprietary LLMs. arXiv:2305.15717
- YuXuan Liu, Pieter Abbeel. 2023. Perception for Real-World Robotic Applications. Technical Report No. UCB/EECS-2023-122
- YuXuan Liu, Xi Chen, Pieter Abbeel. 2023. Self-Supervised Instance Segmentation by Grasping. arXiv:2305.06305
- Philipp Wu, Arjun Majumdar, Kevin Stone, Yixin Lin, Igor Mordatch, Pieter Abbeel, Aravind Rajeswaran. 2023. Masked trajectory models for prediction, representation, and control. arXiv:2305.02968
- YuXuan Liu, Nikhil Mishra, Pieter Abbeel, Xi Chen. 2023. Distributional Instance Segmentation: Modeling Uncertainty and High Confidence Predictions with Latent-MaskRCNN. arXiv:2305.01910
- Kevin Zakka, Laura Smith, Nimrod Gileadi, Taylor Howell, Xue Bin Peng, Sumeet Singh, Yuval Tassa, Pete Florence, Andy Zeng, Pieter Abbeel. 2023. RoboPianist: A Benchmark for High-Dimensional Robot Control. arXiv:2304.04150
- Arjun Majumdar, Karmesh Yadav, Sergio Arnaud, Yecheng Jason Ma, Claire Chen, Sneha Silwal, Aryan Jain, Vincent-Pierre Berges, Pieter Abbeel, Jitendra Malik, Dhruv Batra, Yixin Lin, Oleksandr Maksymets, Aravind Rajeswaran, Franziska Meier. 2023. Where are we in the search for an Artificial Visual Cortex for Embodied Intelligence?. arXiv:2303.18240
- Sherry Yang, Ofir Nachum, Yilun Du, Jason Wei, Pieter Abbeel, Dale Schuurmans. 2023. Foundation models for decision making: Problems, methods, and opportunities. arXiv:2303.04129
- Zhongyu Li, Xue Bin Peng, Pieter Abbeel, Sergey Levine, Glen Berseth, Koushil Sreenath. 2023. Robust and versatile bipedal jumping control through multi-task reinforcement learning. arXiv:2302.09450
- Yuqing Du, Olivia Watkins, Zihan Wang, Cédric Colas, Trevor Darrell, Pieter Abbeel, Abhishek Gupta, Jacob Andreas. 2023. Guiding pretraining in reinforcement learning with large language models. arXiv:2302.06692
- Seohong Park, Kimin Lee, Youngwoon Lee, Pieter Abbeel. 2023. Controllability-Aware Unsupervised Skill Discovery. arXiv:2302.05103
- Tianjun Zhang, Fangchen Liu, Justin Wong, Pieter Abbeel, Joseph E Gonzalez. 2023. The Wisdom of Hindsight Makes Language Models Better Instruction Followers. arXiv:2302.05206
- Hao Liu, Carmelo Sferrazza, Pieter Abbeel. 2023. Chain of Hindsight Aligns Language Models with Feedback. arXiv:2302.02676
- Younggyo Seo, Junsu Kim, Stephen James, Kimin Lee, Jinwoo Shin, Pieter Abbeel. 2023. Multi-view masked world models for visual robotic manipulation. arXiv:2302.02408
- Hao Liu, Wilson Yan, Pieter Abbeel. 2023. Language Quantized AutoEncoders: Towards Unsupervised Text-Image Alignment. arXiv:2302.00902
- Yilun Dai, Mengjiao Yang, Bo Dai, Hanjun Dai, Ofir Nachum, Josh Tenenbaum, Dale Schuurmans, Pieter Abbeel. 2023. Learning universal policies via text-guided video generation. arXiv:2302.00111
- Xinyang Geng, Arnav Gudibande, Hao Liu, Eric Wallace, Pieter Abbeel, Sergey Levine, Dawn Song. 2023. Koala: A Dialogue Model for Academic Research. BAIR
- Ajay Jain, Amber Xie, Pieter Abbeel. 2023. Vectorfusion: Text-to-svg by abstracting pixel-based diffusion models. In Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition
- Jessy Lin, Yuqing Du, Olivia Watkins, Danijar Hafner, Pieter Abbeel, Dan Klein, Anca Dragan. 2023. Learning to model the world with language. arXiv:2308.01399
- Vivek Myers, Andre He, Kuan Fang, Homer Walke, Philippe Hansen-Estruch, Ching-An Cheng, Mihai Jalobeanu, Andrey Kolobov, Anca Dragan, Sergey Levine. 2023. Goal Representations for Instruction Following: A Semi-Supervised Language Interface to Control. arXiv:2307.00117
- Yi Liu, Andreea Bobu, Anca Dragan. 2023. Similarity-Based Representation Learning. Technical Report No. UCB/EECS-2023-78
- Andreea Bobu, Yi Liu, Rohin Shah, Daniel S Brown, Anca D Dragan. 2023. SIRL: Similarity-based Implicit Representation Learning. In Proc. 2023 ACM/IEEE International Conference on Human-Robot Interaction
- Dylan Cope, Justin Svegliato, Stuart Russell. 2023. Learning to Plan with Tree Search via Deep RL. PRL Workshop Series Bridging the Gap Between AI Planning and Reinforcement Learning
- M. Carroll, O. Paradise, J. Lin, R. Georgescu, M. Sun, D. Bignell, S. Milani, K. Hofmann, M. Hausknecht, A.D. Dragan, S. Devlin. 2022. Uni[MASK]: Unified Inference in Sequential Decision Problems. Conference on Neural Information Processing Systems (NeurIPS), 2022
- Kourosh Hakhamaneshi, Marcel Nassar, Mariano Phielipp, Pieter Abbeel, Vladimir Stojanović. 2022. Pretraining Graph Neural Networks for few-shot Analog Circuit Modeling and Design. Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD), 2022
- John So*, Amber Xie*, Jeffrey Edlund, Rohan Thakker, Sunggoo Jung, Ali-akbar Agha-mohammadi, Pieter Abbeel, Stephen James. 2022. Sim-to-Real via Sim-to-Seg: End-to-end Off-road Autonomous Driving Without Real Data. Conference on Robot Learning (CoRL)
- Ilija Radosavovic, Tete Xiao, Stephen James, Pieter Abbeel, Jitendra Malik, Trevor Darrell. 2022. Real-World Robot Learning with Masked Visual Pre-training. Conference on Robot Learning (CoRL)
- Younggyo Seo, Danijar Hafner, Hao Liu, Fangchen Liu, Stephen James, Kimin Lee, Pieter Abbeel. 2022. Masked World Models for Visual Control. Conference on Robot Learning (CoRL)
- Philipp Wu, Alejandro Escontrela, Danijar Hafner, Ken Goldberg, Pieter Abbeel. 2022. DayDreamer: World Models for Physical Robot Learning. Conference on Robot Learning (CoRL)
- Fangchen Liu, Hao Liu, Aditya Grover, Pieter Abbeel. 2022. Masked Autoencoding for Scalable and Generalizable Decision Making. Neural Information Processing Systems (NeurIPS), 2022
- Danijar Hafner, Kuang-Huei Lee, Ian Fischer, Pieter Abbeel. 2022. Director: Deep Hierarchical Planning from Pixels. Neural Information Processing Systems (NeurIPS), 2022
- Zhao Mandi, Pieter Abbeel, Stephen James. 2022. On the Effectiveness of Fine-tuning Versus Meta-reinforcement Learning. Neural Information Processing Systems (NeurIPS), 2022
- Mengjiao (Sherry) Yang, Dale Schuurmans, Pieter Abbeel, Ofir Nachum. 2022. Chain of Thought Imitation with Procedure Cloning. Neural Information Processing Systems (NeurIPS), 2022
- Michael Laskin, Hao Liu, Xue Bin Peng, Denis Yarats, Aravind Rajeswaran, Pieter Abbeel. 2022. CIC: Unsupervised Reinforcement Learning with Contrastive Intrinsic Control. Neural Information Processing Systems (NeurIPS), 2022
- Kai Chen, Rui Cao, Stephen James, Yichuan Li, Yun-Hui Liu, Pieter Abbeel, Qi Dou. 2022. Sim-to-Real 6D Object Pose Estimation via Iterative Self-training for Robotic Bin-picking. European Conference on Computer Vision (ECCV)
- Albert Zhan, Ruihan (Philip) Zhao, Lerrel Pinto, Pieter Abbeel, Misha Laskin. 2022. Learning Visual Robotic Control Efficiently with Contrastive Pre-training and Data Augmentation. IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
- Younggyo Seo, Kimin Lee, Fangchen Liu, Stephen James, Pieter Abbeel. 2022. Autoregressive Latent Video Prediction with High-Fidelity Image Generator. IEEE International Conference in Image Processing
- Litian Liang, Yaosheng Xu, Stephen Mcaleer, Dailin Hu, Alexander Ihler, Pieter Abbeel, Roy Fox. 2022. Reducing Variance in Temporal-Difference Value Estimation via Ensemble of Deep Networks. International Conference on Machine Learning (ICML)
- Younggyo Seo, Kimin Lee, Stephen James, Pieter Abbeel. 2022. Reinforcement Learning with Action-Free Pre-Training from Videos. International Conference on Machine Learning (ICML)
- Wenlong Huang, Pieter Abbeel, Deepak Pathak, Igor Mordatch. 2022. Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents. International Conference on Machine Learning (ICML)
- Ajay Jain, Ben Mildenhall, Jonathan T. Barron, Pieter Abbeel, Ben Poole. 2022. Zero-Shot Text-Guided Object Generation with Dream Fields. Conference on Computer Vision and Pattern Recognition (CVPR)
- Kevin Lu, Aditya Grover, Pieter Abbeel, Igor Mordatch. 2022. Pretrained Transformers as Universal Computation Engines. Thirty-Sixth AAAI Conference on Artificial Intelligence (AAAI)
- Jose A Barreiros, Artemis Xu, Sofya Pugach, Narahari Iyengar, Graeme Troxell, Alexander Cornwell, Samantha Hong, Bart Selman, Robert F Shepherd. 2022. Haptic perception using optoelectronic robotic flesh for embodied artificially intelligent agents. Science Robotics
- D Feng, CP Gomes, B Selman. 2022. Graph Value Iteration.
- D Feng, C Gomes, B Selman. 2022. Left Heavy Tails and the Effectiveness of the Policy and Value Networks in DNN-based best-first search for Sokoban Planning.
- Zeyu Zheng, Risto Vuorio, Richard Lewis, and Satinder Singh. 2022. Pairwise Weights for Temporal Credit Assignment. 36th AAAI Conference on Artificial Intelligence
- Chang, M., Griffiths, T. L., & Levine, S. 2022. Object representations as fixed points: Training iterative refinement algorithms with implicit differentiation. Advances in Neural Information Processing Systems 36.
- Dasgupta, I., Grant, E., & Griffiths, T. L. 2022. Distinguishing rule- and exemplar-based generalization in learning systems. Proceedings of the International Conference on Machine Learning
- ID Mienye, G Obaido, K Aruleba, OA Dada. 2022. Enhanced prediction of chronic kidney disease using feature selection and boosted classifiers. Intelligent Systems Design and Applications: 21st International Conference on Intelligent Systems Design and Applications (ISDA 2021) Held During December 13–15, 2021
- G Obaido, K Aruleba, OA Dada, ID Mienye. 2022. Mining Frequently Traveled Routes During COVID-19. Intelligent Systems Design and Applications: 21st International Conference on Intelligent Systems Design and Applications (ISDA 2021) Held During December 13–15, 2021
- G Obaido. 2022. PhD thesis: SQL Comprehension and Synthesis. arXiv preprint arXiv:2203.03469
- E Esenogho, ID Mienye, TG Swart, K Aruleba, G Obaido. 2022. A neural network ensemble with feature engineering for improved credit card fraud detection. IEEE Access 10, 16400-16407
- G Obaido, K Aruleba, OA Dada, I Mienye. 2022. Demography of Machine Learning Education Within the K12. Innovations in Bio-Inspired Computing and Applications: Proceedings of the 12th International Conference on Innovations in Bio-Inspired Computing and Applications (IBICA 2021) Held During December 16–18, 2021
- Hao Liu, Lisa Lee, Kimin Lee, Pieter Abbeel. 2022. Instruction-Following Agents with Jointly Pre-Trained Vision-Language Models. arXiv preprint 2022
- Paria Rashidinejad, Hanlin Zhu, Kunhe Yang, Stuart Russell, Jiantao Jiao. 2022. Optimal conservative offline RL with general function approximation via augmented Lagrangian. International Conference on Learning Representations (ICLR) 2023
- P Rashidinejad, B Zhu, C Ma, J Jiao, S Russell. 2022. Bridging Offline Reinforcement Learning and Imitation Learning: A Tale of Pessimism. IEEE Transactions on Information Theory 68 (12), 8156-8196
- P Rashidinejad. 2022. Reliable Prediction and Decision-Making in Sequential Environments. University of California, Berkeley
- Andy Zou, Tristan Xiao, Ryan Jia, Joe Kwon, Mantas Mazeika, Richard Li, Dawn Song, Jacob Steinhardt, Owain Evans, Dan Hendrycks. 2022. Forecasting Future World Events with Neural Networks. NeurIPS 2022
- Suneel Belkhale, Dorsa Sadigh. 2022. PLATO: Predicting Latent Affordances Through Object-Centric Play. Proceedings of the 6th Conference on Robot Learning (CoRL), December 2022
- Andy Shih, Dorsa Sadigh, Stefano Ermon. 2022. Training and Inference on Any-Order Autoregressive Models the Right Way. Conference on Neural Information Processing Systems (NeurIPS), November 2022
- Mark Beliaev*, Andy Shih*, Stefano Ermon, Dorsa Sadigh, Ramtin Pedarsani. 2022. Imitation Learning by Estimating Expertise of Demonstrators. 39th International Conference on Machine Learning (ICML), July 2022
- Zihan Wang*, Zhangjie Cao*, Yilun Hao, Dorsa Sadigh. 2022. Weakly Supervised Correspondence Learning. International Conference on Robotics and Automation (ICRA), May 2022
- MJ McDonald, D Hadfield-Menell. 2022. Guided imitation of task and motion planning. Conference on Robot Learning, 630-640
- Xin Chen, Sam Toyer, Cody Wild, Scott Emmons, Ian Fischer, Kuang-Huei Lee, Neel Alex, Steven H Wang, Ping Luo, Stuart Russell, Pieter Abbeel, Rohin Shah. 2022. An empirical investigation of representation learning for imitation. arXiv preprint arXiv:2205.07886
- E Jenner, M Weiler. 2022. Steerable Partial Differential Operators for Equivariant Neural Networks. ICLR
- C Lu, JG Kuba, A Letcher, L Metz, CS de Witt, J Foerster. 2022. Discovered policy optimisation. arXiv preprint arXiv:2210.05639
- J Grudzien, CAS De Witt, J Foerster. 2022. Mirror learning: A unifying framework of policy optimisation. International Conference on Machine Learning, 7825-7844
- Jessy Lin, Geza Kovacs, Aditya Shastry, Joern Wuebker, John DeNero. 2022. Automatic Correction of Human Translations. NAACL 2022
- Cem Anil*, Ashwini Pokle*, Kaiqu Liang*, Johannes Treutlein, Yuhuai Wu, Shaojie Bai, J. Zico Kolter, and Roger Grosse. 2022. Path Independent Equilibrium Models Can Better Exploit Test-Time Computation. NeurIPS 2022.
- N. Lauffer*, B. Yalcinkaya*, M. Vazquez-Chanlatte, A Shah, and S. Seshia. 2022. Learning Deterministic Finite Automata Decompositions from Examples and Demonstrations. FMCAD 2022.
- C. Neary, M. Cubuktepe, N. Lauffer, X. Jin, A. Phillips, Z. Xu, D. Tong, and U. Topcu. 2022. Multiscale Heterogeneous Optimal Lockdown Control for COVID-19 Using Geographic Information. Scientific Reports 2022
- Scott Emmons, Benjamin Eysenbach, Ilya Kostrikov, and Sergey Levine. 2022. RvS: What is Essential for Offline RL via Supervised Learning?. International Conference on Learning Representations, 2022
- Yizhong Wang, Swaroop Mishra, Pegah Alipoormolabashi, Yeganeh Kordi, Amirreza Mirzaei*, Anjana Arunkumar*, Arjun Ashok*, Arut Selvan Dhanasekaran*, Atharva Naik*, David Stap*, Eshaan Pathak*, Giannis Karamanolakis*, Haizhi Gary Lai*, Ishan Purohit*, Ishani Mondal*, Jacob Anderson*, Kirby Kuznia*, Krima Doshi*, Maitreya Patel*, Kuntal Kumar Pal*, Mehrad Moradshahi*, Mihir Parmar*, Mirali Purohit*, Neeraj Varshney*, Phani Rohitha Kaza*, Pulkit Verma*, Ravsehaj Singh Puri*, Rushang Karia*, Shailaja Keyur Sampat*, Savan Doshi*, Siddharth Deepak Mishra*, Sujan Reddy*, Sumanta Patro*, Tanay Dixit*, Xudong Shen*, Chitta Baral, Yejin Choi, Noah A. Smith, Hannaneh Hajishirzi, and Daniel Khashabi. 2022. Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ Tasks. 2022 Conference on Empirical Methods in Natural Language Processing
- Tianjun Zhang, Paria Rashidinejad, Jiantao Jiao, Yuandong Tian, Joseph E. Gonzalez, and Stuart Russell. 2022. MADE: Exploration via Maximizing Deviation from Explored Regions. In Advances in Neural Information Processing Systems 34, 2022
- Paria Rashidinejad, Banghua Zhu, Cong Ma, Jiantao Jiao, and Stuart Russell. 2022. Bridging Offline Reinforcement Learning and Imitation Learning: A Tale of Pessimism. In Advances in Neural Information Processing Systems 34, 2022
- Cynthia Chen, Sam Toyer, Cody Wild, Scott Emmons, Ian Fischer, Kuang-Huei Lee, Neel Alex, Steven Wang, Ping Luo, Stuart Russell, Pieter Abbeel, and Rohin Shah. 2022. An Empirical Investigation of Representation Learning for Imitation. In Advances in Neural Information Processing Systems 34, 2022
- Arnaud Fickinger, Samuel Cohen, Stuart Russell, Brandon Amos. 2022. Cross-Domain Imitation Learning via Optimal Transport. ICLR 2022
- Dan Hendrycks, Collin Burns, Anya Chen, Spencer Ball. 2021. CUAD: An Expert-Annotated NLP Dataset for Legal Contract Review.
- Dan Hendrycks, Collin Burns, Saurav Kadavath, Akul Arora, Steven Basart, Eric Tang, Dawn Song, Jacob Steinhardt. 2021. Measuring mathematical problem solving with the math dataset.
- George Matheos, Alexander K. Lew, Matin Ghavamizadeh, Stuart Russell, Marco Cusumano-Towner, Vikash K. Mansinghka. 2021. Transforming Worlds: Automated Involutive MCMC for Open-Universe Probabilistic Models. Proc. 3rd Symposium on Advances in Approximate Bayesian Inference (AABI)
- Feiran Jia, Aditya Mate, Zun Li, Shahin Jabbari, Mithun Chakraborty, Milind Tambe, Michael Wellman, Yevgeniy Vorobeychik. 2021. A Game-Theoretic Approach for Hierarchical Policy-Making.
- Arnaud Fickinger, Hengyuan Hu, Brandon Amos, Stuart Russell, Noam Brown. 2021. Scalable Online Planning via Reinforcement Learning Fine-Tuning. NeurIPS 2021
- Hao Liu, Pieter Abbeel. 2021. Behavior From the Void: Unsupervised Active Pre-Training. NeurIPS 2021
- Adam Stooke, Kimin Lee, Pieter Abbeel, Michael Laskin. 2021. Decoupling Representation Learning from Reinforcement Learning. Proceedings of the 38th International Conference on Machine Learning
- Younggyo Seo, Lili Chen, Jinwoo Shin, Honglak Lee, Pieter Abbeel, Kimin Lee. 2021. State Entropy Maximization with Random Encoders for Efficient Exploration. ICML 2021
- Roshan Rao, Jason Liu, Robert Verkuil, Joshua Meier, John F. Canny, Pieter Abbeel, Tom Sercu, Alexander Rives. 2021. MSA Transformer. bioRxiv
- Hao Liu, Pieter Abbeel. 2021. APS: Active Pretraining with Successor Features. ICML 2021
- Boyuan Chen, Pieter Abbeel, Deepak Pathak. 2021. Unsupervised Learning of Visual 3D Keypoints for Control. ICML 2021
- Ajay Jain, Matthew Tancik, Pieter Abbeel. 2021. Putting NeRF on a Diet: Semantically Consistent Few-Shot View Synthesis.
- Paras Jain, Ajay Jain, Tianjun Zhang, Pieter Abbeel, Joseph E. Gonzalez, Ion Stoica. 2021. Contrastive Code Representation Learning.
- Seunghyun Lee, Younggyo Seo, Kimin Lee, Pieter Abbeel, Jinwoo Shin. 2021. Offline-to-Online Reinforcement Learning via Balanced Replay and Pessimistic Q-Ensemble. CoRL 2021
- Wenling Shang, Xiaofei Wang, Aravind Srinivas, Aravind Rajeswaran, Yang Gao, Pieter Abbeel, Michael Laskin. 2021. Reinforcement Learning with Latent Flow.
- Lili Chen, Kimin Lee, Aravind Srinivas, Pieter Abbeel. 2021. Improving Computational Efficiency in Visual Reinforcement Learning via Stored Embeddings.
- Lili Chen, Kevin Lu, Aravind Rajeswaran, Kimin Lee, Aditya Grover, Michael Laskin, Pieter Abbeel, Aravind Srinivas, Igor Mordatch. 2021. Decision Transformer: Reinforcement Learning via Sequence Modeling.
- Charles Packer, Pieter Abbeel, Joseph E. Gonzalez. 2021. Hindsight Task Relabelling: Experience Replay for Sparse Reward Meta-RL. NeurIPS 2021
- Weirui Ye, Shaohuai Liu, Thanard Kurutach, Pieter Abbeel, Yang Gao. 2021. Mastering Atari Games with Limited Data. NeurIPS 2021
- Michael Laskin, Denis Yarats, Hao Liu, Kimin Lee, Albert Zhan, Kevin Lu, Catherine Cang, Lerrel Pinto, Pieter Abbeel. 2021. URLB: Unsupervised Reinforcement Learning Benchmark.
- Kevin Lu, Aditya Grover, Pieter Abbeel, Igor Mordatch. 2021. Pretrained Transformers as Universal Computation Engines.
- Abdus Salam Azad, Edward Kim, Qiancheng Wu, Kimin Lee, Ion Stoica, Pieter Abbeel, Sanjit A. Seshia. 2021. Scenic4RL: Programmatic Modeling and Generation of Reinforcement Learning Environments.
- Ellis Ratner, Andrea Bajcsy, Terrence Fong, Claire J. Tomlin, Anca D. Dragan. 2021. Efficient Dynamics Estimation With Adaptive Model Sets. IEEE Robotics and Automation Letters
- Vivek Veeriah, Tom Zahavy, Matteo Hessel, Zhongwen Xu, Junhyuk Oh, Iurii Kemaev, Hado van Hasselt, David Silver, Satinder Singh. 2021. Discovery of Options via Meta-Learned Subgoals.
- Zeyu Zheng, Vivek Veeriah, Risto Vuorio, Richard Lewis, Satinder Singh. 2021. Learning State Representations from Random Deep Action-Conditional Predictions. NeurIPS 2021
- Jonathan Stray. 2021. Making Algorithms Work for Reporting.
- Nemanja Djuric, Henggang Cui, Zhaoen Su, Shangxuan Wu, Huahua Wang, Fang-Chieh Chou, Luisa San Martin, Song Feng, Rui Hu, Yang Xu, Alyssa Dayan, Sidney Zhang, Brian C Becker, Gregory P Meyer, Carlos Vallespi-Gonzalez, Carl K Wellington. 2021. MultiXNet: Multiclass Multistage Multimodal Motion Prediction.
- Arnaud Fickinger, Natasha Jaques, Samyak Parajuli, Michael Chang, Nicholas Rhinehart, Glen Berseth, Stuart Russell, Sergey Levine. 2021. Explore and Control with Adversarial Surprise.
- Dan Hendrycks, Steven Basart, Saurav Kadavath, Mantas Mazeika, Akul Arora, Ethan Guo, Collin Burns, Samir Puranik, Horace He, Dawn Song, Jacob Steinhardt. 2021. Measuring Coding Challenge Competence With APPS. NeurIPS 2021
- Minqi Jiang, Michael Dennis, Jack Parker-Holder, Jakob Foerster, Edward Grefenstette, Tim Rocktäschel. 2021. Replay-Guided Adversarial Environment Design. NeurIPS 2021
- Abhinav Bhatia, Justin Svegliato, Shlomo Zilberstein. 2021. On the benefits of randomly adjusting anytime weighted A*.
- Shane Parr, Ishan Khatri, Justin Svegliato, Shlomo Zilberstein. 2021. Agent-aware state estimation for autonomous vehicles.
- Connor Basich, Justin Svegliato, Allyson Beach, Kyle H. Wray, Stefan Witwicki, Shlomo Zilberstein. 2021. Improving Competence via Iterative State Space Refinement. IROS 2021
- Abhinav Bhatia, Justin Svegliato, Shlomo Zilberstein. 2021. Tuning the hyperparameters of anytime planning: A deep reinforcement learning approach.
- Hankook Lee, Kibok Lee, Kimin Lee, Honglak Lee, Jinwoo Shin. 2021. Improving Transferability of Representations via Augmentation-Aware Self-Supervision. NeurIPS 2021
- Paria Rashidinejad, Xiao Hu, Stuart Russell. 2020. Patient-adaptable intracranial pressure morphology analysis using a probabilistic model-based approach. Physiological Measurement
- Sam Toyer, Felipe Trevizan, Sylvie Thiebaux, Lexing Xie. 2020. ASNets: Deep Learning for Generalised Planning. JAIR
- Dan Hendrycks, Collin Burns, Steven Basart, Andy Zou, Mantas Mazeika, Dawn Song, Jacob Steinhardt. 2020. Measuring Massive Multitask Language Understanding. ICLR 2021
- Michael Dennis, Natasha Jaques, Eugene Vinitsky, Alexandre Bayen, Stuart Russell, Andrew Critch, Sergey Levine. 2020. Emergent Complexity and Zero-shot Transfer via Unsupervised Environment Design. NeurIPS 2020
- Scott Emmons, Ajay Jain, Michael Laskin, Thanard Kurutach, Pieter Abbeel, Deepak Pathak. 2020. Sparse Graphical Memory for Robust Planning. NeurIPS 2020
- Thomas Krendl Gilbert, Andrew Loveridge. 2020. Subjectifying objectivity: Delineating tastes in theoretical quantum gravity research. Social Studies of Science
- Oliver Richardson, Joseph Y Halpern. 2020. Probabilistic Dependency Graphs. AAAI 2021
- Janarthanan Rajendran, Richard Lewis, Vivek Veeriah, Honglak Lee, Satinder Singh. 2020. How Should an Agent Practice? AAAI 2020
- Zeyu Zheng, Junhyuk Oh, Matteo Hessel, Zhongwen Xu, Manuel Kroiss, Hado Van Hasselt, David Silver, Satinder Singh. 2020. What Can Learned Intrinsic Rewards Capture? ICML
- Yi Wu, Yuxin Wu, Aviv Tamar, Stuart Russell, Georgia Gkioxari, Yuandong Tian. 2019. Bayesian Relational Memory for Semantic Visual Navigation. ICCV 2019
- Prasad Tadepalli, Cameron Barrie, Stuart J. Russell. 2019. Learning Causal Trees with Latent Variables via Controlled Experimentation. AAAI 2019
- Junhyuk Oh, Yijie Guo, Satinder Singh, Honglak Lee. 2018. Self-Imitation Learning. ICML 2018
- Thanard Kurutach, Aviv Tamar, Ge Yang, Stuart Russell, Pieter Abbeel. 2018. Learning Plannable Representations with Causal InfoGAN. ICML 2018 Workshop on Planning and Learning
- Vivek Veeriah, Junhyuk Oh, Satinder Singh. 2018. Many-Goals Reinforcement Learning. (Preprint)
- Yi Wu, Siddharth Srivastava, Nicholas Hay, Simon Du, Stuart Russell. 2018. Discrete-Continuous Mixtures in Probabilistic Programming: Generalized Semantics and Inference Algorithms. ICML 2018
- Han-Chin Shing, Suraj Nair, Ayah Zirikly, Meir Friedenberg, Hal Daumé III, Philip Resnik. 2018. Expert, Crowdsourced, and Machine Assessment of Suicide Risk via Online Postings. Workshop on Computational Linguistics and Clinical Psychology 2018
- Paul Krueger, Falk Lieder, Tom Griffiths. 2017. Enhancing metacognitive reinforcement learning using reward structures and feedback. CogSci 2017
3.3. Cognitive science, uncategorized
- Ryan Liu, Howard Yen, Raja Marjieh, Thomas L. Griffiths, Ranjay Krishna. 2023. Improving Interpersonal Communication by Simulating Audiences with Language Models. arXiv:2311.00687
- Kerem Oktar, Ilia Sucholutsky, Tania Lombrozo, Thomas L. Griffiths. 2023. Dimensions of Disagreement: Unpacking Divergence and Misalignment in Cognitive Science and Artificial Intelligence. arXiv:2310.12994
- Qihong Lu, Tan T Nguyen, Uri Hasson, Thomas L Griffiths, Jeffrey M Zacks, Samuel J Gershman, Kenneth A Norman. 2023. Toward a More Neurally Plausible Neural Network Model of Latent Cause Inference. Conference on Cognitive Computational Neuroscience
- Mathew D. Hardy, Bill D. Thompson, P. M. Krafft & Thomas L. Griffiths. 2023. Resampling reduces bias amplification in experimental social networks. Nature Human Behaviour
- Casey Lewry, Deborah Kelemen, Tania Lombrozo. 2023. The moral consequences of teleological beliefs about the human species. Journal of Experimental Psychology: General
- Kerem Oktar, Adam Lerner, Maya Malaviya, Tania Lombrozo. 2023. Philosophy instruction changes views on moral controversies by decreasing reliance on intuition. Cognition
- T Davoodi, T Lombrozo. 2023. Scientific and Religious Explanations, Together and Apart. Conjunctive Explanations
- Neil Van Leeuwen, Tania Lombrozo. 2023. The Puzzle of Belief. Cognitive Science
- Casey Lewry, Sana Asifriyaz, Tania Lombrozo. 2023. Intuitive theories of moral progress. Proceedings of the Annual Meeting of the Cognitive Science Society
- Kerem Oktar, Tania Lombrozo. 2023. Ideological Differences in Paths to Persistence. Proceedings of the Annual Meeting of the Cognitive Science Society
- Erik Brockbank, Tania Lombrozo, Alison Gopnik, Caren M Walker. 2023. Ask me why, don’t tell me why: Asking children for explanations facilitates relational thinking. Developmental Science
- Daphna Buchsbaum, Rebekah Gelpi, A Whalen, Thomas L Griffiths, Fei Xu. 2023. Can Children Balance Majority Size with Information Quality in Learning About Preferences? PsyArXiv preprint
- Abdullah Almaatouq, Thomas L Griffiths, Jordan Suchow, Mark E Whiting, James Evans, Duncan J Watts. 2023. Replies to commentaries on Beyond Playing 20 Questions with Nature. PsyArXiv preprint
- Stefan Uddenberg, Bill D Thompson, Madalina Vlasceanu, Thomas L Griffiths, Alexander Todorov. 2023. Iterated learning reveals stereotypes of facial trustworthiness that propagate in the absence of evidence. Cognition
- Raja Marjieh, Nori Jacoby, Joshua C Peterson, Thomas L Griffiths. 2023. The Universal Law of Generalization Holds for Naturalistic Stimuli. arXiv:2306.08564
- R Thomas McCoy, Thomas L Griffiths. 2023. Modeling rapid language learning by distilling Bayesian priors into artificial neural networks. arXiv:2305.14701
- Mayank Agrawal, Joshua C Peterson, Jonathan D Cohen, Thomas L Griffiths. 2023. Stress, intertemporal choice, and mitigation behavior during the COVID-19 pandemic. Journal of Experimental Psychology: General
- Aditi Jha, Joshua C. Peterson, Thomas L. Griffiths. 2023. Extracting low-dimensional psychological representations from convolutional neural networks. Cognitive Science
- Theodore R Sumers, Mark K Ho, Robert D Hawkins, Thomas L Griffiths. 2023. Show or Tell? Exploring when (and why) teaching with language outperforms demonstration. Cognition
- Raja Marjieh, Ilia Sucholutsky, Pol van Rijn, Nori Jacoby, Thomas L Griffiths. 2023. What language reveals about perception: Distilling psychophysical knowledge from large language models. arXiv:2302.01308
- Ilia Sucholutsky, Thomas L Griffiths. 2023. Alignment with human representations supports robust few-shot learning. arXiv:2301.11990
- Thomas L Griffiths, Sreejan Kumar, R Thomas McCoy. 2023. On the hazards of relating representations and inductive biases. Behavioral and Brain Sciences
- Mathew Hardy, Ilia Sucholutsky, Bill Thompson, Tom Griffiths. 2023. Large language models meet cognitive science: LLMs as tools, models, and participants. Proceedings of the Annual Meeting of the Cognitive Science Society
- Raja Marjieh, Thomas L Griffiths, Nori Jacoby. 2023. Musical pitch has multiple psychological geometries. bioRxiv
- Minae Kwon, Hengyuan Hu, Vivek Myers, Siddharth Karamcheti, Anca Dragan, Dorsa Sadigh. 2023. Toward Grounded Social Reasoning. arXiv:2306.08651
- EG Liquin, T Lombrozo. 2022. Motivated to learn: An account of explanatory satisfaction. Cognitive Psychology 132
- E Foster-Hanson, T Lombrozo. 2022. How “is” shapes “ought” for folk-biological concepts. Cognitive Psychology 139
- N Vasil, A Ruggeri, T Lombrozo. 2022. When and how children use explanations to guide generalizations. Cognitive Development 61, 101144
- R Dubey, TL Griffiths, T Lombrozo. 2022. If it’s important, then I’m curious: Increasing perceived usefulness stimulates curiosity. Cognition 226
- TH Vrantsidis, T Lombrozo. 2022. Simplicity as a Cue to Probability: multiple roles for Simplicity in Evaluating Explanations. Cognitive science 46
- T Davoodi, T Lombrozo. 2022. Varieties of ignorance: Mystery and the unknown in science and religion. Cognitive science 46 (4), e13129
- E Foster-Hanson, T Lombrozo. 2022. What are men and mothers for? The causes and consequences of functional reasoning about social categories. Proceedings of the Annual Meeting of the Cognitive Science Society 44 (44)
- C Lewry, T Lombrozo. 2022. Ethical Explanations. Proceedings of the Annual Meeting of the Cognitive Science Society 44 (44)
- K Oktar, T Lombrozo. 2022. Mechanisms of Belief Persistence in the Face of Societal Disagreement. Proceedings of the Annual Meeting of the Cognitive Science Society 44 (44)
- Agrawal, M., Peterson, J. C., Cohen, J. D., & Griffiths, T. L. 2022. Stress, Intertemporal Choice, and Mitigation Behavior During the COVID-19 Pandemic. PsyArXiv Preprints
- Callaway, F., Jain, Y. R., van Opheusden, B., Das, P., Iwama, G., Gul, S., Krueger, P. M., Becker, F., Griffiths, T. L., & Lieder, F. 2022. Leveraging artificial intelligence to improve people’s planning strategies. Proceedings of the National Academy of Sciences.
- Dubey, R., Griffiths, T. L., & Lombrozo, T. 2022. If it’s important, then I’m curious: Increasing perceived usefulness stimulates curiosity. Cognition, 226, 105193
- Gates, V., Suchow, J. W., & Griffiths, T. L. 2022. Memory transmission in small groups and large networks: An empirical study. Psychonomic Bulletin & Review, 29(2), 581-588
- Ho, M. K., & Griffiths, T. L. 2022. Cognitive science as a source of forward and inverse models of human decisions for robotics and control. Annual Review of Control, Robotics, and Autonomous Systems, 5, 33-53.
- Kumar, S., Correa, C. G., Dasgupta, I., Marjieh, R., Hu, M. Y., Hawkins, R. D., Daw, N. D., Cohen, J. D., Narasimhan, K. R., & Griffiths, T. L. 2022. Using Natural Language and Program Abstractions to Instill Human Inductive Biases in Machines. Advances in Neural Information Processing Systems, 36
- Kumar, S., Dasgupta, I., Hu, M. Y., Marjieh, R., Hawkins, R. D., Daw, N., Cohen, J., Narasimhan, K. R., & Griffiths, T. L. 2022. Using Natural Language to Guide Meta-Learning Agents towards Human-like Inductive Biases. BACL 1st Workshop on Learning with Natural Language Supervision
- Kumar, S., Dasgupta, I., Marjieh, R., Daw, N. D., Cohen, J. D., & Griffiths, T. L. 2022. Disentangling Abstraction from Statistical Pattern Matching in Human and Machine Learning.
- Kumar, S., Sumers, T. R., Yamakoshi, T., Goldstein, A., Hasson, U., Norman, K. A., Griffiths, T. L., Hawkins, R. D., Nastase, S. A. 2022. Reconstructing the cascade of language processing in the brain using the internal computations of a transformer-based language model.
- Malaviya, M., Sucholutsky, I., Oktar, K., & Griffiths, T. L. 2022. Can Humans Do Less-Than-One-Shot Learning? Proceedings of the 44th Annual Conference of the Cognitive Science Society
- Marjieh, R., Sucholutsky, I., Sumers, T. R., Jacoby, N., & Griffiths, T. L. 2022. Predicting Human Similarity Judgments Using Large Language Models. Proceedings of the 44th Annual Conference of the Cognitive Science Society.
- Morgan, T. J., Suchow, J. W., & Griffiths, T. L. 2022. The experimental evolution of human culture: flexibility, fidelity and environmental instability. Proceedings of the Royal Society B, 289(1986), 20221614
- Murthy, S. K., Hawkins, R. D., & Griffiths, T. L. 2022. Shades of confusion: Lexical uncertainty modulates ad hoc coordination in an interactive communication task. Cognition, 225, 105152
- Peterson, J. C., Uddenberg, S., Griffiths, T. L., Todorov, A., & Suchow, J. W. 2022. Deep models of superficial face judgments. Proceedings of the National Academy of Sciences, 119(17), e2115228119.
- Thompson, B., van Opheusden, B., Sumers, T., & Griffiths, T. L. 2022. Complex cognitive algorithms preserved by selective social learning in experimental populations. Science, 376(6588), 95-98
- J Persons, V Gates. 2022. Relationship to CBT outcome and dropout of decision support tools of the written case formulation, list of treatment goals, and plot of symptom scores. PsyArXiv
- George Matheos, Andrew D. Bolton, McCoy Becker, Cameron Freer, Vikash K. Mansinghka. 2022. Brain computation as fast spiking neural Monte Carlo inference in probabilistic programs. MIT Quest for Intelligence
- Ethan A. Brooks, Janarthanan Rajendran, Richard L. Lewis, Satinder Singh. 2021. Reinforcement Learning of Implicit and Explicit Control Flow Instructions.
- Thomas A. Langlois, H. Charles Zhao, Erin Grant, Ishita Dasgupta, Thomas L. Griffiths, and Nori Jacoby. 2021. Passive Attention in Artificial Neural Networks Predicts Human Visual Selectivity. NeurIPS 2021
- Stephan C. Meylan, Sathvik Nair, Thomas L. Griffiths. 2021. Evaluating models of robust word recognition with serial reproduction. Cognition 2021
- Casey Lewry, Kaley Curtis, Nadya Vasilyeva, Fei Xu, Thomas L. Griffiths. 2021. Intuitions about magic track the development of intuitive physics. Cognition 2021
- Arjun Devraj, Qiong Zhang, Thomas L. Griffiths. 2021. The Dynamics of Exemplar and Prototype Representations Depend on Environmental Statistics.
- Ni Ji, Gurrein K Madan, Guadalupe I Fabre, Alyssa Dayan, Casey M Baker, Talya S Kramer, Ijeoma Nwabudike, Steven W Flavell. 2021. A neural circuit for flexible control of persistent behavioral states. eLife 2021
- Aditi Jha, Joshua Peterson, Thomas L. Griffiths. 2020. Extracting low-dimensional psychological representations from convolutional neural networks. CogSci 2020
- Alexander Todorov, Stefan Uddenberg, Joshua Peterson, Thomas Griffiths, Jordan Suchow. 2020. Data-Driven, Photorealistic Social Face-Trait Encoding, Prediction, and Manipulation Using Deep Neural Networks. Patent application
- Antonia Langenhoff, Alex Wiegmann, Joseph Y. Halpern, Joshua B. Tenenbaum, Tobias Gerstenberg. 2020. Predicting responsibility judgments from dispositional inferences and causal attributions. (Preprint)
- Mayank Agrawal, Joshua C. Peterson, Thomas L. Griffiths. 2020. Scaling up psychology via Scientific Regret Minimization. PNAS 2020
- R. Dubey, T. L. Griffiths. 2020. Reconciling novelty and complexity through a rational analysis of curiosity. Psychological Review, 127(3), 455–476
- Sophia Sanborn, Michael Chang, Sergey Levine, Thomas Griffiths. 2020. Sparse Skill Coding: Learning Behavioral Hierarchies with Sparse Codes. ICLR 2020 submission
- Thomas J. H. Morgan, Jordan W. Suchow, Thomas L. Griffiths. 2020. What the Baldwin Effect affects depends on the nature of plasticity. Cognition, 197
- Max Kleiman-Weiner, Felix Sosa, Bill Thompson, Sebastiaan van Opheusden, Tom Griffiths, Samuel Gershman, Fiery Cushman. 2020. Downloading Culture.zip: Social learning by program induction. CogSci 2020
- Anne S. Hsu, Jay B. Martin, Adam N. Sanborn, Thomas L. Griffiths. 2019. Identifying category representations for complex stimuli using discrete Markov chain Monte Carlo with people. Behavior Research Methods 51:1706–1716
- Mathew Hardy, Tom Griffiths. 2019. Demonstrating the Impact of Prior Knowledge in Risky Choice. (Preprint)
- Arnon Lotem, Joseph Y. Halpern, Shimon Edelman, Oren Kolodny. 2017. The evolution of cognitive mechanisms in response to cultural innovations. PNAS
- David Bourgin, Falk Lieder, Daniel Reichman, Nimrod Talmon, Tom Griffiths. 2017. The Structure of Goal Systems Predicts Human Performance. CogSci 2017
3.4. Ethics for AI and AI development
- Alistair Knott, Dino Pedreschi, Raja Chatila, Tapabrata Chakraborti, Susan Leavy, Ricardo Baeza-Yates, David Eyers, Andrew Trotman, Paul D. Teal, Przemyslaw Biecek, Stuart Russell, Yoshua Bengio. 2023. Generative AI models should include detection mechanisms as a condition for public release. Ethics and Information Technology Journal
- Alan Chan, Rebecca Salganik, Alva Markelius, Chris Pang, Nitarshan Rajkumar, Dmitrii Krasheninnikov, Lauro Langosco, Zhonghao He, Yawen Duan, Micah Carroll, Michelle Lin, Alex Mayhew, Katherine Collins, Maryam Molamohammadi, John Burden, Wanru Zhao, Shalaleh Rismani, Konstantinos Voudouris, Umang Bhatt, Adrian Weller, David Krueger, Tegan Maharaj. 2023. Harms from Increasingly Agentic Algorithmic Systems. FAccT 2023
- Samer B Nashed, Justin Svegliato, Su Linn Blodgett. 2023. Fairness and sequential decision making: Limits, lessons, and opportunities. arXiv:2301.05753
- Chinasa T Okolo, Kehinde Aruleba, George Obaido. 2023. Responsible AI in Africa—Challenges and Opportunities. Springer International Publishing
- Jonathan Stray. 2023. Editorial Values for News Recommenders: Translating Principles to Engineering. News Quality in the Digital Age
- Jonathan Stray. 2023. The AI Learns to Lie to Please You: Preventing Biased Feedback Loops in Machine-Assisted Intelligence Analysis. Analytics 2023
- Smitha Milli, Micah Carroll, Yike Wang, Sashrika Pandey, Sebastian Zhao, Anca D. Dragan. 2023. Engagement, User Satisfaction, and the Amplification of Divisive Content on Social Media. arXiv:2305.16941
- Alistair Knott, Dino Pedreschi, Raja Chatila, Susan Leavy, Ricardo Baeza-Yates, Tapabrata Chakraborti, David Eyers, Andrew Trotman, Lama Saouma, Virginia Morini, Valentina Pansanella, Paul D. Teal, Przemyslaw Biecek, Ivan Bratko, Stuart Russell, and Yoshua Bengio. 2023. State-of-the-art Foundation AI Models Should be Accompanied by Detection Mechanisms as a Condition of Public Release. Global Partnership on Artificial Intelligence
- RJ Yew, D Hadfield-Menell. 2022. A Penalty Default Approach to Preemptive Harm Disclosure and Mitigation for AI Systems. Proceedings of the 2022 AAAI/ACM Conference on AI, Ethics, and Society, 823-830
- Thomas Krendl Gilbert, Micah Carroll. 2022. Trade Regulation Rule on Commercial Surveillance and Data Security Rulemaking. Federal Trade Commission
- The Anh Han, Tom Lenaerts, Francisco C. Santos, Luís Moniz Pereira. 2022. Voluntary safety commitments provide an escape from over-regulation in AI development. Technology in Society, Volume 68
- Theodor Cimpeanu, Francisco C. Santos, Luís Moniz Pereira, Tom Lenaerts, The Anh Han. 2022. Artificial intelligence development races in heterogeneous settings. Scientific Reports volume 12, Article number: 1723 (2022)
- Thomas Krendl Gilbert, Sarah Dean, Tom Zick, Nathan Lambert. 2022. Choices, Risks, and Reward Reports: Charting Public Policy for Reinforcement Learning Systems. UC Berkeley CLTC White Paper Series
- Thomas Krendl Gilbert. 2021. Mapping the Political Economy of Reinforcement Learning Systems: The Case of Autonomous Vehicles. Simons Institute Newsletter
- Miles Brundage, Shahar Avin, Jasmine Wang, Haydn Belfield, Gretchen Krueger, Gillian Hadfield, Heidy Khlaaf, Jingying Yang, Helen Toner, Ruth Fong, Tegan Maharaj, Pang Wei Koh, Sara Hooker, Jade Leung, Andrew Trask, Emma Bluemke, Jonathan Lebensold, Cullen O'Keefe, Mark Koren, Théo Ryffel, JB Rubinovitz, Tamay Besiroglu, Federica Carugati, Jack Clark, Peter Eckersley, Sarah de Haas, Maritza Johnson, Ben Laurie, Alex Ingerman, Igor Krawczuk, Amanda Askell, Rosario Cammarota, Andrew Lohn, David Krueger, Charlotte Stix, Peter Henderson, Logan Graham, Carina Prunkl, Bianca Martin, Elizabeth Seger, Noa Zilberman, Seán Ó hÉigeartaigh, Frens Kroeger, Girish Sastry, Rebecca Kagan, Adrian Weller, Brian Tse, Elizabeth Barnes, Allan Dafoe, Paul Scharre, Ariel Herbert-Voss, Martijn Rasser, Shagun Sodhani, Carrick Flynn, Thomas Krendl Gilbert, Lisa Dyer, Saif Khan, Yoshua Bengio, Markus Anderljung. 2020. Toward Trustworthy AI Development: Mechanisms for Supporting Verifiable Claims. (Preprint)
- Ravit Dotan, Smitha Milli. 2020. Value-laden Disciplinary Shifts in Machine Learning. (Preprint)
- Dan Hendrycks, Collin Burns, Steven Basart, Andrew Critch, Jerry Li, Dawn Song, Jacob Steinhardt. 2020. Aligning AI With Shared Human Values. ICLR 2021
- John Miller, Smitha Milli, Moritz Hardt. 2019. Strategic Classification is Causal Modeling in Disguise. FAT* 2019
- McKane Andrus, Thomas Krendl Gilbert. 2019. Towards a Just Theory of Measurement: A Principled Social Measurement Assurance Program for Machine Learning. AIES 2019
- Roel Dobbe, Thomas Krendl Gilbert, Yonatan Mintz. 2019. Hard Choices in Artificial Intelligence: Addressing Normative Uncertainty through Sociotechnical Commitments. NeurIPS 2019
- Smitha Milli, John Miller, Anca D. Dragan, Moritz Hardt. 2019. The Social Cost of Strategic Classification. FAT* 2019
- Thomas Krendl Gilbert, Yonatan Mintz. 2019. Epistemic Therapy for Bias in Automated Decision-Making. AIES 2019
- Roel Dobbe, Sarah Dean, Thomas Gilbert, Nitin Kohli. 2018. A Broader View on Bias in Automated Decision-Making: Reflecting on Epistemology and Dynamics. FAT/ML 2018
3.5. Robust inference, learning, and planning
- Michael Y Li, Erin Grant, Thomas L Griffiths. 2023. Gaussian process surrogate models for neural networks. Conference on Uncertainty in Artificial Intelligence
- Zi Wang, Alexander Ku, Jason Baldridge, Thomas L Griffiths, Been Kim. 2023. Gaussian Process Probes (GPP) for Uncertainty-Aware Probing. arXiv:2305.18213
- Michael Chang, Alyssa L Dayan, Franziska Meier, Thomas L Griffiths, Sergey Levine, Amy Zhang. 2023. Neural Constraint Satisfaction: Hierarchical Abstraction for Combinatorial Generalization in Object Rearrangement. arXiv:2303.11373
- Michael K Cohen. 2023. Pessimistic Bayesianism for Conservative Optimization and Imitation. University of Oxford
- Gaurav Rohit Ghosal, Amrith Setlur, Daniel S Brown, Anca Dragan, Aditi Raghunathan. 2023. Contextual Reliability: When Different Features Matter in Different Contexts. In Proc. 40th International Conference on Machine Learning
- Alexander Lew, George Matheos, Matin Ghavamizadeh, Nishad Gothoskar, Stuart Russell, and Vikash Mansinghka. 2023. SMCP3: Sequential Monte Carlo with Probabilistic Program Proposals. In Proc. Twenty-Sixth International Conference on Artificial Intelligence and Statistics
- Thomas Krendl Gilbert, Aaron J. Snoswell, Michael Dennis, Rowan McAllister, and Cathy Wu. 2022. Sociotechnical Specification for the Broader Impacts of Autonomous Vehicles. Fresh Perspectives on the Future of Autonomous Driving workshop, ICRA 2022
- YuXuan (Andrew) Liu, Nikhil Mishra, Maximilian Sieb, Fred Shentu, Pieter Abbeel, Peter Chen. 2022. Autoregressive Uncertainty Modeling for 3D Bounding Box Prediction. European Conference on Computer Vision (ECCV)
- Kyle Wray*, Stas Tiomkin*, Mykel Kochenderfer, Pieter Abbeel. 2022. Multi-Objective Policy Gradients with Topological Constraints. IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
- Qiyang (Colin) Li, Ajay Jain, Pieter Abbeel. 2022. AdaCat: Adaptive Categorical Discretization for Autoregressive Models. Conference on Uncertainty in Artificial Intelligence (UAI)
- Abdus Salam Azad, Edward Kim, Qiancheng Wu, Kimin Lee, Ion Stoica, Pieter Abbeel, Sanjit A. Seshia.. 2022. Scenic4RL: Programmatic Modeling and Generation of Reinforcement Learning Environments. Thirty-Sixth AAAI Conference on Artificial Intelligence (AAAI)
- Mohammadreza Salehi, Hossein Mirzaei, Dan Hendrycks, Yixuan Li, Mohammad Hossein Rohban, Mohammad Sabokrou. 2022. A Unified Survey on Anomaly, Novelty, Open-Set, and Out-of-Distribution Detection: Solutions and Future Challenges. TMLR
- Jingkang Yang, Pengyun Wang, Dejian Zou, Zitang Zhou, Kunyuan Ding, WENXUAN PENG, Haoqi Wang, Guangyao Chen, Bo Li, Yiyou Sun, Xuefeng Du, Kaiyang Zhou, Wayne Zhang, Dan Hendrycks, Yixuan Li, Ziwei Liu. 2022. OpenOOD: Benchmarking Generalized Out-of-Distribution Detection. NeurIPS 2022
- Jiachen Sun, Akshay Mehra, Bhavya Kailkhura, Pin-Yu Chen, Dan Hendrycks, Jihun Hamm, Zhuoqing Mao. 2022. A Spectral View of Randomized Smoothing under Common Corruptions: Benchmarking and Improving Certified Robustness. ECCV 2022
- Dan Hendrycks*, Andy Zou*, Mantas Mazeika, Leonard Tang, Bo Li, Dawn Song, and Jacob Steinhardt. 2022. PixMix: Dreamlike Pictures Comprehensively Improve Safety Measures. CVPR 2022
- Dan Hendrycks*, Steven Basart*, Mantas Mazeika, Andy Zou, Joe Kwon, Mohammadreza Mostajabi, Jacob Steinhardt, Dawn Song. 2022. Scaling Out-of-Distribution Detection for Real-World Settings. ICML 2022
- KC Hsu, DP Nguyen, JF Fisac. 2022. ISAACS: Iterative Soft Adversarial Actor-Critic for Safety. arXiv preprint arXiv:2212.03228
- Aarohi Srivastava, Abhinav Rastogi, Abhishek Rao, Abu Awal Md Shoeb, Abubakar Abid, Adam Fisch, Adam R Brown, Adam Santoro, Aditya Gupta, Adrià Garriga-Alonso, Agnieszka Kluska, Aitor Lewkowycz, Akshat Agarwal, Alethea Power, Alex Ray, Alex Warstadt, Alexander W Kocurek, Ali Safaya, Ali Tazarv, Alice Xiang, Alicia Parrish, Allen Nie, Aman Hussain, Amanda Askell, Amanda Dsouza, Ameet Rahane, Anantharaman S Iyer, Anders Andreassen, Andrea Santilli, Andreas Stuhlmüller, Andrew Dai, Andrew La, Andrew Lampinen, Andy Zou, Angela Jiang, Angelica Chen, Anh Vuong, Animesh Gupta, Anna Gottardi, Antonio Norelli, Anu Venkatesh, Arash Gholamidavoodi, Arfa Tabassum, Arul Menezes, Arun Kirubarajan, Asher Mullokandov, Ashish Sabharwal, Austin Herrick, Avia Efrat, Aykut Erdem, Ayla Karakaş, B Ryan Roberts, Bao Sheng Loe, Barret Zoph, Bartłomiej Bojanowski, Batuhan Özyurt, Behnam Hedayatnia, Behnam Neyshabur, Benjamin Inden, Benno Stein, Berk Ekmekci, Bill Yuchen Lin, Blake Howald, Cameron Diao, Cameron Dour, Catherine Stinson, Cedrick Argueta, César Ferri Ramírez, Chandan Singh, Charles Rathkopf, Chenlin Meng, Chitta Baral, Chiyu Wu, Chris Callison-Burch, Chris Waites, Christian Voigt, Christopher D Manning, Christopher Potts, Cindy Ramirez, Clara E Rivera, Clemencia Siro, Colin Raffel, Courtney Ashcraft, Cristina Garbacea, Damien Sileo, Dan Garrette, Dan Hendrycks, Dan Kilman, Dan Roth, Daniel Freeman, Daniel Khashabi, Daniel Levy, Daniel Moseguí González, Danny Hernandez, Danqi Chen, Daphne Ippolito, Dar Gilboa, David Dohan, David Drakard, David Jurgens, Debajyoti Datta, Deep Ganguli, Denis Emelin, Denis Kleyko, Deniz Yuret, Derek Chen, Derek Tam, Dieuwke Hupkes, Diganta Misra, Dilyar Buzan, Dimitri Coelho Mollo, Diyi Yang, Dong-Ho Lee, Ekaterina Shutova, Ekin Dogus Cubuk, Elad Segal, Eleanor Hagerman, Elizabeth Barnes, Elizabeth Donoway, Ellie Pavlick, Emanuele Rodola, Emma Lam, Eric Chu, Eric Tang, Erkut Erdem, Ernie Chang, Ethan A Chi, Ethan Dyer, Ethan Jerzak, Ethan Kim, Eunice Engefu Manyasi, Evgenii Zheltonozhskii, Fanyue Xia, Fatemeh Siar, Fernando Martínez-Plumed, Francesca Happé, Francois Chollet, Frieda Rong, Gaurav Mishra, Genta Indra Winata, Gerard de Melo, Germán Kruszewski, Giambattista Parascandolo, Giorgio Mariani, Gloria Wang, Gonzalo Jaimovitch-López, Gregor Betz, Guy Gur-Ari, Hana Galijasevic, Hannah Kim. 2022. Beyond the imitation game: Quantifying and extrapolating the capabilities of language models. arXiv preprint arXiv:2206.04615
- KC Hsu, AZ Ren, DP Nguyen, A Majumdar, JF Fisac. 2022. Sim-to-Lab-to-Real: Safe RL with Shielding and Generalization Guarantees. ICLR 2022 Workshop on Generalizable Policy Learning in Physical World
- Scott Emmons, Benjamin Eysenbach, Ilya Kostrikov, Sergey Levine. 2022. RvS: What is Essential for Offline RL via Supervised Learning? ICLR 2022
- Zaynah Javed, Daniel S. Brown, Satvik Sharma, Jerry Zhu, Ashwin Balakrishna, Marek Petrik, Anca D. Dragan, Ken Goldberg. 2021. Policy Gradient Bayesian Robust Optimization for Imitation Learning. ICML 2021
- Justin Svegliato, Connor Basich, Sandhya Saisubramanian and Shlomo Zilberstein. 2021. Using metareasoning to maintain and restore safety for reliable autonomy.
- Dan Hendrycks, Steven Basart, Norman Mu, Saurav Kadavath, Frank Wang, Evan Dorundo, Rahul Desai, Tyler Zhu, Samyak Parajuli, Mike Guo, Dawn Song, Jacob Steinhardt, Justin Gilmer. 2020. The Many Faces of Robustness: A Critical Analysis of Out-of-Distribution Generalization.
- Dan Hendrycks, Xiaoyuan Liu, Eric Wallace, Adam Dziedzic, Rishabh Krishnan, Dawn Song. 2020. Pretrained Transformers Improve Out-of-Distribution Robustness. Association for Computational Linguistics (ACL)
- Dan Hendrycks, Norman Mu, Ekin D. Cubuk, Barret Zoph, Justin Gilmer, Balaji Lakshminarayanan. 2020. AugMix: A Simple Data Processing Method to Improve Robustness and Uncertainty. ICLR 2020
- Paria Rashidinejad, Jiantao Jiao, Stuart Russell. 2020. SLIP: Learning to predict in unknown dynamical systems with long-term memory.
- Dieqiao Feng, Carla P Gomes, Bart Selman. 2020. Solving hard AI planning instances using curriculum-driven deep reinforcement learning.
- Adam Stooke, Joshua Achiam, Pieter Abbeel. 2020. Responsive Safety in Reinforcement Learning by PID Lagrangian Methods. ICML 2020
- Jaime F. Fisac, Neil F. Lugovoy, Vicenç Rubies-Royo, Shromona Ghosh, Claire J. Tomlin. 2019. Bridging Hamilton-Jacobi Safety Analysis and Reinforcement Learning. ICRA 2019
- Karthika Mohan, Judea Pearl. 2019. Graphical Models for Processing Missing Data. JASA
- Kush Bhatia, Yi-An Ma, Anca D. Dragan, Peter L. Bartlett, Michael I. Jordan. 2019. Bayesian Robustness: A Nonasymptotic Viewpoint. (Preprint)
- Margaret P. Chapman, Jonathan Lacotte, Aviv Tamar, Donggun Lee, Kevin M. Smith, Victoria Cheng, Jaime F. Fisac, Susmit Jha, Marco Pavone, Claire J. Tomlin. 2019. A Risk-Sensitive Finite-Time Reachability Approach for Safety of Stochastic Dynamic Systems. American Control Conference (ACC) 2019
- Ruibo Tu, Cheng Zhang, Paul Ackermann, Karthika Mohan, Hedvig Kjellström, Kun Zhang. 2019. Causal Discovery in the Presence of Missing Data. AISTATS 2019
- Dan Hendrycks, Steven Basart, Mantas Mazeika, Mohammadreza Mostajabi, Jacob Steinhardt, Dawn Song. 2019. Scaling Out-of-Distribution Detection for Real-World Settings.
- Daniel Kang, Yi Sun, Dan Hendrycks, Tom Brown, Jacob Steinhardt. 2019. Testing robustness against unforeseen adversaries.
- Dan Hendrycks, Mantas Mazeika, Saurav Kadavath, Dawn Song. 2019. Using Self-Supervised Learning Can Improve Model Robustness and Uncertainty. NeurIPS 2019
- Dan Hendrycks, Kimin Lee, Mantas Mazeika. 2019. Using Pre-Training Can Improve Model Robustness and Uncertainty. ICML 2019
- Dan Hendrycks, Thomas Dietterich. 2019. Benchmarking Neural Network Robustness to Common Corruptions and Perturbations. ICLR 2019
- Dan Hendrycks, Mantas Mazeika, Thomas Dietterich. 2019. Deep Anomaly Detection with Outlier Exposure. ICLR 2019
- Karthika Mohan. 2018. On Handling Self-masking and Other Hard Missing Data Problems. AAAI 2018
- Karthika Mohan, Felix Thoemmes, Judea Pearl. 2018. Estimation with Incomplete Data: The Linear Case. IJCAI 2018
- Si Liu, Risheek Garrepalli, Thomas G Dietterich, Alan Fern, Dan Hendrycks. 2018. Open Category Detection with PAC Guarantees. ICML 2018
- Dan Hendrycks, Mantas Mazeika, Duncan Wilson, Kevin Gimpel. 2018. Using Trusted Data to Train Deep Networks on Labels Corrupted by Severe Noise. NeurIPS 2018
3.6. Security problems and solutions
- A Critch. 2022. WordSig: QR streams enabling platform-independent self-identification that’s impossible to deepfake. arXiv preprint arXiv:2207.10806
- Sushil Jajodia, George Cybenko, V. S. Subrahmanian, Vipin Swarup, Cliff Wang, Michael Wellman. 2020. Adaptive Autonomous Secure Cyber Systems. Springer/Nature Books
- Ivan Geffner, Joseph Y. Halpern. 2019. Security in Asynchronous Interactive Systems. (Preprint)
- Xinlei Pan, Weiyao Wang, Xiaoshuai Zhang, Bo Li, Jinfeng Yi, Dawn Song. 2019. How You Act Tells a Lot: Privacy-Leaking Attack on Deep Reinforcement Learning. AAMAS 2019
- Nicolas Papernot, Patrick McDaniel, Arunesh Sinha, Michael P. Wellman. 2018. SoK: Security and Privacy in Machine Learning. IEEE European Symposium on Security and Privacy
3.7. Transparency & interpretability
- Alexander Matt Turner, Lisa Thiergart, David Udell, Gavin Leech, Ulisse Mini, Monte MacDiarmid. 2023. Activation Addition: Steering Language Models Without Optimization. arXiv:2308.10248
- Ulisse Mini, Peli Grietzer, Mrinank Sharma, Austin Meek, Monte MacDiarmid, Alexander Matt Turner. 2023. Understanding and Controlling a Maze-Solving Policy Network. arXiv:2310.08043
- Bilal Chughtai, Lawrence Chan, Neel Nanda. 2023. Neural Networks Learn Representation Theory: Reverse Engineering how Networks Perform Group Operations. ICLR 2023 Workshop on Physics for Machine Learning
- Bilal Chughtai, Lawrence Chan, Neel Nanda. 2023. A toy model of universality: Reverse engineering how networks learn group operations. arXiv:2302.03025
- Neel Nanda, Lawrence Chan, Tom Lieberum, Jess Smith, Jacob Steinhardt. 2023. Progress measures for grokking via mechanistic interpretability. ICLR 2023
- Jordan Boyd-Graber, Samuel Carton, Shi Feng, Q Vera Liao, Tania Lombrozo, Alison Smith-Renner, Chenhao Tan. 2022. Human-Centered Evaluation of Explanations. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Tutorial Abstracts
- Yamakoshi, T., Griffiths, T. L., Hawkins, R. D. 2022. Probing BERT’s priors with serial reproduction chains. Findings of the Association for Computational Linguistics (ACL)
- George Obaido, Blessing Ogbuokiri, Theo G Swart, Nimibofa Ayawei, Sydney Mambwe Kasongo, Kehinde Aruleba, Ibomoiye Domor Mienye, Idowu Aruleba, Williams Chukwu, Fadekemi Osaye, Oluwaseun F Egbelowo, Simelane Simphiwe, Ebenezer Esenogho. 2022. An interpretable machine learning approach for hepatitis B diagnosis. Applied Sciences 12 (21), 11127
- T Räuker, A Ho, S Casper, D Hadfield-Menell. 2022. Toward Transparent AI: A Survey on Interpreting the Inner Structures of Deep Neural Networks. arXiv preprint arXiv:2207.13243
- J Frost, O Watkins, E Weiner, P Abbeel, T Darrell, B Plummer, K Saenko. 2022. Explaining Reinforcement Learning Policies through Counterfactual Trajectories. arXiv preprint arXiv:2201.12462
- Pulkit Verma, Shashank Rao Marpally, and Siddharth Srivastava. 2022. Discovering User-Interpretable Capabilities of Black-Box Planning Agents. The 19th International Conference on Principles of Knowledge Representation and Reasoning, 2022
- Naman Shah*, Pulkit Verma*, Trevor Angle, and Siddharth Srivastava. 2022. JEDAI: A System for Skill-Aligned Explainable Robot Planning. The Twenty-First International Conference on Autonomous Agents and MultiAgent Systems (Demonstration Track), 2022
- Rashmeet Kaur Nayyar*, Pulkit Verma*, and Siddharth Srivastava. 2022. Differential Assessment of Black-Box AI Agents. The Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022
- Daniel Filan, Stephen Casper, Shlomi Hod, Cody Wild, Andrew Critch, Stuart Russell. 2021. Clusterability in Neural Networks.
- Pulkit Verma, Shashank Rao Marpally, Siddharth Srivastava. 2021. Asking the Right Questions: Learning Interpretable Action Models Through Query Answering.
- Olivia Watkins, Sandy Huang, Julius Frost, Kush Bhatia, Eric Weiner, Pieter Abbeel, Trevor Darrell, Bryan Plummer, Kate Saenko, Anca Dragan. 2021. Explaining robot policies.
- Jonathan Stray. 2021. Show me the algorithm: Transparency in recommendation systems.
- Shlomi Hod, Stephen Casper, Daniel Filan, Cody Wild, Andrew Critch, Stuart Russell. 2021. Detecting Modularity in Deep Neural Networks.
- Daniel Filan, Shlomi Hod, Cody Wild, Andrew Critch, Stuart Russell. 2019. Pruned Neural Networks are Surprisingly Modular. (Preprint, under review NeurIPS 2020)
- Smitha Milli, Ludwig Schmidt, Anca D. Dragan, Moritz Hardt. 2019. Model Reconstruction from Model Explanations. FAT* 2019
- Jacob Andreas, Anca Dragan, Dan Klein. 2017. Translating Neuralese. ACL 2017