Behavioral Insights from Human-AI Collaboration Within Hand-and-Brain Chess


How do humans decide when to trust AI and when to take control? Kevin Yang, a Biopsychology major at UC Santa Barbara and U.S. Chess National Master, explores this question through the lens of hand-and-brain chess, a unique format where one teammate chooses a piece (the “brain”) and the other decides the move (the “hand”).

Yang’s project, Model of Human-AI Collaboration Applied to Hand-and-Brain Chess, earned him recognition as one of the 2025 Chessable Research Award winners. In this guest post, Yang sheds light on his experiment, how people weigh their confidence in AI, and how their trust evolves during a game. For the full article, visit https://doi.org/10.48550/arXiv.2509.20666


You’ve probably heard of AlphaZero, Leela, or Stockfish: all of these are chess engines, built on neural networks or otherwise incorporating artificial intelligence (AI) to some degree. Players typically consult engines to prepare before matches or during post-game analysis to see what they might have missed during the game. But what if AI were your teammate during a hand-and-brain chess game? If that teammate were actually Stockfish, the discrepancy between your strength and its strength would be enormous, and you would likely hand over as much control as possible, delegating the role of brain to the AI on nearly every move. Given such disparities in strength and overall understanding of chess, how do humans and AI coordinate decision-making as a team in any given environment?

Current Literature in Human-AI Teaming

In the realm of human-AI interaction, individuals (users) can choose to delegate, allowing the AI to act autonomously without user intervention, or to seek guidance from the AI while ultimately making the final decision (Adam et al., 2024). Current human-AI coordination strategies can be further classified into static configurations and agent-driven adaptations. Static configurations lack the flexibility to adapt to evolving decision-making and to account for human biases (Chang et al., 2025; Ibrahim et al., 2025). Agent-driven adaptations involve the AI itself modifying its decision-making in response to user preferences (Amershi et al., 2019; Salikutluk et al., 2024), though these adaptations may at times conflict with individual preferences (Duan et al., 2025). Human-AI teaming has become a popular area of exploration because such teams may outperform humans or AI alone in time-critical situations (Vaccaro et al., 2024). However, the notion of AI as a teammate is not straightforward: an AI agent must communicate its intent while simultaneously adapting to the needs of its human teammate to achieve a common goal (Zhao et al., 2022). Previous research also suggests that individuals perceive human and AI teammates differently, which can lead to a lack of trust in AI agents as teammates or, conversely, to greater faith in AI teammates in situations involving risky decisions (Feng et al., 2019).

Chess as a Toolbox for Human-AI Collaboration 

To reduce the differing perceptions of human and AI teammates, one factor we considered was the discrepancy in strength between the two members of a team, which could sway an individual toward strongly preferring either hand or brain. Additionally, an AI and a human of the same strength may not think, train, or blunder in the same manner, so ensuring that the two entities are reasonably compatible is important for coordinating decision-making between the human and its AI teammate without inducing mutual distrust.

We adapted the hand-and-brain format into a human-AI collaboration task in which individuals working with an AI teammate have two modes of control to choose from on every move. When selecting “brain,” the participant exercises higher-level control by choosing a piece type, and the AI then plays a legal move with that piece type. Conversely, when selecting “hand,” the AI selects a piece type, and the participant exercises lower-level control by making a legal move with that piece type. Note that in the typical hand-and-brain format, each player keeps the same role, whether brain or hand, from start to finish. The evolving nature of chess positions enables us to capture fine-grained shifts in control preference over the course of a game, as factors like positional complexity, time remaining, or strategic understanding may influence whether a player chooses “brain” or “hand” in a particular position.
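
To make the mechanics concrete, here is a minimal sketch of the modified turn logic using the python-chess library. The function names are illustrative, and `random.choice` merely stands in for the AI teammate’s actual selection; this is not the study’s implementation.

```python
import random
import chess

def moves_with_piece(board: chess.Board, piece_type: chess.PieceType) -> list[chess.Move]:
    """All legal moves that move a piece of the given type."""
    return [m for m in board.legal_moves
            if board.piece_type_at(m.from_square) == piece_type]

def brain_turn(board: chess.Board, piece_type: chess.PieceType) -> chess.Move:
    """Human is 'brain': names a piece type; the AI picks among its moves.
    random.choice is a placeholder for the AI teammate's choice."""
    return random.choice(moves_with_piece(board, piece_type))

def hand_turn(board: chess.Board, ai_intended: chess.Move) -> list[chess.Move]:
    """Human is 'hand': the AI's intended move fixes the piece type;
    the human then chooses freely among these candidate moves."""
    piece_type = board.piece_type_at(ai_intended.from_square)
    return moves_with_piece(board, piece_type)

board = chess.Board()
print(brain_turn(board, chess.KNIGHT))                # e.g., Nf3 or Nc3
print(hand_turn(board, chess.Move.from_uci("e2e4")))  # all 16 pawn moves
```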

In this study, we sought to address a few questions: 

1) What contextual factors influence when participants decide to hand over control to the AI teammate? (Positional complexity, limited time, evaluation of the position, etc.) 

2) How does the dynamic between the human and AI teammate influence the outcome of the game? 

3) What kind of strategy do participants employ knowing that their teammate is an AI, and how does that influence their decision-making on various moves? 

Through gaze data, we assessed whether players exhibit different gaze patterns before they change from “hand” on one turn to “brain” on the next compared to when they do not. We hypothesized that gaze patterns would differ when participants switched control modes compared to when they did not, reflecting increased contemplation preceding a mode change. We also analyzed how switching influences the objective quality of the move played on that turn. To capture the data needed to answer these questions, we created a custom Chrome extension integrated with the Chess.com interface, monitoring cursor positions and user interactions while notating the moves made. To ensure adherence to Chess.com’s Terms of Service and prevent any fair play violations, the extension was restricted to operate only on special research accounts provided by the platform.
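
As a rough illustration of what a per-turn log entry might contain, the sketch below defines one record in Python. The field names are our assumptions for exposition, not the extension’s actual format.

```python
from dataclasses import dataclass, field

@dataclass
class TurnRecord:
    turn: int            # move number within the game
    fen: str             # board state before the move (FEN)
    mode: str            # "hand" or "brain", chosen this turn
    switched: bool       # True if mode differs from the previous turn
    move_san: str        # move actually played, in SAN
    think_time_s: float  # seconds from position shown to move played
    gaze_xy: list[tuple[float, float]] = field(default_factory=list)  # raw gaze samples
```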

Recruiting Teammates & Study Procedure 

We used the Maia chess engine (McIlroy-Young et al., 2020) as the AI teammate for all participants, playing at a fixed rating of 1500 Elo (corresponding to an intermediate-level club player). Maia engines are well suited as AI teammates for this format because they are trained to predict human moves and to make human-like errors, improving the interpretability of their decisions and fostering a more realistic collaborative dynamic. One important design choice was not telling participants the rating of the AI teammate, which adds a layer of complexity to the strategies participants employ when the strength of their teammate is unknown. This opens the possibility that participants believe the AI could be stronger or weaker than they are, increasing the variety of strategies available to them.
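
For readers who want to experiment with a Maia teammate themselves, a minimal sketch using python-chess is shown below. It assumes the lc0 engine and the maia-1500 weight file are installed locally (both paths are placeholders); Maia is typically queried at a single node so that it returns its raw human-move prediction rather than a search-refined move.

```python
import chess
import chess.engine

# Paths are placeholders: lc0 and the maia-1500 weights must be installed.
engine = chess.engine.SimpleEngine.popen_uci(
    ["lc0", "--weights=maia-1500.pb.gz"]
)

board = chess.Board()
# A single-node search returns Maia's raw prediction of the human move,
# rather than a move improved by deeper search.
result = engine.play(board, chess.engine.Limit(nodes=1))
print(result.move)

engine.quit()
```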

We recruited eight participants (7 male, 1 female; ages 18–29) from a local chess club, assessing participant skill based on their Chess.com blitz ratings. Ratings ranged from 400 to 2200 Elo, spanning novices to advanced club-level players. Participants completed a demographic survey and preliminary surveys intended to reveal their preferred level of control (as hand or brain) when matched with teammates of stronger, weaker, or roughly equal strength. The study consisted of two phases. In both phases, participants played hand-and-brain chess with the White pieces, partnered with the same AI teammate (1500 Elo). Each participant played against a Chess.com bot rated as close as possible to their blitz rating and was informed of that bot’s rating beforehand.

Participants started by playing four chess positions selected to represent different stages of the game (an opening, two middlegames, and an endgame), each with an evaluation near 0.0 from Stockfish 17. For each position, participants had one minute to analyze before playing five turns of our modified hand-and-brain chess. This familiarized them with the interface and helped them form an initial mental model of the AI teammate. After the four initial positions, each participant played a full game with the same AI teammate. Throughout the full game, we recorded participant facial expressions, eye gaze, time elapsed, board state, moves, and hand-brain choices. After the game, participants completed a semi-structured interview that involved narrating the game, identifying key decision moments, and explaining their choices to act as “brain” or “hand” at specific moments. To conclude the study, participants completed post-study questionnaires assessing teammate trust and perceived team dynamics.
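
A hedged sketch of how such balanced positions could be screened with python-chess and a local Stockfish binary is below; the search depth and centipawn threshold are illustrative choices, not the study’s exact parameters.

```python
import chess
import chess.engine

def is_balanced(fen: str, engine: chess.engine.SimpleEngine,
                depth: int = 20, threshold_cp: int = 30) -> bool:
    """True if Stockfish scores the position within +/- threshold_cp
    centipawns of equality at the given search depth."""
    info = engine.analyse(chess.Board(fen), chess.engine.Limit(depth=depth))
    score_cp = info["score"].white().score(mate_score=100_000)
    return abs(score_cp) <= threshold_cp

engine = chess.engine.SimpleEngine.popen_uci("stockfish")  # path placeholder
print(is_balanced(chess.STARTING_FEN, engine))  # start position is ~0.0
engine.quit()
```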

Behavioral Insights & Future Directions 

Despite the popularization of hand-and-brain chess by content creators, most participants had rarely played the format before and thus had limited familiarity with it. Both novices and advanced chess players indicated a lack of trust in their AI teammate based on the overall trust score, with those who won their game trusting their teammate more than those who did not. Some relied on the AI to handle ambiguity in complex situations, while others preferred to retain control after negative experiences with AI decisions. Trust evolved throughout the game, shaped by how competent participants perceived the AI’s performance to be. Overall, however, participants felt they were not part of a cohesive team for three primary reasons: misalignment in move intent, lack of feedback from the AI, and cognitive fatigue from repeated hand-brain selection. Interestingly, occasional misalignments did not always erode trust: some participants interpreted unexpected moves as signs of superior insight (Goddard et al., 2012), while others described frustration when the AI disrupted their internal plans or engaged in non-strategic behavior (e.g., piece shuffling).

Additionally, participants were more likely to switch between the two roles in more complex positions, signaling some level of uncertainty in their own judgment. Participants frequently selected “hand” when faced with many options, during openings, or when unsure what to do; these were situations where delegating to the AI helped reduce effort or narrow the decision space. Choosing a mode (“hand” or “brain”) each turn imposed a cognitive burden, requiring not only board evaluation but also simulating potential outcomes under different control modes. Some participants developed consistent heuristics for mode selection, such as preferring “brain” to minimize risk or defaulting to “hand” unless they had a specific move in mind. Behavioral cues such as higher gaze entropy, dispersed attention, and longer deliberation often preceded mode switching. These findings suggest the feasibility of inferring switching intent from real-time user signals, which could be incorporated into future interfaces to enable timely, context-sensitive support.
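
As a simple illustration of one such cue, the sketch below computes the Shannon entropy of gaze fixations over board squares; the square-level discretization of gaze samples is an assumption for exposition.

```python
import math
from collections import Counter

def gaze_entropy(fixated_squares: list[str]) -> float:
    """Shannon entropy (in bits) of gaze fixations over board squares.
    Higher values mean attention was dispersed across more squares."""
    counts = Counter(fixated_squares)
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

print(gaze_entropy(["e4"] * 12))               # 0.0  (tightly focused)
print(gaze_entropy(["e4", "d5", "f3", "c6"]))  # 2.0  (widely dispersed)
```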

Admittedly, the small sample size and limited number of switching events per participant limit the generalizability and predictive strength of our model. However, many of the observations from our data analysis apply to other sequential domains that feature decision-making under pressure, such as emergency-room care or high-pressure restaurant service. Long-term studies could also investigate how control strategies, trust dynamics, and perceptions of alignment change as individuals gain experience with specific AI partners. Studies like this one demonstrate not only how much insight we gain from working with AI, but also how many people still struggle to perceive AI as a teammate rather than merely a tool.

Works Cited 

Adam, M., Diebel, C., Goutier, M., & Benlian, A. (2024). Navigating autonomy and control in human-AI delegation: User responses to technology- versus user-invoked task allocation. Decision Support Systems, 180, 114193. https://doi.org/10.1016/j.dss.2024.114193 

Amershi, S., Inkpen, K., Teevan, J., Kikin-Gil, R., Horvitz, E., Weld, D., Vorvoreanu, M., Fourney, A., Nushi, B., Collisson, P., Suh, J., Iqbal, S., & Bennett, P. N. (2019). Guidelines for Human-AI Interaction. Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems - CHI ’19. https://doi.org/10.1145/3290605.3300233

Chang, S., Anderson, A., & Hofman, J. (2025). ChatBench: From Static Benchmarks to Human-AI Evaluation. https://doi.org/10.48550/arXiv.2504.07114

Duan, W., Zhou, S., Scalia, M. J., Freeman, G., Gorman, J., Tolston, M., McNeese, N. J., & Funke, G. (2025). Understanding the processes of trust and distrust contagion in Human–AI Teams: A qualitative approach. Computers in Human Behavior, 165, 108560. https://doi.org/10.1016/j.chb.2025.108560 

Feng, J., Sanchez, J., Sall, R., Lyons, J. B., & Nam, C. S. (2019). Emotional expressions facilitate human–human trust when using automation in high-risk situations. Military Psychology, 31(4), 292–305. https://doi.org/10.1080/08995605.2019.1630227 

Goddard, K., Roudsari, A., & Wyatt, J. C. (2012). Automation bias: a systematic review of frequency, effect mediators, and mitigators. Journal of the American Medical Informatics Association, 19(1), 121–127. https://doi.org/10.1136/amiajnl-2011-000089

Ibrahim, L., Huang, S., Bhatt, U., Ahmad, L., & Anderljung, M. (2025). Towards Interactive Evaluations for Interaction Harms in Human-AI Systems. Knight First Amendment Institute. https://doi.org/10.48550/arXiv.2405.10632 

McIlroy-Young, R., Sen, S., Kleinberg, J., & Anderson, A. (2020). Aligning Superhuman AI with Human Behavior. Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. https://doi.org/10.1145/3394486.3403219 

Salikutluk, V., Schöpper, J., Herbert, F., Scheuermann, K., Frodl, E., Balfanz, D., Jäkel, F., & Koert, D. (2024). An Evaluation of Situational Autonomy for Human-AI Collaboration in a Shared Workspace Setting. https://doi.org/10.1145/3613904.3642564 

Vaccaro, M., Almaatouq, A., & Malone, T. (2024). When combinations of humans and AI are useful: A systematic review and meta-analysis. Nature Human Behaviour, 8. https://doi.org/10.1038/s41562-024-02024-1

Zhao, M., Simmons, R., & Admoni, H. (2022). The Role of Adaptation in Collective Human–AI Teaming.