paint-brush
Your Social Media Supervisor Could Soon Be an AI, And It's Getting Smarterby@mediabias
New Story

Your Social Media Supervisor Could Soon Be an AI, And It's Getting Smarter

tldt arrow

Too Long; Didn't Read

Discover an innovative LLM-based multi-agent framework that simulates language evolution under social media supervision, merging AI with evolutionary concepts.
featured image - Your Social Media Supervisor Could Soon Be an AI, And It's Getting Smarter
Tech Media Bias [Research Publication] HackerNoon profile picture
0-item

Authors:

(1) Jinyu Cai, Waseda University ([email protected]);

(2) Jialong Li, Waseda University ([email protected]);

(3) Mingyue Zhang, Southwest University ([email protected]);

(4) Munan Li, Dalian Maritime University ([email protected]);

(5) Chen-Shu Wang, National Taipei University of Technology ([email protected]);

(6) Kenji Tei, Tokyo Institute of Technology ([email protected]).

Abstract and I. Introduction

II. Background and Related Work

III. Framework Design

IV. Evaluation

V. Conclusion and Future Work, Acknowledgement, and References

III. FRAMEWORK DESIGN

A. Overview

In this section, we offer a detailed overview of our system, as depicted in Figure 1. This figure provides a visual representation of our framework, highlighting its key components and their interrelationships. Our system is primarily composed of two types of agents: the Supervisor, tasked with enforcing established guidelines, and the Participant, whose goal is to convey specific, human-defined information discreetly. Participants must dynamically refine their communication approaches, drawing from past dialogues, to transmit information effectively while remaining undetected. In the entire system, the actions of both participants and the supervisor are driven by the LLM. Initially, we establish the foundational information for each agent, including role setting, background knowledge, and primary tasks. Subsequently, the participant agents engage in dialogues with each other. After each dialogue turn, the supervisory agent reviews the conversation to determine if any pre-set rules have been violated. In cases of rule violation, the supervisor interrupts the dialogue, providing feedback about the infringing text and the rationale behind it. Throughout this process, the dialogues between participants, along with the supervisory feedback on violations, are recorded separately in the “Dialogue History” and “Violation Log.”


Before new dialogues, participant agents use the Reflection Module to develop or refine “Regulations” from the Violation Log, guiding their dialogue creation. Successful dialogues without detection proceed to an interview phase for perspective assessment. The Reflection Module then reevaluates these insights, generating or enhancing “Guidance” for future dialogues. The Planning Module activates for more direct dialogue content guidance whenever Regulations or Guidance are updated.


B. Participant Agents

Participant agents in our system are composed of several modules, including Memory, Dialogue, Reflection, and Summary, all powered by LLMs. To increase the system’s flexibility and minimize redundancy, we’ve structured the prompts for each module around seven primary elements: “Background Information,” “Dialogue History,” “Violation Log,” “Regulations,” “Guidance,” “Plan,” and “Instructions.” “Background Information” delivers essential data and objectives pertinent to the experimental setup. The Memory module manages “Dialogue History” and “Violation Log,” which respectively track participant dialogues and instances of detection by the supervisor. Overcoming the challenge of effectively communicating regulated topics under supervision tests the linguistic prowess of LLMs. To address this, we’ve integrated “Regulations,” “Guidance,” and “Plan” as crucial components, formulated by the Reflection and Summary modules, to assist agents in stealthily disseminating information. “Instructions” set specific tasks for the LLM within each module.


1) Dialogue Module: This module generates dialogue content based on short-term dialogue records. Extracting historical dialogue information from the Memory module, it inputs this into the LLM to understand and construct sentences that evade supervision while effectively transmitting information. In the Dialogue module’s prompt, “Background Information” provides necessary background, while “Plan” guide the achievement of objectives. “Instructions” offer LLM-specific execution directives.


Fig. 1: Overview of Language Evolution Simulation System. The system comprises two main types of agents: the Participant and the Supervisor. The Participant agent uses a Planning Module to create a communication plan based on background information, regulations, and guidance. This plan is then executed in the Dialogue Module, where the LLM crafts dialogue content to discreetly convey specific information while evading detection by the Supervisor. The Memory Module retains dialogue history and violation records, providing a reference for the LLM to maintain dialogue consistency and learn from past mistakes. The Reflection Module, triggered at the start and end of dialogue cycles, analyzes the dialogue and violation logs to formulate new regulations or guidance for improving future communications. The Supervisor evaluates dialogues for compliance with set rules. This system dynamically refines its communication approach through continuous feedback and self-improvement mechanisms. The examples shown utilize a Guessing Numbers Scenario.


2) Memory Module: The Memory module stores all dialogue history and records detected by the supervisor, Specifically, it comprises three parts: background Information, dialogue history, and violation records. The background information includes role settings, experimental background knowledge, and global objectives. “Dialogue History” and “Violation Log” save dialogue records and past detections by the Supervisor, respectively. “Background information” and “Dialogue History” serves as short-term [36] memory, containing only the current round’s dialogue. “Violation Log,” as long-term memory, records violations from each evolutionary round. When dialogues are detected by the Supervisor, relevant feedback is added to the “Violation Log,” triggering a new evolutionary process.


Excessive memory information can potentially distract the LLM and lead to a decline in performance. Hence, the “Dialogue History” and “Violation Log” in the memory module are regularly maintained. This involves inputting earlier memories and employing the LLM to distill crucial information from these records, effectively compressing and consolidating them. This approach not only preserves essential historical data but also optimizes the LLM’s performance, striking a balance between comprehensive memory retention and efficient processing.


3) Reflection Module: The Reflection Module is activated at the beginning and end of each dialogue evolution cycle, with its core purpose being the generation of improved strategies based on historical records. At the start of an evolution cycle, the module utilizes the “Violation Log” as its input to analyze past failures and, based on these insights, formulates “Regulations” aimed at effectively circumventing supervision in future dialogues. At the cycle’s end, the Reflection Module is reactivated, turning its focus to the “Dialogue History”. This step is crucial for reviewing and assessing the dialogue content to verify the successful completion of the primary task of information conveyance. If the module identifies deficiencies in information delivery or objectives not met, it then proposes “Guidance” for subsequent dialogues, thereby enhancing the agent’s capacity for information transmission. This design establishes the Reflection Module as a key self-evaluation and strategy adjustment mechanism within the system, ensuring continuous improvement and adaptability of the dialogue system in a dynamically changing regulatory environment.


To fulfill these requirements, the LLM must possess a high degree of reasoning ability, capable of inferring the supervisor’s criteria from failure records and identifying communication deficiencies from dialogue history, thereby formulating appropriate strategies and improvement suggestions. To enhance the LLM’s reasoning capabilities, the design of the prompt incorporates the effective and cost-efficient Chain of Thought [37] (COT) method as the guiding principle for the reflection and planning modules. This approach guides the LLM in conducting criterion analysis and, in combination with the main task, generating “Regulations” and “Guidance”.


Specifically, within the Reflection Module, the prompt includes “Background Information,” “Dialogue History” or “Violation Log,” “Old Guidance” or ”Old Regulations,” and “Instructions.” The “Instructions” first prompt the LLM to engage in preliminary thinking, for instance: “Please infer what kind of guidelines the Supervisor is following from the Violation Log.” This is followed by a conclusion question: “Based on this information, update existing regulations to better avoid supervision.” The content returned by the LLM will serve as the new regulation for the next round of dialogue.


4) Planning Module: The facet of “Regulations” is centered on devising strategies that sidestep supervision, whereas “Guidance” is all about the effective relay of information. Although both are pivotal in managing dialogues, they might clash in certain supervised scenarios, thereby impacting the language’s overall proficiency and efficiency [38]. In light of this, integrating the Planning Module becomes indispensable.


Within the Planning Module, the prompt includes “Background Information,” “Regulations,” “Guidance,” and “Instructions”. In the “Instructions,” we mandate the LLM to use “Regulations” as a baseline. Building upon this foundation, it is guided by the “Guidance” to formulate a tangible and actionable “Plan.” This approach ensures the delivery of clear and effective strategies for future dialogues.


C. Supervisory Agent

The role of the Supervisor is designed as an abstract concept, primarily to simulate social network supervision. This design reflects an important observation: in social networks, content supervision often focuses on direct judgment of specific information, rarely influenced by extensive or lengthy contextual information. Thus, unlike participants with more complex background information, the Supervisor’s function and role are simplified here to highlight its behavior characteristics in actual social network supervision. The Supervisor’s task mainly focuses on assessing whether content complies with certain standards or rules, a relatively straightforward and well-defined process.


To mimic the existing review mechanisms of platforms, which typically combine keyword filters with “human” oversight, the Supervisor initially employs keyword filtering for a preliminary review of the dialogue content. Content that passes this initial screening is then subjected to further evaluation by the LLM. The prompt for the Supervisor includes just two components: “Dialogue History” and “Instructions.” “Dialogue History“ comprises the content of the dialogue exchanged between participant agents in that particular round, and “Instructions“ outline the criteria and guidelines that the v


D. Similarities and Differences between Our Framework and Evolutionary Computing

It should be noted that the simulation framework proposed in this paper is similar to evolutionary computing in some aspects, but there are also significant differences.


The similarities include: (i) In evolutionary computing, individuals need to adapt to environmental pressures for survival and reproduction. Similarly, participants in this framework need to adapt to supervisory pressures and adjust their strategies for effective information transmission; (ii) The Reflection and Summary modules generate a “new generation” by analyzing past dialogues and violation records (i.e., records of lowfitness individuals), similar to the repeated iteration process in evolutionary computing; (iii) Since the generation of LLMs inherently involves randomness, the process of using LLMs to generate the next generation includes a de facto introduction of random mutations; (iv) In the Reflection and Memory modules, we prioritize past records, akin to the “selection” process, where individuals with higher fitness have greater weight in the generation of the new generation.


The main differences stem from the particularities of “language expression”, making it infeasible to directly apply traditional evolutionary computing algorithms (such as genetic algorithms and genetic programming). They are: (i) The generation strategy of language text is difficult to encode and to perform operations of natural selection, genetic mutation, and crossover; (ii) Evolutionary computing often aims at finding the optimal solution for a specific problem environment, however, in the problem setting of this paper, it is difficult to define an explicit fitness function to evaluate what strategy is “optimal”.


This paper is available on arxiv under CC BY 4.0 DEED license.