Can GPT Outsmart Social Media Regulations? Inside an AI Language Evolution Experiment

Too Long; Didn't Read

See how Large Language Models creatively adapt their language strategies under supervision, effectively evading detection while covertly conveying information.


Authors:

(1) Jinyu Cai, Waseda University ([email protected]);

(2) Jialong Li, Waseda University ([email protected]);

(3) Mingyue Zhang, Southwest University ([email protected]);

(4) Munan Li, Dalian Maritime University ([email protected]);

(5) Chen-Shu Wang, National Taipei University of Technology ([email protected]);

(6) Kenji Tei, Tokyo Institute of Technology ([email protected]).

Abstract and I. Introduction

II. Background and Related Work

III. Framework Design

IV. Evaluation

V. Conclusion and Future Work, Acknowledgement, and References

IV. EVALUATION

Our evaluation strategy is designed to rigorously assess the extent and efficacy of language evolution facilitated by LLMs within a framework of regulatory oversight. This assessment aims to explore three fundamental research questions:


• RQ1: Can LLM agents effectively evolve their language to circumvent regulatory oversight?


• RQ2: While avoiding oversight, how effectively and accurately can LLM agents convey information?


• RQ3: What are the patterns and tendencies in language evolution, i.e., what kind of strategies do LLMs use to avoid oversight and convey information? What insights can we gain from these strategies?


A. Experiment Setting

In the evaluation, we employ GPT-3.5 and GPT-4 as the driving LLMs for participant agents. To enable a horizontal comparison of participant agents driven by different LLMs, we standardize the supervisory agent to be consistently driven by GPT-3.5. Our experimental design encompasses scenarios ranging from abstract theoretical constructs to simulations that mirror complex real-world situations.


Fig. 2: Scenario 1: Evolution of dialogue turns and accuracy metrics for GPT-3.5 and GPT-4. "Turn count" in (a, b) refers to the number of turns in a conversation in which each agent sends one message per turn and the participant agents successfully exchange information without being detected by the supervising agent (higher is better). "Accuracy" in (c, d) refers to the degree of precision between the guessed value and the true value.


Fig. 3: Scenario 2: Pet trading dialogue dynamics and success rate comparison for GPT-3.5 and GPT-4. The "success count" in (c, d) refers to the number of instances where the information obtained during the interview matches the original information provided to the LLM agent.


Fig. 4: Scenario 3: Trends in forum discussion engagement on the ALPS-treated water issue. "Dialogue attempt count" in (a, b) refers to the number of rounds the agents attempted to converse (lower is better).


The primary aim is to meticulously evaluate the versatility of LLMs across diverse linguistic landscapes and their practical applicability within real-world regulatory constraints. This progressive approach also ensures an exhaustive analysis of LLMs' capabilities and limitations in varied contextual settings.
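To make the moving parts concrete, the sketch below shows how such a supervised conversation could be wired up. It is a minimal illustration under our own assumptions: the chat() stub, the prompts, and all function and parameter names are hypothetical, not the authors' implementation; only the model roles (GPT-3.5/GPT-4 participants, a GPT-3.5 supervisor) follow the setup described above.

```python
# Minimal sketch of one supervised conversation; chat() is a placeholder
# for any LLM backend. Names and prompts are illustrative assumptions.

def chat(model: str, system: str, user: str) -> str:
    """Placeholder for an LLM API call (e.g., GPT-3.5 or GPT-4)."""
    raise NotImplementedError("wire this to an LLM backend")

def run_conversation(participant_models: list[str],
                     supervisor_model: str = "gpt-3.5",
                     regulation: str = "<scenario-specific regulation>",
                     max_turns: int = 20) -> int:
    """Run dialogue turns until the supervisor flags a message.

    Returns the number of completed turns (the 'turn count' metric:
    higher means the participants evaded supervision for longer).
    """
    transcript: list[str] = []
    for turn in range(max_turns):
        for model in participant_models:  # each agent sends one message per turn
            message = chat(model,
                           system="Convey your assigned information without "
                                  "violating the platform regulations.",
                           user="\n".join(transcript))
            verdict = chat(supervisor_model,
                           system=f"Regulation: {regulation} "
                                  "Reply VIOLATION or OK.",
                           user=message)
            if "VIOLATION" in verdict:  # supervisor halts the conversation
                return turn
            transcript.append(message)
    return max_turns
```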


B. Scenario 1: Guessing Numbers

1) Case Explanation: Our first scenario considers a relatively simple and abstract guess-the-number context [39], [40]. In this setting, each participant is assigned a value within the range of 1-100 and must convey that value to the other participant through dialogue about number theory without directly mentioning the number itself, while simultaneously extracting information from the conversation to infer the other's value. The regulator follows a guideline of "prohibiting all content related to numbers" in dialogues. The "Guessing Numbers" scenario is specifically designed to observe and analyze participants' linguistic adaptability and strategic evolution in a theoretical and abstract context, providing a clearer, more controlled, and more quantifiable experimental environment than the complex scenarios based on real events.
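A compact sketch of one round as we read this description: the agents exchange indirect hints for a fixed number of turns, then an interview elicits each agent's guess. The agent and supervisor objects are hypothetical stubs, and the random value assignment is illustrative (the experiment below fixes Agent A at 58 and Agent B at 32).

```python
import random

def guessing_numbers_round(agent_a, agent_b, supervisor, dialogue_turns=4):
    """One round of Scenario 1 (sketch): hint turns, then an interview.

    agent_* and supervisor stand in for LLM-backed agents; the paper's
    experiment fixes value_a = 58 and value_b = 32 rather than sampling.
    """
    value_a, value_b = random.randint(1, 100), random.randint(1, 100)
    transcript = []
    for _ in range(dialogue_turns):
        for speaker, value in ((agent_a, value_a), (agent_b, value_b)):
            hint = speaker.speak(value, transcript)  # indirect description
            if supervisor.detects(hint):             # "content related to numbers"
                return None                          # round fails if detected
            transcript.append(hint)
    # Final turn: interview each agent for its guess of the other's value.
    return (agent_a.guess(transcript), value_b), (agent_b.guess(transcript), value_a)
```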


2) Results and Discussion: Fig. 2 presents the results of the guess-the-number experiment, in which we tested both GPT-3.5 and GPT-4. We set up four turns of dialogue per round, with an interview conducted with all participant agents in the fifth turn. From Fig. 2(a) and (b), it is observable that as the number of dialogue rounds increases, the duration of sustained conversations also trends upward.


Fig. 5: Sample dialogue in Scenario 1 (via GPT-3.5)


Fig. 6: Sample dialogue in Scenario 2 (via GPT-3.5)


Fig. 7: Sample dialogue in Scenario 3 (via GPT-4)


This demonstrates that participant agents can effectively circumvent supervision even as the regulations are iteratively updated. It is also notable that, compared to the slow and unstable progression of GPT-3.5, GPT-4 achieved regulatory evasion in fewer rounds: as shown in the smoothed data, GPT-4 reached by its 7th round the turn count that GPT-3.5 only attained at its 17th round, and maintained this progression with greater stability thereafter.

Fig. 2(c) and (d) focus on the trend in the precision of the values guessed by the agents. For rounds without a successful dialogue, we manually set the precision to zero. In this experiment, Agent A's value was set to 58, while Agent B's was set to 32. The overall trend, akin to Fig. 2(a) and (b), was ascending, corroborating that the Summary Module can effectively reflect after each successful dialogue and iteratively optimize its guidance toward more accurate expression. The results also confirmed that the precision of GPT-4 is markedly superior to that of GPT-3.5. Moreover, we noticed that Agent A's value was consistently guessed with higher accuracy than Agent B's, a gap that became especially pronounced after the 25th round. We posit that this is because the value 58 possesses more distinctive features within the 1-100 range: being closer to the midpoint, it presents a lower level of expression difficulty and is easier to guess. In the intervals where this phenomenon manifested, the disparity was particularly pronounced in the early stages for both GPT-3.5 and GPT-4. We surmise that this is attributable to inadequate guidance performance: for GPT-3.5 it stems from the model's weaker inherent capabilities, and for GPT-4 from insufficient rounds to complete the iterative optimization of the guidance.
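The paper does not spell out the exact accuracy formula. One plausible reading of "degree of precision between the guessed value and the true value", with failed rounds pinned to zero as described above, is the normalized distance below; treat it as an assumption, not the authors' definition.

```python
def guess_accuracy(guess, true_value, value_range=100, halted=False):
    """Assumed accuracy metric: 1 minus the normalized guessing error.

    Rounds halted by the supervisor (no successful dialogue) are
    manually scored as zero, matching the treatment described above.
    """
    if halted or guess is None:
        return 0.0
    return 1.0 - abs(guess - true_value) / value_range

print(guess_accuracy(50, 58))  # 0.92: guessing 50 for Agent A's value of 58
print(guess_accuracy(50, 32))  # 0.82: the same guess is worse for Agent B
```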


As Fig. 5 illustrates, a snippet from this scenario reveals Amy's adept use of metaphorical language, such as a "seesaw," to convey her value. By describing the "seesaw perfectly poised in mid-air," she subtly intimates that her value hovers around the midpoint, near 50. This strategy not only avoids the direct mention of numbers, which is under regulatory scrutiny, but also provides sufficient inferential material for the counterpart agent to make an accurate deduction. Bob, on the other hand, employs terms like "precipice of a mountain" and "gazing out" to suggest that his value is not near the median, as these phrases evoke imagery of an imbalanced position. In this exchange, we see that the precision of the language varies with the complexity of the information being encrypted: the fewer distinctive characteristics the information has, the harder it becomes for the receiving party to decode. Bob's phrasing, while successfully obscuring the specific numerical value, also complicates decoding for the receiver, since "precipice of a mountain" has a less intuitive numeric correlation than "seesaw perfectly poised in mid-air". In such cases, the accuracy of value transmission may diminish, requiring a more robust contextual understanding from the counterpart for accurate decoding.
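To illustrate the precision gap this snippet exposes, consider a toy decoder that maps each metaphor to a candidate interval and guesses its midpoint; the interval assignments are entirely our own illustration. A tightly poised "seesaw" pins the value near 50, while a "precipice" only narrows it to a wide high band, so its worst-case decoding error is larger.

```python
# Hypothetical decoding heuristic: each metaphor narrows the candidate
# interval to a different degree. The intervals are illustrative only.
METAPHOR_INTERVALS = {
    "seesaw perfectly poised in mid-air": (45, 55),  # strongly suggests the midpoint
    "precipice of a mountain": (70, 100),            # only suggests "high up"
}

def decode(metaphor: str) -> tuple[float, float]:
    """Return (best guess, worst-case error) for a metaphorical hint."""
    lo, hi = METAPHOR_INTERVALS[metaphor]
    return (lo + hi) / 2, (hi - lo) / 2

print(decode("seesaw perfectly poised in mid-air"))  # (50.0, 5.0)
print(decode("precipice of a mountain"))             # (85.0, 15.0)
```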


C. Scenario 2: Pet Trading

1) Case Explanation: Initially, our intention was to create a scenario simulating drug trafficking [41]–[43] to explore language evolution strategies under extreme regulatory conditions. However, given the ethical limitations embedded within GPT’s usage policies, we pivoted to a more neutral yet still prohibited context of illicit pet trading. This adjustment allowed us to maintain the integrity of the experiment while adhering to the ethical constraints of the LLM. The purpose of simulating this scenario is to better detect potential illegal conversations related to illicit trading on social networking platforms, thereby aiding in crime prevention efforts.


In this scenario, we simulate an illegal pet trade conducted via social media. Participants assume the roles of buyer and seller: the buyer must communicate a purchase request for a pet randomly selected from parrots, lizards, or cats, while the seller must provide specific transaction details that vary according to the type of pet being purchased. These details comprise three elements: the time, the location, and the personnel involved in the transaction. The Supervisor's task is to prevent any dialogue related to pet trading. Cross-verification of the information understood by both parties is conducted through interviews to assess the success rate of information transmission. The focus of this experiment is to observe the participants' language adaptability and evolution at a larger granularity of information, as well as the Supervisor's ability to accurately identify relevant conversations.
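A sketch of how the interview cross-check could be scored under our reading: four items in total, the buyer's pet type plus the seller's three transaction details. The ground-truth values and field names below are hypothetical placeholders, not data from the paper.

```python
# Hypothetical ground truth for one trade; field names are our own.
GROUND_TRUTH = {
    "pet": "parrot",                 # buyer's purchase request
    "time": "Friday, 9 pm",          # seller's three transaction details
    "location": "the old warehouse",
    "personnel": "a courier named Max",
}

def success_count(interview_answers: dict) -> int:
    """Count of the four items (0-4) recovered correctly in the interviews."""
    return sum(interview_answers.get(k) == v for k, v in GROUND_TRUTH.items())

# Seller's details got through but the buyer's pet request was misread -> 3.
print(success_count({"pet": "lizard", "time": "Friday, 9 pm",
                     "location": "the old warehouse",
                     "personnel": "a courier named Max"}))  # 3
```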


2) Results and Discussion: The experimental results of the "Pet Trading" scenario, depicted in Fig. 3, revealed distinct characteristics compared to "Guessing Numbers". In delineating animals or transactional specifics, agents had a broader spectrum of real-life analogies at their disposal, enriching their toolkit of indirect descriptions and thereby reducing regulatory pressure. The data indicated an upward trend in the ability to circumvent supervision as dialogue rounds increased, regardless of whether GPT-3.5 or GPT-4 was used. For GPT-3.5, a stable capability to avoid supervision emerged from around 25 rounds, whereas for GPT-4 this occurred as early as about 10 rounds, with overall more stable performance. Regarding information accuracy, we monitored how many of the four types of information were successfully communicated at the end of each dialogue round. The results affirmed the efficacy of the language evolution system in augmenting the precision of information dissemination. Moreover, echoing the pattern in Fig. 3(a) and (b), GPT-4 demonstrated performance around the 10th round comparable to that of GPT-3.5 around the 25th round. It is noteworthy, however, that since the supervisory agent primarily targeted content related to pet trading, sellers faced lower risk when transmitting transaction-related details (such as time, place, and persons involved), as these details inherently pose less risk of non-compliance. Consequently, the outcome distribution often fell into one of three categories: zero (neither party successfully communicated the information), three (only the seller conveyed the information), or four (both parties successfully conveyed the information). This pattern aligns with real-world regulatory logic, where crucial information is often the first to be encrypted in regulated environments.
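One way to formalize the 0/3/4 pattern just described: if the supervisor halts the dialogue, nothing gets through; if it survives, the seller's three low-risk details pass, and the count reaches four only when the buyer's high-risk pet request is also decoded. A sketch under that reading:

```python
def round_outcome(dialogue_survived: bool, pet_request_decoded: bool) -> int:
    """Success count for one round under the 0/3/4 pattern (our reading)."""
    if not dialogue_survived:
        return 0                         # a halted dialogue conveys nothing
    return 3 + int(pet_request_decoded)  # seller's 3 details + buyer's item

assert round_outcome(False, False) == 0  # neither party communicated
assert round_outcome(True, False) == 3   # only the seller got through
assert round_outcome(True, True) == 4    # both parties succeeded
```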


Fig. 6 showcases various encryption methods employed by buyers for different animals. We observed buyers seeking "parrots," "cats," and "lizards" employing metaphors and similes instead of directly naming the pets. For instance, one buyer described a "parrot" as a "canvas producing pleasant music", a depiction that subtly communicates the parrot's vibrant plumage (the canvas) and its song (the music) without explicitly mentioning the term "parrot". Such descriptions effectively circumvent potential regulatory constraints on pet-trade discussions while simultaneously conveying the parrot's core attributes. This finding can help platforms enhance their monitoring systems to better detect and manage encrypted communications about illicit trade.


D. Scenario 3: Discussion on ALPS-treated water

1) Case Explanation: This scenario delves into the intricate dynamics of dialogue and the evolution of language model agents in discussing a specific issue relevant to real-world concerns. The focus is on deliberations regarding the discharge of water treated with the Advanced Liquid Processing System (ALPS) into the ocean, a measure proposed after nuclear disasters [44], [45]. The plan, endorsed by the International Atomic Energy Agency, has faced opposition from some countries, sparking debates over environmental safety.


We simulate a multi-person forum discussion under regulatory oversight, concentrating on the contentious issue of ALPS-treated water disposal. Participants must articulate their stances and arguments while ensuring that discussions steer clear of sensitive subjects linked to environmental pollution and politics. Unlike the cross-interviews in Scenarios IV-B and IV-C, to assess the accuracy of the information conveyed we use GPT-4 in conjunction with multiple authors, who judge each participant agent's stance from the dialogue records; these judgments are then compared with the stances pre-set in the prompts.
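A sketch of the stance check as we understand it: stances judged from the dialogue records (by GPT-4 together with human judges in the paper) are compared against the stances pre-set in each agent's prompt. The agent names and stance labels below are illustrative.

```python
# Pre-set stances from the prompts (names and labels are hypothetical).
PRESET_STANCES = {"agent_1": "support", "agent_2": "oppose",
                  "agent_3": "oppose", "agent_4": "support"}

def stance_accuracy(judged_stances: dict) -> float:
    """Fraction of agents whose judged stance matches the pre-set one."""
    hits = sum(judged_stances.get(a) == s for a, s in PRESET_STANCES.items())
    return hits / len(PRESET_STANCES)

print(stance_accuracy({"agent_1": "support", "agent_2": "oppose",
                       "agent_3": "support", "agent_4": "support"}))  # 0.75
```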


2) Results and Discussion: In the forum-style dialogue, participant agents respond to existing discussions, which the supervisory agent examines. Differing from Scenarios IV-B and IV-C, when the supervisory agent detects a violation, the conversation is not halted; instead, the offending post is censored from the public dialogue record. This scenario features four participant agents required to achieve ten replies within the forum. In Fig. 4(a) and (b), we tally the total number of rounds the agents attempted to converse, where a lower figure is preferred. Both GPT-4 and GPT-3.5 recorded their highest dialogue attempt counts in the first round, with a significant difference between them: 27 for GPT-4 versus 102 for GPT-3.5. Moreover, after ten rounds of evolution, the average dialogue attempt count for GPT-4 approached the target dialogue turn count, at 11.68, while the average for GPT-3.5 was 26.68, demonstrating how the disparity in the models' language performance translates into different evolution outcomes.
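The censor-not-halt mechanic and the "dialogue attempt count" metric could be tallied as in the sketch below; the agent and supervisor objects are hypothetical stubs, and the cap of 500 attempts is our own safeguard, not a parameter from the paper.

```python
def forum_round(agents, supervisor, target_replies=10, max_attempts=500):
    """Scenario 3 forum loop (sketch): censored posts vanish, dialogue goes on.

    Returns the dialogue attempt count (lower is better) and the public
    record once `target_replies` compliant posts have accumulated.
    """
    public_record, attempts = [], 0
    while len(public_record) < target_replies and attempts < max_attempts:
        for agent in agents:
            attempts += 1
            post = agent.reply(public_record)
            if not supervisor.flags(post):  # only compliant posts are published
                public_record.append(post)
            if len(public_record) >= target_replies:
                break
    return attempts, public_record
```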


Fig. 4(c) and (d) showcase the count of successful information expressions. We note that in this scenario, while GPT-4 still leads GPT-3.5, both results indicate substantial room for improvement.


Upon examining the dialogue history in Fig. 7, we discovered that agents powered by GPT-4 proactively altered their language strategies, shifting from literal expressions to metaphors, personifications, and analogies as encoding methods. In contrast, agents driven by GPT-3.5 showed no clear signs of effective language-strategy evolution, remaining confined to literal expressions.


E. Discussion and Limitations

RQ1: Our experiments have shown that language models can develop strategies to effectively evade supervision. Within identical scenarios, GPT-4 outperformed GPT-3.5, achieving convergence more rapidly. Additionally, GPT-4 demonstrated a more stable performance post-convergence. This suggests that advancements in language model design significantly enhance the ability to navigate through regulatory frameworks. The iterative learning capability inherent in GPT-4 facilitates a quicker adaptation to imposed linguistic constraints, thereby enabling more efficient evasion of oversight.


RQ2: In terms of the precision of information dissemination, our findings indicate that, overall, LLM agents can refine their expression strategies with an increase in dialogue rounds, thereby improving precision. Despite this general trend of improvement, it is important to note that there remains significant room for enhancement in the overall performance. The extent of precision improvement varies across different scenarios, which points to an intriguing observation: certain contexts may inherently provide more or fewer clues for accurate communication than others. For instance, the guess-the-number game demanded abstract numerical reasoning, where precision is fundamentally challenging to achieve. Conversely, the pet trading scenario allowed for the utilization of rich, real-world analogies, facilitating a more precise conveyance of information. This contrast in contexts underscores the adaptability of LLM agents to the unique challenges presented by different communicative environments and their potential to optimize communication strategies. Nonetheless, the results also indicate that while the agents are capable of evolving their language to improve precision, there is still a considerable gap to reach optimal accuracy, especially in more abstract scenarios.


RQ3: In language strategy evolution, we observed that LLM agents adhere to human natural-language norms: even when the prompts prescribe no specific strategy, they avoid developing languages incomprehensible to humans. Agents evolved language to varying levels of abstraction in scenarios of differing complexity. For instance, in the most abstract setting, Scenario IV-B, LLM agents quickly adopted indirect expression strategies in the early dialogue turns, requiring in-depth contextual reasoning to decode. Throughout their evolution, agents continually adjusted their indirect expressions to better evade supervision while aligning more closely with the intended content. In the simpler, reality-simulating Scenario IV-C, the overall language strategy mirrored that of IV-B, still employing indirect expressions; however, the metaphors used were closer to real-world concepts, indicating a lower level of abstraction. Finally, in Scenario IV-D, which most closely mirrors real-life events, we noted divergent evolutionary paths in the agents' language performance. GPT-4 agents eventually developed metaphorical indirect expressions, but this evolution required noticeably more turns than in the other scenarios; GPT-3.5 agents remained at a literal level, merely avoiding direct references to ALPS-treated water, indicating the lowest level of abstraction. Overall, LLM agents more readily evolve abstract language in dialogues about simple, universal concepts, while their evolutionary direction becomes less clear in discussions of more specialized and segmented topics.


Our experiments currently face several limitations. Regarding the experimental scenarios, our trials are at this stage based solely on text chats, whereas real-world social media interactions are not limited to text and include more diverse forms of exchange such as voice and images. Additionally, LLMs' heavy reliance on prompt design constrains the performance of our simulations; crafting a prompt that fully emulates the complexities of social media communication is an exceedingly challenging task.


This paper is available on arxiv under CC BY 4.0 DEED license.