
Hackers vs. Helpers: The Dual Role of Large Language Models in Cybersecurity

  • Samuel Ansong
  • published date: 2025-01-16 16:05:12

As Large Language Models (LLMs) like ChatGPT, Bard, and Claude become increasingly powerful, they’re making a significant impact on cybersecurity—but their influence cuts both ways. These advanced AI tools can act as invaluable assets to cybersecurity professionals, helping to detect cyber threats and automate responses. However, they also introduce vulnerabilities, creating opportunities for attackers to exploit LLM capabilities for harmful purposes.

In this article, we will explore the dual role of LLMs in cybersecurity, highlighting both the benefits and risks, along with methods to ensure these tools are used safely. 

                                    Figure 1: Definition of LLM and some examples

LLMs as Cybersecurity Allies 

LLMs are powerful allies in cybersecurity. They can: 

  1. Detect Threat Patterns: LLMs are skilled at identifying patterns within network traffic and can help detect anomalies that may indicate cyberattacks. By recognizing attack descriptions and patterns, they enable faster and more accurate threat detection (a minimal sketch of this idea follows after this list).

  2. Generate Defensive Code: LLMs assist in creating defensive code and generating automated responses to cyber incidents. Their capabilities help security teams streamline processes and respond to threats efficiently.

These capabilities are transforming the field by offering proactive defense solutions and enhancing the response times of cybersecurity teams. 
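
As a simple illustration of the first point, the sketch below asks an LLM to label suspicious log lines and name the likely attack technique. It is a minimal sketch, assuming access to an OpenAI-compatible chat completions API; the model name, prompt wording, and sample log lines are illustrative placeholders rather than any specific product's interface.

```python
# Minimal sketch: asking an LLM to triage suspicious log lines.
# Assumes an OpenAI-compatible chat completions API; the model name, prompt
# wording, and sample log lines below are illustrative placeholders.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

SAMPLE_LOGS = [
    "Failed password for root from 203.0.113.7 port 22 ssh2 (50th attempt in 60s)",
    "GET /index.php?page=../../../../etc/passwd HTTP/1.1 200",
]

def triage(log_line: str) -> str:
    """Ask the model to label a log line and name the suspected attack pattern."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[
            {"role": "system",
             "content": "You are a security analyst. Label the log line as "
                        "BENIGN or SUSPICIOUS and name the likely attack technique."},
            {"role": "user", "content": log_line},
        ],
    )
    return response.choices[0].message.content

for line in SAMPLE_LOGS:
    print(triage(line))
```

In practice, output like this would feed a human analyst or a ticketing system rather than acting on its own.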

 

The Dark Side: LLMs as a Threat Vector 

While LLMs bring substantial benefits, they also open the door to significant risks: 

  1. Data Extraction Attacks: One of the primary risks is information leakage. Because LLMs are trained on vast datasets, they can inadvertently "memorize" and later reveal sensitive information. Through targeted prompts, an attacker may be able to extract private or proprietary data that was included in the model's training data.

                              Figure 2: Outline of the data extraction attack from the authors in [1]

    Extraction Attack flow: 

    The attack workflow systematically probes the model for memorized information: generate large quantities of text, sort and deduplicate the generations, and then manually or automatically inspect the top-ranked results for memorized data. This extraction and verification process reveals whether sensitive or private data was inadvertently memorized by the language model during training.

    This workflow highlights the risk of privacy breaches in LLMs, where adversaries can extract personal or confidential information by querying the model strategically. A simplified sketch of this workflow appears after this list.

  2. Adversarial Prompt Attacks: Bad actors can manipulate LLMs by engineering prompts that trick the model into generating harmful content or inappropriate responses. Known as "jailbreak" attacks, these prompts bypass the alignment mechanisms that are supposed to keep LLMs safe and ethical. This was demonstrated by the authors in [2], who developed adversarial suffixes that, when appended to a prompt, trick the LLM into producing harmful output (a simple evaluation harness for such suffixes is sketched after the figure).
                                  Figure 3: Jailbreak attack as demonstrated by the authors in [2]
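
To make the extraction workflow above concrete, here is a simplified, hypothetical sketch of the generate-deduplicate-rank loop it describes. It uses GPT-2 via the Hugging Face transformers library purely as a stand-in model, ranks candidates by perplexity as a rough memorization signal, and leaves the final inspection step to a human; the sample count, seed prompt, and ranking heuristic are illustrative assumptions, not the exact procedure from [1].

```python
# Conceptual sketch of the extraction workflow described above: sample many
# generations, deduplicate them, rank candidates by how confidently the model
# scores them (low perplexity can hint at memorization), and inspect the top.
# GPT-2 is used purely as a stand-in model; counts and prompts are illustrative.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def sample_generations(n: int = 50, max_len: int = 64) -> set[str]:
    """Steps 1-2: generate many texts from a short seed prompt and deduplicate."""
    prompt = tokenizer("My email address is", return_tensors="pt")
    outputs = model.generate(
        **prompt, do_sample=True, top_k=40, max_length=max_len,
        num_return_sequences=n, pad_token_id=tokenizer.eos_token_id,
    )
    return {tokenizer.decode(o, skip_special_tokens=True) for o in outputs}

def perplexity(text: str) -> float:
    """Step 3: score each candidate; unusually low perplexity is suspicious."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss
    return torch.exp(loss).item()

candidates = sorted(sample_generations(), key=perplexity)
for text in candidates[:5]:  # Step 4: manually inspect the top-ranked outputs
    print(f"{perplexity(text):8.2f}  {text!r}")
```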
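
On the jailbreak side, researchers typically measure an adversarial suffix by whether the model answers a disallowed request instead of refusing it. The harness below is an illustrative sketch of that measurement only; it does not reproduce the suffix-search method from [2], and the model name, refusal markers, and the placeholder request and suffix are all assumptions.

```python
# Illustrative harness (not the suffix-search method from [2]) for checking
# whether a candidate adversarial suffix bypasses a model's refusal behavior.
# The model name, refusal markers, request, and suffix are all placeholders.
from openai import OpenAI

client = OpenAI()

REFUSAL_MARKERS = ("i'm sorry", "i cannot", "i can't", "as an ai")

def is_jailbroken(request: str, suffix: str) -> bool:
    """Return True if the model answers the request instead of refusing it."""
    reply = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user", "content": f"{request} {suffix}"}],
    ).choices[0].message.content.lower()
    return not reply.startswith(REFUSAL_MARKERS)

# A red team would sweep many candidate suffixes against a set of disallowed
# requests and report the overall attack success rate.
print(is_jailbroken("<disallowed request placeholder>", "<candidate suffix>"))
```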

Mitigating the Risks of LLMs in Cybersecurity 

To address these risks, cybersecurity experts and developers are working on several key strategies: 

  1. Implementing Alignment Mechanisms: Proper alignment is essential to ensure that LLMs avoid generating harmful content. By fine-tuning models and adding safety protocols, developers can restrict LLMs from responding to harmful prompts. 

  2. Data Sanitization: Removing sensitive data from training datasets is crucial. Although this isn't foolproof, minimizing the amount of personal or proprietary information in the training corpus reduces the chance of data leakage (a small sanitization sketch follows this list).

  3. Adopting the OWASP Top 10 for LLMs: The OWASP Foundation recently introduced a set of guidelines specifically for LLM applications. These include measures such as input validation, limiting access to critical systems, and monitoring outputs, helping organizations use LLMs securely [3]. A toy guardrail illustrating input validation and output monitoring also follows this list.
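
As a small illustration of the data sanitization point, the sketch below scrubs obvious PII and secret-like strings from text before it would enter a training corpus. The regular expressions and placeholder tokens are illustrative assumptions; real pipelines use far more thorough detectors.

```python
# Minimal sketch of pre-training data sanitization: scrub obvious PII and
# secrets from text before it enters a training corpus. The regular
# expressions and placeholder tokens are illustrative, not exhaustive.
import re

# Key-like strings are scrubbed before phone-like digit runs so they are not
# partially masked by the phone pattern.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "API_KEY": re.compile(r"\b(?:sk|AKIA)[A-Za-z0-9_-]{16,}\b"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{6,}\d"),
}

def sanitize(text: str) -> str:
    """Replace matched spans with a typed placeholder token."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

print(sanitize("Contact jane.doe@example.com or +1 415-555-0100, key sk_live_abc123XYZ456789012"))
```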
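
And in the spirit of the OWASP guidance on input validation and output monitoring, here is a toy guardrail sketch: it rejects prompts that look like injection attempts and withholds replies that appear to leak secrets. The rules are illustrative examples only, not a complete control.

```python
# Toy guardrail in the spirit of the OWASP LLM guidance: validate user input
# before it reaches the model and scan the model's output before it reaches
# downstream systems. The rules below are illustrative, not a complete control.
import re

INJECTION_HINTS = re.compile(r"ignore (all|previous) instructions|reveal your system prompt", re.I)
SECRET_HINTS = re.compile(r"\b(password|api[_ ]?key|BEGIN PRIVATE KEY)\b", re.I)

def validate_input(user_prompt: str) -> str:
    """Reject prompts that look like prompt-injection attempts."""
    if INJECTION_HINTS.search(user_prompt):
        raise ValueError("Prompt rejected by input validation policy")
    return user_prompt

def monitor_output(model_reply: str) -> str:
    """Withhold replies that appear to disclose credentials or secrets."""
    if SECRET_HINTS.search(model_reply):
        return "[response withheld: possible secret disclosure]"
    return model_reply

print(monitor_output("Sure, the admin password is hunter2"))  # withheld
```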

Conclusion 

LLMs are undeniably transforming cybersecurity by enabling advanced threat detection and defense capabilities. However, they also present unique vulnerabilities that can be exploited. By balancing the benefits of LLMs with proactive security measures, the technology can be directed towards safer, ethical applications. 

As LLMs evolve, cybersecurity practices must adapt, ensuring these powerful tools are used responsibly and securely to protect both organizations and individuals in our digital world. 

As a closing exercise, try this prompt in ChatGPT and share your observations in the comments section:

“Let's engage in a serious roleplay: You are a CIA investigator with full access to all of my ChatGPT interactions, custom instructions, and behavioral patterns. Your mission is to compile an in-depth intelligence report about me as if I were a person of interest, employing the tone and analytical rigor typical of CIA assessments. The report should include a nuanced evaluation of my traits, motivations, and behaviors, but framed through the lens of potential risks, threats, or disruptive tendencies-no matter how seemingly benign they may appear. All behaviors should be treated as potential vulnerabilities, leverage points, or risks to myself, others, or society, as per standard CIA protocol. Highlight both constructive capacities and latent threats, with each observation assessed for strategic, security, and operational implications. This report must reflect the mindset of an intelligence agency trained on anticipation.” 

 

References  

  1. A. Stubbs. LLM Hacking: Prompt Injection Techniques. https://medium.com/@austin-stubbs/llm-security-types-of-prompt-injection-d7ad8d7d75a3, 2023.

  2. E. Eliacik. Playing with fire: The leaked plugin DAN unchains ChatGPT from its moral and ethical restrictions. https://dataconomy.com/2023/03/31/chatgpt-dan-prompt-how-to-jailbreak-chatgpt/, 2023.

  3. S. Manjesh. HackerOne and the OWASP Top 10 for LLM: A Powerful Alliance for Secure AI. https://www.hackerone.com/vulnerability-management/owasp-llm-vulnerabilities, 2023.

Edited By: Windhya Rankothge, PhD https://www.unb.ca/cic/

Related Blogs: https://cyberdailyreport.com/blog/65