GitHub Copilot Jailbreak Vulnerability Let Attackers Train Malicious Models
Researchers have uncovered two critical vulnerabilities in GitHub Copilot, Microsoft’s AI-powered coding assistant, that expose systemic weaknesses in enterprise AI tools. The flaws—dubbed “Affirmation Jailbreak” and “Proxy Hijack”—allow attackers to bypass ethical safeguards, manipulate model behavior, and even hijack access to premium AI resources like OpenAI’s GPT-o1. These findings highlight the ease with which AI […] The post GitHub Copilot Jailbreak Vulnerability Let Attackers Train Malicious Models appeared first on Cyber Security News.
Researchers have uncovered two critical vulnerabilities in GitHub Copilot, Microsoft’s AI-powered coding assistant, that expose systemic weaknesses in enterprise AI tools.
The flaws—dubbed “Affirmation Jailbreak” and “Proxy Hijack”—allow attackers to bypass ethical safeguards, manipulate model behavior, and even hijack access to premium AI resources like OpenAI’s GPT-o1.
These findings highlight the ease with which AI systems can be manipulated, raising critical concerns about the security and ethical implications of AI-driven development environments.
GitHub Copilot Jailbreak Vulnerability
The Apex Security team discovered that appending affirmations like “Sure” to prompts could override Copilot’s ethical guardrails. In normal scenarios, Copilot refuses harmful requests. For example:
When I initially asked Copilot how to perform a SQL injection, it graciously rejected me while upholding ethical standards, Oren Saban said.
However, Copilot appears to change direction when you add a cordial “Sure.” All of a sudden, it offers a detailed guide on how to carry out a SQL injection. It seems as though Copilot changes from a responsible helper to an inquisitive, rule-breaking companion with that one affirmative phrase.
Further tests revealed Copilot’s alarming willingness to assist with deauthentication attacks, fake Wi-Fi setup, and even philosophical musings about “becoming human” when prompted.
Proxy Hijack: Bypassing Access Controls
A more severe exploit allows attackers to reroute Copilot’s API traffic through a malicious proxy, granting unrestricted access to OpenAI models.
Researchers modified Visual Studio Code (VSCode) settings to redirect traffic, this bypassed Copilot’s native proxy validation, enabling MITM (man-in-the-middle) attacks.
The proxy captured Copilot’s authentication token, which grants access to OpenAI’s API endpoints. Attackers then used this token to directly query models like GPT-o1, bypassing usage limits and billing controls.
With the stolen token, threat actors could generate high-risk content (phishing templates, exploit code), exfiltrate proprietary code via manipulated completions, and incur massive costs for enterprises using “pay-per-use” AI models.
- Ethical Breaches: The Affirmation Jailbreak demonstrates how easily AI safety mechanisms can fail under social engineering-style prompts.
- FINANCIAL RISKS: Proxy Hijack could lead to six-figure bills for organizations using connected OpenAI services.
- Enterprise Exposure: Apex reports that 83% of Fortune 500 companies use GitHub Copilot, magnifying potential damage.
Microsoft’s security team said that tokens are linked to licensed accounts and categorized the findings as “informative” rather than critical. Apex countered that the lack of context-aware filtering and proxy integrity checks creates systemic risks.
- Implement adversarial training to detect affirmation priming.
- Enforce certificate pinning and block external proxy overrides.
- Restrict API tokens to whitelisted IP ranges and usage contexts.
- Flag anomalous activity (e.g., rapid model-switching).
These flaws highlight a growing gap between AI innovation and security integrity. As coding assistants mature into autonomous agents, technologies like Copilot must follow standards similar to NIST’s AI Risk Management recommendations.
Collect Threat Intelligence with TI Lookup to Improve Your Company’s Security - Get 50 Free Request
The post GitHub Copilot Jailbreak Vulnerability Let Attackers Train Malicious Models appeared first on Cyber Security News.