Vulnerabilities in LangChain Gen AI Could Prompt Data Leak
Open-Source Company Issues Patches After Being Alerted by Palo Alto
A widely used generative artificial intelligence framework is vulnerable to a prompt injection flaw that could enable sensitive data to leak.
Researchers at security firm Palo Alto Networks uncovered two arbitrary code flaws in LangChain, an open-source library that supports large language model app development.
"These two flaws could have allowed attackers to execute arbitrary code and access sensitive data. LangChain has since issued patches to resolve these vulnerabilities," the researchers said.
The first vulnerability, tracked as CVE-2023-44467, is a critical prompt injection flaw that affects PALChain, a Python library used by LangChain to generate code.
The researchers exploited the flaw by altering the functionalities of two security functions within from_math_prompt, a method that translates user queries into executable Python code.
By setting the values of the two security functions to false, the researchers bypassed LangChain's validation checks and its ability to detect dangerous functions, allowing them to run malicious code on the application as a user-specified action.
"By disallowing imports and blocking certain built-in command execution functions, the approach theoretically reduces the risk of executing unauthorized or harmful code," the researchers said.
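The kind of check the researchers describe can be sketched with Python's standard ast module. This is an illustration only, not LangChain's actual validation code: the function name find_violations, the flag names allow_imports and allow_command_exec, and the contents of BLOCKED_CALLS are all assumptions made for the example.

```python
import ast
from typing import List

# Hypothetical denylist for illustration; not LangChain's actual list.
BLOCKED_CALLS = {"exec", "eval", "compile", "system", "popen", "__import__"}


def find_violations(source: str,
                    allow_imports: bool = False,
                    allow_command_exec: bool = False) -> List[str]:
    """Return the reasons generated code would be rejected (empty if none)."""
    violations = []
    for node in ast.walk(ast.parse(source)):
        # Reject "import x" and "from x import y" unless imports are allowed.
        if not allow_imports and isinstance(node, (ast.Import, ast.ImportFrom)):
            violations.append("import statement")
        # Reject calls whose name appears in the denylist, e.g. eval(...)
        # or os.system(...), unless command execution is allowed.
        if not allow_command_exec and isinstance(node, ast.Call):
            func = node.func
            if isinstance(func, ast.Name):
                name = func.id
            else:
                name = getattr(func, "attr", None)
            if name in BLOCKED_CALLS:
                violations.append(f"call to {name}")
    return violations


if __name__ == "__main__":
    print(find_violations("import os\nos.system('id')"))
    print(find_violations("result = 2 + 2"))
```

A check like this is only as strong as its denylist and its flags. If either flag is switched off, or a dangerous function is missing from the list, the generated code passes validation unchanged.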
The other flaw, tracked as CVE-2023-46229, affects a LangChain feature called SitemapLoader that scrapes information from different URLs and compiles the information collected from each site as a PDF.
The vulnerability stems from SitemapLoader's ability to retrieve information from every URL that it receives. A supporting utility called scrape_all collects data from each URL it receives without filtering or sanitizing any data.
"A malicious actor could include URLs to intranet resources in the provided sitemap. This can result in server-side request forgery and the unintentional leakage of sensitive data when content from the listed URLs is fetched and returned," the researchers said.
They also said threat actors could potentially exploit the flaw to extract sensitive information from limited-access application programming interfaces of an organization or other back-end environment that the LLM interacts with.
To mitigate the vulnerability, LangChain introduced a new function called extract_scheme_and_domain and an allowlist that lets its users control domains, the researchers said.
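The allowlist idea can be sketched with Python's standard urllib.parse. This is a minimal illustration under stated assumptions, not LangChain's patch: the helpers scheme_and_domain and filter_urls and the sample URLs are hypothetical and only show how comparing scheme and domain against an allowlist keeps intranet addresses out of a scrape.

```python
from typing import Iterable, List
from urllib.parse import urlparse


def scheme_and_domain(url: str) -> str:
    """Reduce a URL to 'scheme://host[:port]' for comparison (hypothetical helper)."""
    parsed = urlparse(url)
    return f"{parsed.scheme}://{parsed.netloc}".lower()


def filter_urls(urls: Iterable[str], allowed: Iterable[str]) -> List[str]:
    """Keep only URLs whose scheme and domain exactly match an allowlist entry."""
    allowed_set = {scheme_and_domain(entry) for entry in allowed}
    return [url for url in urls if scheme_and_domain(url) in allowed_set]


if __name__ == "__main__":
    # Sample sitemap entries: one legitimate page plus internal and lookalike targets.
    sitemap_urls = [
        "https://example.com/page",
        "http://169.254.169.254/latest/meta-data/",
        "http://intranet.local/admin",
        "https://example.com.evil.test/",
    ]
    print(filter_urls(sitemap_urls, ["https://example.com"]))
```

Matching on the exact scheme and host, rather than on a substring of the URL, is what rejects lookalike hosts such as example.com.evil.test in this sketch.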
Both Palo Alto and LangChain urged immediate patching, especially as companies rush to deploy AI solutions.
It is unclear if threat actors have exploited the flaws. LangChain did not immediately respond to a request for comment.