DeepSeek has witnessed an explosion in popularity since two of its cost-efficient AI models, released recently in quick succession, were touted as exhibiting performance on par with large language models (LLMs) developed by US rivals such as OpenAI and Google. But DeepSeek's meteoric rise has been accompanied by a range of concerns among users regarding data privacy, cybersecurity, disinformation, and more. Some of these concerns stem from the AI research lab's Chinese origins, while others relate to the open-source nature of its AI technology.

The US Navy has reportedly warned its members not to use DeepSeek's AI services "for any work-related tasks or personal use," citing potential security and ethical concerns. However, tech industry figures such as Perplexity CEO Aravind Srinivas have repeatedly sought to allay such worries by pointing out that DeepSeek's AI can be downloaded and run locally on your laptop or other devices.

How does DeepSeek handle user data? Do its AI models pose the same privacy risks as other LLMs? If not, what sets them apart? Let's examine.

What does DeepSeek's privacy policy say?

So far, DeepSeek has rolled out several AI models designed for coding, writing tasks, image generation, etc. The underlying model architecture of some of these AI models, along with their weights (numerical values that determine how the model processes information), are available for download on platforms such as Hugging Face. However, average users are more likely to access DeepSeek's AI by downloading its app on iOS and Android devices or using the web version.
In its privacy policy, DeepSeek unequivocally states: "We store the information we collect in secure servers located in the People's Republic of China."

As per the privacy policy, the user data collected by DeepSeek is broadly categorised into:

- Information provided by the user: Text or audio inputs, prompts, uploaded files, feedback, chat history, email address, phone number, date of birth, username, etc.
- Automatically collected information: Device model, operating system, IP address, cookies, crash reports, keystroke patterns or rhythms, etc.
- Information from other sources: If a user creates a DeepSeek account using Google or Apple sign-on, it "may collect information from the service, such as access token." It may also collect user data such as mobile identifiers, hashed email addresses and phone numbers, and cookie identifiers shared by advertisers.

The policy also states that DeepSeek may use prompts from users to develop new AI models. The company says it will "review, improve, and develop the service, including by monitoring interactions and usage across your devices, analysing how people are using it, and by training and improving our technology." It further states that user data can be accessed by DeepSeek's corporate group and will be shared with law enforcement agencies, public authorities, and others in compliance with legal obligations.

What are the main ways in which LLMs threaten users' privacy?

DeepSeek's data collection is in line with the practices of other generative AI platforms. For instance, OpenAI's ChatGPT was also criticised in the past for collecting vast amounts of user data; the AI chatbot was even briefly banned in Italy over privacy concerns.

"Risks for privacy and data protection come from both the way that LLMs are trained and developed and the way they function for end users," Privacy International, a UK-based non-profit organisation advocating for digital rights, said in a report.
Privacy experts have also pointed out that it is possible for personal data to be extracted from LLMs by feeding in the right prompts. In its lawsuit against OpenAI, The New York Times said that it came across examples of ChatGPT reproducing its articles verbatim. In 2023, Google DeepMind researchers also claimed that they had found ways to trick ChatGPT into spitting out potentially sensitive personal data.

"The possibility to use LLMs (in particular ones that have been made available with open source weights) to make deepfakes, to imitate someone's style and so on shows how uncontrolled its outputs can be," Privacy International said. Users may also not be aware that the prompts they are feeding into LLMs are being absorbed into datasets to further train AI models, it added.

Additionally, the US Federal Trade Commission (FTC) has noted that AI tools "are prone to adversarial inputs or attacks that put personal data at risk." DeepSeek confirmed on Tuesday, January 28, that it was hit by a large-scale cyberattack, forcing it to pause new user sign-ups on its web chatbot interface.

Do these privacy concerns hold for DeepSeek as well?

To be sure, DeepSeek users can delete their chat history as well as their accounts via the Settings tab in the mobile app. However, it appears that there is no way for users to opt out of having their interactions used for AI training purposes. And while DeepSeek has made the underlying model architecture and weights of its reasoning model (R1) open-source, the training datasets and instructions used for training R1 are not publicly available, according to TechCrunch.

The storage of DeepSeek user data in servers located in China is already inviting scrutiny from various countries. US government officials are reportedly looking into the national security implications of the app, and Italy's privacy watchdog is seeking more information from the company on data protection.
But when it comes to privacy and data protection, perhaps the strongest argument in favour of DeepSeek is that its open-source AI models can be downloaded and installed locally on a computer. Running local instances means that users can privately interact with DeepSeek's AI without the company getting its hands on input data, according to a report by Wired.

If users lack the hardware and compute power necessary to do this, they can use DeepSeek's AI chatbot through other platforms such as Perplexity. CEO Aravind Srinivas said that the AI search company is hosting the model in data centres located in the US and European Union (EU), not China. He also claimed that the Perplexity-hosted version of DeepSeek's AI model is free from censorship restrictions.

"All DeepSeek usage in Perplexity is through models hosted in data centers in the USA and Europe. DeepSeek is *open-source*. None of your data goes to China."
— Aravind Srinivas (@AravSrinivas), January 27, 2025

Additionally, DeepSeek's models can be modified and accessed through developer-focused platforms such as Together AI and Fireworks AI.

DeepSeek has further claimed that its application programming interface (API) is a 'stateless API'. This means that "the server does not record the context of the [developer's] requests." If developers want the DeepSeek AI model that they are accessing via an API to have any memory of previous interactions, then they have to explicitly "concatenate all previous conversation history and pass it to the chat API with each request."
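The stateless design described above can be illustrated with a short sketch. The function below shows, under the common OpenAI-style chat format that DeepSeek's API documentation follows, how a client would carry conversation state itself: the model name and payload shape here are illustrative assumptions, and no network call is made.

```python
# Sketch of client-side conversation state for a stateless chat API.
# The "deepseek-chat" model name and the messages/payload shape follow the
# widely used OpenAI-style format; treat the exact details as assumptions.

def build_request(history, new_prompt, model="deepseek-chat"):
    """Concatenate all previous turns plus the new prompt into one request body.

    Because the API is stateless, the server keeps no record of earlier
    requests; the full message list must be resent every time.
    """
    messages = history + [{"role": "user", "content": new_prompt}]
    return {"model": model, "messages": messages}

# Turn 1: no prior history exists yet.
history = []
req1 = build_request(history, "What is 1 + 1?")

# After a reply arrives, the client appends both sides of the exchange...
history = req1["messages"] + [{"role": "assistant", "content": "2"}]

# ...and resends the entire history alongside the next question.
req2 = build_request(history, "Now multiply that by 3.")
```

The second request carries all three messages, which is what gives the model any "memory" of the first turn.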