How to Stop Your Data from Being Used to Train AI

Rosslyn Elliott / Updated Apr 26, 2024 | Pub. Apr 26, 2024

Data can move so easily through digital networks. We’ve come a long way from the paper-based information system that lasted from the invention of the printing press to the end of the 20th century.

Most of the time, we think digital data is a good thing. It’s very easy to get our records or transcripts sent from one place to another.

But there is a downside to all that free-flowing, easy-access data. Your data may be used to train AI, whether you like it or not.

If AI uses your data, you may lose personal privacy or accidentally contribute to discrimination against others. You also may experience the theft of aspects of your original writing or artwork.

People who use AI often don’t realize that generative AI may be copying parts of other people’s work in ways that were never considered legal before AI. It is now well known that AI companies have taken copyrighted work to use for “training,” in other words, for imitation.

Read on to find out how to increase your data privacy. There are steps you can take to stop your data or your content from being used to train AI.

What is Data Privacy?

Data privacy is the protection of your personal information from unauthorized access, use, or tampering.

In our digital age, data privacy is complicated. Vast amounts of personal data are collected, stored, and analyzed by companies, governments, and other organizations.

Those organizations do not always have an investment in protecting your privacy. In fact, some may want to make money off the use of your data. The newest way for big tech to use your data is to train AI with it.

The Importance of Personal Data

Personal data includes any information that can be used to identify an individual, such as name, address, email, phone number, social security number, and financial information.

This data is valuable to organizations as it helps them understand their customers, target their marketing efforts, and improve their products and services.

 

The Risks of Data Breaches

When personal data falls into the wrong hands, it can lead to identity theft, financial fraud, or damage to your reputation.

Data breaches have become increasingly common, with millions of records being exposed each year due to hacking, malware, and other cyber threats.

Now that AI companies are building huge databases of information scraped from social media and other platforms on the internet, it is crucial that you understand why your data might end up in an AI system.

Do you want your data to be used to train AI? There are several reasons you might not.

Why is Data Privacy Important?

Protecting your data privacy is crucial for several reasons:

Preventing Identity Theft and Fraud

Identity theft occurs when someone uses your personal information without your permission to commit fraud or other crimes. By protecting your data privacy, you can reduce the risk of becoming a victim of identity theft and the financial and emotional toll it can take.

Maintaining Personal Autonomy

Your personal data belongs to you, and you have the right to control how it is collected, used, and shared. By protecting your data privacy, you can maintain your personal autonomy and make sure that your information is not being used in negative ways.

 

Avoiding Discrimination with AI

Personal data can be used to make decisions about individuals such as whether to offer them a job, a loan, or insurance. More and more, AI is making these kinds of decisions.

If AI uses this data unfairly, it can lead to serious harm. Algorithmic bias happens when an automated system produces systematically unfair outcomes, for example when people are denied medical care or opportunities based on their skin color or other characteristics. When you pay attention to your data privacy, you can help prevent discrimination based on your personal characteristics.

Building Trust

When individuals feel that their personal data is being protected, they are more likely to trust the organizations they interact with. This trust is essential for building long-term relationships and fostering a healthy digital ecosystem.

How is AI Trained?

Artificial intelligence (AI) systems are trained using large datasets, which often include personal information. This process allows AI to learn patterns, make predictions, and perform tasks based on the data it has been trained on.

The Importance of Quality Data

The quality of the data used to train AI is crucial for its performance and accuracy. If the data is biased or incomplete, it can lead to flawed or unfair outcomes. For example, if an AI system is trained on data that is predominantly from one demographic group, it may not perform well for other groups.

The Process of Training AI

Training AI involves feeding it large amounts of data and using algorithms to identify patterns within that data. This process can be time-consuming and resource-intensive, requiring powerful computers and specialized software.

 

How Does AI Collect Data?

AI systems collect data from various sources, including:

Social Media Platforms

Social media platforms like Facebook, X (formerly Twitter), and Instagram collect vast amounts of personal data from their users, including their interests, behaviors, and social connections. This data can be used to train AI systems to better predict human behavior.

Online Browsing and Search History

Search engines like Google and Bing collect data about the websites people visit and the search terms they use. This data can be used to train AI systems to provide more relevant search results and targeted advertising.

Smartphone Apps and Sensors

Many smartphone apps collect data on user behavior, location, and other factors. AI can use this data to predict where you will go, how you will travel, and other facts that are very valuable for marketers or for anyone who wants to track you.

 

Public Records and Databases

Government agencies maintain public records and databases that contain your personal information, such as property records, voter registrations, and court records. AI systems can use this data for fraud detection and risk assessment.

Transactions and Purchases

When you make purchases online or in-store, your transaction data is often collected and used to train AI systems to predict consumer behavior and detect fraud. That’s why you sometimes get fraud alerts if you use a credit card in an unusual place or make a large purchase.

Types of AI Systems

There are several types of AI systems, each with different training methods and data requirements:

Supervised Learning

Supervised learning trains an AI system on labeled datasets, where the correct output is provided for each input. This allows the system to learn to make predictions or classifications based on the patterns it identifies in the data.
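
As a rough illustration of the idea above, here is a minimal Python sketch of supervised learning. The data, the feature names, and the labels are all made up for this example and are not taken from any real system. The "model" is the simplest one possible: it labels a new input by copying the label of the closest labeled example.

```python
import math

# Hypothetical labeled examples, invented for illustration only.
# Input: (hours online per day, purchases per month). Output: a label.
LABELED_DATA = [
    ((1.0, 0.0), "casual"),
    ((1.5, 1.0), "casual"),
    ((6.0, 8.0), "frequent"),
    ((7.0, 9.0), "frequent"),
]


def predict(features):
    """Return the label of the closest labeled example (1-nearest neighbor)."""
    nearest = min(LABELED_DATA, key=lambda example: math.dist(example[0], features))
    return nearest[1]


# A new, unlabeled input is classified from the patterns in the labeled data.
print(predict((2.0, 1.0)))
print(predict((6.5, 7.0)))
```

Real systems use far larger datasets and more complex models, but the principle is the same: the labels supplied during training shape every prediction the system makes later.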

Unsupervised Learning

Unsupervised learning trains an AI system on unlabeled data, where the system must identify patterns and relationships on its own. This can be used for tasks such as clustering similar data points or identifying anomalies.

Reinforcement Learning

Reinforcement learning uses trial-and-error interactions within an environment to train an AI system. The system receives rewards or penalties based on its actions and learns to optimize its behavior over time.

Deep Learning

Deep learning trains artificial neural networks on large datasets, allowing the AI system to learn complex patterns and relationships in the data.

This deep learning technology has led to breakthroughs in areas such as image and speech recognition, natural language processing, and autonomous vehicles.

 

How to Stop Your Personal Data from Being Used to Train AI

To protect your data privacy and prevent your personal information from being used to train AI systems, you can take the following steps:

Read Privacy Policies Carefully

Before sharing your personal data with any organization, carefully read their privacy policy and terms of service. Look for information on how your data will be collected, used, and shared, and whether you have the option to opt out of certain uses.

Use Privacy Settings

Most social media platforms and online services offer privacy settings that allow you to control who can see your information and how it can be used. Take advantage of these settings to limit the amount of personal data you share publicly. See the end of this article for specific tips.

Be Selective About Sharing Personal Information

Be cautious about sharing sensitive personal information online, such as your social security number, financial details, or medical information. Only share this information when absolutely necessary and with trusted organizations.

Use Privacy-Enhancing Tools

There are a variety of tools available that can help protect your data privacy, such as virtual private networks (VPNs), encrypted messaging apps, and ad blockers. These tools can help prevent your data from being collected and used without your knowledge or consent.

 

How to Stop Your Web Content or Images from Being Used to Train AI

Now that generative AI tools such as ChatGPT, Bing, and Claude 3 are so common, there’s a whole new level to how AI might take and use your data. The problem is not just whether AI uses your personal data. AI may also take things you create, such as how-to books, creative writing, or artwork you have made.

ChatGPT, for example, may use anything you enter into it, whether a question or a reference document, as additional training data. So, if you enter an essay you wrote in college into the chat window to teach ChatGPT your writing style, your essay may serve as a foundation for someone else’s essay later.

In some cases, AI may even use exact wording from an existing work to make a supposedly new work. So, while AI companies insist that their products are “new,” don’t be surprised if you see your work showing up in very thin disguise. Many writers and artists are now reporting seeing their own work repeated back to them by AI.

The rarer your topic, the more likely that an AI system might take your exact words or images with barely any changes.

Various platforms offer options that claim to limit whether AI can use your data or your creations. Still, there is only one certain way to keep AI out of your content, and that is to keep your content off the internet. That is very difficult in today’s world, where the internet is the most popular means of communication and advertising.

 

How to Opt Out of AI Training

Whether you want your personal data to be off-limits or your ideas and art to be protected, there are a few ways you can ask companies to respect your confidentiality and your content.

Remember, there are no guarantees against data theft or misuse, by AI or otherwise. But here are some ways to choose more private options.

Adobe

Adobe makes it easy to keep your design work in the cloud, but that means Adobe can also have easy access to your creativity for its Sensei AI training. Here’s how to limit the access.

  1. Log in to your Adobe account.
  2. Go to the Privacy and personal data page.
  3. Switch off the toggle labeled “Content analysis.”
  4. Toggle off “Desktop app usage” to deter tracking.

Apple

To opt out of data collection and AI training on Apple devices:

  1. Go to Settings > Privacy > Analytics & Improvements
  2. Turn off “Share iPhone Analytics”
  3. Turn off “Share iCloud Analytics”
  4. Turn off “Share with App Developers”

Google

To opt out of data collection on Google services:

  1. Visit myactivity.google.com
  2. Click on “Activity controls”
  3. Turn off “Web & App Activity.”
  4. Delete any existing data you don’t want used for AI training

 

You can also specifically opt out of allowing Google Gemini to use your conversations for training. But remember, there are no guarantees that any setting you choose will actually protect the material you enter into Gemini. We have only the word of these massive companies that they will not use the data. Historically, executives motivated by profit have broken rules.

  1. Open up Gemini in your browser.
  2. Click on Activity and choose “Turn Off.”

Grammarly

  1. Open up Account Settings.
  2. Click Data Settings.
  3. Turn Off Product Improvement and Training.

Microsoft

To opt out of data collection and AI training on Microsoft services:

  1. Sign in to your account at account.microsoft.com
  2. Go to privacy.microsoft.com
  3. Go to your Privacy dashboard
  4. Select “Manage my activity data”
  5. Delete any data you don’t want used for AI training
  6. Adjust your privacy settings on each of your devices to limit future data collection

OpenAI

To opt out of data collection and AI training by OpenAI:

  1. Email privacy@openai.com with the subject line “Opt out of data collection”
  2. Include your name and email address in the body of the message
  3. Request that your data be excluded from AI training and deleted if applicable


If you have a ChatGPT account:

  1. Go to Settings
  2. Go to Data Controls
  3. Turn off ChatGPT History & Training

If you don’t have a ChatGPT account and are just using ChatGPT in a web browser:

  1. Go to Settings.
  2. Uncheck “Improve the model for everyone.”

OpenAI also has a form you can submit to remove your images from DALL-E. The process is not simple.

Slack

Slack may use your messages for AI training, and the only way to opt out is to have your administrator email feedback@slack.com. Your admin will have to include your organization’s URL and use the subject line “Slack Global model opt-out request.”

Your Own Sites

If you have your own website, you can use your robots.txt file to adjust whether or not AI can scrape your content.

Scraping is a techy word for automatically collecting data from websites. In this context, it means taking content off the internet without permission (and without compensating you) so the scraper can use it.

AI can now scrape content on a huge scale. As recently reported on the Hard Fork podcast, both OpenAI and Google scraped the text from millions of YouTube videos to train their generative AI chatbots. This is an action that would have been regarded as content theft by previous copyright standards, especially because much of that material on YouTube was actually copyrighted.

Which brings up another point. When your content is hosted on an external platform such as YouTube, there may be far less you can do to protect it. Many YouTube creators still don’t realize that their content has been scraped and used to create new content for other people, bringing profits to the AI company but none to the original creator.

If you host a website on Squarespace, there is a user-friendly help guide to show you how to toggle off “Artificial Intelligence crawlers” for your Squarespace website.
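
If you manage your own robots.txt file, the approach mentioned above might look something like the sketch below. It is written in Python so the rules can be checked before you upload anything. The crawler names used here (GPTBot, Google-Extended, and CCBot) are assumptions that do not come from this article. They are names AI-related crawlers have published, but the list changes over time, so check each company's current documentation. Keep in mind that robots.txt is a voluntary convention: it asks crawlers to stay away, and a crawler can ignore it.

```python
from urllib.robotparser import RobotFileParser

# Assumed crawler names (not from this article); verify against current docs.
AI_CRAWLERS = ["GPTBot", "Google-Extended", "CCBot"]


def build_robots_txt(blocked_agents):
    """Build robots.txt text that blocks the given crawlers and allows everyone else."""
    lines = []
    for agent in blocked_agents:
        lines.append(f"User-agent: {agent}")
        lines.append("Disallow: /")
        lines.append("")  # a blank line separates each group of rules
    lines.append("User-agent: *")
    lines.append("Allow: /")
    return "\n".join(lines) + "\n"


robots_txt = build_robots_txt(AI_CRAWLERS)

# Parse the generated rules with Python's standard-library robots.txt parser.
parser = RobotFileParser()
parser.parse(robots_txt.splitlines())


def is_allowed(agent, url="https://example.com/blog/post"):
    """Report whether a crawler that obeys robots.txt may fetch the (placeholder) URL."""
    return parser.can_fetch(agent, url)


print(robots_txt)
print("GPTBot allowed:", is_allowed("GPTBot"))
print("Googlebot allowed:", is_allowed("Googlebot"))
```

The printed text is what you would save as robots.txt at the top level of your site. Ordinary search crawlers fall under the final catch-all rule, so the site can still appear in search results while the listed AI crawlers are asked to stay out.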

Stay Aware and Don’t Trust All Corporate Promises About Your Data

By taking these steps and being proactive about protecting your data privacy, you can reduce the likelihood of your data being used to train AI systems without your knowledge or consent.

However, it’s important to recognize that in today’s digital age, it’s virtually impossible to completely avoid having your data collected and used in some way. The best approach is to stay informed, be selective about what you share, and take advantage of the privacy tools and security options available to you.

AI brings new risks to the internet. Be aware of potential outcomes when you interact with the digital world.

Make Sure Your Internet Provider is Trustworthy and Top-Rated

There’s no fail-safe way to protect your information once it is on the internet. But you can choose an internet provider known for solid, trustworthy business practices. The more companies you work with that do their best to protect your data, the better your odds of keeping some privacy.

Check out our information on internet providers to learn more.

 

 

 

 
