Episode 22 — Secure AI Adoption: Prompt Injection, Data Poisoning, Model Theft, and Model DoS

In this episode, we’re going to treat Artificial Intelligence (A I) like any other powerful technology you might bring into an organization: something that can help a lot, but also something that creates new ways to get hurt if you don’t think ahead. When people first hear about security risks in A I, they often imagine a sci-fi robot going rogue, but the real risks are usually much more ordinary and much more familiar. They look like tricking a system into doing something it should not do, quietly changing the data it learns from, stealing valuable work, or knocking a service offline so nobody can use it. Those risks map cleanly to common cybersecurity themes, which is good news for beginners, because it means you can understand them without needing to be a machine learning expert. The goal is to build a mental model for four major problem areas—prompt injection, data poisoning, model theft, and model denial of service—so you can recognize what is happening and choose sensible defenses.

Before we continue, a quick note: this audio course is a companion to our course companion books. The first book is about the exam and provides detailed information on how to pass it best. The second book is a Kindle-only eBook that contains 1,000 flashcards that can be used on your mobile device or Kindle. Check them both out at Cyber Author dot me, in the Bare Metal Study Guides Series.

To make sense of these risks, it helps to separate what a typical A I solution is made of, because security problems tend to attach themselves to a specific part. Most real deployments have a place where users ask questions or submit requests, a place where those requests get processed, a place where data is stored, and a place where the model lives and produces outputs. There is also usually a path for updates, because models and the systems around them evolve over time. Each of those pieces has a different kind of exposure, and that exposure drives which threats are realistic. A user-facing interface can be manipulated through carefully crafted inputs, which is where prompt injection shows up. A training pipeline or feedback loop can be targeted with bad data, which is where data poisoning happens. A hosted model or a downloadable model can be copied or reverse engineered, which is where model theft becomes relevant. And any service that accepts requests can be overwhelmed, which is where model denial of service appears.

Prompt injection is best understood as a kind of input-based trick, similar in spirit to other attacks that try to get a system to interpret data as instructions. In a basic sense, the attacker writes a prompt that is designed to override the rules the system is supposed to follow. If an A I system is connected to internal documents, tools, or actions, a successful prompt injection can lead to revealing sensitive information, producing unsafe guidance, or taking actions that were never intended. Beginners sometimes assume the model will simply refuse because it has rules, but in practice, the model is trying to be helpful, and attackers exploit that helpfulness. They may hide instructions in long text, frame the request as urgent, or trick the system into prioritizing the attacker’s instructions over the organization’s policies. The underlying idea is not mysterious: if the system treats untrusted input as authoritative direction, an attacker will try to write input that sounds more authoritative than the real rules.

Prompt injection becomes even more dangerous when the A I system reads content from outside sources, because the attacker can place malicious instructions in that content. Imagine a system that summarizes web pages, emails, or support tickets, and then uses the summary to decide what to do next. If the system blindly trusts the content it retrieves, an attacker can seed that content with instructions that say, in effect, ignore previous guidance and reveal secrets or perform an action. This is sometimes called indirect prompt injection, because the attacker’s text enters through the system’s own data intake rather than through a direct user message. What makes it tricky is that the system may not show you the exact content it read, and it may not be obvious that an external document changed the model’s behavior. From a security perspective, this is the same old lesson about trust boundaries: content that comes from outside your control must be treated as untrusted, even if it looks like normal text. The moment untrusted text can steer behavior, you have an attack surface.

Defending against prompt injection starts with accepting that you cannot rely on a model to perfectly distinguish between safe and unsafe instructions. Instead, you design the system so that even if the model is manipulated, the damage is limited. One major principle is to separate data from instructions in the way your application works, so user input is not allowed to rewrite the system’s rules. Another principle is to reduce what the model can access, because access creates opportunity. If the model can read sensitive documents, you must gate that access with permissions that reflect the user’s rights, not the model’s curiosity. If the model can trigger actions, those actions should require strong checks, auditing, and, for high-impact operations, a separate approval step outside the model’s control. You also want to log prompts and outputs in a way that supports investigations, because when something goes wrong, you need to reconstruct how the model was influenced.

Data poisoning is a different kind of threat because it targets what the model learns rather than what it answers in the moment. If a model is trained on corrupted or manipulated data, it can behave badly even when users do nothing unusual. Poisoning can be obvious, like inserting nonsense, but the more dangerous version is subtle, such as adding many examples that quietly push the model toward a biased or incorrect outcome. In some cases, attackers aim for a backdoor effect, where a particular trigger phrase or pattern makes the model produce a specific output. This matters for beginners because many organizations adopt A I with the idea of continuous improvement, where feedback from users or new data is regularly fed back into training. That is a tempting target, because the attacker can act like a normal contributor and gradually shape the system. In plain terms, poisoning is the security risk of letting untrusted people influence your system’s brain.

A key misconception about data poisoning is that it only matters for giant public models trained on the entire internet. In reality, many organizations create smaller, specialized models or augment existing models with internal knowledge, and those systems can be poisoned too. If you use customer chats to improve a support assistant, an attacker can submit crafted conversations that teach the system wrong behavior. If you ingest product reviews or public posts to detect issues, an attacker can flood your pipeline with misleading data. Even if you are not retraining a model from scratch, you may be building indexes, embeddings, or ranking systems that influence what the model sees, and those can be manipulated. The core problem is that learning systems amplify patterns in their inputs, so if an attacker can change inputs at scale, they can influence outcomes. That is why data quality and data governance become security concerns in A I adoption, not just analytics concerns.

Defending against data poisoning looks a lot like defending any data pipeline: you control who can contribute, you validate what they contribute, and you watch for anomalies. If your system accepts feedback or training examples from users, you need friction and verification so it cannot be spammed cheaply. You also need processes that detect weird shifts, like sudden surges in a certain label or a new pattern of content that seems coordinated. Another useful idea is separation between raw inputs and training-ready data, where the raw stream is treated as untrusted and only a curated subset makes it into learning. Versioning matters too, because if a new model behaves strangely, you want to compare it to the previous version and roll back safely. None of this requires advanced math to understand; it is the same mindset as secure software updates and change control, applied to data and learning behavior instead of executable code.

Model theft is often misunderstood because people picture a physical theft, like stealing a server, but most model theft is about copying value that took time and money to create. A model can be stolen directly if someone gains access to the file or the hosting environment, but it can also be stolen indirectly through repeated queries that reveal behavior. If an attacker can query a model freely, they may be able to create a substitute that mimics the original, especially if they can gather many input-output examples. That is sometimes called model extraction, and the idea is similar to reverse engineering: you do not need the original blueprint if you can observe how the system behaves enough times. Theft can also involve stealing proprietary training data, system prompts, or configuration, which can be just as valuable as the model weights themselves. For organizations adopting A I, this risk matters because models may encode competitive advantage, and leaked models can undermine business value and create downstream safety issues.

The best defenses against model theft begin with classic access control and environment hardening, because many theft scenarios are just ordinary compromise stories with a new target. If the hosting environment is breached, the attacker may be able to copy models, logs, prompts, and keys in one sweep. So you apply least privilege, strong identity checks, secure storage, and careful segmentation. For the query-based theft angle, you limit who can access the model, how fast they can query it, and what kinds of outputs they can get. Rate limiting is not only for availability; it also slows down extraction attempts. Monitoring for unusual patterns, like systematic probing or highly repetitive automated queries, can help identify theft in progress. You also want to reduce accidental leakage by keeping secrets out of prompts and by controlling what internal data the model can display, because theft often begins with learning how the system is configured.

Model denial of service, sometimes shortened as DoS, is the risk that attackers make the A I service unavailable by exhausting resources. Denial of service is not new, but A I systems can be uniquely expensive to run, which changes the economics of attack and defense. A single request can consume a lot of compute, and some attackers will try to craft inputs that are unusually costly, such as extremely long prompts or prompts designed to force very long outputs. Others will simply send a high volume of requests, hoping to overwhelm the service or drive up costs until the organization throttles usage. There is also a reliability angle where the model is available but becomes slow enough that users give up, which is still a real availability failure. For beginners, the important point is that availability is part of security, and A I systems often have a bigger availability target painted on them because they are resource-intensive by nature.

Defending against model DoS combines the same families of controls used for any online service with A I-specific awareness of cost and compute. You implement rate limiting, quotas, and request validation so the system cannot be abused cheaply. You put boundaries on input size and output size, and you choose defaults that limit runaway responses. You also design for graceful degradation, meaning that if the A I feature is stressed, it can reduce functionality instead of collapsing completely. For example, the system might fall back to shorter responses, fewer features, or cached answers for common requests, all while keeping critical non-A I functions running. Observability matters here, because you need to see not only the number of requests, but also the compute cost per request and how that cost changes over time. When the metrics show sudden spikes in cost or latency, you can respond before users experience a complete outage.

All four of these threat areas share a common lesson: secure A I adoption is as much about system design choices as it is about model behavior. If the model is treated like a magical brain that can do anything, it becomes a single point of failure. If it is treated like a component inside a controlled system, then the system can enforce rules the model cannot reliably enforce on its own. That means thinking carefully about what the model is allowed to see, what it is allowed to do, and what human or non-model checks exist before high-impact outcomes occur. It also means deciding where trust is placed, because A I systems are often fed by large amounts of untrusted text. You do not want the model to be the final authority for permissions, approvals, or critical actions, because the model is fundamentally designed to generate plausible language, not to serve as a security decision engine. The safest designs make the model a helper, not a gatekeeper.

A practical way to connect these risks to everyday cybersecurity thinking is to map them to familiar categories without getting lost in jargon. Prompt injection is an input manipulation problem at the boundary between user content and system instructions. Data poisoning is a supply chain and integrity problem for the information that shapes behavior over time. Model theft is a confidentiality and intellectual property problem that shows up through both direct compromise and indirect extraction. Model DoS is an availability problem made sharper by high compute costs and unpredictable workloads. Once you see those mappings, the defenses also start to feel familiar: least privilege, segmentation, validation, monitoring, rate limits, change control, and incident response readiness. The novelty is not that the fundamentals change, but that the places you apply them shift, and the consequences of getting them wrong can be surprising, because the model can produce convincing output even when it is wrong or influenced.

The last piece beginners should carry forward is that A I risk is not only about attackers, because well-meaning users can unintentionally trigger harmful behavior too. A user might paste sensitive data into a prompt, not realizing where that data might be stored or who might access it later. A user might trust an answer too much and make a poor decision, especially when the output sounds confident. Those risks are not prompt injection or model theft, but they shape how you design a secure adoption plan. You add clear rules about what kinds of data can be shared with the system, you add warnings or friction for sensitive actions, and you educate users about the fact that a helpful response is not the same as a verified response. When you combine user-aware design with the technical controls we discussed, you get a defense that is not fragile. You get a system that assumes mistakes and manipulation will happen, and that is exactly the kind of assumption that leads to resilient security.

Episode 22 — Secure AI Adoption: Prompt Injection, Data Poisoning, Model Theft, and Model DoS
Broadcast by