Episode 55 — Analyze Monitoring Data Like a Defender: SIEM Parsing, Retention, Baselines, Correlation
In this episode, we’re going to build the kind of calm, practical thinking that turns a pile of security data into a clear defensive picture. New learners often imagine monitoring as a screen that simply tells you what is bad, but real monitoring data is messy, incomplete, and full of normal behavior that can look suspicious at first glance. The defender’s job is to separate signal from noise without getting tricked by coincidences or distracted by harmless spikes. That requires understanding how the data gets collected, how it gets interpreted, and how decisions are made when information is missing. We’ll focus on Security Information and Event Management (S I E M) as the place where many organizations centralize monitoring, and we’ll connect that to four essential skills: parsing events correctly, keeping data long enough to learn from it, building baselines of normal behavior, and correlating related activity into a story. If you can do these four things well, you stop reacting to alerts and start reasoning like a defender.
Before we continue, a quick note: this audio course is a companion to our course companion books. The first book is about the exam and provides detailed guidance on how best to pass it. The second book is a Kindle-only eBook that contains 1,000 flashcards that can be used on your mobile device or Kindle. Check them both out at Cyber Author dot me, in the Bare Metal Study Guides Series.
Once you move beyond the idea of a single alert, you start seeing monitoring as a chain where each link can distort the truth. Data begins at sources like endpoints, servers, identity systems, network devices, and cloud services, and it travels through collectors, forwarders, and pipelines before it ever appears in a dashboard. Along the way, fields can be dropped, timestamps can be transformed, and event formats can be changed in ways that make later analysis harder. Beginners sometimes assume the S I E M sees everything, but the S I E M only sees what is sent to it, and it only understands what it can parse reliably. That means analysis starts with a reality check: what sources exist, what sources are missing, and what the data actually contains. When defenders skip that step, they may confidently interpret a pattern that is actually an artifact of incomplete collection. Good analysis is therefore grounded, because it respects the limitations of evidence.
Parsing is the step where raw events become structured information, and it is far more important than it sounds. A log line that is just text is hard to search, hard to aggregate, and hard to correlate, but a parsed event can be treated like a record with clear fields such as user, host, source address, action, and outcome. When parsing is wrong, analysis becomes unreliable, because you might think an event happened on one system when it happened on another, or you might think a user performed an action when the field actually refers to something else. Parsing issues also create false confidence, because the dashboard still shows charts, but the charts are built on misread data. Beginners often assume that if data is present, it is usable, yet unusable data is very common in monitoring. Defenders care about parsing because it directly affects what can be searched, what can be counted, and what can be linked together. If you want your conclusions to be defensible, you need your parsing to be dependable.
A practical way to understand parsing is to imagine sorting a giant pile of mail where some envelopes are addressed clearly and others are smudged or misprinted. If the sorter misreads addresses, letters go to the wrong place, and later you cannot trust the delivery report. In monitoring, misparsed identity fields can cause the same kind of confusion, especially when usernames appear in different formats across systems. Misparsed timestamps can be even worse, because time is how you reconstruct a sequence, and sequence is how you distinguish cause from coincidence. Another common parsing problem is collapsing multiple values into one field or splitting one value into multiple fields, which can make searches miss what you think they should match. Defenders learn to spot parsing trouble by noticing patterns that don’t make sense, like impossible locations, strange device names, or counts that shift wildly when one data source changes format. When you treat parsing as a first-class security control rather than a technical footnote, your investigations become faster and far less error-prone.
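To make the mail-sorting analogy concrete, here is a minimal parsing sketch in Python. The log format, field names, and hostname are invented for illustration; real vendor formats vary widely, and a production pipeline would handle many formats. The key habits it shows are the ones described above: refuse to guess when a line does not parse, normalize identity casing so the same user correlates across systems, and normalize timestamps to UTC so sequences can be trusted.

```python
import re
from datetime import datetime

# Hypothetical raw auth log line; real formats differ by vendor and version.
RAW = "2024-03-01T22:14:09Z host=web-01 user=ALICE@CORP src=10.0.4.7 action=login outcome=failure"

# Named groups keep the parsing explicit and easy to audit field by field.
PATTERN = re.compile(
    r"(?P<ts>\S+)\s+host=(?P<host>\S+)\s+user=(?P<user>\S+)\s+"
    r"src=(?P<src>\S+)\s+action=(?P<action>\S+)\s+outcome=(?P<outcome>\S+)"
)

def parse_event(line):
    """Turn one raw line into a dict of normalized fields, or None if unparsable."""
    m = PATTERN.match(line)
    if m is None:
        return None  # never guess: an unparsed line is itself a data-quality signal
    event = m.groupdict()
    # Normalize identity casing so 'ALICE@CORP' and 'alice@corp' correlate later.
    event["user"] = event["user"].lower()
    # Normalize the timestamp to an aware UTC datetime for reliable sequencing.
    event["ts"] = datetime.fromisoformat(event["ts"].replace("Z", "+00:00"))
    return event

event = parse_event(RAW)
print(event["user"], event["outcome"])  # alice@corp failure
```

Notice that a failed parse returns None rather than a half-filled record; counting those failures per source is a cheap way to detect the format-change problems described above before they poison your dashboards.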
The next idea, retention, is about how long you keep monitoring data and what kind of detail you keep during that time. Retention matters because many attacks are not discovered immediately, and the earliest signs may be weeks or months before anyone realizes something is wrong. If you only keep a short window of logs, you may see the end of an incident but miss the beginning, which makes it harder to understand what was compromised and what should be fixed. Beginners sometimes assume more retention is always better, but retention choices involve cost, privacy considerations, and operational capacity. A realistic retention strategy balances the need to investigate and learn against the limits of storage and search performance. It also considers what data is most valuable, because keeping high-detail logs for the right sources can be more useful than keeping low-detail logs for everything. Defenders care about retention because it determines whether they can tell a complete story under pressure, and whether they can validate claims about when an attacker first arrived.
Retention also affects your ability to build baselines, because baselines depend on enough history to understand what normal looks like. If you only have a week of data, you might mistake a normal monthly process for an anomaly simply because you haven’t seen it yet. If you only have business-hours data, you might misinterpret overnight maintenance behavior when it appears later. Good retention practices therefore support both detection and interpretation, because they provide the context that makes an unusual event truly unusual. Another subtle point is that retention includes integrity and availability of logs, meaning your retained data must be protected from tampering and must remain searchable when you need it. If attackers can delete or alter key events, retention becomes an illusion rather than a safeguard. Defenders often prefer central storage and controlled access for critical logs so that a compromised endpoint cannot quietly erase evidence. When you think like a defender, retention is not just about keeping data, it is about keeping trustworthy data long enough to matter.
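One way to make "trustworthy data" concrete is a hash chain over stored records, where each entry's digest commits to everything before it, so a quiet edit or deletion breaks verification. This is a minimal sketch of the idea, not a production log-integrity system; real deployments typically anchor the chain in separately controlled storage so a compromised endpoint cannot rewrite it.

```python
import hashlib

def chain(records):
    """Return (record, digest) pairs where each digest commits to all prior history."""
    prev = b"\x00" * 32
    out = []
    for rec in records:
        digest = hashlib.sha256(prev + rec.encode()).digest()
        out.append((rec, digest))
        prev = digest
    return out

def verify(chained):
    """Recompute the chain and confirm no record was altered or removed."""
    prev = b"\x00" * 32
    for rec, digest in chained:
        expected = hashlib.sha256(prev + rec.encode()).digest()
        if digest != expected:
            return False
        prev = digest
    return True

logs = chain(["login alice ok", "priv change alice", "file read payroll.xlsx"])
print(verify(logs))                          # True: chain intact
logs[1] = ("priv change bob", logs[1][1])    # tamper with the middle record
print(verify(logs))                          # False: tampering detected
```

The design choice worth noticing is that tampering with any one record invalidates every digest after it, which is exactly the property that makes long retention meaningful rather than an illusion.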
Baselines are the next major concept, and they are the defender’s answer to a simple question: what is normal here? A baseline is not a single number; it is a set of expectations about typical behavior, such as normal login patterns, typical data transfer volumes, typical access paths, and typical administrative activity. Beginners sometimes assume baseline means average, but average can hide important details, because real environments have cycles, bursts, and seasonal patterns. A good baseline recognizes that Monday morning looks different from Saturday night and that end-of-month processing may look different from mid-month. Baselines can also vary by role, because a help desk technician may log into many systems, while a finance user may log into only a few. The defensive value of a baseline is that it makes anomalies meaningful, because you can point to how behavior differs from established norms rather than relying on intuition. When defenders lack baselines, they are forced to treat every alert as equally urgent, which leads to fatigue and missed real threats.
Building baselines is also a lesson in humility, because it forces you to confront how much strange but legitimate behavior exists. Organizations run scans, backups, software updates, inventory checks, and maintenance scripts that can resemble attacker behavior if you only look at surface patterns. Baselines help you learn those normal processes so you can stop chasing them as threats, while still recognizing when something mimics them in an unusual way. For example, it may be normal for a system to make many connections during a nightly backup, but it may be abnormal for the same pattern to happen at midday from a different host. A baseline mindset also helps you ask better questions, such as whether an unusual event is new for the environment or simply new for you. Beginners often think defenders memorize known bad indicators, but defenders spend a lot of time learning known good behavior so they can recognize deviations. In that sense, baselines are not about ignoring activity, they are about understanding activity deeply enough to interpret it accurately.
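A tiny sketch can show how "Monday morning looks different from Saturday night" becomes code. The sample counts below are invented, and a real baseline would draw on weeks of retained history per source and per role, but the shape is the same: keep separate norms per hour of the week, and flag only what deviates from that hour's own history.

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical history: (hour_of_week, login_count) samples built from retained logs.
# hour_of_week 0 = Monday 00:00, so weekday and weekend hours get separate norms.
history = defaultdict(list)
for hour, count in [(9, 120), (9, 130), (9, 125), (57, 4), (57, 6), (57, 5)]:
    history[hour].append(count)

def is_anomalous(hour, count, k=3.0):
    """Flag counts more than k standard deviations above that hour's baseline."""
    samples = history.get(hour)
    if not samples or len(samples) < 2:
        return True  # no baseline yet: 'new for the environment', worth a look
    mu, sigma = mean(samples), stdev(samples)
    return count > mu + k * max(sigma, 1.0)  # floor sigma to damp zero-variance noise

print(is_anomalous(9, 128))   # typical weekday-morning volume: not flagged
print(is_anomalous(57, 40))   # burst at a normally quiet weekend hour: flagged
```

The same midday burst that is invisible against a global average stands out immediately against the quiet-hour baseline, which is exactly the backup-at-midday example above.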
Correlation is where monitoring turns from individual events into a narrative, and it is one of the most important skills for thinking like a defender. A single event rarely proves an attack, because many events can have benign explanations, but a sequence of related events can reveal intent and progression. Correlation means linking events by shared attributes such as user, device, source address, destination, time proximity, and action type, so you can see patterns that one log line cannot show. Beginners often expect correlation to be automatic, but good correlation depends on good parsing, consistent identifiers, and enough retained history to connect dots. Correlation also depends on choosing meaningful relationships, because linking unrelated events can create false stories that waste time. The defender’s goal is to find connections that make the explanation more plausible, not connections that simply make the chart look busy. When done well, correlation highlights attacker behavior like credential misuse, lateral movement, privilege escalation, and data access patterns, even when each step alone might look ordinary.
A helpful way to think about correlation is that attackers usually have to perform multiple actions to accomplish a goal, and those actions leave multiple traces. A suspicious login followed by an unusual privilege change followed by access to a sensitive resource is a stronger story than any one of those events alone. Correlation helps you see that story and also helps you see what is missing, such as a gap where you expected a process launch but do not have endpoint telemetry. That missing evidence becomes a clue about coverage gaps, not a reason to guess. Another important correlation idea is timing, because attackers often act quickly in bursts, and defenders can use time windows to connect related actions. At the same time, sophisticated attackers may slow down to blend in, which is why longer-term correlation across days or weeks can matter. Correlation is therefore not just a technical feature of a S I E M; it is a defensive reasoning method that links behavior into hypotheses you can test.
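The login, privilege-change, sensitive-access story above can be sketched as a time-windowed sequence check. The events, usernames, and thirty-minute window here are illustrative assumptions; a real S I E M rule would run over parsed, normalized events and tune the window to the environment, and as noted above, patient attackers may require much longer windows.

```python
from datetime import datetime, timedelta

# Hypothetical parsed events: (timestamp, user, action), already normalized upstream.
events = [
    (datetime(2024, 3, 1, 2, 10), "alice", "login"),
    (datetime(2024, 3, 1, 2, 14), "alice", "privilege_change"),
    (datetime(2024, 3, 1, 2, 19), "alice", "sensitive_access"),
    (datetime(2024, 3, 1, 9, 0),  "bob",   "login"),
]

SEQUENCE = ["login", "privilege_change", "sensitive_access"]

def correlate(events, window=timedelta(minutes=30)):
    """Return users whose actions match the suspicious sequence within one window."""
    flagged = []
    by_user = {}
    for ts, user, action in sorted(events):  # sort so the time ordering is real
        by_user.setdefault(user, []).append((ts, action))
    for user, stream in by_user.items():
        idx, start = 0, None
        for ts, action in stream:
            if action == SEQUENCE[idx]:
                start = start or ts
                if ts - start > window:  # chain took too long: restart it
                    idx, start = 0, None
                    continue
                idx += 1
                if idx == len(SEQUENCE):
                    flagged.append(user)
                    break
    return flagged

print(correlate(events))  # ['alice']
```

Note how the rule depends on everything discussed earlier: it only works because the user field is consistent across sources, the timestamps are comparable, and retention reaches back far enough to hold the whole chain.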
All of this becomes much easier when you treat the S I E M as a lens rather than a judge. The S I E M can collect, index, and search data, and it can apply rules that raise alerts, but it does not automatically understand business context or intent. Beginners sometimes assume that an alert labeled critical is truly critical, but severity labels are often based on generic rules that may not match your environment’s reality. A defender uses the S I E M to answer focused questions, such as whether a user’s behavior changed, whether a host started communicating in a new way, or whether a pattern matches a known attack technique. This is where structured thinking helps: you start with the claim, you identify what evidence would support it, and you search for that evidence across sources. If the evidence is missing, you consider whether the event never happened or whether the telemetry is incomplete. When you approach the S I E M this way, you avoid the trap of trusting dashboards blindly and instead use them as tools for disciplined inquiry.
It’s also important to recognize that not all data sources contribute equally to detection and investigation. Identity events can tell you who attempted access and whether authentication succeeded, which is foundational for many attack stories. Endpoint and server events can tell you what executed, what changed, and what persisted, which helps confirm whether access became code execution. Network observations can tell you where systems communicated, which helps reveal command-and-control behavior and unusual data movement. Application logs can tell you what actions were taken inside a business system, which is crucial for understanding impact. Beginners sometimes focus on whichever logs are easiest to collect, but defenders prioritize logs that answer the most important questions. They also prioritize consistency, because one high-quality source that is reliably parsed can outperform many low-quality sources that are sporadic and confusing. As your monitoring program grows, the skill is not collecting more at any cost, but collecting the right mix that supports correlation and reduces blind spots.
As you get more comfortable, you’ll find that good analysis involves constantly checking for alternative explanations. A spike in failed logins could be brute force activity, but it could also be a misconfigured application repeatedly trying an outdated password. A new outbound connection pattern could be malware, but it could also be a legitimate software update service that recently changed endpoints. Defenders avoid being fooled by asking what else could cause the same pattern and then looking for supporting details that differentiate the possibilities. This is where baselines and correlation work together, because baselines tell you whether the pattern is truly unusual, and correlation tells you whether the pattern is part of a broader sequence of suspicious behavior. Beginners sometimes fear making the wrong call, but the goal is not to be instantly certain, it is to be methodical and evidence-driven. A careful analyst communicates confidence levels honestly, identifies what would raise confidence, and avoids actions that cause unnecessary disruption when evidence is weak. That is how you stay both effective and credible.
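The failed-login example above lends itself to a small differentiating check. Everything here, from the IP addresses to the thresholds, is a made-up illustration, but the reasoning mirrors the paragraph: a source failing against many distinct accounts looks like spraying or brute force, while a source hammering a single account with repeated failures often turns out to be a misconfigured application with a stale password.

```python
from collections import defaultdict

# Hypothetical failed-login records: (source_ip, target_account).
failures = [
    ("10.0.9.9", "alice"), ("10.0.9.9", "bob"), ("10.0.9.9", "carol"),
    ("10.0.9.9", "dave"),  ("10.0.2.5", "svc-backup"), ("10.0.2.5", "svc-backup"),
]

def characterize(failures):
    """Summarize each source by failure pattern to suggest, not prove, an explanation."""
    targets = defaultdict(set)
    counts = defaultdict(int)
    for src, account in failures:
        targets[src].add(account)
        counts[src] += 1
    summary = {}
    for src in targets:
        if len(targets[src]) >= 3:              # many distinct accounts from one source
            summary[src] = "possible password spraying"
        elif counts[src] >= 2 and len(targets[src]) == 1:  # one account, repeated failures
            summary[src] = "likely stale credential or misconfigured app"
        else:
            summary[src] = "insufficient evidence"
    return summary

print(characterize(failures))
```

The labels are deliberately hedged: the output is a hypothesis to test against baselines and correlated events, not a verdict, which is exactly the confidence-level discipline described above.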
To conclude, analyzing monitoring data like a defender means treating logs as evidence and treating your own conclusions as hypotheses that must be supported. S I E M gives you a central place to search and connect events, but it only becomes powerful when parsing is accurate enough to trust, retention is long and reliable enough to provide context, baselines are mature enough to define normal behavior, and correlation is thoughtful enough to reveal real sequences rather than coincidences. Each of these ideas reinforces the others, because poor parsing breaks correlation, weak retention weakens baselines, and missing baselines turn every anomaly into panic. The defender’s mindset is not about memorizing alerts, but about understanding how data is created, how it can be misleading, and how to assemble it into a coherent story of what likely happened. When you practice this way of thinking, monitoring stops being a noisy firehose and starts becoming a disciplined process for seeing reality clearly. That clarity is what lets you respond faster, avoid false alarms, and spot the quiet attacks that rely on confusion.