How to Build a Threat Hunting Capability in AWS
The infrastructure has been built, a patching plan has been developed, firewalls have been locked down and monitored, assets are being managed, and the security operations center is responding to alerts from security sensors. Once this basic security hygiene is in place, the threat hunting team needs to evaluate the infrastructure for threats and undetected breaches. Because infrastructures are complex and have many moving parts, teams need a plan to manage data from all operating systems, networking tools, and custom applications. They also need to know which threats to look for, how to prioritize them, and where to begin hunting for them. Cloud environments bring their own complexity and peculiarities to threat hunting. In elastic environments, a system that showed signs of a threat on Friday may already be terminated by Sunday. Using cloud services likely means relying on platform-specific data. In the cloud, teams must also consider the management plane, as well as web apps, virtual machines, and databases.
Approach to Threat Hunting
Threat hunting is more of an art than a science, since different organizations can approach and implement it differently while still being correct. Organizations build and operate their infrastructure in different ways, and their teams are made up of different skill sets, talents, and goals. The goal of threat hunting is to approach security from a different perspective. The security operations center (SOC) receives alerts from various security products, including antivirus scans, email security solutions, vulnerability scans, firewall alerts, IDS/IPS, and login failures. If a scan finds that a production server has a critical vulnerability, the SOC creates a ticket for the server administration team. This interaction is driven by a security product alerting on a strong indicator, and the response is clear: the workload must be patched.
Threat hunting starts from a hypothesis instead: “An attacker may target our main web application because it is exposed to the internet. Let’s see if we can determine that.” Or perhaps a weak indicator raises suspicions: there have been multiple failed SQL injection attempts in a row, or the web server's performance has slowed down. Many scenarios between these two can be considered threat hunting. When a security service raises a strong indicator, a process is already in place to address the issue. Threat hunting involves looking for anomalous behaviors without strong indicators. The outcome is likely to be unknown, the investigation murky, and the process research intensive. To maximize the effectiveness of the team, it is crucial to build a threat hunting process and environment.
1. Create a Hypothesis:
In order to set priorities, determine the organization’s most valuable and at-risk areas.
2. Investigate with Tools and Techniques:
In order to accomplish this, data must be collected, understood, analyzed, and viewed comprehensively. It is also necessary for threat hunters to pivot through different types of logs and explore unstructured or partially structured data. AWS provides several tools to assist in this process.
Logs can be pulled from most systems into Amazon CloudWatch. (Figure: the log pathway into Amazon CloudWatch.)
3. Uncover Patterns and TTPs:
It is important that the team be involved in the threat modeling process, helping the architecture and operations teams to identify and evaluate the cloud infrastructure that needs to be secured and evaluated. By improving monitoring, reducing chaotic deployments, and segmenting infrastructure, threat hunting can be made easier without sacrificing operational capabilities.
4. Inform and enrich with analytics:
The threat hunting team must strike the right balance between collecting as much data as possible and collecting only what it can store, understand, and analyze. AWS CloudTrail logs record API calls, VPC Flow Logs record network connections in and out of a VPC, and Amazon S3 access logs record requests made to buckets. The team can then concentrate on locating information gaps and working out how to fill them, guided by the attacker’s techniques.
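As a minimal sketch of working with one of these data sources, the following parses VPC Flow Logs records in the default (version 2) space-separated format and counts rejected connections per source address. The function names and the idea of ranking sources by REJECT count are illustrative assumptions, not a prescribed method; custom flow log formats would need a different field list.

```python
from collections import Counter
from typing import Dict, Iterable, Optional

# Field order of the default (version 2) VPC Flow Logs record format.
FLOW_FIELDS = [
    "version", "account_id", "interface_id", "srcaddr", "dstaddr",
    "srcport", "dstport", "protocol", "packets", "bytes",
    "start", "end", "action", "log_status",
]


def parse_flow_record(line: str) -> Optional[Dict[str, str]]:
    """Split one flow log line into a dict, or return None if it is malformed."""
    parts = line.split()
    if len(parts) != len(FLOW_FIELDS):
        return None
    return dict(zip(FLOW_FIELDS, parts))


def rejected_by_source(lines: Iterable[str]) -> Counter:
    """Count REJECT records per source address (a weak indicator worth a look)."""
    counts: Counter = Counter()
    for line in lines:
        record = parse_flow_record(line)
        if record and record["action"] == "REJECT":
            counts[record["srcaddr"]] += 1
    return counts


# Illustrative records only; addresses come from documentation ranges.
SAMPLE_LINES = [
    "2 123456789010 eni-0a1b2c3d 203.0.113.12 172.31.16.21 44211 22 6 3 180 1418530010 1418530070 REJECT OK",
    "2 123456789010 eni-0a1b2c3d 203.0.113.12 172.31.16.21 44212 3389 6 2 120 1418530010 1418530070 REJECT OK",
    "2 123456789010 eni-0a1b2c3d 172.31.16.139 172.31.16.21 20641 443 6 20 4249 1418530010 1418530070 ACCEPT OK",
]
```

Calling `rejected_by_source(SAMPLE_LINES).most_common(10)` gives a short list of sources to pivot on in other logs.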
Tools for Analysis
A threat hunter can use a wide range of tools to analyze complex datasets from multiple sources, ranging from scripts that parse raw data to a full SIEM that provides ad hoc searching, reporting, and investigation capabilities. In most cases, the decision is made based on the complexity of the setup, the cost, and the need to scale as the team grows. In AWS, several services can be utilized and chained together with analytics and scripts.
Analyzing Logs Directly
Amazon CloudWatch makes it straightforward to monitor an AWS environment by providing basic metrics, alarms, and dashboards. AWS CloudTrail and Amazon CloudWatch can be used together to interact directly with collected data. Logs collected in Amazon CloudWatch, including those from custom applications, can be exported in several ways to Amazon S3, AWS Lambda, or Amazon Elasticsearch Service.
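A hedged sketch of interacting directly with collected logs: the helper below pages through the CloudWatch Logs `filter_log_events` API and yields every matching event. The client is passed in as a parameter (in practice it would be something like `boto3.client("logs")`), and the log group name and filter pattern in the usage note are hypothetical.

```python
from typing import Any, Dict, Iterator


def iter_matching_events(
    logs_client: Any,
    log_group: str,
    pattern: str,
    start_ms: int,
    end_ms: int,
) -> Iterator[Dict[str, Any]]:
    """Yield all events in log_group that match pattern between start_ms and end_ms.

    Times are epoch milliseconds, as the CloudWatch Logs API expects.
    """
    kwargs: Dict[str, Any] = {
        "logGroupName": log_group,
        "filterPattern": pattern,
        "startTime": start_ms,
        "endTime": end_ms,
    }
    while True:
        page = logs_client.filter_log_events(**kwargs)
        yield from page.get("events", [])
        token = page.get("nextToken")
        if not token:
            break
        # Ask for the next page of results on the following call.
        kwargs["nextToken"] = token
```

For example, `iter_matching_events(client, "/app/web", "ERROR", start_ms, end_ms)` would stream error lines from a hypothetical `/app/web` log group so a hunter can pivot on them without exporting the whole group first.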
AWS offers another service, Amazon Athena, for running SQL queries against data stored in Amazon S3 buckets. Customers create virtual tables that organize and format the log data held in the bucket objects. Ensuring that the data is formatted and managed takes time. Amazon GuardDuty, a managed threat detection service, provides a growing number of finding types that detect adversary behavior and alert the customer. Among other sources, Amazon GuardDuty analyzes Amazon VPC Flow Logs to evaluate potentially malicious behavior. AWS Lambda, Amazon Kinesis, Amazon S3, Amazon Athena, and Amazon QuickSight can be combined to create a similar near-real-time VPC flow log analysis engine.
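To illustrate the Athena approach, here is a minimal sketch that builds a query for the most frequently rejected source addresses and submits it through the Athena `start_query_execution` API. The table name, the `day` partition column, and the `srcaddr` and `action` column names are assumptions about how the virtual table was defined; the Athena client is injected (for example `boto3.client("athena")`).

```python
import re
from datetime import date
from typing import Any

_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def build_top_rejected_sql(table: str, day: str, limit: int = 20) -> str:
    """Build a query for the top rejected sources on one day (ISO date string)."""
    if not _IDENTIFIER.match(table):
        raise ValueError(f"unsafe table name: {table!r}")
    day_value = date.fromisoformat(day).isoformat()  # raises ValueError if invalid
    return (
        "SELECT srcaddr, COUNT(*) AS rejects "
        f"FROM {table} "
        f"WHERE action = 'REJECT' AND day = '{day_value}' "
        f"GROUP BY srcaddr ORDER BY rejects DESC LIMIT {int(limit)}"
    )


def start_hunt_query(
    athena_client: Any, sql: str, database: str, output_s3_uri: str
) -> str:
    """Submit the query and return its execution ID for later result retrieval."""
    response = athena_client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_s3_uri},
    )
    return response["QueryExecutionId"]
```

Validating the table name and date before interpolating them keeps an ad hoc hunting script from becoming an injection risk of its own.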
SIEMs in the Cloud
As a threat hunting team develops a corpus of analytics it wants to run repeatedly, or as its investigation, monitoring, and reporting needs increase, a full SIEM may be of interest. There are several cloud-specific services as well as traditional on-premises SIEMs that work with cloud infrastructure. Threat hunting teams should focus on developing and managing a tactical SIEM, which may be different from the one used by the security operations center. The tactical SIEM is likely to contain unstructured data, have a shorter retention policy than the SOC’s SIEM, and be able to show how the infrastructure looked recently. With pay-per-use pricing, a cloud-hosted SIEM can be cost-effective, provided good data management strategies are in place. In general, free or open source solutions require more time and expertise to set up and maintain, but they are more customizable and cost little or nothing to license.
A commercial solution may cost more, but it may offer better support, easier access to purpose-built connectors, and more reporting options. Elasticsearch, a favorite of the open source community, has a significant user base and offers plug-ins for importing and translating data, with Kibana for displaying it. AWS offers a managed Amazon Elasticsearch Service that makes it easy to set up and manage a search engine. Elastic has also released an app called Elastic SIEM, which focuses on security operations. Other products, such as those from Sumo Logic and Splunk, integrate directly with AWS and provide richer and more powerful analytics. Once the tactical SIEM is set up, the data is gathered, translated, and enriched, and mechanisms for analytics and reporting are in place, the threat hunting team will discover steps, analytics, or actions that it repeats. Security Orchestration, Automation and Response (SOAR) tooling integrates with the SIEM to automate them.
Soaring with SOAR
Threat hunting involves proactive analysis of data to detect anomalous behavior that security products have not detected. As its analytics become more sophisticated, the threat hunting team may develop repeatable analytics, enrichment, or data gathering steps. If a step is repeatable and can be clearly articulated, it can be automated. In addition to leveraging the SIEM’s data storage and enrichment, SOARs understand the basics of infrastructure integration and are capable of automating playbooks. In the case of a web application, if there are several failed SQL injection attempts, the final attempt may be the last failure before a success. At that point, it would be interesting to investigate the process information from that host. With a SOAR, it is possible to identify that final SQL injection failure, tag it, and also tag the process log information. In the next step of the playbook, the logs could be moved to a separate Amazon S3 bucket for easier analysis. A malware signature API could then be used to check the process logs and determine whether each process is well known. By gathering potentially relevant logs and automating enrichment when necessary, threat hunters can avoid tedious and repetitive tasks and speed up the triage process.
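The "final failure in a run" idea can be sketched as follows. This is only one possible interpretation: failed SQL injection attempts are grouped per target host into runs separated by quiet gaps, and the last attempt of each sufficiently long run is returned for tagging. The `SqliAttempt` record, its fields, and the thresholds are hypothetical; real WAF logs would need to be mapped onto them.

```python
from typing import Dict, Iterable, List, NamedTuple


class SqliAttempt(NamedTuple):
    timestamp: float  # epoch seconds of the blocked request
    host: str         # targeted web server (hypothetical field)
    source_ip: str    # client address reported by the WAF


def final_failures(
    attempts: Iterable[SqliAttempt],
    min_attempts: int = 3,
    max_gap_s: float = 60.0,
) -> List[SqliAttempt]:
    """Return the last failed attempt of each run of >= min_attempts per host.

    A run ends when more than max_gap_s seconds pass without another attempt.
    """
    by_host: Dict[str, List[SqliAttempt]] = {}
    for attempt in attempts:
        by_host.setdefault(attempt.host, []).append(attempt)

    results: List[SqliAttempt] = []
    for host_attempts in by_host.values():
        host_attempts.sort(key=lambda a: a.timestamp)
        run: List[SqliAttempt] = []
        for attempt in host_attempts:
            if run and attempt.timestamp - run[-1].timestamp > max_gap_s:
                if len(run) >= min_attempts:
                    results.append(run[-1])
                run = []
            run.append(attempt)
        if len(run) >= min_attempts:
            results.append(run[-1])
    return sorted(results, key=lambda a: a.timestamp)
```

The timestamp and host of each returned attempt give the playbook a point in time from which to start collecting and tagging process logs.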
A SIEM with a SOAR could significantly speed up analysis. Beyond running playbooks, data pushed to the SIEM and SOAR, such as SQL injection detection logs from a WAF, can be used to initiate a response. By using a host-based tool such as osquery, the SIEM could reach out to the suspect web server in near real time and pull the process list, rather than pulling it hourly. This automated response action reduces the amount of passive data that needs to be managed and makes it easier to correlate the returned process logs with the suspicious SQL injection attempts. In another use case, a SIEM/SOAR can detect the first time an Amazon EC2 instance reads from a given Amazon S3 bucket. Using AWS services such as Amazon Inspector and AWS Systems Manager, the SOAR playbook interacts directly with the EC2 instance to gather fresh process information and launch an Amazon Inspector assessment. The system then collects all these reports and provides them to security analysts in a single artifact bucket, and it creates a high-priority message in the corporate chat system or sends an SMS message to on-call personnel. More sophisticated SOAR playbooks can also be built with Palo Alto’s Demisto and Splunk’s Phantom, which can detect cascading anomaly triggers and start automated remediation.
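A minimal sketch of the first-time-read detection, assuming CloudTrail S3 data events have already been loaded as dictionaries. The `eventName`, `userIdentity.arn`, and `requestParameters.bucketName` keys follow the CloudTrail record layout; the function name, the baseline set of known pairs, and the choice to watch only `GetObject` are illustrative assumptions.

```python
from typing import Any, Dict, FrozenSet, Iterable, List, Set, Tuple

Pair = Tuple[str, str]  # (principal ARN, bucket name)


def first_time_reads(
    events: Iterable[Dict[str, Any]],
    seen: Set[Pair],
    read_events: FrozenSet[str] = frozenset({"GetObject"}),
) -> List[Dict[str, Any]]:
    """Return read events whose (principal, bucket) pair is not yet in seen.

    The seen set is updated in place so it can serve as a rolling baseline.
    """
    new_reads: List[Dict[str, Any]] = []
    for event in events:
        if event.get("eventName") not in read_events:
            continue
        principal = (event.get("userIdentity") or {}).get("arn")
        bucket = (event.get("requestParameters") or {}).get("bucketName")
        if not principal or not bucket:
            continue  # skip records that lack the fields we pivot on
        pair = (principal, bucket)
        if pair not in seen:
            seen.add(pair)
            new_reads.append(event)
    return new_reads
```

Each returned event is a candidate trigger for the playbook; the baseline would normally be seeded from a period of known-good activity.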
SOAR Playbook Use Case
An attacker performs several SQL injection attacks against a particular EC2 instance. The SOAR starts collecting process listings and tags all logs from that instance with a unique identifier. Logs with the unique identifier then show a failed Amazon S3 bucket listing attempt. Because the application is automated and already knows which bucket it uses, a listing attempt is abnormal. The SOAR recognizes that this failed bucket listing occurred on an instance that was already being triaged. Since the organization uses auto-scaling, the SOAR notifies the auto-scaling system to deregister the instance (i.e., pull it out of service but keep it running). After the deregistration is complete, the SOAR playbook removes all security groups except the triage group, effectively isolating the instance from all other systems. Next, the SOAR takes a snapshot of the instance's volumes, dumps its memory, and stops it. By the time the security team is ready to investigate, all the data has been gathered and prepared in an Amazon S3 bucket.
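The isolation steps of this playbook could be sketched roughly as below. This is an assumption-laden outline, not the playbook of any particular SOAR product: the clients are injected (for example `boto3.client("autoscaling")` and `boto3.client("ec2")`), the group, instance, security group, and volume identifiers are hypothetical inputs, and memory capture is left out because it requires a host-based tool rather than an AWS API call. EBS snapshots capture disk contents only.

```python
from typing import Any, List, Sequence


def isolate_instance(
    asg_client: Any,
    ec2_client: Any,
    instance_id: str,
    asg_name: str,
    triage_sg_id: str,
    volume_ids: Sequence[str],
) -> List[str]:
    """Pull an instance out of service, isolate it, snapshot it, and stop it.

    Returns the IDs of the EBS snapshots that were started.
    """
    # 1. Detach from the Auto Scaling group; the instance keeps running and
    #    the group launches a replacement because capacity is not decremented.
    asg_client.detach_instances(
        InstanceIds=[instance_id],
        AutoScalingGroupName=asg_name,
        ShouldDecrementDesiredCapacity=False,
    )

    # 2. Replace every security group with the single triage group.
    ec2_client.modify_instance_attribute(
        InstanceId=instance_id, Groups=[triage_sg_id]
    )

    # 3. Snapshot each attached volume for later disk forensics.
    #    (A memory dump would run here via a host-based tool; omitted.)
    snapshot_ids: List[str] = []
    for volume_id in volume_ids:
        response = ec2_client.create_snapshot(
            VolumeId=volume_id,
            Description=f"triage snapshot of {instance_id}",
        )
        snapshot_ids.append(response["SnapshotId"])

    # 4. Stop (not terminate) the instance so its volumes are preserved.
    ec2_client.stop_instances(InstanceIds=[instance_id])
    return snapshot_ids
```

Keeping each step as a separate API call mirrors how a playbook would log and, if necessary, retry the individual actions.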
We are still in the early stages of threat hunting, particularly in cloud environments. Serverless, event-driven architectures that rely on native cloud services are replacing traditional server-based infrastructure. This new infrastructure landscape will require threat hunters to adapt their processes, tools, and techniques to identify and neutralize threats. The purpose of threat hunting is to discover advanced attacker techniques that have evaded detection by deployed security products. Continuous learning is essential to the threat hunting process. It requires an understanding of attacker techniques and your organization’s attack surface. It is crucial that the right data is collected, enriched, and available to the tools the threat hunting team uses to find anomalies in an ever-changing infrastructure. The threat hunting process is constantly evolving and adapting to new learnings, increasing experience, and changing threat landscapes.
Author – Likhit PG
From MSOC team