Threat Hunting with PyShark: Using Open Source Python Libraries to Automate Threat Hunting

Automate Your Network Threat Hunting: A Practical Guide to PyShark

Ever feel like you’re drowning in network traffic, trying to pinpoint that one suspicious packet? Manually sifting through gigabytes of data with Wireshark is powerful, but it’s not always the most efficient way to hunt for threats, especially when you need to do it repeatedly or at scale. What if you could bring the power of Python to your network analysis?

That’s where PyShark comes in. If you’re looking to supercharge your threat hunting program, this handy open-source tool might just become your new best friend. Today, we’ll dive into how you can use PyShark to automate network-based threat hunting, making your investigations faster and more effective.

What is PyShark?

Simply put, PyShark is a Python wrapper for tshark. For those unfamiliar, tshark is the command-line interface (CLI) counterpart to the ever-popular Wireshark. PyShark uses tshark’s XML (PDML) export under the hood, allowing you to programmatically access packet data within your Python scripts. If you’ve used tshark’s Elasticsearch-oriented JSON output (-T ek), you’ll appreciate how PyShark taps into similarly detailed information, but with the flexibility of Python.

One of the coolest things about PyShark is its versatility. You can use it for:

  • Live Traffic Analysis: Write Python scripts to listen to network interfaces in real-time.
  • Captured Data Analysis: Process pre-existing packet capture (PCAP) files.

While there might have been minor bugs in the past with specific filters (like Berkeley Packet Filters, or BPF) on pre-captured data, for the most part, PyShark offers functionally equivalent capabilities whether you’re looking at live streams or static files.

Speaking of filters, PyShark supports both capture filters (BPF) and display filters. This is crucial. On a busy network, capture filters let you tell Wireshark (and thus tshark/PyShark) to ignore irrelevant packets before they’re even processed, saving precious system resources. Display filters, on the other hand, work on the data already captured, much like when you type a filter into the Wireshark GUI’s filter bar. That same powerful filtering logic is accessible right within your Python code via PyShark.

You can find the official PyShark repository and excellent documentation on GitHub to explore further.

Getting PyShark Up and Running

Installation is a breeze, thanks to pip. Just open your terminal and type:

 
pip install pyshark
# Or if you're using Python 3 specifically
pip3 install pyshark

Important Prerequisite: tshark must already be installed on your system for PyShark to work. PyShark will typically find tshark if it’s on your system’s PATH. However, you can also point PyShark to a specific tshark executable if needed.
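
Before running any of the scripts in this guide, it can help to confirm that tshark is actually discoverable. The following is a minimal sketch using only the standard library. The helper name find_tshark is made up for illustration, and the commented-out line assumes you want to pass the location explicitly through the tshark_path argument of PyShark's capture classes.

```python
import shutil


def find_tshark(executable="tshark"):
    """Return the full path to tshark if it is on PATH, otherwise None."""
    return shutil.which(executable)


tshark_location = find_tshark()
if tshark_location is None:
    print("tshark not found on PATH; install Wireshark/tshark first")
else:
    print(f"tshark found at: {tshark_location}")
    # If PyShark cannot locate tshark on its own, the path can be passed
    # explicitly (assumed usage, shown here as a comment only):
    # pyshark.FileCapture('my_capture.pcap', tshark_path=tshark_location)
```

If the check prints "not found", fix the installation or PATH before debugging anything in your own script.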

Sniffing Live Traffic with PyShark

Ready to see some live action? Capturing live network data with PyShark is surprisingly straightforward. Here’s a basic example:

import pyshark

# Start a live capture on the 'eth0' interface
# You can also add BPF or display filters here
capture = pyshark.LiveCapture(interface='eth0')

# Sniff for a specific number of packets (e.g., 5 packets)
for packet in capture.sniff_continuously(packet_count=5):
    print(f"Just captured a packet: {packet}")
    # Add your packet processing logic here

In this snippet, LiveCapture sets up the listening interface. The sniff_continuously() method then iterates through packets as they arrive. You can specify packet_count to limit how many packets are processed, or omit it to sniff indefinitely (until you stop the script). This is also where you’d pass your bpf_filter or display_filter arguments to LiveCapture.
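
To make the filter arguments concrete, here is a hedged sketch that combines a BPF capture filter with a display filter. The helper names build_bpf and sniff_filtered, the interface name, and the example hosts are all illustrative assumptions, not part of PyShark. PyShark is imported inside the function so the filter builder can be used on its own.

```python
def build_bpf(hosts, port=None):
    """Build a BPF capture-filter string for some hosts and an optional TCP port."""
    host_expr = " or ".join(f"host {h}" for h in hosts)
    if port is None:
        return host_expr
    if not host_expr:
        return f"tcp port {port}"
    return f"tcp port {port} and ({host_expr})"


def sniff_filtered(interface, hosts, port=80, packet_count=5):
    """Capture a few matching packets from a live interface and return them."""
    import pyshark  # lazy import: requires PyShark and tshark to be installed

    capture = pyshark.LiveCapture(
        interface=interface,
        bpf_filter=build_bpf(hosts, port),  # capture filter: drops packets early
        display_filter="http.request",      # display filter: applied after dissection
    )
    try:
        return list(capture.sniff_continuously(packet_count=packet_count))
    finally:
        capture.close()


# Example (hypothetical interface and host; needs capture privileges):
# packets = sniff_filtered("eth0", ["192.168.2.137"], port=80)
```

Keeping the BPF string in its own function makes it easy to print and sanity-check the filter before starting a capture.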

Analyzing Captured Data (PCAPs) with PyShark

Got a PCAP file you need to dissect? PyShark handles that with similar ease. This is incredibly useful for offline analysis, incident response, or testing your hunting scripts.

import pyshark

# Open a PCAP file
capture = pyshark.FileCapture('my_capture.pcap')

# Access a specific packet (e.g., the first packet)
print(f"The first packet in the file is: {capture[0]}")

# Iterate through the packets in the PCAP
# Be mindful of memory with very large PCAP files!
MAX_PACKETS = 10000  # Example cap to prevent overconsumption; tune for your system
for packet_num, packet in enumerate(capture):
    if packet_num >= MAX_PACKETS:
        print("Reached packet limit for this iteration, consider refining filters or iterative processing.")
        break  # Or use capture.apply_on_packets(your_function, timeout=100) for large files
    print(f"Processing packet #{packet_num}")
    # Your analysis logic here

capture.close() # Good practice to close the file handle
 

When working with FileCapture, you can directly access packets by their index (e.g., capture[0]). Looping through the capture object lets you process each packet. A word of caution: by default, FileCapture keeps the packets it has already parsed in memory, so memory use grows as you work through a massive PCAP. Consider passing keep_packets=False when you only need a single pass (indexing such as capture[0] is then unavailable), or using more specific display filters within FileCapture to limit how many packets are parsed in the first place.
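
As a rough illustration of single-pass processing, the sketch below tallies each packet's highest protocol layer without holding on to the packets themselves. The function names count_highest_layers and summarize_pcap are invented for this example. It assumes FileCapture's keep_packets option and the packet attribute highest_layer behave as in current PyShark releases.

```python
from collections import Counter


def count_highest_layers(packets, limit=None):
    """Tally each packet's highest protocol layer, stopping after `limit` packets."""
    counts = Counter()
    for index, packet in enumerate(packets):
        if limit is not None and index >= limit:
            break
        counts[packet.highest_layer] += 1
    return counts


def summarize_pcap(path, limit=None):
    """Stream a PCAP once, without retaining packets, and return protocol counts."""
    import pyshark  # lazy import: requires PyShark and tshark to be installed

    # keep_packets=False asks PyShark not to retain packets it has already yielded
    capture = pyshark.FileCapture(path, keep_packets=False)
    try:
        return count_highest_layers(capture, limit=limit)
    finally:
        capture.close()


# Example (hypothetical file name):
# print(summarize_pcap("my_capture.pcap", limit=10000).most_common(5))
```

Separating the counting logic from the capture setup also means the same function works on a live capture or on any other iterable of packets.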

Diving into Packet Data: Accessing Fields

So, you’ve got your packets loaded, either live or from a file. How do you get to the juicy bits of information inside?

Each packet object in PyShark has a layers attribute. This is a list containing all the protocol layers that tshark identified in that packet (e.g., Ethernet, IP, TCP, UDP, HTTP).

 
# Assuming 'packet' is a packet object from PyShark

print(f"Layers in this packet: {packet.layers}")

# Accessing specific fields (example for an IP packet)
if 'IP' in packet: # Check if the IP layer exists
    source_ip = packet.ip.src
    destination_ip = packet.ip.dst
    print(f"Source IP: {source_ip}, Destination IP: {destination_ip}")

if 'ETH' in packet: # Check for Ethernet layer
    source_mac = packet.eth.src
    destination_mac = packet.eth.dst
    print(f"Source MAC: {source_mac}, Destination MAC: {destination_mac}")

You can access individual fields using dot notation, like packet.ip.src or packet.eth.dst. These layer names (e.g., ip, eth, tcp, http) generally follow Wireshark’s conventions.

Crucial Tip for Robust Scripting: Always include error handling! If a packet doesn’t contain a specific layer or field you’re trying to access (e.g., trying to get packet.http.request_uri from a non-HTTP packet), your script will throw an error. Wrap your field access in try-except blocks or check for the layer’s existence (e.g., if 'HTTP' in packet:) before trying to access its attributes.
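
One way to apply this tip consistently is a small accessor that returns a default instead of raising. This is a sketch, not a PyShark feature: get_field is a hypothetical helper, and it relies on missing layers and fields surfacing as AttributeError, which is what getattr with a default value catches.

```python
def get_field(packet, layer_name, field_name, default=None):
    """Return packet.<layer_name>.<field_name>, or `default` if either is missing."""
    layer = getattr(packet, layer_name, None)
    if layer is None:
        return default  # the packet has no such layer
    return getattr(layer, field_name, default)


# Example usage inside a packet loop:
# uri = get_field(packet, "http", "request_uri")
# if uri is not None:
#     print(f"HTTP request for {uri}")
```

With a helper like this, the main loop stays readable and a single non-HTTP packet cannot crash a long-running hunt.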

Threat Hunting in Action: A Real-World PyShark Example

Let’s put this all together with a concrete threat hunting scenario. Imagine we have a large PCAP file, say the 209MB 4SICS Geek Lounge PCAP (available from Netresec or GitHub), which contains about 2.27 million packets. Our goal is to find evidence of Nmap scanning.

  1. Load PCAP with a Display Filter: We’re interested in HTTP traffic, as some Nmap probes use HTTP.
    import pyshark

    capture = pyshark.FileCapture(
        'sics_geek_lounge.pcap',
        display_filter='http'
    )
    
     
  2. Extract and Count URIs: We’ll iterate through the HTTP packets and look at the requested URIs.
    from collections import Counter
    uri_counts = Counter()
    
    for packet in capture:
        try:
            if hasattr(packet, 'http') and hasattr(packet.http, 'request_uri'):
                uri = packet.http.request_uri
                uri_counts[uri] += 1
        except AttributeError:
            # This can happen if a field is unexpectedly not present
            continue # Skip to the next packet
    
    capture.close()
    
    # Print the most common URIs
    for uri, count in uri_counts.most_common(10):
        print(f"{uri}: {count}")
    
     
  3. Identify the Indicator: Looking at the output, you might see a URI like /nice-test.php appearing frequently (e.g., 88 times in the example transcript). This specific URI is often an indicator of Nmap’s service detection phase.
  4. Pivot and Investigate: Now that we have a suspicious URI, we can re-process the PCAP (or a subset) to find which source IPs requested this URI and which destinations they targeted.

    # The URI flagged in step 3
    suspicious_uri = "/nice-test.php"
    scan_details = {}  # Maps source_ip -> set of destination IPs

    # Re-open the capture with a display filter that matches only this URI
    refined_capture = pyshark.FileCapture(
        'sics_geek_lounge.pcap',
        display_filter=f'http.request.uri == "{suspicious_uri}"'
    )
    
    for packet in refined_capture:
        try:
            if 'IP' in packet: # Ensure IP layer exists
                source = packet.ip.src
                destination = packet.ip.dst
                if source not in scan_details:
                    scan_details[source] = set()
                scan_details[source].add(destination)
        except AttributeError:
            continue
    
    refined_capture.close()
    
    for scanner_ip, targets in scan_details.items():
        print(f"Scanner IP: {scanner_ip} targeted: {', '.join(sorted(targets))}")
    
     

    This would reveal the source IP performing the scan (e.g., 192.168.2.137 in the transcript’s example) and the various internal IPs it probed.
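
Once a mapping of sources to targets exists, a simple fan-out threshold can separate likely scanners from hosts that requested the URI only once or twice. The sketch below assumes a dictionary shaped like scan_details from step 4. The function name find_scanners, the threshold of 3, and the target addresses in the usage example are arbitrary illustrations.

```python
def find_scanners(scan_details, min_targets=3):
    """Return (source_ip, target_count) pairs with at least `min_targets` targets.

    Results are ordered by target count (highest first), then by source IP.
    """
    ranked = [
        (source, len(targets))
        for source, targets in scan_details.items()
        if len(targets) >= min_targets
    ]
    return sorted(ranked, key=lambda item: (-item[1], item[0]))


# Example with made-up target addresses:
example = {
    "192.168.2.137": {"192.168.2.1", "192.168.2.2", "192.168.2.3"},
    "192.168.2.50": {"192.168.2.1"},
}
for source, count in find_scanners(example):
    print(f"{source} probed {count} distinct hosts")
```

The right threshold depends on your network, so treat the default as a starting point to tune rather than a rule.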

Amazingly, a script like this can churn through millions of packets in under a minute. While Python might not be the absolute fastest language for raw execution speed compared to compiled languages, its development speed and powerful libraries make it an excellent choice for these tasks.

Beyond the Basics: Expanding Your PyShark Hunts

This nmap detection example is just scratching the surface. You can adapt these techniques for all sorts of threat hunting:

  • Searching for Known IOCs: Look for specific IP addresses, domain names, user agents, or even payload patterns associated with exploits like EternalBlue (e.g., by inspecting SMBv1 traffic).
  • Decoding and Decrypting Data: If attackers are using simple encoding like Base64 in packet payloads, or if you somehow have decryption keys, you can implement the decoding/decryption logic directly in Python.
  • Behavioral Analysis: Build more complex logic to detect sequences of actions or anomalous protocol usage.
  • Integration and Reporting: Feed your findings into other security tools, generate JSON reports, or trigger alerts.
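
Two of these ideas, IOC matching and JSON reporting, can be sketched together in a few lines. Everything named here is an assumption for illustration: match_iocs and to_json_report are hypothetical helpers, and the "known bad" address in the usage example comes from a documentation-only range rather than real threat intelligence.

```python
import json


def match_iocs(packets, bad_ips):
    """Return sorted (src, dst) pairs where either endpoint is a known-bad IP."""
    hits = set()
    for packet in packets:
        ip_layer = getattr(packet, "ip", None)
        if ip_layer is None:
            continue  # no IPv4 layer in this packet
        src, dst = str(ip_layer.src), str(ip_layer.dst)
        if src in bad_ips or dst in bad_ips:
            hits.add((src, dst))
    return sorted(hits)


def to_json_report(hits):
    """Serialize the hits as a JSON string that other tools can consume."""
    records = [{"src": src, "dst": dst} for src, dst in hits]
    return json.dumps({"hit_count": len(records), "hits": records}, indent=2)


# Example usage (hypothetical file name and IOC list):
# import pyshark
# capture = pyshark.FileCapture("my_capture.pcap", keep_packets=False)
# print(to_json_report(match_iocs(capture, {"203.0.113.5"})))
# capture.close()
```

Because the matcher only needs objects with an ip layer exposing src and dst, it can be exercised with simple stand-in objects before being pointed at a real capture.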

Why PyShark Should Be in Your Threat Hunting Toolkit

PyShark empowers you to automate network analysis and threat hunting in ways that manual inspection simply can’t match. By combining the packet-parsing prowess of tshark with the scripting flexibility of Python, you can:

  • Process vast amounts of network data efficiently.
  • Create custom, reusable hunting scripts tailored to specific threats.
  • Integrate network analysis into broader security automation workflows.

It’s a fantastic tool that we find incredibly useful, and hopefully, this guide gives you a solid starting point to incorporate it into your own security operations.
