Threat Hunting with PyShark: Using Open Source Python Libraries to Automate Threat Hunting

Threat Hunting with PyShark: A Practical Guide to Automating Network Analysis

Ever find yourself lost in a sea of packets? If you’ve spent hours scrolling through Wireshark, you know it’s a powerful tool. But manually hunting for threats in gigabytes of network traffic is slow, and it does not scale. What if you could automate the tedious parts and let Python do the heavy lifting?

Enter PyShark.

If you’re looking to elevate your threat hunting, this open-source Python library is about to become a staple in your toolkit. It bridges the gap between the raw power of command-line packet inspection and the flexible logic of Python scripting. Let’s dive in and explore how you can use PyShark to build automated, effective network threat hunts.

What is PyShark, Really?

In simple terms, PyShark is a Python wrapper for tshark, the command-line counterpart of Wireshark. PyShark parses tshark’s XML (PDML) output, so you can read and interact with packet data directly within a Python script.

This gives you two powerful ways to analyze traffic:

  • Live Traffic Analysis: Write scripts that listen to a network interface in real-time, flagging suspicious activity as it happens.
  • PCAP File Analysis: Process pre-existing packet capture (PCAP) files for incident response, forensics, or testing your hunting hypotheses.

The Power of Filters

One of PyShark’s most critical features is its support for both capture filters and display filters. Understanding the difference is key.

  • Capture Filters (BPF): These are applied before packets are processed. On a high-traffic network, this is a lifesaver. You can tell tshark (and by extension, PyShark) to ignore all the noise and only capture the packets that matter, saving significant CPU and memory resources.
  • Display Filters: These are applied to data that has already been captured. It’s the same powerful filter syntax you use in the Wireshark GUI bar, but now you can apply it programmatically in your Python code.
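To make the syntax difference concrete, here is a minimal sketch that builds both kinds of filter string from the same watchlist of addresses. The helper names `bpf_for_hosts` and `display_filter_for_hosts` are hypothetical (they are not part of PyShark), and the addresses are placeholders from the documentation ranges.

```python
def bpf_for_hosts(hosts):
    """Build a capture (BPF) filter matching traffic to or from any listed host."""
    return " or ".join(f"host {host}" for host in hosts)


def display_filter_for_hosts(hosts):
    """Build a Wireshark display filter matching the same hosts."""
    return " or ".join(f"ip.addr == {host}" for host in hosts)


watchlist = ["192.0.2.10", "198.51.100.7"]  # placeholder addresses
print(bpf_for_hosts(watchlist))             # pass as bpf_filter= to LiveCapture
print(display_filter_for_hosts(watchlist))  # pass as display_filter= to FileCapture
```

The BPF string is what you would hand to a live capture so unwanted packets never reach your script; the display filter string is what you would use when reading a PCAP that has already been recorded.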

Getting PyShark Up and Running

Installation is managed easily through pip. Open your terminal and get started.

 
# For Python 3
pip3 install pyshark

# Or if pip is linked to your desired Python version
pip install pyshark

Crucial Prerequisite: PyShark is a wrapper, not a standalone tool. You must have tshark installed on your system first. tshark is included with a standard Wireshark installation.
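If you want a script to fail early with a clear message, a quick pre-flight check along these lines can help. This is only a sketch: it looks for tshark on your PATH using the standard library, and the function names are made up for illustration. PyShark may also locate tshark in other places, so treat a miss here as a hint rather than proof.

```python
import shutil
import subprocess


def find_tshark():
    """Return the path to tshark if it is on PATH, otherwise None."""
    return shutil.which("tshark")


def tshark_version_line(path):
    """Return the first line printed by `tshark --version` (empty if none)."""
    result = subprocess.run(
        [path, "--version"], capture_output=True, text=True, check=True
    )
    lines = result.stdout.splitlines()
    return lines[0] if lines else ""


tshark_path = find_tshark()
if tshark_path is None:
    print("tshark not found on PATH; install Wireshark before using PyShark.")
else:
    print(f"Found tshark at {tshark_path}: {tshark_version_line(tshark_path)}")
```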

A Quick Guide: Installing PyShark on macOS

Getting set up on a Mac is straightforward, but a few steps can save you a headache.

  1. Install Wireshark and tshark: The best way to do this is with Homebrew. If you have an old version of Wireshark installed, it’s best to remove it first.

     
    # Install or reinstall Wireshark and its command-line tools
    brew install --cask wireshark
    

    Verify the installation by checking the tshark version.

    tshark --version
    

    If you see the version info, you’re good to go.

  2. Install Developer Tools: Some dependencies may need Apple’s command-line developer tools.

    xcode-select --install
    
  3. Fix Homebrew Permissions (If Needed): Occasionally, Homebrew might hit a “Permission denied” error when writing to its install prefix (/usr/local on Intel Macs, /opt/homebrew on Apple Silicon). If this happens, you can fix it by taking ownership of the contents of that directory.

    sudo chown -R $(whoami) $(brew --prefix)/*
    
  4. Install PyShark: Now, you can finally install the Python library itself.

    pip3 install pyshark
    

    You’re now ready to start hunting.

Sniffing Live Traffic: Your First PyShark Script

Ready to see PyShark in action? Capturing live network packets is incredibly intuitive. Here’s a basic script to get you started.

Python
import pyshark

# Start a live capture on the 'eth0' interface.
# You can add filters like bpf_filter='tcp port 80'
capture = pyshark.LiveCapture(interface='eth0')

# Sniff for the first 10 packets
for packet in capture.sniff_continuously(packet_count=10):
    print(f"Just captured a packet: {packet}")

In this example, LiveCapture sets up the listener. The sniff_continuously() method iterates through packets as they cross the wire. You can omit packet_count to sniff indefinitely until you manually stop the script.
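Printing the whole packet gets noisy fast. Here is a rough sketch of a one-line summary instead, combined with the bpf_filter option mentioned in the script's comment. The helper names are hypothetical, and the sketch assumes the packet exposes `highest_layer`, `length`, and an `ip` layer with `src` and `dst`; anything missing falls back to a placeholder.

```python
def summarize_packet(packet):
    """Return a one-line summary, tolerating packets without an IP layer."""
    ip_layer = getattr(packet, "ip", None)  # None for ARP, IPv6-only traffic, etc.
    src = getattr(ip_layer, "src", "?")
    dst = getattr(ip_layer, "dst", "?")
    protocol = getattr(packet, "highest_layer", "UNKNOWN")
    length = getattr(packet, "length", "?")
    return f"{src} -> {dst} {protocol} ({length} bytes)"


def sniff_summaries(interface="eth0", bpf_filter="tcp port 80", packet_count=10):
    """Print summaries for a few live packets (needs tshark and capture privileges)."""
    import pyshark  # imported here so summarize_packet works without PyShark

    capture = pyshark.LiveCapture(interface=interface, bpf_filter=bpf_filter)
    try:
        for packet in capture.sniff_continuously(packet_count=packet_count):
            print(summarize_packet(packet))
    finally:
        capture.close()
```

Calling `sniff_summaries()` on a machine with tshark installed should print up to ten lines of HTTP-port traffic. Swap the interface name for whatever your system uses.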

Analyzing PCAPs for Incident Response

Offline analysis is where many threat hunts begin. PyShark makes dissecting PCAP files just as easy. This is perfect for analyzing evidence from an incident or testing scripts on known-malicious traffic captures.

Python
import pyshark

# Open a PCAP file for analysis
# A display filter here is great for targeting specific traffic
capture = pyshark.FileCapture('suspicious_traffic.pcap', display_filter='dns')

# Loop through each packet in the file
for packet in capture:
    # Check if the packet has the layers we're interested in
    if 'DNS' in packet and 'IP' in packet:
        # Access fields using dot notation
        print(f"DNS Query from {packet.ip.src} for {packet.dns.qry_name}")

# It's good practice to close the capture object when done
capture.close()

A word of caution: Be mindful when working with massive PCAP files. Loading millions of packets can consume a lot of memory. Use a display_filter in FileCapture to narrow your focus to only the packets you need for your specific hunt.
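One more knob worth knowing: FileCapture accepts keep_packets=False, which tells PyShark not to hold on to packets once they have been handed to your loop. The sketch below pairs that with a single-pass counter so only the tallies stay in memory. The function names are illustrative, not part of PyShark.

```python
from collections import Counter


def top_dns_queries(packets, limit=10):
    """Count DNS query names from any iterable of packets, one packet at a time."""
    counts = Counter()
    for packet in packets:
        dns_layer = getattr(packet, "dns", None)
        name = getattr(dns_layer, "qry_name", None)
        if name is not None:
            counts[str(name)] += 1
    return counts.most_common(limit)


def stream_dns_from_pcap(path):
    """Open a PCAP so that packets are not retained after they are yielded."""
    import pyshark  # imported lazily; requires tshark to be installed

    return pyshark.FileCapture(path, display_filter="dns", keep_packets=False)


# Usage, with the file name from the example above:
# capture = stream_dns_from_pcap("suspicious_traffic.pcap")
# try:
#     for name, count in top_dns_queries(capture):
#         print(f"{count:>6}  {name}")
# finally:
#     capture.close()
```

The trade-off is that you cannot index back into the capture afterwards, so this suits hunts that only need one pass.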

Pro-Tip for Python 3.8.3 Users

If you’re using Python 3.8.3, you might encounter an asyncio error when your script finishes. This is a known quirk. The fix is simple: always explicitly close your FileCapture object with capture.close() to ensure all resources are properly released.
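It is easy to forget the close() call when a script exits early on an error. One way to make it automatic, sketched here with the standard library's contextlib.closing, is to wrap the capture so close() runs no matter how the loop ends. The `process_capture` helper is a hypothetical name.

```python
from contextlib import closing


def process_capture(capture, handle_packet):
    """Run handle_packet on every packet and always close the capture afterwards."""
    processed = 0
    with closing(capture):  # calls capture.close() even if handle_packet raises
        for packet in capture:
            handle_packet(packet)
            processed += 1
    return processed


# Usage, assuming PyShark and tshark are installed:
# import pyshark
# capture = pyshark.FileCapture("suspicious_traffic.pcap", display_filter="dns")
# total = process_capture(capture, print)
```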

Threat Hunting in Action: Finding an Nmap Scan

Let’s put it all together. Imagine we have a large PCAP file and we suspect someone was running an Nmap scan. Many Nmap probes use HTTP to identify web servers. One common indicator is a request for the URI /nice-test.php.

Let’s hunt for it.

Step 1: Find the Suspicious URI

First, we’ll iterate through all HTTP packets and count how many times each URI is requested.

Python
import pyshark
from collections import Counter

# Use a display filter to only load HTTP packets from the massive file
capture = pyshark.FileCapture('sics_geek_lounge.pcap', display_filter='http')

uri_counts = Counter()

for packet in capture:
    try:
        # The hasattr() check prevents errors if a field doesn't exist
        if hasattr(packet, 'http') and hasattr(packet.http, 'request_uri'):
            uri = packet.http.request_uri
            uri_counts[uri] += 1
    except AttributeError:
        # Skip any malformed packets
        continue

capture.close()

# Print the 10 most common URIs
for uri, count in uri_counts.most_common(10):
    print(f"URI: {uri:<25} Count: {count}")

Running this might reveal that /nice-test.php was requested 88 times. That’s our indicator!

Step 2: Pivot to Find the Scanner

Now that we have our indicator of compromise (IOC), we can pivot. Who was making these requests? Let’s run a second, more targeted query to find the source IP.

Python
import pyshark

# We already identified our suspicious URI
suspicious_uri = "/nice-test.php"

# A more specific filter makes this much faster
scanner_hunt = pyshark.FileCapture(
    'sics_geek_lounge.pcap',
    display_filter=f'http.request.uri == "{suspicious_uri}"'
)

scanners = set()
for packet in scanner_hunt:
    # Ensure the IP layer exists before we try to access it
    if 'IP' in packet:
        scanners.add(packet.ip.src)

scanner_hunt.close()

print("Potential Nmap scanner IPs found:")
for ip in scanners:
    print(ip)

This second pass would quickly reveal the source IP (e.g., 192.168.2.137) that was performing the scan. In under a minute, we’ve gone from millions of packets to a primary suspect.
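If reading the file twice is too slow, the two steps can be folded into one pass. The following is a sketch under the same assumptions as the scripts above (an `http` layer with `request_uri` and an `ip` layer with `src`); `sources_by_uri` is an illustrative name, not a PyShark function.

```python
from collections import Counter, defaultdict


def sources_by_uri(packets):
    """Map each requested URI to a Counter of the source IPs that requested it."""
    index = defaultdict(Counter)
    for packet in packets:
        uri = getattr(getattr(packet, "http", None), "request_uri", None)
        src = getattr(getattr(packet, "ip", None), "src", None)
        if uri is not None and src is not None:
            index[str(uri)][str(src)] += 1
    return index


# Usage, assuming PyShark and tshark are installed:
# import pyshark
# capture = pyshark.FileCapture("sics_geek_lounge.pcap", display_filter="http")
# try:
#     index = sources_by_uri(capture)
# finally:
#     capture.close()
# print(index["/nice-test.php"].most_common())
```

This costs a little more memory than the two-pass version, since it remembers every URI and source pair, but it lets you pivot on any URI without touching the PCAP again.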

Beyond This Example: What Else Can You Hunt?

This Nmap example is just the beginning. You can adapt these scripting techniques for countless other threat hunts:

  • Known IOCs: Hunt for IP addresses, domains, or user agents associated with malware campaigns.
  • Malware Artifacts: Search for specific protocol anomalies, like the patterns seen in EternalBlue’s SMBv1 traffic.
  • Data Exfiltration: Look for unusual DNS queries or large data payloads going to unexpected destinations.
  • Behavioral Analysis: Chain logic together to detect suspicious behaviors, like a single host rapidly failing to connect to multiple ports across a subnet.
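As a taste of the behavioral idea, here is a deliberately simplified sketch that flags hosts touching many distinct destination and port pairs. It ignores timing and whether the connections actually failed, the threshold is arbitrary, and `find_fanout_sources` is a hypothetical helper. It assumes the capture was opened with a display filter such as `tcp.flags.syn == 1 && tcp.flags.ack == 0` so that only connection attempts reach the loop.

```python
from collections import defaultdict


def find_fanout_sources(packets, min_targets=100):
    """Return {source_ip: distinct (dst_ip, dst_port) count} for noisy sources."""
    targets_by_source = defaultdict(set)
    for packet in packets:
        ip_layer = getattr(packet, "ip", None)
        tcp_layer = getattr(packet, "tcp", None)
        src = getattr(ip_layer, "src", None)
        dst = getattr(ip_layer, "dst", None)
        port = getattr(tcp_layer, "dstport", None)
        if src is None or dst is None or port is None:
            continue  # not an IPv4/TCP packet, skip it
        targets_by_source[str(src)].add((str(dst), str(port)))
    return {
        source: len(targets)
        for source, targets in targets_by_source.items()
        if len(targets) >= min_targets
    }


# Usage, assuming PyShark and tshark are installed:
# import pyshark
# capture = pyshark.FileCapture(
#     "sics_geek_lounge.pcap",
#     display_filter="tcp.flags.syn == 1 && tcp.flags.ack == 0",
#     keep_packets=False,
# )
# try:
#     print(find_fanout_sources(capture))
# finally:
#     capture.close()
```

Tune min_targets to your network; a busy proxy or scanner you run yourself will trip a low threshold.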

By combining the deep packet inspection of tshark with the versatility of Python, PyShark gives you the power to automate your network investigations, create reusable hunting scripts, and integrate network analysis directly into your larger security workflows. It’s a powerful addition to any security professional’s arsenal.
