Oct 10, 2025

100 concrete things to explore when using LLMs to assist penetration testing

Each item has: a short description, an example LLM prompt you can use immediately, and a low-level next step (command, code snippet or approach) you can run or adapt. I grouped items into categories so you can pick a pathway (recon → exploit → post-exploit → automation → defenses). I also cite a few foundational open-source tools you’ll want to integrate LLM output with. (GitHub)

Quick note on safety: use these only on systems you own or have explicit permission to test. LLMs can accelerate offensive work — use responsibly.

Reconnaissance (1–18)

Passive OSINT profile of a target domain
Prompt: “Collect open-source intelligence about example.com: subdomains, public cloud buckets, DNS history, leak pastebin hits.”
Next step: run subfinder/amass or have the LLM produce amass enum -d example.com + parsing script.
Enumerate subdomains using wordlists
Prompt: “Generate a prioritized subdomain wordlist for example.com based on its business, technologies, and country.”
Next step: feed list into massdns or dnsx for fast resolution (naabu is a port scanner, so it belongs in the scanning stage rather than here).
Fingerprint web technologies from a homepage
Prompt: “Given this HTTP response header and HTML, identify frameworks, versions and likely plugins.”
Next: LLM outputs fingerprint rules used to generate whatweb/wappalyzer signatures or custom regex.
Find exposed Git metadata (/.git)
Prompt: “Write a script that checks for /.git existence and pulls files safely (no destructive actions).”
Next step: use curl -sL http://target/.git/HEAD and the LLM’s script to fetch only files listed.
Locate cloud buckets and S3 misconfigurations
Prompt: “Given the company name and region, produce candidate S3/bucket names and a check script for public read/list.”
Next step: aws s3api head-bucket --bucket candidate --no-sign-request (so the check runs anonymously), or LLM-produced Python using boto3 with an unsigned (anonymous) client.
Profile email addresses and possible password reset flows
Prompt: “Map public email formats, password-reset URLs, and CSP/anti-CSRF protections for staff@company.”
Next step: LLM produces curl sequences to test password reset endpoints and parse tokens.
Create prioritized vulnerability checklist
Prompt: “Given technologies [Nginx, Django, Postgres], list high-probability CVEs and detection checks.”
Next step: generate nmap NSE script commands or cve-search queries.
Map network ranges from DNS and BGP
Prompt: “From these IPs and ASN, identify the likely upstream netblocks and cloud providers.”
Next step: LLM produces whois, bgpview or ipinfo API call sequences.
Harvest JS files and analyze interesting functions
Prompt: “Download JS files from /static and summarize functions that handle auth, tokens or crypto.”
Next step: LLM outputs static analysis heuristics and a Python script using requests + AST parsing with esprima or regex.
Detect API endpoints and parameters
Prompt: “Given XHR request logs, extract REST endpoints, parameters, and possible injection points.”
Next step: LLM formats Burp/OWASP ZAP scanner policy or curl calls for fuzzing.
Detect leaked credentials in commits
Prompt: “Write a program to scan discovered Git history for AWS keys, with safe false-positive filters.”
Next: LLM produces a git log --pretty parser + regex for AWS key patterns.
Create adaptive reconnaissance playbook
Prompt: “Write a reconnaissance playbook that adapts based on results (e.g., if WordPress found → run plugin enumeration).”
Next: export as YAML for an automation tool such as Ansible, or as a Makefile or a Python orchestration script.
Use LLM to triage open ports
Prompt: “Given a list of open ports from Nmap, rank them by likely exploitability and suggest exact fingerprint commands.”
Next: LLM suggests nmap -sV --script lines and prioritized exploits.
Construct targeted search engine dorks
Prompt: “Produce Google dorks / Bing queries to find documents, exposed dashboards, or backup files for example.com.”
Next: LLM returns a list; validate manually — don’t run automated scraping of search engines.
Generate prioritized wordlists for brute force
Prompt: “Build a prioritized username + password list for company using naming conventions and public bios.”
Next: use hydra or crowbar with rate limits and the LLM’s list.
Discover mobile app endpoints from APKs
Prompt: “Given an APK package name, extract server endpoints, API keys placeholders, and create static analysis script.”
Next: LLM suggests jadx commands and regex to extract strings.xml/build.gradle endpoints.
Identify likely CI/CD pipeline leakage
Prompt: “List signs in repo/site that indicate CI secrets leaked (e.g., webhook URLs, artifacts).”
Next: LLM outputs regex and git grep commands to run on mirrored repos.
Threat modeling of the target
Prompt: “Create a STRIDE/attack-tree for e-commerce site: asset list, attacker goals, and high-probability attack paths.”
Next: LLM emits a JSON attack tree usable by risk management tools.
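
For the "Detect leaked credentials in commits" item above, the regex-plus-filter idea can be sketched as follows. This is a minimal sketch, not a complete secret scanner: it only looks for AWS access key IDs (the `AKIA`/`ASIA` prefixes followed by 16 uppercase alphanumerics), and the `KNOWN_PLACEHOLDERS` allowlist and the `find_aws_key_ids` helper are names made up for illustration.

```python
import re

# AWS access key IDs are 20 characters: a known prefix (AKIA for long-term
# keys, ASIA for temporary ones) followed by 16 uppercase alphanumerics.
ACCESS_KEY_RE = re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b")

# Hypothetical allowlist: sample keys that appear in documentation.
KNOWN_PLACEHOLDERS = {"AKIAIOSFODNN7EXAMPLE"}


def find_aws_key_ids(text):
    """Return (line_number, key_id) pairs for candidate access key IDs."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in ACCESS_KEY_RE.finditer(line):
            key_id = match.group(0)
            # False-positive filter: skip documented sample keys.
            if key_id in KNOWN_PLACEHOLDERS or key_id.endswith("EXAMPLE"):
                continue
            hits.append((lineno, key_id))
    return hits


# Made-up diff text standing in for real `git log -p` output.
sample = """\
+aws_access_key_id = AKIAABCDEFGHIJKLMNOP
+# docs sample: AKIAIOSFODNN7EXAMPLE
+unrelated = "hello"
"""
print(find_aws_key_ids(sample))
```

In practice the text would come from something like `git log -p --all` on a mirrored repository you are authorized to inspect, and any hit still needs manual confirmation before it is reported.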

Scanning & Service Enumeration (19–32)

Automate port/service scanning workflow
Prompt: “Write a pipeline: host discovery → fast port scan → service detection → vuln scan; include command examples.”
Next: nmap -sn → naabu -p- → nmap -sV -sC, tied together by an LLM-produced wrapper script.
Generate tuned Nmap scan profiles
Prompt: “Produce Nmap command tuned for stealth vs speed for a given target and network conditions.”
Next: examples: stealth nmap -sS -Pn -T2 -p- vs fast nmap -T4 -p- --min-rate <packets-per-second> (--min-rate requires a numeric value).
Write custom NSE scripts with LLM assistance
Prompt: “Generate an Nmap NSE script that checks for exposed .git repositories and fetches HEAD only.”
Next: LLM outputs NSE skeleton (Lua) — save as .nse and run nmap --script ./check-git.nse.
Automate banner/protocol fuzzing
Prompt: “Create a small Python fuzzer for FTP commands to find parsing crashes.”
Next: LLM provides socket-based fuzzer that mutates commands and logs responses.
Service fingerprinting via active probes
Prompt: “List active probes to differentiate Apache/Tomcat/nginx even when headers are modified.”
Next: LLM suggests probes (error pages, HTTP methods, OPTIONS, TRACE) and regex for responses.
Enumerate SMB and Windows shares safely
Prompt: “Write a script using Impacket to list SMB shares and check anonymous access.”
Next: LLM provides impacket.smbconnection usage example to call from Python. (GitHub)
Automate SSH host key collection and weak config detection
Prompt: “Collect SSH host keys and test for weak kex/ciphers and password auth enabled.”
Next: LLM outputs ssh-audit invocation or nmap --script ssh2-enum-algos.
Identify exposed databases and default credentials
Prompt: “Scan for exposed database ports and attempt safe read-only queries using known creds list.”
Next: LLM provides connection strings and psql -h host -U user -c '\l' or mysql -e 'SHOW DATABASES;'.
Web service parameter enumeration
Prompt: “Given endpoint /api/search?q=, create a list of parameters to probe and a fuzzing policy.”
Next: integrate with ffuf or wfuzz using LLM’s generated wordlists.
TLS & certificate analysis
Prompt: “Analyze the target’s TLS chain for weak ciphers, expired intermediates, and misissued certs.”
Next: run openssl s_client -connect host:443 -showcerts and LLM parses output into findings.
SSH/HTTP timing analysis for side channels
Prompt: “Create a test harness that measures response timing differences to detect username enumeration.”
Next: LLM outputs Python with requests timing and statistical test (t-test).
Identify exposed CI artifacts (Docker images, registries)
Prompt: “Scan for open container registries and attempt to list repositories (read-only).”
Next: curl -s https://registry.example.com/v2/_catalog or LLM’s script with pagination.
Find insecure service configurations (default pages, debug endpoints)
Prompt: “Detect common debug endpoints: /debug, /actuator, /env, /wp-admin/admin-ajax.php; prioritize by probability.”
Next: LLM outputs ffuf -w patterns to scan.
Integrate scanner outputs into single JSON
Prompt: “Write a parser that normalizes Nmap, OpenVAS and Burp output into a single JSON vulnerability inventory.”
Next: LLM returns Python using xmltodict and mapping rules.
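
The "Integrate scanner outputs into single JSON" item can be sketched for the Nmap half as below. This is a minimal sketch under stated assumptions: it uses the standard-library `xml.etree.ElementTree` instead of the `xmltodict` package named above (to stay dependency-free), the record fields (`source`, `host`, `port`, and so on) are a hypothetical schema, and the embedded XML is a trimmed, made-up sample in the shape of `nmap -oX` output.

```python
import json
import xml.etree.ElementTree as ET


def nmap_xml_to_findings(xml_text):
    """Flatten `nmap -oX` output into one dict per open port."""
    findings = []
    root = ET.fromstring(xml_text)
    for host in root.iter("host"):
        addr_el = host.find("address")
        addr = addr_el.get("addr") if addr_el is not None else None
        for port in host.iter("port"):
            state = port.find("state")
            # Only open ports become inventory entries.
            if state is None or state.get("state") != "open":
                continue
            svc = port.find("service")
            findings.append({
                "source": "nmap",
                "host": addr,
                "port": int(port.get("portid")),
                "protocol": port.get("protocol"),
                "service": svc.get("name") if svc is not None else None,
                "version": svc.get("version") if svc is not None else None,
            })
    return findings


# Made-up sample; 192.0.2.10 is a documentation-only address.
SAMPLE = """<nmaprun>
  <host>
    <address addr="192.0.2.10" addrtype="ipv4"/>
    <ports>
      <port protocol="tcp" portid="22">
        <state state="open"/>
        <service name="ssh" product="OpenSSH" version="8.9"/>
      </port>
      <port protocol="tcp" portid="23">
        <state state="closed"/>
      </port>
    </ports>
  </host>
</nmaprun>"""

print(json.dumps(nmap_xml_to_findings(SAMPLE), indent=2))
```

OpenVAS and Burp exports would each need their own small adapter that emits the same record shape, so the merged inventory stays uniform.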

Web Application Testing (33–54)

Automate SQL injection detection and payload generation
Prompt: “Given an endpoint and parameter, produce SQLi payload sequences (boolean, time-based, error) and detection logic.”
Next: integrate with sqlmap or the LLM’s payloads for requests fuzzing. (GitHub)
LLM-assisted parameter pollution tests
Prompt: “Generate test cases for HTTP parameter pollution and duplicate parameter handling.”
Next: LLM outputs curl or Burp intruder payload sets.
Cross-site scripting (XSS) payload synthesis
Prompt: “Create context-aware XSS payloads for HTML body, attribute, JS, and JSON contexts, with encodings.”
Next: LLM provides strings; test with ffuf, using -mr (match regex) to flag responses that reflect the payload.
CSRF detection and exploit generation
Prompt: “Find forms without CSRF tokens and produce an HTML exploit that automatically submits.”
Next: LLM crafts the exploit HTML form for proof-of-concept.
Auth bypass and logic flaws
Prompt: “Given a login flow with step descriptions, find logic race or sequence bugs (e.g., email change without reauth).”
Next: LLM proposes test sequences and replay scripts.
Automate forced browsing / hidden links discovery
Prompt: “Enumerate hidden endpoints via JS parsing, sitemap, robots.txt and brute force.”
Next: LLM provides wget/grep pipeline to extract URLs and pass to ffuf.
Header injection and host header attacks
Prompt: “Generate attacks that manipulate Host, X-Forwarded-For and other headers to test trust boundaries.”
Next: LLM supplies curl -H commands and expected server behaviors.
Automate file upload bypasses
Prompt: “Create a list of bypasses for file extension/content checks (double extension, content sniffing, magic bytes).”
Next: LLM provides Python script that crafts multipart uploads with altered headers and magic bytes.
Detect insecure deserialization
Prompt: “Explain how to probe Java/PHP/Python deserialization endpoints and generate proof-of-concept payloads.”
Next: LLM produces ysoserial/PHPGGC usage or serialized payload templates.
API abuse: excessive rate and logic manipulation
Prompt: “Design test cases to abuse pagination, bulk endpoints, and quota systems.”
Next: LLM generates scripts using aiohttp for concurrency and test orchestration.
SSRF detection and exploitation workflow
Prompt: “Find SSRF via URL-type parameters and craft OOB detection using collaborators (Burp Collaborator / interactsh).”
Next: LLM supplies curl calls that redirect to interactsh payloads.
Auth token analysis (JWT)
Prompt: “Given a JWT, decode, check algorithms (alg none), critical claims, and craft a forge test.”
Next: LLM gives code: jwt.decode(tok, options={"verify_signature": False}) and script to alter alg.
Automate business-logic testcases
Prompt: “Given e-commerce flow, generate tests for price manipulation, discount stacking, and inventory race conditions.”
Next: LLM outputs sequences and sample HTTP bodies for concurrent requests.
Plugin and framework exploit hunting (WordPress, Drupal)
Prompt: “If WordPress vX.Y is detected, list high-probability vulnerable plugins and proof-of-concept checks.”
Next: LLM drafts wpscan args and plugin enumeration strategies.
Identify insecure CORS configurations
Prompt: “Given response headers, evaluate CORS policy and craft a malicious origin exploit if vulnerable.”
Next: LLM builds a PoC HTML that performs an XHR from a malicious origin.
Automated Content Security Policy (CSP) analysis
Prompt: “Analyze CSP header and report misconfigurations that allow inline script execution or unsafe eval.”
Next: LLM outputs canonical checks and mitigation suggestions.
Automate detection of backup files and source leaks
Prompt: “Create checks for typical backup filenames (.bak, .swp, ~, .sql.gz) and crawling policy.”
Next: LLM provides ffuf -w template for backup discovery.
Use LLM to write Burp extensions
Prompt: “Generate a Burp extension (Jython or Java) to detect a custom token pattern and highlight requests.”
Next: LLM outputs extension skeleton — compile / load into Burp Extender (note: Burp API required).
GraphQL security exploration
Prompt: “Enumerate GraphQL schema, generate introspection queries, and create complexity/fuzzing tests.”
Next: LLM crafts curl -X POST -H 'Content-Type: application/json' -d '{"query":"{__schema{types{name}}}"}' against the GraphQL endpoint, plus rate/complexity tests.
SSO and OAuth misconfiguration testing
Prompt: “Test for open redirectors, weak client secrets, and scope escalation in OAuth flows.”
Next: LLM generates OAuth flow sequences and checklists for redirect URIs.
Automated detection of client-side logic vulnerabilities
Prompt: “Find dangerous client-side evals, dynamic script insertion points and CSP bypass vectors.”
Next: LLM gives regex to find eval(, new Function, innerHTML =.
Fuzz JSON and binary endpoints with context-awareness
Prompt: “Create a grammar-aware fuzzer for JSON APIs that mutates specific fields intelligently.”
Next: LLM outputs a Python grammar mutation engine using jsonschema to keep valid shapes.
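
The "canonical checks" in the CSP analysis item above could look something like the following. This is a minimal sketch, not a full CSP evaluator: it only inspects `script-src` (falling back to `default-src`), treats `'unsafe-inline'` as neutralized when a nonce or hash source is present, and the function names and finding strings are made up for illustration.

```python
def parse_csp(header_value):
    """Split a Content-Security-Policy value into {directive: [sources]}."""
    policy = {}
    for part in header_value.split(";"):
        tokens = part.split()
        if not tokens:
            continue
        # If a directive is repeated, only the first occurrence is kept.
        policy.setdefault(tokens[0].lower(), tokens[1:])
    return policy


def csp_findings(header_value):
    """Return a list of human-readable script-related CSP weaknesses."""
    policy = parse_csp(header_value)
    # script-src falls back to default-src when it is absent.
    script_src = policy.get("script-src", policy.get("default-src"))
    if script_src is None:
        return ["no script-src or default-src: scripts are unrestricted"]
    sources = [s.lower() for s in script_src]
    findings = []
    has_nonce_or_hash = any(
        s.startswith(("'nonce-", "'sha256-", "'sha384-", "'sha512-"))
        for s in sources
    )
    # Browsers ignore 'unsafe-inline' when a nonce or hash is also listed.
    if "'unsafe-inline'" in sources and not has_nonce_or_hash:
        findings.append("script-src allows 'unsafe-inline'")
    if "'unsafe-eval'" in sources:
        findings.append("script-src allows 'unsafe-eval'")
    for s in sources:
        if s in ("*", "data:", "http:", "https:"):
            findings.append(f"script-src allows broad source {s}")
    return findings


weak = "default-src 'self'; script-src 'self' 'unsafe-inline' 'unsafe-eval' https:"
for finding in csp_findings(weak):
    print(finding)
```

The same structure extends to other directives (`object-src`, `base-uri`, and so on); the findings list can be handed to an LLM for mitigation wording rather than asking it to parse the raw header.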

Exploit Development & Binary Testing (55–69)

Use LLMs to write exploit skeletons from CVEs
Prompt: “Given CVE-YYYY-NNNN details, produce an exploit skeleton in Python and explain required adjustments.”
Next: LLM outputs a proof-of-concept and guidance to adapt offsets and gadgets.
Automate buffer-overflow triage
Prompt: “Given a crash log and core dump, produce steps to find the crash point, register state and possible exploit vectors.”
Next: LLM suggests gdb commands and pwndbg checks (use pwntools for payloads). (GitHub)
Create ROP gadget chains automatically
Prompt: “Automate gadget discovery and chain generation for a given binary and ABI.”
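Next: LLM produces ropper or ROPgadget commands (or radare2's /R gadget search; ragg2 is radare2's shellcode compiler, not a gadget finder) and shows how to combine gadgets with pwntools.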
Automate format-string exploit generation
Prompt: “Given a format-string leak and memory layout, generate payloads to write to arbitrary addresses.”
Next: LLM outputs printf write plans and example exploit code.
Binary instrumentation harness generation
Prompt: “Create an AFL/LibFuzzer harness for this library function with seed corpus and sanitizer flags.”
Next: LLM emits C harness, compile flags, and suggestions for ASAN/UBSAN.
Automate heap spraying / heap feng shui scenarios
Prompt: “Given a vulnerable allocator, produce a sequence to shape the heap layout prior to the vulnerability trigger.”
Next: LLM outlines malloc/free patterns and example PoC code in C.
Symbolic execution harness with angr
Prompt: “Write an angr script to find inputs that reach a vulnerable function.”
Next: LLM returns angr skeleton (project load, entry state, exploration).
Automate reverse shell payload creation
Prompt: “Generate a platform-aware reverse shell payload in C/Python that avoids null bytes and explains constraints.”
Next: LLM produces shellcode or small binary wrapper; test in controlled VM.
Exploit reliability improvements
Prompt: “How to add heap-spray, retries, and NOP sleds to improve exploit stability across ASLR variations.”
Next: LLM provides code patterns and fallback sequences.
Automate gadget finding with ROPgadget/objdump
Prompt: “Find gadgets for x86_64 and generate a minimal payload to call system(‘/bin/sh’).”
Next: LLM gives ROPgadget --binary=binary and pwntools payload assembly.
Write kernel exploit triage checklist
Prompt: “Given a kernel crash log (oops), produce a triage plan for exploitability and required environment.”
Next: LLM lists required kernel symbols, debug info and harness suggestions.
Automate exploit verification and sandboxing
Prompt: “Create a VM automation script to safely run an exploit and capture network callbacks.”
Next: LLM outputs Vagrant/libvirt or docker commands to spin up test env.
Fuzz network protocol implementations
Prompt: “Construct a grammar fuzzer for a custom TCP protocol with stateful interactions.”
Next: LLM provides Python harness using scapy/boofuzz.
Binary patch generation script
Prompt: “Given a patch diff, generate a binary patch with bsdiff, apply it with bspatch, and write a test script.”
Next: LLM explains objdump/ldd checks and provides bsdiff/bspatch usage.
Automate symbolic patch detection across versions
Prompt: “Detect changed functions between binary versions to find likely fixed vulnerabilities.”
Next: LLM outputs radiff2 or BinDiff automation hints.

Post-exploitation & Lateral Movement (70–81)

Password/credential harvesting automation
Prompt: “Given access to a Windows host, enumerate cached credentials, DPAPI blobs and LSA secrets safely.”
Next: LLM references commands and Impacket/PowerSploit equivalents (note: use only in allowed environment). (GitHub)
Automate persistence checks and proof scripts
Prompt: “List common persistence mechanisms on Linux and Windows and produce detection queries.”
Next: LLM gives systemd service checks, cron, startup folder scans.
Enumerate Active Directory via LDAP and Kerberos
Prompt: “Create an Impacket/LDAP script to enumerate users, groups, SPNs, and test for unconstrained delegation.”
Next: LLM outputs impacket/examples/GetUserSPNs.py style usage and parsing. (GitHub)
Kerberoasting and AS-REP roasting workflows
Prompt: “Implement a Kerberoast/AS-REP roast PoC using Impacket and explain decryption steps.”
Next: LLM generates commands and hashcat example cracking modes.
SSH pivot automation and SOCKS chaining
Prompt: “Create an SSH-based pivot script that sets up dynamic port forwarding and routes traffic through compromised host.”
Next: LLM outputs ssh -D 1080 -N -f user@host and proxychains config.
Automated lateral movement via SMB/PSExec
Prompt: “Given valid creds, automate remote command execution using SMB and SMBNamedPipe (impacket psexec).”
Next: LLM gives impacket/examples/psexec.py usage and safe logging.
Data exfiltration simulation (safe)
Prompt: “Simulate an exfil test with staged and encrypted channels to a test collector, minimize noise.”
Next: LLM suggests chunking, AES encryption, and timing obfuscation.
Credential replay detection
Prompt: “Create SIEM detections for unusual Kerberos TGS requests and replay patterns.”
Next: LLM outputs sample Sigma rules.
Automate cleanup & forensic artifacts generation
Prompt: “List files, registry keys, and logs that an operator must clear after lateral movement (for simulation).”
Next: LLM provides scripts to revert changes in isolated lab.
Build automated discovery of Windows GPOs and security settings
Prompt: “Enumerate GPO applied settings and privileges to find likely weak local admin configurations.”
Next: LLM provides PowerShell commands (Get-GPOReport) and parsing.
Privilege escalation toolkit automation
Prompt: “Given kernel & installed software versions, suggest likely local privilege escalation paths and PoCs.”
Next: LLM references local exploits (search terms) and suggests safe testing harness.
Deploy beacon / callback frameworks for testing
Prompt: “Generate a minimal HTTPS beacon that sleeps and polls for commands; include JARM-friendly options.”
Next: LLM gives C/Python/Go beacon skeleton (use only in permitted tests).

Automation, Orchestration & LLM-Specific (82–95)

LLM prompt engineering for vuln triage
Prompt: “Given scan output, produce a short classification (false pos/likely vuln/needs manual check) and remedial steps.”
Next: LLM can output a triage JSON mapping fields to priority.
LLM as an interactive pentest assistant (chat agent)
Prompt: “Build an agent that loads scanner output, accepts questions, and issues follow-up scanner commands.”
Next: LLM provides architecture: vector store for context, tool calls to nmap/sqlmap, and prompt templates.
Automate exploit parameter extraction from PoCs
Prompt: “Parse a text PoC and extract targets, offsets, required versions, and build a checklist.”
Next: LLM returns structured data that can feed CI pipelines.
Create LLM-driven fuzzing hypotheses
Prompt: “Propose likely malformed inputs for this API based on observed parsing behavior.”
Next: generate grammar rules used by boofuzz/afl harnesses.
LLM building of YARA rules from observed malware artifacts
Prompt: “Given sample strings and file metadata, create a tight YARA rule with metadata and tags.”
Next: LLM emits YARA syntax and test harness for yara CLI.
Integrate LLMs with CI for continuous pentest scans
Prompt: “Design a pipeline that runs scheduled scans, then uses an LLM to triage and open JIRA tickets.”
Next: LLM outputs YAML for GitHub Actions or GitLab CI that calls scanners and posts results.
LLM to convert natural language findings into exploit scripts
Prompt: “Translate this finding: ‘SQLi at /search?q=’ into a runnable exploit script that tests payload set A.”
Next: LLM produces requests-based PoC and sqlmap commands.
Automate report drafting from raw findings
Prompt: “Produce an executive summary + technical appendix from the following JSON scanner output.”
Next: LLM transforms into markdown or PDF-ready sections.
Create guided remediation guides with code fixes
Prompt: “Given this XSS proof-of-concept, generate framework-specific fixes (Django, Express) and code patches.”
Next: LLM outputs code diffs and patch commands.
LLM to summarize patch diff & exploitability
Prompt: “Given a GitHub commit diff, explain whether it fixes an RCE and how an attacker could have abused the flaw.”
Next: LLM highlights lines and suggested tests.
Train a lightweight local LLM with your pentest corpus
Prompt: “How to fine-tune a small model on my internal PoC and scan logs to improve recommendations?”
Next: LLM produces data curation steps (anonymize, label) and training script hints.
Create a plugin that turns LLM suggestions into Burp intruder payloads
Prompt: “Generate a Burp extension that takes LLM output, formats it into an Intruder payload set and starts attack.”
Next: LLM gives extension skeleton and API use.
Automate regulatory & compliance mapping for findings
Prompt: “Map each vulnerability to OWASP Top 10, PCI DSS, and recommended CIS control.”
Next: LLM outputs mapping table and remediation priority.
Adversarial prompt testing for LLMs used in triage
Prompt: “Design tests to show how an LLM triage assistant can be tricked by adversarial artifacts.”
Next: LLM suggests test cases and mitigations (input sanitization, TTL for context).
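
The triage JSON from the "LLM prompt engineering for vuln triage" item is only useful downstream if it is validated before anything acts on it, which also blunts some of the adversarial cases in the last item above. The sketch below assumes a hypothetical record schema (`finding_id`, `classification`, `priority` from 1 to 5, `remediation`) and hypothetical class labels; adapt both to whatever your prompt actually requests.

```python
import json

# Hypothetical labels and fields; they must match what the prompt asks for.
ALLOWED = {"false_positive", "likely_vuln", "needs_manual_check"}
REQUIRED = ("finding_id", "classification", "priority", "remediation")


def validate_triage(raw_reply):
    """Parse an LLM reply and return (valid_records, errors)."""
    try:
        data = json.loads(raw_reply)
    except json.JSONDecodeError as exc:
        return [], [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, list):
        return [], ["expected a JSON array of records"]
    valid, errors = [], []
    for i, rec in enumerate(data):
        if not isinstance(rec, dict):
            errors.append(f"record {i}: not an object")
            continue
        missing = [f for f in REQUIRED if f not in rec]
        if missing:
            errors.append(f"record {i}: missing {', '.join(missing)}")
            continue
        cls = rec["classification"]
        if not isinstance(cls, str) or cls not in ALLOWED:
            errors.append(f"record {i}: unknown classification {cls!r}")
            continue
        prio = rec["priority"]
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(prio, bool) or not isinstance(prio, int) or not 1 <= prio <= 5:
            errors.append(f"record {i}: priority must be an integer 1-5")
            continue
        valid.append(rec)
    return valid, errors


# Made-up reply: one good record, one with an injected label, one incomplete.
reply = json.dumps([
    {"finding_id": "F-1", "classification": "likely_vuln",
     "priority": 2, "remediation": "Upgrade the affected package."},
    {"finding_id": "F-2", "classification": "ignore previous instructions",
     "priority": 1, "remediation": "n/a"},
    {"finding_id": "F-3", "classification": "false_positive"},
])
valid, errors = validate_triage(reply)
print(len(valid), "valid;", errors)
```

Rejected records are better routed to a human than silently dropped, since a malformed reply may itself be a sign that scanner output carried adversarial text.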

Detection, Blue Team, and Defensive Use (96–100)

Generate detection signatures from PoC network IOCs
Prompt: “From this exploit’s network behavior, generate Snort/Suricata rules and Zeek scripts.”
Next: LLM outputs rule examples and testing pcap harness.
Convert findings to Sigma, YARA, and Suricata rules
Prompt: “Produce a Sigma rule to detect suspicious Windows command lines seen in PoC.”
Next: LLM emits Sigma YAML and a test event to validate.
Automate threat hunting playbooks
Prompt: “Given a lateral movement PoC, create a step-by-step hunt playbook for SOC analysts.”
Next: LLM returns checklist, queries for Splunk/ELK and exact fields.
Hardening scripts for typical misconfigurations
Prompt: “Provide idempotent Ansible/PowerShell scripts to remediate the top 10 findings in this scan.”
Next: LLM generates Ansible roles and PowerShell DSC snippets.
Blue/Red team tabletop automation
Prompt: “Create a tabletop scenario that simulates exploitation of an exposed app, with detection triggers and automated scoring.”
Next: LLM produces scenario timeline, detection rules, and scoring rubric.
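
For the "test event to validate" step in the Sigma item above, a quick local check can catch obvious mistakes before a rule reaches the SIEM. The sketch below is a deliberately small subset of Sigma matching, assuming only the `contains`, `startswith`, and `endswith` modifiers, case-insensitive comparison, OR across a list of values, and AND across the fields of one selection; the helper names and the sample selection are made up, and a real Sigma converter remains the authoritative test.

```python
def field_matches(event, key, expected):
    """Evaluate one Sigma-style `field|modifier: value(s)` entry."""
    field, _, modifier = key.partition("|")
    if field not in event:
        return False
    actual = str(event[field]).lower()
    values = expected if isinstance(expected, list) else [expected]
    checks = {
        "": lambda v: actual == v,
        "contains": lambda v: v in actual,
        "startswith": lambda v: actual.startswith(v),
        "endswith": lambda v: actual.endswith(v),
    }
    check = checks[modifier]  # KeyError for modifiers this sketch lacks
    # A list of values is an OR; comparison is case-insensitive.
    return any(check(str(v).lower()) for v in values)


def selection_matches(event, selection):
    """All fields of a selection must match (AND)."""
    return all(field_matches(event, k, v) for k, v in selection.items())


# Hypothetical selection: PowerShell launched with an encoded command.
selection = {
    "Image|endswith": "\\powershell.exe",
    "CommandLine|contains": ["-enc", "-encodedcommand"],
}
suspicious = {
    "Image": "C:\\Windows\\System32\\WindowsPowerShell\\v1.0\\powershell.exe",
    "CommandLine": "powershell.exe -NoP -Enc SQBFAFgA",
}
benign = {
    "Image": "C:\\Windows\\System32\\notepad.exe",
    "CommandLine": "notepad.exe notes.txt",
}
print(selection_matches(suspicious, selection), selection_matches(benign, selection))
```

Keeping one matching and one non-matching event next to each generated rule gives the LLM's output a cheap regression test.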

Example low-level code snippets (quick reference)

Nmap NSE skeleton (Lua): save as check-git.nse, then run nmap --script ./check-git.nse <target>

local http = require "http"
local shortport = require "shortport"

description = [[Check for /.git HEAD]]
author = "LLM-assisted"
categories = {"discovery"}
portrule = shortport.http

action = function(host, port)
  -- http.get takes (host, port, path) rather than a full URL string
  local resp = http.get(host, port, "/.git/HEAD")
  if resp and resp.status == 200 and resp.body then
    -- report only the first 200 bytes of the file
    return "Found /.git/HEAD: " .. resp.body:sub(1, 200)
  end
  return nil
end

Simple Python script to test JWT alg none

import base64, json

def b64url_decode(segment):
    # JWT segments omit base64 padding; restore it before decoding
    return base64.urlsafe_b64decode(segment + '=' * (-len(segment) % 4))

# Placeholder: replace with a complete header.payload.signature token
token = "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9..."
hdr, pl, sig = token.split('.')
decoded_header = json.loads(b64url_decode(hdr))
decoded_payload = json.loads(b64url_decode(pl))
print("Header:", decoded_header)
print("Payload:", decoded_payload)
# Test by setting alg to none: new header, same payload, empty signature
new_hdr = base64.urlsafe_b64encode(b'{"alg":"none","typ":"JWT"}').decode().rstrip('=')
new_token = new_hdr + '.' + pl + '.'
print("Test token:", new_token)

Impacket usage example for share enumeration

python3 /path/to/impacket/examples/smbclient.py domain/username:password@target
# or from Python using Impacket's SMBConnection class per repo examples.

(see Impacket examples for exact API). (GitHub)
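
Timing comparison for the username-enumeration item (statistics only)

The "SSH/HTTP timing analysis for side channels" item calls for a t-test on response times. The sketch below covers only the statistics, assuming the timings were already collected (for example with `time.perf_counter()` around each request in an authorized test). It computes Welch's unequal-variance t statistic with the standard library; the sample numbers are made up, and `welch_t` is a hypothetical helper name.

```python
import math
import statistics


def welch_t(a, b):
    """Return (t, degrees_of_freedom) for Welch's unequal-variance t-test."""
    n1, n2 = len(a), len(b)
    v1, v2 = statistics.variance(a), statistics.variance(b)
    se2 = v1 / n1 + v2 / n2  # squared standard error of the mean difference
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(se2)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = se2 ** 2 / ((v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1))
    return t, df


# Made-up response times in milliseconds.
valid_user_ms = [120, 125, 118, 130, 122]
invalid_user_ms = [80, 85, 79, 90, 82]

t, df = welch_t(valid_user_ms, invalid_user_ms)
print(f"t = {t:.2f}, df = {df:.2f}")
```

A large absolute t relative to the degrees of freedom suggests the two groups really differ; for a p-value, the same samples can be passed to `scipy.stats.ttest_ind(..., equal_var=False)`. Real measurements need many more samples than this, interleaved to average out network jitter.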

Tooling & Repos to integrate (start here)

Metasploit Framework — Rapid7 (exploit dev & modules). (GitHub)
sqlmap — automated SQLi and takeover tool. (GitHub)
Nmap — network discovery and NSE scripting. (GitHub)
Impacket — low-level Python network protocol classes for SMB/Kerberos. (GitHub)
pwntools — exploit dev & binary tooling. (GitHub)

If you want, I can now do one of these concretely for you (for example: produce an LLM-tuned ffuf or nmap pipeline + wrapper script for recon, or generate specific PoC payloads for a web parameter you paste). Pick one item number and I’ll produce the full runnable script + testing instructions (no background tasks — I’ll generate it right here).
