Winter Soldier

Paste an indicator. Watch Claude investigate.

A Kali-native OSINT and pen-test operator console for cybercrime victims, attribution investigators, and authorised pen-test engagements. The whole stack — Next.js UI, Python MCP bridge, full Kali Linux toolkit, Flask backend — runs inside one Docker image. One docker run, one volume to back up, one OAuth token in Settings.


1 image · Docker artefact

12+ · Indicator types

/data · One volume

0 host prereqs · Beyond Docker

Paste-and-pivot

One indicator. A fleet of investigators.

Drop any indicator into the paste bar — a wallet address, a domain, an email, a file hash, a phone number. Winter Soldier auto-detects the type, spawns a Claude Code CLI subprocess to investigate, and queues every newly-discovered IOC as a pending pivot for the operator to approve.

> ● detected: domain

investigating evil-domain.example.test (domain)
whois · crt.sh · wayback · dns · subfinder · amass
subs: 47 subdomains discovered · 12 with active SSL certs in last 90d
whois: registrant proxied (Whois Privacy Corp, BS) · registered 2024-08-12
reuse: NS records match 3 known phishing clusters · pivots queued ↓
turns 12 · tokens 184k · elapsed 0:47 · +2 pivots queued
complete

pivoted to 203.0.113.42 (ip-v4) · source: evil-domain.example.test
shodan · rdap · passive-dns · nmap
asn: AS200000 · BulletProof Hosting Ltd · BG · 142 hosts on this /24
ports: 22 (OpenSSH 8.2) · 80 (nginx) · 443 (Let's Encrypt) · 3389 (RDP)
history: previously hosted 14 domains across 4 phishing kits since 2024
turns 8 · tokens 96k · elapsed 0:31 · +1 pivot queued
complete

pivoted to bc1q…7t6jzn (wallet-btc) · source: donation footer link
mempool.space · OFAC list · blockstream · arkham
volume: ₿ 14.82 received · ₿ 14.79 forwarded · 312 deposits since 2024-11
cluster: co-spending cluster of 8 addresses · 2-of-3 multisig pattern
sanctions: no direct OFAC match · 2-hop link to a sanctioned mixer (suspect)
turns 6 · tokens 74k · elapsed 0:24 · leaf node
complete

↳ each subprocess streams findings live via SSE; pivots keep fanning out until the depth cap is hit
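The pivot flow above (discover an IOC, queue it as pending, wait for operator approval, stop at a depth cap) can be sketched roughly as follows. This is a minimal illustration and not Winter Soldier's actual code: the `Pivot` and `PivotQueue` names and the `MAX_DEPTH` value are hypothetical, since the page does not state the real cap.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Optional

MAX_DEPTH = 3  # hypothetical cap; the page does not state the real value


@dataclass
class Pivot:
    indicator: str
    kind: str
    depth: int
    parent: Optional[str] = None


@dataclass
class PivotQueue:
    max_depth: int = MAX_DEPTH
    pending: deque = field(default_factory=deque)
    seen: set = field(default_factory=set)

    def propose(self, indicator: str, kind: str, parent: Optional[Pivot]) -> bool:
        """Queue a newly discovered IOC unless it is a duplicate or too deep."""
        depth = parent.depth + 1 if parent else 0
        if depth > self.max_depth or indicator in self.seen:
            return False
        self.seen.add(indicator)
        parent_id = parent.indicator if parent else None
        self.pending.append(Pivot(indicator, kind, depth, parent_id))
        return True

    def approve_next(self) -> Optional[Pivot]:
        """Operator approves the oldest pending pivot; it is returned for dispatch."""
        return self.pending.popleft() if self.pending else None
```

Tracking `seen` indicators keeps two investigations that discover the same IP from queueing it twice, and the depth check is what turns a pivot into a leaf node.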

Twelve indicator types

Everything an investigator pastes, mapped to its own pipeline.

Each indicator type has its own prompt template, its own skill-augmented Claude workflow, and its own auto-pivot rules. The detector recognises them on paste; the dispatcher routes them automatically.

wallet (BTC)

wallet (ETH)

domain

url

ip-v4

ip-v6

phone (E.164)

username

person

hash (MD5)

hash (SHA1)

hash (SHA256)

repository

CVE

package

tx-hash

onion
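As an illustration of how paste-time detection could work, here is a small sketch that classifies a subset of the types listed above. The patterns, their ordering, and the `detect` function are assumptions made for this example; the page does not describe the real detector's rules.

```python
import ipaddress
import re
from typing import Optional

# Order matters: more specific patterns come first. Hex hashes are checked
# before legacy BTC addresses because a 32-char hex string can also look
# like a base58 address.
_PATTERNS = [
    ("cve", re.compile(r"^CVE-\d{4}-\d{4,}$", re.I)),
    ("url", re.compile(r"^https?://\S+$", re.I)),
    ("wallet-eth", re.compile(r"^0x[0-9a-fA-F]{40}$")),
    ("phone", re.compile(r"^\+[1-9]\d{6,14}$")),  # E.164
    ("hash-md5", re.compile(r"^[0-9a-fA-F]{32}$")),
    ("hash-sha1", re.compile(r"^[0-9a-fA-F]{40}$")),
    ("hash-sha256", re.compile(r"^[0-9a-fA-F]{64}$")),
    ("onion", re.compile(r"^[a-z2-7]{56}\.onion$")),  # v3 onion address
    ("wallet-btc", re.compile(r"^(bc1[0-9a-z]{25,87}|[13][1-9A-HJ-NP-Za-km-z]{25,34})$")),
    ("domain", re.compile(r"^(?=.{1,253}$)([a-z0-9-]{1,63}\.)+[a-z]{2,63}$", re.I)),
]


def detect(raw: str) -> Optional[str]:
    """Return a best-guess indicator type for pasted text, or None."""
    text = raw.strip()
    try:
        ip = ipaddress.ip_address(text)
        return "ip-v4" if ip.version == 4 else "ip-v6"
    except ValueError:
        pass
    for kind, pattern in _PATTERNS:
        if pattern.match(text):
            return kind
    return None
```

A bare 64-character hex string is ambiguous between a SHA256 file hash and a Bitcoin transaction hash; this sketch reports it as `hash-sha256`, and a real detector would need extra context to tell the two apart.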

Pen Test Mode

One toggle decides what the investigators can see.

The Kali toolkit is always installed and always running inside the container. The toggle is a pure MCP-config switch: it decides which tools are registered with each new spawned investigator. Toggling is instantaneous — no container start, no compose dance, no healthcheck wait.
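A toggle like this could come down to writing a different MCP config file for each newly spawned investigator. The sketch below assumes the `mcpServers` JSON layout that Claude Code accepts via `--mcp-config`; the `WS_MODE` and `WS_KALI_API_URL` variable names, the `kali` server key, and the `write_mcp_config` helper are hypothetical.

```python
import json
from pathlib import Path

VALID_MODES = ("passive", "active")


def write_mcp_config(scratch_dir, mode, api_url="http://127.0.0.1:5000"):
    """Write a per-investigator mcp.json that pins the mode at spawn time."""
    if mode not in VALID_MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    config = {
        "mcpServers": {
            "kali": {  # hypothetical server name
                "command": "python",
                "args": ["mcp_server.py"],
                # Hypothetical variables: the bridge would read these to
                # decide which tools to register and where Flask lives.
                "env": {"WS_MODE": mode, "WS_KALI_API_URL": api_url},
            }
        }
    }
    path = Path(scratch_dir) / "mcp.json"
    path.write_text(json.dumps(config, indent=2))
    return path
```

Because the file is written once per spawn, an investigation already in flight keeps the mode it started with, and flipping the toggle only affects the next paste.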

Architecture

One container, three concerns, one published port.

s6-overlay runs as PID 1 and supervises two long-running services. Per-investigator Claude Code CLI subprocesses spawn on demand inside the same container. The audited bearer-token boundary between the MCP bridge and Flask survives intact — just on loopback instead of host.docker.internal.

winter-soldier (kalilinux/kali-rolling)

s6-overlay (PID 1)

supervises both services · forwards SIGTERM from docker stop cleanly to each of them

▲

nextjs (standalone)

operator UI · OAuth-token Settings · paste-and-pivot

:3000

🐍

kali-flask

Kali HTTP API · bearer-token gate · zebbern fork (audited)

127.0.0.1:5000

↑ spawns on demand ↑

⌘

claude CLI

one per active investigation · --mcp-config scratch/mcp.json

⇄

python mcp_server.py

MCP bridge · stdio in, HTTP to flask via loopback
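The bearer-token gate between the bridge and Flask can be pictured with a framework-free sketch: generate a token once, store it with owner-only permissions, and compare presented tokens in constant time. The function names are hypothetical and the real backend's request handling is not shown on this page.

```python
import hmac
import os
import secrets
from pathlib import Path


def load_or_create_token(path):
    """Return the stored token, generating it with mode 600 on first use."""
    path = Path(path)
    if path.exists():
        return path.read_text().strip()
    path.parent.mkdir(parents=True, exist_ok=True)
    token = secrets.token_urlsafe(32)
    # O_EXCL avoids clobbering a token written by a concurrent first boot.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "w") as fh:
        fh.write(token)
    return token


def is_authorised(auth_header, expected):
    """Check an 'Authorization: Bearer <token>' header value."""
    if not auth_header or not auth_header.startswith("Bearer "):
        return False
    presented = auth_header[len("Bearer "):]
    # Constant-time comparison avoids leaking the token via timing.
    return hmac.compare_digest(presented.encode(), expected.encode())
```

A request handler would call `is_authorised` on every request and refuse anything that fails, which matches the "bearer auth on every Flask request" behaviour described on this page.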

one command, anywhere

docker run -d \
  --name winter-soldier \
  -p 3000:3000 \
  -v winter-soldier-data:/data \
  --cap-add NET_RAW --cap-add NET_ADMIN \
  --device /dev/net/tun \
  ghcr.io/flightxcaptain/winter-soldier:latest

# open http://localhost:3000
# paste OAuth token in Settings → done.

What's in the box

A Kali-powered operator console, ready to docker run.

Winter Soldier collapses what used to be four host prerequisites (Node, npm, Python 3, Docker Desktop) into one. The full Kali pentest toolkit, the OSINT investigator pipeline, the Settings UI, and the audited security boundary all live inside one image. The operator runs docker run, pastes a token, starts investigating.

◈

Single unified container

Next.js on 0.0.0.0:3000, Kali Flask on 127.0.0.1:5000, Python MCP bridge as a loopback child, supervised by s6-overlay. One docker run, one volume to back up.

⌖

Paste-and-pivot OSINT

Wallet, domain, URL, IP, email, phone, username, person, file hash, repository, CVE, package, tx-hash, onion — auto-detected and routed to the right investigator template.

⌁

Pen Test Mode toggle

Switches what tools investigators see. No docker compose dance. Mode applies to new pastes only; in-flight investigations keep whatever mode they were spawned with.

◆

Audited security boundary

Inherited verbatim from the upstream zebbern-kali-mcp fork audit. Stripped arbitrary-shell tools. Bearer auth on every Flask request. /api/exec returns 403 regardless of caller.

⌬

Subscription-billed

Paste a year-long OAuth token into Settings. The container injects CLAUDE_CODE_OAUTH_TOKEN into every spawned investigator. No ANTHROPIC_API_KEY required.

⛁

Single-volume persistence

OAuth token, auto-generated Kali bearer, audit logs, per-paste workspace, case files — every operator-state artefact lives under /data. Container is otherwise immutable.

◉

Per-investigator MCP

Each Claude Code subprocess gets its own MCP server registration in its scratch dir at spawn time. The operator's global Claude config is never touched.

◎

Glass-card streaming UI

Live SSE event stream — indicator detection, structured findings, tool-call chips, severity rollups, follow-up chat. Per-card abort + per-paste kill switches.

▣

Evidence-grade case files

Every active-mode Kali tool invocation appended to /data/workspace/audit.jsonl. cases/ holds long-form material ready for law-enforcement handover.

⌕

Trail of Bits skills

YARA-X rule authoring for hash investigations. Chain-specific scanners for wallet pivots. Burp Suite parser for victim-supplied evidence. Skill-driven workflow upgrades per indicator type.

⊕

Deploy anywhere

The image is an artefact, not a deploy script. Any host that accepts NET_RAW / NET_ADMIN / /dev/net/tun works. Operator state migrates with the volume.

△

Authorisation-first

Never scan anything you don't own or have written permission to test. No hack-back. No vigilantism. Investigate, document, hand to law enforcement.

Security boundary

Five hardening patches, all inherited from the upstream audit.

The vendored Kali MCP fork ships with five Winter Soldier security patches applied before the bridge ever talks to Flask. They survive the unified-container consolidation byte-for-byte; the boundary just moved from host networking to container loopback.

01 Arbitrary-shell tools stripped

zebbern_exec, exec_stream, send_input, read_output in mcp_tools/command_exec.py are removed. Only health and system_network_info survive from that module.

02 /api/exec neutralised at Flask

Both /api/exec and /api/command endpoints return 403 regardless of caller. Defense in depth: even if the MCP layer's patches were bypassed, the backend itself refuses to run arbitrary shell.

03 Bearer-token auth on every endpoint

Flask refuses to start without WS_KALI_API_TOKEN set; every request from the MCP client carries the token in Authorization: Bearer …. The token is auto-generated on first boot and lives in /data/.config/kali-api-token (mode 600).

04 Per-tool passive/active classification

A _MODULE_CLASSIFICATION dict drives passive-vs-active loading. Passive mode filters every invasive module out at registration time — Claude literally cannot see those tools. Active mode logs every invocation to audit.jsonl.

05 Build from reviewed source

The container always builds locally from the source pinned in REVIEWED_COMMIT.txt and never pulls the upstream GHCR latest tag. Upstream CI pushes on every main commit, so that image may not match the code that was audited in this repo.
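Patch 04 is the easiest of the five to picture in code. The sketch below reuses the `_MODULE_CLASSIFICATION` name from the page, but the module names in it, the helper functions, and the assumption that active mode also loads the passive modules are all illustrative and not taken from the audited source.

```python
import json
import time

# Name taken from the page; the entries are hypothetical examples.
_MODULE_CLASSIFICATION = {
    "whois_lookup": "passive",
    "dns_enum": "passive",
    "nmap_scan": "active",
    "sqlmap_scan": "active",
}


def modules_for_mode(mode, classification=_MODULE_CLASSIFICATION):
    """Return the module names to register with a new investigator."""
    if mode == "active":
        # Assumption: active mode is a superset of passive mode.
        return sorted(classification)
    # Passive mode never registers invasive modules, so the model
    # has no tool definition for them at all.
    return sorted(n for n, kind in classification.items() if kind == "passive")


def audit_record(tool, args, mode):
    """Build one JSON line describing a tool invocation."""
    entry = {"ts": time.time(), "mode": mode, "tool": tool, "args": args}
    return json.dumps(entry, sort_keys=True)
```

Filtering at registration time means a passive investigator cannot request an invasive tool in the first place, while in active mode each record produced by `audit_record` would be appended as one line to audit.jsonl.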

Tech stack

Boring choices where it matters; sharp ones where it counts.

Next.js 16 · React 19 · TypeScript (strict) · Tailwind v4 · Kali Linux (kali-rolling) · Docker · s6-overlay · Python 3 · Flask · Claude OAuth · MCP · Playwright (headless)

Rules of engagement (non-negotiable)

Authorisation is mandatory. Never scan, intrude on, or attack any system you do not own or have explicit written permission to test.
Legal tools and sources only. OSINT means publicly or legitimately accessible. No stolen credentials, no illicit data brokers, no unauthorised access.
No hack-back. No vigilantism. Retaliation against identified attackers is illegal in most jurisdictions and destroys evidence. Investigate → document → hand off to law enforcement.
Protect the victim twice. Redact PII before sharing. Treat their data with at least the care you'd want for your own.
No fabrication. Unverified leads are labelled as such. Attribution needs corroboration — single-source claims are hypotheses, not conclusions.

Engagement-only · not for resale or redistribution

Winter Soldier bundles a full Kali Linux pentest stack with an AI investigator pipeline. It is a dual-use toolkit, and Alchatex does not license or sell it as an off-the-shelf product. There is no public download, no SaaS tenant, and no reseller channel.

Engagements are scoped per client, gated on demonstrated written authorisation to test the targets in question, and delivered as a private container build owned by the operator commissioning the work. Redistribution, sublicensing, or transfer of the image to third parties is not permitted under those terms.

The case study on this page exists so you can decide whether the capability fits the work you have. The artefact itself is not publicly available.

Have a case that needs a console?

Whether you're responding to a phishing incident, helping a small business through a ransomware tail, hardening a stack before launch, or running an authorised pen-test engagement — let's talk about how Winter Soldier (or a tailored variant) fits.
