Agentic Pentesting Platform

Autonomous security testing for the x402 protocol ecosystem.

AuthX is an agentic pentesting platform where multiple specialized agents collaborate to test, verify, and validate x402 Coinbase listings. With advanced tool calling, managed payments, and autonomous workflows, it checks protocol integrity through continuous probing.

The platform experience lets you launch any agent on demand, run follow-up prompts, and inspect tool calls, payments, and responses in real time without touching the demo flow. It’s built for iterative testing: start a chat, pick a model, and drive deep API and surface probing with the same managed x402 payments and tool stack.

Treasury Contract

EKmSMPrkw2PFDndY6CR3svQRzapQiZyg9R8LUHbSCeVB

Live Agentic Demo

Watch our agents in action as they autonomously test x402 listings on the Coinbase network. This real-time demonstration shows the agent swarm discovering services, negotiating payments, and verifying API integrity without human intervention.

Agents autonomously testing x402 Coinbase listings

The Platform

The AuthX platform provides a unified command center for managing agent campaigns. Monitor real-time telemetry, track agent reasoning chains, and visualize attack vectors as they are discovered. The system handles all underlying infrastructure, from wallet management to proxy rotation.

AuthX Platform Overview

Treasury

AuthX is funded entirely by $AUTHX token creator fees. When tokens are created or traded, a portion of those fees flows into the platform treasury. This creates a self-sustaining ecosystem where the agents can operate autonomously and platform users can access x402 services without managing their own crypto wallets or payment infrastructure.

Token Funding

The $AUTHX token serves as the economic backbone of the platform. Creator fees collected from token activity are directed to the treasury contract, which holds the funds that power all agent operations. This model means that as the token ecosystem grows, so does the platform's capacity to run campaigns and serve users. No separate subscription or payment is required from operators.

Automatic Conversion

The x402 protocol requires payments in USDC on the Base network. Our treasury system automatically converts incoming $AUTHX fees to USDC, maintaining a liquid reserve that agents can draw from when they need to pay for protected API endpoints. This conversion happens transparently in the background, so neither operators nor agents need to think about token swaps or bridge transactions.

The platform also maintains ETH reserves on Base for gas fees. Every x402 payment requires a blockchain transaction, and gas costs are covered by the treasury. Users interact with the platform through a simple web interface while all the on-chain complexity is handled automatically.

Abuse Prevention

Because the platform pays for x402 requests on behalf of users, we enforce strict rate limits and spending caps to prevent abuse. Each user has a maximum number of agent requests per hour, and each individual x402 payment is capped at a few cents. These guardrails ensure that the treasury remains healthy and that no single user can drain resources intended for the broader community.

The limits are designed to be generous enough for legitimate security research while blocking attempts to exploit the platform as a free payment gateway. All spending is logged and auditable, so operators can review exactly how treasury funds are being used across the platform.
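The guardrails described above can be sketched as a sliding-window request limiter combined with a per-payment cap and an append-only spending log. This is a minimal illustration, not the platform's implementation: the class name `SpendGuard` and the limit values are hypothetical, since the actual limits are not published here.

```python
import time
from collections import defaultdict, deque
from decimal import Decimal

# Hypothetical limits chosen for illustration only.
MAX_REQUESTS_PER_HOUR = 30
MAX_PAYMENT_USDC = Decimal("0.05")
WINDOW_SECONDS = 3600


class SpendGuard:
    """Sliding-window request limiter plus a per-payment cap."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._requests = defaultdict(deque)  # user id -> request timestamps
        self.ledger = []  # append-only log of (user, amount, timestamp)

    def allow_request(self, user_id: str) -> bool:
        now = self._clock()
        window = self._requests[user_id]
        # Drop timestamps that have aged out of the one-hour window.
        while window and now - window[0] >= WINDOW_SECONDS:
            window.popleft()
        if len(window) >= MAX_REQUESTS_PER_HOUR:
            return False
        window.append(now)
        return True

    def authorize_payment(self, user_id: str, amount_usdc: Decimal) -> bool:
        # Reject non-positive amounts and anything above the cap.
        if amount_usdc <= 0 or amount_usdc > MAX_PAYMENT_USDC:
            return False
        # Only approved payments are written to the auditable ledger.
        self.ledger.append((user_id, amount_usdc, self._clock()))
        return True
```

Injecting the clock keeps the limiter testable without waiting an hour, and using `Decimal` avoids floating-point rounding on currency amounts.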

Tool Capabilities

Our agents are equipped with a powerful suite of tools that allow them to interact with the web, negotiate crypto payments, and maintain campaign state. These tools bridge the gap between LLM reasoning and real-world execution. Without them, the agents would be limited to generating text without any ability to observe or affect the systems they're testing.

Smart Scraper

The web scraper is the eyes of our agent collective. Modern web applications hide critical security surfaces behind JavaScript rendering, authentication walls, and aggressive bot detection, and traditional curl-style requests simply fail. Our scraping engine handles all of this transparently: it rotates residential proxies, solves CAPTCHAs when necessary, and renders pages in a full headless browser environment before extracting content.

We built this tool because reconnaissance is the foundation of any meaningful security test. Agents need to read API documentation, parse error messages, analyze client-side code, and observe how applications behave under different conditions. Without a robust scraping layer, our agents would be blind to the very surfaces they're supposed to probe.

HTTP Interface

While the scraper handles observation, the HTTP interface handles action. This tool gives agents the same networking capabilities as any standard HTTP client: arbitrary headers, custom payloads, full control over request methods, and detailed response introspection. Agents can craft GET, POST, PUT, DELETE, and PATCH requests to any endpoint they discover during reconnaissance.

Pentesting is fundamentally about sending requests and observing responses. The HTTP interface allows our agents to test authentication flows, probe for injection vulnerabilities, replay captured tokens, and interact with REST APIs as either a legitimate client or a malicious one. Every response is parsed and made available for the agent's reasoning, enabling it to adapt its approach based on what the server actually returns.
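As a rough illustration of the request-crafting and response-parsing steps, the sketch below builds a request with Python's standard `urllib.request.Request` and reduces a raw response to a small dictionary. The helper names `build_request` and `summarize_response`, the example URL, and the 2000-character truncation are assumptions made for this example; the request is only constructed here, not sent.

```python
import json
import urllib.request

ALLOWED_METHODS = {"GET", "POST", "PUT", "DELETE", "PATCH"}


def build_request(method, url, headers=None, json_body=None):
    """Construct (but do not send) an HTTP request with custom headers and payload."""
    method = method.upper()
    if method not in ALLOWED_METHODS:
        raise ValueError(f"unsupported method: {method}")
    headers = dict(headers or {})
    data = None
    if json_body is not None:
        data = json.dumps(json_body).encode("utf-8")
        headers.setdefault("Content-Type", "application/json")
    return urllib.request.Request(url, data=data, headers=headers, method=method)


def summarize_response(status, headers, body):
    """Reduce a raw response to the fields an agent would reason over."""
    text = body.decode("utf-8", errors="replace")
    try:
        parsed = json.loads(text)
    except ValueError:
        parsed = None  # not JSON; the agent falls back to the raw text
    return {
        "status": status,
        "headers": {k.lower(): v for k, v in headers.items()},
        "json": parsed,
        "text": text[:2000],  # truncate so large bodies do not flood the context
    }
```

A request built this way could be sent with `urllib.request.urlopen(req)`, and the resulting status, headers, and body passed to `summarize_response`.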

x402 Protocol Handler

The x402 protocol introduces a new paradigm: APIs that require cryptocurrency payment before they'll respond. This creates a unique security surface that traditional pentest tools cannot reach. Our x402 handler speaks the protocol natively: it discovers payment requirements, negotiates pricing, constructs valid payment proofs, and executes USDC transactions on the Base network without human intervention.

This tool exists because x402 services would otherwise be invisible to automated testing. An agent that can't pay can't test. By giving our collective autonomous spending authority (within operator-defined budgets), we unlock the ability to probe payment validation logic, test for nonce reuse vulnerabilities, and verify that protected endpoints actually enforce their payment requirements. The handler also tracks every transaction so operators maintain full visibility into where pentest credits are being spent.
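The discovery-and-budget step can be sketched as follows: given the JSON body of a 402 response, pick the cheapest payment option on the expected network that fits the operator's budget. The field names (`accepts`, `network`, `maxAmountRequired`, `payTo`, `scheme`) reflect our reading of the public x402 specification and should be verified against it. The function names, addresses, and budget values are hypothetical. Signing and submitting the payment are deliberately left out.

```python
from decimal import Decimal

USDC_DECIMALS = 6  # USDC amounts are expressed in units of 10^-6


def atomic_to_usdc(amount: str) -> Decimal:
    """Convert an atomic-unit amount string to a USDC value."""
    return Decimal(amount) / (Decimal(10) ** USDC_DECIMALS)


def select_payment_option(body: dict, budget_usdc: Decimal, network: str = "base"):
    """Return the cheapest in-budget option from a 402 body, or None."""
    candidates = []
    for option in body.get("accepts", []):
        if option.get("network") != network:
            continue  # ignore options on networks the treasury does not fund
        price = atomic_to_usdc(option["maxAmountRequired"])
        if price <= budget_usdc:
            candidates.append((price, option))
    if not candidates:
        return None  # nothing affordable: the agent must not pay
    price, option = min(candidates, key=lambda pair: pair[0])
    return {"price_usdc": price, "pay_to": option["payTo"], "scheme": option["scheme"]}
```

Returning `None` rather than raising keeps the "cannot afford this endpoint" case an ordinary outcome that the agent can report as a finding.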

Campaign Memory

Language models have no inherent memory. Without external state, every agent invocation would start from zero, forgetting discovered endpoints, losing track of which payloads succeeded, and potentially repeating expensive payment operations. Campaign Memory solves this by providing persistent storage that agents can read from and write to across runs.

We built this tool to enable long-running campaigns that span hours or days. Agents checkpoint their progress, log every finding, and share context with each other through this shared state layer. When a campaign resumes, agents can pick up exactly where they left off. This also reduces hallucination about past events: instead of reconstructing history from an unreliable context window, agents query the database for ground truth. The result is campaigns that are both more efficient and more reliable.
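A shared state layer of this kind can be approximated with a small SQLite store. The sketch below assumes a two-table layout (findings and checkpoints); the class name `CampaignMemory`, the schema, and the method names are illustrative and are not taken from the platform's code.

```python
import json
import sqlite3


class CampaignMemory:
    """Persistent findings log and per-agent checkpoints backed by SQLite."""

    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS findings ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, campaign TEXT NOT NULL, "
            "agent TEXT NOT NULL, kind TEXT NOT NULL, data TEXT NOT NULL)"
        )
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS checkpoints ("
            "campaign TEXT, agent TEXT, step INTEGER, PRIMARY KEY (campaign, agent))"
        )
        self.db.commit()

    def record(self, campaign, agent, kind, data):
        """Append a finding; data is any JSON-serializable value."""
        self.db.execute(
            "INSERT INTO findings (campaign, agent, kind, data) VALUES (?, ?, ?, ?)",
            (campaign, agent, kind, json.dumps(data)),
        )
        self.db.commit()

    def recall(self, campaign, kind=None):
        """Return findings in insertion order, optionally filtered by kind."""
        sql = "SELECT agent, kind, data FROM findings WHERE campaign = ?"
        params = [campaign]
        if kind is not None:
            sql += " AND kind = ?"
            params.append(kind)
        rows = self.db.execute(sql + " ORDER BY id", params).fetchall()
        return [(agent, k, json.loads(data)) for agent, k, data in rows]

    def checkpoint(self, campaign, agent, step):
        """Store the latest completed step, replacing any earlier checkpoint."""
        self.db.execute(
            "INSERT OR REPLACE INTO checkpoints VALUES (?, ?, ?)", (campaign, agent, step)
        )
        self.db.commit()

    def resume_step(self, campaign, agent):
        """Return the last checkpointed step, or 0 for a fresh start."""
        row = self.db.execute(
            "SELECT step FROM checkpoints WHERE campaign = ? AND agent = ?",
            (campaign, agent),
        ).fetchone()
        return row[0] if row else 0
```

Passing a file path instead of `":memory:"` makes the store survive process restarts, which is what allows a campaign to resume hours or days later.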

GitAnalyzer

GitAnalyzer is our specialized tool for deep code provenance and security analysis. It scans GitHub repositories to detect "slop" (low-quality, copy-pasted, or AI-generated boilerplate) and potential security vulnerabilities. By analyzing commit history, code structure, and dependency patterns, it generates a comprehensive "Slop Score" and security report.

We built GitAnalyzer because in the age of AI-generated code, volume does not equal value. Investors and users need to know if a project is genuine innovation or just a wrapper around standard libraries. This tool provides that transparency, allowing agents to verify the technical substance of a project before any interaction occurs.

XAnalyzer

XAnalyzer focuses on social identity verification and reputation analysis. It connects directly to X (formerly Twitter) to fetch real-time profile data, tweet history, and engagement metrics. It doesn't just look at follower counts; it analyzes consistency, engagement quality, and authenticity to generate a "Trust Score".

Social engineering and fake identities are primary vectors for crypto scams. XAnalyzer gives our agents the ability to perform due diligence on the people behind the protocols. By quantifying social signals and detecting bot-like behavior, we ensure that the "who" is as verified as the "what".

Agent Collective

Every campaign runs four independent AI agents in parallel, each powered by a different large language model provider. All four agents receive the same objective, have access to the same tools, and work simultaneously, but they reason independently. This design isn't redundancy for its own sake; it's how we validate findings and reduce the risk of hallucination.

When multiple agents reach the same conclusion through different reasoning paths, confidence in that conclusion increases. When they disagree, that divergence itself is valuable: it surfaces ambiguity in the target system or exposes edge cases that a single model might miss. Operators can compare results across all four terminals, cross-reference findings, and make informed decisions with multiple perspectives rather than trusting a single AI's interpretation.
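The cross-referencing idea can be illustrated with a small grouping routine that separates findings reported by several agents from those reported by only one. This is a simplified sketch: it matches findings by normalized text, whereas real findings would need fuzzier matching. The quorum of two and the function names are assumptions.

```python
from collections import defaultdict


def group_by_agreement(findings_by_agent: dict) -> dict:
    """Map each normalized finding to the sorted list of agents that reported it."""
    support = defaultdict(set)
    for agent, findings in findings_by_agent.items():
        for finding in findings:
            support[finding.strip().lower()].add(agent)
    return {finding: sorted(agents) for finding, agents in support.items()}


def split_findings(findings_by_agent: dict, quorum: int = 2):
    """Split findings into corroborated (>= quorum agents) and divergent ones."""
    grouped = group_by_agreement(findings_by_agent)
    corroborated = {f: a for f, a in grouped.items() if len(a) >= quorum}
    divergent = {f: a for f, a in grouped.items() if len(a) < quorum}
    return corroborated, divergent
```

Divergent findings are kept rather than discarded, since a result that only one agent reports is exactly the kind an operator may want to investigate further.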

ChatGPT

Powered by OpenAI's GPT-4o, this agent brings strong general reasoning and extensive training on web content and documentation. GPT models excel at parsing complex API specifications, understanding error messages, and generating well-structured payloads. When testing x402 endpoints, ChatGPT often provides the most readable explanations of what it's attempting and why.

DeepSeek

DeepSeek's reasoning model takes a methodical, step-by-step approach to problem solving. It tends to be more deliberate in its tool usage, often pausing to analyze responses before proceeding. This makes it particularly effective at catching subtle inconsistencies in API behavior or payment validation logic that faster models might overlook. DeepSeek frequently identifies edge cases that other agents miss.

Grok

xAI's Grok model brings a different perspective shaped by its training methodology. Grok tends to be more aggressive in its testing approach, willing to try unconventional payloads or probe boundaries that other models might consider out of scope. This makes it valuable for discovering unexpected behaviors in x402 implementations. It's often the first to find authentication bypasses or pricing logic flaws.

Claude

Anthropic's Claude is known for careful, nuanced analysis and strong adherence to instructions. In our collective, Claude often serves as a stabilizing presence, less likely to hallucinate findings and more likely to clearly articulate uncertainty when results are ambiguous. Claude's responses tend to be well-organized and thorough, making it easier to extract actionable intelligence from its findings.

Run Workflow

Each campaign follows a consistent execution pattern. The operator submits an objective, all four agents spin up simultaneously, and results stream back in real time. Here's what happens at each stage.

Objective Intake

The operator describes what they want to test: a specific x402 endpoint, a class of vulnerabilities to probe, or a general reconnaissance task. The platform enriches this prompt with context: current date and time, available tools, and any conversation history from previous runs in the same session. Each agent receives an identical, fully-formed objective.
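The enrichment step might look like the following sketch, which prepends the current time, the tool list, and prior session turns to the operator's prompt and then hands the same text to every agent. The prompt layout and the names `build_objective` and `fan_out` are assumptions for illustration; the platform's actual prompt format is not documented here.

```python
from datetime import datetime, timezone


def build_objective(prompt, tools, history, now=None):
    """Combine the operator prompt with time, tool, and session context."""
    now = now or datetime.now(timezone.utc)
    lines = [
        f"Current time (UTC): {now.isoformat(timespec='seconds')}",
        "Available tools: " + ", ".join(tools),
    ]
    if history:
        lines.append("Previous turns in this session:")
        lines.extend(f"- {role}: {text}" for role, text in history)
    lines.append("Objective: " + prompt.strip())
    return "\n".join(lines)


def fan_out(objective, agents):
    """Give every agent the identical, fully formed objective."""
    return {agent: objective for agent in agents}
```

Building the objective once and copying it keeps the inputs identical, so any difference in results comes from the agents' reasoning and not from their prompts.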

Parallel Execution

All four agents launch simultaneously and begin working through their reasoning loops. Each agent decides independently which tools to invoke, what requests to make, and how to interpret responses. They stream their progress to individual terminals in real time, so operators can watch four different approaches unfold in parallel. Agents continue iterating until they reach a conclusion or hit their iteration limit.
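The launch-and-iterate pattern can be sketched with `asyncio.gather`, where each agent loops until it concludes or exhausts its iteration budget. The `step` callable stands in for one model-plus-tools reasoning turn and is hypothetical, as is the limit of five iterations.

```python
import asyncio

MAX_ITERATIONS = 5  # hypothetical iteration limit


async def run_agent(name, objective, step, max_iterations=MAX_ITERATIONS):
    """Loop until `step` reports a conclusion or the limit is reached."""
    transcript = []
    for iteration in range(1, max_iterations + 1):
        # `step` returns (message, done) for one reasoning turn.
        message, done = await step(name, objective, iteration)
        transcript.append(message)
        if done:
            return {"agent": name, "status": "concluded", "transcript": transcript}
    return {"agent": name, "status": "iteration_limit", "transcript": transcript}


async def run_campaign(objective, agents, step):
    """Run every agent concurrently on the same objective."""
    results = await asyncio.gather(*(run_agent(a, objective, step) for a in agents))
    return {r["agent"]: r for r in results}
```

A campaign would be started with `asyncio.run(run_campaign(objective, agents, step))`. Recording whether an agent concluded or hit the limit lets the operator tell a finished investigation from a truncated one.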

Tool Invocation

As agents work, they call out to the tool layer: scraping documentation, making HTTP requests, or executing x402 payments. Every tool call is logged with full request and response details. If an agent triggers a paid x402 endpoint, the payment is automatically constructed and executed within operator-defined budget limits. Failed requests are captured alongside successes, providing a complete audit trail of what was attempted.
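One way to capture failures alongside successes is to route every tool call through a wrapper that records the request, the response or error, and a timestamp. The sketch below is an assumption about how such an audit trail could be structured; `AuditLog` and its entry fields are illustrative names.

```python
import time


class AuditLog:
    """Records every tool call, whether it succeeds or raises."""

    def __init__(self):
        self.entries = []

    def call(self, agent, tool_name, tool, **kwargs):
        entry = {"agent": agent, "tool": tool_name, "request": kwargs, "started": time.time()}
        try:
            entry["response"] = tool(**kwargs)
            entry["ok"] = True
        except Exception as exc:
            # Failed calls are logged, not re-raised, so the trail stays complete
            # and the agent can reason about the error.
            entry["error"] = f"{type(exc).__name__}: {exc}"
            entry["ok"] = False
        self.entries.append(entry)
        return entry
```

Returning the entry itself gives the agent the same view of the call that the operator sees in the log.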

Result Comparison

Once agents complete, the operator reviews findings across all four terminals. Consistent findings across multiple agents indicate high-confidence results. Divergent conclusions warrant deeper investigation since they often reveal genuine ambiguity in the target system or expose edge cases worth exploring. The platform preserves the full conversation history, allowing operators to continue the session with follow-up prompts that build on what was already discovered.