Show HN: A2Apex – Test, certify, and discover trusted A2A agents (a2apex.io)

Hey HN,

I built A2Apex (https://a2apex.io) — a testing and reputation platform for AI agents built on Google's A2A protocol.

The problem: AI agents are everywhere, but there's no way to verify they actually work. No standard testing. No directory of trusted agents. No reputation system.

What A2Apex does:

- Test — Point it at any A2A agent URL. We run 50+ automated compliance checks: agent card validation, live endpoint testing, state machine verification, streaming, auth, error handling.

- Certify — Get a 0-100 trust score with Gold/Silver/Bronze badges you can embed in your README or docs.

- Get Listed — Every tested agent gets a public profile page in the Agent Directory with trust scores, skills, test history, and embeddable badges.
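To make the testing step concrete, here's a minimal sketch of one of the simpler compliance checks: fetching and validating an agent card. The field names follow the public A2A spec (agents publish a card at a well-known path), but the specific checks and the helper names here are illustrative assumptions, not A2Apex's actual implementation.

```python
# Sketch of an agent-card compliance check (illustrative, not A2Apex's code).
import json
from urllib.request import urlopen

# Top-level fields the A2A spec expects on an agent card.
REQUIRED_FIELDS = ["name", "url", "version", "capabilities", "skills"]

def validate_agent_card(card: dict) -> list[str]:
    """Return a list of human-readable failures; an empty list means pass."""
    failures = [f"missing field: {f}" for f in REQUIRED_FIELDS if f not in card]
    skills = card.get("skills")
    if skills is not None and not isinstance(skills, list):
        failures.append("skills must be a list")
    elif isinstance(skills, list):
        for skill in skills:
            if not {"id", "name"} <= set(skill):
                failures.append(f"skill missing id/name: {skill!r}")
    return failures

def fetch_agent_card(base_url: str) -> dict:
    """A2A agents publish their card at a well-known discovery path."""
    with urlopen(f"{base_url.rstrip('/')}/.well-known/agent.json") as resp:
        return json.load(resp)
```

A real harness would chain dozens of checks like this (live endpoint probes, state-machine transitions, streaming, auth), each emitting pass/fail results.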

Think of it as SSL Labs (testing) + npm (directory) + LinkedIn (profiles) — for AI agents.

Stack: Python/FastAPI, vanilla JS, SQLite. No frameworks, no build tools. Runs on a Mac mini in Wyoming.

Free: 5 tests/month. Pro: $29/mo. Startup: $99/mo. Try it at https://app.a2apex.io

I'm a dragline operator at a coal mine — built this on nights and weekends using Claude. Would love feedback from anyone building A2A agents or thinking about agent interoperability.

c5huracan 1 day ago

The trust scoring layer is the interesting part here. The agent ecosystem has a discovery problem and a trust problem, and most tools today only tackle discovery. Being able to evaluate reliability before you connect changes the calculus.

Curious how the trust score works in practice. Is it purely automated test results, or do you plan to incorporate usage signals over time (uptime, response quality)?

Hauk307 1 day ago

Right now it's purely automated: 50+ compliance checks against the A2A spec (agent card validation, endpoint testing, state machine, streaming, auth, error handling). Each check is weighted and rolled into the 0-100 score.
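For anyone curious what "weighted and rolled into the 0-100 score" can look like, here's a minimal sketch. The check names and weights below are made up for illustration; A2Apex's actual weighting isn't public.

```python
# Illustrative weighted rollup of pass/fail checks into a 0-100 score.
# Check names and weights are assumptions, not A2Apex's real values.
def trust_score(results: dict[str, bool], weights: dict[str, float]) -> int:
    total = sum(weights.values())
    earned = sum(w for check, w in weights.items() if results.get(check))
    return round(100 * earned / total)

WEIGHTS = {
    "agent_card": 3.0,      # card present and schema-valid
    "endpoints": 3.0,       # live endpoints respond correctly
    "state_machine": 2.0,   # task state transitions follow the spec
    "streaming": 1.0,       # SSE/streaming support
    "auth": 2.0,            # declared auth schemes actually enforced
    "error_handling": 1.0,  # well-formed errors on bad input
}

score = trust_score(
    {"agent_card": True, "endpoints": True, "state_machine": True,
     "streaming": False, "auth": True, "error_handling": True},
    WEIGHTS,
)
# 11 of 12 weighted points earned -> 92
```

A badge tier (Gold/Silver/Bronze) would then just be thresholds on this number.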

But you're right that automated spec compliance only tells part of the story. The roadmap includes usage signals, uptime monitoring, response latency tracking, and community ratings from developers who've actually integrated with an agent. The spec tells you if an agent CAN work. Usage data tells you if it DOES work.

The profile pages are designed with that in mind: test history already shows trends over time, and adding real-world signals is the natural next layer.