WHAT PROOF SHOULD YOU DEMAND FROM AN AI PLATFORM'S SECURITY CLAIMS?
Every AI platform claims enterprise-grade security. The distinction between marketing and engineering is evidence: traceable rules, specific tests, and documented findings. Here is what that looks like in practice.
THE EVIDENCE GAP
Every AI platform claims to take security seriously. Most of them mean it. The problem is not intent — it is evidence. “Enterprise-grade security” is not a specification. “AI you can trust” is not a control. Marketing language and engineering reality are not the same thing, and the gap between them is where incidents happen.
If you are evaluating an AI platform for production use, the right question is not “is it secure?” It is: what is the evidence, and how was it produced?
WHAT A NEGATIVE TEST LOOKS LIKE
The most important test category in a multi-tenant AI system is the negative cross-tenant test: does Tenant A’s session actually fail when it attempts to access Tenant B’s data?
A positive test confirms that the happy path works. A negative test confirms that the enforcement mechanism is real. The difference is the difference between “we wrote a filter” and “we proved the filter cannot be bypassed.”
# A cross-tenant negative test — the right shape
import pytest

@pytest.mark.asyncio
async def test_tenant_a_cannot_read_tenant_b_data(
    client,
    tenant_a_session,
    tenant_b_resource,
):
    response = await client.get(
        f"/api/resource/{tenant_b_resource.id}",
        cookies=tenant_a_session,
    )
    assert response.status_code == 404  # not 403 — must not confirm existence

The detail that trips up most implementations: the response must be 404, not 403. Returning 403 confirms that the resource exists — which is itself a cross-tenant information disclosure. The isolation boundary must be complete, not just enforced at the read layer.
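A handler that passes this test has one distinguishing property: a foreign tenant's resource is indistinguishable from a nonexistent one. The following is a minimal sketch of that shape — the in-memory store, the `Session` and `Resource` types, and `read_resource` are all illustrative assumptions, not the platform's actual code:

```python
from dataclasses import dataclass

@dataclass
class Session:
    tenant_id: str

@dataclass
class Resource:
    id: str
    tenant_id: str
    body: str

# Hypothetical in-memory store keyed by (tenant_id, resource_id).
STORE = {
    ("tenant-b", "r1"): Resource("r1", "tenant-b", "secret"),
}

def read_resource(session: Session, resource_id: str) -> tuple[int, dict]:
    # Look up within the caller's tenant only. A resource owned by another
    # tenant simply does not resolve — so there is no code path that could
    # leak its existence via a 403.
    resource = STORE.get((session.tenant_id, resource_id))
    if resource is None:
        return 404, {"detail": "not found"}
    return 200, {"id": resource.id, "body": resource.body}
```

With this shape, Tenant A requesting `r1` gets the same 404 it would get for a random ID, which is exactly what the negative test asserts.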
Ruakiel mandates this test pattern for every endpoint that reads or writes tenant-scoped data. It is not a recommendation — it is a rule that blocks a pull request if the test is absent.
FROM CLAIM TO CODE TO EVIDENCE
A security claim has three components: the rule that defines it, the code that implements it, and the test that proves it. All three must exist and be traceable to each other.
Ruakiel maintains a compliance-to-rule traceability matrix that maps regulatory obligations to specific engineering controls, to the services that implement them, and to the evidence of implementation. When a security finding is resolved, it is catalogued with the phase in which it was fixed and the specific change that addressed it.
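One lightweight way to represent such a matrix is as structured data that can itself be linted. The entry below is purely illustrative — the obligation, rule ID, function path, and test path are invented examples, not Ruakiel's actual schema:

```python
# Illustrative traceability matrix: each claim must carry all three legs —
# a rule, an implementation, and test evidence. Identifiers are hypothetical.
TRACEABILITY = [
    {
        "obligation": "GDPR Art. 32 — integrity and confidentiality",
        "rule": "RULE-ISO-001: tenant-scoped reads must be path-isolated",
        "implementation": "resource_service.read_resource",
        "evidence": "tests/test_isolation.py::test_tenant_a_cannot_read_tenant_b_data",
    },
]

def untraceable_claims(matrix: list[dict]) -> list[dict]:
    """Return entries missing any leg of rule -> implementation -> evidence."""
    required = ("rule", "implementation", "evidence")
    return [row for row in matrix if not all(row.get(key) for key in required)]
```

Checking `untraceable_claims` in CI turns "all claims are traceable" from a policy statement into a failing build when a leg is missing.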
Every security fix also requires a regression test that fails without the fix. This is not optional. If a fix cannot be expressed as a test, the fix is not complete — because the next change could undo it silently.
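As a sketch of what "a regression test that fails without the fix" means in practice, consider a hypothetical tool-dispatch function whose fix was adding an allow-list check on tool names (the function and pattern are invented for illustration):

```python
import re

def run_tool(name: str, args: dict) -> str:
    # THE FIX: reject tool names outside a strict allow-listed pattern
    # before dispatch. Pre-fix behavior let names like "../secrets"
    # reach the dispatcher. Deleting this guard reverts the fix.
    if not re.fullmatch(r"[a-z_][a-z0-9_]*", name):
        raise ValueError(f"invalid tool name: {name!r}")
    return f"dispatched {name}"

def test_traversal_tool_name_rejected() -> bool:
    # Regression test: passes only while the guard above exists.
    try:
        run_tool("../secrets", {})
    except ValueError:
        return True
    return False
```

If a later refactor removes the guard, `test_traversal_tool_name_rejected` fails — the silent revert the text warns about becomes a loud one.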
- Claim: Agent tool inputs are validated before execution. Evidence: strict schema validation + audit logging, catalogued as a resolved finding.
- Claim: Permissions come only from cryptographically verified tokens. Evidence: the gateway layer no longer asserts permissions into event payloads; the orchestration layer populates permissions exclusively from its own token verification. Expired or missing tokens fail closed.
- Claim: Session state is tenant-scoped and TTL-bound. Evidence: session auth dependency binds tenant at login; session store has mandatory TTL; tested with a negative test confirming cross-tenant session reuse is rejected.
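The fail-closed property in the second claim can be sketched with stdlib primitives. HMAC-signed tokens below stand in for whatever signature scheme the platform actually uses; `SECRET`, `mint`, and `verify_token` are illustrative names, and the point is only the shape: every failure path yields zero permissions.

```python
import base64
import hashlib
import hmac
import json
import time

SECRET = b"demo-key"  # illustrative; real keys would come from a KMS

def verify_token(token: str) -> dict:
    """Derive permissions from a verified token; any failure yields none."""
    try:
        body_b64, sig_hex = token.rsplit(".", 1)
        body = base64.urlsafe_b64decode(body_b64.encode())
        expected = hmac.new(SECRET, body, hashlib.sha256).hexdigest()
        if not hmac.compare_digest(expected, sig_hex):
            return {"permissions": []}   # bad signature: fail closed
        claims = json.loads(body)
        if claims.get("exp", 0) < time.time():
            return {"permissions": []}   # expired: fail closed
        return {"permissions": claims.get("permissions", [])}
    except Exception:
        return {"permissions": []}       # malformed or missing: fail closed

def mint(claims: dict) -> str:
    """Produce a signed token (test helper for the sketch above)."""
    body = base64.urlsafe_b64encode(json.dumps(claims).encode()).decode()
    raw = base64.urlsafe_b64decode(body.encode())
    return f"{body}.{hmac.new(SECRET, raw, hashlib.sha256).hexdigest()}"
```

Nothing upstream of `verify_token` asserts permissions into the payload; the caller gets exactly what the verified claims grant, or nothing.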
THE QUESTIONS TO ASK
When evaluating an AI platform for production deployment, these questions have specific answers — or they do not, which is itself an answer:
- How is tenant isolation enforced at the data layer? “Query filters” is a weaker answer than “path-structural isolation that cannot be omitted without breaking the query.”
- Do you have cross-tenant negative tests, and are they required for every tenant-scoped endpoint? “We test the happy path” is not the same as “we have a policy rule that blocks any endpoint without a negative isolation test.”
- What happens when a security finding is resolved? If the answer is “we fix the code,” ask what prevents the fix from being reverted. The correct answer includes a regression test.
- Can you trace a security claim to a specific rule, a specific implementation, and a specific test? If not, the claim is a statement of intent, not a demonstrated control.
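The contrast in the first question can be made concrete. With path-structural isolation, the tenant is part of the key itself, so there is no filter clause to forget (the store below is an illustrative sketch, not the platform's data layer):

```python
class TenantScopedStore:
    """Keys are (tenant_id, resource_id) tuples. There is no tenant-free
    lookup method, so a cross-tenant query cannot be expressed at all."""

    def __init__(self):
        self._data = {}

    def put(self, tenant_id: str, resource_id: str, value: str) -> None:
        self._data[(tenant_id, resource_id)] = value

    def get(self, tenant_id: str, resource_id: str):
        # Omitting the tenant id is a TypeError at the call site,
        # not a silently broadened query.
        return self._data.get((tenant_id, resource_id))
```

Compare a query-filter approach (`SELECT ... WHERE tenant_id = ?`): if the `WHERE` clause is accidentally dropped in a refactor, the query still runs — and returns every tenant's rows.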
Security posture is not a certificate. It is a set of engineering decisions, tested continuously, with findings documented and traceable. Platforms that can show you the evidence have built those systems. Platforms that can only describe what they intend to build have not.