AgenticVision

What happens when an AI agent can see, remember what it saw, and reason about visual change over time?

12 capabilities

Core Perception Capabilities: 4

Comparison Capabilities: 3

Quality & Memory Bridge: 4

Parameter Safety: 1

All Together Now

It's Wednesday morning. A QA engineer reports: "The product detail page looks broken on mobile. The 'Add to Cart' button is hidden behind the image carousel. It was fine last week."

Step 1: Capture the broken state

vision_capture at mobile viewport (375x812). CLIP embedding computed in 47ms. Quality score: 0.84. Labels: "product-detail-mobile-broken". Session token in URL redacted to [redacted-secret]. Capture ID: #1089.
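The URL redaction mentioned in Step 1 can be pictured with a small sketch. This is not AgenticVision's implementation: the function name redact_url, the SECRET_KEYS set, and the example URL are assumptions made for illustration. Only the [redacted-secret] placeholder comes from the text above.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical list of query parameter names treated as secrets.
SECRET_KEYS = {"session", "token", "sid", "api_key"}
PLACEHOLDER = "[redacted-secret]"


def redact_url(url: str) -> str:
    """Replace the values of secret-looking query parameters with a placeholder."""
    parts = urlsplit(url)
    pairs = [
        (key, PLACEHOLDER if key.lower() in SECRET_KEYS else value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
    ]
    # safe="[]" keeps the placeholder's brackets readable instead of percent-encoding them.
    query = urlencode(pairs, safe="[]")
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, parts.fragment))
```

Redacting before the capture is stored means the secret never reaches the observation log, so later searches and diffs cannot leak it.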

Step 2: Find the last known good state

vision_similar searches 847 observations in 1.5ms. Top match: capture #1034 (similarity 0.88, from last Tuesday). Selected as baseline.
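A rough sketch of what a similarity lookup like Step 2 involves, assuming the score is cosine similarity between stored embeddings (the text does not say which metric is used). The names cosine_similarity and top_match and the toy three-dimensional vectors are hypothetical. A linear scan is shown for clarity and says nothing about how the 1.5ms figure is achieved.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_match(query, observations):
    """Return (capture_id, similarity) for the stored embedding closest to query."""
    if not observations:
        raise ValueError("no observations to search")
    return max(
        ((cid, cosine_similarity(query, emb)) for cid, emb in observations.items()),
        key=lambda item: item[1],
    )
```

In use, observations would map capture IDs such as 1034 to their embeddings, and the query would be the embedding of the new capture.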

Step 3: Pixel-level diff

vision_diff in under 1ms. Result: 18% of pixels changed. Three regions: carousel area expanded (+120px), Add to Cart button shifted and overlapping, product description pushed down.
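To make the "percent of pixels changed" figure concrete, here is a minimal sketch of a pixel diff over two grayscale images stored as nested lists. It is an illustration under assumptions, not the vision_diff algorithm: the function names are hypothetical, and regions are approximated as horizontal bands of changed rows rather than true two-dimensional regions.

```python
def changed_mask(before, after, tolerance=0):
    """Per-pixel booleans: True where the two images differ by more than tolerance."""
    if len(before) != len(after) or any(len(a) != len(b) for a, b in zip(before, after)):
        raise ValueError("images must have identical dimensions")
    return [
        [abs(a - b) > tolerance for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(before, after)
    ]


def changed_fraction(mask):
    """Share of pixels marked as changed, between 0.0 and 1.0."""
    total = sum(len(row) for row in mask)
    changed = sum(sum(row) for row in mask)
    return changed / total if total else 0.0


def changed_row_bands(mask):
    """Group consecutive rows containing any change into (first_row, last_row) bands."""
    bands, start = [], None
    for y, row in enumerate(mask):
        if any(row):
            if start is None:
                start = y
        elif start is not None:
            bands.append((start, y - 1))
            start = None
    if start is not None:
        bands.append((start, len(mask) - 1))
    return bands
```

Row bands are a reasonable first approximation for layout regressions like this one, where sections of a page grow or shift vertically.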

Step 4: Compare with baseline

vision_compare: broken vs good = 0.88 similarity. Good vs two-weeks-ago = 0.97, so the page was stable before. The regression happened between last Tuesday and today.
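The reasoning in Step 4 amounts to walking a timeline of pairwise similarities and finding the first pair that falls out of the stable range. A sketch, assuming a stability threshold of 0.95 (the text gives no threshold) and a hypothetical function name:

```python
def find_regression_window(comparisons, stable_threshold=0.95):
    """Locate the first interval in which the page changed noticeably.

    comparisons: (older_label, newer_label, similarity) tuples, oldest first.
    Returns the (older_label, newer_label) pair whose similarity is below the
    threshold, or None if every interval looks stable.
    """
    for older, newer, similarity in comparisons:
        if similarity < stable_threshold:
            return older, newer
    return None
```

Fed the two scores from this scenario, 0.97 for two-weeks-ago vs last Tuesday and 0.88 for last Tuesday vs today, it returns the second interval.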

Step 5: Link to cognitive memory

Decision node created in cognitive memory. vision_link connects both captures as evidence_for. Protected from storage pruning. CAUSED_BY edge to the pixel diff fact.
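One way to picture the linking in Step 5 is a small in-memory graph in which captures referenced by an evidence_for edge survive pruning. The EvidenceGraph class, its methods, and the node labels are hypothetical. Only the edge names evidence_for and CAUSED_BY and the capture IDs come from the text.

```python
from dataclasses import dataclass, field


@dataclass
class EvidenceGraph:
    captures: set = field(default_factory=set)
    edges: list = field(default_factory=list)  # (source, relation, target) triples

    def add_capture(self, capture_id):
        self.captures.add(capture_id)

    def link(self, source, relation, target):
        self.edges.append((source, relation, target))

    def protected_captures(self):
        """Captures that serve as evidence for some node and must not be pruned."""
        return {
            src for src, rel, _ in self.edges
            if rel == "evidence_for" and src in self.captures
        }

    def prune(self):
        """Remove every capture that is not protected; return the removed IDs."""
        removed = self.captures - self.protected_captures()
        self.captures -= removed
        return removed
```

With captures 1034 and 1089 linked to a decision node, a pruning pass removes only unlinked captures and leaves the evidence intact.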

Step 6: Quality-weighted confidence

Both captures above 0.80 quality. Pixel diff unambiguous. CLIP similarity drop (0.97 → 0.88) confirms genuine regression. Confidence: 0.92.
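The text does not say how the 0.92 confidence is derived, so the sketch below covers only the two checks it names: both captures clearing a quality floor of 0.80, and a clear drop in similarity. The function name and the minimum drop of 0.05 are assumptions.

```python
def is_supported_regression(
    qualities,
    stable_similarity,
    current_similarity,
    quality_floor=0.80,
    min_drop=0.05,
):
    """True when all evidence is high quality and similarity fell by at least min_drop."""
    good_evidence = all(q >= quality_floor for q in qualities)
    drop = stable_similarity - current_similarity
    return good_evidence and drop >= min_drop
```

Gating on capture quality first keeps a blurry or partial screenshot from being read as a regression.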

Diagnosis (confidence: 0.92): Mobile product detail page regression. Image carousel height increased ~120px, pushing "Add to Cart" below the fold at 375px. Visual evidence: 18% pixel diff. The page was stable for 2 prior weeks (0.97 similarity). Both evidence captures high quality (0.81 and 0.84). Recommend checking carousel CSS changes — likely a max-height or aspect-ratio change.

Six steps. Two captures compared. Three regions identified. One diagnosis with 0.92 confidence, backed by quality-scored visual evidence linked to cognitive memory. The entire workflow took under 200 milliseconds of computation.

In plain terms

This is the difference between a developer who says "something looks off" and one who says "here's the before screenshot, here's the after, here are the three regions that changed, and here's exactly when it broke." AgenticVision turns visual debugging from guesswork into forensics.
