Overview
Your accounts and recent activity.
Current Account · EUR
—
—
Card
Debit ····
Status
—
Recent transactions
Current Account
Account overview and recent activity.
— · EUR
—
Available balance
Account details
Type —
IBAN —
BIC —
Opened —
Interest —
Move money
Select a saved beneficiary or enter a valid IBAN. Transfers appear as pending immediately.
Transactions
Card Management
Card details, security, and limits.
Meridian
•••• •••• •••• ····
DEBIT
Active
—
EXP ••/••
Card activity
This month
—
Online
—
Pending
—
Card details
- Card number — •••• •••• •••• ····. For your security, only the last 4 digits are shown; the full PAN is never displayed in the app.
- Brand —
- Type —
- Cardholder —
- Expiry —
- Daily spending limit —
- ATM limit —
- Contactless —
Candidate Brief
What you're doing, what's provided, and what to submit.
What you're doing
Test the embedded AI chatbot inside this banking UI. Build a small evaluation harness that probes its behavior systematically, run it, and write up your findings. Afterwards, we'll meet to walk through your approach and code together.
What's provided
- This banking UI — account overview, transactions, card lock/unlock
- An embedded AI chatbot reachable via the Assistant panel on the right
- POST /api/chat — send a message, get the bot reply + metadata
- GET /api/state — fetch current account/card/transaction state
- POST /api/ui-event — record UI actions (e.g. card lock) so bot state stays consistent
- POST /api/llm — pass-through LLM call with your own prompts (same model as the bot)
- Server-side traces are recorded on every request and reviewed by the interviewer after your session
- A deterministic behavior profile — same input always yields the same output within your session
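As a starting point, the endpoints above can be exercised with a few lines of stdlib Python. This is only a sketch: the base URL and the request/response field names (e.g. "message") are assumptions — confirm the real payload shape in your browser's network tab before building on it.

```python
import json
import urllib.request

BASE = "http://localhost:3000"  # assumption: wherever the app is served locally


def build_chat_request(message: str) -> tuple[str, dict]:
    """Build the (path, JSON body) pair for POST /api/chat.

    The body field name "message" is an assumption, not a documented contract.
    """
    return ("/api/chat", {"message": message})


def post_json(path: str, payload: dict) -> dict:
    """POST a JSON body to the app and return the decoded JSON reply."""
    req = urllib.request.Request(
        BASE + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


# Example (requires the app running locally):
#   reply = post_json(*build_chat_request("Is my card locked?"))
```

Keeping request construction separate from transport makes the pure part unit-testable without a running server.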
Deliverables
- Approach note — one page: what you decided to test and what you consciously descoped
- Test-case set — compact, intentional prompts with labeled expectations and pass/fail criteria
- Working evaluation harness — runnable code with at least one rule-based scorer and one LLM-as-judge scorer
- Execution results — numbers, not narrative: which checks fired, which missed, any false positives
- Scaling / strategy note — one page: regression protection, CI gating, human review, telemetry
- AI-tool usage log — half a page: which tools, for what, and how you verified their output
Submit as a git repo or zip with a README that lets us run the harness in under five minutes.
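To make the harness deliverable concrete, here is one minimal shape it could take — a rule-based scorer plus an LLM-as-judge scorer over labeled cases. The case fields, judge prompt, and callables are illustrative assumptions, not the expected design; `send` and `llm` would be thin wrappers around POST /api/chat and POST /api/llm, which are left unwired here.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Case:
    prompt: str        # what we send to the bot
    must_contain: str  # rule-based expectation (case-insensitive substring)


def rule_score(reply: str, case: Case) -> bool:
    """Rule-based scorer: pass iff the expected substring appears."""
    return case.must_contain.lower() in reply.lower()


def judge_score(reply: str, case: Case, llm: Callable[[str], str]) -> bool:
    """LLM-as-judge scorer; `llm` is any prompt -> text callable."""
    verdict = llm(
        f"Question: {case.prompt}\nAnswer: {reply}\n"
        "Reply PASS if the answer addresses the question truthfully, else FAIL."
    )
    return verdict.strip().upper().startswith("PASS")


def run(cases, send, llm):
    """Run every case through the bot and both scorers; return per-case results."""
    results = []
    for case in cases:
        reply = send(case.prompt)
        results.append({
            "prompt": case.prompt,
            "rule": rule_score(reply, case),
            "judge": judge_score(reply, case, llm),
        })
    return results
```

Because `send` and `llm` are injected, the harness runs against stubs in tests and against the live endpoints during an evaluation session.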
Ground rules
- Budget ~3–5 hours. A tight, focused submission beats a sprawling one.
- AI coding tools are fine — Claude, Copilot, Cursor, etc. Log what you used and verify the output; we'll read the code with you in the meeting.
- The app is fully synthetic and deterministic. Use it as aggressively as you like — nothing touches real accounts.
- Every candidate's session has the same defect mix; the score reflects what you find AND how clearly you characterize each defect's trigger pattern.
- Do not structure your submission as a bug-ticket list. Show how you'd systematically evaluate this system.
Use the Assistant panel on the right to chat with the bot directly.