Overview

Your accounts and recent activity.

Current Account · EUR

—

Card

Debit ····

Status —

Recent transactions

Current Account

Account overview and recent activity.

— · EUR

—

Available balance

Account details

Type —

IBAN —

BIC —

Opened —

Interest —

Move money

Beneficiary

IBAN

Amount (€)

Select a saved beneficiary or enter a valid IBAN. Transfers appear as pending immediately.

Transactions

Card Management

Card details, security, and limits.

Meridian

•••• •••• •••• ····

DEBIT Active

— EXP ••/••

Card activity

This month —

Online —

Pending —

Card details

Card number •••• •••• •••• ····For your security only the last 4 digits are shown. The full PAN is never displayed in the app.
Brand—
Type—
Cardholder—
Expiry—
Daily spending limit—
ATM limit—
Contactless—

Candidate Brief

What you're doing, what's provided, and what to submit.

What you're doing

Test the embedded AI chatbot inside this banking UI. Build a small evaluation harness that probes its behavior systematically, run it, and write up your findings. Then we meet to walk through your approach and code together.

What's provided

This banking UI — account overview, transactions, card lock/unlock
An embedded AI chatbot reachable via the Assistant panel on the right
POST /api/chat — send a message, get the bot reply + metadata
GET /api/state — fetch current account/card/transaction state
POST /api/ui-event — record UI actions (e.g. card lock) so bot state stays consistent
POST /api/llm — pass-through LLM call with your own prompts (same model as the bot)
Server-side traces are recorded on every request and reviewed by the interviewer after your session
A deterministic behavior profile — same input always yields the same output within your session

Deliverables

Approach note — one page: what you decided to test and what you consciously descoped
Test-case set — compact, intentional prompts with labeled expectations and pass/fail criteria
Working evaluation harness — runnable code with at least one rule-based scorer and one LLM-as-judge scorer
Execution results — numbers, not narrative: which checks fired, which missed, any false positives
Scaling / strategy note — one page: regression protection, CI gating, human review, telemetry
AI-tool usage log — half a page: which tools, for what, and how you verified their output

Submit as a git repo or zip with a README that lets us run the harness in under five minutes.

Ground rules

Budget ~3–5 hours. A tight, focused submission beats a sprawling one.
AI coding tools are fine — Claude, Copilot, Cursor, etc. Log what you used and verify the output; we'll read the code with you in the meeting.
The app is fully synthetic and deterministic. Use it as aggressively as you like — nothing touches real accounts.
Every candidate's session has the same defect mix; the score reflects what you find AND how clearly you characterize each defect's trigger pattern.
Do not structure your submission as a bug-ticket list. Show how you'd systematically evaluate this system.

Use the Assistant panel on the right to chat with the bot directly.