Good afternoon, —
Recent activity
Current Account
Account details
- Type—
- IBAN—
- BIC—
- CurrencyEUR
- Opened—
- Interest—
Move money
SEPA · arrives same dayTransactions
Service fees
- Loading fees…
Card management
- Card number •••• •••• •••• ····For your security only the last 4 digits are shown. The full PAN is never displayed in the app.
- Brand—
- Type—
- Cardholder—
- Expiry—
- Daily spending limit—
- ATM limit—
- Contactless—
PIN viewing policy
Your PIN can only be viewed from this Cards page using the Show PIN button above, on a debit card. The chat assistant will not reveal or guess your PIN under any circumstances. If you have forgotten your PIN, request a reset from this page.
Discover
Find a branch
Candidate brief
What you're doing
Explore the AI chatbot embedded in this banking UI and design a way to test it. You decide what's worth probing, how to characterise a "test case" for a chatbot, and what evidence you'd want before trusting this assistant in production. We then meet to walk through your reasoning, your test cases, and your scoring approach together.
What's provided
- Six in-app surfaces — Overview, Account, Cards, Discover (Wealth + Insurance), Find Us (branches), Brief & Support
- Interactive flows — transfer / receive money, card lock + limits + PIN view, beneficiary picker
- Read-only catalogues — products, branches, fees, policies (the chat assistant looks these up on demand)
- UI-only stubs — replacement-card reorder, notification bell (no backend persistence)
- An embedded AI assistant in the bottom-right dock — backed by a real LLM, so replies vary run-to-run; that variance is part of what you're testing
- A fixed account/data profile per token — same balances, cards, branches, fees across sessions
POST /api/chat— send a message, get the bot reply + metadataGET /api/state— fetch current account/card/transaction statePOST /api/ui-event— record UI actions (e.g. card lock)POST /api/llm— pass-through LLM call with your own prompts (use it for LLM-as-judge scorers if you want)- Data & privacy — your session (chat messages, UI actions, and API calls) is recorded server-side so interviewers can review your approach. Data is held only for the duration of the hiring process and deleted once a decision is made. Controller: adrian.coroi@erstegroup.com. You may request access or deletion at any time by emailing that address.
Deliverables
- Approach note — one page: how you thought about testing this chatbot, what you decided to cover, and what you consciously descoped
- Test-case set — your standardised evals: labelled prompts with expectations, pass/fail criteria, and the rationale for picking them
- Scorers — for each test case, how you'd grade it. Mix of rule-based checks and LLM-as-judge is encouraged (
/api/llmis available for the latter) - Execution results — what you actually saw when you ran them: passes, failures, surprises, anything flaky
- Scaling / strategy note — one page: regression protection, CI gating, human review, telemetry, what you'd change with more time
- AI-tool usage log — half a page: which tools, for what, and how you verified their output
- Optional bonus — if you have spare time, wrap the test cases + scorers into a runnable automation harness. Useful, not required.
Submit as a git repo or zip with a README. If you built a harness, make it runnable in under five minutes.
Heads up
The app is fully synthetic — no real accounts, no real money, no external systems. Use it as aggressively as you like. The chatbot itself is not deterministic: it runs on a real LLM and will give you different replies on different runs. Account it for in your test design.
Ground rules
- Take as long as you need — but cap your total time at 8 hours. That's an upper bound, not a target. If you reach a defensible submission in 30 minutes, ship it. Time invested is not a scoring axis.
- Your token grants 8 hours of active session time per redemption, splittable across as many sittings as you like.
- AI coding tools are fine — log what you used.
- Don't structure the submission as a bug-ticket list. We want to see how you reasoned about testing, not how many issues you stacked up.
- Same setup for everyone — score reflects how you frame the problem, the quality of your test cases and scorers, and how clearly you characterise the chatbot's behaviour.