Based on reading the Guardian’s live blog on the Parliamentary hearings on the TSB IT meltdown, plus some additional coverage, no one appears to have comported themselves all that well, but CEO Paul Pester put in such a spectacularly arrogant and disconnected-from-reality performance that it somewhat diverted attention from the poor job MPs did of questioning him and other TSB and Sabadell execs.
Even though the personal styles were as far apart as one could possibly envision, there was an eerie similarity between Pester’s fabulous disconnect from reality and that of former Wells Fargo CEO John Stumpf. Recall that Stumpf ran an operationally sound and highly profitable bank that was being discovered to have a sales culture that could only be characterized as criminal. As Elizabeth Warren pointed out, a teller stealing a $20 from a drawer would go to jail, yet Wells was engaged in the same sort of thing at an institutional level. Yet Stumpf blandly kept insisting that this was just a few bad apples in the face of red hot rejection by Senators and Congressmen from both sides of the aisle.
It took two rounds of disastrous Washington appearances, plus the press picking apart Stumpf’s dodgy claims, before the board roused itself to tell Stumpf he needed to resign. It was not hard to show that the head of retail banking was driving the sales process and punishing any managers who questioned it, and Stumpf kept setting ever-more aggressive targets. But Stumpf held on far longer than I had imagined possible. That was likely due to misguided personal loyalty, compounded by board members mistakenly believing that his reassurances were based on better intel than what they could read in the media.
As much as the Parliamentary hearings made clear that Pester is utterly clueless about what is going on with TSB’s systems and a PR liability for the bank, he’s likely to hang on for an even more unseemly amount of time. First, his bosses at Sabadell seem just as out to lunch, so they are unlikely to see him as a problem, since that means admitting they screwed up disastrously too. Second, who could possibly want to step into this role? Any insider is presumably part of the problem. Who would be willing to take on a bank that could have fundamentally unfixable IT problems, which means having to resolve it in a novel manner? Normally banks go tits up in familiar ways, like stupid loans and embezzlement, as opposed to an institution of scale making such a mess of its records that it dies under the liabilities.
A few points from the hearings:
Lame questioning by MPs. It was shocking that no one asked even a slightly technical question. As our readers have pointed out, even by looking at Twitter screenshots and connecting the dots from press reports, you can infer quite a lot that is damning on the IT front. Our Clive pointed out one shocking planning failure that should have been a lay-up as a line of inquiry:
When I read the communications from TSB, there were a couple of stand-out facts which had me shaking my head. The first was that TSB were moving their entire customer base over to the new system in one go. This required a two day (the weekend of 21-22 April) complete system outage which implied it was only just about feasible to do the transfer and the data load in that timescale. This meant that there was no possibility of pre-live testing. And that there would be a single Critical Success Factor to the migration which was simply measured as getting all the data over to the new TSB system and the imports completing. Whether that data was the right data for the fields being populated in the TSB system and whether fields were mandatory or optional in terms of getting data inputted into them before the start of the on-line day on Monday 23rd was never apparently considered. There are now big gaps in the historic data on the TSB platform – I’ll cover this more below.
And on Tuesday, before the Parliamentary hearings, a Wired story volunteered the notion that TSB had tested its systems over the migration weekend. Um, by the time you’ve migrated your production system, you are past the point of testing. Vlade poured cold water on this idea:
TSB testing on the weekend. Experts, huh?
You DON’T do testing of migration on the production system at the weekend of the migration. That would be even more idiotic than not doing it at all TBH, as it has all the potential to screw up the production, reduces time available for migration and any inevitable cock-ups (and hence also ability to roll back), and a lot of other problems I could think of.
The testing of migration is done on parallel test systems that are, as far as practicable, perfect copies of production (some brave souls could use the DR site, which of course is ok until a problem strikes and you need that..) – so called dry-runs. You may do those on the weekends (to simulate the whole migration, so for example be able to detach/re-attach integrated systems etc., plus it takes a lot more effort than a usual working day), but not on the weekend of migration.
It’s expensive to have a full parallel system, which is why partial migrations are preferable. But to migrate a full system like TSB’s w/o at least two full-population dry-runs (first one tends to find quite a few problems, second hopefully fixes them and establishes better runbook) is negligence par excellence.
Of course, then you also need to have a way of testing the dry-run – otherwise I can dry-run trivially (>/dev/null will always work… ) and claim success. In my experience, you might have done a dry-run on the weekend, and then spend the rest of the week testing and evaluating it.
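To make "testing the dry-run" concrete, here is a minimal sketch of the kind of reconciliation check one might run after a dry-run load, comparing what went into the target against the source. Everything in it is an assumption for illustration: the field names, the `MANDATORY_FIELDS` tuple and the `verify_dry_run` function are hypothetical and say nothing about how TSB's or Sabadell's systems are actually structured.

```python
# Hypothetical list of fields the target system treats as mandatory (illustrative only).
MANDATORY_FIELDS = ("account_id", "balance_pence", "sort_code")


def verify_dry_run(source_rows, target_rows, mandatory=MANDATORY_FIELDS):
    """Compare source and migrated rows keyed by account_id.

    Returns a list of (account_id, problem) tuples; an empty list means
    every source record arrived with its mandatory fields intact.
    """
    problems = []
    target_by_id = {row.get("account_id"): row for row in target_rows}

    for src in source_rows:
        acct = src["account_id"]
        tgt = target_by_id.get(acct)
        if tgt is None:
            problems.append((acct, "missing in target"))
            continue
        for field in mandatory:
            if tgt.get(field) in (None, ""):
                problems.append((acct, f"mandatory field {field} empty"))
            elif tgt[field] != src.get(field):
                problems.append((acct, f"{field} differs from source"))

    # Records that exist only in the target are just as suspicious.
    source_ids = {row["account_id"] for row in source_rows}
    for acct in target_by_id:
        if acct not in source_ids:
            problems.append((acct, "unexpected extra record in target"))

    return problems
```

The point is only that "the import completed" and "the right data landed in the right fields" are different checks, and the second one needs its own pass over the full population.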
There has been more than enough evidence via Twitterverse of other IT horrors, like errors in multiple programming languages (confirmation of the fear that the problems are in many sub-systems), internal testing domains on production, tons of internal error messages visible to customers, and shockingly incomplete customer interfaces when they do work (numerous typos and formatting errors). As Richard Smith said, this looked as if it had been planned and executed by high school kids.
As a result, MPs were not able to do more than mildly dent howler assertions by Pester. The MPs, between access to IT experts and a better view of at least some of the issues from constituents, should have been able to bore in. Instead the level of questioning was based on grim anecdotes, with Pester able to assert that he had a better view of the situation and to keep repeating the sort of blather that had been getting pushback on Twitter for days, like his assertion that his data showed that almost all customers were getting online.
Nicky Morgan did well despite being disadvantaged by not introducing a technical take. She had two colleagues with TSB accounts try logging in during Pester’s testimony. Neither could get access.
And this exchange, as recounted in the Guardian, was mind-boggling:
When somebody claims to have “broken the bank” that tends to be a good thing to hear but I am afraid that in Paul Pester’s case, not so much. I am sure that both Clive and Vlade would dearly have loved to pose some very pointed questions to him during that hearing but based on Yves’s account of the softball sort of treatment that he was getting, it seems that “the fix is in”.
Maybe the Government does not want the worry about this bank to spread to other banks and their IT departments. Maybe they just do not want the unwelcome distraction of this as they try to cope with the basketful of crabs that they are dealing with at the moment. Certainly they would not want to deal with a full blown crisis of confidence of UK banks at this or any other time. I guess that if they try to ignore this, that it will all go away so, just like Obama, it is all a matter of enough PR to solve the actual problem.
I, personally, cannot wait for the full story to come out and take its place in the IT manuals as a case of what not to do – ever!
No, if you read the Guardian live blog, the MPs were not happy. But their failure to probe Pester all that well seems to be yet another manifestation of the stunning collapse in the average caliber of people who go into government/political roles in the UK. Colonel Smithers and others have reported that the competence of ministers and senior career bureaucrats is shockingly poor compared to what it was 30 years ago.
That’s worse! That’s a lot worse! It would have been far better if it had been an agreement between some Whitehall mandarins.
Given the inability of most of the MPs to get some very basic bits on Brexit, I’m not surprised they can’t do this either.
And they also don’t have, or can’t get, good researchers and assistants to brief them. The Finance Committee was woeful as Yves explained above. I don’t know if you saw any of David Davis appearing before another Committee yesterday? He was talking pure, unadulterated bollocks. But his mixture of handwaves, evasion and waffle went pretty much unchallenged. One good example was aviation where he piffled on about a Swiss-style associate membership arrangement of the relevant EU authority. To which the obvious follow-up would have been “and who, exactly, has signed off on that one David?” But no, the questioner got sidetracked in their eagerness to score some trivial political point instead.
Aditya Chakrabortty had an excellent article a few days ago giving a pretty good explanation for the drop in quality in senior politicians. He relies a lot on research from the sociologist Aeron Davis (I haven’t read that book but it looks interesting).
It isn’t just the UK either – plenty of incompetent boobs in the US Congress too.
My theory is that there is some tipping point and once you hit a critical mass of morons, it causes a positive feedback loop resulting in even more dullards calling the shots. At that point anyone with half a brain won’t run for office as they can see what’s happening and don’t want to get anywhere near the train wreck.
Incompetent boobs in Congress (to say nothing of state legislatures) are a long, “honorable” tradition.
You might be right about a tipping point, though. For an example from across the aisle, Trey Gowdy is a very sharp guy who is scrambling for the exit.
One behavioural pattern of The Moron is always hire somebody dumber than they are so they can look somewhat clever in comparison and not be exposed at being nincompoops.
Another one is: “Always Be Right” regardless of the situation. To The Moron it is a severe dent to their personal integrity when they have to admit that they are wrong. Not really understanding anything, they are always brimming with certainty and always have simple, easy to understand, answers to all complicated situations.
Morons are perfectly adapted to a political environment where nurturing problems and never actually solving anything is seen as desirable since, when reality stays the same and opinions never have to change, this is exactly where The Moron mind thrives – creating a self-feeding exponential spiral of DooM.
It used to be that the Civil Service held very smart, competent, people which compensated for the “management stupidity” in government. Unfortunately, with the rise of the Personal Advisor (spin-doctors employed by ministers), they managed to “whip the civil service into shape”, causing the Civil Service to lose competences too.
From 2015 news about Sabadell buying TSB, it was said to be an “expensive operation” requiring a large capital increase, but it had a sweet spot: Lloyds Bank would provide 450 million pounds to “facilitate the platform transition”. Sabadell wanted to diversify risks from Spain and is sliding into Brexit and platform nightmares, not very lucky indeed!
I was shaking my head in disbelief at how out of touch Pester was during questioning. And to doubt the validity of reports from customers who say they have problems — to the extent that the Committee had to have TSB customers attempt to log in only to find they couldn’t or, if they could, there were errors in what their apps or Internet Banking showed them about their accounts.
It’s not as if TSB couldn’t get feedback direct from customers if they genuinely wanted to. They could simply put some of the telephony team on outbound calling and get in contact with a random sample of 100 customers, ask them to log in and go through everything that doesn’t look right, or, if the customer doesn’t use these services get them to check if their cards have been used, that their standing orders have gone out and so on by reading out the bank’s records out over the phone. They could do that in a day or two, tops, and come up with a report which would give them a priority list of bugs to (try to) fix.
Pester, instead, seemed determined to think that so long as 80% of services are working 50% of the time, that’s good enough and people should just be a little more patient.
And if Pester wants a report of a customer with an issue which has persisted unfixed continuously since the migration, he’s welcome to look into my credit card account. I. Still. Cannot. Access. My. Statements. And I cannot pay my outstanding balance due to an error in the “Ways to pay” feature online. And the Direct Debit I have in place to pay off the card if I forget to do so shows an old, closed, account. And the “Amount Due” field is blank which implies nothing will be taken even though there’s an outstanding balance.
Pester’s evasion and wooden-ness are an insult to all the customers who are having to spend their time sorting out TSB’s failures.
You should hold them to that “Amount Due.”
Then there are the mortgages that TSB has forgotten. That sort of thing could get expensive.
At this point, I bet there are law firms planning to make the bank a specialty.
Afterthought: how deep are Sabadell’s pockets? This could be a disadvantage of Brexit; won’t it be harder to sue a Spanish bank than if Britain were still in the EU? Do it quickly!
Luckily TSB is registered at and the regulatory responsibility of the Bank of England as a licensed deposit taker. They are a “Made in the U.K.” problem.
Even if he was correct that customer requests were working 50% of the time, to assert on that basis that everyone was able to get what they wanted done eventually demonstrates a fundamental lack of understanding of IT. It’s like saying that if the various systems in your car (steering wheel, lights, engine, brakes…) work 50% of the time, that’s good enough and you should get out on the highway because you’ll get to where you are going eventually.
Things like making an online payment are often multi-step processes where each step depends on the successful completion of the one before. If the system handles transient failures gracefully and allows you to resume from the same point, then you might eventually be able to reach the end of the process even at a 50% success rate. Anyone want to place bets on whether the TSB failure handling is of that type? More likely you’ll get bounced back to the beginning and have to try again. Successfully completing the process in that scenario would be akin to flipping heads 10 times in a row.
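The arithmetic behind that comparison can be sketched quickly. The 10-step process and the assumption that each step succeeds independently with probability 0.5 are purely illustrative (taken from the coin-flip analogy, not from anything TSB has published), and the function names are made up for this example.

```python
def p_complete_in_one_pass(p_step, n_steps):
    """Chance of getting through all steps with no failure at all."""
    return p_step ** n_steps


def expected_attempts_with_resume(p_step, n_steps):
    """Expected step attempts if a failed step can simply be retried in place.

    Each step is a geometric trial with mean 1/p attempts.
    """
    return n_steps / p_step


def expected_attempts_with_restart(p_step, n_steps):
    """Expected step attempts if any failure bounces you back to step one.

    Standard result for the expected number of trials needed to see
    n consecutive successes: (1 - p^n) / (p^n * (1 - p)).
    """
    if p_step == 1:
        return float(n_steps)
    p_all = p_step ** n_steps
    return (1 - p_all) / (p_all * (1 - p_step))


if __name__ == "__main__":
    p, n = 0.5, 10  # illustrative assumptions only
    print(p_complete_in_one_pass(p, n))          # chance of a clean run
    print(expected_attempts_with_resume(p, n))   # graceful failure handling
    print(expected_attempts_with_restart(p, n))  # bounced back to the start
```

Under these assumptions a clean run happens about once in a thousand tries, graceful resumption costs around 20 step attempts on average, and restart-on-failure costs around 2,000. The same "50% success rate" is a mild annoyance in one design and practically a denial of service in the other.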
On the testing – it occurred to me that what they meant was that on the weekend post-migration someone logged into the system to “test” it or similar.
Well, that may be true, but clearly under normal condition the system doesn’t work. Yes, you do some testing post-release, to make sure things work. But that’s not the same – in extent or depth – as a proper system testing that you’d do before migration.
Given the problems experienced, it’s pretty hard to believe any extensive testing was done, and that more than cursory post-release testing was done. Corrupted data suggest sloppy migration (and its testing), etc. I’d love to see the migration runbook..
As an aside, Pester claiming that 50% of the clients can’t log in suggests TSB believes (or at least uses as an excuse) it’s a load balancer that is the problem.
Usually, you may have two LBs to be able to take one down for fixes – but that should still allow your system to work well even with one. If there is only one LB that is working properly for some reason, the quick fix for TSB should be to just take the other LB offline until they can get something else in (which should be days at most, not weeks). Not a situation where one LB will drop clients and the other won’t – unless you have a much more serious problem with the whole infra.
Even ignoring that, TSB should have a disaster-recovery site, that would have it all duplicated, and it should be possible for TSB to redirect the traffic there (and again, be able to handle the clients, if maybe at a slower pace, with delays and dropped sessions – but not messed up accounts).
Which again suggests it’s not the network architecture/problems as Pester seems to imply, but an application problem.
I’m not an expert in the network architecture, but the claim smells to me at a first glance.
Yes, this “50%” capacity meme has been repeated since the start. It strongly suggests that either they are operating on a single-datacentre risk or else both legs are up, but the application layer cannot cope when, for whatever reason, the load balancer starts off directing traffic to one site but then the same session gets bounced to the other site. As the applications aren’t being kept perfectly synchronised in what is almost certainly an active-active configuration, the application throws an exception and off you jolly well have to ‘eff.
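For anyone who wants to see that failure mode in miniature, here is a toy model. It is only a sketch of the hypothesis above (two sites whose session state is not shared, with a load balancer that may or may not keep a session on one site); the `Site` class, the routing functions and `requests_served` are all invented for illustration and are not a description of TSB's actual setup.

```python
class Site:
    """One leg of an active-active pair with its own, unsynchronised session store."""

    def __init__(self, name):
        self.name = name
        self.sessions = set()

    def login(self, session_id):
        self.sessions.add(session_id)

    def handle(self, session_id):
        # A site that has never seen this session cannot serve the request.
        return session_id in self.sessions


def sticky(request_index):
    """Load balancer keeps the whole session on site 0."""
    return 0


def round_robin(request_index):
    """Load balancer alternates sites with no regard for the session."""
    return request_index % 2


def requests_served(route, n_requests, session_id="s1"):
    """Log in via the site chosen for request 0, then count how many
    follow-up requests succeed before the session is rejected."""
    sites = [Site("A"), Site("B")]
    sites[route(0)].login(session_id)
    served = 0
    for i in range(1, n_requests + 1):
        if not sites[route(i)].handle(session_id):
            break  # the point where the customer gets logged off
        served += 1
    return served
```

With `sticky` routing every follow-up request is served; with `round_robin` the very first one lands on the site that never saw the login and the session dies. Either sticky sessions or a properly shared session store would avoid it, which is why this points at the application and its configuration rather than at raw capacity.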
This tallies with my experience using the Internet Banking application. You’re going along fine, albeit very s–lo—-w—–ly then suddenly you get a long pause in the middle of a transaction (such as switching from one product view to another) then you get the dreaded “You Have Been Successfully Logged Off” message.
I was once running a project which had just such a problem. The root cause, which I can’t recall the exact specifics of, was buried deep, deep, down in the network config of either the server (AIX if I recall correctly) or it was some obscure application setting which would need changing when the application was run in a load balanced environment in an active-active setup. It took a month to fix it and it was down to trial-and-error fiddling around on what was hundreds of different parameters.
If that’s what’s afflicting TSB, good luck with that one. I’ll light a candle for you.
If I recall correctly, it is not even a network or an IT issue, as everything is running from AWS cloud. You need to purchase capacity there, and that costs money. I bet they purchased for the average transaction rate, but with all the hassle and failed transactions, they quickly ran out of capacity, which only adds to the problems.
And if the ‘active/active’ problem is real and on the AWS network architecture, good luck in correcting that one, as that change might impact a few 1000 other businesses.
Which would go a long way in explaining all those error messages listed a few days ago. Since some modules at the application level do not deal properly with distribution, connection handles, indexes and other descriptors do not have their values being carried over through proper synchronization when switching between server instances — and then boom, client software crash because of an invalid pointer.
Has there been in the past any example of a bank that died rapidly because of an ineptly managed software transition?
This happened to me on one of my earlier attempts to logon. Now, my accounts have simply been removed, or are otherwise missing from the internet banking system. I can only speculate why. Either they are missing data required to present information accurately, or maybe it is a deliberate move to limit the numbers using the system. The weird thing is I can still log on, it just tells me I have no accounts.
Similar with me, I have a savings account in addition to my credit card which I’d kept with them which was showing in my product listing. Yesterday it suddenly got flagged as “dormant”. Now it’s locked out totally for money transmission. I have to phone the call centre. Not wishing to waste an hour of my life waiting on hold, I’ll just have to schlep down to a branch and see if they can help. Not, luckily for me, anything urgent that I can’t live with out as I’ve got a stash of money and accounts all over the place I can use. I’d think a lot differently if I had to rely on just TSB.
As it happens I need to arrange a remortgage. If the delays persist I will end up on the SVR. It’s not so easy for me to get to the branch during working hours.
I have a sneaking suspicion that this idea that it’s only a capacity problem and everything is working OK otherwise is an example of doubling down. I would lay money that there is somebody out there who has told him (or a manager in the chain) outright that it’s not true, only to watch him repeat it in front of Parliament.
If it was network problems, you’d see timeouts and page-not-found errors. But not internal error messages, testing code, incorrect data and everything else.
I wouldn’t be surprised if the runbook was actually the contents of several people’s heads.
It’s quite possible that their load balancing isn’t working either, but that’s not the whole problem. Also worth pointing out the slippery wordage here: ‘50% of our clients can log in.. what percentage see accurate information, what percentage can perform normal online banking?’
The impression I get was that the testing was all ad-hoc – no formal testing, just ‘I’ll give this code a quick, single try to check it works ’cause it’s 11 at night’ kind of thing.
Agreed. This is not solely a load balancer issue, as evidenced by the abysmal javascript front-end code.
The fact that tellers can’t get into the system speaks volumes.
Maybe they went for a “Service Oriented Architecture” — then after the architect left, they outsourced each service to the cheapest bidder on the job for “parallel development”?
The FI I work for tracks local market conditions regularly. When HSBC sold its upstate NY branches our post-mortem estimates were that about 40% of their sold retail customers churned through the period before and within 1 year after the post-sale conversion. My understanding is that First Niagara had some conversion issues, but nothing on TSB’s scale (HSBC customers were already an unhappy lot to boot). I don’t see how TSB could survive between defections plus I would assume some future class-action lawsuit(s) (assuming such a thing exists in the UK).
I was just thinking that this #TSBfail is a glimpse into the shining bright Brexit future: millions of stranded customers (citizens), a disconnected and arrogant management (government) and nobody can come up with an idea to repair this totally planned catastrophe. And worse, nobody really cares (except the victims, but they never count).
From Reuters, 3/20/2015;
So, as promised, I’ll restate my lingering question;
Is this a case where the project to provide the promised “new platform” was actually technically impossible to deliver, or was it the case that the money supposedly set aside was not, or was insufficient, and/or the project start was delayed for whatever reason, and in the end there was not enough money or time to assure success?
With Yves’s observation that 50% of IT projects fail in clear focus, I’m still left wondering if the root cause is lack of planning, underestimating both the cost and time required.
Lloyds wants to unload TSB, Sabadell wants to buy, so they make all sorts of promises intended to get the ‘deal’ done, but each feels free to ignore those ‘commitments’ in the interest of next quarters earnings, and so the project to provide the new platform is put off until it’s too late.
I’ve mentioned this on every TSB post save this one. Bank deals in the US are regularly nixed over IT compatibility issues. It is well understood many systems are too far apart for a migration to be affordable or even doable. This isn’t even a controversial idea.
I have also said this deal looks like it never should have been done.
The onus is on the buyer to do proper due diligence. No one held a gun to Sabadell’s head and made them buy TSB. The failure to do adequate IT due diligence is 100% Sabadell’s fault. Lloyds is in no way culpable.
On top of that, Lloyds was allowed to charge only Sabadell its costs for running the TSB systems. Regulators were all over that to prevent subsidizing Lloyds. So the insinuations by Sabadell that Lloyds was somehow extorting Sabadell and that’s why they were (over)eager to get off the Lloyds systems is also a canard. Sabadell knew or could/should have known what the costs and issues were.
Understood, and thanks for your patience with my question.
This is exactly what I was trying to understand, that is, where was the point of no return.
I’ve been thinking that if they had started work right away, had the right budget, and picked the right contractor, it would have been doable.
So, I’ve been laboring under the assumption that the problem was failure to deliver something that was otherwise possible, now I realize the situation is even more astonishing, no one seems to have really cared very deeply, ahead of time, whether they were sure that the technical side of the deal would ever work.
This is an important point. We are seeing the same dynamic at work with Brexit right now.
It’s not so much that people don’t care. They often care a great deal. It’s more that the question is very difficult to answer accurately without doing a lot of due diligence and getting down into a lot of messy detail. It’s also very dependent on a lot of factors including organizational capability and management structure, quality of available staff, project delivery approach, and things like that. In the face of that kind of task, people will frequently try to short-circuit it by resorting to quick and dirty heuristic methods (“project X took 12 months and cost $Y, this one seems about the same size”) which generally don’t take any of the particular features above into account. Or they will just flat out guess, which is more common than you might think when you allow for innate optimism and the tendency of many people to overestimate their own ability.
On top of that, the ability to admit that you are out of your depth and need expert help is not a quality that is particularly valued or admired in today’s business environment. The preferred approach is to pretend that you know exactly what you’re doing, make bold assertions about how you would like reality to be (that may or may not be realistic) and gamble that you can make up for any overreach by ruthlessly driving your underlings to deliver on your unrealistic promises.
In some fields (tech startups, for example) this actually works quite well. Where it fails, badly, is in situations where there are a lot of unforgiving details that all need to be just right in order to avoid disaster. Bank migrations fall into this category.
Another problem is that IT is still largely not viewed as critical infrastructure. People understand this intuitively for things like engineering but don’t really get it for IT yet. For example, would you drive a car across a bridge that had been designed and built by Pester? I thought not. Yet he claims to have been in charge of this critical migration project and nobody bats an eyelid.
You see what I see.
And Pester was ‘in charge’ of this migration project in much the same way as the guy throwing a lit cigarette out of his car window is ‘in charge’ of a resulting forest fire.
If I were the CEO of TSB in this situation, I would do everything in my power, including calling upon every ally in the press and government, so as to not appear weak in light of current events. I would suspect that criminal hackers would already be drawn like a pack of wolves to a large and clearly wounded prey. State actors may also get involved, if for no other purpose but to probe for weaknesses or practice their chops.
TSB needs to allow customers to walk into a branch, close out their account upon demand, and walk out with their money. To not do so would only precipitate a larger bank run. Combine this with an ongoing and very public customer crisis, non-functioning or aberrant online connectivity, inept native IT support and customer service, a botched mission-critical migration event, transition to a new outside IT consultant group, physical branches not being able to access or confirm account status, and the very real prospect of corrupt customer data, and you have a situation where the bank may knowingly (because they want to prevent a bank run) or unknowingly (because they really are that clueless) allow money to walk out the door*. I suspect that this situation is pretty close to a hacker’s paradise.
So it is possible that Pester is downplaying the situation in order to prevent, or at least mitigate, fraudulent attacks. Of course, it could also just be plain old vanilla CEO incompetence and arrogance. Either way, it still sucks to be a TSB customer.
* Note that I am using door as a metaphor; if account fraud can be perpetrated online, then those bored Romanian teenagers don’t even need to appear at a physical branch.
Yes, but, how:
The backend is likely borked as well, by now.
Hackers of all flavours will be all over this already for a while. It is not often that something as fat & juicy as a bank sets itself up with failing test/ beta code happily referring to internal resources on the open internet.
The system logs (if those work, which one could doubt) will be swamped with debug garbage and genuine random failures leaving the intrusion detection without stable patterns to work with … all providing a nice smokescreen for anything the hackers are doing.
“They” better start to get in front of this and immediately blame the Russians, Iran, North Korea and Jeremy Corbyn for the inevitable outcome.
Is Pester’s promise to make everyone whole bankable? It sounds a lot like a Donald Trump promise to cooperate fully with Bob Mueller.
Using Slaughter & May, as Yves says, is buying an insurance policy – after the fact – to try to keep information hidden and technical investigators away from TSB. It is Pester, in fact, saying the technical fixes are all done and that TSB need only sort out what went wrong in the past. Um, bullshit.
Hiring a Magic Circle firm for this kind of bet the company, protect the boss process is also breathtakingly expensive. It is also a blame-shifting exercise rather than fix the problem focused. That shrill keen wafting through Whitehall should be shareholders howling.
Any board of directors not still asleep should have already fired Pester and brought in a full contingent of technical staff to try to sort out this mess. Hiring a new CEO should be the second thing it does.
I have a frightful proposal for corporate government in future. Nearly all the problems I see derive from one powerful chap leading a bunch of useful idiots in adopting his ideas. Yves mentions Stumpf and Pester but I would guess every major company has someone similar steering the ship. Even non-execs are apparently useless. Perhaps what they have to say is not minuted.
The problem with this is responsibility. When a company gets caught out it denies, denies, denies until that does not work any more and then throws out a minion as sacrifice. I think we should change the corporate structure to allow only one Director of each company. If it gets too big he splits the biz to retain the rule that one Director is responsible. This would result in a great many smaller companies and a man-in-charge who knows everything. Isn’t that what we want?