More on the Systemic Risk of Bank IT Systems

As we’ve written about the complexity of payment systems and the creaky legacy code that sits at the core of banks’ transaction processing engines, which perversely run fault-intolerant, mission-critical operations on such a shaky foundation, the more astute readers, once they get over their incredulity, quickly recognize that this combination isn’t just a Grexit obstacle but also a source of systemic risk.

Our Richard Smith, who has extensive experience in capital markets IT, first mentioned this looming problem years ago, but it did not seem timely to make it a major focus. But we are seeing more and more evidence that legacy systems are starting to break down at major firms, both from media stories and some private accounts we’ve received.

Members of our audience who have technology experience, but have not dealt with the information technology infrastructure of large, lumbering corporations or major financial firms, and who assert that we are exaggerating how bad things are, have revealed that they don’t know what they don’t know. Some comments from recent posts:

Arthur Williams:

What many of you “it’s easy” people fail to understand is that mainframe programming is nothing like today’s coding. COBOL, PL/I etc. do not support modern concepts like objects, polymorphism or anything else. Think assembly language with nicer mnemonics. XML? Hah, there is virtually no such thing for the mainframe. There’s no git, no mercurial etc. Virtually none of the tools that exist for Wintel/Linux are available to mainframers.

In large organizations there are hugely cumbersome change management processes. Where I am, a simple code change might take a minimum of eight weeks to deploy, and we only have a dozen systems. Actual application changes like envisioned here would take at least six to twelve months for coding and testing, and then another four months for deployment. For large banks, I would expect the timeframes to be even longer because the systems are so critical.

Anyone who says it’s trivial simply has zero knowledge of the large systems environment.

Synoia:

All modern systems become part of the legacy patchwork, because no enterprise ever completely replaces all its legacy systems. My experience is that little gets discarded and the integration links to other systems multiply over time. Think n factorial in linkages or interfaces, where n increases every year.
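(A quick gloss on the combinatorics: strictly speaking, point-to-point links among n systems grow as n(n-1)/2 rather than n!, but quadratic growth is quite bad enough. A minimal sketch in Python; the system counts per year are invented purely for illustration:)

```python
# Worst-case point-to-point interfaces among n systems: every pair of
# systems can end up wired directly together, so the ceiling is
# n * (n - 1) / 2 links -- quadratic growth as systems accumulate.
# The system counts per year below are invented for illustration.

def max_links(n: int) -> int:
    """Worst-case number of point-to-point interfaces among n systems."""
    return n * (n - 1) // 2

for year, n_systems in [(1985, 10), (1995, 40), (2005, 120), (2015, 300)]:
    print(f"{year}: {n_systems:3d} systems -> up to {max_links(n_systems):6,d} interfaces")
```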

JustAnObserver:

I’m in the IT biz, h/w not s/w but still …, Louis & Clive (+ other commentators) have pretty much nailed the difficulties of modifying, testing, and then deploying new code into an existing, running, mission-critical system stuffed full of dusty-deck legacy. They are legion even before you get to what might be called the “user interface” aspects – in this case POSs, ATMs, PIN delivery, check clearing etc. It’s not called spaghetti code for nothing (*).

However what has struck me is that there’s an assumption here that seems to be going without question: All the s/w that has to be changed is still available in source code form – be it COBOL, PL/1, Assembler, Fortran or whatever – and can be recompiled. Maybe in amongst the decades old stuff are binary blobs of machine code whose source has long since been lost and whose interface is barely defined/understood, if at all.

Timo:

One thing that a lot of people who aren’t dealing with software development day-to-day don’t realise is that while writing good code is hard enough, reverse engineering it is an order of magnitude or two harder and that’s when you’re dealing with good code.

For the old banking systems, you have to figure out the intent of the original author 30-40 years back, plus an accumulated, monstrous hairball of quick fixes. At this point it doesn’t matter if the code is OO, functional, procedural or handcrafted assembly written by someone stoned out of their minds. The programmers working to maintain the system normally aren’t the best, brightest and most senior either; more likely the cheapest and just-about-adequate ones, because maintenance isn’t sexy, and whatever top-notch programmers the financial institution has are working on new projects that also have deadlines attached to them, and often tight budgets.

There aren’t a lot of competent people that are willing and able to wander into that morass and do what it takes to get it cleaned up and fix the issue. In fact you may find more willing people than ones that are able – I made a pretty good living for a decade dealing with first generation trading systems (ie, stuff developed in the late 80s/early 90s and then carried forward for decades) that needed fixing and updating. Original authors are long gone, documentation is, err, sparse and you have a tight deadline because as usual in investment banking, everything needs to be completed yesterday.

Now look at an interconnected payment system that is even older, created using languages and technologies that people have been actively avoiding for decades and see how many people you can find who can do even a half baked job.

Yves here. Let’s look at an example of a major bank whose IT systems have started to fall apart, RBS. We’ve taken sections from a detailed July Herald Scotland article, The ticking time-bomb at the heart of our big banks’ computer systems. I strongly urge you to read it in full. The article begins by describing how a half million transactions simply disappeared from RBS’ systems in June. Here are some of the high points:

Experts have for years been warning that the legacy systems of high street banks – some dating back to the 1960s – are an IT disaster waiting to happen. Systems built on costly and unwieldy platforms regularly buckle under the strain amid warnings over vulnerability to hacking attacks.

Last year the 79 per cent taxpayer-owned RBS paid a record £56 million fine following a much larger systems outage in 2012 which left millions alienated from their cash. In 2013 a further glitch left customers unable to pay by card or withdraw cash on one of the busiest shopping days of the year….

Although RBS has the worst record of IT failures, there have also been well-publicised outages at Lloyds, the Co-operative Bank, TSB and the Bank of Scotland.

And the UK’s banks are not alone. The technological meltdown narrative is mirrored worldwide in the sector. Last year banks in Europe spent an estimated £40 billion on IT but only £7bn of that was investment in new systems: the remaining £33bn was spent patching and maintaining increasingly creaky and fragmented legacy systems.

The big players throughout the developed world (with the exception of countries such as Turkey and Poland whose banks were slower to install computers in the sixties and seventies and, as a result, have fewer legacy issues to cope with now) use computer systems that have been built up and adapted over several decades.

Acquisitions have led to compromises being found and new product launches have led to bolt-on systems being patched onto layers of other complex systems surrounding the “legacy core” – typically a 1960s mainframe which carries out the bank’s central ledger functions.

A report by Deloitte from as far back as 2008 said that “many banks have now reached a tipping point where the cost and risk of doing nothing outweighs the cost and risk of taking action”. And yet, seven years on, little has changed.

According to Chris Skinner, author of a leading book on digital banking and chairman of the Financial Services Club, banks don’t want to take the risk of replacing their “groaning” legacy systems because any serious outage would lead to more fines from regulators and could even put them out of business. Because the stakes are so high, there has been a lack of leadership in the industry to tackle the issue.

There is a kinda-sorta solution, as our Richard Smith points out:

Rather than taking the plunge and replacing entire systems there is now a move among some multinational operators to set up new smaller banks unencumbered by legacy issues. A notable example was the successful launch in 2013 by BNP Paribas of its new Hello Bank! in four European countries.

While most banks keep their hardware up to date, the opposite is true of their software, according to veteran capital markets analyst and blogger Richard Smith.

“Software in the industry can be very ancient indeed, and if the software is very old it doesn’t really matter how new the hardware is,” he says.

Many of the designers of the original bank software are now retired or dead, and this loss of knowledge about their own core systems has been exacerbated by outsourcing data operations overseas, principally to India, to reduce costs.

“Cheap inexperienced graduates were employed overseas and knowledge was lost in the UK,” observes Smith. “The banks decided to go for cost reduction rather than reliability and maintainability and they now have an accumulating and accelerating problem brought on by the loss of that knowledge.”

“The problem was evident to the far-sighted 15 to 20 years ago but it has now become a runaway problem with an awful lot of momentum and it’s getting worse.”

Why is this approach of setting up entirely new banks less germane than it appears? The most immediate problems are showing up at retail banks, and it’s also those types of businesses and services that are easier to balkanize and move onto other platforms (and “easier” is not “easy”). But where a seize-up would have the most serious ramifications is in the wholesale banking and capital markets operations of the major international banks. And we’ve heard reports of 24/7 emergency operations to keep those systems running after some serious processing issues at a major international dealer.

But it’s hard to overstate how difficult moving off legacy systems is. As Ryan Lynch of Lafferty Group put it:

The banks would like to move to new systems but it would be like trying to change the engines on an aeroplane while it is in flight. Because of the need to back up data continuously while processing payments, calculating interest on a daily basis and so on, it is almost impossible to stop everything to migrate to a new system.

So the banks keep putting more and more layers of new applications and interfaces on the same creaky core, creating even more interdependencies to sort out. It’s not clear how this ends but the odds are high the end won’t be pretty.


57 comments

  1. JEHR

    It is strange how short-termism seems to apply to most things banks do now. Scrimping on IT maintenance will eventually take its natural course. If I were the Master of the Universe, I would create new banks and shut down all the present day banks (and put those who committed accounting control fraud in jail!)

  2. Anonymous

    I work for a medium-sized bank. We have systems upon systems upon systems. I have been saying pretty much since the day I started (and it’s a very good job overall) that everything should be scrapped and rebuilt from the ground up. Of course, that’s not a job for the private sector…

    The bank I work for is actually making itself smaller. Now, this may just be in response to the economy (I don’t sit in executive board meetings). But it seems to realize that too much size is not good for its operations. I do not work for an investment bank, so I can’t comment on that area.

    In my own personal opinion, the bank I work for is one of the better-run banks. But even it has plenty of IT issues that it ignores.

  3. IsabelPS

FWIW, here is a comment I read in a thread in the Guardian:

    “I like Yves, but she is totally wrong on this. She is also talking to the wrong kind of experts in banking and IT. She should talk to the experts in countries that have capital controls or those countries whose currency is not widely accepted the world over.

    There is no reason for Greece to call their new currency the Drachma or have an initial conversion rate that is other than 1:1 with the ECB euro. Greek banks also will not be holding deposits denominated in any currency other than the Greek euros. That immediately rules out having to make any changes to the banking system. The Greek euro will depreciate against the ECB euro after Greece detaches from the ECB, but that is not a problem for Greek banks to handle.

That only leaves the problem of foreign currency (including the ECB euros) inflows and outflows. For this purpose there should be one entity through which all such transactions are handled. That entity will have an account each with a bank that is located in the currency zone whose currencies Greece is willing to hold. All foreign currencies will be channeled through this entity which will buy and sell the foreign currencies for the Greek euros.”

    Someone else answered:

“I respect Yves’ blog, and might well like her also, I guess, if I ever met her. I’d usually give her the benefit of the doubt on issues where I’m not an expert myself, like IT, but your critique is interesting. However, as one of the purposes of Grexit is to allow Greece to devalue, the fixed 1:1 rate vs. the euro wouldn’t hold, making things a bit more complicated than you envisaged.

I think the concept and method of currency conversion as part of Grexit does bear looking into for snags. It sounds easy to have all internal bank accounts and transactions simply re-denominated and new notes, the same size as the euro ones but with different pictures and values, issued for bank counters and ATMs. The external transactions would have to be funneled initially through the central bank or a foreign trade bank; how easy that would be to do electronically is another matter. Even though foreign trade could continue in theory this way, the constriction effect might handicap Greece both as exporter and as tourist destination.

    From my own limited experience of bank IT, as a user of counterparty exposure and risk management systems, having them changed was very time consuming. From decision to alter to introduction of workable system could be a year for something minor (additional fields or some new functionality) to 2-3 years for coping with new processes.

So I think Grexit might well be more feasible than Yves suggested, but not in any way simple. And the manual workarounds needed to make up for Greece lacking the IT functionality of its peers would be a drag on growth. But I’m certainly open to persuasion otherwise!”

    I myself know nothing of this, of course.

    1. Synoia

Dealing with concepts is easy. Conceptually I can replace any enterprise’s software; all that is required is money. The devil is in the details, scattered widely in millions of lines of code.

We’ve known for over 40 years that 10% of IT budgets are spent on new projects, and 90% on operations and maintenance.

As an example: after 5 years of 10% on new projects, one has a code legacy of 50% of the IT infrastructure with no budget to replace it, because it would take 50% of the IT budget to replace these systems, and there is only 10% available.

      After 40 years of 10% investment in new code, the cost to replace is the same 10% compounded over the last 40 years, and inflation adjusted to today’s labor rates ~ somewhere between 300% and 500% of today’s IT budget.
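A rough check of that compounding arithmetic in Python (the real-growth rates are assumptions added here for illustration, not figures from the comment above):

```python
# Sanity check of the replacement-cost arithmetic: 10% of the IT budget
# spent on new systems each year, accumulated over 40 years, restated
# as a fraction of a single year's budget. Growth rates are assumed.

share_new, years = 0.10, 40

for real_growth in (0.00, 0.01, 0.02):
    accumulated = sum(share_new * (1 + real_growth) ** t for t in range(years))
    print(f"budget growth {real_growth:.0%}: replacement ~ {accumulated:.0%} of today's budget")

# 0% growth gives 400% of one year's budget; modest growth assumptions
# push it higher -- the same ballpark as the 300-500% figure above.
```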

      I can see the discussion now:

      CIO “We need to replace all our antiquated systems”
      CEO “How much will that cost?”
CIO “We need 5 times the current IT budget, for a huge set of IT projects with a less than 50% chance of success…”

Another way to look at IT:

The general capital cost of IT is 6% to 8% of the capital budget. The only time this number changed was in 1999, 2000 and 2001, for Y2K remediation, when it rose to about 40% of the capital budgets of enterprises; it was then reduced to $0, ZERO, in the capital budgets for 2002 through 2005.

That – setting the capital budget for IT to zero – caused the tech crash.

Before one makes any comments on how easy or hard it may be to replace all of an enterprise’s IT systems, think about the budget numbers, and please explain how to fund these huge changes.

The US Government is a similar case; I’ve seen IRS auditors using mainframe-based systems in audits.

      IT systems today are so interlinked that the suggestion of placing the IT systems of a country “in isolation” to float a new currency is ridiculous.

The Irish did this when they severed the Punt, the Irish Pound, from Sterling in the late ’70s. One of the groups who had to change was the MasterCard processing center in Southend in the UK, which had to adapt to handle two currencies rather than one. It took them 2-3 years to adapt, and that with relatively small volumes and no merchant terminals.

    2. Dr. George Oprisko

I see that I am not alone. Earlier I said that Greece should take control of the Euro by inserting an ExIm Bank between itself and ROW. That Greece should establish a “Comptroller of the Currency” whose job is to manage the mint, the Bureau of Engraving and Printing, and the Bureau of Electronic Money. The Greek FinMin would have an account at these. The Greek Central Bank ditto. GCB would borrow GEUROS from the BEM at an interest rate determined by the GFM, and lend them to their member banks. This would replace the current supply from the ECB via the same software portal, only the source of funds would change.

      GFM would spend GEUROS into circulation ala MMT.

      The debt instrument tax would eliminate all bank lending, except that by Greek Govt Banks, under interest rates specified by GFM for infrastructure deemed critical.

      The ExIm bank would eliminate virtually all imports from Germany/France etc via punitive surcharges on purchases.

      Greece would leave the third energy package, strike a deal with Gazprom on pipelines in exchange for oil/gas and on development of the Aegean Gas Fields.

Greece would create demand for Geuros by taxing its outstanding debt, forcing creditors to accept Geuros in settlement or else pay the tax.

      INDY

      1. Clive

What you’re describing, INDY, when you stipulate the requirement for an ExIm bank to act as a firewall between Greece and the rest of the world is actually just a fancy-dress version of what exists between every central bank, their commercial (private bank) counterparts and the entities they interface with outside of their particular currency zone. Both the central banks and the private banks maintain what is usually abbreviated in the industry as a “treasury system” used for the purposes of treasury management (a reasonably approachable summary of what this entails is given here https://en.wikipedia.org/wiki/Treasury_management).

The nuance you’re proposing, if I understand this right, is that there’d only need to be one new treasury system because you’d blur the lines between the central bank and the private banks (I think you’re suggesting nationalisation of the private banks). This would, though, still require new IT development – you’re talking about creating a brand-new treasury system to handle the treasury management tasks of the new (combined?) central / “private” banks. You couldn’t just re-purpose the existing setup because this has been converted to handle euros. If you want to handle a new currency, you need a new treasury system to do it with.

        And you can’t just switch off euro treasury management overnight, you’d need to keep the existing euro treasury system up and running while Greece transitioned. As I’ve pointed out, in the current landscape of a central bank and private banks in Greece, there isn’t a single treasury system – even if you could do a “hard stop” and switch off the euro in Greece and remediate the existing treasury systems to handle the replacement currency, you’d have to integrate the existing multiple systems into a single, unified treasury system to achieve what you’re proposing. Either that, or you’d have to “pick a winner” from amongst the existing multiple systems and use that as the new single treasury system. But then you’d give yourself a migration programme to implement because you’d have to wind down the existing euro positions on the existing treasury systems and decommission them over time. Decommissioning can be just as hard as doing a clean-slate implementation in my experience.

        I’ll turn now to how it is proposed to address the inevitable consequences of a lack of hard currency to purchase imports if Greece exited the euro. You come up with various ideas. I cannot help but be reminded of something that I’d lost in the mists of time, back, if I recall correctly, when I was in maybe third grade. When my English teacher, Mrs. Crawford, gave me essay assignments, I started on what was to become a life-long love of writing. But after a few exercises, I was – rightly – chastised by Mrs. Crawford (after writing maybe a page, it must have seemed like a novel at the time) for, having painted myself into a plot development corner, trying to quickly get myself extricated through lazy thinking. Aside from the usual “… but then I woke up and it had all been a dream” evasion, things like “… but my dad had been working in the garden shed and had invented the death-o-ray blaster and came out and shot the mutant aliens who had been stealing the sheep” (when this was the first mention of my dad or the shed) were also, in Mrs. Crawford’s opinion, inexcusable because their presence had not been robustly established previously in the narrative. I doubt she put it quite like that, but that was the point being made in her teaching.

In my mitigation, being brought up on a diet of cartoons from the Hanna-Barbera studio (“… I would have got away with it too, if it hadn’t been for those pesky kids”, as most Scooby-Doo episodes were resolved) or Skippy the Bush Kangaroo (“… what’s that Skip? Timmy is trapped down an old mineshaft and the fuse on the dynamite is going to burn down in five minutes? You’re a beaut’ Skip, we’ll get in the pickup you have just found the missing key for and go and rescue him”) hardly got me off to a good start. But luckily I had Mrs. Crawford to guide me and show me the error of my ways. From that day on, I never allowed myself to get into a position where I didn’t know exactly who the characters in play were, what they were doing, what they might get themselves into and how they could realistically get themselves out of it in the timeframe available. No previously unmentioned people (or animals) who could be conjured up to save the day from out of nowhere. No magical thinking about what was feasible.

        So when you talk about “(a) deal with Gazprom on pipelines” and “development of the Aegean Gas Fields”, I cannot help but think how you might benefit from a lesson from Mrs. Crawford. My dad in the garden shed inventing the death-o-ray blaster seems the model of plausibility and tight plot construction by comparison.

        1. ChrisPacific

          …inexcusable because their presence had not been robustly established previously in the narrative.

          Ah, the Deus Ex Machina. The bane of many writers, including some very talented and famous ones, like Tolkien (“…and then the eagles came and saved everyone!”)

  4. rusti

    So the banks keep putting more and more layers of new applications and interfaces on the same creaky core, creating even more interdependencies to sort out. It’s not clear how this ends but the odds are high the end won’t be pretty.

    This is a bit tangential from the bank IT software discussion but maybe illustrative of the same point. In the automotive industry, when new features are added it’s often coupled with the introduction of an additional microprocessor (or ECU in industry terms). There’s always a strong push to finalize new features and little time or energy dedicated to maintaining old ones, so the result is often an ad-hoc expansion of the vehicle’s electrical architecture to the point where most all modern vehicles have dozens of separate embedded systems linked through a web of communication buses using a mix of communication standards.

    A modern luxury car has upwards of 100 million lines of code in total from the vehicle manufacturer together with their tier 1 vendors (subsuppliers), tier 2 vendors (subsuppliers to subsuppliers) and even lower in some instances. This is all for a relatively self-contained final product which, unlike banking systems, can be largely redesigned with the introduction of new models.

  5. Stu

    Long time listener, first time caller. I am an executive in enterprise IT; I’ve worked on equity and fixed income trading systems, and consulted with bank IT executives in the U.S., Canada, and Japan, along with other industries.

The RBS failures of late seem related to the way most IT systems handle critical hardware failure, especially disk corruption or network outages: poorly. Mainframe systems run amazingly well for years… until they don’t.

    But there is no good general solution to this problem.

Replacing a legacy system is often a bad strategy because it assumes the successor system will be better than the old one. This is not usually the case. Better to boat-anchor the legacy and build a new system that subsumes parts of its functionality. The reason is that IT systems integration and core banking systems are a market for lemons. Decades-old, broken practices that treat software development as the equivalent of constructing a building are still being used. Even if a company uses modern practices, all they’ve done is trade a new devil for the old one. The ROI is also usually abysmal, so there is no incentive and lots of disincentive from shareholders. They could also buy a system and customize it, but that’s also a mess.

Vultures also abound in the consulting and outsourcing industry, and they will often confuse and distort the above because billions of dollars in contracts are at stake if people catch on that it’s possible to rebuild complex systems with smaller teams at a fraction of the cost they’re used to. IBM for example is known to try to get CIOs or other IT staff fired via their board or other business relationships if that CIO threatens their revenue stream by insourcing or going cloud. HP/EDS as well, though they’re not as good at it.

    Even if you do get a lot of this right, not all of a new system will be well designed. And so you’re stuck with a new legacy that will be bad and brittle in slightly different ways from the old one. In some ways it is a lot like surgery: the risks might outweigh the benefits due to the side effects.

This is why legacy systems rarely get replaced – there’s almost no ROI or sane business reason to do so if they are doing the job. Software doesn’t rust; it continues to work for decades.

So, what about packaged software? A company like Deutsche Bank bit the bullet and dropped over $1 billion on SAP Core Banking. Not all customers are migrated over, years later. Other regional banks have made this move, usually 200-300% over budget and 200-300% over time. It works, but it isn’t pretty. And you have a new legacy that replaced the old one, now vintage 1998 rather than 1978. I guess it’s an improvement? The challenge now is that SAP has the same failure characteristics as the old legacy – network and disk failures: good luck, have fun. Then there are the internal politics to move past “the cult of SAP” or “SAP first” that arises in such companies, wherein any banking changes are slow and expensive because they must be done in SAP’s proprietary language and database… And so you begin building new, more agile and reliable systems that surround SAP to cover its weaknesses, and the cycle of building a city of systems around a big system begins anew for another 20 years.

    Like I said, there is no good solution to this problem, though there are solutions.

    1. IsabelPS

      “Like I said, there is no good solution to this problem, though there are solutions.”

      Like Greece, then?

    2. Dr. George Oprisko

Back in ’98 I was hired by Barnett Bank to design and implement an online banking system. While there, I made friends with the two surviving members of the formerly large in-house programming staff. I learned that, through error, virtually all source code for mission-critical systems had been lost, and the staff used report generators against the database to create and/or modify reports.

I reported this to David Kominiak, a Director of our firm and Y2K Coordinator for the US Fed. He brought in a team who confirmed my findings, and Barnett was merged with Nations Bank, which meant closure of the Jacksonville campus and firing of the entire staff, including the Mata Hari Barnett assigned to spy on me.

      None of the above is news to me.

      The solution is quite simple. Conversion of banks to a modern software package. It’s not rocket science.

      INDY

      1. Synoia

and Barnett was merged with Nations Bank

        And Nations Bank had more up to date IT systems, or more clout with the regulators?

Your story, while interesting, does not support your conclusion, as you make no statement about Nations Bank’s IT systems.

        Conversion of banks to a modern software package.

        And Banks can run all of their business on a single software package? Please name it.

  6. NotSoSure

    As another person who’s worked in some of the biggest banking/finance projects including replacement for legacy systems:
    1. Software is hard.
    2. Software is really hard. And partly it’s because finding a good developer is hard. And a good developer is not one who just knows technical stuff, but also knows to ask the right domain questions.
3. The response by management when something is less than optimal is to hire another manager who’s done it at other banks. Since I am a developer I am somewhat biased, but I think the money’s better spent on hiring really good developers. Yet often, in a 15-person team, there would be 3 really good developers with the other 12 just there to fill in hours, making the JIRA burndown chart look somewhat pretty, etc, etc. And worse, those 12 would often create re-work to be done by the really good 3.

Another practical thing that’s not often discussed is that there’s often a gap of understanding between developers and the business. The practical consequence of this is that requirements are often poor and developers who don’t understand the domain then would often come up with suboptimal solutions. Business analysts are good at coming up with general descriptions, but when it comes to breaking those down into manageable units of work, they are often very much wanting. Additionally, business requirements often only describe the happy path, but not the “exception” conditions. Anyone who’s written proper code knows that most code (80%) is about handling those erroneous conditions.
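To make the happy-path point concrete, here is a contrived sketch (every name and error case is invented for illustration): the actual business rule is two lines, and the rest is the exception handling that requirements documents rarely spell out.

```python
# Contrived illustration: the "happy path" business rule is two lines;
# the bulk of the routine handles exception conditions that a typical
# requirements document never mentions. All names and error cases here
# are invented for illustration.

class PostingError(Exception):
    pass

def post_payment(accounts: dict, account_id: str, amount_cents: int) -> int:
    """Debit amount_cents from an account and return the new balance."""
    # -- the exception conditions --
    if account_id not in accounts:
        raise PostingError(f"unknown account {account_id!r}")
    if not isinstance(amount_cents, int) or isinstance(amount_cents, bool):
        raise PostingError("amount must be an integer number of cents")
    if amount_cents <= 0:
        raise PostingError("amount must be positive")
    if accounts[account_id] < amount_cents:
        raise PostingError(f"insufficient funds in {account_id!r}")
    # -- the happy path the requirements describe --
    accounts[account_id] -= amount_cents
    return accounts[account_id]

accounts = {"ACC-1": 10_000}
print(post_payment(accounts, "ACC-1", 2_500))  # prints 7500
```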

    1. Oguk

Additionally, business requirements often only describe the happy path, but not the “exception” conditions. Anyone who’s written proper code knows that most code (80%) is about handling those erroneous conditions. LOL +1

    2. grayslady

      I agree that it is the exceptions that are critical. Although I haven’t worked with computer programs in a financial industry capacity, years ago I had a corporate system installed for financial data and inventory control. My IT advisor recommended using an out-of-the-box system and modifying it for our specific needs. While the upfront costs were certainly less, the ongoing expenses required to continually patch the system were hefty.

      I also think it’s unfair to ask the developer to be omniscient. Institutions have become too specialized, so that IT is viewed as a separate entity rather than an information system designed to serve the end users. A developer needs to design a system that provides meaningful information to actual users, but how often are the various stakeholders invited to contribute to developing the module? Almost never, from what I’ve seen.

    3. Synoia

      If-then-else code is replete with errors, because of incomplete legs for the error conditions. OO code is worse because orphan objects become memory leaks. Java garbage collection is a failure because it causes persistency problems. Persistent values need to be carried outside of their proper container objects.

FSM (Finite State Machine) design is the way to go: sample the system, derive a current state, and then use a case statement to decide what to do.

Very few programmers can write this form of code. It is very, very hard. I put one together as middleware, and it took 2 years part-time to get the structure correct. When put into production as middleware between a call center database and a mainframe system, it ran 24×7 for 7 years. It was taken down by a SQL net DNS lookup error, because the sub-processes kept timing out and being restarted.

      It worked, but failed from a system point of view, because of the environment.
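For readers who haven’t met the pattern, a minimal sketch of the approach described above – derive a current state, then dispatch on (state, event) through a single table – with states, events and actions invented for illustration (real middleware is vastly more involved):

```python
# Minimal finite-state-machine sketch: every (state, event) combination
# is either handled explicitly or rejected loudly, unlike a forgotten
# leg of an if/else chain. States and events are invented examples.

# (state, event) -> (next_state, action)
TRANSITIONS = {
    ("IDLE", "connect"): ("CONNECTED", "open session"),
    ("CONNECTED", "send"): ("IN_FLIGHT", "transmit request"),
    ("IN_FLIGHT", "ack"): ("CONNECTED", "mark delivered"),
    ("IN_FLIGHT", "timeout"): ("CONNECTED", "retry or escalate"),
    ("CONNECTED", "disconnect"): ("IDLE", "close session"),
}

def step(state: str, event: str) -> str:
    try:
        next_state, action = TRANSITIONS[(state, event)]
    except KeyError:
        # An unmodelled combination fails loudly instead of silently
        # doing the wrong thing.
        raise RuntimeError(f"unhandled: event {event!r} in state {state!r}")
    print(f"{state} --{event}--> {next_state}: {action}")
    return next_state

state = "IDLE"
for event in ["connect", "send", "timeout", "send", "ack", "disconnect"]:
    state = step(state, event)
```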

      1. NotSoSure

Let’s be real. All code eventually gets translated to if-else. Your higher-level abstractions eventually get translated to the equivalent of “jump” instructions based on testing a value. In my career, I’ve used OOP (Java, C++, etc, etc) and functional programming (Haskell and the functional parts of Python), and utilized all sorts of architectures, design patterns, and processes (Agile, Waterfall, etc). At the end of the day it all boils down to having good people, full stop.

        Some people shouldn’t code just like I shouldn’t be a surgeon because my hands tend to shake. Some people shouldn’t be business analysts either.

  7. JTMcPhee

Well, there’s one entry in a list of “vulnerabilities” that might be a starting point for the broader-scale indexing of all the vulnerabilities we Modern Humans face. Beating an old drum: the venerable IT vulnerabilities partake of the same modes of operation as cancer: formerly sort-of-healthy tissues with built-in predispositions to “go rogue,” where the tumors don’t give a sh_t that they might eventually kill the host, because Success! and Growth! And of course, stepping outside the metaphor, the folks that create these “systems” (an inapt moniker that apparently way over-states the “integrity” of the shemozzle) – the neoliberal and control-fraud culture and the outsourcing and cost-cutting and all that – are individuals, grasping Gekko individuals, who know or at least believe they will Get Theirs and Be Long Gone, in comfort and pleasure, free from any consequences, when teeth start falling off the gears and the machinery wrecks.

    I guess I have to hope that my local credit union is keeping its “systems” up to date — and with the various “vulnerabilities” of market-making “systems” on the Big Exchanges, flash-crashes and Out Of Service signs and all, one wonders whether that mattress-stuffing approach to investment might not be the way to go. Since the Big Players and HFTers and all have their Really Smart Stable Software and all skimming all the real wealth out of the system, in exchange for shiploads of Funny Munny…

    Have “banks” always had the seeds and now wildly proliferating jungles of corruption built into them? Even the Northbrook Trust and Savings Bank of my youth? Is it just a change in usage that has people calling JPM, and GoldenSocks and RBS and all, “banks?” Or is that nomenclature just another example of the Big People PR victories over us mopes, like the renaming of that monstrosity typified by the Pentagram as the “Department of Defense”?

  8. TG

    Hm, yes. I can see how a bank changing to a new system could be like an airplane changing to new engines while it is still in flight.

However, here’s another take on ‘modern’ software. Sure, modern software tools are great for creating games or AI development etc. But for reliable, stable, understandable systems maybe we don’t want ‘modern’ systems at all. For example, NASA uses very old, limited and conservative systems when launching space probes. In my lab I have a couple of old DOS systems that I use for primitive control tasks – they have never crashed in 16 years. More recently I use Arduinos for real-time control. I am not saying that we should run our financial system on DOS. I am saying that, for things that are really critical, maybe we should take a step back and not use ‘modern’ tools, but try to develop simpler, more robust, more verifiable platforms. Basic finance, like space probe navigation, does not require very much raw computer power. What, you say this might make financial ‘innovation’ harder? That is a problem exactly why?

    1. Synoia

      NASA can produce error free code.

      They have both the money and process to achieve amazing quality code.

      No commercial software operation would tolerate NASA’s costs.

      1. norm de plume

No, but if governments are willing to step in with trillions to save banks who get themselves into existential hot water through malinvestment, failure of due diligence, excessive leverage, the Greenspan put, loan fraud and/or plain ol’ greed… because they’re ‘too big to fail’ (and provide most of the government’s gravy)… then why wouldn’t they man up and do the same for IT snafus that pose a risk to the financial system?

I suppose it comes down to whether there is a comparable risk in IT meltdown worst-case scenarios of ‘contagion’, industry-wide global fallout from, say, a black swan event occurring at more than one TBTF behemoth simultaneously. I am not au fait enough to know if they are sufficiently interconnected for this to be plausible. I guess the infrastructure of the banking business, unlike the debt/credit/investment superstructure, is more easily fire-walled at each node.

        If however there was a real systemic risk from the growing complexities of aging financial platforms, might there be a case for the provision and maintenance of a sound financial platform to be included in the basket of ‘public goods’ that government should provide, as part of its mandate to protect the nation and its citizens? That this effort, involving NASA-level costs, would align with rather than oppose the now irresistible patron/client nexus between finance and politics should be considered a plus, however difficult it might be to stomach.

        It would provide another reason, if any were needed, to move banking toward public utility status. If the guvmint of We the People is the only entity with pockets deep enough to guarantee stability, and cover the costs of instances of instability, we might find an avenue thru which to regain some purchase on the rampant and destructive FIRE sector, making it less sexy as we make it more robust, and less profitable to a few as it provides for the many.

        I’m dreaming of course. How could you prevent something like this turning into the biggest boondoggle of them all?

        Well, first I would put the physicists and mathematicians from NASA in charge, and outlaw the involvement of central bankers and economists…

    2. dw

Using ‘modern’ tools and systems (like what are those? the latest fad language and middleware?) doesn’t change what bugs you will encounter. And as many have noted, software is hard and risky. Relating it to changing the engines in a flying airplane is very apt. It’s actually worse if you are changing the hardware too, because not only are you changing the engines, you are going from one type (say piston) to another (say jet). And by the way there is a high likelihood of failure while doing it (probably greater than 50%). Plus there really is no guarantee that it will be cheaper, or on time. And by the time it is done, it will be the legacy system.

      1. Synoia

        Nope. Use FSMs and you will find it is different. Surprisingly so. It does require rigorous state analysis, which is hard.

        Please do not use the argument of the underlying binary code, that mechanism is concealed (thank god) by the compiler.

        IBM 360 Assembler programming – my first language. Ugh.

  9. Gerard Pierce

You left out some additional kludges. Back in the days when all of the smaller banks were being acquired by today’s mega-banks, parallel systems were “integrated” using standard engineering techniques: “beat to fit, paint to match”. You have pieces of legacy systems doing the same things in different ways, all combined into one mess that no one understands.

So not only do you have the problems described above, but they are multiplied by five or six.

    There are also hybrid systems where divisions of CitiCorp provided limited services to something like fifty to one hundred banks for specific functions like demand-deposit accounting. The new amalgamated organization may not have access to the source code for these services – and may not be aware that some other organization owns the code.

  10. Kris Hansen

    As a (recovering) former banking chief architect I believe that the source of the problem is cultural – the answer to most problems in a bank is to bring in a new point solution and hook it up to the rest of the stuff. This ‘incrementalism’ creates an n^n problem of complexity.

Unwinding all of that complexity in one big project is very difficult, so the answer as I see it is to begin with measurement and then get people to care about the sprawl. Having replaced over 1400 business applications with ~10, I’ve seen firsthand the value of moving to a simpler operating platform. Having fewer moving parts results in lower support costs, easier changes, and greater stability and resiliency.

    I have developed a KPI and scoring approach but was not always able to get people to care about complexity as a problem. It’s simply too easy I think to throw up one’s hands and pretend that complexity is intractable and just an inevitable part of banking.

    It’s a problem that can be tackled and just takes a common sense approach – reduce, reuse, eliminate, and continue to track and address sprawl.

Tying operational risk to complexity is a step in the right direction and would certainly help move the industry along.

    1. TheCatSaid

      I appreciate your comment.

      Too often I’ve been encouraged to “think big” and ended up creating situations or processes that were unnecessarily complex.

  11. JustAnObserver

One other, somewhat ironic, thought strikes me. Given that the core s/w of most financial systems is this largely incomprehensible binary goo, it seems to me that one saving grace may be that it’s probably (nearly) unhackable. Any attempt to do e.g. some buffer overflow attack will most likely result in whatever the mainframe equivalent of “The Blue Screen of Death” is … along with all the others it’s connected to.

    A good, if entirely unintended, example of “security through obscurity”.

    Which raises the question of how the replacement code for all legacy s**t can be secured against the bad guys (incl., of course, NSA, GCHQ, DSG, etc etc) ?

  12. Maude

As a veteran of bank operations from 25 years ago, I can say there was some concern about that back then, but nothing was done. And this was before programming was outsourced. The big problem with legacy systems is they have very spotty documentation (if it exists at all). I continue to preach this risk management tool 25 years later, and business and IT still think it is a waste of time and money.

    The point: “Another practical thing that’s not often discussed is that there’s often a gap of understanding between developers and the business. The practical consequence of this is that requirements are often poor and developers who don’t understand the domain then would often come up with suboptimal solutions.”

    This has been my problem throughout my career. When will someone figure this out?

    1. washunate

      That’s the thing, isn’t it? We have known this is a problem for decades.

      I think it has been figured out – the costs of suboptimal solutions are socialized onto the public, from IP law to bank bailouts.

  13. Paul Tioxon

    https://www.fdic.gov/consumers/consumer/alerts/check21.html

    http://i.zdnet.com/blogs/check-21.pdf

When the congressmen grilled bankers after 9/11 on why grounding air traffic in the US stopped all banking dead in its tracks, they were shocked to learn that paper checks were physically transported by trucks and jets crisscrossing the US from city to Federal Reserve city for clearance. The paper check would go from bank, to truck, to bank clearing house, to truck, to airport jet, to truck, to Federal Reserve, to truck, to bank, and back to you with your statement and a pile of cancelled checks. Shutting down all airline service shut down all check clearing for the banks that people used for mortgages, rent, car payments, insurance, loans, credit card payments, utility bills, etc. This financial disruption was a bomb blast to the economy, slowing cash flow to almost nothing.

The congressmen could not believe that in the 21st Century, with the internet, banks would not be using the fastest means available and the most state-of-the-art technology. Little did they realize that all businesses operate with the cheapskate mentality of a slumlord who only spends money when a gun is put to their heads. So congress put a gun to the head of the banking industry in order to create a check clearing system for the 21st Century, aka Check 21. That is why today you take a pic of your check and it is deposited in your account. You do not get cancelled checks anymore because they changed the law to say a digital image is a legal check. The politicians never want to see the US at the mercy of the banking system if this country is again attacked in the future. The businessman understands the costs of everything and does not want to spend, especially in a mature, well-developed sector that can coast along on past investments. They are more than happy to spend on the cheap, with systems integrators patching together antiquated mainframe equipment, COBOL, PL/1 and web-enabled front ends, instead of building new from scratch.

The politician, however, needs to control what is going on in the physical territory of his/her nation. These are 2 competing authorities, but in this instance the banks were told how to conduct their business by the political class. This should be a lesson to those who think that bankers have staged a silent coup and completely dominate the government. They do not. It is just that politicians and businesses and bankers are united in operating the capitalist system and present a united front on most issues most of the time. If a banking threat is clearly identified as being an unacceptable risk to the political class, there will be a conflict that will not be easily resolved, but as long as a majority of the political class thinks and sees the world the same way as business/banking does, there is not much chance of laws like Check 21 being shoved down the throat of banking anytime soon. Just as one swallow does not make a Spring, one Liz Warren does not make a sea change in the thinking of the entire political class.

    1. juliania

      I suggest you read “The Shock Doctrine.”

      I do not believe the millionaires in Congress put a gun to the head of the banks. Perhaps you have forgotten that money is now speech and speech can only occur behind roped in areas. The political class you speak of might be wise to pay a bit more attention to us ‘oi polloi, lest they end up like the elite Great House occupants of Chaco Canyon in deserted complex high rises where the elevators don’t work any more and all the ‘modern’ conveyors of energy are overcome by the extremes of climate change – as we puebloans simply depart, whether through increased mortality (likely) or physically taking to our heels, shaking the dust from our sandals and continuing to grow our veggies where we can in the world you have left us.

      1. Paul Tioxon

Juliania,
Thank you for your reading recommendation. However, you need to look at your logically inconsistent argument about those you describe as “millionaires in Congress”. If, as millionaires, the monthly source of their revenues is cut due to the grounding of planes that carry the receipts for the many businesses that they own and that provide them with their wealth, it is in their interest to see that the banks have a state-of-the-art system that gets them money as fast as possible. Doesn’t that make sense? Banks are regulated entities, as much as we discuss otherwise here on NC; there are some legal rules and also nuts-and-bolts rules of finance that are real and require rational management, so that checks can clear and payments are properly made and placed into the correct accounts, out of billions and billions of such transactions yearly in this nation alone. One of the things I have learned on this site is the various mechanisms of finance which enable the capitalist system to function day after day, year after year.

An antiquated institutional financial system can hurt everybody, and especially the 1% or the .001% that owns a disproportionate amount of the wealth in this nation. Sometimes a little nudge is required by one group of the power elites to make sure all of the power elites can remain so. Check 21 contributes to this united goal of making sure the rich get their money, and now even quicker than before, when they relied on the US Postal Service to deliver them paper checks. It was just that the banks did not want to spend money to move into the digital age, so Congress pushed them into doing so. If not for the change in laws, we would still have trucks and jets hauling a huge amount of paper checks around. As you may or may not know, even Social Security has gone paperless. If you absolutely have to, you can request a check to be mailed to you like the good old days, but I am sure that will be completely abandoned over time.

So, it is not impossible for politicians to force change on banks. We just had to be attacked in NYC and at the Pentagon for it to happen. Politicians acted in their role of controlling what goes on inside the territorial USA, including defending the payment systems from being an unintended victim of future attacks. That is all that they have done with this law to effect change. They were not leading any revolutionary movement for social justice, but they did do something the banks had not yet done and were probably not going to do for a very long time.

  14. Oregoncharles

    More “code as law.” This is turning into a fundamental problem that Y2K only began to raise.

    Personally, I’m a dunce on the subject. I did notice, some years ago, that both the state of Oregon and a major federal department (maybe someone else remembers which one?) tried and failed, abjectly and very expensively, to replace their legacy systems. As far as I know, Oregon is still using its. This appears to be a classic example of unintended consequences, whole institutions falling into a trap that maybe should have been foreseen.

It’s tangential, but my son’s experience may apply: he’s a computer jockey (not an IT person) for a very large, national….contractor (trying to maintain some anonymity here). It recently converted to a corporation and moved its headquarters. He complains constantly about the lack of IT support and about the (mission-critical) programs he uses. And they don’t even have the problems that large financial institutions do – they could shut down for short periods.

    Seems to me he’s seeing the introductory end of the trap the huge banks are facing.

    And incidentally, this whole issue is yet another strong argument for keeping such critical institutions fairly small. Not only are they a threat when they fail, but they’re essentially unmanageable – exactly the issue my son is seeing.

  15. washunate

    Continuing to enjoy this series. It’s almost like basic functionality of the banking system shouldn’t be left in the hands of private banks that are backstopped by public support.

    Why would a bank executive do anything other than the easiest short term fix, even if it makes the overall system more unwieldy? Unless banks (and airlines and everybody else) with crappy code are allowed to fail, there’s no incentive to plan for the long term with a good vision and good documentation.

    1. Lambert Strether

      Maybe the Post Office Bank should be built with new clean infrastructure, with NASA-level code quality, and used for a payments system treated as a public good.

      I know this is like waving my whole body instead of just my hands, but I bet we could do it for a few F-35s.

      1. René

Agree, but just to add to that: the payment infrastructure is just fine; it’s the sole access points for non-banks (i.e. the banks) that are unstable. Once you have Post Office Bank types of services, preferably with no interbank loans outstanding (or just no lending at all; just a few 100% reserve banks among ‘normal’ banks), any other bank’s IT system can break down and payments can still continue through the Post Office Banks. It won’t take more than one crash for consumers to move their deposits.

      2. Synoia

It would be starting from zero, and would have to accumulate the 40 years or more of work on legacy systems to provide the same level of function; and then, if successful, it would become JALS.

        Just Another Legacy System.

      3. Deloss

Pardon, Lambert, but you should read NO DOWNLINK by Claus Jensen, about the Challenger disaster. It’s a complete history of NASA, starting with von Braun, and it turns into the Adventures of Richard Feynman at the end, when he’s appointed to the investigating committee. He was amazed at the age and inadequacy of their computers. They were circling the globe and feeding the landing program in, because the machines didn’t have enough memory for takeoff AND landing programs. Maybe they’ve improved.

        When I worked for one of the major stock exchanges, a vice president formed a committee to speed up software delivery. One of his (and the exchange’s) bright ideas was that we could save time by not doing regression testing (that is: everything worked before we made this little change; does everything still work afterwards?). The Quality Analysis people told him this was a terrible idea. He yelled at us for an entire hour, using all sorts of uncomplimentary epithets.
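(Regression testing in miniature, for anyone who hasn’t seen it: a fixed suite of known-good answers that must still pass after every “little change”. The function and cases below are invented for illustration.)

```python
# Minimal illustration of regression testing: known-good answers are
# captured before a change, and every subsequent change must still
# reproduce them. Function and cases are invented for illustration.

def interest(balance_cents: int, annual_rate: float, days: int) -> int:
    """Simple interest in cents, actual/365."""
    return round(balance_cents * annual_rate * days / 365)

# Regression suite: answers recorded while the system was known to work.
REGRESSION_CASES = [
    ((100_000, 0.05, 365), 5_000),
    ((100_000, 0.05, 30), 411),
    ((0, 0.05, 365), 0),
]

for args, expected in REGRESSION_CASES:
    got = interest(*args)
    assert got == expected, f"interest{args}: got {got}, expected {expected}"
print(f"{len(REGRESSION_CASES)} regression cases still pass")
```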

When I worked for a way-too-big-to-fail bank, we built a huge Java database from scratch, to replace an antique. When I read the business specs, I was amazed to see that they did not include the one thing the database was supposed to do–but we built it in anyway. I was also amazed that said bank had end-to-end testing, for when everything was built, but no QA department for module testing. As tech writer I became the de facto Quality Control tester.

        1. Lambert Strether

          To me, NASA is the NASA of Apollo, not the NASA of the space shuttle, which was a debacle from beginning to end. Can readers clarify, correct, expand on the transition?

      4. washunate

I’m not sure the post office specifically has the requisite capacity for customer service for national retail banking…but the general concept is definitely something that makes sense to me, too.

  16. JCC

Having worked in the IT area for many years in a variety of industries other than banking, I feel I can safely say that all the above observations on banking software apply equally, among other areas, to large corporate inventory software, HR software, large corporate internal accounting software and, unfortunately, systems control software. Software technology is far more fragile across the modern world than most realize.

  17. Jon H

I will not dispute anything about the complexity of banks’ IT systems, but the idea that introducing a new drachma will cause a systemic collapse is just plain wrong. A new currency was introduced in Europe as recently as 2005 – Romania redenominated its currency by removing four zeros and changing the currency code from ROL (Romanian Leu) to RON (New Leu); a few years previously Turkey did the same with six zeros, changing from TRL to TRY (the word for new in Turkish is yeni). Russia, Mexico and Poland, to name some others. More recently banks dealing with China have had to introduce a totally new currency code for offshore renminbi, CNH (H for Hong Kong), to be used alongside CNY (yuan). So introducing a new code GRN for the New Drachma and changing the relevant EUR transactions to GRN at whatever the conversion rate may be is hardly unprecedented in recent times – those legacy systems from the 80s and 90s will have achieved this many times.

Of course Greece is more integrated with the European banking system than Romania was in 2005, but measures have been taken to mitigate this since at least 2011. Banks dealing with Greece (and other peripheral countries) have ensured that as many contracts as possible are under their home country law or international law (some German banks have even sought to avoid contracts under French law) so that EUR payments cannot be unilaterally changed to GRN by the Greek government. There has also been a process of matching assets and liabilities by country and contract law, so that changes on one side of the balance sheet will be matched on the other. This is not to say that there will be no losses from a Grexit, or that some people may end up manually changing a lot of payment currency codes – but this has all been done before.
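The mechanical core of such a redenomination is indeed a bulk rewrite of currency codes and amounts at a fixed factor; as the reply below argues, the hard part is everything around that data change. A toy sketch (the account records are invented for illustration):

```python
# Toy sketch of the mechanical core of a redenomination like ROL -> RON:
# rewrite the currency code and rescale the amount by a fixed factor
# (Romania's 2005 conversion was 10,000 old lei to 1 new leu). The
# account records below are invented for illustration; real systems
# must also handle rounding rules, history, and in-flight transactions.

from decimal import Decimal

OLD_PER_NEW = Decimal("10000")  # ROL per RON

def redenominate(record: dict) -> dict:
    if record["currency"] != "ROL":
        return record  # other currencies are untouched
    return {**record,
            "currency": "RON",
            "amount": record["amount"] / OLD_PER_NEW}

accounts = [
    {"id": "A-1", "currency": "ROL", "amount": Decimal("25000000")},
    {"id": "A-2", "currency": "EUR", "amount": Decimal("1200.50")},
]
print([redenominate(a) for a in accounts])
# A-1 becomes 2500 RON; the EUR account is left alone.
```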

    1. Clive

      The main difference with the Romanian ROL / RON conversion is that it was a simple re-denomination. Both ROLs and RONs were valid currencies, they ran in parallel and – this is key – the Romanian central bank remained a sovereign currency issuer. Both ROLs and RONs had to be accepted, and, interestingly for the purposes of our discussion on the timescales for any currency swap-over, there was an 18-month period when merchants were legally required to display both ROL and RON prices https://en.wikipedia.org/wiki/Romanian_leu#Fourth_leu_.28RON.29:_2005-Present and to accept whichever notes the customer had to hand as payment. Remittances from outside Romania could be in either ROLs or RONs for the duration of the switchover, and the central bank would always negotiate ROLs and RONs at par, so it did not matter what had been remitted. This could not possibly happen with a new Greek currency versus the euro: the whole reason for introducing a new Greek currency would be to allow a variable exchange rate.

      Similarly, it didn’t matter if your bank statement was denominated in ROLs or RONs. Apart from the number of zeros, it referred to the same basic currency in terms of it being a store of value.

      The Wikipedia information doesn’t state how long the central bank would continue to accept ROLs, but if notes were allowed to remain in circulation for 18 months, then ROL negotiation for electronic transfers must have run for at least that long. Nor does it say when the re-denomination plan was announced, but the statutory instrument http://www.bnr.ro/files/d/Legislatie/En/Circ7.pdf is dated 2004, so that must have given a lead time of at least two years, possibly as much as three. All of which confirms Naked Capitalism’s assessment that the required lead time for any Greek euro exit planning is around three years.

      Finally, just to give some indication of the depth of planning and design work needed to accommodate even something as (comparatively) straightforward as a currency re-denomination, consider this spec sheet from MasterCard which details the interchange fee schedule: http://www.mastercard.com/us/wce/PDF/Romania.pdf — the fee schedule has to be carefully constructed to cover MasterCard’s costs but also to demonstrate to regulators that the Card Network is not rent seeking. At least, not rent seeking too overtly. This fee schedule can only be set once certain assumptions are validated (currency denomination, volatility of currency pairs, liquidity of the applicable ForEx market).

      Good faith dealing is essential for any counterparty to engage in serious planning. Romania tried as far as possible to be clear and overt about what it intended to do and how it would do it with its ROL / RON re-denomination. In the event of a Greek exit from the euro, such a well thought-out strategy and clear communications would be the absolute minimum for enticing agents like the Card Networks (MasterCard, VISA, AMEX etc.) to accommodate any new Greek sovereign currency. Getting something like the interchange fee schedule wrong would potentially force the Card Networks to eat losses, as their costs wouldn’t be covered by the fees being applied. That ain’t gonna happen. If Greece thinks the Troika are a tough crowd, wait ‘til they have to start pleasuring the embedded financial players like the TBTFs and the Card Networks.

  18. equote

    I worked in government, and JCC is correct: the problems described occur ‘across the board’. If anything they are worse there, because of the political control; politicians (elected and appointed) never seemed interested in solving ‘technical’ problems. If you push them, you are reassigned to an office with only a chair, or, if you are lucky, you can retire.

  19. Kemal Erdogan

    I am also a long-time systems designer, and I find the arguments in the article unconvincing. In fact, I would argue that the fragmented nature of these information systems makes them more robust. Consider the RBS problem: was any other bank affected, or even any other unit of RBS? The answer is no. The reason is that interconnected systems which were not designed together will almost always include some redundancies, unneeded controls and so on, and they continue to work without each other (because they were not aware of each other to begin with). The resilience of the seemingly endless patchwork is indeed remarkable.

    The problem in understanding comes from the fact that people like to think in terms of physical objects, so software systems are usually likened to a large building and their design to its blueprints. Of course a building is subject to the laws of physics and cannot be expanded safely beyond a limit determined by the original design. No such rules exist in software: any part of it can be replaced or augmented, its performance does not deteriorate over time, nor does it rust. There are different reasons to replace aging software (difficulty in making changes, finding people with the right expertise, etc.), but systemic risk is not one of them. The smart move is to stay away from wholesale changes and make piecemeal upgrades, just as has been done before.

    1. Clive

      Kemal, you have, I think, succumbed to a type of faulty generalization. It started with you correctly rejecting a false analogy — the one that occurs when software is “likened to a large building”, as you put it. Software does not, of course, work in exactly the same way as a large building, because it isn’t constrained by things designed in at the start, like a foundation depth or a land allocation within a city block.

      But your logic then broke down: in your rush to dismiss the aspects of the analogy that were false, you forgot to look at where parts of the analogy are true, and so ended up with a faulty generalization of your own making — that because software is not exactly like a building, it must be nothing at all like a building. That isn’t right. Sometimes software is, in certain specific respects, very much like a building.

      A building imposes its own requirements on the environment it must, by necessity, integrate with. If a building owner tells the utility company that their structure will present, say, a 2MW load on the electrical grid, but then the HVAC contractor puts in a chiller which draws 1MW, the architect specifies lifts which draw 500kW, plug loads hit 750kW and lighting 200kW (a combined 2.45MW), then the city block will get brownouts, because the interface the power company provisioned for the building isn’t up to the load inherent in the design the building owner eventually implemented.
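
      In software terms, that is an interface contract being silently exceeded, exactly the kind of thing a trivial validation step catches. A Python sketch of the check, using the figures from the example (the names are mine, purely for illustration):

          DECLARED_KW = 2000  # the 2MW load the owner declared to the utility

          component_loads_kw = {
              "chiller": 1000,   # the HVAC contractor's 1MW chiller
              "lifts": 500,
              "plug_loads": 750,
              "lighting": 200,
          }

          total_kw = sum(component_loads_kw.values())
          if total_kw > DECLARED_KW:
              print(f"contract breach: {total_kw}kW demanded vs {DECLARED_KW}kW declared")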

      So then you fall into a logical fallacy and conclude that “any part of software can be replaced or augmented” without, you imply, any effect on other software “structures”, if I can call them that.

      Many who should know a lot better have made the same mistake and inflicted adverse consequences on parties who depend on a system whose owner thought it could be changed in a minor way without considering the possible impacts. No less than the Bank of England got it spectacularly wrong when it miscalculated the effect of a minor tweak to the message queue in its Real Time Gross Settlement system (http://www.bankofengland.co.uk/publications/Documents/quarterlybulletin/qb120304.pdf) — it failed to take account of the fact that some older CHAPS terminals didn’t implement the latest version of the message design and so couldn’t handle the new message specification when it was rolled out. It took out great chunks of the British economy for the best part of half a day.
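
      The missing safeguard is easy to state in code, even if it was evidently hard to get right in practice. A minimal Python sketch of the compatibility check a sender needs before rolling out a new message specification (the versioning scheme here is invented; the real CHAPS/RTGS wire formats are nothing this simple):

          # Spec version each receiving terminal is known to implement (assumed data).
          TERMINAL_SPEC = {"terminal_a": 2, "terminal_b": 1}

          def send(terminal, payload, spec_version=2):
              """Refuse to emit a message the receiving terminal cannot parse."""
              if TERMINAL_SPEC.get(terminal, 0) < spec_version:
                  raise ValueError(f"{terminal} only speaks spec v{TERMINAL_SPEC.get(terminal, 0)}")
              return {"to": terminal, "spec": spec_version, "body": payload}

          print(send("terminal_a", "settlement instruction"))
          try:
              send("terminal_b", "settlement instruction")  # the failure mode the BoE hit
          except ValueError as err:
              print("blocked:", err)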

      That is one “piecemeal upgrade” — as you describe it — which certainly fits my description of something that turned a systemic risk into a real, live systemic issue. Just because a change is done piecemeal doesn’t mean one of those seemingly harmless pieces can’t blow up in your face.

      1. kemal erdogan

        Clive, I don’t deny the similarities. But when I suggested that any part of a software system can be replaced without side effects, I was talking about a theoretical possibility; the aptitude of the engineers who pursue such goals is another matter entirely. There are certainly some systems that are very hard to replace (which is one of the reasons we have so many decades-old systems still working), but unlike in some other branches of engineering this is doable and has been done many times. So yes, I believe any software sub-system can be replaced, albeit with some difficulty. That is why software is so dynamic, and why there are not many standard software components of the kind the auto industry relies on, where everyone uses the same standard parts.

        Your example, I think, supports my point. I am sure the Bank of England spent a very hard half-day, but in the end they made the new system work, right? And if they had tested a bit better, they could have prevented the outage as well. Additionally, they probably added some more seemingly stupid patches to the already super-patchy network of unintelligible code.

        I believe those patches are extremely valuable and represent the collective wisdom of the engineers who developed those systems. Of course, it would have been better if the system had been designed from the start to account for all the problems and issues those later patches address, which would have made it internally consistent. But as we software people all know, software is hard, no one is that smart, and we don’t have time to wait for a single person to design all the sub-systems for consistency. It is near impossible to design a complex system with full internal consistency from the ground up that also accounts for future issues; it is therefore a waste of resources to pursue. In fact, many re-engineering projects with such grandiose objectives have failed.

        Besides, there are no examples of systemic failures caused by interconnected software systems that are very old. In fact, software becomes more stable as it ages (you know we always wait for the service packs of Windows before migrating). The reverse is true: there would indeed be systemic failures if the same software were deployed widely, no matter how stable and well-engineered it was. Variety and very loose coupling are certainly an advantage in that respect, and they make the world’s software systems more resilient. Remember the Heartbleed problem with secure connections? It would have been a trivial problem had the affected library not been deployed so widely.

        1. Clive

          Kemal, when you speculate that “the Bank of England spent a very hard half-day, but in the end they made the new system work, right?”, no, you couldn’t be more wrong.

          The official report into the live incident and the outage it caused (the report http://www.bankofengland.co.uk/publications/Documents/news/2015/rtgsdeloitte.pdf and the BoE’s response http://www.bankofengland.co.uk/publications/Documents/news/2015/rtgsresponse.pdf ) shows that not only did the change cause a significant disruption, it could have been a lot worse than it was, and it highlighted major failings in how the BoE designs, builds and — crucially — tests changes (or rather, in how it failed to test them properly). Since the incident, the Bank has suspended all but, as the report puts it, “compelling market or policy” driven changes until September — that is, for approximately a year after the outage occurred.

          So, a large, supposedly sophisticated operator of a payments system that almost defines the expression “systemically important” has found itself, right now, unable to make any material changes to that system for a year. Describing such work as merely done “with some difficulty” barely scratches the surface.

    2. Synoia

      A Systems Designer. Really. LMAO.

      The people most qualified to comment are the System Testers (and debuggers).

      They are the people who resolve the side effects omitted, ignored and just plain misunderstood by designers such as yourself.

      And that’s where I make a significant amount of money, and why over 50% of large IT projects fail.

  20. ChrisPacific

    I work in the industry and have experience dealing with large businesses and government clients. All the quoted remarks provide a pretty good sampling (note: not a comprehensive list) of the risks and challenges that arise.

    Additionally, in banking there are two further concerns that are particularly important: reliability and security. Reliability means that interactions should either definitively succeed or definitively fail, and all parties concerned should understand which case applies and agree on it. You cannot ever have a situation where one party thinks the interaction succeeded and the other party thinks it failed, and you cannot ever have a situation where an interaction ends up in an indeterminate state and is never eventually resolved as either a success or a failure. If either of those scenarios arises in a system that’s used for banking, it amounts to either creating money out of thin air or making it vanish into thin air. Another term for this is fraud. It might be accidental fraud, to be sure, but try explaining that to an auditor.
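
    As a toy Python model of that requirement (the state names are invented; a real settlement system adds durable logging, timeouts and human investigation on top):

        TERMINAL = {"SUCCEEDED", "FAILED"}

        def reconcile(our_state, their_state):
            """Every transaction must end in one terminal state both parties agree on."""
            if our_state == their_state and our_state in TERMINAL:
                return our_state                # definitive, agreed outcome
            if "PENDING" in (our_state, their_state):
                return "INVESTIGATE"            # indeterminate: must be chased, never dropped
            return "BREAK"                      # disagreement: money conjured or vanished

        print(reconcile("SUCCEEDED", "SUCCEEDED"))  # SUCCEEDED
        print(reconcile("SUCCEEDED", "FAILED"))     # BREAK -- the 'accidental fraud' case
        print(reconcile("SUCCEEDED", "PENDING"))    # INVESTIGATE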

    Security is closely related. The average system out there in the wild might have dozens or hundreds of security vulnerabilities, but the vast majority of them get away with it because nobody particularly cares what they are doing and it wouldn’t be worth it for an attacker to deliberately target them and expend the resources that would be needed in order to compromise them. That is very much not true of banking systems (see the part about creating money from thin air, above). Anything that handles real money transactions might as well have a giant target painted on it. They are constantly under attack, and need to be absolutely bulletproof and untouchable.

    I once had a developer tell me (when we discovered an obscure bug) that we shouldn’t worry about fixing it because “it will almost never happen.” Almost never turned out to mean about 0.01% of the time. Imagine you are running an exchange that processes a million transactions a day. An error rate of 0.01% means that 100 of those transactions could be faulty on average. Depending on the size of the transactions you deal with, you could be leaking tens or hundreds of millions of dollars on a daily basis. You think anyone would ever trust you with their money again if that happened? You’d be lucky to stay out of jail.
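
    The back-of-the-envelope arithmetic, as a Python one-off (the average ticket size is an illustrative assumption, not a figure from that exchange):

        volume_per_day = 1_000_000
        error_rate = 0.0001             # "almost never": 0.01%
        avg_ticket_usd = 250_000        # illustrative assumption

        faulty = volume_per_day * error_rate
        exposure = faulty * avg_ticket_usd
        print(f"{faulty:.0f} faulty transactions/day, ~${exposure:,.0f} at risk daily")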

  21. FedUpPleb

    Yves, in case you haven’t read it already, I highly recommend the short story “The Machine Stops”, about a society that became utterly dependent on a quasi-automated, interconnected machine and communication network. It was written in 1909 (!) and is unbelievably prescient about the present situation.
