This weekend should have been Wimbledon finals weekend. Maybe Roger Federer would have extended his record to nine wins in the men’s. Or the tournament could have done what it does best and thrown up a surprise like Simona Halep’s resounding victory over Serena Williams last year. Alas, we’ll never know. For the first time since the Second World War, the All England Lawn Tennis Club took the decision to cancel.
Not that the Club will be out of pocket. Unlike most other major events planned for 2020, it had pandemic insurance in place. For seventeen years it paid an annual premium of around £1.5 million for insurance; its payout this year is expected to be around £114 million.
The story goes that the Club was motivated to take out insurance by SARS. But that would underestimate the Club’s literacy with extreme events.
Anyone who’s been to Wimbledon in the past ten years will have seen the plaque on a brick wall outside Court 18.

It commemorates the longest tennis match in history. In June 2010, American 23rd seed John Isner played French qualifier Nicolas Mahut in the first round of the men’s singles. Played out over three consecutive days, the match took 11 hours and 5 minutes to conclude, with Isner the eventual winner after 183 games. The previous record was a match in Paris that lasted six and a half hours.
Unlike the pandemic, the event didn’t have a negative financial impact on the Club. But it was no less extreme. The probability of a match lasting 183 games or more is 1 in a billion. On those odds, another pandemic will occur before a match lasts that long again. (And to prove it, the same pair-up Isner vs Mahut was drawn at Wimbledon the following year, itself an unlikely event although not as unlikely, and Isner won in straight sets in just over 2 hours.)
A pandemic and a tennis match lasting over 11 hours are both infrequent events. When we think about risk it can be useful to think about it in two dimensions. One dimension is frequency, the likelihood that an event occurs; the other is severity, the impact if it does occur.
Isner vs Mahut was a low frequency, low severity event — unlikely enough to merit a plaque, but not an event that has long-lasting impact. A pandemic is a low frequency, high severity event. No-one is in any doubt as to the severity of what we are currently going through, even though they may have underestimated it in February.
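To put rough numbers on the two dimensions, the Wimbledon insurance figures quoted earlier can be turned into an implied frequency. The sketch below is back-of-the-envelope only: it assumes a single all-or-nothing payout of about £114 million against a flat £1.5 million annual premium, and it ignores the insurer's costs, profit margin and the time value of money. The variable names are illustrative.

```python
# Figures quoted in the article, in millions of pounds.
ANNUAL_PREMIUM = 1.5
YEARS_PAID = 17
PAYOUT = 114.0

# Total premiums paid before the claim.
total_premiums = ANNUAL_PREMIUM * YEARS_PAID

# Expected loss = frequency x severity. Setting the expected loss equal to the
# premium gives the annual frequency at which the policy roughly breaks even.
break_even_frequency = ANNUAL_PREMIUM / PAYOUT
years_between_events = 1 / break_even_frequency

print(f"Total premiums paid: £{total_premiums:.1f}m")
print(f"Break-even annual frequency: {break_even_frequency:.2%}")
print(f"Equivalent to one cancellation every {years_between_events:.0f} years")
```

On these assumptions the premium prices in roughly one cancellation every 76 years: a low frequency set against a high severity.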
Over the years a number of risk management models have emerged to measure risk across both of these dimensions. It is the low frequency occurrences that typically confound.
One model financial institutions use to measure risk is Value-at-Risk (VaR). It harnesses historical data on price movements to estimate the worst loss expected at a given confidence level. To illustrate, take a look at Goldman Sachs. In the first quarter of this year Goldman reported an average VaR of US$81 million. This means that 95 times out of 100 (its chosen level of confidence) the firm shouldn’t lose more than US$81 million in a single trading day. As it turned out, Goldman lost more than US$81 million twice in the 62 days that comprise the quarter, which is within its bounds of confidence: at 95% confidence, about three such days would be expected in 62.
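A minimal sketch of the idea, under stated assumptions: the daily profit-and-loss series is synthetic (random draws, not Goldman's data), the quantile is a plain unweighted one rather than whatever weighting scheme a bank actually applies, and the breach calculation assumes trading days are independent. The function names are hypothetical.

```python
import random
from math import comb

def historical_var(pnl, confidence=0.95):
    """Loss level that the sampled days stayed within `confidence` of the time."""
    losses = sorted(-x for x in pnl)                    # losses as positive numbers
    index = min(int(confidence * len(losses)), len(losses) - 1)
    return losses[index]                                # simple empirical quantile

def prob_at_least(k, n, p):
    """P(X >= k) when X counts breaches over n independent days, each with chance p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

# Synthetic history: about five years of daily P&L in US$ millions (made up).
random.seed(0)
pnl = [random.gauss(0, 50) for _ in range(1250)]
var_95 = historical_var(pnl, 0.95)

# How surprising are two breaches in a 62-day quarter at 95% confidence?
expected_breaches = 62 * 0.05
p_two_or_more = prob_at_least(2, 62, 0.05)

print(f"95% one-day VaR on the synthetic data: {var_95:.1f}")
print(f"Expected breaches in 62 days: {expected_breaches:.1f}")
print(f"Chance of two or more breaches: {p_two_or_more:.2f}")
```

If the model is well calibrated, two or more breaches in a quarter is the likely outcome rather than a warning sign, which is why the two days reported sit comfortably inside the bounds.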
The problem with Value-at-Risk is what happens when the future doesn't look very much like the past. Goldman samples from five years of historical data, weighted to give prominence to more recent data. That’s fine as long as the future reflects the past five years, but what if it doesn’t? The firm recognises the shortcoming. It acknowledges that “VaR is most effective in estimating risk exposures in markets in which there are no sudden fundamental changes or shifts in market conditions.”
Unfortunately sudden changes are a feature of markets. A former Goldman Sachs CFO told the FT back in 2007 that his firm was “seeing things that were 25-standard deviation moves, several days in a row.” A 25-sigma event is really infrequent, more so even than an 11-hour tennis match. None of us should have the fortune to witness one. Yet the number of multi-sigma events we’ve seen even since then exceeds what the Chinese would wish on their worst enemy.
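To get a feel for how infrequent that is, the sketch below computes one-sided tail probabilities under the assumption that daily moves follow a normal distribution. That assumption is exactly what the quote calls into question, so the output says more about the model than about markets. The figure of 250 trading days a year is a round-number assumption, and the function names are illustrative.

```python
from math import erfc, sqrt

TRADING_DAYS_PER_YEAR = 250  # assumption: a common round figure

def upper_tail_probability(sigmas):
    """P(Z > sigmas) for a standard normal variable Z (one-sided)."""
    return 0.5 * erfc(sigmas / sqrt(2))

def years_between(sigmas):
    """Average wait, in years, between daily moves beyond `sigmas`."""
    return 1 / (upper_tail_probability(sigmas) * TRADING_DAYS_PER_YEAR)

for sigmas in (2, 5, 10, 25):
    print(f"{sigmas:>2}-sigma: p = {upper_tail_probability(sigmas):.3e}, "
          f"one day in about {years_between(sigmas):.3e} years")
```

Under normality a 25-sigma day has a probability on the order of 10^-138. Seeing several in a row is far better explained by a model that understates the tails than by extraordinary luck.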
As Howard Marks’ backgammon buddy, Bruce Newberg, says: “There’s a big difference between probability and outcome. Probable things fail to happen – and improbable things happen – all the time.”
In order to accommodate this another model is used — the stress test. Unlike VaR measures, which have an implied probability because they are calculated at a specified confidence level, stress tests simply model outcomes based on predefined scenarios. They completely ignore the concept of frequency and dwell entirely on the severity of a particular scenario.
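The contrast with VaR can be sketched in a few lines. Everything below is hypothetical: the portfolio, the scenario names and the shock sizes are invented for illustration, and a real stress test would revalue positions rather than apply a flat percentage to each asset class. Note that no probability appears anywhere in the calculation.

```python
# Hypothetical portfolio: market value by asset class, in US$ millions.
portfolio = {"equities": 600.0, "credit": 300.0, "government_bonds": 100.0}

# Hypothetical scenarios: assumed fractional price change per asset class.
scenarios = {
    "equity crash": {"equities": -0.30, "credit": -0.10, "government_bonds": 0.05},
    "rates shock": {"equities": -0.05, "credit": -0.05, "government_bonds": -0.10},
}

def stress_loss(positions, shocks):
    """Loss (as a positive number) if every position moves by its scenario shock."""
    return -sum(value * shocks.get(asset, 0.0) for asset, value in positions.items())

# Severity of each scenario; frequency is deliberately not modelled.
results = {name: stress_loss(portfolio, shocks) for name, shocks in scenarios.items()}
worst = max(results, key=results.get)

for name, loss in results.items():
    print(f"{name}: loss of US${loss:.0f}m")
print(f"Worst scenario: {worst}")
```

The output is only as good as the scenarios fed in, which is where the drawbacks discussed in the rest of the piece come from.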
Although stress tests have long been used by portfolio managers and trading desks to manage risk, they became part of regulators’ armoury at the tail end of the financial crisis. Initially they were employed as an extension of informal ‘burn down’ exercises that analysts were conducting on banks at the time — to see how bank capital positions could withstand severe losses. In his memoir, Stress Test, Tim Geithner, former US Treasury Secretary, says:
“I first described the plan as a ‘valuation exercise’. We would come to call it the stress test. The plan aimed to impose transparency on opaque financial institutions and their opaque assets in order to reduce the uncertainty that was driving the panic. It would help markets distinguish between viable banks that were temporarily illiquid and weak banks that were essentially insolvent. Then it would help stabilize the strong as well as the weak by mobilizing a combination of private and public capital.”
He goes on to say that the stress test would end up having many other virtues he didn't foresee at the time, that it would be “the gift that keeps on giving” as a regulatory tool.
That’s not a bad thing. Stress tests have many merits. They continue to provide investors with the transparency on financial institutions Geithner aimed for. They form the foundation for how much capital a bank is required to have. And in addition they provide regulators with a consistent, horizontal view of system-wide risk.
But they also have drawbacks. Three in particular stand out, two of which have echoes in other domains where stress tests are used — healthcare and construction.
The Millennium Bridge problem
Stress tests have been used in the construction industry for many years. You wouldn’t want to drive over a bridge that hadn’t been stress tested. So when the Millennium Bridge in London began to sway from side to side moments after opening to pedestrians in June 2000 it caused some surprise.
The bridge was quickly closed while engineers conducted an investigation. Here’s what they found:
“Chance footfall correlation, combined with the synchronization that occurs naturally within a crowd, may cause the bridge to start to sway horizontally. If the sway is perceptible, a further effect can start to take hold. It becomes more comfortable for the pedestrians to walk in synchronization with the swaying of the bridge. The pedestrians find this makes their interaction with the bridge more predictable and helps them maintain their lateral balance. This instinctive behaviour ensures that the footfall forces are applied at a resonant frequency of the bridge, and with a phase such as to increase the motion of the bridge. As the amplitude of the motion increases, the lateral force imparted by individuals increases, as does the degree of correlation between individuals. The frequency ‘lock-in’ and positive force feedback caused the excessive motions observed at the Millennium Bridge.”
In other words, a feedback loop develops. People naturally fall into step with one another; that causes a bit of swaying; people respond in a way that makes them feel more comfortable; the swaying gets worse. In order to capture that feedback loop, stress tests would have had to have incorporated pedestrians’ response to each other and then their response to the bridge.
Second-order effects are a difficult thing to capture. Many people predicted Brexit but only a subset of them predicted the market’s response; many people predicted Trump but likewise only a subset predicted the market’s response. The pandemic’s a bit different — not many people predicted it, but if they had, it is highly unlikely that many of them would have predicted the market’s response.
Portfolio managers typically stress their portfolios according to historic scenarios such as the unpegging of the Swiss Franc, the ‘Taper Tantrum’ and the like. Had the Trump or pandemic scenarios been used before they happened, they wouldn’t have appeared credible. (“Yeah, yeah, we’re modelling a complete shutdown of the global economy with unemployment rising to 15% and GDP cratering by 30% so we’ll pencil in, what, a 5% fall in equity markets?”)
Probing in the wrong place
A second drawback, drawn from medicine, is that the stress test may be probing in the wrong place.
Doctors perform cardiac stress tests by getting patients to run on treadmills and monitoring their pulse and blood pressure. Stress tests are able to detect arteries that are severely narrowed as a result of cholesterol build-up; these are what cause symptoms. However, such narrowings don’t necessarily cause heart attacks, which often result from lesser blockages that rupture and form clots. Not until the blood clot is formed and heart muscle is starved of oxygen will symptoms be felt, which is why people who have heart attacks often have no warning symptoms and can undergo a perfect stress test days before keeling over.
In July 2016 Banco Popular of Spain passed a stress test conducted by the European Banking Authority. Within 12 months it had failed. The stress test examined solvency in an adverse scenario for the general economy; what brought Popular down was liquidity in an adverse scenario for the company specifically.
Gaming the system
The third drawback is that the system can be gamed. This is where financial stress tests differ from cardiac stress tests or engineering stress tests. Neither heart muscle nor steel girders know they’re being stress tested; people do. As Richard Feynman said, “Imagine how much harder physics would be if electrons had feelings.”
There is some evidence that regulatory stress tests are increasingly being gamed. Researchers have found that as stress tests in the US have evolved, the outcomes have become more predictable. Others have shown that in Europe the flexibility that exists in the design and use of banks’ own models is systematically exploited to minimise projected losses. And Risk magazine reported last year on a leaked memo in which a US bank approached a European bank to swap out risky assets as a window-dressing exercise ahead of its forthcoming stress test.
A similar phenomenon has long been observed on trading desks. If interest rate risk is being measured, traders will optimise instead to the slope of the yield curve; if yield curve risk is being measured they will optimise to the curvature of the yield curve. At every iteration, more complexity is being layered in and as complexity is layered in, more cracks for gaming open up.
Like the Millennium Bridge problem, this drawback also highlights a feedback mechanism. But this one’s different; the feedback here is designed specifically not to fit into a model but to thwart it. This is the reason why no single risk management model can cover all risks. Once a model is specified, people will try to find a way round it.
This gives rise to a paradox. Perhaps it’s a corollary to Goodhart’s Law. Overt risk management has a tendency to add complexity to a system; and complexity increases endogenous risk within the system. (In fact, the concept of risk is riddled with paradox. The Stanford Encyclopedia of Philosophy states: “When there is a risk, there must be something that is unknown or has an unknown outcome. Therefore, knowledge about risk is knowledge about lack of knowledge. This combination of knowledge and lack thereof contributes to making issues of risk complicated from an epistemological point of view.”)
The best way out is not to focus on a single metric. Unfortunately bank regulators have fallen into the trap of focusing on one — the stress test. This came to the fore this year when US regulators did not want to be distracted from their annual stress test cycle by a real-life stress test. So rather than re-running the 2009 ‘valuation exercise’, they fudged it.
Another way out is not to overthink it. At my hedge fund we had a story pinned up on our wall. It was about Bobby Steinberg, ex-head of Bear Stearns' risk arbitrage department and Ace Greenberg, who was running the firm:
“The arbitrage department was long a stock involved in a merger, and the department was at the firm's position limit. Steinberg thought it was incredibly attractive, and he went to Ace for permission to exceed the limit. Ace listened very carefully, and when Steinberg was done, he looked up and said, 'Bobby, that sounds really great, that sounds like an amazing story, but I have just one question for you. Do you think we have position limits around here to stop people from buying stocks they don't like?'”
There’ll be no surprises out of Wimbledon this year. But our broader exposure to infrequent events remains undiminished. This week US banks begin reporting their results and we’ll get a peek at how they’re faring so far in this real-life stress test. Modelling specific outcomes is hard, but simple things like position limits and low leverage provide resilience to defend against risk.
Marc Rubinstein is an angel investor in fintech and writes on financial sector themes. He is a former hedge fund manager at Lansdowne Partners, one of the oldest long/short hedge funds in Europe. Prior to that, he was a managing director of Credit Suisse, one of the youngest in his cohort. He started off as an equity research analyst at BZW (the investment banking arm of Barclays), specialising in the banking sector, and moved to Schroders.