Thursday, March 14, 2013

FROM BIG DATA TO KNOWLEDGE

Talk to NatStats 2013 Conference, Brisbane, Thursday, March 14, 2013

I want to make a contribution that’s a bit offbeat and a bit challenging but, I hope, constructive. This conference is based on a proposition that’s simple, seemingly obvious and even admirable: the better information we give people, the better will be the decisions they make. Indeed, I imagine that’s a proposition that guides the whole of the bureau’s work. And if you were to challenge me to state my own ‘mission statement’ as an economic commentator, I don’t think I could do any better: my objective is to contribute to making my readers as well informed as possible, in the belief that the more knowledgeable they are the better off they are and, as part of this, the better decisions they’ll be able to make.

But the older I get and the more I read, the more I conclude it isn’t nearly that simple. We proceed on the assumption that everyone’s well-intentioned, rational and focused, and all they’re lacking are better data. I’m afraid not all of us are well-intentioned (even if our motives aren’t merely self-serving, they can vary greatly from those the data-providers assume), we’re often far from rational in the way we use data and make decisions, and, whether we’re acting as part of an organisation or in our private lives, modern life is far too complex for us to want to be focused on information gathering and evaluation prior to every decision, even if that were possible, which it isn’t.

The Nobel prize-winning psychologist Daniel Kahneman has demonstrated how many of our decisions are made unconsciously and how many lack any kind of rational logic. The illustrious German psychologist Gerd Gigerenzer, of the Max Planck Institute, has gone further and argued that, in many circumstances, people make better decisions if they don’t allow themselves to be confused by reviewing too much data. It would be comforting to believe that, however relevant these findings are to the behaviour of individuals and their private decisions, they wouldn’t apply to the detailed and careful decision-making processes of big companies and government agencies. It would be comforting, but to believe it you need a lot more faith in the perfectibility of human nature - or the infallibility of organisations - than I possess.

Let me offer a few examples to remind you of the realities we are dealing with - of how far data and information can be from knowledge. A recent Buttonwood column in The Economist magazine invited readers to answer two test questions. First, suppose you had $100 in a savings account that paid an interest rate of 2 per cent a year. If you leave the money in the account, how much would you have accumulated after five years: more than $102, exactly $102, or less than $102? And, second, would an investor who received 1 per cent interest when inflation was 2 per cent see his spending power rise, fall or stay the same?
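For readers who want to check the arithmetic behind the two test questions, here is a minimal Python sketch. It assumes interest is compounded once a year (the question doesn't say how often), and the function names are my own illustration, not anything from The Economist. Even with simple interest the first answer would be about $110, so 'more than $102' holds either way.

```python
def compound_balance(principal, annual_rate, years):
    """Balance after compounding once a year (assumed) at annual_rate, e.g. 0.02."""
    return principal * (1 + annual_rate) ** years


def real_return(nominal_rate, inflation_rate):
    """Change in spending power after allowing for inflation."""
    return (1 + nominal_rate) / (1 + inflation_rate) - 1


# Question 1: $100 at 2 per cent a year, left alone for five years.
balance = compound_balance(100, 0.02, 5)
print(f"Balance after five years: ${balance:.2f}")

# Question 2: 1 per cent interest while inflation runs at 2 per cent.
real = real_return(0.01, 0.02)
print(f"Change in spending power: {real * 100:.2f} per cent")
```

The balance comes to a little over $110, and the change in spending power is negative: the investor's spending power falls by roughly 1 per cent.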

A survey of Americans over 50 found that only half of them could answer both questions correctly. This and many similar surveys demonstrate a remarkably low level of financial literacy - even practical numeracy - among the population. You may think it indicates the need for a lot more financial education. That wouldn’t be an easy thing to bring about but, in any case, it’s doubtful whether it would work. The Economist goes on to say that a report by the Federal Reserve Bank of Cleveland failed to find evidence that financial education programs lead to greater financial knowledge and better financial behaviour. A survey of American students found those who had not taken a financial course were more likely to pay their credit card in full every month (thus avoiding interest charges) than those who had.

If you think that’s bad, try this: consumer enthusiasm for learning about finance is limited, even among those with a pressing reason to want to know more. When a free online financial literacy course was offered to struggling credit-card borrowers, less than half a per cent of them logged on to the website and just 0.03 per cent completed the course. The Economist observes that those who choose to be educated about finance may be those who are already interested and relatively well-informed about it.

Statistical agencies such as the ABS are, in the main, wholesalers of statistical information. While in principle it’s open to any member of the public to look up information on the bureau’s website, in the main the users of the bureau’s services are professionals rather than amateurs: people with training in the use and interpretation of statistics employed by government agencies, corporations and universities. Among the professional users of the bureau’s services are journalists, some of whom have training in the interpretation of statistics, but many of whom don’t. The news media are the retailers of statistical information; we take it from the bureau, interpret it and communicate it to ordinary members of the public. When other government agencies, businesses and academics do further processing of the bureau’s data, it’s still usually the media that on-sell it, so to speak, to the public. That’s our role in the process: we convey statistical information to ordinary members of the public in their private capacity. This may involve alerting particular business people or public employees to the existence of certain statistical facts but, for the most part, I’d expect such people to do their own analysis and consult their own experts before making decisions on the basis of something they’d read in the paper or heard on TV.

That’s the context in which I work as user, repackager and on-seller of the bureau’s data output. And in this position I need to be forever cognisant of the cognitive limitations of my audience and the almost infinite scope for misunderstanding. I’m not sure how much of what I’m about to say will be of use to you but, since I’ve been asked, let me get down to the nitty-gritty about the use of statistical data - big or otherwise - by journalists.

The commercial media are in the business (literally) of telling their audience things that will interest them. We find that when we tell people things they probably need to know, but are rather dull, we don’t sell many copies. The stories we write are called stories precisely because humans are a story-telling animal: people have an infinite interest in stories. Stories about what? In the main, about other people. Our audience isn’t particularly interested in concepts and analysis, it’s interested in people. This presents a major problem for economic journalists because, although economics is about the way people live and work, it deals with ‘the daily business of life’ in a way that’s highly conceptual and analytical, using aggregate statistics that seem most impersonal and hard to empathise with. It’s well known people aren’t particularly moved by, say, a story reporting the death of 5000 people in a flood in Bangladesh, or even a story saying unemployment has risen by 10,000 in the past month. In the philosopher Peter Singer’s book about personal giving to worthy causes, The Life You Can Save, he quotes well-known psychological research about the largest number of people in a news story that readers are able to empathise with. The answer turns out to be one. This explains why so many overseas aid agencies have a photo of just one person in their ads, why they use sponsorships of individual children to raise money, even though that money will probably go to the whole family or even the village. It also explains why so many news stories about government policy changes or stories from Australian Social Trends are built around a ‘case study’ and photo of just one person or family. And just think of what this focus on individuals means for journalists writing about inflation figures, unemployment figures or national accounts aggregates.

Many journalists - including political journalists, though not specialist economic and business journalists, thank goodness - come from artsy, literary backgrounds where their maths is weak (they’re never sure how to work out a percentage change) and they find numbers a bit frightening. They generally steer clear of statistical data, and when they do quote a figure they not infrequently get it wrong. If so many journalists find data off-putting, what does that say about our audience - many of whom would nonetheless have a university education?

In my use of data I force myself to quote as few figures as possible. I try to round numbers wherever I can (which makes them both easier to mentally absorb and easier to remember) and rarely take any number out to more than one decimal place (which also avoids spurious accuracy). I try to use vulgar fractions rather than percentages and am always saying things like ‘more than a third’ and ‘almost half’. This aids comprehension and recollection, but also is less off-putting because it uses words rather than numbers. Research by Gerd Gigerenzer, who has done a lot of work on the comprehension of numbers, leads him to go even further and favour the ‘two people in five’ approach. One of my rules is that every number must be adequately labelled. In particular, it’s important to make it clear whether it refers to a stock or a flow. I’m a stickler for ‘percentage points’ rather than ‘per cent’ when that’s what I mean. I always try to make it clear when I’m talking about changes in the share of some total - eg a fall in manufacturing’s share of total employment doesn’t necessarily mean it now employs fewer people.
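The labelling distinctions above are easier to see with numbers attached, so here is a rough Python sketch. Every figure in it is invented purely for illustration (none comes from the bureau), and the function names are my own.

```python
def percentage_point_change(old_rate, new_rate):
    """Difference between two rates that are already expressed in per cent."""
    return new_rate - old_rate


def percent_change(old, new):
    """Proportional change from old to new, expressed in per cent."""
    return (new - old) / old * 100


# Hypothetical unemployment rate rising from 5.0 per cent to 5.5 per cent.
pp = percentage_point_change(5.0, 5.5)
pct = percent_change(5.0, 5.5)
print(f"Up {pp:.1f} percentage points, which is a rise of {pct:.0f} per cent")

# Hypothetical employment figures: a falling share with a rising headcount.
earlier_jobs, earlier_total = 1_000_000, 10_000_000
later_jobs, later_total = 1_020_000, 11_000_000
earlier_share = earlier_jobs / earlier_total * 100
later_share = later_jobs / later_total * 100
print(f"Share of total: {earlier_share:.1f} per cent, then {later_share:.1f} per cent")
print(f"Jobs in the industry: {earlier_jobs:,}, then {later_jobs:,}")
```

In the made-up example the same move in the unemployment rate is 'half a percentage point' or '10 per cent', depending on the label, and the industry's share of employment falls even though it employs 20,000 more people.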

What we’re talking about here is avoiding a cognitive bias psychologists call ‘the curse of knowledge’ - the unconscious assumption that, if I understand something, everybody else does too. Take it from me - they don’t. But I suspect many statisticians suffer from the curse of knowledge. If you want to be a good communicator of statistical data, you need always to be reviewing the realism of your level of ‘assumed knowledge’.

Far from being coldly rational, as we commonly assume, people’s interpretation of statistical facts is heavily influenced by instinctive and emotional reactions. I think it was Gigerenzer who conducted experiments with many groups which found that people would much prefer a medical procedure with a 90 per cent success rate to one with a 10 per cent failure rate. It reminds me of the man who invented death insurance, but had trouble selling many policies until he renamed it life insurance. Lest you think all this is about terribly simple souls, the experiments found that even doctors much prefer a 90 per cent success rate. Doctors are also suckers for the base effect (as are many journalists). They’re always saying that doing some naughty thing doubles your chance of getting some terrible disease. But they recoil in puzzled silence when you reply: doubled from what to what?
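The ‘doubled from what to what?’ question can be put in numbers. The Python sketch below is only an illustration: the two baseline risks are hypothetical, chosen to show how the same ‘doubling’ can mean very different things, and the helper names are my own.

```python
def apply_relative_risk(baseline_risk, relative_risk):
    """Return (new risk, absolute increase) for a given baseline and multiplier."""
    new_risk = baseline_risk * relative_risk
    return new_risk, new_risk - baseline_risk


def per_n(risk, n=10_000):
    """Express a risk as 'x in n' people rather than as a percentage."""
    return f"{round(risk * n):,} in {n:,}"


# Hypothetical rare disease: baseline risk of 1 in 10,000, then doubled.
rare_new, rare_extra = apply_relative_risk(1 / 10_000, 2)
print(f"Rare: from {per_n(1 / 10_000)} to {per_n(rare_new)}")

# Hypothetical common disease: baseline risk of 1 in 10, then doubled.
common_new, common_extra = apply_relative_risk(1 / 10, 2)
print(f"Common: from {per_n(1 / 10)} to {per_n(common_new)}")
```

Both cases are a ‘doubling’, but in the first only one extra person in 10,000 is affected, while in the second it is an extra 1,000 in 10,000.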

This is saying all of us react differently to a piece of information - including statistical information - depending on the way it is spun (if it comes from a politician) or packaged or, as the psychologists say, ‘framed’. The psychologists remind us it simply isn’t possible to present information meaningfully without framing it in some way. All data is communicated in a context, and changing that context will change the way people interpret the data. Those who design survey questions understand this full well. But all statisticians need to understand it because it means they need to put a lot of effort into trying to make their framing as neutral as is possible.

Finally, and in the light of all this, I’m sure that, for many people, the effective communication of statistical information is greatly aided by data visualisation. The media - including my newspapers - are putting more effort into creating our own data visualisation graphics. In this endeavour we’re greatly aided when statistical agencies and other official data providers (such as treasuries) present the original data in ways we can easily ‘scrape’ from websites or in Excel spreadsheet files we can quickly and easily copy electronically. But I need to remind you that journalists are reporters, not researchers. We do little processing of statistics of our own volition. But if statistical agencies start producing their own whiz-bang data visualisation graphics, you can be sure we’ll be happy to retail them to our customers.

While on my feet giving the talk at NatStats last week I had a flash of insight that wasn't in the talk itself:

Econocrats are always dinning it into journalists to base judgements about the state of the economy on economy-wide statistical indicators rather than anecdotal evidence. But the media are largely devoted to the provision of anecdotal evidence, because anecdotes are just stories about people, and stories about the experience of people (including themselves) are what our readers understand, identify with, are motivated by and use to understand their world. Good economic journalists try to do it the other way around: ascertaining what the macro indicators are telling us about the state of the economy and then finding stories about individuals which illustrate the stats.