Data: it’s the stuff of IT. We hear about it all the time: databases, data sticks, data lakes, data quality, data mining, data analytics. Just as with IT and iPads, the consumerisation of data has led to everyone having an apparently expert opinion on what it is, what it’s for, what’s wrong with it and how to fix it.
However, a (pseudo-)academic examination of data helps to differentiate what’s really important to us. No doubt you’re familiar with the trope that DATA + MEANING = INFORMATION; that is to say, we humans can invest raw data with a connection to the concepts of our lives. Perhaps you’ve also considered that adding CONTEXT to INFORMATION yields KNOWLEDGE, a firmer anchoring to specific real-world objects, locations and people. Those sages among us looking to improve the human condition will work out how to APPLY KNOWLEDGE in order to attain WISDOM. These ideas are commonly illustrated using a DIKW pyramid.
A simple example of these concepts: the data “red” may be enriched by the meaning “red traffic light”, placed by the context “the traffic light in front of me has just changed to red”, from which you may glean the arcane wisdom that “I should press my foot on the brake”. Redness alone wouldn’t lead to this action, otherwise we would have to choose a different colour to paint the edge of speed limit signs. Then again, thinking about the implications for road safety…
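For the more technically minded, the same layering can be sketched in a few lines of Python. This is purely an illustrative toy, with names and rules invented for the traffic-light example rather than drawn from any real system.

```python
# A toy illustration of the DIKW layering, using the traffic-light example.
# All names and rules here are invented for illustration only.

raw_datum = "red"                     # DATA: a bare value, devoid of meaning

information = {"signal": "traffic light", "colour": raw_datum}   # DATA + MEANING

knowledge = {                         # INFORMATION + CONTEXT: anchored to a real situation
    **information,
    "location": "the traffic light directly in front of me",
    "event": "just changed",
}

def decide(k: dict) -> str:
    """WISDOM: applying knowledge to choose an action."""
    if k["signal"] == "traffic light" and k["colour"] == "red":
        return "press foot on brake"
    return "carry on"

print(decide(knowledge))              # -> "press foot on brake"
```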
The American logician Charles Sanders Peirce, known as ‘the father of pragmatism’, founded semiotics as a system of symbols or signs, representing signals or stimuli. He missed out the fifth ‘S’ – data is stupid. It is dumb and devoid of meaning; it can be read one way or another and care must be taken to correctly identify meaning and context.
So why the obsession with data, and not information, knowledge or wisdom? We are a society growing ever more confident with technology and technical concepts. Take, for example, the current resurgence of ‘big data’ in the housing sector. The definition of the term varies, but its commonly accepted characteristics are that the dataset is too large to be processed by traditional software applications, and that there are insights that can be gained only at such a scale. However, one could argue that if you took all of the data held in databases by every housing provider and put it together, you still wouldn’t have enough to qualify as true ‘big data’. The advent of IoT devices over the next few years will certainly start to generate truly large volumes, but in truth this has barely started.
This isn’t just semantic quibbling; the danger of focusing on the mechanics of advanced data manipulation is that we miss opportunities for more easily obtained benefits, glossing over the basic problems that stand in their way. Predictive analytics, K-means clustering, sentiment analysis and other such techniques won’t help you much at this scale if the data you’re throwing at them is rubbish, without meaning or context. Worse still, as GDPR go-live on 25th May 2018 draws inexorably closer, organisations are beginning to wake up to the threat of ‘dark data’ – that which is collected or generated, but not used, maintained or even acknowledged. It’s difficult to assess these ‘Donald Rumsfeld-style’ unknown-unknown situations, but Veritas estimates that one third of enterprise data is ROT (redundant, obsolete or trivial) and perhaps one half is dark: unclassified, under the radar and potentially presenting an unknown threat.
Why do we suffer from these data silos, poorly described data, duplication and conflicts? To understand why a lack of data standards and poor integration are endemic, consider the traditional schism between the development and housing departments within a provider. I have often encountered the complaint that detailed information about the configuration of new builds is held by the development department but not made available to housing management or sales; efforts to resolve this frequently fail not because of technical barriers, but because there is no strategic imperative from above and nothing in it for the development team. In short, data integration is actually a people problem.
There have been attempts to improve things. In the mid-noughties, the now-defunct ODPM poured millions into e-government projects, giving rise to laudable but ultimately doomed initiatives such as the Adapters Club, which sought to commoditise integration and thereby drive down costs. In 2008, Housing Technology itself launched a standards board with the aim of engendering cross-sector collaboration; sadly, the inaugural and only meeting between housing providers and IT suppliers quickly descended into trench warfare (although this project is perhaps only dormant and remains an ambition for Housing Technology). More recently, a group of UK housing providers has been working to emulate the CORA project in the Netherlands, which saw the broad adoption of standard definitions for data entities and processes across the housing sector; this work has now been adopted by HACT. At the strategic business level, Home Group is leading a consortium of providers to define a set of performance indicators forming a ‘sector scorecard’, enabling meaningful performance metrics that can be used to negotiate with government agencies.
All of these initiatives will require significant investment by housing providers, and so present a familiar challenge: how do you convince your board to fund something inherently technical and with strategic but no obvious immediate tactical benefit? Well, there are some low-hanging fruit to consider, which can serve to illustrate the potential returns of giving your data some TLC.
One such area is stock assessment and options appraisal. In this era of cost-awareness, housing providers armed with the right tools can readily identify significant efficiencies by examining the performance of individual properties more closely. Further revenue and cost-saving opportunities can be revealed by projecting value under various planned maintenance, disposal or development scenarios. Regulatory compliance around risk management can also be achieved by stress-testing the business model against scenarios such as tax changes, inflation and/or further rent reductions.
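To make the idea concrete, here is a minimal sketch of such a scenario stress-test in Python. The figures, scenario parameters and simple compounding arithmetic are entirely invented for illustration; real options appraisal and business-plan models are considerably richer.

```python
# Toy stress-test of a single property's projected net income under
# different scenarios. All figures and scenario parameters are invented.

def project_net_income(annual_rent, annual_costs, years, rent_change, cost_inflation):
    """Sum projected (rent - costs) over a number of years, applying a yearly
    rent change and cost inflation rate to each successive year."""
    total = 0.0
    for year in range(years):
        rent = annual_rent * (1 + rent_change) ** year
        costs = annual_costs * (1 + cost_inflation) ** year
        total += rent - costs
    return total

scenarios = {
    "baseline":          {"rent_change": 0.00,  "cost_inflation": 0.02},
    "1% rent reduction": {"rent_change": -0.01, "cost_inflation": 0.02},
    "high inflation":    {"rent_change": 0.00,  "cost_inflation": 0.05},
}

for name, params in scenarios.items():
    result = project_net_income(5200, 3100, years=30, **params)
    print(f"{name:18s} 30-year net income: £{result:,.0f}")
```

Even at this toy scale the point stands: the projections are only as trustworthy as the rent and cost figures fed into them.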
These critically important analyses will be reliable only if the underlying data is of decent quality, consistently defined and up-to-date; any executive who has been around the block will know to beware of the common ailment ‘garbage in, garbage out’. Nevertheless, establishing the value of such data-driven planning is relatively straightforward, and can help to create a culture of confidence in data. Only when this has been achieved should more arcane methods be proposed, if they are not to be dismissed as impractical nerdiness or, in more analytically-challenged settings, witchcraft.
Pursuing its interest in such magical arts, last year Orchard engaged with Lancaster University’s Centre for Forecasting in an experiment to see how accurately responsive repairs demand could be predicted. Anonymised datasets were provided by two clients, and the standard measure MAPE (mean absolute percentage error) was used to assess the accuracy of the predictive model. Benchmarks for this measure range from 13 per cent down to seven per cent in the retail sector. After 12 weeks’ work we were able to forecast demand by trade type to an error of between 5.3 per cent and 15.2 per cent, and this with only minimal exploration of the data.
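For reference, MAPE is straightforward to compute: it is the average of the absolute forecast errors expressed as a percentage of the actual values. The sketch below uses made-up weekly job counts, not the datasets from the Lancaster experiment.

```python
# Mean absolute percentage error (MAPE): the average of |actual - forecast| / |actual|,
# expressed as a percentage. The figures below are made up for illustration.

def mape(actual, forecast):
    return 100 * sum(abs(a - f) / abs(a) for a, f in zip(actual, forecast)) / len(actual)

weekly_jobs_actual   = [120, 95, 130, 110]   # e.g. responsive repairs per week, by trade
weekly_jobs_forecast = [112, 101, 125, 118]

print(f"MAPE: {mape(weekly_jobs_actual, weekly_jobs_forecast):.1f}%")   # ~6.0%
```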
Such accurate demand prediction for a provider’s most costly service offers exciting possibilities for significant cost reductions, and Orchard has subsequently hired a data scientist to further develop its expertise in predictive analytics, focusing in the first instance on financial distress and rent arrears in a development partnership with four leading housing providers.
Data analytics is very much a here-and-now technology, so what’s coming next? To sum up the general trend in consumer technology, one might consider the following assertion: nobody wants to use software.
The point is that software is not something that people want to use; it’s merely a means to some other end (unless you are, like the author, a geek). With this in mind, one should always question whether it is necessary to expect humans to type things into screens, or as Google design strategist Golden Krishna put it, “the best interface is no interface”.
Apps have been around for some time and are now commonplace, but it’s clear that their days are numbered. Alexa, Cortana, Siri and… um… Mr. Google all have it in for the touch user interface, and the integration of AI-driven micro-services into messaging is the way forward. So if this is going to be the new paradigm for customer interaction, what about the data that underpins it?
Arguably, one of the main underlying causes of the appalling state of enterprise data is the fact that everyone keeps their own records, using inconsistent methods and to very poor quality standards. Perhaps the answer to this is distributed ledger technology (DLT if you must acronymise, and nothing to do with old Radio 1 DJs), the best-known example of which is Blockchain, which underpins the world’s most successful crypto-currency, Bitcoin. DLT promises to globalise data integrity by enabling any transaction anywhere to be recorded in an indelible, incorruptible and unique record. It is predicted that this will eventually supersede private databases and solve the problems that they bring, but it is likely to be many years before Blockchain gains sufficient traction to be regarded as mainstream.
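The essential property is that each record commits cryptographically to the one before it, so any tampering with history is immediately evident. That much can be illustrated with a toy hash chain in Python; this is a deliberate simplification for illustration only, with no distribution, consensus or mining, and the record fields are invented.

```python
import hashlib, json

# Toy append-only ledger: each entry includes the hash of the previous entry,
# so altering any historic record breaks every hash that follows it.
# A didactic simplification, not real distributed ledger technology.

def entry_hash(entry):
    return hashlib.sha256(json.dumps(entry, sort_keys=True).encode()).hexdigest()

def append(ledger, record):
    prev = entry_hash(ledger[-1]) if ledger else "genesis"
    ledger.append({"record": record, "prev_hash": prev})

def verify(ledger):
    return all(ledger[i]["prev_hash"] == entry_hash(ledger[i - 1])
               for i in range(1, len(ledger)))

ledger = []
append(ledger, {"tenant": "A123", "transaction": "rent payment", "amount": 92.50})
append(ledger, {"tenant": "A123", "transaction": "repair logged", "job": "boiler"})

print(verify(ledger))                    # True
ledger[0]["record"]["amount"] = 9.25     # tamper with history...
print(verify(ledger))                    # ...and the chain no longer verifies: False
```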
There is therefore plenty of opportunity in the interim to be realised from investment in data cleansing, data mining, standardisation, integration, sharing and aggregation. We need to realise that it’s our data that fuels everything, and if it’s of poor quality then we shouldn’t be surprised when we get poor outputs. The traditional approach to the implementation of IT systems in the housing sector is sometimes rather like buying a new car, filling it with coal then trying to pull it with horses. In the new world of the digital customer this won’t be tolerated, so we must improve quickly. A good place to start is simply to look at your data and ask, “what is it, what does it mean and why do we hold it?” The answers to these questions should begin to uncover some rapid benefits, lead to more questions and start to build a culture of confidence in data; this way you can hope to avoid the expensive futility of endlessly searching for the perfect system.
Aidan Dunphy is head of product strategy at Orchard.