Rob Woods has worked in the analytics industry for over 20 years. He’s currently an Analytics Solutions Architect working on IBM’s Watson FSS suite of products. All views here are his own and not IBM’s.
Describe your own background and how you came to be working with statistics / data / analytics
I started as a university lecturer and research assistant in the early 1990s, working with econometrics and GIS software. Looking back, we were focussing on areas that are now widely accepted as important but were seen at the time as pure research pieces.
We built models that predicted football transfer prices. Football clubs now apply some of these techniques for predicting injuries and identifying whether a player represents good value for money.
I worked on a number of publications but then decided I wanted to develop my career further by working in a commercial area. I then moved to London and joined SPSS where I worked as a consultant in many different industry and application areas.
I moved to IBM when SPSS was acquired in 2009 and have had a range of different roles, working with customers in industries including manufacturing, retail, financial services and utilities. I’m currently working on Watson Financial Services focusing on fraud protection and safer payments.
What is it that you like about working with data?
I love the creative aspect of taking a raw set of ingredients (i.e. the data) and building models and systems that bring real benefit. Having worked in different sectors, I’m inspired by the seemingly limitless areas where data science can be applied.
I realised the power of data science when a surgeon rang me up in the 1990s and explained he was using analytics to reduce rejection rates in kidney transplants. He’d built models to predict a potential patient’s likelihood of accepting or rejecting a donated kidney, and by doing this he had significantly improved the success rate of his operations.
I see data science as being like the car engine; people don’t buy a car for the engine alone, they want to know how the car drives. Likewise, data science can only work if it is deployed in a wider context and in a way that works.
Whilst it is called data science these days, I actually think it is more of an art than a science. Two data scientists can be given the same data set and arrive at quite different results.
What are the particular challenges facing financial services organisations today?
Banks and insurance companies are having to address regulatory compliance, fraud and operational efficiencies.
In banking, financial regulators such as the Financial Conduct Authority (FCA) have increased the frequency of fines, particularly over the last five years. In addition, the amount of regulation is also rapidly increasing.
Every year thousands of new regulatory actions are issued around the world, and banks have to respond. It is increasingly important that financial institutions look to more sophisticated RegTech-based solutions to show that they are addressing any non-compliance issues.
In Europe, we have seen more focus on fraud management systems in the run-up to the roll-out of the revised Payment Services Directive (PSD2). This regulation will improve customer protection for online and mobile payments, such as through standardised open banking, making cross-border European payment services safer. Real-time fraud management systems align well to this process and provide additional customer protection.
Enhanced data security has also become mandatory, and PSD2 requires stronger customer authentication when making payments.
There are now some strictly defined standards designed to protect customer data, including the Payment Card Industry Data Security Standard (PCI DSS) and the Payment Application Data Security Standard (PA-DSS). These set out best practices such as not retaining CVC codes, security logging and data encryption.
What are the main applications of advanced analytics in financial services?
Advanced analytics solutions can detect and discover fraudulent behaviour. They can also be applied to combat money laundering. In insurance, claims can be scored to establish whether they have a high risk of being fraudulent. Those high-risk claims can be sent to Special Investigation Units for further investigation.
AI/machine learning techniques can discover new and existing patterns of fraud to make sure that the right claims are sent to investigation teams. Low-risk claims can be fast-tracked, improving customer service. These techniques are transferable across different lines of business, from medical provider insurance to general household and motor.
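To make that concrete, here is a minimal sketch of score-based claim routing, assuming a fraud-risk score between 0 and 1 has already been produced by a model; the thresholds and function name are illustrative assumptions, not values from any specific product:

```python
# Hypothetical triage rule: high-risk claims go to the Special Investigation
# Unit (SIU), low-risk claims are fast-tracked, the rest follow normal handling.
# The thresholds below are illustrative assumptions, not production values.
def route_claim(risk_score: float,
                siu_threshold: float = 0.8,
                fast_track_threshold: float = 0.2) -> str:
    """Route a claim based on its modelled fraud-risk score (0-1)."""
    if risk_score >= siu_threshold:
        return "SIU_REVIEW"        # high risk: manual investigation
    if risk_score <= fast_track_threshold:
        return "FAST_TRACK"        # low risk: automatic settlement path
    return "STANDARD_REVIEW"       # everything in between

print(route_claim(0.93))  # -> SIU_REVIEW
```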
In banking, AI/machine learning techniques are used to stop payments that could be fraudulent. This is increasingly important as transfers of money occur more frequently and within much shorter timeframes. Since the introduction of instant payments and the increased use of online channels such as mobile, regulators now stipulate that real-time fraud detection systems are in place.
This has created challenges: banks have had to evaluate whether existing fraud management systems can cope with this shift of focus, and in many instances they are having to migrate to more purpose-built applications.
More sophisticated fraud solutions will block online payments if they look fraudulent. Profiles of cardholders are updated in real time, so the system can detect whether subsequent payments are feasible based on location. For example, if payments are made in two distant places within a short space of time, the second might be fraudulent.
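One common rule of this kind is the “impossible travel” check. Below is a minimal sketch, assuming each payment record carries a timestamp and coordinates; the 900 km/h speed ceiling (roughly airliner speed) and all names are illustrative assumptions, not details of any IBM product:

```python
# Sketch of an "impossible travel" feasibility check between two payments.
from datetime import datetime
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371 * asin(sqrt(a))

def is_feasible(prev, curr, max_speed_kmh=900):
    """False if the cardholder would need to travel faster than max_speed_kmh."""
    hours = (curr["ts"] - prev["ts"]).total_seconds() / 3600
    if hours <= 0:
        return False  # simultaneous payments in two places are not feasible
    km = haversine_km(prev["lat"], prev["lon"], curr["lat"], curr["lon"])
    return km / hours <= max_speed_kmh

london = {"ts": datetime(2019, 1, 1, 12, 0), "lat": 51.5, "lon": -0.1}
new_york = {"ts": datetime(2019, 1, 1, 13, 0), "lat": 40.7, "lon": -74.0}
print(is_feasible(london, new_york))  # False: ~5,570 km in one hour
```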
Banks need to check on an ongoing basis whether they have any banking interactions with known criminals who are involved in money laundering. There are anti-money laundering (AML) solutions which check incoming transactions against watchlists and other sources of intelligence. Traditional solutions can generate large numbers of false alerts, and advanced analytics can be used to reduce false positives in transaction and customer list screening.
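As an illustration of why list screening generates false positives, here is a toy sketch using fuzzy name matching, where a similarity threshold is one simple lever for the trade-off between missed hits and alert volume. The watchlist entries and the 0.85 threshold are made-up assumptions, and real screening engines use far richer matching logic:

```python
# Toy watchlist screening: exact matching misses spelling variants, while
# an over-generous fuzzy match floods investigators with false positives.
from difflib import SequenceMatcher

WATCHLIST = ["Ivan Petrov", "John Q. Launderer"]  # hypothetical entries

def screen(name: str, threshold: float = 0.85):
    """Return watchlist entries whose similarity to `name` meets the threshold."""
    hits = []
    for entry in WATCHLIST:
        score = SequenceMatcher(None, name.lower(), entry.lower()).ratio()
        if score >= threshold:
            hits.append((entry, round(score, 2)))
    return hits

print(screen("Iván Petrov"))  # catches an accented spelling variant
```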
In the context of those challenges how does / might improved use of analytics help?
For AML, this improves operational effectiveness without increasing the risk of missing true positive alerts. It can assess how likely an AML alert is to be genuine and prioritise investigations accordingly. It can also detect patterns that unfold over a series of transactions over time, which can also be indicative of money laundering.
For fraud, data science can also reduce false positives and discover new patterns of fraud. One of our systems caught a significant fraud ring within the first three months of implementation. When a policyholder made a claim, the claim’s risk was evaluated by our fraud risk model. Our model gave the claim the highest possible score and the claim was routed through to the Special Investigation Unit. Further investigation revealed common elements linking it to other claims that had been made.
What analytics do you do on a day to day basis – and what range of tools / techniques do you use?
I use a combination of IBM tools such as Watson Studio and SPSS Modeler, together with open source tools such as Python and R. I try out many different approaches: supervised and unsupervised learning, anomaly detection models, and tree-based models such as random forests.
It’s not all about the techniques though – it’s also down to what you do with your raw ingredients. Feature engineering is a critical part of the process: preparing input factors that could be predictive before modelling takes place. These are effectively calculations on top of the base data – ratios and other business-driven indicators that the data scientist believes will be important for prediction. In practice, hundreds of factors can be derived. Once feature engineering is complete, feature selection techniques pick out the ones that will be most predictive for modelling.
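As a minimal sketch of what this looks like in practice, assuming a pandas DataFrame of claims with hypothetical columns; the derived ratios and the use of simple univariate selection are illustrative choices, not any particular product’s method:

```python
# Feature engineering and feature selection on a toy claims data set.
import pandas as pd
from sklearn.feature_selection import SelectKBest, f_classif

claims = pd.DataFrame({
    "claim_amount":            [1200, 450, 8800, 300],
    "policy_premium":          [400, 380, 420, 150],
    "days_since_policy_start": [700, 30, 12, 365],
    "prior_claims":            [0, 2, 1, 0],
    "is_fraud":                [0, 0, 1, 0],   # label (known outcomes)
})

# Feature engineering: business-driven calculations on top of the base data.
claims["claim_to_premium"] = claims["claim_amount"] / claims["policy_premium"]
claims["early_claim"] = (claims["days_since_policy_start"] < 60).astype(int)

# Feature selection: keep the k features with the strongest univariate
# relationship to the fraud label.
X = claims.drop(columns="is_fraud")
y = claims["is_fraud"]
selector = SelectKBest(f_classif, k=3).fit(X, y)
print(list(X.columns[selector.get_support()]))
```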
We also have dedicated RegTech solutions which offer sophisticated deployment platforms, such as IBM’s Financial Crimes Insight (FCI) platform and the IBM Safer Payments solution for real-time payment fraud detection.
What does this insight deliver to client organisations / running of the business?
It’s more than insight; it’s about embedding intelligence from analytics into the decision-making process. Systems can potentially save millions if designed well and integrated into a ‘business as usual’ process. If your machine learning discovers a pattern, you must act upon it, which leads me on to the important but sometimes overlooked area of deployment.
Deployment can be tricky, as integration with other systems is often required. Models can degrade over time, so it is important to have an easy process to rebuild or refresh models, promote new ones and roll back to previous versions when required.
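A champion/challenger pattern is one simple way to handle refresh and rollback. The sketch below assumes scikit-learn-style models stored with joblib and uses AUC on recent labelled data as the yardstick; the storage layout and names are illustrative assumptions, not the API of any IBM deployment product:

```python
# Promote a retrained model only if it beats the current champion on
# recent data, keeping the old version on disk for rollback.
import joblib
from sklearn.metrics import roc_auc_score

def promote_if_better(candidate, champion_path, X_recent, y_recent):
    """Champion/challenger promotion with a simple file-based rollback."""
    champion = joblib.load(champion_path)
    champ_auc = roc_auc_score(y_recent, champion.predict_proba(X_recent)[:, 1])
    cand_auc = roc_auc_score(y_recent, candidate.predict_proba(X_recent)[:, 1])
    if cand_auc > champ_auc:
        joblib.dump(champion, champion_path + ".previous")  # keep for rollback
        joblib.dump(candidate, champion_path)
        return "promoted"
    return "kept champion"
```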
I met one company whose system had a set of static business rules; they never updated or improved the logic because doing so would have required a major IT project. Bottlenecks in the deployment process mean you never fully realise the business benefits of data science.
IBM has model management software such as Watson Studio Local and SPSS Collaboration and Deployment Services to ensure that model deployment is an easy and streamlined process.
Can you give some examples of interesting Internet of Things (IoT) / predictive maintenance projects?
In the area of IoT, data science is used to predict when robots may fail. For example, in car manufacturing, data science can be used to predict when a body shop assembly robot will fail. Paint robots can also give substandard finishes which require rework. Robots provide a rich source of data, such as movement and current drawn, with measurements recorded many times a second to give a full picture of what is happening.
Data science can provide early detection of problems and highlight when and where to do maintenance. We have used anomaly detection to pick up on systematic deviations in robot behaviour away from what would be considered normal.
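For illustration, here is a small anomaly-detection sketch on simulated robot telemetry using an Isolation Forest; the two sensor channels (current draw and vibration) and the contamination rate are assumptions for the example, not details of a real deployment:

```python
# Flag systematic deviations in robot sensor readings away from normal.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
# Columns: [current_amps, vibration]. Normal operating behaviour:
normal = rng.normal(loc=[5.0, 0.2], scale=[0.3, 0.05], size=(1000, 2))
# A handful of readings drifting away from normal (e.g. a developing fault):
drifting = rng.normal(loc=[6.5, 0.5], scale=[0.3, 0.05], size=(5, 2))

model = IsolationForest(contamination=0.01, random_state=0).fit(normal)
print(model.predict(drifting))  # -1 = anomaly, 1 = normal; expect mostly -1
```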
Visual recognition can be used to detect problems in the manufacturing process: a machine learning model can analyse an image and spot a defect. Image recognition models are trained on many instances of specific images. If, for example, you need your model to pick up defects, it will need to be trained on pictures of the defect alongside pictures with no defect.
Once trained, models can score new images during manufacturing and give engineers early warning of potential problems that could indicate systemic issues. Early detection of repeat problems can save manufacturers significant rework costs and improve manufacturing quality and consistency.
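As a toy illustration of the train-then-score flow (a real system would use a convolutional network on genuine labelled images; here synthetic pixel vectors and a linear classifier stand in, and the 0.9 alert threshold is an assumption):

```python
# Train a binary defect classifier, then score a new image off the line.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Pretend each 32x32 grayscale image is flattened to a 1024-length vector.
rng = np.random.default_rng(1)
ok_images = rng.random((200, 1024))             # label 0: no defect
defect_images = rng.random((200, 1024)) + 0.3   # label 1: defect (shifted pixels)

X = np.vstack([ok_images, defect_images])
y = np.array([0] * 200 + [1] * 200)
clf = LogisticRegression(max_iter=1000).fit(X, y)

new_image = rng.random(1024) + 0.3              # a fresh frame from the line
defect_probability = clf.predict_proba(new_image.reshape(1, -1))[0, 1]
if defect_probability > 0.9:                    # alert threshold is illustrative
    print(f"Warn engineers: defect probability {defect_probability:.2f}")
```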
In the utilities industries, predictive maintenance can be used to predict where problems are in gas or water networks. Predictive maintenance helps identify where to perform proactive maintenance and therefore reduce reactive maintenance.
Utilities companies hold rich data on assets and historic maintenance activity, as well as other sources such as weather conditions. Our modelling process risk-grades all assets on a regular basis, and this is rolled up into a programme of work for maintenance teams to proactively visit areas where higher risk is predicted. By identifying the areas of highest risk, maintenance spend is targeted more effectively.
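A minimal sketch of that roll-up step, assuming a pandas DataFrame of asset risk scores; the column names, grade boundaries and sample rows are all hypothetical:

```python
# Roll asset-level risk grades up to an area-level maintenance programme.
import pandas as pd

assets = pd.DataFrame({
    "asset_id":   [101, 102, 103, 104],
    "area":       ["North", "North", "South", "South"],
    "risk_score": [0.91, 0.15, 0.72, 0.88],
})
assets["risk_grade"] = pd.cut(assets["risk_score"], bins=[0, 0.3, 0.7, 1.0],
                              labels=["low", "medium", "high"])

# Prioritise areas by their count of high-risk assets for proactive visits.
programme = (assets[assets["risk_grade"] == "high"]
             .groupby("area").size()
             .sort_values(ascending=False))
print(programme)
```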
How do such projects deliver value back into the organisations?
For car manufacturers, robot breakdown can have a significant financial impact due to lost car production. This can amount to thousands of pounds per minute and a decent early warning system can prevent this.
In utilities, there is an obvious benefit to customers if a sewer flood, a gas leak or a burst water main is prevented. Utility companies are regulated, and it is important that they hit acceptable levels of service. They can be fined, and missing acceptable serviceability levels can affect how much they are allowed to charge customers for their services.
What are the challenges / opportunities associated with using IoT data?
The sheer volume of data and false positives can be a problem. If an early warning system raises many alarms with no genuine problem behind them, engineers will lose confidence in the system. Robots can sometimes transmit hundreds of records a second.
Operational buy-in is also important. You can build a great system, but the maintenance and operations teams must be able to act upon any findings, and those findings must translate into a realistic programme of maintenance work. Sometimes this means maintenance teams changing the way they work, for example moving to a more proactive regime. Operations may initially be resistant unless they see clear benefits from changing the way they work.
There are many opportunities, as a lot of IoT data is good quality. We once picked up that a robot was losing oil. It was a subtle pattern, but the granularity of the data allowed us to spot it. The business benefit of detection can be massive: if a robot is out of action, in the extreme, production can halt, costing tens of thousands of pounds in lost production revenue per minute.
How has working in the analytics space changed during the course of your career? Have you noticed changes with the emergence of the media interest in big data / data science?
Yes. When I started at SPSS, we had to spend months explaining to people why analytics goes beyond business rules and gut reactions. It’s taken 20 years or more, but data science is now mainstream and everyone is talking about it. Computing power is cheaper and faster. Facebook, Amazon and Google have built their businesses around analytics. And the emergence of GPUs has made techniques such as visual and speech recognition an everyday reality rather than a research dream.
What’s the influence / impact of open source analytics packages like R been?
It has lowered the price of entry for analytics considerably. In the past you had to buy a commercial product; now anyone can get started with open source tools. This has increased the supply of data scientists. However, it has made a technical area even more technical, due to the reliance on developers and programmers.
Commercial data science tools such as SPSS Modeler still have an important part to play, as they increase productivity and can be simpler to use. I believe anyone can get into data science and should not be put off by having to write code. Some of the best data scientists I have worked with have not been programmers. Rather, they have had a strong ability to think about the problem they are solving and a good understanding of how to work with data.
How do you see the future of advanced analytics?
I see three important themes:
- The deployment of data science will continue to be a challenge. Data science will continue to grow, but the deployment side of managing models will need to mature and standardise. Businesses want this to be simple and as hassle-free as possible; they don’t like technical complexity. Low complexity, low cost of ownership and ease of use are really important.
- Delivering value quickly is still a challenge, as there are still a lot of misconceptions around analytics. AI and machine learning are hot topics these days, but there is still a need for a well-defined approach to defining and solving problems. It’s also critical to build the right team around a project, to focus on and prioritise the right approach and outcome. The machine cannot learn if it does not know what problem it is trying to solve.
- Building the right team for success. Finding a decent data scientist who has good business, analytics and IT skills is still often a challenge. It is rare that one person has all of these skills; people tend to be strong in only one or two areas. You can, however, build teams that cover the different aspects of data science and the wider process. We now talk about data engineers, data scientists and data journalists. I prefer to call it building the right team – one with an appetite for success and the right mix of business, analytics and IT skills. Also, don’t underestimate business sponsorship and process change around the analytics. Start small and try to get some quick wins that will excite people.
What has been your favourite analytics project and why?
It would be impossible to pick one. I really enjoyed working in the water industry and becoming a guru at predicting where sewer floods will occur. I also loved working on real-time systems, because you can stop a fraudulent payment before it happens or detect a fault on equipment, whether that’s a robot or an escalator. Detecting interesting cases that the customer wouldn’t have seen without analytics – that’s the highlight for me.
It’s given me the opportunity to travel to places I wouldn’t otherwise have visited. I love learning about new areas and meeting new people. This might sound clichéd, but in many ways the best projects have been with people who had an ambitious vision and the will to make it work. There will be challenges, but overcoming them is part of the process.
And the most frustrating?
Again, it would be unfair to single out a project. It’s frustrating when a customer doesn’t think big enough and sees analytics as solely about running reports rather than acting on anything. I believe a customer needs to work closely with you to get the best outcome.
Also, I’ve had customers who don’t believe they can do anything because their data is not good enough. It is amazing what you can do with the raw set of ingredients. One customer produced a map to show where all of their sewer incidents were – and they were all in the sea! After cleaning the data, using the address where each incident occurred with an accurate geolocation lookup, we were able to sort the problem out.
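For illustration, the fix amounted to the kind of sanity check sketched below – dropping or re-geocoding records whose coordinates fall outside a plausible bounding box. The column names, sample rows and rough UK bounds are assumptions:

```python
# Identify records whose coordinates cannot be right (e.g. in the sea or
# at a failed-geocode default) and send them back through address lookup.
import pandas as pd

incidents = pd.DataFrame({
    "address": ["12 High St, Leeds", "4 Mill Ln, York", "unknown"],
    "lat": [53.80, 53.96, 0.0],   # 0.0 is a classic failed-geocode default
    "lon": [-1.55, -1.08, 0.0],
})

UK_BOUNDS = {"lat": (49.9, 60.9), "lon": (-8.6, 1.8)}  # rough bounding box
valid = (incidents["lat"].between(*UK_BOUNDS["lat"])
         & incidents["lon"].between(*UK_BOUNDS["lon"]))

to_regeocode = incidents[~valid]  # re-run these through an accurate lookup
clean = incidents[valid]
print(clean)
```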
I think nearly all customers will eventually be prepared to take the leap of faith. Believe in analytics and you will be rewarded!
Any words of advice for people entering into a career in analytics?
You can learn about the newest and shiniest algorithms, but don’t underestimate the importance of thinking for yourself. If something doesn’t look right with your data, question it and try to understand it. Build up your business knowledge and don’t try to be too clever. Soft skills are equally important. Try out different things and learn what you like to do. Occasionally step outside your comfort zone and, most importantly, have fun. Data science can be difficult to explain, so be clear in how you present your results and always ask “Why are we doing this?” and “What is the benefit of it?”