We are living through a golden age of technological innovation and revolution. Whether it’s the blending of real-time data, software and hardware instrumentation to deliver efficient, user-friendly, zero-carbon buildings, or the combination of data, artificial intelligence algorithms and automotive technology to enable semi-autonomous vehicles, we are witnessing a rapid expansion of the human race’s technical horizons and possibilities.
One of the key enabling elements for all of these developments is the ability to capture, organise, store, analyse and model large volumes of data. The outputs of this analysis and modelling can then be used to improve decision making, often in real time. One of the main forces behind these developments has been the advent and growth of cloud computing. That is the exciting headline. The challenge, however, is working out how organisations can best take advantage of the opportunities that cloud computing presents.
I would argue that there is one discipline that can really help your organisation demonstrate the value of cloud computing. That discipline is data science.
First of all, let’s define what we mean by data science. At its most fundamental, data science is simply about using analytical techniques to identify repeatable patterns in data, which in turn allows complex and important questions to be answered. The answers to these questions, assuming the questions are well framed and carefully defined, should enable your organisation (regardless of its core activity) to do things better.
The outcomes of data science use cases are broad and varied, but they often include elements of the following:
- Improved operational efficiency (better use of resources and improved financial performance)
- Better clinical outcomes for patients
- More revenue generated sustainably from your customers
- Improved customer retention, and a better understanding of what drives it
- A clearer picture of risk characteristics, so that the effects of exposure to risk can be mitigated
- Detection of patterns of non-compliance and fraud, so that their negative impact can be offset
- Better product quality and consumer satisfaction, driven by manufacturing data
The number of potential data science applications is enormous. So let’s now talk more about why data science lends itself so well to cloud computing (and vice versa):
- The cloud is ready for data science when you are – Cloud-based computing environments have the benefit of being available ‘on demand’. A cloud environment can be configured quickly and easily to host an initial project, equipped with the latest data ingestion, wrangling, modelling and model deployment tools, and at relatively low cost.
- Scalable data science infrastructure on tap – If the initial project succeeds, the same environment can then be allocated more cloud computing resources so that larger data volumes and a broader range of data sources can be incorporated. This is quite different from (and much simpler than) the process an organisation would need to go through to achieve the same thing with on-premise software and hardware.
- All the contemporary data science tools you’ll need are in one place – All the large cloud providers (e.g. Amazon, Google, IBM or Microsoft) offer flexible, mature data science tooling and apps to help you establish your data science capability. Many of these are open-source technologies that have been optimised for that vendor’s specific cloud environment. Together they cover your requirements from the initial ingestion of data (the process of importing, transferring, loading and processing data for temporary or permanent storage and use), through analysis and modelling, to model deployment. They can often be used at very low or zero cost, especially if you are running a constrained pilot.
- Information security and governance built in – Data science projects almost always use data that has some competitive value and is therefore commercially sensitive. If your organisation engages directly with individual customers (members of the public) rather than with other businesses, it will also be handling personally identifiable data, which immediately carries all the implications of the General Data Protection Regulation (GDPR). These information security concerns are an unavoidable occupational hazard for any organisation or data professional. One benefit of using the commercial cloud services offered by large technology companies is that infrastructure designed to support GDPR compliance, robust cyber security for your cloud instance, and well-documented security, disaster recovery and information security policies are built in for you. This does not absolve your organisation of responsibility or of the need to carry out its own due diligence, but it is one more cost that is already factored in. This allows you to get a project moving much more quickly than might otherwise be the case.
- Portability made easier – One of the commercial concerns about committing to a specific cloud platform provider is the risk, perceived or real, of getting locked into that vendor without an easy option of moving to a new supplier. This risk applies to a data science project just as it does to any other technology-enabled programme. Here again, the cloud environment offers data science projects some valuable capabilities that help to mitigate this risk. The big four cloud platform providers all support, at least to some degree, an open-source project called Docker. Docker is a tool designed to make it easier to create, deploy and run applications by using so-called ‘containers’. Containers allow a developer to package up an application with all the parts it needs, such as libraries and other dependencies, and deploy it as one package. Thanks to the container, the developer can be confident that the application will run the same way on any other machine with a compatible container runtime, regardless of any customised settings that machine might have. Although Docker is a technical and rather specialist toolkit, it (along with other containerisation alternatives) simplifies the process of keeping your data science technology infrastructure portable from one cloud platform to another.
In summary, contemporary cloud computing revolves around efficient data ingestion, management and utilisation. The raw material of applied and useful data science is data. That contemporary cloud platforms are readily accessible and come fully equipped with a wide range of data management, data storage, advanced analytics and modelling tools is hugely valuable. Add to that checklist built-in cyber security and data protection measures, and the ability to package a configured suite of software tools and custom workflows and move it from one cloud environment to another with relative ease, and the stage is set for data science use cases to showcase the power and flexibility of cloud computing.
As a counterbalance to the enthusiasm and excitement that we have for data science using cloud infrastructure, it is also sensible to note that many of the tools, data-focused activities and capabilities listed above do require some degree of technical expertise and effort. With that in mind, your organisation may benefit from some help and support with the initial project and subsequent iterations. Getting the right support will save you time and help you avoid common pitfalls and mistakes.
If you would like to start your data science practice in the cloud, or if you have a current on-premise approach that you would like to migrate to cloud, please feel free to get in touch.