Many of our clients regularly hire new analysts, and we’re often involved in discussions about which core skills they should be looking for. Similarly, I often talk to people looking to build a career in analytics who want to know which skills they need to develop. The most skilled analysts are in high demand because they combine a range of skills that are rarely found in a single person. Here are the ones I think are really key.
Domain knowledge about your industry
It’s not enough just to have the technical skills. As we have argued before on this blog, good analytics projects depend on the analyst having a solid understanding of the industry in which they are working. Algorithms, formulae and models have no value in the abstract; their value comes when they are deployed within an organisation, so the analyst needs to understand the organisation’s objectives, its strategic direction, the constraints under which it operates and so on. Without that, the analyst cannot apply their technical expertise to make a fundamental change within your organisation. I’d temper this by saying that everyone has to be given an opportunity to learn, and this is true for domain expertise too. For more junior data science roles you might look instead for some familiarity with the industry and a clear understanding of how data science is applied in practice.
Machine learning and data mining skills
Clearly anyone you hire as an analyst needs to be a competent data miner with a very solid grasp of key statistical concepts, as well as an understanding of which models or methods are most relevant or applicable in which situations. Graduates from highly theoretical analytics-based degree programs can sometimes struggle when it comes to deploying those skills, learnt in the abstract, on messy real-world projects. Ultimately, if someone doesn’t fully understand the tool they’re using, be it CHAID models, Support Vector Machines or Hidden Markov Models, then they’re not going to be able to interpret the results logically for you. Flawed interpretation can lead to decisions being made on incorrect information, with potentially serious consequences for your business.
Familiarity with big data processing platforms
Whilst I wouldn’t expect every data scientist to be an expert in this area, it’s clearly important that they have a good understanding of how data processing frameworks such as Hadoop and Spark fit into the overall data science environment.
Comfortable with both structured and unstructured data
The focus here tends to be on structured data and the key thing there is to be competent in writing queries in SQL. Generally data scientists tend to be pretty strong in this area, but what about when it comes to unstructured data? Here there’s much more diversity of tools – no one tool dominates the unstructured data environment in the way that SQL does for structured data. Again, it’s not realistic to expect data scientists to be familiar with every possible unstructured data tool but they do need to be generally comfortable working with unstructured data and understand how to manage it, perhaps with good knowledge of one or two NoSQL database system implementations.
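To make the contrast concrete, here is a minimal Python sketch. The table, the field names and the sample records are all invented for illustration, and the JSON strings only stand in for the loosely structured records that a document-oriented NoSQL store might hold. It is not meant as a recommendation of any particular tool.

```python
import json
import sqlite3

# Structured data: every row has the same columns, so one SQL query
# answers the question. (Table and values are hypothetical.)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("alice", 20.0), ("bob", 35.0), ("alice", 15.0)],
)
totals = dict(
    conn.execute("SELECT customer, SUM(amount) FROM orders GROUP BY customer")
)

# Loosely structured data: fields vary from record to record, so the
# analyst has to decide how to handle what is missing.
docs = [
    '{"customer": "alice", "feedback": "Fast delivery", "tags": ["delivery"]}',
    '{"customer": "bob", "feedback": "Box arrived damaged"}',
]
records = [json.loads(doc) for doc in docs]

tag_counts = {}
for record in records:
    # The second record has no "tags" field at all; default to an empty list.
    for tag in record.get("tags", []):
        tag_counts[tag] = tag_counts.get(tag, 0) + 1

print(totals)
print(tag_counts)
```

With the structured table the schema does most of the work. With the documents, handling gaps and inconsistencies is part of the analysis itself, which is why comfort with this kind of data is worth probing for in an interview.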
Up-to-date coding skills
This is a field in which technology moves fast, and a good data scientist will be proactive about keeping their skills up to date. Excellent coding skills and the flexibility to keep them updated are critical. Which languages someone knows matters less than their willingness to adapt and their ability to learn new skills quickly. That said, of course some languages are more suited to particular applications than others.
If you’re looking for someone to conduct pure data analysis then R can be useful. However if you want someone not only to conduct analysis but also to be able to develop software and solutions then something like Python might be more relevant. Particular industries tend to have biases towards certain products or preferred solutions. If you’re in academia, for example, then SPSS is most often the tool of choice. It’s not necessary (or indeed possible!) for an analyst to be familiar with every tool in the market, but I’d certainly expect them to be comfortable with several, even though they might specialise in one.
Communication skills
We’re all aware of the stereotype of the statistician or technical expert talking in language that no one else can understand. This isn’t good enough any more. Whilst your data scientists may never be client-facing, they will certainly have to deal with colleagues inside the organisation. The role only succeeds if the data scientist can communicate their insights effectively and engagingly to those whose job it will be to deploy them throughout the organisation.