Data science projects are unique in the sense that each one has its own objective. However, they can typically be distilled into five common steps. We spoke to James Londal, Director of Data Science at MyLife Digital, who explained the typical lifecycle of a data science project.
Ask a Question
This first step requires both domain and commercial experience, as any data science team will likely be faced with several questions to answer. While the team may start with a question like “Who are our best customers?”, they’ll swiftly work to refine this. Open-ended questions are the bane of any data science team, so the question is narrowed until it is distilled into a more precise version such as: “Why have some customers been more profitable than others over the last 12 months?”
Explore the Data
The exploration stage is all about validation. Teams already have the data to find an answer to the question outlined in the first step; it may just need merging, reformatting or transforming to support the next step. Unfortunately, this step is often rushed, and suspected biases are overlooked rather than properly addressed, in favour of jumping towards the obvious answer.
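As a rough illustration of the merging and reformatting described above, here is a minimal Python sketch using only the standard library. The records, field names and date format are hypothetical and chosen for the example. It keeps the rows that fail to join instead of silently dropping them, since those rows are one place a bias can hide.

```python
from datetime import datetime

# Hypothetical sample records; field names and values are illustrative only.
customers = [
    {"customer_id": "1", "name": "Acme Ltd"},
    {"customer_id": "2", "name": "Birch & Co"},
]
orders = [
    {"customer_id": "1", "date": "03/01/2024", "profit": "120.50"},
    {"customer_id": "1", "date": "17/02/2024", "profit": "80.00"},
    {"customer_id": "3", "date": "09/02/2024", "profit": "45.25"},
]


def merge_orders(customers, orders):
    """Join orders to customers and convert text fields to proper types."""
    by_id = {c["customer_id"]: c for c in customers}
    merged, unmatched = [], []
    for order in orders:
        customer = by_id.get(order["customer_id"])
        if customer is None:
            # Keep rows that fail to join so they can be inspected later.
            unmatched.append(order)
            continue
        merged.append({
            "customer_id": order["customer_id"],
            "name": customer["name"],
            # Reformat: day/month/year text becomes a date object.
            "date": datetime.strptime(order["date"], "%d/%m/%Y").date(),
            # Transform: profit text becomes a number.
            "profit": float(order["profit"]),
        })
    return merged, unmatched


merged, unmatched = merge_orders(customers, orders)
print(len(merged), "orders merged;", len(unmatched), "unmatched")
```

Reviewing the unmatched list before moving on is a cheap check that the merged data still represents the customers the question is about.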
Fit a Model
When choosing a statistical model, a data team should be aware of the underlying assumptions and compromises. Every statistical model has some form of these. For example, neural networks compromise on interpretability, while logistic regression provides an interpretable model but makes assumptions about the structure of the underlying data, which may result in a weaker answer.
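To make the interpretability point concrete, the following is a minimal sketch of logistic regression fitted by plain gradient descent in standard-library Python. The data, feature names and hyperparameters are invented for illustration, and a real project would more likely use an established library. The model assumes the log-odds of the outcome are a linear function of the features. In return, each weight can be read directly as the direction and size of a feature's effect.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n, n_features = len(X), len(X[0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            p = sigmoid(bias + sum(w * x for w, x in zip(weights, xi)))
            err = p - yi
            for j, x in enumerate(xi):
                grad_w[j] += err * x
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def predict_proba(weights, bias, x):
    return sigmoid(bias + sum(w * v for w, v in zip(weights, x)))


# Hypothetical data: [orders per month, support tickets] per customer,
# with label 1 for a profitable customer and 0 otherwise.
feature_names = ["orders_per_month", "support_tickets"]
X = [[1, 3], [2, 2], [2, 4], [3, 1], [4, 3], [4, 1], [5, 2], [6, 1]]
y = [0, 0, 0, 1, 0, 1, 1, 1]

weights, bias = fit_logistic(X, y)
for name, w in zip(feature_names, weights):
    # exp(weight) is the multiplicative change in odds per unit of the feature.
    print(name, "weight:", round(w, 2), "odds ratio:", round(math.exp(w), 2))
```

A positive weight means the feature raises the odds of the outcome and a negative weight lowers them, which is the kind of statement a non-technical audience can check against their own experience.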
Before you fit a model, it’s important to identify who you are communicating your answer to and, with that knowledge, decide on the most appropriate model.
Communicating the answer can become increasingly difficult when you’re presenting your results to the less technical members of the company. Ensuring the end client can relate the answers to their own experiences is a simple measure that can make this process much easier. It’s also important to point out the strengths and weaknesses of the processes you used and to justify your methods.
Make a Decision
The final step is deciding whether or not the answer you’ve deduced through the process is useful. For example, if you’re investigating why your customers are cancelling subscriptions, it would be beneficial to exclude users who visit a page detailing how to unsubscribe. It’s safe to assume that those visiting the page are about to leave anyway, so finding them tells you nothing you can act on.
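The exclusion described above could be sketched as a simple filter applied before any analysis. The user records, field names and page path below are hypothetical.

```python
# Hypothetical user records; field names and page paths are illustrative only.
users = [
    {"user_id": 1, "pages_visited": ["/home", "/pricing"], "cancelled": False},
    {"user_id": 2, "pages_visited": ["/home", "/account/unsubscribe"], "cancelled": True},
    {"user_id": 3, "pages_visited": ["/support", "/billing"], "cancelled": True},
]

UNSUBSCRIBE_PAGE = "/account/unsubscribe"  # assumed path for the example


def exclude_unsubscribe_visitors(users, page=UNSUBSCRIBE_PAGE):
    """Drop users who viewed the unsubscribe page before analysing churn."""
    return [u for u in users if page not in u["pages_visited"]]


remaining = exclude_unsubscribe_visitors(users)
print([u["user_id"] for u in remaining])
```

User 3 remains in the data despite cancelling, and that is the kind of customer whose behaviour may still point to a decision the business can make.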
The biggest pitfall I have observed in Data Science Teams is failing to understand what decisions could be made from the answers they have provided.
A Data Science Team needs a balance of technical skills, domain experience and communication skills. Structure your team so that Data Scientists can be paired up to compensate for each other’s strengths and weaknesses.