Inspired by my last article, “Graph databases, what are those?”, I decided to carry on in the same spirit of questioning and exploration. Taking a step back to the very beginning, I’ll attempt to dissect the very concept of what it means to data-science, and try to paint a picture from my perspective. Moreover, I hope this will serve as a reference for anyone who ever asks what I do.
I am a couple of months into my journey as a data scientist. I’ve completed the Flatiron immersive data science bootcamp, spent countless hours on Stack Overflow (there should be a certificate for that), and used numerous learning tools and mini courses. It’s time for some introspection.
What is a “data scientist”? Well, the simple answer: the sexiest job of the 21st century, according to Harvard Business Review. But that would make this write-up too short, so let’s take a deeper dive.
The word “science” in data science refers to the scientific method: making an observation, asking the right questions, creating hypotheses, devising experiments to test these hypotheses, analyzing your findings, and then coming to a conclusion. This is illustrated simply in the diagram below.
In the space between observing our data and drawing our conclusions, we can tweak the question(s) again and again according to some type of data-based feedback loop. That is where the essence of data science lives. The goal of the data scientist is to turn data into value.
As we can see, “The data science process” is very similar to the scientific method:
Both of these processes emphasize iteration. The “science” in these processes seems to manifest in the question-test-analyze loop we repeat again and again. By iterating through these steps, data scientists are able to craft the right questions.
So what is the role of a data scientist?
Well, as much as I want to say “it depends” (a phrase that has become one of the five most frequently used in my vocabulary since I began to data-science) and leave it open-ended, I like the way Alex Castrounis, the founder of InnoArchiTech, put it in his article, What Is Data Science, and What Does a Data Scientist Do?
This definition is somewhat loose since there really isn’t a standardized definition of the data scientist role, and given that the ideal experience and skill set is relatively rare to find in one individual.
This definition can be further confused by the fact that there are other roles sometimes thought of as the same, but are often quite different. Some of these include data analyst, data engineer, and so on.
Here is a diagram showing some of the common disciplines that a data scientist may draw upon. A data scientist’s level of experience and knowledge in each, often varies along a scale ranging from beginner, to proficient, and to expert, in the ideal case.
He goes on to talk about the pillars of data science expertise:
- Mathematics (includes statistics and probability)
- Computer science (e.g., software/data architecture and engineering)
- Communication (both written and verbal)
Based on these pillars, a data scientist is a person who should be able to leverage existing data sources, and create new ones as needed in order to extract meaningful information and actionable insights. These insights can be used to drive business decisions and changes intended to achieve business goals.
More on the data science process
When confronted with a data science problem, the description of your task can be quite ambiguous. It is up to you as a data scientist to translate it into a concrete problem to solve. The data science process involves several steps:
- Frame the problem: Who is your client? What exactly is the client asking you to solve?
- Collect the raw data needed to solve the problem: Is this data already available? If so, what parts of the data are useful? Do you need more data? What resources are available?
- Process the data (data wrangling): Real, raw data is rarely usable out of the box. There are errors in data collection, corrupt records, missing values and many other challenges you will have to manage. You will first need to clean the data and convert it to a usable form.
- Perform in-depth analysis (machine learning, statistical models, algorithms): This step is usually the meat of your project, where you apply all the cutting-edge machinery of data analysis to unearth high-value insights and predictions. I previously wrote an article, Machine learning Terminologies demystified (shameless self-promotion). This is also where a lot of the mathematical and programming skills are called upon.
- Communicate results of the analysis: All the analysis and technical results that you come up with are of little value unless you can explain to your stakeholders what they mean, in a way that’s comprehensible and compelling. The art of storytelling is a critical skill; you are expected to paint a compelling narrative.
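To make the steps above a little more concrete, here is a minimal sketch in Python of the process, analyze, and communicate steps. Everything in it is an assumption made up for illustration: the records, the field names (customer, plan, monthly_spend), and the helper functions are hypothetical, and a real project would more likely lean on libraries such as pandas or scikit-learn.

```python
from statistics import mean

# Hypothetical raw records, invented purely for illustration.
raw_records = [
    {"customer": "a", "plan": "basic", "monthly_spend": "20.0"},
    {"customer": "b", "plan": "premium", "monthly_spend": "55.5"},
    {"customer": "c", "plan": "basic", "monthly_spend": None},     # missing value
    {"customer": "d", "plan": "premium", "monthly_spend": "n/a"},  # corrupt record
    {"customer": "e", "plan": "basic", "monthly_spend": "30.0"},
]


def clean(records):
    """Process step: keep only rows whose spend can be parsed as a number."""
    cleaned = []
    for row in records:
        try:
            spend = float(row["monthly_spend"])
        except (TypeError, ValueError):
            # None raises TypeError, unparseable text raises ValueError.
            continue
        cleaned.append({**row, "monthly_spend": spend})
    return cleaned


def average_spend_by_plan(records):
    """Analysis step: a deliberately simple statistic, the mean spend per plan."""
    groups = {}
    for row in records:
        groups.setdefault(row["plan"], []).append(row["monthly_spend"])
    return {plan: mean(values) for plan, values in groups.items()}


def summarize(averages):
    """Communication step: turn the numbers into plain sentences."""
    return [
        f"Customers on the {plan} plan spend {avg:.2f} per month on average."
        for plan, avg in sorted(averages.items())
    ]


cleaned = clean(raw_records)
averages = average_spend_by_plan(cleaned)
for sentence in summarize(averages):
    print(sentence)
```

Dropping bad rows is only one way to handle missing or corrupt values; depending on the question you framed with your client, filling them in or going back to collect better data may be the better call.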
Hopefully you made it with me to the end of this introspection. Data science is a multidisciplinary field that requires a practitioner to translate between technology and business concerns. It is a very valuable and challenging career path (forever learning). It’s not enough to just choose the right algorithms, or be an expert programmer or mathematician. A data scientist is one who can interweave all the necessary disciplines while artfully telling a compelling story to provide business solutions.