Careers: Data Science

Lately I have been getting a lot of questions about this area of study. At least it shows that people are very keen to learn new technologies. As a not-so-subtle hint, I sometimes feel that our universities let us down by not offering short courses in many of these new areas, but that’s a discussion for another day! Anyway, here is the question that finally motivated me to write this article:

Hi, my name is Evans Chikuni. I have a degree in Computer Science, which I attained at HIT. I have a keen interest in programming but am also fascinated by Big Data analytics and Business Intelligence. However, after inquiring about BI courses at one center, I was told the cost is US$1225, and I can’t afford that. What certification path would you recommend, and what advice can you give?

What is Data Science?

Perhaps the best way to understand what Data Science is, is to look at the context in which it is applied. Over the years, institutions and organisations have been collecting data about their clients and products, among other things. Imagine the amount of data a bank holds on every transaction in its history, or the amount of data collected through census exercises over the years. Think of the data held by telecoms companies on call durations, times and patterns; the health data held by hospitals and surgeries; and the social media data held by organisations such as Facebook and Twitter. This is all Big Data.

Data Science refers to the mechanisms, skills and technologies that can be applied to data to extract patterns, make predictions, detect anomalies and make recommendations for the business. Essentially, data is collected, cleaned, analysed and passed through algorithms in a bid to extract information from it. So we start with data and end up with information. Armed with that information, the organisation can make better-informed decisions, sometimes even in real time.
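As a rough illustration of that collect, clean, analyse and apply-an-algorithm flow, here is a minimal Python sketch. The transaction amounts, the function names and the "two standard deviations" rule are all assumptions invented for this example; a real organisation would use far more data and more careful methods.

```python
from statistics import mean, stdev

# Hypothetical raw data: transaction amounts as collected, with gaps and bad entries.
raw_amounts = ["120.50", "98.00", "", "101.25", "n/a", "110.00", "95.75", "2500.00", "105.10"]


def clean(values):
    """Keep only the entries that can be read as numbers."""
    cleaned = []
    for value in values:
        try:
            cleaned.append(float(value))
        except ValueError:
            continue  # drop blanks and unreadable entries
    return cleaned


def find_anomalies(amounts, threshold=2.0):
    """Flag amounts more than `threshold` standard deviations from the mean."""
    average = mean(amounts)
    spread = stdev(amounts)
    return [a for a in amounts if abs(a - average) > threshold * spread]


amounts = clean(raw_amounts)          # data -> cleaned data
anomalies = find_anomalies(amounts)   # cleaned data -> information
print("Cleaned amounts:", amounts)
print("Unusual transactions:", anomalies)
```

The list of unusual transactions is the "information" at the end of the pipeline: something a person at the bank could actually act on.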

What is the difference between Business intelligence and Data Science?

The difference is a bit technical, but at a basic level BI focuses on identifying relationships between variables and on business reporting from a historical view. BI generally tells us what happened during the period in question. Here the data is stored in structured databases and the questions to be asked are already known, so we are simply using the data to answer them.
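To make the BI side concrete, here is a small sketch of a typical reporting query, written in Python with its built-in sqlite3 module. The table, the outlet names and the sales figures are made up for illustration, and an in-memory database stands in for whatever structured database a real company would use.

```python
import sqlite3

# Hypothetical sales records: (month, outlet, amount). Invented for illustration.
sales = [
    ("2023-01", "Harare", 1200.0),
    ("2023-01", "Bulawayo", 800.0),
    ("2023-02", "Harare", 1500.0),
    ("2023-02", "Bulawayo", 700.0),
    ("2023-03", "Harare", 1100.0),
]

# A structured, in-memory database stands in for the company's reporting database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (month TEXT, outlet TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", sales)

# The question is known in advance: how much did we sell in each month?
report = conn.execute(
    "SELECT month, SUM(amount) FROM sales GROUP BY month ORDER BY month"
).fetchall()
conn.close()

for month, total in report:
    print(month, total)
```

Notice that the query only summarises what has already happened. It says nothing about what next month's sales will be.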

Data Science, on the other hand, focuses on predictive analysis. At its core are three properties of the data it deals with, known as the three V’s: Volume, Velocity and Variety. What does this mean? Consider, for example, the new self-driving cars. They have sensors everywhere to avoid collisions with other cars. They need to know, when they get to an intersection, whether to stop or proceed, whether there are robots (traffic lights) or not. The amount of data being processed here is voluminous, it has high velocity because it is real-time data, and it comes in different forms (variety), so it is not really suitable for structured databases. In Data Science we don’t necessarily have the questions up front; we just have the data, and the technologies and methods involved give us new insights (Big Data Analytics).

Other example questions might be:

  • What will be the performance of an outlet placed in a new area?
  • Which customers are going to stop subscribing to our product?
  • Where should we place an ad on a web page so that it gets clicked?
  • When will this part on a machine in the plant fail?
  • Which shares should we buy on the stock market?
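To give a feel for how a predictive question like "when will this part fail?" might be approached, here is a deliberately simple Python sketch. It fits a straight line to some wear readings and extends that line into the future. The readings, the failure threshold and the function name are all hypothetical, and real predictive maintenance would normally rely on much more data and on machine learning methods rather than a single straight line.

```python
# Hypothetical wear readings (in mm) taken from a machine part once a week.
weeks = [1, 2, 3, 4, 5]
wear_mm = [0.5, 1.0, 1.5, 2.0, 2.5]
FAILURE_WEAR_MM = 5.0  # assumed wear level at which the part fails


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(weeks, wear_mm)

# Extend the fitted line until it reaches the failure threshold.
predicted_failure_week = (FAILURE_WEAR_MM - intercept) / slope
print(f"Wear grows by about {slope:.2f} mm per week")
print(f"Predicted failure around week {predicted_failure_week:.1f}")
```

The point is the shift in mindset: instead of reporting how worn the part was last week, we use past data to estimate something that has not happened yet.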

Data Science relies on statistical methods, machine learning and technologies like NoSQL and Hadoop. Generally, for both BI and Data Science you will need:

  • Programming skills
  • Data access skills (SQL databases and other sources)
  • Statistical skills
  • Communication skills

In the next article we will look at the qualifications that you can acquire for your journey.

Author: Edmore Munedzimwe