Over the past two years of college, many friends have told me that they were learning Big Data Analytics and wished to pursue it in the future. At first the term sounded fancy, and people sometimes called it Data Science, which made it seem like a whole new branch of computer science. Over time I learned more about it and eventually had the opportunity to work as a Data Scientist myself, and trust me, it is nothing like what I imagined when I first heard of it. So for everyone wondering what Big Data is and what exactly it means to be a Big Data Analyst, here is my understanding of it. But first, let’s answer a more basic question some of us might have: what does Big Data actually mean?
Big Data refers to the large amount of data we are now capable of extracting from any raw form of data. More precisely, the sudden boom in the amount of data we generate is what people call the Big Data Age.
To understand this, let’s take the example of some object X (remember, X is an object type, like a class). X has two components as an object: 1. attributes and 2. characteristics. A characteristic is a behavioural aspect of X, more like an effect, while an attribute is a quality of X, more like a cause. Hence an X object’s attributes cause its characteristics.
Since X as an object has long been known, extracting data about all its characteristics is possible. However, X as an object would give the user information about its attributes 1, 2, 3 (let’s suppose). X with just its attributes 1, 2, 3 is said to be in its raw form. Before the Big Data Age, we could only assign and extract attributes 1, 2, 3 from X. Now we can assign and extract 1, 2, 3 along with an additional 4, 5, 6, 7, 8 and derivatives of all these 8 attributes, which leads to secondary attributes and so on.
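To make the idea of raw and secondary attributes a little more concrete, here is a minimal Python sketch. The field names (a1, a2, a3) and the two derived attributes are purely hypothetical, picked for illustration; real secondary attributes depend entirely on what X actually is.

```python
from dataclasses import dataclass, field


@dataclass
class X:
    """One object of type X in its raw form: just attributes 1, 2, 3."""
    a1: float
    a2: float
    a3: float
    # Secondary attributes are filled in later, derived from the raw ones.
    secondary: dict = field(default_factory=dict)


def derive_secondary(x: X) -> X:
    """Add hypothetical derived attributes computed from the raw ones."""
    x.secondary["a1_plus_a2"] = x.a1 + x.a2
    x.secondary["a1_times_a3"] = x.a1 * x.a3
    return x


obj = derive_secondary(X(a1=2.0, a2=3.0, a3=4.0))
print(obj.secondary)
```

The raw object carries only its three attributes; everything in `secondary` is computed from them, and further derivatives could be built on top in the same way.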
Note: Throughout the post, the word ‘extracting’ doesn’t mean literal extraction; it simply means that we can find its values.
Now, the job of a Data Analyst: the primary goal of a Big Data Analyst is to get some useful information from these raw X objects, i.e., to understand the relation between the characteristics (effect) and the attributes (cause). It’s pretty obvious that not all X objects are the same, and since we wish to study X as an object, it is important that we try to understand all its variants. So here come our three big tasks:
Gathering data: finding all variants of X. This is a very important step, because sources may not always be straightforward, and extracting these raw forms may not be easy either and can sometimes be painfully time-consuming.
Cleaning and organising it: extracting all the characteristics and attributes, and generating additional secondary attributes, from the various X objects we have gathered. Then we must organise this information so that it is easily readable.
Getting some useful inferences from it: this is the final task, finding the relation between the characteristics and the attributes.
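The three tasks above can be sketched end to end in a few lines of plain Python. Everything here is an assumption made for illustration: the records are made up, the keys "attr" and "char" stand for one attribute and one characteristic, and Pearson correlation is just one simple way of looking for a relation between them (it hints at a relation, it does not prove cause and effect).

```python
from math import sqrt

# Step 1: gathering. Hypothetical raw records; in practice these would come
# from files, APIs, or scraping. "attr" is a cause, "char" is an effect.
raw_records = [
    {"attr": "1.0", "char": "2.1"},
    {"attr": "2.0", "char": "3.9"},
    {"attr": None, "char": "5.0"},  # incomplete record
    {"attr": "3.0", "char": "6.2"},
    {"attr": "4.0", "char": "7.8"},
]


def clean(records):
    """Step 2: drop incomplete records and convert strings to numbers."""
    cleaned = []
    for rec in records:
        if rec["attr"] is None or rec["char"] is None:
            continue
        cleaned.append({"attr": float(rec["attr"]), "char": float(rec["char"])})
    return cleaned


def pearson(xs, ys):
    """Step 3: Pearson correlation between an attribute and a characteristic."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


records = clean(raw_records)
r = pearson([rec["attr"] for rec in records], [rec["char"] for rec in records])
print(len(records), round(r, 3))
```

A value of `r` close to 1 or -1 suggests a strong linear relation between the attribute and the characteristic, while a value near 0 suggests little linear relation.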
A lot can be done with the data we have collected, such as studying data trends over time, building statistical models and visualisations to understand these X objects in detail, and maybe finally building predictive models about these X objects and their attributes/characteristics.
Although we focus a lot on understanding different statistical models in order to apply them, we often forget that the quality of the data we have gathered directly affects how good our models will be. It is therefore important to research and understand what data we are pulling and how it may be useful. Another key factor is the amount of data: although we call it Big Data, that doesn’t always mean we have to collect a large number of attributes to build our models. A small yet smart set of features would, of course, be way better than a large number of less reliable ones.
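As a rough illustration of checking data quality before modelling, the sketch below measures how often each attribute is missing and keeps only the reasonably complete ones. The records, the attribute names, and the 0.25 threshold are all hypothetical, and the missing-value rate is only one of many possible signs of reliability.

```python
def missing_fraction(records, attribute):
    """Fraction of records in which the given attribute is missing (None)."""
    missing = sum(1 for rec in records if rec.get(attribute) is None)
    return missing / len(records)


def reliable_attributes(records, attributes, max_missing=0.25):
    """Keep only attributes whose missing fraction is at most max_missing."""
    return [a for a in attributes if missing_fraction(records, a) <= max_missing]


# Hypothetical gathered records with three attributes of varying quality.
records = [
    {"a1": 1.0, "a2": None, "a3": 0.5},
    {"a1": 2.0, "a2": None, "a3": 0.7},
    {"a1": 3.0, "a2": 4.0, "a3": None},
    {"a1": 4.0, "a2": None, "a3": 0.9},
]

kept = reliable_attributes(records, ["a1", "a2", "a3"])
print(kept)
```

Here the mostly empty attribute is dropped, leaving a smaller set that a model can actually rely on.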
But overall, being a Data Analyst simply boils down to these 3 steps, and how fast and efficiently they can be done is simply yet another coding skill one must learn and develop over time. I hope this gives young minds a good idea of what it means to be a Big Data Analyst.
Cheers!
- Jayakrishna Sahit