My Big Data Journey

I admit it. When I started my Big Data journey, I was overwhelmed, easily and very quickly, by all the new technologies, terminologies and touchpoints I was encountering. Sqoop, Splunk, Striim, Storm, Scribe, HBase, Hive, Hue, Pig and the rest were all Greek and Latin to me. Ahhh!!!

It was only after my third “Introduction to Big Data” meetup that I caught myself nodding, ever so slightly, as I was finally able to distinguish a few buzzwords from all the stop words on those PowerPoint slides.

Hadoop grew out of work that Doug Cutting and Mike Cafarella began around 2005, with Yahoo as its major early backer. At that time it had only two components, HDFS and MapReduce. (See a brief history) Big Data today is just in its adolescent stage, still trying to decide what it wants to be when it grows up. Yet for years, the world has been abuzz about how sexy it is going to be.

Obviously, everyone has realized the criticality of data to the future success of any business. Businesses now focus on data as an important business asset. This situation is very conducive to the coming of age of Big Data, as the technology to generate, capture, store and process vast amounts of data has become easy and dirt cheap.

We are generating a quintillion bytes of data every day. The technology to store and access this data on commodity hardware is very affordable and accessible. So it is no surprise that we have a proliferation of businesses trying to address different aspects of Big Data.

The Big Data Landscape, 2017

The image above is at once daunting and exhilarating. “Thousands of companies will be trying to fill millions of jobs in Big Data in the very near future” does not sound far-fetched. Businesses are looking for Data Scientists, hoping one of them will be their Christopher Columbus, who will wade through their data lakes, explore their dark data and discover new lands they did not even know existed.

Where can I fit in? How does one go about preparing to become a part of this movement? I think it will depend on one’s background and what one loves doing best. The chart below puts certain terms in the right buckets. I think one needs to understand or know most terms in this chart and pick a few to specialize in.

Credit: Swami Chandrasekaran

Interestingly, some technology requirements also vary by zip code (full details in a “beautiful” Jupyter Notebook here).

Another analysis of LinkedIn job postings, by my favorite AI company, Figure Eight, yields this Data Science skills dataset that is interesting to play around with. They have put requirements in four broad buckets: databases, Hadoop technologies, statistical tools and programming languages. One can master one tool from each bucket and be familiar with some others to get a more or less complete picture.

We are fortunate to live in a time when so much data and knowledge is so easily accessible. Universities have opened up their courses to anyone with internet access. You do not need a Master's or Ph.D. to gain access to the best minds in this business. This is both a blessing and a curse. There are so many courses; I am like a kid in a candy shop. What do I do first?

Women in Big Data have put together a curated list of courses taken and recommended by their members. That, I think, will be a good guide. I will continue to attend meetups that demonstrate new technologies and others that deep dive into topics I already know a bit about. That’s my plan. What is yours?
