Women in Big Data Global



WiBD Technical Training Workshop at MapR: Apache Drill & Apache Spark


By Fauzia Chaudhry

May 18, 2016

Neeraja Rentachintala leading an Apache Drill session

Let’s kick off this blog post with a numbers game, shall we? The answer is: 50+ women, who also happen to be data geeks!

Doing what, you’re wondering? That is the number of women who participated in the most recent technical training session organized by the Women in Big Data (WiBD) forum. This all-day technical workshop on Apache Drill and Apache Spark was sponsored by MapR and held at its headquarters in San Jose on May 4, 2016.

In the morning, we learned about the essentials of Apache Drill, including these topics:
• The SQL-on-Hadoop landscape & where Drill fits in
• Introduction to Apache Drill
• How Drill achieves flexibility & performance – Architecture overview

Interactive demos included:
• Using Drill to query files, Hive tables & HBase/MapR-DB
• ANSI SQL functionality (including queries on JSON, Parquet)
• Working with nested data using Drill
• Using BI tools such as Tableau with Drill

In the afternoon, we learned about the essentials of Apache Spark:
Introduction to Apache Spark
• Describe the features of Apache Spark
• Advantages of Spark
• How Spark fits in with the Big Data application stack
• How Spark fits in with Hadoop
• Define Apache Spark components

Load and Inspect Data in Apache Spark
• Describe different ways of getting data into Spark
• Create and use Resilient Distributed Datasets (RDDs)
• Apply transformations to RDDs
• Use actions on RDDs

James Casaletto giving an introduction to Apache Spark

The training session concluded with a demo of how to write a simple Spark application in Java.

Thanks to these speakers from MapR for their time and effort in making this a successful training event, which WiBD members greatly appreciated:
• Neeraja Rentachintala, Sr Director Product Management at MapR – Drill Session (Overview)
• Andries Engelbrecht, Partner Systems Engineer at MapR – Drill Session (BI connection)
• James Casaletto, Senior Curriculum Developer at MapR – Spark Session

What stood out for me after the morning training session on Apache Drill was that, as an ex-DBA, I felt a little loss of the power that typically comes with the role. On the bright side, wearing the Analyst/BI hat, I felt empowered and equipped to be more productive in a short amount of time. I am certain that once the training session was over, a lot of us couldn’t wait to get our hands dirty with the tools that were shared with us. You can access the slides here.

Lunch networking with Ellen Friedman

During the lunch break, Apache committer Ellen Friedman spoke to us about how organizational culture is changing as big data analytics tools evolve. MapR’s HR team also gave us a glimpse of the company’s culture and its hiring opportunities. As at past WiBD meetups, the energy among members was high, and there was ample opportunity to network with each other.

A high-energy group!

Heartfelt thanks to Alicia Alvarez from MapR for being the glue that held everything together for this event on behalf of WiBD. Thanks also to my fellow WiBD training committee members, especially Radhika and Yulia, for all their hard work in making this technical training opportunity possible for WiBD community members. I look forward to working with you on similar future events!