Women in Big Data – East Quarter 1 Newsletter
April 7th, 2017
We are happy to announce that the Women in Big Data East Coast Chapter has a new satellite launched by Rachel Silver in Pittsburgh.
The Pittsburgh satellite chapter of Women in Big Data East boasts ~45 members and is gearing up to hold a welcome event in late spring. Pittsburgh is currently at the center of the robotics and autonomous vehicle boom thanks in part to Google, Uber, Carnegie Mellon University, and Ford Motors.
WiBD is a volunteer-driven organization. It’s amazing to see how many women appreciate and recognize the cause, and are coming forward to devote valuable time to support the forum. WiBD announced the establishment of two new chapters earlier this year.
US Mid-West with Maleeha Qazi from American Family Insurance as chapter lead.
We have a full calendar of activities planned through the end of the year. See below for upcoming events and links to sites with up-to-date information on other events happening in your area.
Spotlight Event: AWS Workshop
When: May 2, 2017
Where: Ballston, VA
Amazon Web Services and Slalom Consulting have partnered to bring a special AWS Big Data Workshop to the Women in Big Data community. The workshop will include presentations and hands-on labs and is a great opportunity to learn about AWS offerings in the Big Data space and how they apply to various use cases. This event is open to all skill levels, though basic SQL knowledge will be helpful. The workshop will be held at the AWS offices in Ballston, VA. For more information, visit our Meetup here and don’t forget to register via the registration link in the description. Seating is limited and is available on a first-come, first-served basis.
To get up-to-date information on our events, please join our LinkedIn Group and the local Meetup in your area.
A Women in Big Data networking and lunch event was held in February at the Spark Summit East 2017 conference. Gunjan Sharma, director of the WiBD East Coast Chapter, gave the welcome remarks, which included an overview of women and the big data landscape. Research suggests that by 2018, big data talent needs will exceed supply by over 50%, and Gunjan highlighted the untapped potential in the talent that women could provide.
Attendees then heard from keynote speaker Kavitha Mariappan, Vice President of Marketing at Databricks. Kavitha’s talk revolved around the “leaky pipeline” problem, a metaphor often used to describe and explain the lack of representation of women in leadership positions in the tech industry. Currently, there is a 50% decline in the number of women represented from entry to executive levels in tech jobs. Kavitha attributed this staggering retention statistic in part to a shortage of female role models and mentoring opportunities. This shortage also results in a lack of understanding of a clear career path and a hesitancy to participate in office politics to climb the corporate ladder. Kavitha also pointed out the general tentativeness of women to bring attention to themselves. Women can be less likely to self-promote or to pursue high-visibility projects or promotions, which could be caused by a perceived lack of skills or experience. Overall, Kavitha advised everyone to combat adversity with diversity. She also encouraged women to be proactive and ask for what they deserve while staying true to themselves and embracing their “inner girl”. Click here to watch Kavitha’s full talk.
Our final discussion for the evening was hosted by Donna Fernandez, COO of MetiStream Inc., on the State of Big Data Innovation, with panelists Nick Dimtchev (IBM), Julie Greenway (Bose), Ziya Ma (Intel), and our very own Gunjan Sharma (Capital One). The panel discussed how Hadoop and Spark technologies have evolved over the past few years and why each has its own place in the industry. They touched upon the challenges faced by new entrants to the industry, both companies and individuals, given the plethora of tools and tracks available in the market. On the question of what advice to give one’s younger self, the panelists unanimously agreed on the importance of a continuous learning mindset and networking! Click here to watch the discussion.
I started my entry-level data engineering role in mid-July and joined Women in Big Data at the end of last year. For the past few months I have been working with Snowplow, an open-source project for constructing an event-based pipeline, and I’m excited to share my experience with it.
Snowplow is an end-to-end event data pipeline platform that collects raw data from an “event”, applies real-time or batch processing, and offers to load the data into various storage options. For clarity, an “event” that generates data can be a click, a scroll, or any other action made on a mobile application, webpage, or other system being tracked. The Snowplow pipeline is broken down into several components, which helps entry-level engineers like me understand how each part is needed for data ingestion. A data ingestion pipeline usually consists of collection, validation, enrichment, and load phases. In my experience, the Snowplow documentation is fairly straightforward. If you are data-driven, or driven to obtain data, just follow the set-up guide and start building!
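To make the four phases concrete, here is a minimal toy sketch in Python. It is not Snowplow's code or API; the function names, the REQUIRED_FIELDS schema, and the in-memory list standing in for storage are all assumptions made up for illustration. It only shows how collection, validation, enrichment, and load might hand data to one another.

```python
import json
from datetime import datetime, timezone

# Hypothetical schema for this sketch; a real pipeline defines its own.
REQUIRED_FIELDS = {"event_type", "user_id", "timestamp"}


def collect(raw_lines):
    """Collection: parse raw JSON payloads as a tracker endpoint might receive them."""
    events = []
    for line in raw_lines:
        try:
            events.append(json.loads(line))
        except json.JSONDecodeError:
            # Keep unparseable input so it can be reported instead of silently lost.
            events.append({"_unparsed": line})
    return events


def validate(events):
    """Validation: split events into those with all required fields and the rest."""
    good, bad = [], []
    for event in events:
        if isinstance(event, dict) and REQUIRED_FIELDS.issubset(event):
            good.append(event)
        else:
            bad.append(event)
    return good, bad


def enrich(events):
    """Enrichment: derive an extra field (here, a UTC calendar date) from each event."""
    enriched = []
    for event in events:
        ts = datetime.fromtimestamp(event["timestamp"], tz=timezone.utc)
        enriched.append({**event, "event_date": ts.date().isoformat()})
    return enriched


def load(events, storage):
    """Load: append events to a storage target (a plain list in this sketch)."""
    storage.extend(events)
    return len(events)


def run_pipeline(raw_lines, storage):
    """Run the four phases in order; return the loaded count and the rejected events."""
    good, bad = validate(collect(raw_lines))
    loaded = load(enrich(good), storage)
    return loaded, bad
```

Calling run_pipeline with a list of raw JSON strings and an empty list loads the valid events into that list and returns the rejected ones separately, so bad data can be inspected rather than dropped.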
Why implement a Snowplow pipeline? When thinking about this question, I kept circling back to a common answer: it depends on your requirements and the skill or experience of your team. If you are a savvy pipeline engineer with specific needs, make your own assessment using the technical documents provided. If you are new to pipeline architectures and related technologies, I would give Snowplow serious consideration. Generally, most pipelines are built by piecing together different technologies. With Snowplow, you can download the pre-built components (jars) and build a functional pipeline to learn about pipeline architecture in general. Once you start tinkering with different components and technologies, it is definitely possible to swap out parts or add new features. As with any open-source project, contributions and improvements to the pipeline are always welcome. So feel free to jump right in and play with the Snowplow pipeline! There is a lot to learn, but it’s worth plowing through.