Women in Big Data Global



Machine Learning with Python and Scikit-learn

By Anindita Bhattacharjee

May 12, 2017


On April 27, 2017, an enthusiastic crowd of sixty-one women turned up as CrowdFlower’s Lukas Biewald conducted a class on Python and Scikit-learn. Sponsored by the Women in Big Data Forum, the class took place at the MapR campus in San Jose, California.

Lukas Biewald presents Machine Learning and AI to the Women in Big Data Forum

Lukas is the founder and Chief Data Scientist of CrowdFlower and an expert in the field of Machine Learning and AI. Machine Learning has become an important part of the Big Data revolution, and Scikit-learn is a popular Python library for Machine Learning, data analysis and data mining.

MapR Technologies is an enterprise software company headquartered in San Jose, California, that provides an Apache Hadoop distribution, a converged data platform and more. Combining real-time analytics with operational applications, its technology runs on both commodity hardware and public cloud computing services.

Lukas skillfully introduced Machine Learning through examples and hands-on exercises. He also touched on Deep Learning, one of the most interesting areas in ML. The class exchanged comments and help through Slack, a chat app. The participants installed Scikit-learn, along with the associated libraries NumPy and pandas, on their Windows or Linux laptops. Through hands-on exercises they learned the basics of Machine Learning and its different categories, including Regression, Classification, Clustering, Anomaly Detection, AI/Robotics and more. They also learned the steps involved in data processing (collection and selection) and in modelling data to make predictions.

For the hands-on exercise, the participants built a real ML classifier to judge emotion about brands and products. This included:

  • Loading data from files.
  • Feature extraction and data filtering.
  • Modelling the data: choosing the right algorithm.
  • Constructing the ML classifier.
  • Making and testing predictions by splitting the data into training and test sets.
  • Cross-validating by dividing the training data into sections.
  • Combining all of the above steps into a pipeline.
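
For readers who were not in the room, here is a minimal sketch of how these steps might fit together in Scikit-learn. It is not the code used in the class. The tiny in-memory dataset, the label names, the choice of CountVectorizer and LogisticRegression, and the helper name build_pipeline are all assumptions made for illustration; the workshop loaded its data from files instead.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import Pipeline

# Hypothetical toy data standing in for the brand/product text loaded
# from files in the class (8 positive and 8 negative examples).
texts = [
    "I love this phone", "great product and great service",
    "this brand is amazing", "really happy with my purchase",
    "best laptop I have owned", "wonderful app, works well",
    "excellent quality, love it", "so pleased with this tablet",
    "I hate this phone", "terrible product and bad service",
    "this brand is awful", "really unhappy with my purchase",
    "worst laptop I have owned", "horrible app, keeps crashing",
    "poor quality, hate it", "so disappointed with this tablet",
]
labels = ["positive"] * 8 + ["negative"] * 8


def build_pipeline():
    """Chain feature extraction and a classifier into one estimator."""
    return Pipeline([
        # Feature extraction: turn each text into word counts.
        ("features", CountVectorizer()),
        # Algorithm choice (an assumption): a simple linear classifier.
        ("classifier", LogisticRegression(max_iter=1000)),
    ])


# Hold out a quarter of the data as a test set.
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, random_state=0, stratify=labels
)

model = build_pipeline()

# Cross-validation: split the training data into 3 sections and score each.
cv_scores = cross_val_score(model, X_train, y_train, cv=3)

# Fit on all training data, then check predictions on the held-out test set.
model.fit(X_train, y_train)
test_accuracy = model.score(X_test, y_test)
predictions = model.predict(["I love this brand"])

print("cross-validation scores:", cv_scores)
print("test accuracy:", test_accuracy)
print("prediction:", predictions[0])
```

Wrapping the vectorizer and the classifier in a single Pipeline means cross-validation refits the feature extraction on each training fold, so no information from the held-out fold leaks into the vocabulary.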

Lukas also explained the basics of Deep Learning and the scenarios where Deep Learning should be used. He introduced TensorFlow and Keras, two popular Deep Learning libraries.

Thank you, Lukas, for your expert guidance through this difficult subject. You made it easy for the participants to understand the material quickly and to come up with interesting questions and views. And thanks to the participants for their obvious interest and enthusiasm.

Women in Big Data thanks MapR for hosting this event and providing women with an opportunity to step into Machine Learning.

Above all, the event was made successful by the hardworking members of our training team at Women in Big Data. We thank Samina Partapurwalla for her efforts and her initiative in setting up the training workshop, with help from Alicia Alvarez of MapR, who was responsible for the logistics at MapR. Thank you both for making this event successful!