Women in Big Data Global



Deep Dive – IoT on Google Cloud Platform


By Srabasti Banerjee

August 27, 2020


On August 19, 2020, WiBD sponsored a discussion on how IoT can be implemented on Google Cloud Platform (GCP). The goal was to help attendees understand the underlying tools and technologies at work in the background, such as Google Cloud IoT Core, pipelines, BigQuery, Dataflow, Machine Learning, Dataprep, and Data Studio.

GCP has a vast variety of tools and technologies that cater to all kinds of use cases: real-time, batch, and streaming.

At a time when the use of digital devices and sensors is exploding, Google offers a distinctive proposition: the ability to track devices and sensors and to collect data and insights from them. GCP makes it possible to scale up or down as the situation requires, without having to worry about infrastructure costs. Keeping pace with new, scientifically advanced sensors as well as traditional ones helps ensure that deployments are uniform and do not impact existing services, while accounting for the security risks and concerns of this new era.

Data gathered from devices and sensors is sent to the cloud using Cloud IoT Edge. Data processing and cleaning can be done at the edge, and Google offers Machine Learning capabilities there through Edge ML.

Data is ingested and processed using Cloud IoT Core, a service managed by Google that combines the MQTT protocol with a high level of security.

Like an assembly line in a manufacturing workshop, pipelines are the key means of extracting data from different sources and formats, performing transformations and aggregations, computing business calculations, and adding business value to the data. Google provides several options for building them: Cloud Functions, Cloud Dataflow, and Apache Beam based pipelines.
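To make the assembly-line idea concrete, here is a minimal sketch in plain Python. It is not Cloud Dataflow or Apache Beam code, and the record fields (device_id, temp_c), the sample messages, and the function names are all hypothetical, chosen only for illustration. Each stage takes records in and hands records on, roughly the way a real pipeline chains extract, transform, and aggregate steps.

```python
import json
from collections import defaultdict

# Hypothetical raw sensor messages as JSON strings (illustrative data only).
RAW_MESSAGES = [
    '{"device_id": "pump-1", "temp_c": 70.0}',
    '{"device_id": "pump-1", "temp_c": 74.0}',
    '{"device_id": "pump-2", "temp_c": 55.0}',
    'not valid json',
]


def extract(lines):
    """Parse each line as JSON, skipping lines that cannot be parsed."""
    for line in lines:
        try:
            yield json.loads(line)
        except json.JSONDecodeError:
            continue


def transform(records):
    """Add a Fahrenheit reading to every record."""
    for record in records:
        yield {**record, "temp_f": record["temp_c"] * 9 / 5 + 32}


def aggregate(records):
    """Compute the average Celsius temperature per device."""
    readings = defaultdict(list)
    for record in records:
        readings[record["device_id"]].append(record["temp_c"])
    return {device: sum(vals) / len(vals) for device, vals in readings.items()}


def run_pipeline(lines):
    """Chain the stages: extract -> transform -> aggregate."""
    return aggregate(transform(extract(lines)))


if __name__ == "__main__":
    print(run_pipeline(RAW_MESSAGES))
```

Because the stages are generators, records flow through one at a time instead of being loaded all at once, which is the same reason managed pipeline services can handle streams as well as batches.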

Using Cloud Functions, one can control sensors and devices at the edge, helping to prevent accidents and losses. This applies across domains and industries such as healthcare, manufacturing, and retail.

As an example, a power plant shutdown can be avoided if we can predict when a vital piece of machinery in the plant is about to fail. Replacing it before the end of its lifecycle avoids disruption to the power supply from equipment breakdown and saves the company from incurring losses.
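One very simple way to frame that prediction is to fit a straight line to recent sensor readings and estimate when the trend will reach a failure limit. The sketch below is plain Python under that assumption. Real predictive maintenance typically relies on richer models, and the function names, readings, and threshold used here are made up for illustration.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points.

    Needs at least two distinct x values; otherwise the slope is undefined.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def hours_until_threshold(hours, readings, threshold):
    """Estimate hours after the last reading until the trend reaches threshold.

    Returns None when the trend is flat or falling, and 0.0 when the fitted
    line has already reached the threshold.
    """
    slope, intercept = fit_line(hours, readings)
    if slope <= 0:
        return None
    crossing_hour = (threshold - intercept) / slope
    return max(0.0, crossing_hour - hours[-1])


if __name__ == "__main__":
    # Hypothetical vibration readings taken once an hour.
    hours = [0, 1, 2, 3, 4]
    vibration = [2.0, 2.5, 3.0, 3.5, 4.0]
    print(hours_until_threshold(hours, vibration, threshold=6.0))
```

An estimate like this could drive a maintenance alert: if the remaining time drops below the lead time needed to order and fit a replacement part, the part gets scheduled for replacement.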

Cloud Pub/Sub is a scalable messaging service that offers in-order message delivery with push and pull modes. Publishers send messages and subscribers receive them, while synchronous, cross-zone message replication and per-message receipt tracking ensure reliable delivery at any scale. One can use it for batch and streaming scenarios and for use cases such as stream analytics.
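The publish, pull, and acknowledge flow can be sketched with a toy in-memory topic. This is plain Python written for illustration, not the Cloud Pub/Sub client library, and the class and method names are hypothetical. It shows why per-message receipt tracking matters: a message that is pulled but never acknowledged gets delivered again.

```python
from collections import deque


class ToyTopic:
    """In-memory stand-in for a topic with a single pull subscription."""

    def __init__(self):
        self._queue = deque()      # messages waiting to be delivered
        self._outstanding = {}     # delivered but not yet acknowledged
        self._next_id = 0

    def publish(self, data):
        """Store a message and return its id."""
        message_id = self._next_id
        self._next_id += 1
        self._queue.append((message_id, data))
        return message_id

    def pull(self, max_messages=10):
        """Deliver up to max_messages, oldest first, and track them as unacked."""
        batch = []
        while self._queue and len(batch) < max_messages:
            message_id, data = self._queue.popleft()
            self._outstanding[message_id] = data
            batch.append((message_id, data))
        return batch

    def ack(self, message_id):
        """Confirm receipt so the message is not delivered again."""
        self._outstanding.pop(message_id, None)

    def redeliver_unacked(self):
        """Put unacknowledged messages back at the front, in original order."""
        for message_id in sorted(self._outstanding, reverse=True):
            self._queue.appendleft((message_id, self._outstanding[message_id]))
        self._outstanding.clear()


if __name__ == "__main__":
    topic = ToyTopic()
    for reading in ("a", "b", "c"):
        topic.publish(reading)
    first_batch = topic.pull(max_messages=2)
    topic.ack(first_batch[0][0])   # acknowledge only the first message
    topic.redeliver_unacked()      # the second message becomes available again
    print(topic.pull())
```

In the real service, redelivery is triggered when an acknowledgement deadline passes. Here it is an explicit method call to keep the sketch short.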

Cloud Dataflow provides the flexibility to create pipelines from the different Dataflow templates offered by Google. The underlying technology is Apache Beam, which was initially developed and incubated by Google and eventually open sourced for others to use. One can also develop custom templates for data ingestion and processing.

Data can be stored in Google Cloud Storage in diverse formats such as JSON, ORC, Avro, and Parquet. GCP offers several kinds of data storage options to choose from, depending on usage. It is very important to pick the one that works best for the technical environment and business functionality, to avoid performance issues.

BigQuery is the data warehouse on GCP, providing a place for Business Intelligence and Analytics to be built on top of the data. Hadoop and Spark workloads can be run using Dataproc, and workflows can be scheduled using open source technologies like Airflow. Airflow offers many operators that can be customized for different situations, for both batch and streaming processes.

Machine Learning can also be applied using AutoML Tables and BigQuery ML. Google offers the flexibility to use custom models as well as models based on recommendations from AutoML, a fully automated ML recommendation system. As an alternative to hiring a trained data scientist, startups and small businesses can use AutoML and BigQuery ML to derive value from their data.

Cloud Dataprep is an intelligent cloud data service from Trifacta that lets users visually explore, clean, and prepare data for analysis and Machine Learning. It shows visualizations and data distributions, and uses Machine Learning to recommend ways to clean data efficiently.

Just as we prepare and cook food to suit one’s palate, one can use recipes in Cloud Dataprep to extract, transform, and load data into datasets and tables in Google BigQuery.
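The recipe idea, an ordered list of cleaning steps applied to every row, can be illustrated in a few lines of plain Python. This does not use Cloud Dataprep or its recipe language; the step names, column names, and sample rows are hypothetical.

```python
def trim_strings(row):
    """Strip surrounding whitespace from every string value."""
    return {k: v.strip() if isinstance(v, str) else v for k, v in row.items()}


def drop_if_missing_reading(row):
    """Discard rows that have no reading by returning None."""
    return row if row.get("reading") not in (None, "") else None


def reading_to_float(row):
    """Convert the reading column from text to a number."""
    return {**row, "reading": float(row["reading"])}


# A recipe is simply the steps in the order they should run.
RECIPE = [trim_strings, drop_if_missing_reading, reading_to_float]


def apply_recipe(rows, recipe):
    """Run each row through every step; a step returning None drops the row."""
    cleaned = []
    for row in rows:
        for step in recipe:
            row = step(row)
            if row is None:
                break
        else:
            # Reached only when no step dropped the row.
            cleaned.append(row)
    return cleaned


if __name__ == "__main__":
    raw_rows = [
        {"sensor": " s1 ", "reading": " 21.5 "},
        {"sensor": "s2", "reading": ""},
        {"sensor": "s3", "reading": None},
    ]
    print(apply_recipe(raw_rows, RECIPE))
```

Order matters, just as in cooking: trimming runs first so that a reading made only of spaces is recognized as missing before the numeric conversion is attempted.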

Google Data Studio is a data reporting and visualization tool that helps create insightful dashboards and share them with a click. For example, a Sales or Marketing manager can develop reports and send them to different customers for tracking metrics on dashboards. Data Studio integrates with different data sources through connectors.

Thanks to Coursera for allowing everyone to take free courses during the COVID pandemic! During these tough times in the current job market, it helps job seekers and professionals to stay updated with current trends and learn new technologies.

A big shout out to Google for making such wonderful hands-on courses as Coursera Learn IIoT on GCP! Concise, bite-sized courses like this help professionals stay up to date. The practical focus, and the chance to apply it to use cases in the different tutorials, gives a better understanding of the different tools and technologies of the Google Cloud Platform.

Last but not least, I would like to thank Women in Big Data for giving me an opportunity to share my knowledge and experience on Google Cloud Platform. I would like to thank my mentors Radhika Rangarajan, Tina Tang, and Shala Arshi for their guidance all the way here. I also appreciate Shuchi Rana, Regina Karson, and Stella Mashkevich for their help in getting the ball rolling to share my knowledge in GCP and Big Data with everyone.

I would like to acknowledge and appreciate the WiBD Executive team for guiding us, as well as the volunteers and all the different chapters around the globe for helping with our mission to inspire, connect, grow and champion the success of Women in Big Data! We have been able to reach a wider audience and spread the word with everyone’s help and support.

Thanks to these training resources, many people are able to increase their knowledge and get a taste of different areas and technologies, helping them keep up with current trends!

A recording of the session is available at Deep Dive – IoT on Google Cloud Platform.
