Women in Big Data Global



From Data to Decisions: Deep Dive into Workshop Learnings

By Surekha Reddy

April 17, 2024

A Big Thank You to Professor Gudigantala!

A special shoutout to our esteemed workshop leader, Professor Naveen Gudigantala. His expertise in Artificial Intelligence and Machine Learning and engaging teaching style made the workshop an enriching experience. His ability to bridge the gap between complex topics and practical applications was truly invaluable.

We would also like to express our sincere gratitude to the University of Portland, particularly the Pamplin School of Business, for their collaboration in hosting the workshop. Their willingness to share their expertise and provide a well-equipped on-campus room for the workshop was instrumental in its success.

Want to revisit the learnings or share them with someone who missed out?

Check out this link for the video recording of the workshop

Check out the link for workshop notes.

The “From Data to Decisions” workshop provided a fantastic foundation for understanding how statistics bridges the gap to powerful machine learning applications. Let’s revisit the workshop’s key learning objectives and take a closer look at the concepts we unpacked.

Learning Objectives Recap:

  • Paradigms in Data Science: We explored the two main paradigms in data science: explanatory modeling (a deductive approach that starts from theory and tests it against data) and predictive modeling (an inductive approach that starts from data and builds models from it).
  • A/B Testing Fundamentals: We learned the essential aspects of A/B testing, an experiment where two variations of a process are compared to determine which performs better.
  • The Central Limit Theorem Demystified: We tackled this fundamental concept, which states that under certain conditions, the sampling distribution of the mean approaches a normal distribution regardless of the underlying random variable’s distribution. This allows us to make inferences about populations/processes based on samples.
  • Hypothesis Testing in Action: We learned how to formulate a null hypothesis (no difference exists) and an alternative hypothesis (a difference exists) and how to use statistical tests for hypothesis testing.
  • Exploratory Data Analysis (EDA): We unpacked the importance of EDA, the process of uncovering patterns and relationships within your data. Specifically, we learned about the types of data and looked at the appropriate charts for univariate and bivariate analysis.
  • Evaluating Models: We discussed how to evaluate explanatory models (those that explain relationships) and predictive machine learning models (those that forecast future events) using appropriate metrics.


Concepts Revisited:

1. The “Science” in Data Science:

The “science” part of data science refers to the use of the scientific method to extract knowledge and insights from data. Think of it as a detective story! You gather data (evidence), analyze it to identify patterns (investigate), and use these patterns to draw conclusions and make predictions (solve the case). For example, analyzing customer purchase history data can help predict future buying trends. Alternatively, you can start with a theory, formulate hypotheses, gather data, and test the theory against that data.

2. Machine Learning Models: Building Blocks of Prediction

A model in machine learning is an abstraction of real-world phenomena. Models are developed and tuned using historical data to make predictions about future events.

While some models are transparent and easy to understand, others may act like black boxes, taking inputs and producing outputs. Their purpose is to learn the underlying relationships in data and use them to make accurate predictions on new, unseen data.
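To make the idea of "learning from historical data" concrete, here is a minimal sketch of one of the most transparent models there is: a straight line fitted by ordinary least squares. The ad-spend and sales figures, and the helper names `fit_line` and `predict`, are hypothetical and chosen only for illustration; they are not taken from the workshop materials.

```python
from statistics import fmean


def fit_line(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares."""
    mean_x, mean_y = fmean(xs), fmean(ys)
    covariation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variation_x = sum((x - mean_x) ** 2 for x in xs)
    slope = covariation / variation_x
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical historical data: ad spend (in thousands) and units sold.
ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]
units_sold = [12.0, 19.0, 31.0, 38.0, 50.0]

# "Training": learn the relationship from historical data.
intercept, slope = fit_line(ad_spend, units_sold)


def predict(spend):
    """Apply the learned relationship to a new, unseen input."""
    return intercept + slope * spend


print(f"learned relationship: units = {intercept:.1f} + {slope:.1f} * spend")
print("prediction for an unseen spend of 6.0:", round(predict(6.0), 1))
```

A model like this sits at the transparent end of the spectrum, because the learned slope can be read directly. More flexible models trade some of that readability for predictive power.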

3. A/B Testing: Putting Ideas to the Test

An A/B experiment is a controlled experiment where two versions of a process (A and B) are compared to see which one performs better. This helps us make data-driven decisions about website design, marketing campaigns, and more.

4. Basic Statistical Tools:

  • Random Variable: A numerical outcome of a random experiment, such as the result of rolling a die or the age of a randomly selected customer in a dataset.
  • Probability Distribution: Describes the probabilities of different outcomes for a random variable.
  • Sampling Distribution of the Mean: The distribution of means obtained from repeated samples of a specific size drawn from a population.
  • The Central Limit Theorem: This theorem states that, under certain conditions, the sampling distribution of the mean approaches a normal distribution as the sample size grows, regardless of the underlying random variable’s distribution. This allows for inferences about populations/processes based on samples.
  • Hypothesis Testing:
      • Null and Alternative Hypotheses: The null hypothesis (H0) proposes that no difference exists, while the alternative hypothesis (Ha) proposes that a difference exists.
      • Sampling Distribution of the Test Statistic: The distribution of a test statistic calculated from repeated samples.
      • Critical Region: The area in the tail(s) of the sampling distribution where the test statistic is considered statistically significant.
      • P-value: The probability of observing a test statistic as extreme as, or more extreme than, the one obtained, assuming the null hypothesis is true. A low p-value (compared to alpha, the level of significance) provides evidence for rejecting the null hypothesis.
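The Central Limit Theorem is easier to believe once you watch it happen. The sketch below is a small simulation, not something from the workshop notes: it assumes a strongly skewed population (an exponential distribution with mean 1), draws many samples of size 50, and looks at how the sample means behave. The sample size, number of samples, and seed are arbitrary choices.

```python
import random
import statistics


def sample_means(draw, sample_size, num_samples, rng):
    """Return the means of num_samples independent samples of size sample_size."""
    return [
        statistics.fmean(draw(rng) for _ in range(sample_size))
        for _ in range(num_samples)
    ]


rng = random.Random(42)  # fixed seed so the run is repeatable


def draw_exponential(r):
    """One draw from a right-skewed population with mean 1 and std dev 1."""
    return r.expovariate(1.0)


means = sample_means(draw_exponential, sample_size=50, num_samples=5000, rng=rng)

center = statistics.fmean(means)
spread = statistics.stdev(means)
print("mean of the sample means:", round(center, 3))
print("std dev of the sample means:", round(spread, 3))
print("theory (population std dev / sqrt(n)):", round(1 / 50 ** 0.5, 3))

# For a normal distribution, about 95% of values fall within
# 1.96 standard deviations of the mean.
share_within = sum(abs(m - center) <= 1.96 * spread for m in means) / len(means)
print("share of sample means within 1.96 std devs:", round(share_within, 3))
```

Even though individual draws are far from bell-shaped, the sample means cluster around the population mean, with a spread close to what the theorem predicts.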

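Statistical Significance: A result is statistically significant when it would be unlikely to arise from chance (random sampling variation) alone if the null hypothesis were true, which points to a true effect. In practice, this means the p-value falls below the chosen level of significance (alpha).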
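Here is how these pieces can fit together for an A/B experiment. This is a sketch of a two-sided, two-proportion z-test, which is one common choice for comparing conversion rates; the post does not say which test the workshop used. The visitor and conversion counts are made up, and `two_proportion_z_test` is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(conversions_a, visitors_a, conversions_b, visitors_b):
    """Two-sided z-test of H0: versions A and B have the same conversion rate."""
    rate_a = conversions_a / visitors_a
    rate_b = conversions_b / visitors_b
    # Under H0 both groups share one rate, so pool the data to estimate it.
    pooled = (conversions_a + conversions_b) / (visitors_a + visitors_b)
    std_error = sqrt(pooled * (1 - pooled) * (1 / visitors_a + 1 / visitors_b))
    z = (rate_b - rate_a) / std_error
    # Probability of a statistic at least this extreme in either tail, assuming H0.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


alpha = 0.05  # level of significance, chosen before looking at the data

# Hypothetical experiment: 200 of 2000 visitors convert on A, 250 of 2000 on B.
z, p_value = two_proportion_z_test(200, 2000, 250, 2000)

# Boundary of the two-tailed critical region for this alpha.
critical_value = NormalDist().inv_cdf(1 - alpha / 2)

print(f"z = {z:.2f}, p-value = {p_value:.4f}")
print(f"critical region: |z| > {critical_value:.2f}")
print("reject H0" if p_value < alpha else "fail to reject H0")
```

Comparing the p-value to alpha and checking whether z falls in the critical region are two views of the same decision, so they always agree.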

5 & 6. Data Exploration: Unveiling the Story Within

The workshop equipped you with skills to analyze sample A/B experiment data and perform exploratory data analysis (EDA).

EDA involves techniques like:

  • Identifying different types of variables (categorical, numerical).
  • Using graphs like histograms and scatterplots for univariate and bivariate analysis, respectively, to visualize data patterns.
  • Understanding that correlation (a relationship between variables) does not imply causation (one variable causing another).
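The techniques above can be sketched in a few lines of plain Python. The records below are invented for illustration (they are not the workshop's dataset), and the text histogram stands in for the charts you would normally draw with a plotting library.

```python
from collections import Counter
from math import sqrt
from statistics import fmean, median, stdev

# Hypothetical records: (version shown, minutes on site, amount spent)
records = [
    ("A", 2.1, 10.0), ("A", 3.4, 14.5), ("A", 1.2, 6.0), ("A", 4.8, 21.0),
    ("B", 2.9, 13.0), ("B", 5.5, 24.5), ("B", 3.8, 17.0), ("B", 6.1, 27.5),
]
versions = [rec[0] for rec in records]  # categorical variable
minutes = [rec[1] for rec in records]   # numerical variable
spend = [rec[2] for rec in records]     # numerical variable

# Univariate analysis: counts for categorical data, summaries for numerical data.
print("version counts:", dict(Counter(versions)))
print("minutes: mean", round(fmean(minutes), 2),
      "median", median(minutes),
      "std dev", round(stdev(minutes), 2))

# A crude text histogram of minutes, using bins that are 2 minutes wide.
bins = Counter(int(m // 2) * 2 for m in minutes)
for start in sorted(bins):
    print(f"{start}-{start + 2} min: " + "#" * bins[start])


def pearson(xs, ys):
    """Pearson correlation coefficient between two numerical variables."""
    mean_x, mean_y = fmean(xs), fmean(ys)
    covariation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variation = sum((x - mean_x) ** 2 for x in xs) * sum((y - mean_y) ** 2 for y in ys)
    return covariation / sqrt(variation)


# Bivariate analysis: how strongly do minutes and spend move together?
r = pearson(minutes, spend)
print("correlation(minutes, spend):", round(r, 3))
```

A high correlation here only says that the two variables move together in this sample. It does not show that spending more time on the site causes people to spend more money.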

7. Metrics for Model Evaluation

The workshop discussed how to evaluate different types of models using appropriate metrics.
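This recap does not list the specific metrics covered, so the sketch below is an assumption about what "appropriate metrics" might look like in practice: R-squared is a common way to judge how well an explanatory regression model accounts for variation in the outcome, while accuracy, precision, and recall are common for predictive classification models. All values are invented, and the helper names are hypothetical.

```python
def r_squared(actual, predicted):
    """Share of the variation in `actual` that the model's predictions account for."""
    mean_actual = sum(actual) / len(actual)
    ss_residual = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_total = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_residual / ss_total


def classification_metrics(actual, predicted):
    """Accuracy, precision, and recall for 0/1 labels.

    Assumes at least one predicted positive and at least one actual positive.
    """
    pairs = list(zip(actual, predicted))
    true_pos = sum(a == 1 and p == 1 for a, p in pairs)
    false_pos = sum(a == 0 and p == 1 for a, p in pairs)
    false_neg = sum(a == 1 and p == 0 for a, p in pairs)
    accuracy = sum(a == p for a, p in pairs) / len(pairs)
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return accuracy, precision, recall


# Hypothetical regression output (explanatory model).
fit = r_squared([3.0, 5.0, 7.0, 9.0], [2.5, 5.5, 7.0, 9.0])
print("R-squared:", round(fit, 3))

# Hypothetical classifier output on held-out data (predictive model).
accuracy, precision, recall = classification_metrics(
    [1, 0, 1, 1, 0, 0, 1, 0],
    [1, 0, 0, 1, 0, 1, 1, 1],
)
print("accuracy:", accuracy, "precision:", precision, "recall:", recall)
```

For predictive models, these metrics are most informative when computed on data the model has not seen during training.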
