Women in Big Data Global



NLP, Tools and Technologies and Career Opportunities


By Rupa Gangatirkar

December 13, 2023


The Bay Area Chapter of Women in Big Data (WiBD) hosted its second successful episode on NLP (Natural Language Processing): Tools, Technologies and Career Opportunities. The event was part of the chapter’s 2023 Technical Talk Series.

The Technical Talk Series focuses on technical skills: raising awareness of a technical topic, sharing knowledge, and suggesting ways to learn or enhance the required skills, thus linking them to career development.

We are hearing a lot about NLP, LLMs, ChatGPT and Generative AI! On December 5th, 2023, Dr Sonal Khosla took us on a journey from where it all began to the most recent Generative AI. The goal of the talk was to learn the basics of NLP (Natural Language Processing), how NLP is done, what an LLM (Large Language Model) is, what Generative AI is, and how you can build your career around it.

Dr Sonal Khosla (Speaker) holds a PhD in Computer Science with a specialization in Natural Language Processing from Symbiosis International University, India, and has publications in peer-reviewed indexed journals. With a robust educational foundation in Computer Science, Mathematics and Statistics, she brings over 12 years of expertise across research, academia and industry. Currently based in Germany, she has extensive experience developing data-intensive applications that leverage NLP, data science and data analytics. Her proficiency extends to domains such as supply chain management, mobile technology and EdTech. She currently works as a Researcher at OdiaGenAI, which seeks to harness AI to develop Gen AI and LLM-based technologies and solutions for low-resource Indic languages.

Natural Language Processing (NLP) is a branch of Artificial Intelligence (AI) that helps computers understand, interpret and manipulate human language. Computational linguistics, a closely related field, is the rule-based modeling of natural language. NLP combines linguistics and computer science to decipher the structure and rules of language, so that computers can comprehend text and speech, break it down, and extract its significant details. It automates translation between humans and computers by processing unstructured data (words) in the context of a specific task (for example, a conversation).

Benefits of NLP


NLP has many applications: Machine Translation, Text Summarization, Search, Question Answering, Named-Entity Recognition, Part-of-Speech (POS) Tagging, Clustering, Sentiment Analysis, Text Classification, Chatbots and Virtual Assistants.

Process of NLP:

What is a language model, and what are Large Language Models (LLMs)? A language model is a probability distribution over sequences of words. Given any sequence of m words, a language model assigns a probability P(w1, …, wm) to the whole sequence. Language models learn these probabilities by training on text corpora in one or many languages. Intuitively, the probability tells us how “good” (how plausible) a sequence of tokens is. There are two types of language models: the Statistical Language Model (which uses probability distributions estimated from the corpus to predict the next word) and the Neural Language Model (which uses a neural network to predict the next word).
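To make the statistical case concrete, the chain rule lets P(w1, …, wm) be approximated as the product of P(wi | wi−1) over the sentence, which is what a bigram model does. The sketch below is a minimal illustration, assuming whitespace tokenization, a tiny made-up corpus and add-one (Laplace) smoothing; the function names are hypothetical and not taken from the talk.

```python
import math
from collections import Counter

BOS, EOS = "<s>", "</s>"  # sentence boundary markers

def train_bigram_model(corpus):
    """Count bigrams and their contexts over a list of whitespace-tokenized sentences."""
    context_counts, bigram_counts, vocab = Counter(), Counter(), set()
    for sentence in corpus:
        tokens = [BOS] + sentence.lower().split() + [EOS]
        context_counts.update(tokens[:-1])                  # every token that precedes another
        bigram_counts.update(zip(tokens[:-1], tokens[1:]))  # (previous word, next word) pairs
        vocab.update(tokens[1:])                            # words that can be predicted, incl. </s>
    return context_counts, bigram_counts, vocab

def sentence_probability(sentence, model, alpha=1.0):
    """Approximate P(w1, ..., wm) as the product of P(w_i | w_{i-1}),
    using add-alpha (Laplace) smoothing so unseen bigrams are not zero."""
    context_counts, bigram_counts, vocab = model
    tokens = [BOS] + sentence.lower().split() + [EOS]
    log_p = 0.0
    for prev, word in zip(tokens[:-1], tokens[1:]):
        p = (bigram_counts[(prev, word)] + alpha) / (context_counts[prev] + alpha * len(vocab))
        log_p += math.log(p)  # sum logs to avoid underflow on long sentences
    return math.exp(log_p)

# Usage on a tiny made-up corpus
model = train_bigram_model(["the cat sat", "the dog sat", "the cat ran"])
print(sentence_probability("the cat sat", model))  # plausible word order
print(sentence_probability("sat the cat", model))  # implausible word order, much lower
```

The plausible word order receives a noticeably higher probability, which is the “how good is this sequence” intuition in numeric form. A neural language model replaces the count table with a network that predicts the next word.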

A Large Language Model (LLM) is a language model consisting of a neural network with many parameters (typically billions of weights or more), trained on large quantities of unlabeled text using self-supervised or semi-supervised learning. LLMs are built on the Transformer architecture.
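The central operation of the Transformer architecture is scaled dot-product attention, softmax(QKᵀ / √d_k)V, where each token's query is compared with every token's key to decide how much of each value to mix in. The following is a minimal pure-Python sketch of a single attention head, assuming no masking, no learned projection matrices and no multi-head logic; vectors are plain lists for readability, whereas real LLMs run this with tensor libraries on GPUs.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def scaled_dot_product_attention(queries, keys, values):
    """Compute softmax(Q K^T / sqrt(d_k)) V, with each matrix given as a list of row vectors."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k)
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)  # attention weights sum to 1
        # Weighted average of the value vectors
        outputs.append([sum(w * v[j] for w, v in zip(weights, values))
                        for j in range(len(values[0]))])
    return outputs

# Usage: three token positions with 2-dimensional vectors (made-up numbers)
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(scaled_dot_product_attention(x, x, x))  # self-attention: Q, K and V from the same tokens
```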

Sonal talked about applications of LLMs such as Content generation, Part-of-speech (POS) tagging, Question answering, Text summarization, Sentiment analysis, Conversational AI, Machine translation and Code completion.

Image Credit: Neebal Technologies

LLMs are not free from challenges, but there are ways to use them that minimize their shortcomings. Fine-tuning a pre-trained LLM is widely used instead of building an LLM from scratch. In addition, by writing intelligent, purpose-built prompts (the questions or instructions given to the model), you can achieve the desired performance and accuracy from an LLM.
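A purpose-built prompt usually spells out the model's role, the task, and a few worked examples before the actual input. The sketch below assembles such a prompt as plain text; the layout (role, task, input/output examples, then the new input) is one common convention assumed here, not a format prescribed in the talk, and `build_prompt` is a hypothetical helper. Sending the prompt to a real LLM is left out because that depends on the provider's API.

```python
def build_prompt(task, examples, query, role="You are a helpful assistant."):
    """Assemble a purpose-built prompt: a role, a task instruction,
    a few worked input/output examples, and finally the new input."""
    lines = [role, "", f"Task: {task}", ""]
    for text, label in examples:
        lines += [f"Input: {text}", f"Output: {label}", ""]
    lines += [f"Input: {query}", "Output:"]  # the model is expected to continue from here
    return "\n".join(lines)

prompt = build_prompt(
    task="Classify the sentiment of the review as positive or negative.",
    examples=[("The talk was clear and inspiring.", "positive"),
              ("The audio kept cutting out.", "negative")],
    query="I learned a lot about LLMs.",
)
print(prompt)
```

Ending the prompt with an open `Output:` line nudges the model to answer in the same short format as the examples.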

Reinforcement Learning from Human Feedback (RLHF) is a popular technique for fine-tuning LLMs and addressing some of their shortcomings, such as generating incorrect or biased answers.
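In RLHF, humans compare pairs of model answers, and a reward model is commonly trained so that the preferred answer scores higher, using the pairwise loss −log σ(r_chosen − r_rejected). The LLM is then optimized against that reward (for example with PPO). This sketch covers only the reward-model loss, with made-up reward scores; the function names are hypothetical.

```python
import math

def pairwise_preference_loss(reward_chosen, reward_rejected):
    """-log(sigmoid(r_chosen - r_rejected)): small when the reward model
    scores the human-preferred answer higher than the rejected one."""
    margin = reward_chosen - reward_rejected
    # Two algebraically equal forms of log(1 + exp(-margin)), chosen to avoid overflow
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))

def batch_loss(pairs):
    """Average loss over (reward_chosen, reward_rejected) pairs from human comparisons."""
    return sum(pairwise_preference_loss(c, r) for c, r in pairs) / len(pairs)

# Made-up reward scores for three human-labelled comparisons
print(batch_loss([(2.0, -1.0), (0.5, 0.0), (-0.3, 0.4)]))
```

The loss is large when the reward model prefers the rejected answer, so minimizing it pushes the model toward the human ranking.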

Retrieval-Augmented Generation (RAG) is another technique, or framework, for building LLM-powered applications that retrieve information from external data sources. It addresses some of the limitations of LLMs, such as knowledge limited to their training data, and improves the relevance and accuracy of the completion (the output of the LLM).
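The retrieve-then-generate flow can be sketched in a few lines. Production RAG systems typically use learned embeddings and a vector database; to keep this example self-contained, bag-of-words cosine similarity is swapped in as the retriever, and the final LLM call is omitted. The documents and helper names are hypothetical.

```python
import math
import re
from collections import Counter

def vectorize(text):
    """Bag-of-words term counts (a stand-in for a learned embedding)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())  # missing terms count as 0
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query, documents, k=1):
    """Return the k documents most similar to the query."""
    q = vectorize(query)
    return sorted(documents, key=lambda d: cosine(q, vectorize(d)), reverse=True)[:k]

def build_rag_prompt(query, documents, k=1):
    """Augment the question with retrieved context before it is sent to an LLM."""
    context = "\n".join(f"- {doc}" for doc in retrieve(query, documents, k))
    return (f"Answer using only the context below.\n\nContext:\n{context}\n\n"
            f"Question: {query}\nAnswer:")

docs = ["The Eiffel Tower is in Paris.",
        "Python is a programming language.",
        "Transformers rely on self-attention."]
print(build_rag_prompt("Where is the Eiffel Tower?", docs))
```

Because the answer-bearing passage is placed directly in the prompt, the model can ground its completion in it rather than relying only on what it memorized during training.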

Bias, explainability and privacy are the major ethical issues in AI, and with these issues come challenges. Sonal discussed the main challenges of NLP: ambiguity, context understanding, data quality, bias and fairness, multilingual support, handling of sensitive data, and real-world adaptability.

What is the future of NLP? Multimodal NLP, few-shot learning, explainability, ethical AI, support for more languages, industry applications and integration with robotics are some key areas where NLP-based applications will expand and dominate.

There are various career paths and roles in NLP and Generative AI today: NLP Scientist, Linguist, NLP Data Scientist, NLP ML Engineer, Conversational AI Developer, NLP Consultant and Prompt Engineer, to name a few. Similarly, there are various training options for upskilling in these areas. Coursera, Udemy, DeepLearning.AI, Udacity, DataCamp, Stanford University and many other providers offer online and on-campus courses, degrees and certification programs to learn and gain certification in NLP, AI and Generative AI.

Attendees agreed that the talk was highly informative. One attendee shared: “I found it to be a very informative high level overview on how AI/neural networks/LLMs work. AI is transforming the tech industry and I’m excited to learn more about this industry as a whole. In particular I know that how we collect, manage, and clean data to be consumed by these systems can greatly impact the overall success of these systems. I look forward to attending future events hosted by WiBD”. The training resources Dr Sonal shared were also very helpful.

The Bay Area Chapter’s Co-Director, Soumyasree Vinod, supported the technical talk and provided valuable input for hosting it. Soumya introduced the Women in Big Data organization and its mission to the audience. She connected WiBD’s goals of Connect, Cultivate, Champion with how we achieve them: through various programs, trainings, presentations, mentoring, a new hackathon initiative, and a platform for our members to connect, collaborate, learn and grow.

Bay Area Chapter core team member Rupa Gangatirkar arranged the NLP tech talk session.

Thank you to the Bay Area Chapter’s Co-Director, Erika Luncford, for helping promote the event. We also thank all the core team members of the Bay Area Chapter, and WiBD Executive Board member Shala Arshi for her support.

Listen to the event recording here, and join Women in Big Data for companionship and support on your NLP journey.
