Perceptyx uses natural language processing (NLP) approaches for theme detection, intent detection, and sentiment analysis, all of which contribute to our comment analytics capabilities. These capabilities can help you uncover insights that support better decisions across your organization.
This article walks through:
- What Does Comment Analysis Do?
- Our Approach to Comment Analysis
- Motivation and Industry Comparisons
- Default Themes
- Lexical Theme Sets
- Intents Set
What Does Comment Analysis Do?
Open-ended, verbatim comments provide a rich source of information. They reflect the direct, unfiltered voice of the respondent. Reviewing comment data has traditionally been a manual process in which individuals had to read through all comments to gain an understanding of what respondents were trying to express. While this approach is time-consuming for managers, it is even more so for HR and people analytics leaders, who face the task of conducting organization-wide analysis and reporting. This is where AI tools come in, turning an otherwise impossible task into an easy one.
Theme detection, intent detection, and sentiment analysis together address a critical issue facing our customers: how to understand what issues employees are talking about, zoom in on the most critical ones, and determine appropriate action items, all without having to read each and every comment.
Perceptyx’s comment analysis feature provides a way for users to parse through the sea of comments in an informed manner, navigating the data via ad hoc combinations of topic filters (what the comments are about) and intent filters (how the topics are being talked about) to ultimately uncover what is important to the respondents. This could be comments offering praise for certain benefits, or comments offering suggestions about how management can improve.
Our Approach to Comment Analysis
Perceptyx provides a three-prong approach to comment analysis:
Theme Detection
Intent Detection
Sentiment Analysis
Theme Detection
Theme detection refers to a set of methods for mapping comments into a predefined set of topics. Out of the box, we offer two main theme detection approaches:
Lexical Theme Detection (Keyword Matching):
A highly flexible method which allows quick creation of new/custom themes.
Supervised Theme Detection (Advanced Theme Discovery):
A highly accurate, multi-lingual, deep learning method built on a vetted set of predefined themes.
Lexical Theme Detection is a flexible way to detect themes in comments. With this method, we devise themes to be searched for, based on a list of keywords and phrases. This approach enables customers to customize, update, create, or delete the themes and keywords applied to their data.
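As a rough illustration of keyword matching (a minimal sketch, not Perceptyx's actual implementation), the Python below maps comments to themes by searching for whole-word keyword and phrase hits. The theme names, keyword lists, and function names are hypothetical and chosen only for the example.

```python
import re

# Hypothetical theme definitions: theme name -> keywords and phrases.
# Customizing a theme amounts to editing these lists.
THEMES = {
    "Benefits": ["benefits", "health insurance", "401k"],
    "Workload": ["workload", "overworked", "too many hours"],
}


def compile_theme(keywords):
    """Build one case-insensitive pattern matching any keyword as a whole word or phrase."""
    alternatives = "|".join(re.escape(k) for k in keywords)
    return re.compile(rf"\b(?:{alternatives})\b", re.IGNORECASE)


PATTERNS = {theme: compile_theme(words) for theme, words in THEMES.items()}


def detect_themes(comment):
    """Return every theme with at least one keyword hit; a comment can match several."""
    return [theme for theme, pattern in PATTERNS.items() if pattern.search(comment)]


print(detect_themes("The health insurance is great but I am overworked."))
# → ['Benefits', 'Workload']
print(detect_themes("The new policy benefits nobody."))
# → ['Benefits']  (the keyword matches regardless of how the word is used)
```

Adding, changing, or deleting a theme only requires changing the keyword dictionary, which is what makes this kind of approach quick to customize.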
An individual comment can be aligned to multiple themes. Perceptyx provides a curated set of 38 default themes related to employee engagement (e.g., employee benefits, safety, trust, employee recommendations and responses, etc.). These engagement themes, and other theme sets noted below, are subject to expansion, based on new issues, areas of focus, and feedback from internal and external stakeholders. Additional lexical theme sets are also available to dig further into topic areas of focus, such as:
Diversity, Equity, & Inclusion (DEI)
COVID-19, Work from Home (WFH)
Middle Manager
Industry-Specific Theme Sets (e.g., Healthcare, Retail, and Manufacturing)
Patient Safety (e.g., medication errors)
Hot Words / Hot Topic themes (to uncover issues of concern)
Supervised Theme Detection is a machine learning / AI approach for enhancing the accuracy of matches. Rather than relying on a list of keywords and phrases for text matching, this approach uses a set of training data, in which the algorithm is shown verified examples of text mentioning themes, and the algorithm then learns to understand and detect language pertaining to different themes. At Perceptyx, our supervised theme detection model is trained to detect our curated set of 38 employee engagement related themes, and supports over 100 languages, without the need for translation into English.
Intent Detection
Once comments are categorized into themes, a user may be interested in what the discourse is around those themes. For example, knowing that “Benefits” has been mentioned in a large number of comments is a useful insight; however, are employees generally satisfied with company benefits, or do they find the benefits to be lacking? This is where intent detection comes in: it helps identify the motivations behind the comments.
Independent of the theme detection approach, we categorize each comment against a set of five intents:
Approval/Praise - Highly positive
Wants/Preferences - General nice-to-haves
Should/Suggest - Specific recommendations
Needs/Concerns - Sharing pain points
Angry/Unfair - Highly negative
The intent categories are inspired by a similar model in cognitive psychology used to measure and rank attitudes toward a topic or concept. Each “intent” captures a specific framing or emotional emphasis. Intent detection is not done along a spectrum, so a single comment may be matched with more than one of these categories. The aim is to identify emotional indicators and satisfaction / action indicators, such that users can easily spend time on only the most impactful themes, or the most impactful subset of comments within a given theme.
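To make the "not along a spectrum" point concrete, here is a minimal Python sketch in which each intent is scored independently and a comment keeps every intent that clears a cutoff. The scores, the 0.5 threshold, and the function name are assumptions for illustration; this article does not describe how the production model produces or thresholds its outputs.

```python
# The five intents listed above.
INTENTS = [
    "Approval/Praise",
    "Wants/Preferences",
    "Should/Suggest",
    "Needs/Concerns",
    "Angry/Unfair",
]


def assign_intents(scores, threshold=0.5):
    """Keep every intent whose independent score clears the (assumed) threshold."""
    return [intent for intent in INTENTS if scores.get(intent, 0.0) >= threshold]


# Hypothetical per-intent scores for one comment, e.g.
# "I love the new schedule, but managers should share it earlier."
example_scores = {
    "Approval/Praise": 0.81,
    "Should/Suggest": 0.66,
    "Angry/Unfair": 0.03,
}

print(assign_intents(example_scores))
# → ['Approval/Praise', 'Should/Suggest']
```

Because the scores are not forced to compete with each other, one comment can carry both praise and a suggestion.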
Perceptyx’s Intent Detection Model is a machine learning / AI approach that is trained on the same backbone as the Supervised Theme Detection model, and thus offers the same high degree of accuracy, along with out-of-the-box multilingual support.
Sentiment Analysis
Sentiment Analysis is a model that aims to help better understand the feelings and opinions expressed in a comment about a certain subject (e.g., a person, a process, or an initiative). The model does this by mapping each sentence within a comment into one of three non-overlapping categories: negative, neutral, or positive.
Perceptyx’s Sentiment Analysis model is actually a set of advanced deep learning models, each trained on one of three large, open source datasets. The models are “averaged” together to provide one final verdict per input sentence. Incorporating models trained with as widespread a set of input data as possible avoids bias towards any topic or specific industry. Additionally, the models are trained on what are called “multilingual embeddings,” so we support sentiment analysis in more than 100 languages, without the need to first translate comments into English (which may introduce unnecessary noise).
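One common way to "average" several classifiers is to take the mean of their per-label probabilities and pick the label with the highest mean. The Python sketch below assumes that scheme; the exact combination method used in the product is not specified here, and the probabilities and function name are made up for the example.

```python
LABELS = ("negative", "neutral", "positive")


def average_sentiment(model_probs):
    """Average per-label probabilities from several models and return the top label."""
    n = len(model_probs)
    averaged = [sum(p[i] for p in model_probs) / n for i in range(len(LABELS))]
    best = max(range(len(LABELS)), key=lambda i: averaged[i])
    return LABELS[best], averaged


# Hypothetical outputs of three models for one sentence,
# ordered as (negative, neutral, positive).
per_model = [
    [0.1, 0.3, 0.6],
    [0.2, 0.5, 0.3],
    [0.1, 0.2, 0.7],
]

label, averaged = average_sentiment(per_model)
print(label)  # → positive
```

In this example the second model leans neutral, but the averaged probabilities still favor positive, so a single model's disagreement does not decide the verdict.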
Model Training and Validation
Our themes and intents are built in collaboration with subject matter experts and other stakeholders to ensure that we provide themes that are relevant for use, are strongly defined in their scope, and contain useful concepts within them that can be broken down for further analysis, if needed. Our theme / intent detection models are trained on a proprietary dataset of human-labeled comments, in which a human being has read each comment and denoted which theme(s), if any, appear in a given comment. The training data have been labeled by subject matter experts on human behavior, survey, and text data, and they cover the range of typical employee survey questions and topics.
To train our new model, we use a category of models called transformers, which has become very popular in the past few years. They work especially well with text data. Moreover, we use a specific, custom architecture that provides fast inference speeds and works well to classify text data from multiple languages. The architecture is as follows:
During training, we hold out a subset of the data as a validation set, used to ensure that the model is performing well after training is complete. There are many different ways of measuring performance of models tasked with assigning one or more labels to a given input, but one of the most common measures is an average macro F1 score. The general intuition is that the higher the F1 score, the better the model is able to jointly minimize false positives (inaccurately predicting a theme when there isn’t one present) and false negatives (inaccurately predicting no theme when there is one present). We average the F1 scores for all possible themes and intents to get our final score, which is how we’re able to ultimately compare the “goodness” of our models. See below for some of the relevant results. The precision indicates the model’s ability to minimize false positives, while the recall indicates the model’s ability to minimize false negatives.
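The metric described above can be sketched in a few lines of Python. This is a generic illustration of precision, recall, and macro-averaged F1 for multi-label predictions, using a tiny made-up validation set; it is not the evaluation code behind the reported results, and the function names are chosen for the example.

```python
def precision_recall_f1(true_labels, pred_labels, label):
    """Compute precision, recall, and F1 for one label over a list of comments."""
    tp = fp = fn = 0
    for truth, pred in zip(true_labels, pred_labels):
        in_truth, in_pred = label in truth, label in pred
        if in_truth and in_pred:
            tp += 1  # theme present and predicted
        elif in_pred:
            fp += 1  # theme predicted but not present (false positive)
        elif in_truth:
            fn += 1  # theme present but not predicted (false negative)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def macro_f1(true_labels, pred_labels, labels):
    """Average the per-label F1 scores, weighting every label equally."""
    return sum(precision_recall_f1(true_labels, pred_labels, l)[2] for l in labels) / len(labels)


# Made-up validation data: one set of themes per comment.
truth = [{"Benefits"}, {"Benefits", "Workload"}, {"Workload"}, set()]
predicted = [{"Benefits"}, {"Benefits"}, {"Workload", "Benefits"}, set()]

score = macro_f1(truth, predicted, ["Benefits", "Workload"])
print(round(score, 4))  # → 0.7333
```

Here "Benefits" has one false positive (F1 of 0.8) and "Workload" has one false negative (F1 of about 0.667); the macro score is the plain average of the two, so rare labels count as much as common ones.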
The sentiment model is trained under exactly the same regimen and deep learning architecture. Across our three sentiment analysis datasets, we see an average macro F1 score of 0.94.
Motivation and Industry Comparisons
Our three-prong approach provides a full set of complementary methods, giving users a comprehensive approach to comment analysis. Our approach is in line with the trends and best practices of the employee survey analytics industry, and all three approaches provided by Perceptyx match the expected norms of methods used within the industry.
The industry has historically made use of lexical methods for theme detection, usually providing a base set of themes related to employee engagement, along with a function allowing users to create custom themes. Thus, the lexical method is expected to be available, as a norm in the industry. Where we differ from the industry with regard to lexical themes is that we have expanded beyond simply providing employee engagement themes by creating new theme sets, as described above, covering topics such as DEI, Patient Safety, specific industries, WFH, etc. This provides our customers with off-the-shelf theme sets that cover a multitude of topic areas unmatched in scope by our competitors. This method also allows for maximum flexibility in theme detection, as customers can readily tailor the themes and keywords to meet their needs, thus allowing quick updates to address new issues.
However, the drawbacks with this approach are that: (1) due to its reliance on keyword / phrase matching, it is possible for false positive matches to occur, simply because the word or phrase is being used in a different context than what the theme is looking for, and (2) it is limited to the list of themes that are expressly provided to be searched for, meaning that new issues could be missed.
Thus, we have introduced three state-of-the-art deep learning models into our product line to increase the accuracy and precision of theme detection (supervised theme detection) and allow for two alternative, complementary avenues of comment exploration (intent detection and sentiment analysis).
Default Themes
Benefits
Bureaucracy
Career Opportunities
Commitment
Communication
Compensation
Continuous Improvement
Culture
Customer Experience
Departmental Effectiveness
Discrimination and Harassment
Diversity and Inclusion
Efficiency
Favoritism
Feedback
Focus and Goals
Global
HR and Recruitment
Job Security
Learning and Development
Legal Issues
Management
Market
Overtime
Performance
Performance Management
Products and Services
Promotion and Advancement
Recognition
Resources
Safety
Social Responsibility
Strategy
Survey Results
Teamwork
Trust
Turnover
Work Life Balance
Workload
Lexical Theme Sets
Default Engagement Themes: Provides an overview of the most common topics / issues that are monitored across industries.
Diversity, Equity & Inclusion: A special topic theme set that examines DEI issues within an organization.
COVID-19: A special topic theme set that examines the impact of the Coronavirus pandemic within an organization.
Work from Home: A special topic theme set (includes COVID-19) that examines the impact of working from home, including issues / challenges with organizational transition.
Middle Management: A special topic theme set (includes COVID-19) that examines the impact and concerns of middle management including issues / challenges to adapting across an organization.
Return to Work: A special topic theme set (includes COVID-19) that examines the impact and concerns regarding returning to the traditional work environment.
Healthcare: A special industry theme set that partitions departments / roles across the healthcare industry.
Patient Safety: A special industry theme set that partitions topics related to patient safety in the healthcare industry.
Retail: A special industry theme set that partitions departments / roles across the retail industry.
Manufacturing: A special industry theme set that partitions departments / roles across the manufacturing industry.
Hot Words and Topics: A special theme set for detecting offensive word use and identifying hot topics that consist of issues that merit elevated concern.
Intents Set
Angry and Unfair
Needs and Concerns
Should and Suggest
Wants and Preferences
Praise