Perceptyx Comment Analytics Capabilities

Modified on Wed, 30 Jul at 1:50 PM

This article provides an overview of our comment analysis capabilities at Perceptyx. Perceptyx uses natural language processing (NLP) approaches for theme detection, intents detection, and sentiment analysis. The sections below give an overview of the issues we are trying to help customers address, a discussion of our approach, how our approach compares to others in the industry, and an example case of how our products can help a customer uncover insights that help them make better decisions.

What Does Comments Analysis Do?

The open-ended, verbatim comments left by survey respondents provide a rich source of information: they are the direct, unfiltered voice of the respondent. That said, reviewing text data has traditionally been an arduous, manual process in which someone has to read each comment to understand what a group of respondents is writing about. While this is time-consuming enough for direct line managers, it is even more so for HR and people analytics leaders, who conduct organization-wide analysis and reporting and for whom reading every comment is essentially impossible. This is where AI tools come in, turning an otherwise impossible task into an easy one.

Themes detection, intents detection, and sentiment analysis together address a critical issue facing our users: how to understand what issues employees are talking about, zoom in on the most critical issues, then determine appropriate action items, without having to painstakingly read all the comments.

Perceptyx’s comments analysis feature provides a way for users to parse through the sea of comments in an informed manner, navigating the data via ad hoc combinations of topic filters (what the comments are about) and intent filters (how the topics are being talked about) to ultimately uncover what is important to them. This could be comments offering praise on certain benefits, or comments offering suggestions on how management could improve. In the sections below, we discuss how the comments analysis feature works in more detail.

Our Approach to Comments Analysis

Perceptyx takes a multi-pronged approach to comments analysis to provide a comprehensive solution for our customers. The approaches are:

Theme Detection
Intent Detection
Sentiment Analysis

Theme Detection

Theme Detection refers to a set of methods for mapping comments into a predefined set of topics. Out of the box, we offer two theme detection approaches:

Lexical Theme Detection

Lexical Theme Detection is a highly flexible way to detect themes in comments. In this method, we devise themes to be searched for, based on a list of keywords and phrases. This approach enables customers to customize, update, create, or delete the themes and keywords applied to their data. An individual comment can be aligned to multiple themes. Perceptyx provides a curated set of 38 default themes related to employee engagement (e.g., employee benefits, safety, trust, employee recommendations, response types). These engagement themes, and other theme sets noted below, are subject to expansion, based on new issues, areas of focus, and feedback from internal and external stakeholders. Additional lexical theme sets are also available to dig further into topic areas of focus, such as:

Diversity, Equity, & Inclusion (DEI)
Work from Home (WFH)
Middle Manager
Industry-specific Theme Sets (e.g., Healthcare, Retail, and Manufacturing)
Patient Safety (e.g., medication errors, etc.)
Hot Words/Hot Topic themes (to uncover issues of concern)

Lexical theme detection is currently supported for managed events and can be accessed in Advanced Reporting. You can view and edit lexical themes within Advanced Reporting for these events. In Analytics Studio, lexical themes are available in view-only mode.
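
The keyword-and-phrase matching described above can be illustrated with a minimal sketch. It is not Perceptyx's implementation: the theme names come from this article, but the keyword lists, the function names, and the whole-word, case-insensitive matching rule are assumptions made for illustration.

```python
import re

# Hypothetical theme definitions: theme name -> keywords and phrases.
# These lists are illustrative only, not Perceptyx's actual keyword sets.
THEMES = {
    "Benefits": ["benefits", "health insurance", "401k"],
    "Safety": ["safety", "unsafe", "injury"],
    "Trust": ["trust", "transparent"],
}


def compile_theme_patterns(themes):
    """Build one case-insensitive, whole-word regex per theme."""
    patterns = {}
    for theme, keywords in themes.items():
        alternatives = "|".join(re.escape(k) for k in keywords)
        patterns[theme] = re.compile(rf"\b(?:{alternatives})\b", re.IGNORECASE)
    return patterns


def detect_themes(comment, patterns):
    """Return every theme whose keyword list matches the comment."""
    return [theme for theme, pattern in patterns.items() if pattern.search(comment)]


PATTERNS = compile_theme_patterns(THEMES)
comment = "I trust my manager, but the health insurance options are limited."
print(detect_themes(comment, PATTERNS))
```

Because each theme is checked independently, a single comment can be aligned to several themes, and editing a theme amounts to editing its keyword list.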

Supervised Theme Detection

Supervised Theme Detection is a highly accurate deep learning method built on a vetted set of predefined themes. Rather than relying on a list of keywords and phrases for text matching, this approach fine-tunes a deep learning AI model on a large volume of labeled comments across industries. The model learns the common language patterns and semantic content behind each of the predefined themes and is thus able to categorize comments into those themes with enhanced accuracy. Note that the Perceptyx Supervised Theme Detection model is trained to detect our curated set of 38 employee engagement-related themes. While it offers industry-leading accuracy on these 38 themes, it cannot be extended to detect “custom” or “company-specific” themes.

Supervised theme detection is currently available for managed and self-service events across several areas of the platform:

Advanced Reporting
Analytics Studio
AI Hub

Intent Detection

Once comments have been categorized into themes, a user may be interested in identifying what “comment types” are appearing within a theme to further understand what the discourse is around those themes. For example, knowing that “Benefits” has been mentioned in a large fraction of comments is a useful insight; however, are employees describing what they really enjoy (praise), voicing general preferences (wants/preferences), making recommendations (should/suggest) for improvement, highlighting pain points (needs/concerns), or detailing key frustrations (angry/unfair) with company benefits? This is where Intent Detection comes in as a deep dive analysis tool to further explore individual themes.

Independent of the theme detection approach, we categorize each comment into a set of 5 intents. These are:

● Approval/Praise - highly positive/favorable language

● Wants/Preferences - general nice-to-haves

● Should/Suggest - specific recommendations

● Needs/Concerns - sharing pain points

● Angry/Unfair - highly negative/critical language

The intent categories are inspired by a similar model in cognitive psychology that is used to measure and rank attitudes towards a topic or concept. Each “Intent” captures a specific framing or emotional emphasis. The aim is to identify emotional indicators and satisfaction/action indicators, so that users can easily filter the comments within a given theme down to a more focused subset of similar comments.
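
The idea of combining a theme filter with an intent filter can be sketched as follows. This is an illustrative sketch, not the platform's data model: the `LabeledComment` structure, the `filter_comments` function, and the choice to store labels as sets are all assumptions, and the example comments are made up.

```python
from dataclasses import dataclass, field


@dataclass
class LabeledComment:
    """A comment with model-assigned labels (hypothetical structure)."""
    text: str
    themes: set = field(default_factory=set)
    intents: set = field(default_factory=set)


def filter_comments(comments, theme=None, intent=None):
    """Keep comments matching the given theme and/or intent; None means no filter."""
    return [
        c for c in comments
        if (theme is None or theme in c.themes)
        and (intent is None or intent in c.intents)
    ]


comments = [
    LabeledComment("The dental plan is fantastic.", {"Benefits"}, {"Approval/Praise"}),
    LabeledComment("We should add a vision plan.", {"Benefits"}, {"Should/Suggest"}),
    LabeledComment("My manager never listens.", {"Management"}, {"Angry/Unfair"}),
]

# Isolate the suggestions employees are making about benefits.
suggestions = filter_comments(comments, theme="Benefits", intent="Should/Suggest")
print([c.text for c in suggestions])
```

Leaving either argument as `None` widens the view, for example to every comment about a theme regardless of intent.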

Perceptyx’s intent detection model is a machine learning AI approach that is trained on the same backbone as the Supervised Theme Detection model, and thus offers the same high degree of accuracy.

Intent detection is currently available for managed and self-service events across several areas of the platform:

Advanced Reporting
Analytics Studio
AI Hub

Sentiment Analysis

Sentiment Analysis is a model that aims to identify the polarity of a comment, ranging from highly negative to highly positive. The model does this by mapping each sentence within a comment into one of three non-overlapping categories: negative, neutral, or positive.

Perceptyx’s Sentiment Analysis model is actually a set of advanced deep learning models, each trained on one of four large, open source datasets. The models’ outputs are then “averaged” together to provide one final verdict per input sentence. Training on the widest possible range of input data helps avoid bias towards any topic or specific industry.
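
One common way to "average" several classifiers is to take the mean of their per-class probabilities and pick the highest-scoring class. The sketch below shows that technique under stated assumptions: the article does not specify the exact averaging rule, the function name is hypothetical, and the probabilities are made-up values rather than real model outputs.

```python
LABELS = ["negative", "neutral", "positive"]


def ensemble_sentiment(model_probs):
    """Average per-class probabilities across models and return the top label.

    model_probs: one [p_negative, p_neutral, p_positive] list per model.
    """
    n_models = len(model_probs)
    averaged = [sum(p[i] for p in model_probs) / n_models for i in range(len(LABELS))]
    best = max(range(len(LABELS)), key=lambda i: averaged[i])
    return LABELS[best], averaged


# Made-up outputs from four hypothetical models for one sentence.
probs = [
    [0.10, 0.20, 0.70],
    [0.05, 0.35, 0.60],
    [0.20, 0.50, 0.30],
    [0.10, 0.15, 0.75],
]
label, averaged = ensemble_sentiment(probs)
print(label, [round(p, 4) for p in averaged])
```

In this toy input the third model leans neutral, but the averaged scores still favor "positive", which is the kind of smoothing an ensemble is meant to provide.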

Sentiment analysis is currently available for managed and self-service events across several areas of the platform:

Advanced Reporting
Analytics Studio
AI Hub

Multilingual Support

By default, when the Perceptyx system receives a non-English comment, it translates the comment into English using the Microsoft Azure AI Translator. This translated English comment is then provided as input to each of the comment analytics models described above.

However, there are some situations in which this translation procedure fails. Sometimes the survey is provided in English, but the respondent chooses to respond entirely in a different language, or interweaves phrases from another language into their responses. As a backup for these situations, our sentiment analysis, intent detection, and supervised (non-lexical) theme detection models are trained on what are called “multilingual embeddings.” This means that, when necessary, the models can accept non-English text as input (without the need for English translations) and provide sentiment, intent, and theme classifications at nearly the same level of accuracy.

Motivation and Industry Comparisons

Our multi-pronged approach provides a full set of complementary methods that together give users a comprehensive approach to comments analysis. Our approach is in line with the trends and best practices of the employee survey analytics industry, and all three approaches provided by Perceptyx match the expected norms of methods used within the industry.

The industry has historically used lexical methods for theme detection, usually providing a base set of themes related to employee engagement along with a function that lets users create custom themes. The lexical method is therefore expected to be available as an industry norm. Where we differ from the industry on lexical themes is that we have expanded beyond employee engagement themes by creating new theme sets, as described above, covering topics such as DEI, Patient Safety, specific industries, and remote work. This provides our customers with off-the-shelf sets of themes which cover a multitude of topic areas that are not matched in scope by our competitors. This method also allows for maximum flexibility in theme detection, because customers can readily tailor the themes and keywords to meet their needs, allowing quick updates to address new issues.

However, this approach has two drawbacks: (1) because it relies on keyword or phrase matching, false positive matches can occur when a word or phrase is used in a different context than the one the theme is looking for, and (2) it is limited to the list of themes that are expressly provided to be searched for, meaning that new issues could be missed.
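
The first drawback is easy to reproduce with a toy matcher. In this sketch the keyword pattern for a "Benefits" theme is a hypothetical example, not an actual Perceptyx keyword list; it matches both a comment about employee benefits and a comment that uses "benefit" in an unrelated sense.

```python
import re

# Hypothetical keyword list for a "Benefits" theme (illustrative only).
BENEFITS_PATTERN = re.compile(r"\b(?:benefits?|insurance|401k)\b", re.IGNORECASE)

comments = [
    "Our health insurance benefits are too expensive.",    # about employee benefits
    "Everyone would benefit from clearer project goals.",  # different sense of the word
]

for comment in comments:
    matched = bool(BENEFITS_PATTERN.search(comment))
    print(matched, "-", comment)
```

Both comments match, although only the first is about company benefits. A model that learns from labeled examples rather than surface keywords is one way to reduce this kind of error.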

Thus, we have introduced state-of-the-art deep learning models into our product line, to increase the accuracy and flexibility of theme detection (supervised and unsupervised theme detection), and allow for two alternative, complementary avenues of comments exploration (intents detection and sentiment analysis).

Model Training and Validation

Our themes and intents are built in collaboration with subject matter experts and other stakeholders, to ensure that we provide themes that are relevant for use, are strongly defined in their scope, and contain useful concepts that could be broken down for further analysis, if desired. Our theme and intent detection models are trained on a proprietary dataset of human-labeled comments, in which a human being has read each comment and denoted which theme(s), if any, appear in it. The training data have been labeled by subject matter experts on human behavior, surveys, and text data, and they cover the range of typical employee survey questions and topics.

To train our new model, we use a category of models called transformers, which have become very popular in the past few years and work especially well with text data. Moreover, we use a specific, custom architecture that provides fast inference speeds and works well for classifying text data from multiple languages. The architecture looks like this:

During training, we hold out a subset of the data as a validation set, used to ensure that the model is performing well after training is complete. There are many different ways of measuring performance of models tasked with assigning one or more labels to a given input, but one of the most common measures is an average macro F1 score. The general intuition is that the higher the F1 score, the better the model is able to jointly minimize false positives (inaccurately predicting a theme when there isn’t one present) and false negatives (inaccurately predicting no theme when there is one present). We average the F1 scores for all possible themes and intents to get our final score, which is how we’re able to ultimately compare the accuracy of our models. See below for some of the relevant results. The precision indicates the model’s ability to minimize false positives, while the recall indicates the model’s ability to minimize false negatives.

Model	Avg. Precision	Avg. Recall	Avg. F1
Theme	0.98	0.88	0.92
Intents	0.96	0.95	0.96
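
For readers who want to see how a macro-averaged F1 score is computed, here is a minimal sketch for the multi-label case. For each label, precision = TP / (TP + FP), recall = TP / (TP + FN), and F1 = 2 × precision × recall / (precision + recall); the macro score is the unweighted mean of the per-label F1 values. The function names and the toy data are assumptions for illustration and are unrelated to the results in the table above.

```python
def precision_recall_f1(true_flags, pred_flags):
    """Precision, recall, and F1 for one label from parallel lists of booleans."""
    tp = sum(t and p for t, p in zip(true_flags, pred_flags))
    fp = sum((not t) and p for t, p in zip(true_flags, pred_flags))
    fn = sum(t and (not p) for t, p in zip(true_flags, pred_flags))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-label F1 scores; inputs are one label set per comment."""
    scores = []
    for label in labels:
        true_flags = [label in comment_labels for comment_labels in y_true]
        pred_flags = [label in comment_labels for comment_labels in y_pred]
        scores.append(precision_recall_f1(true_flags, pred_flags)[2])
    return sum(scores) / len(scores)


# Toy data: human labels vs. model predictions for four comments.
y_true = [{"Benefits"}, {"Trust"}, {"Benefits", "Trust"}, set()]
y_pred = [{"Benefits"}, {"Benefits"}, {"Benefits", "Trust"}, set()]
print(round(macro_f1(y_true, y_pred, ["Benefits", "Trust"]), 3))
```

In the toy data, the one wrong "Benefits" prediction lowers precision for "Benefits" (a false positive) and recall for "Trust" (a false negative), so both per-label F1 scores fall below 1.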

The sentiment model is trained under exactly the same regimen and deep learning architecture. Across our four sentiment analysis datasets, we see an average macro F1 score of 0.94.

Default Themes

Benefits

Benefits: Dental

Benefits: Education

Benefits: Events/Food

Benefits: Family Support

Benefits: Fitness

Benefits: Medical/Health

Benefits: Retirement Planning

Benefits: Time Off/PTO

Benefits: Vision

Bureaucracy

Career Opportunities

Commitment

Communication

Compensation

Continuous Improvement

Culture

Customer Experience

Departmental Effectiveness

Discrimination and Harassment

Diversity and Inclusion

Efficiency

Favoritism

Feedback

Focus and Goals

Global

Human Resources

Hiring and Retention

Job Security

Learning and Development

Legal Issues

Management

Market

Overtime

Performance

Performance Management

Products and Services

Promotion and Advancement

Recognition

Remote/Hybrid Work

Resources

Safety

Social Responsibility

Strategy

Survey Results

Teamwork

Trust

Turnover

Work Life Balance

Workload

Lexical Theme Sets

Default Engagement Themes: Provides an overview of the most common topics/issues that are monitored across industries.  

Diversity, Equity, & Inclusion: A special topic theme set that examines DEI issues within an organization.  

COVID-19: A special topic theme set that examines the impact of the Coronavirus pandemic within an organization.  

Work from Home: A special topic theme set (includes COVID-19) that examines the impact of working from home including issues/challenges with organizational transition.

Middle Management: A special topic theme set (includes COVID-19) that examines the impact and concerns of middle management including issues/challenges to adapting across an organization.  

Return to Work: A special topic theme set (includes COVID-19) that examines the impact and concerns regarding returning to the traditional work environment. 

Healthcare: A special industry theme set that partitions departments/roles across the healthcare industry. 

Patient Safety: A special industry theme set that partitions topics related to patient safety in the healthcare industry. 

Retail: A special industry theme set that partitions departments/roles across the retail industry. 

Manufacturing: A special industry theme set that partitions departments/roles across the manufacturing industry.

Hot Words and Topics: A special theme set for detecting offensive word use and identifying hot topics, that is, issues that merit elevated concern.