Comment Analytics Capabilities

Modified on Mon, 13 Nov 2023 at 07:38 AM

Perceptyx Comment Analytics Capabilities

Introduction

This document provides an overview of our comment analysis capabilities here at Perceptyx. Perceptyx uses natural language processing (NLP) approaches for theme detection, intents detection, and sentiment analysis. The sections below give an overview of the issues we are trying to help clients address, a discussion of our approach, how our approach compares to others in the industry, and an example case of how our tools can help a client uncover insights that help them make better decisions.

What Does Comments Analysis Do?

The open-ended, verbatim comments left by survey respondents provide a rich source of information: they are the direct, unfiltered voice of the respondent. That said, reviewing text data has traditionally been an arduous, manual process in which someone has to read through each individual comment to understand what respondents are writing about. This is time-consuming enough for direct line managers. It is even more so for HR and people analytics leaders, who are responsible for organization-wide analysis and reporting, and for whom reading every comment is essentially impossible. This is where AI tools come in, turning an otherwise impossible task into an easy one.

Theme detection, intents detection, and sentiment analysis together address a critical issue facing our users: how to understand what issues employees are talking about, zoom in on the most critical ones, and then determine appropriate action items, without having to painstakingly read all the comments.

Perceptyx’s comments analysis feature provides a way for users to parse through the sea of comments in an informed manner, navigating the data via ad hoc combinations of topic filters (what the comments are about) and intent filters (how the topics are being talked about) to ultimately uncover what is important to them. This could be comments offering praise on certain benefits, or comments offering suggestions on how management could improve. In the sections below, we discuss how the comments analysis feature works in more detail.
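
The combination of topic filters and intent filters described above can be pictured with a small sketch. This is only an illustration, not the product's implementation: the Comment class, the filter_comments function, and the sample comments are all hypothetical, and the sketch assumes that a comment may carry more than one theme and more than one intent.

```python
from dataclasses import dataclass, field


@dataclass
class Comment:
    """Hypothetical record for one survey comment."""
    text: str
    themes: set = field(default_factory=set)    # what the comment is about
    intents: set = field(default_factory=set)   # how the topic is being talked about


def filter_comments(comments, themes=None, intents=None):
    """Keep comments matching at least one requested theme AND one requested intent.

    A filter left as None is not applied.
    """
    selected = []
    for comment in comments:
        if themes is not None and not (comment.themes & set(themes)):
            continue
        if intents is not None and not (comment.intents & set(intents)):
            continue
        selected.append(comment)
    return selected


# Made-up sample data for illustration only.
comments = [
    Comment("The dental plan is excellent.", {"Benefits"}, {"Approval/Praise"}),
    Comment("Managers should share decisions earlier.",
            {"Management", "Communication"}, {"Should/Suggest"}),
    Comment("We need more staff on the night shift.", {"Workload"}, {"Needs/Concerns"}),
]

praise_for_benefits = filter_comments(
    comments, themes=["Benefits"], intents=["Approval/Praise"])
suggestions_for_management = filter_comments(
    comments, themes=["Management"], intents=["Should/Suggest"])
```

Each call narrows the full set of comments to the subset a user cares about, for example praise about benefits or suggestions aimed at management.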

Our Approach to Comments Analysis

Perceptyx takes a three-prong approach to comments analysis, in order to provide a comprehensive solution for our clients. The three components are:

Theme Detection
Intents Detection
Sentiment Analysis

Theme Detection

Theme Detection refers to a set of methods for mapping comments into a predefined set of topics. Out of the box, we offer two main theme detection approaches:

Lexical Theme Detection (Keyword matching): a highly flexible method which allows quick creation of new/custom themes.
Supervised Theme Detection (Advanced Theme Discovery): a highly accurate, multi-lingual, deep learning method built on a vetted set of predefined themes.

Lexical Theme Detection is a flexible way to detect themes in comments. In this method, each theme is defined by a list of keywords and phrases, and a comment is assigned to the theme when it contains a match. This approach enables clients to customize, update, create, or delete the themes and keywords applied to their data. An individual comment can be aligned to multiple themes. Perceptyx provides a curated set of 38 default themes related to employee engagement, e.g., employee benefits, safety, trust, and employee recommendations and responses. These engagement themes, and the other theme sets noted below, are subject to expansion based on new issues, areas of focus, and feedback from internal and external stakeholders. Additional lexical theme sets are also available to dig further into specific areas of focus, such as:

Diversity, Equity, & Inclusion (DEI)
COVID-19, Work from Home (WFH)
Middle Manager
Industry-specific Theme Sets (i.e. Healthcare, Retail, and Manufacturing)
Patient Safety (e.g., medication errors)
Hot Words / Hot Topic themes (to uncover issues of concern)
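
As a rough illustration of keyword-based theme matching, the sketch below maps a comment to every theme whose keyword list it matches. It is a minimal example under stated assumptions, not the actual matching logic: the keyword lists in THEME_KEYWORDS are invented, and the case-insensitive whole-word matching rule is an assumption.

```python
import re

# Hypothetical keyword lists; the real keywords behind each theme are not given here.
THEME_KEYWORDS = {
    "Benefits": ["health insurance", "401k", "pto", "benefits"],
    "Safety": ["safety", "hazard", "injury"],
    "Trust": ["trust", "transparent", "honest"],
}


def compile_theme_patterns(theme_keywords):
    """Build one case-insensitive, whole-word regex per theme."""
    patterns = {}
    for theme, keywords in theme_keywords.items():
        alternatives = "|".join(re.escape(keyword) for keyword in keywords)
        patterns[theme] = re.compile(rf"\b(?:{alternatives})\b", re.IGNORECASE)
    return patterns


def detect_themes(comment, patterns):
    """Return every theme with at least one keyword match, sorted by name."""
    return sorted(theme for theme, pattern in patterns.items()
                  if pattern.search(comment))


patterns = compile_theme_patterns(THEME_KEYWORDS)
print(detect_themes(
    "I trust my manager, but the health insurance is too expensive.", patterns))
# prints ['Benefits', 'Trust']
```

In a design like this, creating or customizing a theme amounts to adding or editing an entry in the keyword dictionary, which is what makes the lexical approach quick to adapt.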

Supervised Theme Detection is a machine learning / AI approach for enhancing the accuracy of matches. Rather than relying on a list of keywords and phrases for text matching, this approach uses a set of training data, in which the algorithm is shown verified examples of text mentioning themes, and the algorithm then learns to understand and detect language pertaining to different themes. At Perceptyx, our supervised theme detection model is trained to detect our curated set of 38 employee engagement related themes, and supports over 100 languages, without the need for translating into English.

Intents Detection

Once comments have been categorized into themes, a user may be interested in what the discourse is around those themes. For example, knowing that “Benefits” has been mentioned in a large fraction of comments is a useful insight; however, are employees generally satisfied with company benefits, or do they find the benefits severely lacking? This is where Intents Detection comes in.

Independent of the theme detection approach, we categorize each comment into a set of 5 intents.

These are:

Approval/Praise - highly positive
Wants/Preferences - general nice-to-haves
Should/Suggest - specific recommendations
Needs/Concerns - sharing pain points
Angry/Unfair - highly negative

The intents categories are inspired by a similar model in cognitive psychology, used to measure and rank attitudes towards a topic or concept. Each “Intent” captures a specific framing or emotional emphasis. The aim is to identify emotional indicators and satisfaction/action indicators, such that users can easily spend time on only the most impactful themes, or the most impactful subset of comments within a given theme.

Perceptyx’s intents detection model is a machine learning / AI approach that is trained on the same backbone as the Supervised Theme Detection model, and thus offers the same high degree of accuracy, along with out-of-the-box multilingual support.

Sentiment Analysis

Sentiment Analysis is a model that aims to identify the polarity of a comment, ranging from highly negative to highly positive. The model does this by mapping each sentence within a comment into one of three non-overlapping categories: negative, neutral, or positive.

Perceptyx’s Sentiment Analysis model is a set of advanced deep learning models, each trained on one of three large, open source datasets. The models’ outputs are then “averaged” together to provide one final verdict per input sentence. Training on as wide a range of input data as possible helps avoid bias towards any specific topic or industry. Additionally, the models are trained on what are called “multilingual embeddings”, so we support sentiment analysis in over 100 languages without first translating foreign-language comments into English (which may introduce unnecessary noise).
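
One common way to combine several models into a single verdict is to average their class probabilities and pick the most likely class. The sketch below shows that idea only; the exact averaging scheme is not specified here, so the simple mean, the average_sentiment function, and the probability values are all assumptions made for illustration.

```python
LABELS = ("negative", "neutral", "positive")


def average_sentiment(model_probabilities):
    """Average per-model probability vectors; return (label, averaged probabilities)."""
    n_models = len(model_probabilities)
    averaged = [
        sum(probs[i] for probs in model_probabilities) / n_models
        for i in range(len(LABELS))
    ]
    best = max(range(len(LABELS)), key=lambda i: averaged[i])
    return LABELS[best], averaged


# Made-up outputs from three hypothetical models for one sentence,
# each ordered as (negative, neutral, positive).
sentence_probs = [
    [0.10, 0.30, 0.60],
    [0.20, 0.50, 0.30],
    [0.05, 0.25, 0.70],
]
label, averaged = average_sentiment(sentence_probs)
print(label)  # prints positive
```

Here the second model leans neutral, but the averaged probabilities still favor the positive class, so a single model's disagreement does not decide the outcome.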

Model Training and Validation

Our themes and intents are built in collaboration with subject matter experts and other stakeholders, to ensure that the themes we provide are relevant, clearly scoped, and contain useful concepts that could be broken down for further analysis, if desired. Our theme and intents detection models are trained on a proprietary dataset of human-labeled comments, in which a person has read each comment and noted which theme(s), if any, appear in it. The training data have been labeled by subject matter experts in human behavior, surveys, and text data, and cover the range of typical employee survey questions and topics.

To train our new model, we use a category of models called transformers, which have become very popular in the past few years and work especially well with text data. Moreover, we use a specific, custom architecture that provides fast inference speeds and works well for classifying text in multiple languages. The architecture looks like this:

[Figure: model architecture diagram]

During training, we hold out a subset of the data as a validation set, which is used to check that the model performs well once training is complete. There are many ways to measure the performance of models tasked with assigning one or more labels to a given input, but one of the most common is the macro-averaged F1 score. The F1 score combines precision, which indicates the model’s ability to minimize false positives (predicting a theme when none is present), and recall, which indicates its ability to minimize false negatives (predicting no theme when one is present). The general intuition is that the higher the F1 score, the better the model jointly minimizes both kinds of error. We average the F1 scores for all possible themes and intents to get our final score, which is how we are ultimately able to compare the “goodness” of our models. Some of the relevant results are shown below.

	Avg. Precision	Avg. Recall	Avg. F1
Theme	0.98	0.88	0.92
Intents	0.96	0.95	0.96
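
For readers who want to see how a macro-averaged F1 score is computed, the sketch below works through the arithmetic. The per-theme counts are made up for illustration and are unrelated to the results reported in the table; the function names are likewise hypothetical.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def macro_f1(counts_per_label):
    """Average the per-label F1 scores, giving every label equal weight."""
    scores = [precision_recall_f1(*counts)[2] for counts in counts_per_label.values()]
    return sum(scores) / len(scores)


# Made-up counts per theme: (true positives, false positives, false negatives).
counts = {
    "Benefits": (90, 5, 10),
    "Safety": (40, 0, 10),
    "Trust": (8, 2, 2),
}

for theme, c in counts.items():
    p, r, f1 = precision_recall_f1(*c)
    print(f"{theme}: precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
print(f"macro F1 = {macro_f1(counts):.2f}")
```

Because every label counts equally in the macro average, a rarely mentioned theme with a low F1 score pulls the final number down as much as a frequently mentioned one would.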

The sentiment model is trained under exactly the same regimen and deep learning architecture. Across our three sentiment analysis datasets, we see an average macro-averaged F1 score of 0.94.

Motivation and Industry Comparisons

Our three-prong approach provides a full set of complementary methods that together give users a comprehensive way to analyze comments. It is in line with the trends and best practices of the employee survey analytics industry, and all three methods provided by Perceptyx match the norms expected within the industry.

The industry has historically made use of lexical methods for theme detection, usually providing a base set of themes related to employee engagement, along with a function allowing users to create custom themes. Thus, the lexical method is expected to be available as an industry norm. Where we differ from the industry with regard to lexical themes is that we have expanded beyond employee engagement themes by creating new theme sets, as described above, covering topics such as DEI, Patient Safety, specific industries, and WFH. This provides our clients with off-the-shelf theme sets covering a multitude of topic areas that are not matched in scope by our competitors. This method also allows for maximum flexibility in theme detection, as clients can readily tailor the themes and keywords to their needs, allowing quick updates to address new issues.

However, this approach has two drawbacks: (1) because it relies on keyword or phrase matching, false positive matches can occur simply because a word or phrase is used in a different context from the one the theme is looking for, and (2) it is limited to the list of themes that are expressly provided to be searched for, meaning that new issues could be missed.

Thus, we have introduced three state-of-the-art deep learning models into our product line, in order to increase the accuracy and precision of theme detection (supervised theme detection), and allow for two alternative, complementary avenues of comments exploration (intents detection and sentiment analysis).

Appendix 1. Default Themes

Benefits

Bureaucracy

Career Opportunities

Commitment

Communication

Compensation

Continuous Improvement

Culture

Customer Experience

Departmental Effectiveness

Discrimination and Harassment

Diversity and Inclusion

Efficiency

Favoritism

Feedback

Focus and Goals

Global

HR and Recruitment

Job Security

Learning and Development

Legal Issues

Management

Market

Overtime

Performance

Performance Management

Products and Services

Promotion and Advancement

Recognition

Resources

Safety

Social Responsibility

Strategy

Survey Results

Teamwork

Trust

Turnover

Work Life Balance

Workload

Appendix 2. Lexical Theme Sets

1. Default Engagement Themes: Provides an overview of the most common topics/issues that are monitored across industries.

2. Diversity, Equity & Inclusion: A special topic theme set that examines DEI issues within an organization.

3. COVID-19: A special topic theme set that examines the impact of the Coronavirus pandemic within an organization.

4. Work from Home: A special topic theme set (includes COVID-19) that examines the impact of working from home including issues/challenges with organizational transition.

5. Middle Management: A special topic theme set (includes COVID-19) that examines the impact and concerns of middle management including issues/challenges to adapting across an organization.

6. Return to Work: A special topic theme set (includes COVID-19) that examines the impact and concerns regarding returning to the traditional work environment.

7. Healthcare: A special industry theme set that partitions departments/roles across the healthcare industry.

8. Patient Safety: A special industry theme set that partitions topics related to patient safety in the healthcare industry.

9. Retail: A special industry theme set that partitions departments/roles across the retail industry.

10. Manufacturing: A special industry theme set that partitions departments/roles across the manufacturing industry.

11. Hot Words and Topics: A special theme set for detecting offensive word use and identifying hot topics, i.e., issues that merit elevated concern.

Appendix 3. Intents Set

Angry and Unfair
Needs and Concerns
Should and Suggest
Wants and Preferences
Praise