SentimentAnalysis
flowtask.components.SentimentAnalysis
ModelPrediction
ModelPrediction(sentiment_model='tabularisai/robust-sentiment-analysis', emotions_model='bhadresh-savani/distilbert-base-uncased-emotion', classification='sentiment-analysis', levels=5, max_length=512, use_bertweet=False, use_bert=False, use_roberta=False)
Overview
Performs sentiment analysis and emotion detection on text using pre-trained Hugging Face Transformers models.
It supports BERT, BERTweet, and RoBERTa model architectures.
Inputs that exceed the maximum token length are split into chunks,
and the results include sentiment and emotion scores along with the predicted labels.
Attributes:
| Name | Type | Description |
|---|---|---|
| sentiment_model | str | Name of the sentiment analysis model to use from Hugging Face. |
| emotions_model | str | Name of the emotion detection model to use from Hugging Face. |
| classification | str | Type of classification pipeline to use (e.g., 'sentiment-analysis'). |
| levels | int | Number of sentiment levels for sentiment analysis (2, 3, or 5). |
| max_length | int | Maximum token length for input texts. Defaults to 512. |
| use_bertweet | bool | If True, uses the BERTweet model for sentiment analysis. Defaults to False. |
| use_bert | bool | If True, uses the BERT model for sentiment analysis. Defaults to False. |
| use_roberta | bool | If True, uses the RoBERTa model for sentiment analysis. Defaults to False. |
Returns:
| Type | Description |
|---|---|
| DataFrame | A DataFrame with sentiment and emotion analysis results. Includes columns for sentiment scores, sentiment labels, emotion scores, and emotion labels. |
Raises:
| Type | Description |
|---|---|
| ComponentError | If there is an issue during text processing or data handling. |
Example

    SentimentAnalysis:
      text_column: text
      sentiment_model: tabularisai/robust-sentiment-analysis
      sentiment_levels: 5
      emotions_model: bhadresh-savani/distilbert-base-uncased-emotion

Sets up the sentiment analysis and emotion detection models and tokenizers based on the provided configurations.
aggregate_sentiments
Aggregates sentiment predictions from multiple texts to produce a single overall sentiment.
Calculates the average sentiment score across a list of sentiment predictions and determines the overall predicted sentiment based on these averages.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| sentiments | list | A list of dictionaries, each containing sentiment prediction results. | required |
| levels | int | The number of sentiment levels used in the analysis, determining the sentiment map. | required |
Returns:
| Type | Description |
|---|---|
| str | The aggregated predicted sentiment label (e.g., 'Positive', 'Negative', 'Neutral'). |
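
The averaging step described above can be sketched roughly as follows. This is a minimal illustration, not the component's actual code: it assumes each prediction carries a `score` list with one probability per sentiment class, and the `SENTIMENT_MAPS` label lists are hypothetical stand-ins for the real sentiment map.

```python
from statistics import fmean

# Hypothetical label maps keyed by number of levels (assumption: the real
# component's sentiment map may use different names or ordering).
SENTIMENT_MAPS = {
    2: ["Negative", "Positive"],
    3: ["Negative", "Neutral", "Positive"],
    5: ["Very Negative", "Negative", "Neutral", "Positive", "Very Positive"],
}


def aggregate_sentiments(sentiments, levels):
    """Average per-class scores across predictions and return the top label."""
    labels = SENTIMENT_MAPS[levels]
    # Skip missing predictions, e.g. None returned for empty input text.
    score_rows = [s["score"] for s in sentiments if s and s.get("score")]
    if not score_rows:
        return None
    # Column-wise mean: one average per sentiment class.
    averages = [fmean(column) for column in zip(*score_rows)]
    # Index of the class with the highest average score.
    best = max(range(len(averages)), key=averages.__getitem__)
    return labels[best]


predictions = [
    {"score": [0.1, 0.2, 0.7], "predicted_sentiment": "Positive"},
    {"score": [0.2, 0.5, 0.3], "predicted_sentiment": "Neutral"},
]
overall = aggregate_sentiments(predictions, levels=3)
```

Averaging the raw scores, rather than taking a majority vote over labels, lets a strongly confident prediction outweigh several weak ones.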
predict_emotion
Predicts the emotion of the input text.
Handles text chunking for long texts to ensure they fit within the model's token limit. Returns a dictionary containing emotion predictions.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| text | str | The input text to predict emotion for. | required |
Returns:
| Type | Description |
|---|---|
| dict | A dictionary containing emotion predictions, for example {'emotions': [{'label': 'joy', 'score': 0.99}]}. Returns an empty dictionary if the input text is empty. |
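
To make the chunking behavior concrete, here is a rough sketch under stated assumptions. Whitespace splitting stands in for the model tokenizer, `classify_chunk` is a hypothetical callable that returns one `{'label', 'score'}` dict per chunk, and keeping the highest-scoring chunk is an assumed way of combining results; the source does not say how the component merges chunk predictions.

```python
def chunk_tokens(tokens, max_length=512):
    """Split a token list into consecutive windows of at most max_length."""
    return [tokens[i:i + max_length] for i in range(0, len(tokens), max_length)]


def predict_emotion(text, classify_chunk, max_length=512):
    """Classify text chunk by chunk and keep the most confident result."""
    if not text or not text.strip():
        return {}  # empty input yields an empty dictionary
    tokens = text.split()  # assumption: whitespace tokens, not model tokens
    best = None
    for chunk in chunk_tokens(tokens, max_length):
        result = classify_chunk(" ".join(chunk))
        if best is None or result["score"] > best["score"]:
            best = result
    return {"emotions": [best]}


def fake_classifier(chunk_text):
    """Hypothetical stand-in for a Hugging Face emotion pipeline."""
    if "happy" in chunk_text:
        return {"label": "joy", "score": 0.99}
    return {"label": "sadness", "score": 0.60}


result = predict_emotion("so happy today " * 3, fake_classifier, max_length=4)
```

With a real model, the chunk boundaries would be computed on tokenizer output so that each chunk, including special tokens, stays within `max_length`.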
predict_sentiment
Predicts the sentiment of the input text.
Utilizes the sentiment analysis pipeline to classify the text and returns sentiment scores and the predicted sentiment label. Handles text chunking for texts exceeding the maximum token length.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| text | str | The text to analyze for sentiment. | required |
Returns:
| Type | Description |
|---|---|
| dict or None | A dictionary containing sentiment analysis results. Includes 'score' (list of sentiment scores) and 'predicted_sentiment' (string label). Returns None if the input text is empty. |
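
The shape of the returned dictionary can be illustrated with a small sketch. It assumes the classifier's per-class output arrives as a list of `{'label', 'score'}` dicts; `to_sentiment_result` and `label_order` are hypothetical names introduced only for this example.

```python
def to_sentiment_result(text, class_scores, label_order):
    """Convert per-class scores into the documented result dictionary."""
    if not text or not text.strip():
        return None  # empty input yields None, as documented
    by_label = {item["label"]: item["score"] for item in class_scores}
    # Order the scores consistently, from most negative to most positive.
    scores = [by_label.get(label, 0.0) for label in label_order]
    predicted = label_order[scores.index(max(scores))]
    return {"score": scores, "predicted_sentiment": predicted}


raw = [
    {"label": "Positive", "score": 0.7},
    {"label": "Negative", "score": 0.1},
    {"label": "Neutral", "score": 0.2},
]
result = to_sentiment_result(
    "Great product", raw, ["Negative", "Neutral", "Positive"]
)
```

Callers should check for `None` before reading `result["score"]`, since empty text produces no dictionary at all.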
split_into_sentences
Splits a text into sentences using NLTK's sentence tokenizer.
Leverages nltk.tokenize.sent_tokenize for robust sentence splitting, handling various sentence terminators and abbreviations.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| text | str | The input text to be split into sentences. | required |
Returns:
| Type | Description |
|---|---|
| list | A list of strings, where each string is a sentence from the input text. |
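
A minimal sketch of this kind of sentence splitting is shown below. `nltk.tokenize.sent_tokenize` is the real NLTK function named above; the regex fallback is an addition for this example only, used when NLTK or its Punkt data is not installed, and it does not handle abbreviations the way Punkt does.

```python
import re


def split_into_sentences(text):
    """Split text into sentences, preferring NLTK's Punkt tokenizer."""
    try:
        from nltk.tokenize import sent_tokenize
        return sent_tokenize(text)
    except (ImportError, LookupError):
        # Fallback (assumption, not part of the component): split after
        # ., ! or ? followed by whitespace. LookupError is what NLTK raises
        # when the Punkt resource has not been downloaded.
        return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


sentences = split_into_sentences("The service was slow. The food was great!")
```

Splitting into sentences before classification is one way to keep each input well under the model's token limit.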
SentimentAnalysis
Bases: FlowComponent
Applies sentiment analysis and emotion detection to a DataFrame of text data.
This component processes a DataFrame, applying Hugging Face Transformer models
to analyze the sentiment and emotions expressed in a specified text column.
It leverages the ModelPrediction class to perform the actual predictions
and integrates these results back into the DataFrame.
Properties
| Name | Type | Description |
|---|---|---|
| text_column | str | The name of the DataFrame column containing the text to analyze. Defaults to 'text'. |
| sentiment_model | str | Model name for sentiment analysis. Defaults to 'tabularisai/robust-sentiment-analysis'. |
| emotions_model | str | Model name for emotion detection. Defaults to 'cardiffnlp/twitter-roberta-base-emotion'. |
| pipeline_classification | str | Classification type for the pipeline (e.g., 'sentiment-analysis'). Defaults to 'sentiment-analysis'. |
| with_average | bool | Whether sentiment should be averaged across rows (if applicable). Defaults to True. |
| sentiment_levels | int | Number of sentiment levels (2, 3, or 5). Defaults to 5. |
| use_bert | bool | Whether to use the BERT model for sentiment analysis. Defaults to False. |
| use_roberta | bool | Whether to use the RoBERTa model for sentiment analysis. Defaults to False. |
| use_bertweet | bool | Whether to use the BERTweet model for sentiment analysis. Defaults to False. |
Returns:
| Type | Description |
|---|---|
| DataFrame | The input DataFrame augmented with new columns for sentiment scores, predicted sentiment, emotion scores, and predicted emotion. Specifically, it adds the 'sentiment_scores', 'sentiment_score', 'emotions_score', 'predicted_emotion', and 'predicted_sentiment' columns. |
Raises:
| Type | Description |
|---|---|
| ComponentError | If input data is not a Pandas DataFrame or if the text column is not found. |
run (async)
Executes the sentiment analysis and emotion detection process on the input DataFrame.
Uses a single shared predictor instance to process data in larger batches. After processing, it concatenates the results and extracts relevant prediction scores and labels.
Returns:
| Type | Description |
|---|---|
| pd.DataFrame | The DataFrame with added sentiment and emotion analysis results. |
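
The overall flow of applying one shared predictor to a text column can be sketched as follows. This is an illustration only: `analyze_frame` and `StubPredictor` are hypothetical names, standard Python exceptions stand in for `ComponentError`, only three of the documented output columns are produced, and the real component runs asynchronously and in batches.

```python
import pandas as pd


def analyze_frame(df, predictor, text_column="text"):
    """Add sentiment and emotion columns using a single shared predictor."""
    if not isinstance(df, pd.DataFrame):
        raise TypeError("input data must be a pandas DataFrame")
    if text_column not in df.columns:
        raise KeyError(f"text column {text_column!r} not found")
    out = df.copy()  # leave the caller's DataFrame untouched
    sentiments = out[text_column].map(predictor.predict_sentiment)
    emotions = out[text_column].map(predictor.predict_emotion)
    # Extract scores and labels, tolerating None or {} for empty text.
    out["sentiment_scores"] = sentiments.map(lambda r: r["score"] if r else None)
    out["predicted_sentiment"] = sentiments.map(
        lambda r: r["predicted_sentiment"] if r else None
    )
    out["predicted_emotion"] = emotions.map(
        lambda r: r["emotions"][0]["label"] if r else None
    )
    return out


class StubPredictor:
    """Hypothetical stand-in for ModelPrediction, so no model is loaded."""

    def predict_sentiment(self, text):
        if not text:
            return None
        return {"score": [0.1, 0.9], "predicted_sentiment": "Positive"}

    def predict_emotion(self, text):
        if not text:
            return {}
        return {"emotions": [{"label": "joy", "score": 0.99}]}


frame = pd.DataFrame({"text": ["I love this", ""]})
result = analyze_frame(frame, StubPredictor())
```

Sharing one predictor instance avoids reloading the models for every row or batch, which is usually the most expensive part of the pipeline.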