Getting Started With Mirascope: Removing Semantic Duplicates Using an LLM

Mirascope is a powerful and user-friendly library that provides a unified interface for working with a range of Large Language Model (LLM) providers, including OpenAI, Anthropic, Mistral, Google (Gemini and Vertex AI), Groq, Cohere, LiteLLM, Azure AI, and Amazon Bedrock. It simplifies everything from text generation and structured data extraction to building complex AI-powered workflows and agent systems.

In this guide, we’ll focus on using Mirascope’s OpenAI integration to identify and remove semantic duplicates (entries that may differ in wording but carry the same meaning) from a list of customer reviews.

Installing the dependencies

pip install "mirascope[openai]"

OpenAI Key

To get an OpenAI API key, go to https://platform.openai.com/settings/organization/api-keys and generate a new key. If you’re a new user, you may need to add billing details and make a minimum payment of $5 to activate API access.

import os
from getpass import getpass
os.environ['OPENAI_API_KEY'] = getpass('Enter OpenAI API Key: ')

Defining the list of customer reviews

customer_reviews = [
    "Sound quality is amazing!",
    "Audio is crystal clear and very immersive.",
    "Incredible sound, especially the bass response.",
    "Battery doesn't last as advertised.",
    "Needs charging too often.",
    "Battery drains quickly -- not ideal for travel.",
    "Setup was super easy and straightforward.",
    "Very user-friendly, even for my parents.",
    "Simple interface and smooth experience.",
    "Feels cheap and plasticky.",
    "Build quality could be better.",
    "Broke within the first week of use.",
    "People say they can't hear me during calls.",
    "Mic quality is terrible on Zoom meetings.",
    "Great product for the price!"
]

These reviews capture key customer sentiments: praise for sound quality and ease of use, complaints about battery life, build quality, and call/mic issues, along with a positive note on value for money. They reflect common themes found in real user feedback.
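
Before sending a list like this to the model, it can be worth dropping exact repeats locally so the LLM only has to handle the genuinely semantic cases. The following is a minimal sketch of that idea using only the standard library. The helper name drop_exact_duplicates and the sample list are assumptions made for illustration and are not part of Mirascope.

```python
import re


def drop_exact_duplicates(reviews: list[str]) -> list[str]:
    """Remove reviews that match after trivial normalization, keeping first occurrences."""
    seen: set[str] = set()
    unique: list[str] = []
    for review in reviews:
        # Collapse whitespace and lowercase so trivial variants compare equal.
        key = re.sub(r"\s+", " ", review).strip().lower()
        if key not in seen:
            seen.add(key)
            unique.append(review)
    return unique


# Hypothetical sample: the second entry differs only in case and spacing.
sample = [
    "Sound quality is amazing!",
    "sound  quality is amazing!",
    "Needs charging too often.",
]
print(drop_exact_duplicates(sample))
```

This only catches literal repeats. Reviews that are worded differently but mean the same thing still need the LLM-based step described in the rest of this guide.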

Defining a Pydantic Schema

This Pydantic model defines the structure of the response for a semantic deduplication task on customer reviews. The schema helps structure and validate the output of a language model tasked with clustering or deduplicating natural language input (e.g., user feedback, bug reports, product reviews).

from pydantic import BaseModel, Field

class DeduplicatedReviews(BaseModel):
    duplicates: list[list[str]] = Field(
        ..., description="A list of semantically equivalent customer review groups"
    )
    reviews: list[str] = Field(
        ..., description="The deduplicated list of core customer feedback themes"
    )
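
To see what the schema accepts and rejects, it can help to validate a couple of hand-written payloads without calling any LLM. The sketch below assumes Pydantic v2 (it uses model_validate) and repeats the model definition so that it runs on its own. Both payloads are made up for illustration and are not real model output.

```python
from pydantic import BaseModel, Field, ValidationError


class DeduplicatedReviews(BaseModel):
    duplicates: list[list[str]] = Field(
        ..., description="A list of semantically equivalent customer review groups"
    )
    reviews: list[str] = Field(
        ..., description="The deduplicated list of core customer feedback themes"
    )


# Hypothetical payload with the expected shape.
good = {
    "duplicates": [
        ["Needs charging too often.", "Battery drains quickly -- not ideal for travel."]
    ],
    "reviews": ["Battery life is shorter than expected."],
}
parsed = DeduplicatedReviews.model_validate(good)
print(parsed.reviews)

# Hypothetical payload with the wrong shape: "duplicates" is not a list of lists.
bad = {"duplicates": "not a list", "reviews": ["ok"]}
try:
    DeduplicatedReviews.model_validate(bad)
    rejected = False
except ValidationError:
    rejected = True
print("Rejected malformed payload:", rejected)
```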

Defining a Mirascope @openai.call for Semantic Deduplication

This code defines a semantic deduplication function using Mirascope’s @openai.call decorator, which integrates with OpenAI’s gpt-4o model. The deduplicate_customer_reviews function takes a list of customer reviews and uses a structured prompt, defined by the @prompt_template decorator, to guide the LLM in identifying and grouping semantically similar reviews.

The system message instructs the model to analyze the meaning, tone, and intent behind each review, clustering those that convey the same feedback even when worded differently. The function expects a structured response conforming to the DeduplicatedReviews Pydantic model, which contains two outputs: a list of unique, deduplicated review sentiments, and a list of grouped duplicates.

This design keeps the LLM’s output structured and machine-readable, making it well suited to customer feedback analysis, survey deduplication, or product review clustering.

from mirascope.core import openai, prompt_template

@openai.call(model="gpt-4o", response_model=DeduplicatedReviews)
@prompt_template(
    """
    SYSTEM:
    You are an AI assistant helping to analyze customer reviews.
    Your task is to group semantically similar reviews together -- even if they are worded differently.

    - Use your understanding of meaning, tone, and implication to group duplicates.
    - Return two lists:
      1. A deduplicated list of the key distinct review sentiments.
      2. A list of grouped duplicates that share the same underlying feedback.

    USER:
    {reviews}
    """
)
def deduplicate_customer_reviews(reviews: list[str]): ...

The following code executes the deduplicate_customer_reviews function on the list of customer reviews and prints the structured output. First, it calls the function and stores the result in the response variable. To ensure that the model’s output conforms to the expected format, it uses an assert statement to check that the response is an instance of the DeduplicatedReviews Pydantic model.

Once validated, it prints the deduplicated results in two sections. The first section, labeled “✅ Distinct Customer Feedback,” displays the list of unique review sentiments identified by the model. The second section, “🌀 Grouped Duplicates,” lists clusters of reviews that were recognized as semantically equivalent.

response = deduplicate_customer_reviews(customer_reviews)

# Ensure response format
assert isinstance(response, DeduplicatedReviews)

# Print Output
print("✅ Distinct Customer Feedback:")
for item in response.reviews:
    print("-", item)

print("\n🌀 Grouped Duplicates:")
for group in response.duplicates:
    print("-", group)

The output shows a clean summary of customer feedback by grouping semantically similar reviews. The Distinct Customer Feedback section highlights key insights, while the Grouped Duplicates section captures different phrasings of the same sentiment. This helps eliminate redundancy and makes the feedback easier to analyze.
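
Because the groups come from a language model, it may be worth checking them against the input before relying on them. The model could paraphrase a review or leave one out. The sketch below shows one way to do that with plain Python. The helper names find_unmatched and find_ungrouped and the sample data are assumptions for illustration. In practice the groups would come from response.duplicates.

```python
def find_unmatched(groups: list[list[str]], originals: list[str]) -> list[str]:
    """Return grouped strings that do not appear verbatim in the original reviews."""
    known = set(originals)
    return [text for group in groups for text in group if text not in known]


def find_ungrouped(groups: list[list[str]], originals: list[str]) -> list[str]:
    """Return original reviews that were not placed in any group."""
    grouped = {text for group in groups for text in group}
    return [review for review in originals if review not in grouped]


# Hypothetical data standing in for customer_reviews and response.duplicates.
originals = [
    "Needs charging too often.",
    "Battery drains quickly -- not ideal for travel.",
    "Great product for the price!",
]
groups = [
    ["Needs charging too often.", "Battery drains quickly."],  # second entry was paraphrased
]

print("Not in input:", find_unmatched(groups, originals))
print("Not grouped:", find_ungrouped(groups, originals))
```

A review that appears under "Not grouped" is not necessarily an error, since a review with no semantic duplicate may legitimately stand alone.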

I’m a Civil Engineering graduate (2022) from Jamia Millia Islamia, New Delhi, and I have a keen interest in Data Science, especially Neural Networks and their application in various areas.