AI & Machine Learning

A Coding Guide to Implementing ScrapeGraph and Gemini AI for an Automated, Scalable, Insight-Driven Competitive Intelligence and Market Analysis Workflow

By NextTech · June 3, 2025 · 9 Mins Read


In this tutorial, we demonstrate how to combine ScrapeGraph's powerful scraping tools with Gemini AI to automate the collection, parsing, and analysis of competitor information. Using ScrapeGraph's SmartScraperTool and MarkdownifyTool, you can extract detailed insights into product offerings, pricing strategies, technology stacks, and market presence directly from competitor websites. The tutorial then employs Gemini's advanced language model to synthesize these disparate data points into structured, actionable intelligence. Throughout the process, ScrapeGraph ensures that the raw extraction is both accurate and scalable, letting analysts focus on strategic interpretation rather than manual data gathering.

%pip install --quiet -U langchain-scrapegraph langchain-google-genai pandas matplotlib seaborn

We quietly install or upgrade the latest versions of the essential libraries, including langchain-scrapegraph for advanced web scraping and langchain-google-genai for integrating Gemini AI, along with the data-analysis tools pandas, matplotlib, and seaborn, so the environment is ready for a seamless competitive-intelligence workflow.

import getpass
import os
import json
import pandas as pd
from typing import List, Dict, Any
from datetime import datetime
import matplotlib.pyplot as plt
import seaborn as sns

We import the core Python libraries for a secure, data-driven pipeline: getpass and os manage secrets and environment variables, json handles serialized data, and pandas offers robust DataFrame operations. The typing module provides type hints for readability, while datetime records timestamps. Finally, matplotlib.pyplot and seaborn equip us with tools for creating insightful visualizations.

if not os.environ.get("SGAI_API_KEY"):
    os.environ["SGAI_API_KEY"] = getpass.getpass("ScrapeGraph AI API key:\n")


if not os.environ.get("GOOGLE_API_KEY"):
    os.environ["GOOGLE_API_KEY"] = getpass.getpass("Google API key for Gemini:\n")

We check whether the SGAI_API_KEY and GOOGLE_API_KEY environment variables are already set; if not, the script securely prompts the user for their ScrapeGraph and Google (Gemini) API keys via getpass and stores them in the environment for subsequent authenticated requests.
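The same pattern can be factored into a small reusable helper, handy if you later add more providers. This is a sketch of ours (the `ensure_api_key` name is not part of either SDK):

```python
import getpass
import os


def ensure_api_key(var_name: str, prompt: str) -> str:
    """Return the API key stored in var_name, prompting once if it is absent."""
    if not os.environ.get(var_name):
        # getpass hides the typed key from the terminal and shell history
        os.environ[var_name] = getpass.getpass(prompt)
    return os.environ[var_name]


# Each call prompts only if the variable is unset:
# sgai_key = ensure_api_key("SGAI_API_KEY", "ScrapeGraph AI API key:\n")
# google_key = ensure_api_key("GOOGLE_API_KEY", "Google API key for Gemini:\n")
```

Because the key is written back into `os.environ`, downstream libraries that read the variable directly pick it up without further wiring.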

from langchain_scrapegraph.tools import (
    SmartScraperTool,
    SearchScraperTool,
    MarkdownifyTool,
    GetCreditsTool,
)
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnableConfig, chain
from langchain_core.output_parsers import JsonOutputParser


smartscraper = SmartScraperTool()
searchscraper = SearchScraperTool()
markdownify = MarkdownifyTool()
credit = GetCreditsTool()


llm = ChatGoogleGenerativeAI(
    model="gemini-1.5-flash",
    temperature=0.1,
    convert_system_message_to_human=True
)

Here we import and instantiate the ScrapeGraph tools, the SmartScraperTool, SearchScraperTool, MarkdownifyTool, and GetCreditsTool, for extracting and processing web data, then configure ChatGoogleGenerativeAI with the "gemini-1.5-flash" model (low temperature, with system messages converted to human messages) to drive our analysis. We also bring in ChatPromptTemplate, RunnableConfig, chain, and JsonOutputParser from langchain_core to structure prompts and parse model outputs.

class CompetitiveAnalyzer:
    def __init__(self):
        self.results = []
        self.analysis_timestamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S")

    def scrape_competitor_data(self, url: str, company_name: str = None) -> Dict[str, Any]:
        """Scrape comprehensive data from a competitor website"""

        extraction_prompt = """
        Extract the following information from this website:
        1. Company name and tagline
        2. Main products/services offered
        3. Pricing information (if available)
        4. Target audience/market
        5. Key features and benefits highlighted
        6. Technology stack mentioned
        7. Contact information
        8. Social media presence
        9. Recent news or announcements
        10. Team size indicators
        11. Funding information (if mentioned)
        12. Customer testimonials or case studies
        13. Partnership information
        14. Geographic presence/markets served

        Return the information in a structured JSON format with clear categorization.
        If information is not available, mark as 'Not Available'.
        """

        try:
            result = smartscraper.invoke({
                "user_prompt": extraction_prompt,
                "website_url": url,
            })

            markdown_content = markdownify.invoke({"website_url": url})

            competitor_data = {
                "company_name": company_name or "Unknown",
                "url": url,
                "scraped_data": result,
                "markdown_length": len(markdown_content),
                "analysis_date": self.analysis_timestamp,
                "success": True,
                "error": None
            }

            return competitor_data

        except Exception as e:
            return {
                "company_name": company_name or "Unknown",
                "url": url,
                "scraped_data": None,
                "error": str(e),
                "success": False,
                "analysis_date": self.analysis_timestamp
            }
   
    def analyze_competitor_landscape(self, competitors: List[Dict[str, str]]) -> Dict[str, Any]:
        """Analyze multiple competitors and generate insights"""

        print(f"🔍 Starting competitive analysis for {len(competitors)} companies...")

        for i, competitor in enumerate(competitors, 1):
            print(f"📊 Analyzing {competitor['name']} ({i}/{len(competitors)})...")

            data = self.scrape_competitor_data(
                competitor['url'],
                competitor['name']
            )
            self.results.append(data)

        analysis_prompt = ChatPromptTemplate.from_messages([
            ("system", """
            You are a senior business analyst specializing in competitive intelligence.
            Analyze the scraped competitor data and provide comprehensive insights including:

            1. Market positioning analysis
            2. Pricing strategy comparison
            3. Feature gap analysis
            4. Target audience overlap
            5. Technology differentiation
            6. Market opportunities
            7. Competitive threats
            8. Strategic recommendations

            Provide actionable insights in JSON format with clear categories and recommendations.
            """),
            ("human", "Analyze this competitive data: {competitor_data}")
        ])

        clean_data = []
        for result in self.results:
            if result['success']:
                clean_data.append({
                    'company': result['company_name'],
                    'url': result['url'],
                    'data': result['scraped_data']
                })

        analysis_chain = analysis_prompt | llm | JsonOutputParser()

        try:
            competitive_analysis = analysis_chain.invoke({
                "competitor_data": json.dumps(clean_data, indent=2)
            })
        except Exception:
            # Fall back to raw text if the model's reply is not valid JSON
            analysis_chain_text = analysis_prompt | llm
            competitive_analysis = analysis_chain_text.invoke({
                "competitor_data": json.dumps(clean_data, indent=2)
            })

        return {
            "analysis": competitive_analysis,
            "raw_data": self.results,
            "summary_stats": self.generate_summary_stats()
        }

    def generate_summary_stats(self) -> Dict[str, Any]:
        """Generate summary statistics from the analysis"""
        successful_scrapes = sum(1 for r in self.results if r['success'])
        failed_scrapes = len(self.results) - successful_scrapes

        return {
            "total_companies_analyzed": len(self.results),
            "successful_scrapes": successful_scrapes,
            "failed_scrapes": failed_scrapes,
            "success_rate": f"{(successful_scrapes/len(self.results)*100):.1f}%" if self.results else "0%",
            "analysis_timestamp": self.analysis_timestamp
        }
   
    def export_results(self, filename: str = None):
        """Export results to JSON and CSV files"""
        if not filename:
            filename = f"competitive_analysis_{datetime.now().strftime('%Y%m%d_%H%M%S')}"

        with open(f"{filename}.json", 'w') as f:
            json.dump({
                "results": self.results,
                "summary": self.generate_summary_stats()
            }, f, indent=2)

        df_data = []
        for result in self.results:
            if result['success']:
                df_data.append({
                    'Company': result['company_name'],
                    'URL': result['url'],
                    'Success': result['success'],
                    'Data_Length': len(str(result['scraped_data'])) if result['scraped_data'] else 0,
                    'Analysis_Date': result['analysis_date']
                })

        if df_data:
            df = pd.DataFrame(df_data)
            df.to_csv(f"{filename}.csv", index=False)

        print(f"✅ Results exported to {filename}.json and {filename}.csv")

The CompetitiveAnalyzer class orchestrates end-to-end competitor research: it scrapes detailed company information with the ScrapeGraph tools, compiles and cleans the results, and then leverages Gemini AI to generate structured competitive insights. It also tracks success rates and timestamps, and provides utility methods to export both raw and summarized data to JSON and CSV for easy downstream reporting and analysis.
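To see the bookkeeping in isolation, here is a self-contained sketch of the success-rate logic that generate_summary_stats implements (the `summarize` name is ours; it operates on plain result dicts and needs no scraping dependencies):

```python
from typing import Any, Dict, List


def summarize(results: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Compute summary statistics over a list of scrape-result dicts."""
    successful = sum(1 for r in results if r["success"])
    total = len(results)
    return {
        "total_companies_analyzed": total,
        "successful_scrapes": successful,
        "failed_scrapes": total - successful,
        # Guard against division by zero when nothing has been scraped yet
        "success_rate": f"{successful / total * 100:.1f}%" if total else "0%",
    }


stats = summarize([{"success": True}, {"success": True}, {"success": False}])
# Two of three scrapes succeeded, so success_rate is "66.7%"
```

Keeping failures in the result list (rather than discarding them) is what makes this summary meaningful: the failure count is part of the report, not silently dropped.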

def run_ai_saas_analysis():
    """Run a comprehensive analysis of AI/SaaS competitors"""

    analyzer = CompetitiveAnalyzer()

    ai_saas_competitors = [
        {"name": "OpenAI", "url": "https://openai.com"},
        {"name": "Anthropic", "url": "https://anthropic.com"},
        {"name": "Hugging Face", "url": "https://huggingface.co"},
        {"name": "Cohere", "url": "https://cohere.ai"},
        {"name": "Scale AI", "url": "https://scale.com"},
    ]

    results = analyzer.analyze_competitor_landscape(ai_saas_competitors)

    print("\n" + "="*80)
    print("🎯 COMPETITIVE ANALYSIS RESULTS")
    print("="*80)

    print("\n📊 Summary Statistics:")
    stats = results['summary_stats']
    for key, value in stats.items():
        print(f"   {key.replace('_', ' ').title()}: {value}")

    print("\n🔍 Strategic Analysis:")
    if isinstance(results['analysis'], dict):
        for section, content in results['analysis'].items():
            print(f"\n   {section.replace('_', ' ').title()}:")
            if isinstance(content, list):
                for item in content:
                    print(f"     • {item}")
            else:
                print(f"     {content}")
    else:
        print(results['analysis'])

    analyzer.export_results("ai_saas_competitive_analysis")

    return results

The function above kicks off the competitive analysis by instantiating CompetitiveAnalyzer and defining the key AI/SaaS players to evaluate. It then runs the full scraping-and-insights workflow, prints formatted summary statistics and strategic findings, and finally exports the detailed results to JSON and CSV for further use.

def run_ecommerce_analysis():
    """Analyze e-commerce platform competitors"""

    analyzer = CompetitiveAnalyzer()

    ecommerce_competitors = [
        {"name": "Shopify", "url": "https://shopify.com"},
        {"name": "WooCommerce", "url": "https://woocommerce.com"},
        {"name": "BigCommerce", "url": "https://bigcommerce.com"},
        {"name": "Magento", "url": "https://magento.com"},
    ]

    results = analyzer.analyze_competitor_landscape(ecommerce_competitors)
    analyzer.export_results("ecommerce_competitive_analysis")

    return results

This function sets up a CompetitiveAnalyzer to evaluate leading e-commerce platforms, scraping details from each site, generating strategic insights, and exporting the findings to JSON and CSV files under the name "ecommerce_competitive_analysis".

@chain
def social_media_monitoring_chain(company_urls: List[str], config: RunnableConfig):
    """Monitor competitors' social media presence and engagement strategies"""

    social_media_prompt = ChatPromptTemplate.from_messages([
        ("system", """
        You are a social media strategist. Analyze the social media presence and strategies
        of these companies. Focus on:
        1. Platform presence (LinkedIn, Twitter, Instagram, etc.)
        2. Content strategy patterns
        3. Engagement tactics
        4. Community building approaches
        5. Brand voice and messaging
        6. Posting frequency and timing
        Provide actionable insights for improving social media strategy.
        """),
        ("human", "Analyze social media data for: {urls}")
    ])

    social_data = []
    for url in company_urls:
        try:
            result = smartscraper.invoke({
                "user_prompt": "Extract all social media links, community engagement features, and social proof elements",
                "website_url": url,
            })
            social_data.append({"url": url, "social_data": result})
        except Exception as e:
            social_data.append({"url": url, "error": str(e)})

    # Named social_chain to avoid shadowing the imported chain decorator
    social_chain = social_media_prompt | llm
    analysis = social_chain.invoke({"urls": json.dumps(social_data, indent=2)}, config=config)

    return {
        "social_analysis": analysis,
        "raw_social_data": social_data
    }

This chained function defines a pipeline for gathering and analyzing competitors' social media footprints: it uses ScrapeGraph's smart scraper to extract social media links and engagement elements, then feeds that data into Gemini with a prompt focused on platform presence, content strategy, and community tactics. It returns both the raw scraped information and the AI-generated, actionable social media insights in a single structured output.
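The per-URL error handling inside that loop follows a pattern worth extracting: record failures alongside successes so that one unreachable site never aborts the whole batch. A minimal sketch under stated assumptions (the `collect_safely` helper is ours, with a plain callable standing in for the scraper):

```python
from typing import Any, Callable, Dict, List


def collect_safely(urls: List[str], fetch: Callable[[str], Any]) -> List[Dict[str, Any]]:
    """Fetch each URL, recording an error entry instead of raising."""
    collected: List[Dict[str, Any]] = []
    for url in urls:
        try:
            collected.append({"url": url, "data": fetch(url)})
        except Exception as e:
            # Keep the failure in the output so downstream analysis can report it
            collected.append({"url": url, "error": str(e)})
    return collected


def fake_fetch(url: str) -> str:
    """Stand-in scraper: fails for one host to exercise the error path."""
    if "down" in url:
        raise RuntimeError("connection refused")
    return f"content of {url}"


results = collect_safely(["https://a.example", "https://down.example"], fake_fetch)
```

The same shape (a list of dicts, each carrying either a `data` or an `error` key) is what the chain above serializes with json.dumps before handing it to Gemini.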

def check_credits():
    """Check available credits"""
    try:
        credits_info = credit.invoke({})
        print(f"💳 Available Credits: {credits_info}")
        return credits_info
    except Exception as e:
        print(f"⚠️ Could not check credits: {e}")
        return None

This function calls the GetCreditsTool to retrieve and display your available ScrapeGraph API credits, printing the result (or a warning if the check fails), and returns the credit information, or None on error.

if __name__ == "__main__":
    print("🚀 Advanced Competitive Analysis Tool with Gemini AI")
    print("="*60)

    check_credits()

    print("\n🤖 Running AI/SaaS Competitive Analysis...")
    ai_results = run_ai_saas_analysis()

    run_additional = input("\n❓ Run e-commerce analysis as well? (y/n): ").lower().strip()
    if run_additional == 'y':
        print("\n🛒 Running E-commerce Platform Analysis...")
        ecom_results = run_ecommerce_analysis()

    print("\n✨ Analysis complete! Check the exported files for detailed results.")

Finally, this last block serves as the script's entry point: it prints a header, checks API credits, then kicks off the AI/SaaS competitor analysis (and optionally the e-commerce analysis) before signaling that all results have been exported.

In conclusion, integrating ScrapeGraph's scraping capabilities with Gemini AI transforms a traditionally time-consuming competitive intelligence workflow into an efficient, repeatable pipeline. ScrapeGraph handles the heavy lifting of fetching and normalizing web-based information, while Gemini's language understanding turns that raw data into high-level strategic recommendations. As a result, businesses can rapidly assess market positioning, identify feature gaps, and uncover emerging opportunities with minimal manual intervention. By automating these steps, users gain speed and consistency, as well as the flexibility to extend their analysis to new competitors or markets as needed.


Check out the Notebook on GitHub. All credit for this research goes to the researchers of this project. Also, feel free to follow us on Twitter and don't forget to join our 95k+ ML SubReddit and subscribe to our Newsletter.


Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of Artificial Intelligence for social good. His most recent endeavor is the launch of an Artificial Intelligence media platform, Marktechpost, which stands out for its in-depth coverage of machine learning and deep learning news that is both technically sound and easily understandable by a wide audience. The platform boasts over 2 million monthly views, illustrating its popularity among readers.
