The webmaster.py file implements an intelligent content analysis system that processes user input against brand guidelines and various brand-related data. Here's a detailed breakdown of its functionality:
Required packages:
pip install pymongo sentence-transformers textblob requests langchain langchain-google-genai transformers scikit-learn python-dotenv numpy
Create a .env file with the following variables:
- MONGO_URI: MongoDB connection string
- GROQ_API_KEY: Groq API key
- GOOGLE_API_KEY: Google API key
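Since the script depends on three environment variables, a start-up check can catch a missing one early. The following is a minimal sketch, not code taken from webmaster.py: the helper name `missing_env_vars` is hypothetical, and it assumes the .env file has already been loaded into the process environment (for example with python-dotenv's `load_dotenv()`).

```python
import os
from typing import List, Mapping

# Variable names taken from the .env description above.
REQUIRED_VARS = ("MONGO_URI", "GROQ_API_KEY", "GOOGLE_API_KEY")


def missing_env_vars(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the required variables that are unset or empty (hypothetical helper)."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
```

Passing the environment as a parameter keeps the check easy to exercise with a plain dictionary.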
- Input Processing
  - Takes brand ID and user text as input
  - Preprocesses text for analysis
- Sentiment Analysis
  - Uses TextBlob for sentiment and subjectivity analysis
  - Provides polarity and subjectivity scores
- Text Embedding
  - Utilizes SentenceTransformer ('all-MiniLM-L6-v2')
  - Converts text into vector representations
- Brand Data Retrieval
  - Connects to MongoDB to fetch:
    - Brand guidelines
    - Brand adjectives
    - Sector/subsector information
    - Tonalities
    - SEO keywords (branded and non-branded)
- Context Processing
  - Performs similarity search using vector embeddings
  - Implements context filtering and fusion
  - Applies metadata-based weighting for different content types
- Intelligent Response Generation
  - Uses LangChain with Google's Generative AI
  - Incorporates brand context and guidelines
  - Generates brand-aligned responses
- preprocess(text: str) -> str
  - Purpose: Cleans and normalizes the input text
  - Input: Raw text string
  - Operations:
    - Strips whitespace
    - Converts to lowercase
    - Truncates to max length (4096 characters)
  - Returns: Cleaned and normalized text
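The three operations listed above can be sketched as follows. This is a minimal illustration under the assumption that they run in the listed order; the constant name `MAX_LENGTH` is an assumption, and the actual function in webmaster.py may differ in detail.

```python
MAX_LENGTH = 4096  # maximum number of characters kept, per the description above


def preprocess(text: str) -> str:
    """Strip surrounding whitespace, lowercase, then truncate to MAX_LENGTH."""
    cleaned = text.strip().lower()
    return cleaned[:MAX_LENGTH]


if __name__ == "__main__":
    print(preprocess("   Our NEW Summer Collection   "))
```

Stripping before truncating means leading whitespace does not count against the 4096-character budget.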
- analyze_sentiment(text)
  - Purpose: Analyzes the emotional tone of text
  - Input: Preprocessed text
  - Operations:
    - Uses TextBlob for analysis
    - Calculates polarity (-1 to 1)
    - Measures subjectivity (0 to 1)
  - Returns: Tuple of (sentiment_polarity, subjectivity)
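A sketch of how such a function might look with TextBlob, whose `sentiment` property exposes `polarity` and `subjectivity`. The helper `describe_polarity` and its 0.1 threshold are hypothetical additions for illustration only; they are not described in this document.

```python
from typing import Tuple


def analyze_sentiment(text: str) -> Tuple[float, float]:
    """Return (polarity in [-1, 1], subjectivity in [0, 1]) for the given text."""
    # Imported inside the function so this module loads even without textblob installed.
    from textblob import TextBlob

    sentiment = TextBlob(text).sentiment
    return sentiment.polarity, sentiment.subjectivity


def describe_polarity(polarity: float, threshold: float = 0.1) -> str:
    """Map a polarity score to a coarse label (hypothetical helper and threshold)."""
    if polarity > threshold:
        return "positive"
    if polarity < -threshold:
        return "negative"
    return "neutral"
```

Only the polarity value is passed on as the sentiment argument of the later functions, according to their signatures.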
- fetch_brand_data(brand_id)
  - Purpose: Retrieves all brand-related information from MongoDB
  - Input: Brand ID (string)
  - Operations:
    - Fetches brand guidelines
    - Retrieves brand adjectives
    - Gets sector/subsector information
    - Collects tonalities
    - Gathers SEO keywords
  - Returns: Dictionary containing all brand data
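One possible shape for this lookup is sketched below. The database schema is not given in this document, so everything schema-related here is an assumption: the collection names, the `brand_id` and `items` field names, and the function name `fetch_brand_data_from`, which takes the database handle as an explicit argument. With pymongo, `db` would be a database object such as `MongoClient(uri)["some_db"]`.

```python
from typing import Any, Dict, List

# Hypothetical mapping from result key to collection name; the real schema may differ.
COLLECTIONS = {
    "guidelines": "brand_guidelines",
    "adjectives": "brand_adjectives",
    "sectors": "brand_sectors",
    "tonalities": "brand_tonalities",
    "seo_keywords": "brand_seo_keywords",
}


def fetch_brand_data_from(db: Any, brand_id: str) -> Dict[str, List[str]]:
    """Collect one list of strings per content type for the given brand."""
    brand_data: Dict[str, List[str]] = {}
    for key, collection_name in COLLECTIONS.items():
        # find_one returns a single matching document, or None if there is none.
        doc = db[collection_name].find_one({"brand_id": brand_id})
        brand_data[key] = list(doc.get("items", [])) if doc else []
    return brand_data
```

Returning an empty list for a missing document keeps the result dictionary uniform for the downstream ranking step.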
- similarity_search(query_embedding: np.ndarray, data: List[str], top_k: int = 5) -> List[Tuple[float, str]]
  - Purpose: Finds semantically similar content
  - Input:
    - query_embedding: Vector representation of query
    - data: List of text items to search through
    - top_k: Number of results to return (default: 5)
  - Operations:
    - Calculates cosine similarity
    - Ranks results by similarity score
  - Returns: List of (score, text) tuples
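The ranking step can be illustrated with a dependency-free sketch. Two assumptions are made here: plain Python sequences stand in for NumPy arrays, and the embeddings of the candidate texts are passed in precomputed as `data_embeddings` (in the script they would presumably come from the SentenceTransformer model). The name `rank_by_similarity` is hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(
    query_embedding: Sequence[float],
    data: List[str],
    data_embeddings: List[Sequence[float]],
    top_k: int = 5,
) -> List[Tuple[float, str]]:
    """Return the top_k (score, text) pairs, highest cosine similarity first."""
    scored = [
        (cosine_similarity(query_embedding, embedding), text)
        for text, embedding in zip(data, data_embeddings)
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]
```

For example, with a query vector of `[1.0, 0.0]`, a text embedded as `[1.0, 0.0]` scores 1.0 and one embedded as `[0.0, 1.0]` scores 0.0.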
- filter_and_fuse(user_embedding: np.ndarray, brand_data: Dict[str, List[str]], sentiment: float) -> Dict[str, List[str]]
  - Purpose: Combines and ranks different types of brand content
  - Input:
    - user_embedding: Vector of user input
    - brand_data: Dictionary of brand information
    - sentiment: Sentiment score
  - Operations:
    - Applies importance weights to different content types
    - Adjusts scores based on sentiment alignment
    - Ranks and filters content
  - Returns: Dictionary of ranked and filtered content
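The weighting and filtering idea might look like the sketch below. This document does not state the actual weights or the sentiment rule, so the values in `DEFAULT_WEIGHTS`, the tonality boost, the `min_score` threshold, and the function name `fuse_scored_hits` are all hypothetical. The sketch also starts from already scored (score, text) pairs per content type rather than from the raw embedding.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical importance weights per content type.
DEFAULT_WEIGHTS = {"guidelines": 1.0, "tonalities": 0.6}


def fuse_scored_hits(
    scored: Dict[str, List[Tuple[float, str]]],
    sentiment: float,
    weights: Optional[Dict[str, float]] = None,
    min_score: float = 0.2,
    top_k: int = 3,
) -> Dict[str, List[str]]:
    """Weight, sentiment-adjust, rank, and filter the hits of each content type."""
    weights = DEFAULT_WEIGHTS if weights is None else weights
    fused: Dict[str, List[str]] = {}
    for category, hits in scored.items():
        weight = weights.get(category, 1.0)
        # Hypothetical rule: strongly emotional input makes tonality hits more relevant.
        boost = 1.0 + 0.5 * abs(sentiment) if category == "tonalities" else 1.0
        adjusted = [(score * weight * boost, text) for score, text in hits]
        adjusted.sort(key=lambda pair: pair[0], reverse=True)
        kept = [text for score, text in adjusted if score >= min_score]
        fused[category] = kept[:top_k]
    return fused
```

Keeping the weights in a dictionary makes it straightforward to tune one content type without touching the ranking logic.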
- generate_response(fused_results: Dict[str, List[str]], user_text: str, sentiment: float) -> str
  - Purpose: Creates final AI response using LangChain
  - Input:
    - fused_results: Processed brand context
    - user_text: Original user input
    - sentiment: Calculated sentiment score
  - Operations:
    - Constructs context-aware prompt
    - Uses Google's Generative AI
    - Applies brand guidelines
  - Returns: Brand-aligned response text
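The prompt-construction step can be sketched without any model call. The wording of the prompt and the function name `build_prompt` are assumptions made for illustration; the actual prompt template in webmaster.py is not shown in this document.

```python
from typing import Dict, List


def build_prompt(fused_results: Dict[str, List[str]], user_text: str, sentiment: float) -> str:
    """Assemble a context-aware prompt from the ranked brand content (hypothetical wording)."""
    sections = []
    for category, items in fused_results.items():
        if not items:
            continue  # skip content types with no surviving items
        bullet_lines = "\n".join(f"- {item}" for item in items)
        sections.append(f"{category}:\n{bullet_lines}")
    context = "\n\n".join(sections) if sections else "(no brand context found)"
    return (
        "You are reviewing content for brand alignment.\n\n"
        f"Brand context:\n{context}\n\n"
        f"User text (sentiment polarity {sentiment:+.2f}):\n{user_text}\n\n"
        "Respond with recommendations that follow the brand context above."
    )
```

The resulting string would then be sent to the model through LangChain; that call is left out here because the model name and client configuration are not given in this document.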
The script prompts for:
- Brand ID
- User text input
It then processes the input through multiple stages to generate brand-aligned content recommendations.