A private Telegram bot that stores text, voice, and audio messages with vector embeddings for intelligent semantic search and retrieval. Built with Voyage AI for embeddings, OpenAI Whisper for transcription, and PostgreSQL with pgvector for vector similarity search.
- Text Message Storage: Automatically creates embeddings for text messages and stores them in the database
- Voice & Audio Transcription: Transcribes voice and audio messages using OpenAI Whisper API, then creates and stores embeddings
- Semantic Search: Search through all your stored messages using natural language queries with vector similarity
- Smart Pagination: Search results are displayed in batches with "Show more" buttons for easy navigation
- Full Text Display: Trimmed results can be expanded to show full message content with timestamps
- Vector Database: PostgreSQL with pgvector extension for fast similarity search using cosine distance
- Connection Pooling: Thread-safe connection pool with automatic retry logic for reliability
- User Access Control: Private bot with configurable allowed user list for security
bot.py - Main bot logic and message handlers
db.py - Database operations (insert, query, schema management)
config.py - Configuration and environment variable loading
- Python 3.8+
- PostgreSQL 12+ with pgvector extension
- API Keys:
  - Telegram Bot Token (from @BotFather)
  - Voyage AI API Key (from voyageai.com)
  - OpenAI API Key (from platform.openai.com)
Ubuntu/Debian:
# Install PostgreSQL
sudo apt update
sudo apt install postgresql postgresql-contrib
# Install pgvector
sudo apt install postgresql-15-pgvector
macOS (using Homebrew):
brew install postgresql@15
brew install pgvector
Or compile pgvector from source:
git clone https://github.com/pgvector/pgvector.git
cd pgvector
make
sudo make install
# Connect to PostgreSQL
sudo -u postgres psql
-- Create user with password
CREATE USER neron_bot WITH PASSWORD 'your_secure_password_here';
-- Grant necessary privileges on the default postgres database
GRANT ALL PRIVILEGES ON DATABASE postgres TO neron_bot;
-- Connect to postgres database to grant schema privileges
\c postgres
-- Grant schema privileges
GRANT ALL ON SCHEMA public TO neron_bot;
-- Exit psql
\q
Note: This bot uses the default postgres database. The table is called neron and is created automatically the first time you run the bot.
# Clone or navigate to project directory
cd /path/to/Neron-Bot
# Create virtual environment
python3 -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
# Copy the example environment file
cp .env.example .env
# Edit .env with your actual credentials
nano .env # or use your preferred editor
Fill in the following values in .env:
- TELEGRAM_BOT_TOKEN: Your bot token from @BotFather
- VOYAGE_API_KEY: Your Voyage AI API key
- OPENAI_API_KEY: Your OpenAI API key
- DB_PASSWORD: Your PostgreSQL password
- Other DB settings as needed
# Test database setup
python db.py
You should see:
✓ Database setup successful!
# Test configuration
python config.py
You should see:
✓ All required configuration variables are set
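The check behind that message is not shown in this README. A minimal sketch of what such a validation could look like follows; treating exactly these four variables as required, and the helper name `missing_config`, are assumptions made for illustration rather than the contents of config.py.

```python
import os

# Variable names come from the .env description above. Treating exactly
# these four as required is an assumption of this sketch.
REQUIRED_VARS = (
    "TELEGRAM_BOT_TOKEN",
    "VOYAGE_API_KEY",
    "OPENAI_API_KEY",
    "DB_PASSWORD",
)


def missing_config(env=None):
    """Return the required variable names that are unset or blank.

    `env` defaults to the process environment; passing a dict makes the
    check easy to exercise without touching real credentials.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not str(env.get(name, "")).strip()]
```

An empty list from `missing_config()` means every required value is present; otherwise the returned names can be printed before exiting.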
python bot.py
You should see:
INFO - Starting bot...
INFO - Bot handlers registered
- Open Telegram and find your bot
- Send /start to see the welcome message
- Send any text message - the bot will reply with "✅ Logged"
- Send a voice or audio message - the bot will transcribe it and reply with "✅ Logged"
- Use /count to see how many messages are stored
- Use /search <query> to search through your stored messages
Available commands:
- /start - Show welcome message and available commands
- /count - Display total number of stored messages
- /search <query> - Search your messages using semantic similarity (e.g., /search meeting notes from last week)
The bot automatically creates this table on first run:
CREATE TABLE neron (
id SERIAL PRIMARY KEY,
timestamp TIMESTAMPTZ NOT NULL DEFAULT NOW(),
text TEXT NOT NULL,
embedding vector(1024) NOT NULL
);
- Table name: neron
- Database: postgres (default PostgreSQL database)
- Vector dimension: 1024 (for voyage-3-large model)
- Similarity metric: Cosine distance (<=> operator)
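To make the similarity metric concrete, here is a small pure-Python sketch of cosine distance, defined as 1 minus cosine similarity, which is the quantity the pgvector `<=>` operator computes. The function name is illustrative, and raising an error on zero-length vectors is a choice of this sketch, not pgvector's behavior.

```python
import math


def cosine_distance(a, b):
    """Cosine distance: 1 - (a . b) / (|a| * |b|).

    0 means the vectors point the same way, 1 means they are orthogonal,
    and 2 means they point in opposite directions.
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # Sketch-level choice: reject zero vectors instead of returning NaN.
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_a * norm_b)
```

Because the distance depends only on direction, scaling an embedding does not change its ranking in search results.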
Neron-Bot/
├── bot.py # Main bot application
├── db.py # Database operations
├── config.py # Configuration management
├── requirements.txt # Python dependencies
├── .env.example # Environment variables template
├── .env # Your actual environment variables (not in git)
└── README.md # This file
- User sends a text message
- Bot receives the message
- Creates embedding using Voyage AI (voyage-3-large model, input_type="document")
- Stores timestamp, text, and 1024-dimensional embedding vector in PostgreSQL
- Replies "✅ Logged"
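The steps above can be sketched as a single function. This is an illustration under stated assumptions, not the code in bot.py: `embed` and `store` are hypothetical callables injected by the caller (in the real bot they would wrap the Voyage AI client and the database insert), and `to_vector_literal` is a hypothetical helper that formats an embedding in the `[x,y,z]` text form pgvector accepts.

```python
from datetime import datetime, timezone

EMBEDDING_DIMENSION = 1024  # matches vector(1024) in the schema


def to_vector_literal(embedding):
    """Format a sequence of numbers as pgvector text input, e.g. '[1.0,0.5]'."""
    return "[" + ",".join(repr(float(x)) for x in embedding) + "]"


def log_text_message(text, embed, store, dimension=EMBEDDING_DIMENSION):
    """Embed `text`, check its dimension, store it, and return the reply.

    embed(text) -> list of floats      (hypothetical wrapper around Voyage AI)
    store(timestamp, text, vector)     (hypothetical wrapper around the INSERT)
    """
    embedding = embed(text)
    if len(embedding) != dimension:
        raise ValueError(
            f"expected {dimension} dimensions, got {len(embedding)}"
        )
    store(datetime.now(timezone.utc), text, to_vector_literal(embedding))
    return "✅ Logged"
```

Checking the dimension before the insert surfaces a model/schema mismatch as a clear error instead of a database failure.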
- User sends a voice or audio message
- Bot downloads the audio file to temporary storage
- Transcribes audio using OpenAI Whisper (whisper-1 model)
- Creates embedding of the transcribed text using Voyage AI
- Stores timestamp, transcribed text, and embedding in PostgreSQL
- Deletes temporary audio file from disk
- Replies "✅ Logged"
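The important detail in this flow is that the temporary audio file is removed even when transcription fails. A minimal sketch of that pattern follows, assuming hypothetical `transcribe` and `log_text` callables supplied by the caller (in the real bot these would wrap the Whisper API call and the text-logging step); it is not the bot's actual handler.

```python
import os
import tempfile


def log_voice_message(audio_bytes, transcribe, log_text, suffix=".ogg"):
    """Write audio to a temp file, transcribe it, log the text, clean up.

    transcribe(path) -> str   (hypothetical wrapper around Whisper)
    log_text(text)   -> str   (hypothetical: embeds and stores, returns reply)
    """
    fd, path = tempfile.mkstemp(suffix=suffix)
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(audio_bytes)
        text = transcribe(path)
        return log_text(text)
    finally:
        # Runs on success and on failure, so no audio is left on disk.
        if os.path.exists(path):
            os.remove(path)
```

Putting the cleanup in `finally` keeps voice recordings from accumulating on disk when an API call raises.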
- User sends /search <query> command
- Bot creates embedding for the query using Voyage AI (input_type="query")
input_type="query") - Performs cosine similarity search against all stored message embeddings
- Returns up to 12 most similar results, ranked by similarity score
- Displays results in batches of 3 with:
  - Trimmed text (max 150 chars for readability)
  - Timestamp (YYYY-MM-DD HH:MM format)
  - "Full text" button for expanded results (if trimmed)
  - "Show more" button for pagination (if more results available)
- User can click buttons to view full messages or load additional results
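The trimming and batching described above can be sketched as follows. The helper names `trim_for_display`, `format_timestamp`, and `page_of_results` are hypothetical and chosen so they are not confused with the bot's own `trim_text` and `format_search_results`, whose bodies are not shown here. The defaults (150 characters, batches of 3) come from the description above.

```python
from datetime import datetime


def trim_for_display(text, max_length=150):
    """Return (shown_text, was_trimmed); trimmed text gets a '...' suffix."""
    if len(text) <= max_length:
        return text, False
    return text[:max_length].rstrip() + "...", True


def format_timestamp(moment):
    """Render a datetime as YYYY-MM-DD HH:MM."""
    return moment.strftime("%Y-%m-%d %H:%M")


def page_of_results(results, offset=0, batch_size=3):
    """Return (batch, has_more) for the page starting at `offset`."""
    batch = results[offset:offset + batch_size]
    has_more = offset + batch_size < len(results)
    return batch, has_more
```

A "Show more" button would then call `page_of_results` again with `offset + batch_size`, and a "Full text" button is only needed when `was_trimmed` is true.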
By default, the bot is restricted to specific users. Edit config.py to configure access:
# Allow only specific user IDs (get your user_id by messaging the bot)
ALLOWED_USERS = [1890816031] # Replace with your Telegram user ID
# Or allow anyone (NOT RECOMMENDED for private data):
ALLOWED_USERS = [] # Empty list = no restrictions
To find your Telegram user ID, you can:
- Use @userinfobot
- Check bot logs when you send a message (user ID will be logged)
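The rule described above (an empty list means no restrictions, otherwise only listed IDs are accepted) fits in one function. This is a sketch of the rule only; the name `is_allowed` is hypothetical and the bot's own check may be structured differently.

```python
def is_allowed(user_id, allowed_users):
    """Return True if `user_id` may use the bot.

    An empty `allowed_users` list disables the restriction entirely,
    which is why it is not recommended for private data.
    """
    return not allowed_users or user_id in allowed_users
```

For example, `is_allowed(42, [])` is true for everyone, while `is_allowed(42, [7])` is false.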
Edit config.py:
VOYAGE_MODEL = 'voyage-3-large' # or 'voyage-3', 'voyage-2', etc.
EMBEDDING_DIMENSION = 1024 # Update based on model (voyage-3-large = 1024)
Important: If you change the dimension, you must recreate the database table. Dropping the table deletes all stored messages and their embeddings:
DROP TABLE neron;
-- Restart the bot to recreate with new dimension
Modify search behavior in bot.py:
# Change number of results fetched (line 165)
results = db.query_similar_messages(query_embedding, limit=12) # Change 12 to desired limit
# Change batch size for pagination (line 178)
message, buttons_data, has_more = format_search_results(results, offset=0) # Default batch_size=3
# Change text trimming length (line 80)
def trim_text(text: str, max_length: int = 150): # Change 150 to desired length
Adjust pool size in .env based on your load:
DB_MIN_CONNECTIONS=2 # Minimum idle connections
DB_MAX_CONNECTIONS=10 # Maximum total connections
# Make sure pgvector is installed
sudo apt install postgresql-15-pgvector
# Or compile from source (see Installation section)
- Check that PostgreSQL is running: sudo systemctl status postgresql
- Verify database credentials in .env
- Test connection: psql -U neron_bot -d postgres
- Ensure the neron_bot user has proper privileges (see Installation section)
- Verify API keys in .env are correct
- Check API rate limits and quotas
- Ensure you have credits/access for OpenAI and Voyage AI
- Check bot token is correct
- Ensure bot is started: python bot.py
- Check logs for error messages
- Never commit the .env file to version control (already in .gitignore)
- Keep API keys secure and rotate them periodically
- Configure ALLOWED_USERS in config.py to restrict access to your Telegram user ID only
- Use strong database passwords (minimum 16 characters recommended)
- Consider setting up firewall rules for PostgreSQL if exposed to network
- The bot stores all messages in plain text - ensure your server and database are secure
- Unauthorized users receive a rejection message when trying to use the bot
- Connection pooling: Adjust DB_MIN_CONNECTIONS and DB_MAX_CONNECTIONS in .env based on expected load
- No index for small datasets: By default, no vector index is created, so every search is an exact scan. This gives the most accurate results and is fast enough for small datasets (< 1000 messages)
- Add IVFFlat index for large datasets: If you have 1000+ messages, uncomment the index creation code in db.py (lines 150-154) for faster searches
- Monitor database size: The neron table grows with each message. Consider adding cleanup jobs for old messages if needed
- Batch operations: The bot uses connection pooling with retry logic to handle temporary database issues
- Search limits: Default search fetches 12 results total, displaying 3 at a time. Adjust if needed for your use case
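For reference, an IVFFlat index for cosine distance could be created with a statement like the one this sketch builds. It is not the commented-out code from db.py, which is not shown in this README. The index name and both helper functions are hypothetical, and the `lists` heuristic (rows / 1000 up to one million rows, square root of the row count beyond that) follows the starting point suggested in the pgvector documentation.

```python
import math


def suggested_lists(row_count):
    """Starting value for the IVFFlat `lists` parameter."""
    if row_count <= 1_000_000:
        return max(1, row_count // 1000)
    return int(math.sqrt(row_count))


def ivfflat_index_sql(row_count, table="neron", column="embedding"):
    """Build a CREATE INDEX statement for cosine-distance search."""
    lists = suggested_lists(row_count)
    return (
        f"CREATE INDEX IF NOT EXISTS {table}_{column}_ivfflat_idx "
        f"ON {table} USING ivfflat ({column} vector_cosine_ops) "
        f"WITH (lists = {lists});"
    )
```

IVFFlat trades some recall for speed, so it is best created after the table already holds representative data.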
To keep the bot running in the background, create a systemd service:
- Create service file:
sudo nano /etc/systemd/system/neron-bot.service
- Add the following content (adjust paths as needed):
[Unit]
Description=Neron Telegram Bot
After=network.target postgresql.service
[Service]
Type=simple
User=root
WorkingDirectory=/home/Neron-Bot
Environment="PATH=/home/Neron-Bot/venv/bin"
ExecStart=/home/Neron-Bot/venv/bin/python3 /home/Neron-Bot/bot.py
Restart=always
RestartSec=10
[Install]
WantedBy=multi-user.target
- Enable and start the service:
sudo systemctl daemon-reload
sudo systemctl enable neron-bot
sudo systemctl start neron-bot
- Check status:
sudo systemctl status neron-bot
- View logs:
sudo journalctl -u neron-bot -f
The bot creates a bot.log file in the project directory with detailed logging information. To monitor:
tail -f /home/Neron-Bot/bot.log
This project is provided as-is for personal use.
- python-telegram-bot (21.0.1): Telegram Bot API wrapper
- voyageai (0.2.3): Voyage AI embeddings API client
- openai (1.54.3): OpenAI API client (Whisper transcription)
- psycopg2-binary (2.9.10): PostgreSQL database adapter
- python-dotenv (1.0.1): Environment variable management
For issues related to:
- Telegram Bot API: https://core.telegram.org/bots/api
- Voyage AI Embeddings: https://docs.voyageai.com/
- OpenAI Whisper: https://platform.openai.com/docs/guides/speech-to-text
- pgvector Extension: https://github.com/pgvector/pgvector
- PostgreSQL: https://www.postgresql.org/docs/
- Embedding Model: voyage-3-large (1024 dimensions)
- Transcription Model: whisper-1 (OpenAI)
- Vector Similarity: Cosine distance (pgvector <=> operator)
- Database: PostgreSQL with pgvector extension
- Message Types Supported: Text, Voice (OGG), Audio (MP3, etc.)
- Search Algorithm: K-nearest neighbors with cosine similarity
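As an illustration of the k-nearest-neighbors search, the sketch below builds a parameterized query that orders rows by cosine distance to the query embedding. The function `build_search_query` is hypothetical and is not the bot's `query_similar_messages`; the table and column names come from the schema above, and the default limit of 12 matches the search default.

```python
def build_search_query(query_embedding, limit=12, table="neron"):
    """Return (sql, params) for a nearest-neighbor search by cosine distance.

    The embedding is passed as pgvector text input ('[x,y,...]') and cast
    with ::vector, so it travels as an ordinary query parameter.
    """
    literal = "[" + ",".join(repr(float(x)) for x in query_embedding) + "]"
    sql = (
        "SELECT id, timestamp, text, embedding <=> %s::vector AS distance "
        f"FROM {table} "
        "ORDER BY embedding <=> %s::vector "
        "LIMIT %s"
    )
    return sql, (literal, literal, limit)
```

With psycopg2, the result could be run as `cursor.execute(sql, params)`; rows come back with the smallest distance (most similar message) first.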