Neron Bot

A private Telegram bot that stores text, voice, and audio messages with vector embeddings for semantic search and retrieval. It uses Voyage AI for embeddings, OpenAI Whisper for transcription, and PostgreSQL with pgvector for vector similarity search.

Features

Text Message Storage: Automatically creates embeddings for text messages and stores them in the database
Voice & Audio Transcription: Transcribes voice and audio messages using OpenAI Whisper API, then creates and stores embeddings
Semantic Search: Search through all your stored messages using natural language queries with vector similarity
Smart Pagination: Search results are displayed in batches with "Show more" buttons for easy navigation
Full Text Display: Trimmed results can be expanded to show full message content with timestamps
Vector Database: PostgreSQL with pgvector extension for fast similarity search using cosine distance
Connection Pooling: Thread-safe connection pool with automatic retry logic for reliability
User Access Control: Private bot with configurable allowed user list for security

Architecture

bot.py      - Main bot logic and message handlers
db.py       - Database operations (insert, query, schema management)
config.py   - Configuration and environment variable loading

Prerequisites

Python 3.8+
PostgreSQL 12+ with pgvector extension
API Keys:
- Telegram Bot Token (from @BotFather)
- Voyage AI API Key (from voyageai.com)
- OpenAI API Key (from platform.openai.com)

Installation

1. Install PostgreSQL and pgvector

Ubuntu/Debian:

# Install PostgreSQL
sudo apt update
sudo apt install postgresql postgresql-contrib

# Install pgvector (the package name must match your PostgreSQL major version; 15 is shown here)
sudo apt install postgresql-15-pgvector

macOS (using Homebrew):

brew install postgresql@15
brew install pgvector

Or compile pgvector from source:

git clone https://github.com/pgvector/pgvector.git
cd pgvector
make
sudo make install

2. Setup PostgreSQL Database

# Connect to PostgreSQL (run this in your shell)
sudo -u postgres psql

-- The remaining lines are entered at the psql prompt; psql comments start with --

-- Create user with password
CREATE USER neron_bot WITH PASSWORD 'your_secure_password_here';

-- Grant necessary privileges on the default postgres database
GRANT ALL PRIVILEGES ON DATABASE postgres TO neron_bot;

-- Connect to postgres database to grant schema privileges
\c postgres

-- Grant schema privileges
GRANT ALL ON SCHEMA public TO neron_bot;

-- Exit psql
\q

Note: This bot uses the default postgres database. The table is called neron and will be created automatically when you first run the bot.

3. Clone and Setup Project

# Clone or navigate to project directory
cd /path/to/Neron-Bot

# Create virtual environment
python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

4. Configure Environment Variables

# Copy the example environment file
cp .env.example .env

# Edit .env with your actual credentials
nano .env  # or use your preferred editor

Fill in the following values in .env:

TELEGRAM_BOT_TOKEN: Your bot token from @BotFather
VOYAGE_API_KEY: Your Voyage AI API key
OPENAI_API_KEY: Your OpenAI API key
DB_PASSWORD: Your PostgreSQL password
Other DB settings as needed
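
As an illustration of the kind of check that can catch a half-filled .env early, here is a minimal sketch. It is not the project's config.py: the function name missing_settings is hypothetical, and it only covers the four variables named above.

```python
import os

# The four variables named above; the real config.py may require more (assumption).
REQUIRED_VARS = ("TELEGRAM_BOT_TOKEN", "VOYAGE_API_KEY", "OPENAI_API_KEY", "DB_PASSWORD")


def missing_settings(env=None):
    """Return the required variable names that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_settings()
    if missing:
        print("Missing configuration: " + ", ".join(missing))
    else:
        print("All required configuration variables are set")
```

Passing a plain dict instead of os.environ makes the check easy to exercise without touching the real environment.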

5. Test Database Connection

# Test database setup
python db.py

You should see:

✓ Database setup successful!

6. Test Configuration

# Test configuration
python config.py

You should see:

✓ All required configuration variables are set

Usage

Start the Bot

python bot.py

You should see:

INFO - Starting bot...
INFO - Bot handlers registered

Interact with the Bot

Open Telegram and find your bot
Send /start to see the welcome message
Send any text message - the bot will reply with "✅ Logged"
Send a voice or audio message - the bot will transcribe it and reply with "✅ Logged"
Use /count to see how many messages are stored
Use /search <query> to search through your stored messages

Bot Commands

/start - Show welcome message and available commands
/count - Display total number of stored messages
/search <query> - Search your messages using semantic similarity (e.g., /search meeting notes from last week)

Database Schema

The bot automatically creates this table on first run:

CREATE TABLE neron (
    id SERIAL PRIMARY KEY,
    timestamp TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    text TEXT NOT NULL,
    embedding vector(1024) NOT NULL
);

Table name: neron
Database: postgres (default PostgreSQL database)
Vector dimension: 1024 (for voyage-3-large model)
Similarity metric: Cosine distance (<=> operator)
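
To make the similarity metric concrete, the sketch below computes cosine distance (1 minus cosine similarity) in plain Python. It only illustrates what the <=> operator measures; the bot itself lets pgvector do this inside PostgreSQL. Raising an error for zero vectors is a choice made for this example.

```python
import math


def cosine_distance(a, b):
    """Cosine distance as 1 - cosine similarity; 0 means same direction."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)
```

Smaller values mean closer matches: 0 for vectors pointing the same way, 1 for orthogonal vectors, 2 for opposite ones.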

Project Structure

Neron-Bot/
├── bot.py              # Main bot application
├── db.py               # Database operations
├── config.py           # Configuration management
├── requirements.txt    # Python dependencies
├── .env.example        # Environment variables template
├── .env               # Your actual environment variables (not in git)
└── README.md          # This file

How It Works

Text Messages

User sends a text message
Bot receives the message
Creates embedding using Voyage AI (voyage-3-large model, input_type="document")
Stores timestamp, text, and 1024-dimensional embedding vector in PostgreSQL
Replies "✅ Logged"
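
The storage step can be sketched as follows. This is an assumption about how such an insert might look, not the contents of db.py: the helper names are hypothetical, and the only facts relied on are the neron table definition and pgvector's '[x,y,z]' text format for vectors.

```python
def to_vector_literal(embedding):
    """Format floats as pgvector's text input form, e.g. '[0.1,0.2,0.3]'."""
    return "[" + ",".join(repr(float(x)) for x in embedding) + "]"


# Parameterized statement against the neron table; timestamp uses its NOW() default.
INSERT_SQL = "INSERT INTO neron (text, embedding) VALUES (%s, %s::vector)"


def build_insert_params(text, embedding, dimension=1024):
    """Validate the embedding length and return parameters for INSERT_SQL."""
    if len(embedding) != dimension:
        raise ValueError(f"expected {dimension} values, got {len(embedding)}")
    return (text, to_vector_literal(embedding))
```

With psycopg2, the result would be passed as cursor.execute(INSERT_SQL, build_insert_params(text, embedding)), which keeps the message text out of the SQL string itself.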

Voice & Audio Messages

User sends a voice or audio message
Bot downloads the audio file to temporary storage
Transcribes audio using OpenAI Whisper (whisper-1 model)
Creates embedding of the transcribed text using Voyage AI
Stores timestamp, transcribed text, and embedding in PostgreSQL
Deletes temporary audio file from disk
Replies "✅ Logged"
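
The download, transcribe, and delete steps above hinge on the temporary file being removed even when transcription fails. A minimal sketch of that pattern, assuming a hypothetical transcribe callable that stands in for the Whisper API call:

```python
import os
import tempfile


def transcribe_and_cleanup(audio_bytes, transcribe, suffix=".ogg"):
    """Write audio to a temp file, transcribe it, and always delete the file.

    `transcribe` is a hypothetical callable taking a file path and returning
    text; in the bot this role is played by the Whisper API call.
    """
    fd, path = tempfile.mkstemp(suffix=suffix)
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(audio_bytes)
        return transcribe(path)
    finally:
        # Runs on success and on error, so no audio is left on disk.
        if os.path.exists(path):
            os.remove(path)
```

Putting the removal in a finally block means a failed API call does not leave voice recordings behind in temporary storage.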

Semantic Search

User sends /search <query> command
Bot creates embedding for the query using Voyage AI (input_type="query")
Performs cosine similarity search against all stored message embeddings
Returns up to 12 most similar results, ranked by similarity score
Displays results in batches of 3 with:
- Trimmed text (max 150 chars for readability)
- Timestamp (YYYY-MM-DD HH:MM format)
- "Full text" button for expanded results (if trimmed)
- "Show more" button for pagination (if more results available)
User can click buttons to view full messages or load additional results
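
The trimming and batching described above can be sketched like this. The helpers are illustrative and are not the functions in bot.py; in particular, appending "..." to trimmed text is an assumption.

```python
from datetime import datetime


def trim_preview(text, max_length=150):
    """Return (preview, was_trimmed); long text is cut and given an ellipsis."""
    if len(text) <= max_length:
        return text, False
    return text[:max_length].rstrip() + "...", True


def page_of(results, offset=0, batch_size=3):
    """Return (batch, has_more) for one page of already-ranked results."""
    batch = results[offset:offset + batch_size]
    has_more = offset + batch_size < len(results)
    return batch, has_more


def format_timestamp(moment):
    """Render a datetime in the YYYY-MM-DD HH:MM form used in results."""
    return moment.strftime("%Y-%m-%d %H:%M")
```

The was_trimmed flag corresponds to whether a "Full text" button is needed, and has_more to whether a "Show more" button is shown.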

Customization

User Access Control

By default, the bot is restricted to specific users. Edit config.py to configure access:

# Allow only specific user IDs (get your user_id by messaging the bot)
ALLOWED_USERS = [1890816031]  # Replace with your Telegram user ID

# Or allow anyone (NOT RECOMMENDED for private data):
ALLOWED_USERS = []  # Empty list = no restrictions

To find your Telegram user ID, you can:

Use @userinfobot
Check bot logs when you send a message (user ID will be logged)
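
The access rule described above (an empty list means no restrictions) can be expressed as a small check. This is a sketch of the rule only; the function name is hypothetical and the bot's own handler code may differ.

```python
def is_allowed(user_id, allowed_users):
    """Empty list means no restrictions; otherwise the ID must be listed."""
    if not allowed_users:
        return True
    return user_id in allowed_users
```

Note the consequence of the empty-list convention: clearing ALLOWED_USERS by mistake opens the bot to everyone rather than locking it down.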

Change Embedding Model

Edit config.py:

VOYAGE_MODEL = 'voyage-3-large'  # or 'voyage-3', 'voyage-2', etc.
EMBEDDING_DIMENSION = 1024       # Update based on model (voyage-3-large = 1024)

Important: If you change the dimension, you must recreate the database table:

DROP TABLE neron;
-- Restart the bot to recreate with new dimension

Adjust Search Results

Modify search behavior in bot.py:

# Change number of results fetched (line 165)
results = db.query_similar_messages(query_embedding, limit=12)  # Change 12 to desired limit

# Change batch size for pagination (line 178)
message, buttons_data, has_more = format_search_results(results, offset=0)  # Default batch_size=3

# Change text trimming length (line 80)
def trim_text(text: str, max_length: int = 150):  # Change 150 to desired length

Database Connection Pool

Adjust pool size in .env based on your load:

DB_MIN_CONNECTIONS=2   # Minimum idle connections
DB_MAX_CONNECTIONS=10  # Maximum total connections
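
The Features list describes the pool as having automatic retry logic. How db.py implements that is not shown here, so the following is only a sketch of one common approach, retrying with exponential backoff; the function name and default values are assumptions.

```python
import time


def with_retry(operation, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call operation(); on failure wait and retry, doubling the delay."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return operation()
        except Exception:
            if attempt == attempts - 1:
                raise  # Out of attempts: surface the last error.
            sleep(base_delay * (2 ** attempt))
```

In real database code it is usually better to catch only connection-level errors (for example psycopg2.OperationalError) so that genuine bugs are not retried.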

Troubleshooting

pgvector extension not found

# Make sure pgvector is installed (use the package matching your PostgreSQL major version)
sudo apt install postgresql-15-pgvector

# Or compile from source (see Installation section)

Database connection errors

Check that PostgreSQL is running: sudo systemctl status postgresql
Verify database credentials in .env
Test connection: psql -U neron_bot -d postgres
Ensure the neron_bot user has proper privileges (see Installation section)

API errors

Verify API keys in .env are correct
Check API rate limits and quotas
Ensure you have credits/access for OpenAI and Voyage AI

Bot not responding

Check bot token is correct
Ensure bot is started: python bot.py
Check logs for error messages

Security Notes

Never commit the .env file to version control (it is already listed in .gitignore)
Keep API keys secure and rotate them periodically
Configure ALLOWED_USERS in config.py to restrict access to your Telegram user ID only
Use strong database passwords (minimum 16 characters recommended)
Consider setting up firewall rules for PostgreSQL if exposed to network
The bot stores all messages in plain text - ensure your server and database are secure
Unauthorized users receive a rejection message when trying to use the bot

Performance Tips

Connection pooling: Adjust DB_MIN_CONNECTIONS and DB_MAX_CONNECTIONS in .env based on expected load
No index for small datasets: By default, no vector index is created, so every search scans all rows and returns exact results. This is fast enough for small datasets (< 1000 messages)
Add IVFFlat index for large datasets: If you have 1000+ messages, uncomment the index creation code in db.py (lines 150-154) for faster searches. IVFFlat is an approximate index, so it trades a little recall for speed
Monitor database size: The neron table grows with each message. Consider adding cleanup jobs for old messages if needed
Retry logic: The connection pool retries failed operations to ride out temporary database issues
Search limits: Default search fetches 12 results total, displaying 3 at a time. Adjust if needed for your use case
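
For the IVFFlat tip above, the main tuning knob is the lists parameter. The sketch below builds an index statement using the starting point suggested in the pgvector documentation (rows / 1000 up to 1M rows, sqrt(rows) beyond that). The helper names are hypothetical, and the commented-out statement in db.py may use different settings.

```python
import math


def suggested_lists(row_count):
    """pgvector's documented starting point for the IVFFlat `lists` setting:
    rows / 1000 up to 1M rows, sqrt(rows) above that."""
    if row_count <= 1_000_000:
        return max(1, row_count // 1000)
    return max(1, int(math.sqrt(row_count)))


def ivfflat_index_sql(row_count, table="neron", column="embedding"):
    """Build a CREATE INDEX statement using cosine-distance operators."""
    return (
        f"CREATE INDEX ON {table} USING ivfflat "
        f"({column} vector_cosine_ops) WITH (lists = {suggested_lists(row_count)})"
    )
```

The vector_cosine_ops operator class matches the <=> cosine distance operator used for search, which is what allows the index to serve those queries.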

Running as a System Service

To keep the bot running in the background, create a systemd service:

Create service file:

sudo nano /etc/systemd/system/neron-bot.service

Add the following content (adjust paths as needed):

[Unit]
Description=Neron Telegram Bot
After=network.target postgresql.service

[Service]
Type=simple
User=root
WorkingDirectory=/home/Neron-Bot
Environment="PATH=/home/Neron-Bot/venv/bin"
ExecStart=/home/Neron-Bot/venv/bin/python3 /home/Neron-Bot/bot.py
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target

Enable and start the service:

sudo systemctl daemon-reload
sudo systemctl enable neron-bot
sudo systemctl start neron-bot

Check status:

sudo systemctl status neron-bot

View logs:

sudo journalctl -u neron-bot -f

Logs

The bot creates a bot.log file in the project directory with detailed logging information. To monitor:

tail -f /home/Neron-Bot/bot.log

License

This project is provided as-is for personal use.

Dependencies

python-telegram-bot (21.0.1): Telegram Bot API wrapper
voyageai (0.2.3): Voyage AI embeddings API client
openai (1.54.3): OpenAI API client (Whisper transcription)
psycopg2-binary (2.9.10): PostgreSQL database adapter
python-dotenv (1.0.1): Environment variable management

Support & Resources

For issues related to:

Telegram Bot API: https://core.telegram.org/bots/api
Voyage AI Embeddings: https://docs.voyageai.com/
OpenAI Whisper: https://platform.openai.com/docs/guides/speech-to-text
pgvector Extension: https://github.com/pgvector/pgvector
PostgreSQL: https://www.postgresql.org/docs/

Technical Details

Embedding Model: voyage-3-large (1024 dimensions)
Transcription Model: whisper-1 (OpenAI)
Vector Similarity: Cosine distance (pgvector <=> operator)
Database: PostgreSQL with pgvector extension
Message Types Supported: Text, Voice (OGG), Audio (MP3, etc.)
Search Algorithm: K-nearest neighbors with cosine similarity
