A Django web application that provides AI-powered image processing capabilities, including image captioning and conversational AI with vision. Built as a Computer Engineering Bachelor's degree project using the Rhino Light API (vLLM-based).
Copy `.env.example` to `.env` and set the model server variables.
Run with Docker:
```bash
docker-compose up -d
```
Access at: http://localhost:8000/api/ui
- Image Captioning: Upload an image and get an AI-generated description using Rhino vision model
- Conversational AI with Vision: Chat with images using natural language with full conversation history support
- Chat Continuity: Maintain conversation context across multiple messages - AI remembers previous exchanges
- Session Management: Start new conversations or continue existing ones with persistent chat history
- Web Interface: Modern, interactive UI with real-time chat status and conversation controls
- API Endpoints: RESTful API for programmatic access with session-based conversation management
- OS: Linux (Ubuntu 24.04 recommended)
- Python: 3.12+
- RAM: 16GB+ recommended
- Storage: 10GB+ free space
- CPU: Intel Core i3+ (AVX512 support preferred)
```bash
git clone https://github.com/zahrashabani2000/img2caption.git
cd img2caption
cp .env.example .env
# Edit .env with your configuration

# Build the Docker image
docker-compose build

# Start the application
docker-compose up -d

# View logs
docker-compose logs -f
```
- Frontend: http://localhost:8000/api/ui
- API: http://localhost:8000/api/chat
```bash
git clone https://github.com/zahrashabani2000/img2caption.git
cd img2caption
python3 -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt
python manage.py migrate
python manage.py runserver 0.0.0.0:8000
```
Open your browser and go to: http://localhost:8000/api/ui
- Image Captioning:
- Click on "Image Caption" tab
- Upload an image file
- Add an optional prompt or leave blank for default description
- Click "Generate Caption"
- View the AI-generated description
- Conversational Chat with Images:
- Click on "Chat" tab
- Upload an image and/or enter a message to start a conversation
- Continue the conversation: Send follow-up messages - the AI remembers the entire conversation history
- Conversation status: See real-time indicators showing if you're continuing a conversation
- Start fresh: Click the "New Chat" button to clear conversation history and begin a new discussion
- Ask questions, request analysis, or discuss multiple aspects of the image - context is maintained across all messages
```bash
curl -X POST -F image=@your_image.jpg -F prompt="Describe this image in detail" http://localhost:8000/api/chat
```
Response:
```json
{
  "description": "A beautiful landscape with mountains and trees under a clear blue sky",
  "prompt": "Describe this image in detail",
  "image": "data:image/jpeg;base64,...",
  "source": "rhino-light"
}
```
Chat with an image:
```bash
curl -X POST -F image=@your_image.jpg -F message="What colors do you see in this image?" http://localhost:8000/api/chat
```
Or chat without an image:
```bash
curl -X POST -H "Content-Type: application/json" \
  -d '{"message": "Hello, how are you?"}' \
  http://localhost:8000/api/chat
```
Response:
```json
{
  "reply": "I can see vibrant greens, blues, and earthy browns in this natural landscape scene.",
  "message": "What colors do you see in this image?",
  "image": "data:image/jpeg;base64,...",
  "source": "rhino-light",
  "conversation_continued": false
}
```
Start a new conversation:
```bash
curl -X POST http://localhost:8000/api/new_chat
```
Response:
```json
{
  "message": "New chat started"
}
```
This endpoint clears the conversation history so you can start a fresh conversation.
The application maintains conversation context using Django's session framework:
- Session Storage: Conversation history is stored server-side and linked to each user's browser session
- Message Persistence: All user messages and AI responses are saved and sent to the AI for context
- Context Awareness: The AI receives full conversation history with each request, enabling coherent multi-turn conversations
- Session Isolation: Each user has their own separate conversation history
- Memory Management: Conversations persist across page refreshes but can be cleared with the "New Chat" button
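The session-backed storage described above can be sketched in plain Python. This is a hedged sketch, not the project's actual view code: `session` stands in for Django's dict-like `request.session`, and the `history` key name is an assumption, not taken from the source.

```python
# Hedged sketch of session-backed conversation storage; in the real app the
# `session` argument would be Django's request.session (a dict-like object).
# The key name "history" is an assumption, not taken from the project source.

HISTORY_KEY = "history"

def append_exchange(session, user_text, ai_reply):
    """Record one user/AI message pair in the session, OpenAI-style."""
    history = session.get(HISTORY_KEY, [])
    history.append({"role": "user", "content": user_text})
    history.append({"role": "assistant", "content": ai_reply})
    session[HISTORY_KEY] = history  # reassign so Django marks the session modified
    return history

def new_chat(session):
    """Clear stored history, mirroring the /api/new_chat endpoint."""
    session.pop(HISTORY_KEY, None)
    return {"message": "New chat started"}
```

With a real `request.session`, reassigning the key (rather than mutating the list in place) is what guarantees Django persists the change.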
Technical Details:
- Messages are stored in OpenAI-compatible format with `role` and `content` fields
- The first message starts a new conversation; subsequent messages include the full history
- System messages are automatically added to inform the AI about conversation continuity
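Assembling the request from stored history might look like the following sketch; the exact system-prompt wording is illustrative, not taken from the project source.

```python
# Hedged sketch: assemble an OpenAI-compatible messages list from stored history.
# The system prompt text is illustrative; the project may word it differently.

def build_messages(history, new_message):
    """Prepend a continuity system message, then history, then the new user turn."""
    messages = [{"role": "system",
                 "content": "You are continuing an ongoing conversation; "
                            "use the prior messages as context."}]
    messages.extend(history)  # full prior conversation, in order
    messages.append({"role": "user", "content": new_message})
    return messages
```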
- Vision and Chat (Rhino Light API):
- Primary: Rhino vision model via vLLM OpenAI-compatible API
- Supports both image captioning and conversational AI with vision
- Hosted at: https://rhino-light-api.ssl.qom.ac.ir
```
Image_processing_uni_project/
├── caption/              # Django app
│   ├── views.py          # API endpoints (chat, new_chat)
│   ├── urls.py           # URL routing with session management
│   └── ...
├── visionapp/            # Django project settings
│   ├── settings.py       # Session configuration
│   ├── urls.py           # Main URL configuration
│   └── ...
├── templates/            # HTML templates
│   └── caption/
│       └── ui.html       # Interactive chat interface with conversation controls
├── requirements.txt      # Python dependencies
├── manage.py             # Django management
└── README.md             # This file
```
- Session Management: Django sessions store conversation history per user
- Message History: Persistent storage of user/AI message pairs
- Context Preservation: Full conversation context sent to AI with each request
- UI State Management: Real-time conversation status indicators
- New Chat Handler: Session cleanup and conversation reset functionality
Create a .env file in the project root and configure the Rhino Light API settings:
```bash
# Rhino Light API Configuration
RHINO_LIGHT_BASE_URL=https://rhino-light-api.ssl.qom.ac.ir
RHINO_LIGHT_TOKEN=default
RHINO_LIGHT_MODEL=rhino

# Image processing settings
RHINO_MAX_IMAGE_SIDE=1024
RHINO_JPEG_QUALITY=80
```
- `RHINO_LIGHT_BASE_URL`: URL of the Rhino Light API server (default: `https://rhino-light-api.ssl.qom.ac.ir`)
- `RHINO_LIGHT_TOKEN`: API authentication token (default: `default`)
- `RHINO_LIGHT_MODEL`: Model name to use (default: `rhino`)
- `RHINO_MAX_IMAGE_SIDE`: Maximum image dimension for processing (default: `1024`)
- `RHINO_JPEG_QUALITY`: JPEG compression quality for image uploads (default: `80`)
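These variables might be read in `settings.py` along these lines; this is a sketch using the defaults documented above, and the project's actual settings code may differ.

```python
import os

def rhino_settings(env=os.environ):
    """Read Rhino Light configuration with the README's documented defaults."""
    return {
        "base_url": env.get("RHINO_LIGHT_BASE_URL",
                            "https://rhino-light-api.ssl.qom.ac.ir"),
        "token": env.get("RHINO_LIGHT_TOKEN", "default"),
        "model": env.get("RHINO_LIGHT_MODEL", "rhino"),
        # Numeric settings arrive as strings from the environment.
        "max_image_side": int(env.get("RHINO_MAX_IMAGE_SIDE", "1024")),
        "jpeg_quality": int(env.get("RHINO_JPEG_QUALITY", "80")),
    }
```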
The application uses the Rhino Light API, which is a vLLM-based vision model service hosted at Qom University. This service provides:
- Vision-language understanding
- Image captioning capabilities
- Conversational AI with image context
- OpenAI-compatible API interface
The API is pre-configured and ready to use. No additional server setup is required.
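For reference, an OpenAI-compatible vision request to such a server typically carries the image as a base64 data URL inside an `image_url` content part. Below is a hedged sketch of the payload; the content-part layout follows the OpenAI chat-completions convention that vLLM implements, and is not confirmed from this project's source.

```python
import base64
import json

def vision_payload(image_bytes, prompt, model="rhino"):
    """Build an OpenAI-compatible chat request with an inline base64 image."""
    data_url = "data:image/jpeg;base64," + base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url", "image_url": {"url": data_url}},
            ],
        }],
    }

# The JSON body would be POSTed to <base URL>/v1/chat/completions with an
# "Authorization: Bearer <token>" header.
```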
- "Connection refused" or API errors:
- Check if the Rhino Light API is accessible
- Verify your internet connection
- Ensure RHINO_LIGHT_BASE_URL is correct
- Image upload issues:
- Ensure image dimensions are within the configured limit (`RHINO_MAX_IMAGE_SIDE`)
- Supported formats: JPEG, PNG (automatically converted)
- First run will be slower due to model downloads
- Keep the server running to avoid reloading models
- Use smaller images for faster processing
- Consider running on a machine with more RAM for better performance
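Using smaller images means fitting both dimensions under `RHINO_MAX_IMAGE_SIDE` while keeping the aspect ratio. A sketch of the arithmetic (the actual resampling would be done with an imaging library such as Pillow):

```python
def fit_within(width, height, max_side=1024):
    """Scale dimensions so the longer side is at most max_side, keeping aspect ratio."""
    longest = max(width, height)
    if longest <= max_side:
        return width, height  # already small enough, no resize needed
    scale = max_side / longest
    # Round to whole pixels, never collapsing a dimension to zero.
    return max(1, round(width * scale)), max(1, round(height * scale))
```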
- Docker Engine 20.10+
- Docker Compose 2.0+
```bash
# Build and start the application
docker-compose up --build -d

# View logs
docker-compose logs -f

# Stop the application
docker-compose down

# Rebuild after code changes
docker-compose up --build

# Access container shell
docker-compose exec web bash

# Run Django commands
docker-compose exec web python manage.py migrate
docker-compose exec web python manage.py collectstatic
```
The application uses the following Docker configuration:
- Base Image: Python 3.11-slim
- Port: 8000 (mapped to host port 8000)
- Volume Mounts: Database and static files for persistence
- Environment: Supports .env file for configuration
- Health Check: Monitors application availability
For production deployment, consider:
- Environment Variables: Set `DEBUG=False` and configure a proper `SECRET_KEY`
- Static Files: Use a reverse proxy (nginx) to serve static files
- Database: Use PostgreSQL or MySQL instead of SQLite
- Security: Configure proper `ALLOWED_HOSTS` and use HTTPS
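The environment-variable items in this checklist can be sketched as environment-driven settings. This is a hedged example using conventional Django names, not the project's actual code.

```python
# Hedged sketch of production-hardening settings; variable names follow common
# Django conventions and are not taken from this project's source.

def prod_settings(env):
    """Derive production settings values from environment variables."""
    return {
        "DEBUG": env.get("DEBUG", "False").lower() in ("1", "true", "yes"),
        # Leave empty rather than shipping a default, so startup checks fail loudly.
        "SECRET_KEY": env.get("SECRET_KEY", ""),
        "ALLOWED_HOSTS": [h.strip()
                          for h in env.get("ALLOWED_HOSTS", "").split(",")
                          if h.strip()],
    }
```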
```bash
# Check container status
docker-compose ps

# View detailed logs
docker-compose logs web

# Restart the service
docker-compose restart web

# Clean up (removes containers, networks, volumes)
docker-compose down -v

# Rebuild from scratch
docker-compose down -v
docker-compose build --no-cache
docker-compose up -d
```
- New API endpoints: Add to `caption/views.py` and `caption/urls.py`
- UI changes: Modify `templates/caption/ui.html`
- New models: Add loading functions in `caption/views.py`
- Chat enhancements: Extend conversation management in session handling
- Session features: Add conversation export, save/load functionality, or multi-user chat rooms
```bash
# Run Django tests
python manage.py test

# Test chat API endpoints
curl -X POST -F image=@test_image.jpg -F message="What do you see?" http://localhost:8000/api/chat

# Test conversation continuity (send follow-up messages)
curl -X POST -H "Content-Type: application/json" \
  -d '{"message": "Can you describe the colors in more detail?"}' \
  -H "Cookie: sessionid=YOUR_SESSION_ID" \
  http://localhost:8000/api/chat

# Test new chat functionality
curl -X POST http://localhost:8000/api/new_chat

# Test with Docker
docker-compose exec web python manage.py test
```
Chat Testing Tips:
- Use browser developer tools to inspect session cookies
- Test conversation persistence across multiple requests
- Verify the `conversation_continued` field in API responses
- Test that the "New Chat" button clears conversation history
This project is created for educational purposes as part of a Computer Engineering Bachelor's degree program.
This is a university project. For questions or issues, please contact the project author.
- Django framework and session management
- Rhino Light API (Qom University)
- vLLM platform for vision-language models
- OpenAI API specification for chat completions
- Conversation history and context management implementation