AI-Powered Speech Generation and Pronunciation Training Platform
Speakly is a full-stack application that uses Google Gemini, Murf AI, and Azure Speech Services to help users write speeches and improve their pronunciation through automated analysis and feedback.
- Intelligent Content Creation: Generate speeches using Google Gemini 2.5 Flash AI
- Multi-Language Support: 20+ languages, including English, Spanish, French, German, Hindi, Japanese, and Chinese
- Customizable Tone & Style: Choose from formal, casual, persuasive, inspirational, and other tones
- Accent Adaptation: Speeches optimized for different regional accents
- Automatic Audio Generation: High-quality speech synthesis using Murf AI
- Azure Speech Services Integration: Professional-grade pronunciation assessment
- Real-Time Analysis: Accuracy, fluency, and completeness scoring (0-100 scale)
- Word-Level Feedback: Detailed analysis of individual word pronunciation
- AI-Generated Coaching: Personalized feedback using Gemini AI
- Progress Tracking: Monitor improvement over time with detailed statistics
- Multi-Provider Login: Email/password and Google OAuth 2.0 integration
- JWT Security: Secure token-based authentication with refresh tokens
- Protected Routes: Role-based access control
- Session Management: Automatic token refresh and secure logout
- Performance Analytics: Track speech count, practice sessions, and average scores
- Language Insights: Favorite languages and usage patterns
- Progress Metrics: Improvement rates and learning streaks
- Speech History: Complete history with advanced filtering and pagination
- Glassmorphic Design: Beautiful dark theme with backdrop blur effects
- Responsive Layout: Mobile-first design that works on all devices
- Real-Time Feedback: Toast notifications and loading states
- Intuitive Navigation: Clean, modern interface with smooth animations
- Framework: React 19.1 with TypeScript
- Build Tool: Vite for fast development and optimized builds
- State Management: Zustand for lightweight, scalable state management
- Styling: Tailwind CSS 4.x with custom components
- UI Components: shadcn/ui component library built on Radix UI primitives
- Form Handling: React Hook Form with Zod validation
- Routing: React Router DOM v7 with protected routes
- Icons: Lucide React for consistent iconography
- Runtime: Node.js with Express.js framework
- Database: MongoDB with Mongoose ODM
- Authentication: JWT tokens with bcrypt password hashing
- File Upload: Multer middleware for audio file handling
- Validation: Zod schemas for request/response validation
- Security: Helmet, CORS, rate limiting, and input sanitization
- Audio Processing: FFmpeg for audio format conversion
- Speech Generation: Google Gemini 2.5 Flash AI
- Audio Synthesis: Murf AI for high-quality voice generation
- Pronunciation Analysis: Microsoft Azure Speech Services
- Cloud Storage: AWS S3 for audio file storage
- Authentication: Google OAuth 2.0 integration
- Node.js: v18.0.0 or higher
- MongoDB: Local installation or MongoDB Atlas cluster
- Git: For version control
Create a .env file in the backend/ directory:
# Server Configuration
NODE_ENV=development
PORT=8080
FRONTEND_URL=http://localhost:5173
# Database
MONGODB_URI=your_mongodb_connection_string
# JWT Secret (generate a secure random string)
JWT_SECRET=your_secure_jwt_secret_key
# AI Services
GEMINI_API_KEY=your_google_gemini_api_key
MURF_API_KEY=your_murf_api_key
AZURE_SPEECH_KEY=your_azure_speech_key
AZURE_SPEECH_REGION=your_azure_region
# AWS Configuration (for audio storage)
AWS_ACCESS_KEY_ID=your_aws_access_key
AWS_SECRET_ACCESS_KEY=your_aws_secret_key
AWS_REGION=your_aws_region
AWS_BUCKET_NAME=your_s3_bucket_name

Create a .env file in the frontend/ directory:
# API Configuration
VITE_API_BASE_URL=http://localhost:8080/api
VITE_API_TIMEOUT=180000
# Environment
VITE_NODE_ENV=development
# Google OAuth
VITE_GOOGLE_CLIENT_ID=your_google_oauth_client_id
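
A missing backend variable is easier to diagnose at startup than at the first failed API call. The following is a minimal sketch of such a check, not code from the Speakly repository: `REQUIRED_BACKEND_VARS` and `missingEnvVars` are hypothetical names, and `NODE_ENV`, `PORT`, and `FRONTEND_URL` are left out on the assumption that they have sensible defaults.

```typescript
// Hypothetical startup check. Variable names are copied from the backend .env list above.
const REQUIRED_BACKEND_VARS = [
  "MONGODB_URI",
  "JWT_SECRET",
  "GEMINI_API_KEY",
  "MURF_API_KEY",
  "AZURE_SPEECH_KEY",
  "AZURE_SPEECH_REGION",
  "AWS_ACCESS_KEY_ID",
  "AWS_SECRET_ACCESS_KEY",
  "AWS_REGION",
  "AWS_BUCKET_NAME",
] as const;

type Env = Record<string, string | undefined>;

// Returns the required variable names that are unset or blank in `env`.
function missingEnvVars(env: Env): string[] {
  return REQUIRED_BACKEND_VARS.filter((name) => {
    const value = env[name];
    return value === undefined || value.trim() === "";
  });
}
```

In a Node.js entry point this could be called as `missingEnvVars(process.env)`, aborting with a clear message if the returned list is not empty.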
- Clone the Repository
  git clone https://github.com/meetbatra/speakly.git
  cd speakly
- Backend Setup
  cd backend
  npm install
  npm run dev
- Frontend Setup (in a new terminal)
  cd frontend
  npm install
  npm run dev
- Access the Application
- Frontend: http://localhost:5173
- Backend API: http://localhost:8080
- Health Check: http://localhost:8080/health
speakly/
├── backend/ # Node.js Express API
│ ├── config/ # Configuration files
│ ├── controllers/ # Route controllers
│ ├── middleware/ # Custom middleware
│ ├── models/ # MongoDB models
│ ├── routes/ # API route definitions
│ ├── services/ # Business logic services
│ ├── utils/ # Utility functions
│ │ ├── auth/ # Authentication utilities
│ │ ├── db/ # Database connection
│ │ ├── services/ # External service integrations
│ │ └── validations/ # Input validation schemas
│ ├── uploads/ # Temporary file uploads
│ ├── temp/ # Temporary audio processing
│ └── app.js # Main application entry point
│
├── frontend/ # React TypeScript SPA
│ ├── public/ # Static assets
│ ├── src/
│ │ ├── api/ # API service functions
│ │ ├── components/ # Reusable React components
│ │ │ └── ui/ # UI component library
│ │ ├── config/ # Configuration files
│ │ ├── hooks/ # Custom React hooks
│ │ ├── pages/ # Route page components
│ │ ├── routes/ # Application routing
│ │ ├── stores/ # Zustand state stores
│ │ ├── styles/ # Global styles and themes
│ │ ├── types/ # TypeScript type definitions
│ │ └── main.tsx # Application entry point
│ ├── components.json # UI component configuration
│ └── vite.config.ts # Vite build configuration
│
└── README.md # This file
- POST /api/auth/register: User registration
- POST /api/auth/login: User login
- POST /api/auth/google-login: Google OAuth login
- POST /api/auth/refresh: Refresh access token
- POST /api/auth/logout: User logout

- GET /api/speech/options: Get supported languages/accents
- POST /api/speech/generate: Generate new speech with AI
- POST /api/speech/modify: Modify existing speech
- GET /api/speech: Get user's speeches (paginated)
- GET /api/speech/:id: Get specific speech by ID
- PUT /api/speech/:id/regenerate-audio: Regenerate speech audio
- DELETE /api/speech/:id: Delete speech
- GET /api/speech/stats: Get speech statistics

- POST /api/practice/:speechId/attempt: Create practice attempt
- GET /api/practice/attempt/:attemptId: Get practice results
- GET /api/practice/attempt/:attemptId/status: Check analysis status
- GET /api/practice/history: Get practice history
- GET /api/practice/stats: Get practice statistics
- DELETE /api/practice/attempt/:attemptId: Delete practice attempt

- GET /api/user/profile: Get user profile
- PUT /api/user/profile: Update user profile
- GET /api/user/statistics: Get dashboard statistics
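
To illustrate how a client might address these endpoints, here is a small sketch of two URL builders. The base URL is the `VITE_API_BASE_URL` example value; the `page` and `limit` query parameter names are assumptions (this README only says the speech list is paginated), and the function names are hypothetical.

```typescript
// Base URL taken from the VITE_API_BASE_URL example value in the frontend .env.
const API_BASE_URL = "http://localhost:8080/api";

// Assumed pagination parameters; the real query names may differ.
interface PageQuery {
  page: number;
  limit: number;
}

// Builds the URL for GET /api/speech with pagination query parameters.
function speechListUrl(query: PageQuery, baseUrl: string = API_BASE_URL): string {
  return `${baseUrl}/speech?page=${query.page}&limit=${query.limit}`;
}

// Builds the URL for GET /api/practice/attempt/:attemptId/status.
// The ID is percent-encoded so it cannot alter the path.
function attemptStatusUrl(attemptId: string, baseUrl: string = API_BASE_URL): string {
  return `${baseUrl}/practice/attempt/${encodeURIComponent(attemptId)}/status`;
}
```

Keeping the path construction in one place means a change to the API prefix only needs to be made once.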
All languages support content generation and audio synthesis:
- English, Spanish, French, German, Italian, Portuguese
- Dutch, Russian, Japanese, Korean, Chinese (Mandarin)
- Hindi, Bengali, Tamil, Telugu, Gujarati, Kannada
- Malayalam, Marathi, Punjabi, Malay
Advanced pronunciation analysis available for:
- ✅ English, Spanish, French, German, Italian
- ✅ Portuguese, Dutch, Russian, Japanese, Korean
- ✅ Chinese (Mandarin), Hindi, Tamil
Note: Languages not listed above support speech generation but have "Practice with AI" disabled for pronunciation assessment.
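
The two lists above can be expressed as a lookup that decides whether "Practice with AI" should be enabled for a speech. This is a sketch built only from the language names in this README; the identifiers (`GENERATION_LANGUAGES`, `ASSESSMENT_LANGUAGES`, `canPracticeWithAI`) are hypothetical, and the application may well key languages by locale code instead of display name.

```typescript
// Languages listed for content generation and audio synthesis.
const GENERATION_LANGUAGES = [
  "English", "Spanish", "French", "German", "Italian", "Portuguese",
  "Dutch", "Russian", "Japanese", "Korean", "Chinese (Mandarin)",
  "Hindi", "Bengali", "Tamil", "Telugu", "Gujarati", "Kannada",
  "Malayalam", "Marathi", "Punjabi", "Malay",
] as const;

type Language = (typeof GENERATION_LANGUAGES)[number];

// Languages listed for pronunciation analysis.
const ASSESSMENT_LANGUAGES = new Set<string>([
  "English", "Spanish", "French", "German", "Italian",
  "Portuguese", "Dutch", "Russian", "Japanese", "Korean",
  "Chinese (Mandarin)", "Hindi", "Tamil",
]);

// True when pronunciation assessment is listed as available for the language.
function canPracticeWithAI(language: Language): boolean {
  return ASSESSMENT_LANGUAGES.has(language);
}

// Languages that support generation only.
const generationOnly = GENERATION_LANGUAGES.filter((l) => !canPracticeWithAI(l));
```

With the lists as given, `generationOnly` contains Bengali, Telugu, Gujarati, Kannada, Malayalam, Marathi, Punjabi, and Malay.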
The platform uses Google Gemini 2.5 Flash to generate contextually appropriate speeches based on:
- Topic: User-provided subject matter
- Tone: Formal, casual, persuasive, inspirational, educational, etc.
- Language: 20+ supported languages with cultural adaptation
- Accent: Regional variations and pronunciation preferences
- Length: Optimized for different speech durations
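
The parameters above suggest a request body along the following lines. This is only a sketch: the field names, the `lengthMinutes` unit, and the `validateGenerateRequest` helper are assumptions for illustration, and the project itself validates requests with Zod schemas, which are replaced here by plain TypeScript to keep the example dependency-free.

```typescript
// Hypothetical request shape for POST /api/speech/generate; field names are assumptions.
interface GenerateSpeechRequest {
  topic: string;
  tone: string;
  language: string;
  accent?: string;
  // Assumed unit: target speech duration in minutes.
  lengthMinutes: number;
}

// Tones named in this README; the real list is longer ("etc.").
const KNOWN_TONES = ["formal", "casual", "persuasive", "inspirational", "educational"];

// Returns a list of problems; an empty list means the request passed these checks.
function validateGenerateRequest(req: GenerateSpeechRequest): string[] {
  const problems: string[] = [];
  if (req.topic.trim() === "") problems.push("topic must not be empty");
  if (KNOWN_TONES.indexOf(req.tone.toLowerCase()) === -1) {
    problems.push(`unknown tone: ${req.tone}`);
  }
  if (req.language.trim() === "") problems.push("language must not be empty");
  if (!(req.lengthMinutes > 0)) problems.push("lengthMinutes must be positive");
  return problems;
}
```

Returning every problem at once, instead of throwing on the first, lets a form show all field errors in a single pass.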
- Audio Upload: Users record themselves reading the generated speech
- Azure Analysis: Microsoft Speech Services performs detailed assessment
- AI Feedback: Gemini AI generates personalized coaching feedback
- Audio Response: Murf AI creates spoken feedback for the user
- Progress Tracking: Results stored for long-term progress monitoring
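
Because the analysis runs after the upload, a client has to wait for it to finish, which is presumably what the `GET /api/practice/attempt/:attemptId/status` endpoint is for. The sketch below polls a caller-supplied status function until the attempt is done. The status values (`processing`, `completed`, `failed`), the polling interval, and all identifiers are assumptions, since the response format is not documented here.

```typescript
// Assumed status values; the actual API response shape may differ.
type AttemptStatus = "processing" | "completed" | "failed";

// The caller supplies how to fetch the status, for example a wrapper around
// GET /api/practice/attempt/:attemptId/status.
type StatusFetcher = (attemptId: string) => Promise<AttemptStatus>;

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Polls until the attempt leaves "processing" or maxAttempts checks have been made.
async function waitForAnalysis(
  attemptId: string,
  fetchStatus: StatusFetcher,
  intervalMs = 2000,
  maxAttempts = 30,
): Promise<AttemptStatus> {
  for (let i = 0; i < maxAttempts; i++) {
    const status = await fetchStatus(attemptId);
    if (status !== "processing") return status;
    await sleep(intervalMs);
  }
  throw new Error(`Analysis for attempt ${attemptId} did not finish in time`);
}
```

Passing the fetcher in as a parameter keeps the loop independent of any HTTP client and makes it easy to exercise with a stub.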
- Password Security: Bcrypt hashing with salt rounds
- Token Management: JWT access tokens (15-minute lifetime) plus refresh tokens (7-day lifetime)
- Rate Limiting: IP-based request throttling
- Input Validation: Zod schema validation on all endpoints
- CORS Protection: Configured for frontend domain
- Helmet Security: Standard web security headers
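
The two token lifetimes above imply three client states: the access token is still valid, it has expired but can be refreshed, or the user must log in again. Here is a minimal sketch of that decision, assuming both tokens were issued at the same moment. The function and type names are hypothetical, and a real client would more likely read the `exp` claim from the token itself.

```typescript
// Lifetimes stated in this README.
const ACCESS_TOKEN_TTL_MS = 15 * 60 * 1000; // 15 minutes
const REFRESH_TOKEN_TTL_MS = 7 * 24 * 60 * 60 * 1000; // 7 days

type SessionAction = "use-access-token" | "refresh" | "login-again";

// Decides what a client should do, given when the tokens were issued (epoch ms).
// Assumes the access and refresh tokens were issued together.
function nextSessionAction(issuedAtMs: number, nowMs: number): SessionAction {
  const age = nowMs - issuedAtMs;
  if (age < ACCESS_TOKEN_TTL_MS) return "use-access-token";
  if (age < REFRESH_TOKEN_TTL_MS) return "refresh";
  return "login-again";
}
```

The "refresh" case corresponds to calling `POST /api/auth/refresh`, and "login-again" to sending the user back to the login page.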
# Backend tests (when implemented)
cd backend
npm test
# Frontend tests (when implemented)
cd frontend
npm test

# Build frontend
cd frontend
npm run build
# Backend is production-ready as-is
# Set NODE_ENV=production in backend/.env

- ESLint: Configured for TypeScript and React
- TypeScript: Strict mode enabled with comprehensive types
- Prettier: Code formatting (configure as needed)
- Validation: Runtime validation with Zod schemas
The React app can be deployed to:
- Vercel: Zero-config deployment
- Netlify: Static site hosting
- AWS S3 + CloudFront: Scalable CDN solution
- Docker: Containerized deployment
The Node.js API supports:
- Railway: Simple Node.js hosting
- Heroku: Platform-as-a-Service
- AWS EC2/ECS: Full control deployment
- DigitalOcean: Droplet or App Platform
- Docker: Containerized deployment
- MongoDB Atlas: Managed cloud database (recommended)
- Local MongoDB: Self-hosted installation
- Docker MongoDB: Containerized database
We welcome contributions! Please follow these steps:
- Fork the Repository
- Create Feature Branch: git checkout -b feature/amazing-feature
- Commit Changes: git commit -m 'Add amazing feature'
- Push to Branch: git push origin feature/amazing-feature
- Open Pull Request
- Follow TypeScript best practices
- Add proper error handling
- Include input validation
- Write meaningful commit messages
- Test thoroughly before submitting
- Google Gemini: AI-powered speech generation
- Microsoft Azure: Professional speech analysis
- Murf AI: High-quality voice synthesis
- MongoDB: Flexible document database
- React Team: Amazing frontend framework
- Tailwind CSS: Utility-first CSS framework
For support, email meetbatra56@gmail.com or create an issue on GitHub.
Made with ❤️ by Meet Batra
Empowering communication through AI-driven speech technology