🎤 Speakly

AI-Powered Speech Generation and Pronunciation Training Platform

Speakly is a full-stack application that uses AI services (Google Gemini, Murf AI, and Azure Speech Services) to help users write speeches and improve their pronunciation through automated analysis and feedback.

🌟 Features

📝 AI-Powered Speech Generation

- Intelligent Content Creation: Generate speeches using Google Gemini 2.5 Flash
- Multi-Language Support: 20+ languages, including English, Spanish, French, German, Hindi, Japanese, and Chinese
- Customizable Tone & Style: Choose from formal, casual, persuasive, inspirational, and other tones
- Accent Adaptation: Speeches optimized for different regional accents
- Automatic Audio Generation: High-quality speech synthesis using Murf AI

🎯 Advanced Pronunciation Training

- Azure Speech Services Integration: Professional-grade pronunciation assessment
- Real-Time Analysis: Accuracy, fluency, and completeness scoring (0-100 scale)
- Word-Level Feedback: Detailed analysis of individual word pronunciation
- AI-Generated Coaching: Personalized feedback using Gemini AI
- Progress Tracking: Monitor improvement over time with detailed statistics

🔐 Robust Authentication System

- Multi-Provider Login: Email/password and Google OAuth 2.0 integration
- JWT Security: Secure token-based authentication with refresh tokens
- Protected Routes: Role-based access control
- Session Management: Automatic token refresh and secure logout

📊 Comprehensive Dashboard

- Performance Analytics: Track speech count, practice sessions, and average scores
- Language Insights: Favorite languages and usage patterns
- Progress Metrics: Improvement rates and learning streaks
- Speech History: Complete history with advanced filtering and pagination

🎨 Modern User Experience

- Glassmorphic Design: Dark theme with backdrop blur effects
- Responsive Layout: Mobile-first design that works on all devices
- Real-Time Feedback: Toast notifications and loading states
- Intuitive Navigation: Clean, modern interface with smooth animations

🏗️ Technology Stack

Frontend

- Framework: React 19.1 with TypeScript
- Build Tool: Vite for fast development and optimized builds
- State Management: Zustand for lightweight, scalable state management
- Styling: Tailwind CSS 4.x with custom components
- UI Components: shadcn/ui component library built on Radix UI primitives
- Form Handling: React Hook Form with Zod validation
- Routing: React Router DOM v7 with protected routes
- Icons: Lucide React for consistent iconography

Backend

- Runtime: Node.js with Express.js framework
- Database: MongoDB with Mongoose ODM
- Authentication: JWT tokens with bcrypt password hashing
- File Upload: Multer middleware for audio file handling
- Validation: Zod schemas for request/response validation
- Security: Helmet, CORS, rate limiting, and input sanitization
- Audio Processing: FFmpeg for audio format conversion

AI & Cloud Services

- Speech Generation: Google Gemini 2.5 Flash
- Audio Synthesis: Murf AI for high-quality voice generation
- Pronunciation Analysis: Microsoft Azure Speech Services
- Cloud Storage: AWS S3 for audio file storage
- Authentication: Google OAuth 2.0 integration

🚀 Getting Started

Prerequisites

- Node.js: v18.0.0 or higher
- MongoDB: Local installation or MongoDB Atlas cluster
- Git: For version control

Environment Setup

Backend Configuration

Create a .env file in the backend/ directory:

```
# Server Configuration
NODE_ENV=development
PORT=8080
FRONTEND_URL=http://localhost:5173

# Database
MONGODB_URI=your_mongodb_connection_string

# JWT Secret (generate a secure random string)
JWT_SECRET=your_secure_jwt_secret_key

# AI Services
GEMINI_API_KEY=your_google_gemini_api_key
MURF_API_KEY=your_murf_api_key
AZURE_SPEECH_KEY=your_azure_speech_key
AZURE_SPEECH_REGION=your_azure_region

# AWS Configuration (for audio storage)
AWS_ACCESS_KEY_ID=your_aws_access_key
AWS_SECRET_ACCESS_KEY=your_aws_secret_key
AWS_REGION=your_aws_region
AWS_BUCKET_NAME=your_s3_bucket_name
```
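
A missing key in the backend .env file tends to surface only when the corresponding service is first called. As a rough sketch (not the project's actual config loader), a startup check could list the unset variables up front. The helper name findMissingEnv is hypothetical, and treating NODE_ENV, PORT, and FRONTEND_URL as optional is an assumption; only the variable names come from the template above.

```typescript
// Hypothetical helper: reports which backend variables are unset or blank.
// Variable names are copied from the .env template; the rest is illustrative.
const REQUIRED_BACKEND_VARS: string[] = [
  "MONGODB_URI",
  "JWT_SECRET",
  "GEMINI_API_KEY",
  "MURF_API_KEY",
  "AZURE_SPEECH_KEY",
  "AZURE_SPEECH_REGION",
  "AWS_ACCESS_KEY_ID",
  "AWS_SECRET_ACCESS_KEY",
  "AWS_REGION",
  "AWS_BUCKET_NAME",
];

type Env = Record<string, string | undefined>;

// Returns the names of required variables that are missing or whitespace-only.
function findMissingEnv(env: Env): string[] {
  return REQUIRED_BACKEND_VARS.filter((name) => {
    const value = env[name];
    return value === undefined || value.trim() === "";
  });
}
```

In a Node.js entry point this could be called as findMissingEnv(process.env), logging the result and exiting if it is non-empty.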

Frontend Configuration

Create a .env file in the frontend/ directory:

```
# API Configuration
VITE_API_BASE_URL=http://localhost:8080/api
VITE_API_TIMEOUT=180000

# Environment
VITE_NODE_ENV=development

# Google OAuth
VITE_GOOGLE_CLIENT_ID=your_google_oauth_client_id
```

Installation & Setup

Clone the Repository

```
git clone https://github.com/meetbatra/speakly.git
cd speakly
```

Backend Setup

```
cd backend
npm install
npm run dev
```

Frontend Setup (in a new terminal)

```
cd frontend
npm install
npm run dev
```

Access the Application
- Frontend: http://localhost:5173
- Backend API: http://localhost:8080
- Health Check: http://localhost:8080/health

📁 Project Structure

```
speakly/
├── backend/                    # Node.js Express API
│   ├── config/                 # Configuration files
│   ├── controllers/            # Route controllers
│   ├── middleware/             # Custom middleware
│   ├── models/                 # MongoDB models
│   ├── routes/                 # API route definitions
│   ├── services/               # Business logic services
│   ├── utils/                  # Utility functions
│   │   ├── auth/               # Authentication utilities
│   │   ├── db/                 # Database connection
│   │   ├── services/           # External service integrations
│   │   └── validations/        # Input validation schemas
│   ├── uploads/                # Temporary file uploads
│   ├── temp/                   # Temporary audio processing
│   └── app.js                  # Main application entry point
│
├── frontend/                   # React TypeScript SPA
│   ├── public/                 # Static assets
│   ├── src/
│   │   ├── api/                # API service functions
│   │   ├── components/         # Reusable React components
│   │   │   └── ui/             # UI component library
│   │   ├── config/             # Configuration files
│   │   ├── hooks/              # Custom React hooks
│   │   ├── pages/              # Route page components
│   │   ├── routes/             # Application routing
│   │   ├── stores/             # Zustand state stores
│   │   ├── styles/             # Global styles and themes
│   │   ├── types/              # TypeScript type definitions
│   │   └── main.tsx            # Application entry point
│   ├── components.json         # UI component configuration
│   └── vite.config.ts          # Vite build configuration
│
└── README.md                   # This file
```

🔌 API Endpoints

Authentication

- POST /api/auth/register - User registration
- POST /api/auth/login - User login
- POST /api/auth/google-login - Google OAuth login
- POST /api/auth/refresh - Refresh access token
- POST /api/auth/logout - User logout

Speech Management

- GET /api/speech/options - Get supported languages/accents
- POST /api/speech/generate - Generate new speech with AI
- POST /api/speech/modify - Modify existing speech
- GET /api/speech - Get user's speeches (paginated)
- GET /api/speech/:id - Get specific speech by ID
- PUT /api/speech/:id/regenerate-audio - Regenerate speech audio
- DELETE /api/speech/:id - Delete speech
- GET /api/speech/stats - Get speech statistics

Practice & Analysis

- POST /api/practice/:speechId/attempt - Create practice attempt
- GET /api/practice/attempt/:attemptId - Get practice results
- GET /api/practice/attempt/:attemptId/status - Check analysis status
- GET /api/practice/history - Get practice history
- GET /api/practice/stats - Get practice statistics
- DELETE /api/practice/attempt/:attemptId - Delete practice attempt

User Management

- GET /api/user/profile - Get user profile
- PUT /api/user/profile - Update user profile
- GET /api/user/statistics - Get dashboard statistics
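
Several of the endpoints above take path parameters (:id, :speechId, :attemptId). The following is a small sketch of how a client might build those URLs safely. The routes object and apiUrl helper are hypothetical and are not taken from the frontend code; only the paths and the base URL from VITE_API_BASE_URL come from this README.

```typescript
// Hypothetical route helpers. Paths mirror the endpoint list, relative to the
// API base URL (for example http://localhost:8080/api).
const routes = {
  speechById: (id: string): string => `/speech/${encodeURIComponent(id)}`,
  regenerateAudio: (id: string): string =>
    `/speech/${encodeURIComponent(id)}/regenerate-audio`,
  createAttempt: (speechId: string): string =>
    `/practice/${encodeURIComponent(speechId)}/attempt`,
  attemptStatus: (attemptId: string): string =>
    `/practice/attempt/${encodeURIComponent(attemptId)}/status`,
};

// Joins the base URL and a route path, tolerating trailing slashes on the base.
function apiUrl(baseUrl: string, path: string): string {
  return baseUrl.replace(/\/+$/, "") + path;
}
```

Encoding each parameter keeps an unexpected character in an ID from changing which route is hit.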

🌍 Language Support

Speech Generation (20+ Languages)

All languages support content generation and audio synthesis:

English, Spanish, French, German, Italian, Portuguese, Dutch, Russian, Japanese, Korean, Chinese (Mandarin), Hindi, Bengali, Tamil, Telugu, Gujarati, Kannada, Malayalam, Marathi, Punjabi, Malay

Pronunciation Assessment (13 Languages)

Advanced pronunciation analysis available for:

✅ English, Spanish, French, German, Italian, Portuguese, Dutch, Russian, Japanese, Korean, Chinese (Mandarin), Hindi, Tamil

Note: The remaining languages (Bengali, Telugu, Gujarati, Kannada, Malayalam, Marathi, Punjabi, and Malay) support speech generation only; "Practice with AI" is disabled for them.
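
The split between the two language lists can be expressed as a simple capability lookup. This is an illustrative sketch only: the function name languageCapabilities is hypothetical, and keying by display name is an assumption (the application may well use locale codes internally). The language names themselves are copied from the lists above.

```typescript
// Languages that support content generation and audio synthesis (from this README).
const GENERATION_LANGUAGES: string[] = [
  "English", "Spanish", "French", "German", "Italian", "Portuguese",
  "Dutch", "Russian", "Japanese", "Korean", "Chinese (Mandarin)",
  "Hindi", "Bengali", "Tamil", "Telugu", "Gujarati", "Kannada",
  "Malayalam", "Marathi", "Punjabi", "Malay",
];

// Subset that also supports pronunciation assessment (from this README).
const ASSESSMENT_LANGUAGES: string[] = [
  "English", "Spanish", "French", "German", "Italian", "Portuguese",
  "Dutch", "Russian", "Japanese", "Korean", "Chinese (Mandarin)",
  "Hindi", "Tamil",
];

interface LanguageCapabilities {
  generation: boolean; // speech text and audio can be generated
  practice: boolean;   // "Practice with AI" can be enabled
}

// Hypothetical lookup: decides which features to offer for a language name.
function languageCapabilities(language: string): LanguageCapabilities {
  return {
    generation: GENERATION_LANGUAGES.indexOf(language) !== -1,
    practice: ASSESSMENT_LANGUAGES.indexOf(language) !== -1,
  };
}
```

For example, languageCapabilities("Bengali") reports generation as available and practice as unavailable, which matches the note above.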

🔧 Key Features Explained

AI-Powered Speech Generation

The platform uses Google Gemini 2.5 Flash to generate contextually appropriate speeches based on:

- Topic: User-provided subject matter
- Tone: Formal, casual, persuasive, inspirational, educational, etc.
- Language: 20+ supported languages with cultural adaptation
- Accent: Regional variations and pronunciation preferences
- Length: Optimized for different speech durations

Pronunciation Assessment Workflow

1. Audio Upload: Users record themselves reading the generated speech
2. Azure Analysis: Microsoft Speech Services performs detailed assessment
3. AI Feedback: Gemini AI generates personalized coaching feedback
4. Audio Response: Murf AI creates spoken feedback for the user
5. Progress Tracking: Results stored for long-term progress monitoring
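
Because the analysis runs through several services, a client typically has to wait for the result, and the API exposes GET /api/practice/attempt/:attemptId/status for that purpose. Below is a minimal polling sketch, not the frontend's actual implementation. The status values ("processing", "completed", "failed"), the function name waitForAnalysis, and the default interval and retry count are all assumptions; the real response shape is not documented in this README. The status fetcher is injected so the sketch stays independent of any particular HTTP client.

```typescript
// Assumed status values; the real API may use different names.
type AttemptStatus = "processing" | "completed" | "failed";

// Caller-supplied function that asks the status endpoint about one attempt.
type StatusFetcher = (attemptId: string) => Promise<AttemptStatus>;

// Polls until the attempt leaves the "processing" state or the retry budget runs out.
async function waitForAnalysis(
  attemptId: string,
  fetchStatus: StatusFetcher,
  intervalMs: number = 2000, // assumed delay between polls
  maxAttempts: number = 30,  // assumed upper bound on polls
): Promise<AttemptStatus> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const status = await fetchStatus(attemptId);
    if (status !== "processing") {
      return status; // "completed" or "failed"
    }
    // Wait before asking again.
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`Analysis for attempt ${attemptId} did not finish in time`);
}
```

Once the status is no longer "processing", the full results can be requested from GET /api/practice/attempt/:attemptId.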

Security Architecture

- Password Security: Bcrypt hashing with salt rounds
- Token Management: JWT access tokens (15min) + refresh tokens (7 days)
- Rate Limiting: IP-based request throttling
- Input Validation: Zod schema validation on all endpoints
- CORS Protection: Configured for frontend domain
- Helmet Security: Standard web security headers
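
The two token lifetimes imply three client states: the access token is still valid, it has expired but can be renewed with the refresh token, or both have expired and the user must log in again. The sketch below illustrates that arithmetic only. The names are hypothetical, and it assumes both tokens were issued at the same moment, which may not match how the backend rotates refresh tokens.

```typescript
// Lifetimes from the Security Architecture list, in milliseconds.
const ACCESS_TOKEN_TTL_MS = 15 * 60 * 1000;           // 15 minutes
const REFRESH_TOKEN_TTL_MS = 7 * 24 * 60 * 60 * 1000; // 7 days

type TokenAction = "use-access-token" | "refresh" | "login-again";

// Hypothetical helper: decides what a client should do given when the
// token pair was issued (assumes both tokens share one issue time).
function nextTokenAction(issuedAtMs: number, nowMs: number): TokenAction {
  const ageMs = nowMs - issuedAtMs;
  if (ageMs < ACCESS_TOKEN_TTL_MS) return "use-access-token";
  if (ageMs < REFRESH_TOKEN_TTL_MS) return "refresh";
  return "login-again";
}
```

In the "refresh" case the client would call POST /api/auth/refresh before retrying the original request.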

🛠️ Development

Running Tests

```
# Backend tests (when implemented); run from the repository root
cd backend
npm test

# Frontend tests (when implemented); run from the repository root
cd frontend
npm test
```

Building for Production

```
# Build frontend
cd frontend
npm run build

# The backend has no separate build step
# Set NODE_ENV=production in backend/.env
```

Code Quality

- ESLint: Configured for TypeScript and React
- TypeScript: Strict mode enabled with comprehensive types
- Prettier: Code formatting (configure as needed)
- Validation: Runtime validation with Zod schemas

🚀 Deployment

Frontend Deployment

The React app can be deployed to:

- Vercel: Zero-config deployment
- Netlify: Static site hosting
- AWS S3 + CloudFront: Scalable CDN solution
- Docker: Containerized deployment

Backend Deployment

The Node.js API can be deployed to:

- Railway: Simple Node.js hosting
- Heroku: Platform-as-a-Service
- AWS EC2/ECS: Full control deployment
- DigitalOcean: Droplet or App Platform
- Docker: Containerized deployment

Database Options

- MongoDB Atlas: Managed cloud database (recommended)
- Local MongoDB: Self-hosted installation
- Docker MongoDB: Containerized database

🤝 Contributing

We welcome contributions! Please follow these steps:

1. Fork the Repository
2. Create Feature Branch: git checkout -b feature/amazing-feature
3. Commit Changes: git commit -m 'Add amazing feature'
4. Push to Branch: git push origin feature/amazing-feature
5. Open Pull Request

Development Guidelines

- Follow TypeScript best practices
- Add proper error handling
- Include input validation
- Write meaningful commit messages
- Test thoroughly before submitting

🙏 Acknowledgments

- Google Gemini: AI-powered speech generation
- Microsoft Azure: Professional speech analysis
- Murf AI: High-quality voice synthesis
- MongoDB: Flexible document database
- React Team: Frontend framework
- Tailwind CSS: Utility-first CSS framework

📞 Support

For support, email meetbatra56@gmail.com or create an issue on GitHub.

Made with ❤️ by Meet Batra

Empowering communication through AI-driven speech technology
