meetbatra/speakly

🎤 Speakly

AI-Powered Speech Generation and Pronunciation Training Platform

Speakly is a full-stack application that uses Google Gemini, Murf AI, and Azure Speech Services to help users generate speeches and improve their pronunciation through automated assessment and feedback.

Made with React, Node.js, TypeScript, and MongoDB

🌟 Features

📝 AI-Powered Speech Generation

  • Intelligent Content Creation: Generate speeches using Google Gemini 2.5 Flash AI
  • Multi-Language Support: Support for 20+ languages including English, Spanish, French, German, Hindi, Japanese, Chinese, and more
  • Customizable Tone & Style: Choose from formal, casual, persuasive, inspirational, and other tones
  • Accent Adaptation: Speeches optimized for different regional accents
  • Automatic Audio Generation: High-quality speech synthesis using Murf AI

🎯 Advanced Pronunciation Training

  • Azure Speech Services Integration: Professional-grade pronunciation assessment
  • Real-Time Analysis: Accuracy, fluency, and completeness scoring (0-100 scale)
  • Word-Level Feedback: Detailed analysis of individual word pronunciation
  • AI-Generated Coaching: Personalized feedback using Gemini AI
  • Progress Tracking: Monitor improvement over time with detailed statistics

🔐 Robust Authentication System

  • Multi-Provider Login: Email/password and Google OAuth 2.0 integration
  • JWT Security: Secure token-based authentication with refresh tokens
  • Protected Routes: Role-based access control
  • Session Management: Automatic token refresh and secure logout

📊 Comprehensive Dashboard

  • Performance Analytics: Track speech count, practice sessions, and average scores
  • Language Insights: Favorite languages and usage patterns
  • Progress Metrics: Improvement rates and learning streaks
  • Speech History: Complete history with advanced filtering and pagination

🎨 Modern User Experience

  • Glassmorphic Design: Beautiful dark theme with backdrop blur effects
  • Responsive Layout: Mobile-first design that works on all devices
  • Real-Time Feedback: Toast notifications and loading states
  • Intuitive Navigation: Clean, modern interface with smooth animations

🏗️ Technology Stack

Frontend

  • Framework: React 19.1 with TypeScript
  • Build Tool: Vite for fast development and optimized builds
  • State Management: Zustand for lightweight, scalable state management
  • Styling: Tailwind CSS 4.x with custom components
  • UI Components: shadcn/ui component library built on Radix UI primitives
  • Form Handling: React Hook Form with Zod validation
  • Routing: React Router DOM v7 with protected routes
  • Icons: Lucide React for consistent iconography

Backend

  • Runtime: Node.js with Express.js framework
  • Database: MongoDB with Mongoose ODM
  • Authentication: JWT tokens with bcrypt password hashing
  • File Upload: Multer middleware for audio file handling
  • Validation: Zod schemas for request/response validation
  • Security: Helmet, CORS, rate limiting, and input sanitization
  • Audio Processing: FFmpeg for audio format conversion

AI & Cloud Services

  • Speech Generation: Google Gemini 2.5 Flash AI
  • Audio Synthesis: Murf AI for high-quality voice generation
  • Pronunciation Analysis: Microsoft Azure Speech Services
  • Cloud Storage: AWS S3 for audio file storage
  • Authentication: Google OAuth 2.0 integration

🚀 Getting Started

Prerequisites

  • Node.js: v18.0.0 or higher
  • MongoDB: Local installation or MongoDB Atlas cluster
  • Git: For version control

Environment Setup

Backend Configuration

Create a .env file in the backend/ directory:

# Server Configuration
NODE_ENV=development
PORT=8080
FRONTEND_URL=http://localhost:5173

# Database
MONGODB_URI=your_mongodb_connection_string

# JWT Secret (generate a secure random string)
JWT_SECRET=your_secure_jwt_secret_key

# AI Services
GEMINI_API_KEY=your_google_gemini_api_key
MURF_API_KEY=your_murf_api_key
AZURE_SPEECH_KEY=your_azure_speech_key
AZURE_SPEECH_REGION=your_azure_region

# AWS Configuration (for audio storage)
AWS_ACCESS_KEY_ID=your_aws_access_key
AWS_SECRET_ACCESS_KEY=your_aws_secret_key
AWS_REGION=your_aws_region
AWS_BUCKET_NAME=your_s3_bucket_name
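
A missing key in this file tends to surface only when the affected service is first called. As a minimal sketch (not the project's actual startup code), the check below reports unset or blank variables up front. The key list is copied from the example above; the helper names `missingBackendKeys` and `assertBackendEnv` are hypothetical.

```typescript
// Variable names are taken from the backend .env example above.
const REQUIRED_BACKEND_KEYS = [
  "MONGODB_URI",
  "JWT_SECRET",
  "GEMINI_API_KEY",
  "MURF_API_KEY",
  "AZURE_SPEECH_KEY",
  "AZURE_SPEECH_REGION",
  "AWS_ACCESS_KEY_ID",
  "AWS_SECRET_ACCESS_KEY",
  "AWS_REGION",
  "AWS_BUCKET_NAME",
] as const;

type Env = Record<string, string | undefined>;

// Hypothetical helper: returns the required keys that are unset or blank.
function missingBackendKeys(env: Env): string[] {
  return REQUIRED_BACKEND_KEYS.filter((key) => {
    const value = env[key];
    return value === undefined || value.trim() === "";
  });
}

// Hypothetical helper: fail fast with a readable message listing every gap.
function assertBackendEnv(env: Env): void {
  const missing = missingBackendKeys(env);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
}
```

In a Node.js process this could be called as `assertBackendEnv(process.env)` before the server starts listening, assuming the .env file has already been loaded.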

Frontend Configuration

Create a .env file in the frontend/ directory:

# API Configuration
VITE_API_BASE_URL=http://localhost:8080/api
VITE_API_TIMEOUT=180000

# Environment
VITE_NODE_ENV=development

# Google OAuth
VITE_GOOGLE_CLIENT_ID=your_google_oauth_client_id

Installation & Setup

  1. Clone the Repository

    git clone https://github.com/meetbatra/speakly.git
    cd speakly

  2. Backend Setup

    cd backend
    npm install
    npm run dev

  3. Frontend Setup (in a new terminal)

    cd frontend
    npm install
    npm run dev

  4. Access the Application

    Frontend: http://localhost:5173
    Backend API: http://localhost:8080/api

📁 Project Structure

speakly/
├── backend/                    # Node.js Express API
│   ├── config/                 # Configuration files
│   ├── controllers/            # Route controllers
│   ├── middleware/             # Custom middleware
│   ├── models/                 # MongoDB models
│   ├── routes/                 # API route definitions
│   ├── services/               # Business logic services
│   ├── utils/                  # Utility functions
│   │   ├── auth/               # Authentication utilities
│   │   ├── db/                 # Database connection
│   │   ├── services/           # External service integrations
│   │   └── validations/        # Input validation schemas
│   ├── uploads/                # Temporary file uploads
│   ├── temp/                   # Temporary audio processing
│   └── app.js                  # Main application entry point
│
├── frontend/                   # React TypeScript SPA
│   ├── public/                 # Static assets
│   ├── src/
│   │   ├── api/                # API service functions
│   │   ├── components/         # Reusable React components
│   │   │   └── ui/             # UI component library
│   │   ├── config/             # Configuration files
│   │   ├── hooks/              # Custom React hooks
│   │   ├── pages/              # Route page components
│   │   ├── routes/             # Application routing
│   │   ├── stores/             # Zustand state stores
│   │   ├── styles/             # Global styles and themes
│   │   ├── types/              # TypeScript type definitions
│   │   └── main.tsx            # Application entry point
│   ├── components.json         # UI component configuration
│   └── vite.config.ts          # Vite build configuration
│
└── README.md                   # This file

🔌 API Endpoints

Authentication

  • POST /api/auth/register - User registration
  • POST /api/auth/login - User login
  • POST /api/auth/google-login - Google OAuth login
  • POST /api/auth/refresh - Refresh access token
  • POST /api/auth/logout - User logout

Speech Management

  • GET /api/speech/options - Get supported languages/accents
  • POST /api/speech/generate - Generate new speech with AI
  • POST /api/speech/modify - Modify existing speech
  • GET /api/speech - Get user's speeches (paginated)
  • GET /api/speech/:id - Get specific speech by ID
  • PUT /api/speech/:id/regenerate-audio - Regenerate speech audio
  • DELETE /api/speech/:id - Delete speech
  • GET /api/speech/stats - Get speech statistics

Practice & Analysis

  • POST /api/practice/:speechId/attempt - Create practice attempt
  • GET /api/practice/attempt/:attemptId - Get practice results
  • GET /api/practice/attempt/:attemptId/status - Check analysis status
  • GET /api/practice/history - Get practice history
  • GET /api/practice/stats - Get practice statistics
  • DELETE /api/practice/attempt/:attemptId - Delete practice attempt
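
The separate status endpoint suggests that analysis runs asynchronously after an attempt is created. A client could poll it as sketched below. This is an illustration under assumptions: the status values (`processing`, `completed`, `failed`), the helper names, and the polling defaults are all hypothetical, and the HTTP call is injected so the sketch does not depend on any particular client.

```typescript
// Assumed status values; the real API response may differ.
type AttemptStatus = "processing" | "completed" | "failed";

// Builds the status path documented above for a given attempt ID.
function attemptStatusPath(attemptId: string): string {
  return `/api/practice/attempt/${encodeURIComponent(attemptId)}/status`;
}

// Polls until the analysis leaves the "processing" state or the poll budget
// is spent. `fetchStatus` is injected by the caller (hypothetical signature).
async function waitForAnalysis(
  attemptId: string,
  fetchStatus: (path: string) => Promise<AttemptStatus>,
  maxPolls: number = 30,
  delayMs: number = 2000,
): Promise<AttemptStatus> {
  const path = attemptStatusPath(attemptId);
  for (let poll = 0; poll < maxPolls; poll++) {
    const status = await fetchStatus(path);
    if (status !== "processing") return status;
    // Wait before asking again so the server is not hammered.
    await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
  }
  throw new Error(`Analysis for attempt ${attemptId} did not finish in time`);
}
```

Once the returned status is `completed`, the results can be read from GET /api/practice/attempt/:attemptId.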

User Management

  • GET /api/user/profile - Get user profile
  • PUT /api/user/profile - Update user profile
  • GET /api/user/statistics - Get dashboard statistics

🌍 Language Support

Speech Generation (20+ Languages)

All languages support content generation and audio synthesis:

  • English, Spanish, French, German, Italian, Portuguese
  • Dutch, Russian, Japanese, Korean, Chinese (Mandarin)
  • Hindi, Bengali, Tamil, Telugu, Gujarati, Kannada
  • Malayalam, Marathi, Punjabi, Malay

Pronunciation Assessment (13 Languages)

Advanced pronunciation analysis available for:

  • ✅ English, Spanish, French, German, Italian
  • ✅ Portuguese, Dutch, Russian, Japanese, Korean
  • ✅ Chinese (Mandarin), Hindi, Tamil

Note: Languages not listed above support speech generation but have "Practice with AI" disabled for pronunciation assessment.
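
The rule in the note can be expressed as a simple lookup. The sketch below copies both language lists from this section; the function names are hypothetical and the application may store languages as codes rather than display names.

```typescript
// Languages that support speech generation, as listed above.
const GENERATION_LANGUAGES: string[] = [
  "English", "Spanish", "French", "German", "Italian", "Portuguese",
  "Dutch", "Russian", "Japanese", "Korean", "Chinese (Mandarin)",
  "Hindi", "Bengali", "Tamil", "Telugu", "Gujarati", "Kannada",
  "Malayalam", "Marathi", "Punjabi", "Malay",
];

// Languages that also support pronunciation assessment, as listed above.
const ASSESSMENT_LANGUAGES: string[] = [
  "English", "Spanish", "French", "German", "Italian",
  "Portuguese", "Dutch", "Russian", "Japanese", "Korean",
  "Chinese (Mandarin)", "Hindi", "Tamil",
];

// Hypothetical helper: true when "Practice with AI" should be enabled.
function canPracticeWithAI(language: string): boolean {
  return ASSESSMENT_LANGUAGES.indexOf(language) !== -1;
}

// Hypothetical helper: languages with generation but no assessment.
function generationOnlyLanguages(): string[] {
  return GENERATION_LANGUAGES.filter((language) => !canPracticeWithAI(language));
}
```

With the lists above, `generationOnlyLanguages()` yields Bengali, Telugu, Gujarati, Kannada, Malayalam, Marathi, Punjabi, and Malay.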

🔧 Key Features Explained

AI-Powered Speech Generation

The platform uses Google Gemini 2.5 Flash to generate contextually appropriate speeches based on:

  • Topic: User-provided subject matter
  • Tone: Formal, casual, persuasive, inspirational, educational, etc.
  • Language: 20+ supported languages with cultural adaptation
  • Accent: Regional variations and pronunciation preferences
  • Length: Optimized for different speech durations

Pronunciation Assessment Workflow

  1. Audio Upload: Users record themselves reading the generated speech
  2. Azure Analysis: Microsoft Speech Services performs detailed assessment
  3. AI Feedback: Gemini AI generates personalized coaching feedback
  4. Audio Response: Murf AI creates spoken feedback for the user
  5. Progress Tracking: Results stored for long-term progress monitoring
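
Steps 2 to 5 form a sequential pipeline in which each stage consumes the previous stage's output. The sketch below shows that ordering with the external services injected as plain functions. All interface and function names are hypothetical, and the real backend may split this work across controllers and background jobs.

```typescript
// Scores on the 0-100 scale described in the feature list.
interface Scores {
  accuracy: number;
  fluency: number;
  completeness: number;
}

interface PracticeResult {
  scores: Scores;
  feedback: string;
  feedbackAudioUrl: string;
}

// Hypothetical service boundary: one function per external integration.
interface PracticeServices {
  assess(audio: Uint8Array, referenceText: string): Promise<Scores>; // step 2 (Azure)
  coach(scores: Scores): Promise<string>;                            // step 3 (Gemini)
  synthesize(text: string): Promise<string>;                         // step 4 (Murf), returns an audio URL
  save(result: PracticeResult): Promise<void>;                       // step 5 (storage)
}

// Runs the stages in order; any rejected promise aborts the attempt.
async function runPracticeAttempt(
  audio: Uint8Array,
  referenceText: string,
  services: PracticeServices,
): Promise<PracticeResult> {
  const scores = await services.assess(audio, referenceText);
  const feedback = await services.coach(scores);
  const feedbackAudioUrl = await services.synthesize(feedback);
  const result: PracticeResult = { scores, feedback, feedbackAudioUrl };
  await services.save(result);
  return result;
}
```

Injecting the services keeps the ordering testable with stubs, without any real API keys.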

Security Architecture

  • Password Security: Bcrypt hashing with salt rounds
  • Token Management: JWT access tokens (15min) + refresh tokens (7 days)
  • Rate Limiting: IP-based request throttling
  • Input Validation: Zod schema validation on all endpoints
  • CORS Protection: Configured for frontend domain
  • Helmet Security: Standard web security headers
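
The two token lifetimes imply a three-way decision on the client: use the access token, refresh it, or send the user back to login. A minimal sketch of that decision follows, using the 15-minute and 7-day values stated above. The type names, function names, and the 30-second safety margin are assumptions, not the project's actual session code.

```typescript
const ACCESS_TOKEN_TTL_MS = 15 * 60 * 1000;            // 15 minutes
const REFRESH_TOKEN_TTL_MS = 7 * 24 * 60 * 60 * 1000;  // 7 days

// Expiry instants in epoch milliseconds (hypothetical shape).
interface TokenExpiries {
  accessExpiresAt: number;
  refreshExpiresAt: number;
}

type SessionAction = "use-access-token" | "refresh" | "login-again";

// Computes both expiry instants for tokens issued at `nowMs`.
function issueExpiries(nowMs: number): TokenExpiries {
  return {
    accessExpiresAt: nowMs + ACCESS_TOKEN_TTL_MS,
    refreshExpiresAt: nowMs + REFRESH_TOKEN_TTL_MS,
  };
}

// Decides what the client should do before sending a request.
// `skewMs` refreshes slightly early so a token does not expire in flight.
function nextSessionAction(
  tokens: TokenExpiries,
  nowMs: number,
  skewMs: number = 30000,
): SessionAction {
  if (nowMs >= tokens.refreshExpiresAt) return "login-again";
  if (nowMs + skewMs >= tokens.accessExpiresAt) return "refresh";
  return "use-access-token";
}
```

A `refresh` result would correspond to calling POST /api/auth/refresh, and `login-again` to clearing the session.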

🛠️ Development

Running Tests

# Backend tests (when implemented)
cd backend
npm test

# Frontend tests (when implemented)
cd frontend
npm test

Building for Production

# Build frontend
cd frontend
npm run build

# Backend is production-ready as-is
# Set NODE_ENV=production in backend/.env

Code Quality

  • ESLint: Configured for TypeScript and React
  • TypeScript: Strict mode enabled with comprehensive types
  • Prettier: Code formatting (configure as needed)
  • Validation: Runtime validation with Zod schemas

🚀 Deployment

Frontend Deployment

The React app can be deployed to:

  • Vercel: Zero-config deployment
  • Netlify: Static site hosting
  • AWS S3 + CloudFront: Scalable CDN solution
  • Docker: Containerized deployment

Backend Deployment

The Node.js API supports:

  • Railway: Simple Node.js hosting
  • Heroku: Platform-as-a-Service
  • AWS EC2/ECS: Full control deployment
  • DigitalOcean: Droplet or App Platform
  • Docker: Containerized deployment

Database Options

  • MongoDB Atlas: Managed cloud database (recommended)
  • Local MongoDB: Self-hosted installation
  • Docker MongoDB: Containerized database

🤝 Contributing

We welcome contributions! Please follow these steps:

  1. Fork the Repository
  2. Create Feature Branch: git checkout -b feature/amazing-feature
  3. Commit Changes: git commit -m 'Add amazing feature'
  4. Push to Branch: git push origin feature/amazing-feature
  5. Open Pull Request

Development Guidelines

  • Follow TypeScript best practices
  • Add proper error handling
  • Include input validation
  • Write meaningful commit messages
  • Test thoroughly before submitting

🙏 Acknowledgments

  • Google Gemini: AI-powered speech generation
  • Microsoft Azure: Professional speech analysis
  • Murf AI: High-quality voice synthesis
  • MongoDB: Flexible document database
  • React Team: Amazing frontend framework
  • Tailwind CSS: Utility-first CSS framework

📞 Support

For support, email meetbatra56@gmail.com or create an issue on GitHub.


Made with ❤️ by Meet Batra

Empowering communication through AI-driven speech technology

About

Speakly is an AI-driven platform that helps users create and practice speeches with real-time pronunciation feedback. It uses Gemini AI, Azure Speech Services, and Murf AI to generate content, assess speech, and provide lifelike audio — all through a clean, modern interface.
