achouhan93/EUDial-LexGuide

LexGuide Framework

This repository contains an implementation of the LexGuide framework, a proactive dialogue system for legal QA with:

  • Retrieval-Augmented Generation (RAG)
  • Hierarchical topic organization (BERTopic)
  • System-driven follow-up generation

Also included are baselines for comparison:

  • RAG-Basic
  • RAG-MMR
  • ConvRAG
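The RAG-MMR baseline (and the `--use_mmr` flag used further down) presumably refers to Maximal Marginal Relevance reranking. The following is a minimal sketch of standard MMR selection over toy vectors, not the repository's code; the names `mmr_select` and `lambda_` are illustrative assumptions.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def mmr_select(query_vec, doc_vecs, k=3, lambda_=0.5):
    """Return indices of up to k documents chosen by MMR.

    lambda_ (hypothetical name) trades relevance to the query (1.0)
    against redundancy with already selected documents (0.0).
    """
    selected = []
    candidates = list(range(len(doc_vecs)))
    while candidates and len(selected) < k:
        def score(i):
            relevance = cosine(query_vec, doc_vecs[i])
            redundancy = max(
                (cosine(doc_vecs[i], doc_vecs[j]) for j in selected),
                default=0.0,
            )
            return lambda_ * relevance - (1 - lambda_) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected

query = [1.0, 0.0]
docs = [[1.0, 0.0], [1.0, 0.05], [0.6, 0.8]]
print(mmr_select(query, docs, k=2, lambda_=0.3))  # favours diversity
print(mmr_select(query, docs, k=2, lambda_=1.0))  # pure relevance ranking
```

With a low `lambda_` the near-duplicate second document is skipped in favour of the more distinct third one; with `lambda_=1.0` the selection reduces to plain similarity ranking.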

Steps

Inputs

Place your uploaded files and point to them when running scripts:

Corpus:

  • ../path/non_pdf_urls.json
  • ../path/pdf_urls.json

EUDial Dataset:

  • ../path/conversations.jsonl
  • ../path/conversations_turns.jsonl
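The `.jsonl` inputs hold one JSON object per line. A small sketch for checking that a file parses before running the scripts; `read_jsonl` is a hypothetical helper, and no assumption is made about the fields inside each record.

```python
import json
from pathlib import Path

def read_jsonl(path):
    """Yield one parsed JSON object per non-empty line of a .jsonl file."""
    with Path(path).open(encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            try:
                yield json.loads(line)
            except json.JSONDecodeError as exc:
                # Report the offending line so malformed records are easy to find.
                raise ValueError(f"{path}:{line_number}: invalid JSON") from exc
```

For example, `sum(1 for _ in read_jsonl("conversations.jsonl"))` counts the records in a file.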

Building the Corpus

python -m 2_lexguide.scripts.build_corpus \
  --non_pdf_urls /path/non_pdf_urls.json \
  --pdf_urls /path/pdf_urls.json \
  --output_dir /path/outputs

Build FAISS once

python -m 2_lexguide.scripts.build_index \
  --output_dir outputs \
  --embed_model nlpaueb/bert-base-uncased-eurlex

Experiments Execution

Depth-2 (EUDial-style) hierarchy + BFS: Hierarchy mode = shallow

Agglomerative hierarchy (binary merges) for experiments: Hierarchy mode = agglomerative

  python -m 2_lexguide.scripts.run_lexguide \
    --dialogues_file ./data/eudial_test.jsonl \
    --output_dir ./outputs \
    --provider openai \
    --model gpt-4o-mini \
    --strategy BFS \
    --tau 0.6

  python -m 2_lexguide.scripts.run_lexguide \
    --dialogues_file ./conversations.jsonl \
    --output_dir ./outputs \
    --provider groq \
    --model llama-3.1-8b-instant \
    --strategy BFS

  python -m 2_lexguide.scripts.run_lexguide \
    --dialogues_file ./conversations.jsonl \
    --output_dir ./outputs \
    --provider ollama \
    --model gemma2:2b \
    --strategy BFS
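To illustrate the idea behind `--strategy BFS`, here is a sketch of a breadth-first walk over a depth-limited topic hierarchy. It assumes that `--tau` acts as a similarity threshold for keeping a topic as a follow-up candidate; that reading, the `TopicNode` structure, and the `score` field are assumptions for illustration and are not taken from the repository.

```python
from collections import deque
from dataclasses import dataclass, field

@dataclass
class TopicNode:
    name: str
    score: float  # hypothetical similarity between this topic and the current query
    children: list = field(default_factory=list)

def bfs_topics(root, tau=0.6, max_depth=2):
    """Visit topics level by level and keep those scoring at least tau."""
    order = []
    queue = deque([(root, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth > 0 and node.score >= tau:  # the root is only a container
            order.append(node.name)
        if depth < max_depth:
            for child in node.children:
                queue.append((child, depth + 1))
    return order

root = TopicNode("root", 1.0, [
    TopicNode("A", 0.8, [TopicNode("A1", 0.7), TopicNode("A2", 0.4)]),
    TopicNode("B", 0.5, [TopicNode("B1", 0.9)]),
])
print(bfs_topics(root, tau=0.6))  # → ['A', 'A1', 'B1']
```

In this sketch a low-scoring parent does not prune its children; whether LexGuide prunes subtrees is not stated in this README.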

Test with 1-2 dialogues:

python -m 2_lexguide.scripts.run_experiments \
  --dialogues_file ./test_subset.jsonl \
  --output_dir ./test_outputs \
  --strategy BFS \
  --use_mmr

Full Evaluation

python -m 2_lexguide.scripts.run_experiments \
    --dialogues_file ./conversations.jsonl \
    --output_dir ./outputs

LexGuide Evaluation Scripts

The scripts in the evaluation and eval folders evaluate LexGuide experiment results. They can score individual JSONL run files and generate comparison tables and statistical analyses.

Quick Start

For the fastest evaluation of your runs directory:

python quick_eval.py /path/to/your/runs/directory results.csv <path>/conversations_normalized.jsonl

This will:

  • Find all JSONL files in the directory
  • Evaluate each one using your existing eval_pipeline.py
  • Generate a summary table in the console
  • Save detailed results to CSV

Scripts Overview

  1. evaluate_runs.py - Main Evaluation Script

The primary script for evaluating individual JSONL run files with full control.

Usage:

python evaluate_runs.py --runs_dir /path/to/runs --output evaluation_results.csv --normalized_path /path/to/conversations_normalized.jsonl

Key Features:

  • Automatically detects method and model names from filenames
  • Handles expected filename format: runs_{METHOD}__{MODEL}.jsonl
  • Comprehensive error handling and logging
  • Generates both CSV output and console summary
  • Uses optimized balanced metrics computation

Parameters:

  • --runs_dir: Directory containing JSONL run files (required)
  • --output: Output CSV filename (default: evaluation_results.csv)
  • --normalized_path: Path of conversations_normalized.jsonl file
  • --embed_model: Embedding model for evaluation (default: nlpaueb/bert-base-uncased-eurlex)
  • --pattern: File pattern to match (default: *.jsonl)
  • --use_balanced: Use balanced metrics computation (recommended, default: True)
  • --json_output: Also save results as JSON
  • --verbose: Enable verbose logging
  2. batch_evaluator.py - Advanced Analysis

Advanced evaluation with statistical analysis and comparison features.

Usage:

python batch_evaluator.py --runs_dir /path/to/runs --output_dir analysis_results

Key Features:

  • Statistical significance testing between methods
  • Method vs. model performance comparisons
  • Effect size calculations (Cohen's d)
  • Comprehensive summary reports
  • Multiple output formats (CSV, JSON, TXT)
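As a reference for the effect size listed above, here is a minimal sketch of Cohen's d using the pooled standard deviation. Whether `batch_evaluator.py` uses this exact variant is an assumption; the function name and the sample values are illustrative only.

```python
import math
from statistics import mean, variance

def cohens_d(a, b):
    """Cohen's d for two independent samples (each needs at least 2 values)."""
    n1, n2 = len(a), len(b)
    # Pooled variance weights each sample variance by its degrees of freedom.
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    if pooled_var == 0:
        return 0.0
    return (mean(a) - mean(b)) / math.sqrt(pooled_var)

# Toy per-dialogue scores for two methods (made-up numbers).
print(cohens_d([2, 4, 6], [1, 3, 5]))  # → 0.5
```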

Parameters:

  • --comparison_mode: Choose 'method' or 'model' for primary comparison
  • --metric: Primary metric for comparisons (default: groundedness_percent)
  • --output_dir: Directory for comprehensive results
  3. quick_eval.py - Fast Evaluation

Minimal setup script for quick results.

Usage:

python quick_eval.py /path/to/runs [output.csv] /path/to/conversations_normalized.jsonl

Expected File Structure

The scripts expect your run files to follow this naming convention:

runs_{METHOD}__{MODEL}.jsonl

Examples:

  • runs_RAG_Basic__gpt-4o-mini.jsonl
  • runs_LexGuide__llama-3.1-8b-instant.jsonl
  • runs_ConvRAG__gemma-2b-it.jsonl
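A sketch of how the method and model names could be recovered from this naming convention. The regular expression and `parse_run_filename` are illustrative assumptions, not the detection logic used by the scripts.

```python
import re
from pathlib import Path

# The method is everything up to the first double underscore; the model is the rest.
RUN_FILE_RE = re.compile(r"^runs_(?P<method>.+?)__(?P<model>.+)\.jsonl$")

def parse_run_filename(path):
    """Return (method, model) for a run file, or None if the name does not match."""
    match = RUN_FILE_RE.match(Path(path).name)
    if match is None:
        return None
    return match.group("method"), match.group("model")

print(parse_run_filename("runs_RAG_Basic__gpt-4o-mini.jsonl"))
# → ('RAG_Basic', 'gpt-4o-mini')
```

Because the method part is matched non-greedily, a single underscore inside the method name (as in `RAG_Basic`) is kept, while the first double underscore separates method from model.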

Evaluation Metrics

The scripts evaluate the following metrics:

Answer Quality Metrics

  • Completeness (ROUGE-L): Text overlap with gold responses
  • Readability (FRE): Flesch Reading Ease score
  • Groundedness (%): Percentage of generated content grounded in retrieved documents
  • Legal Relevance (BERTScore): Semantic similarity to gold responses
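For orientation, the two lexical metrics above can be sketched in a few lines. This is a simplified illustration that assumes whitespace tokenization for ROUGE-L and a crude vowel-group syllable count for Flesch Reading Ease; the evaluation scripts may well rely on library implementations that differ in detail.

```python
import re

def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]

def rouge_l_f1(candidate, reference):
    """ROUGE-L F1 over lowercased, whitespace-split tokens."""
    cand, ref = candidate.lower().split(), reference.lower().split()
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(cand), lcs / len(ref)
    return 2 * precision * recall / (precision + recall)

def flesch_reading_ease(text):
    """FRE = 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)."""
    words = re.findall(r"[A-Za-z']+", text)
    if not words:
        return 0.0
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    # Heuristic: each run of vowels counts as one syllable, minimum one per word.
    syllables = sum(max(1, len(re.findall(r"[aeiouy]+", w.lower()))) for w in words)
    return 206.835 - 1.015 * (len(words) / sentences) - 84.6 * (syllables / len(words))

print(rouge_l_f1("the court may impose a fine", "the court can impose a large fine"))
print(flesch_reading_ease("The cat sat."))
```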

Follow-up Question Metrics

  • Relevance: Semantic relevance to gold follow-ups
  • Diversity: Diversity among generated follow-ups
  • Contextual Relevance: Relevance to conversation context
  • Temporal Consistency: Consistency in conversation flow

Topic Navigation Metrics

  • Topic Coverage (%): Percentage of gold topics covered
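Read literally, Topic Coverage is the share of gold topics that the system's topics include. A minimal sketch under that reading; the function name and the exact-match comparison of topic labels are assumptions.

```python
def topic_coverage_percent(gold_topics, visited_topics):
    """Percentage of distinct gold topics that also appear among visited topics."""
    gold = set(gold_topics)
    if not gold:
        return 0.0
    return 100.0 * len(gold & set(visited_topics)) / len(gold)

print(topic_coverage_percent(["a", "b", "c", "d"], ["a", "c", "x"]))  # → 50.0
```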

Console Summary

Formatted table showing key metrics:

📊 EVALUATION RESULTS SUMMARY
===============================================
Method          Model               Completeness  Groundedness  ...
RAG_Basic       gpt-4o-mini         0.245         67.3          ...
LexGuide        gpt-4o-mini         0.289         72.1          ...
...

📈 METHOD AVERAGES:
Method          Completeness  Groundedness  ...
RAG_Basic       0.240         65.2          ...
LexGuide        0.285         70.8          ...

About

The code for construction of EUDial Dialogue dataset and LexGuide framework, a proactive dialogue system.
