@@ -119,4 +119,90 @@ amazon/AmazonQAC ["prefixes", "final_search_term"]
119 119 - `--max_train_samples`: Limit training samples for testing (default: None)
120 120 - `--max_val_samples`: Limit validation samples for testing (default: None)
121 121 - `--val_ratio`: Validation split ratio (default: 0.1)
122- - `--gpus`: Number of GPUs (default: 0 for CPU)
122+ - `--gpus`: Number of GPUs (default: 0 for CPU)
123+
124+ ---
125+
126+ ## Personalized Model (SIN with History)
127+
128+ The personalized model extends the base architecture with:
129+ - **HistoricalIntentionReformulationEncoder**: Attention-based historical search encoding
130+ - **SearchIntentEvolutionInferencer**: Captures intent evolution from history to current prefix
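
The diff does not show how `HistoricalIntentionReformulationEncoder` is implemented. As a rough illustration of what attention-based encoding of a search history generally means, the sketch below applies generic scaled dot-product attention to pool history vectors against a prefix vector. The function name `attention_pool` and the toy vectors are hypothetical and are not taken from the repository.

```python
import math
from typing import List, Sequence


def attention_pool(query: Sequence[float], history: Sequence[Sequence[float]]) -> List[float]:
    """Scaled dot-product attention of one query vector over history vectors."""
    dim = len(query)
    if not history:
        # No history: return a zero vector (an assumption made for this sketch).
        return [0.0] * dim
    # Similarity of the query to each history item, scaled by sqrt(dim).
    logits = [sum(q * h for q, h in zip(query, item)) / math.sqrt(dim) for item in history]
    # Numerically stable softmax over the history items.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    weights = [e / total for e in exps]
    # The weighted sum of history vectors gives one summary vector.
    return [sum(w * item[i] for w, item in zip(weights, history)) for i in range(dim)]


# Toy 2-dimensional embeddings, invented for illustration only.
prefix_vec = [1.0, 0.0]
history_vecs = [[1.0, 0.0], [0.0, 1.0]]
summary = attention_pool(prefix_vec, history_vecs)
```

In this toy case the history item that points in the same direction as the prefix vector receives the larger attention weight, so it dominates the summary.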
131+
132+ ### Training Personalized Model
133+
134+ ```bash
135+ # With Apple Silicon MPS acceleration
136+ python train_personalized.py \
137+   --dataset_path ./data/amazon_qac_processed_5m \
138+   --batch_size 512 \
139+   --mps
140+
141+ # CPU training
142+ python train_personalized.py \
143+   --dataset_path ./data/amazon_qac_processed_5m \
144+   --batch_size 256 \
145+   --num_workers 6
146+ ```
147+
148+ ### Evaluating Personalized Model
149+
150+ #### Without History (Non-personalized baseline)
151+
152+ ```bash
153+ python evaluate_personalized.py \
154+   --checkpoint ./lightning_logs/sin_personalized/version_0/checkpoints/personalized-epoch=05-val_loss=0.4321.ckpt \
155+   --prefix "arma" \
156+   --candidates "armadillo,armageddon,armor,armani"
157+ ```
158+
159+ #### With History (Personalized)
160+
161+ The model uses search history to personalize rankings:
162+
163+ ```bash
164+ # User with movie-related search history
165+ python evaluate_personalized.py \
166+   --prefix "arma" \
167+   --candidates "armadillo,armageddon,armor,armani" \
168+   --history "alien vs predator,avengers,action movies"
169+ ```
170+
171+ Expected: "armageddon" (movie) ranks higher due to movie-related history.
172+
173+ ```bash
174+ # User with fashion-related search history
175+ python evaluate_personalized.py \
176+   --prefix "arma" \
177+   --candidates "armadillo,armageddon,armor,armani" \
178+   --history "gucci bags,designer clothes,fashion brands"
179+ ```
180+
181+ Expected: "armani" (fashion brand) ranks higher due to fashion-related history.
182+
183+ #### Programmatic Evaluation
184+
185+ ```python
186+ from evaluate_personalized import score_candidates_personalized, load_model_for_evaluation, build_tokenizer
187+
188+ # Load model and tokenizer
189+ model = load_model_for_evaluation("path/to/checkpoint.ckpt")
190+ tokenizer = build_tokenizer()
191+
192+ # Without history
193+ scores = score_candidates_personalized(
194+     model=model,
195+     prefix_text="arma",
196+     candidate_texts=["armadillo", "armageddon", "armor"],
197+     tokenizer=tokenizer,
198+ )
199+
200+ # With history
201+ scores = score_candidates_personalized(
202+     model=model,
203+     prefix_text="arma",
204+     candidate_texts=["armadillo", "armageddon", "armor"],
205+     tokenizer=tokenizer,
206+     history=["alien vs predator", "avengers"],
207+ )
208+ ```
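
The diff does not show what `score_candidates_personalized` returns. Assuming it yields one numeric score per entry in `candidate_texts`, in the same order, with higher meaning more relevant, a small helper such as the hypothetical `rank_candidates` below could turn the scores into a ranked list. The scores in the example are made up for illustration and are not model output.

```python
from typing import List, Sequence, Tuple


def rank_candidates(candidate_texts: Sequence[str], scores: Sequence[float]) -> List[Tuple[str, float]]:
    """Pair each candidate with its score and sort from highest to lowest score."""
    if len(candidate_texts) != len(scores):
        raise ValueError("candidate_texts and scores must have the same length")
    pairs = zip(candidate_texts, (float(s) for s in scores))
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


# Hypothetical scores, for illustration only (not produced by the model).
example = rank_candidates(["armadillo", "armageddon", "armor"], [0.12, 0.71, 0.17])
for text, score in example:
    print(f"{score:.2f}  {text}")
```

If the function actually returns a tensor or a dictionary, the scores would need to be converted to a plain list first.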