Feature/test time scaling and audio support #243
…tors Add EnsembleStrategy enum with majority_vote, weighted_average, consensus, and confidence_threshold strategies. Includes EnsembleConfig dataclass for configuration and Ensemble class for aggregating multiple LLM samples. Closes lotus-data#200
Add AudioDtype and AudioArray classes for storing audio data in DataFrames. Supports .wav, .mp3, .mp4, .m4a, .flac, .ogg, .webm formats with caching and base64 encoding for LLM processing. Closes lotus-data#196
Add test_ensembling.py with 40+ test cases for all ensemble strategies. Add test_audio_array.py with tests for AudioDtype, AudioArray indexing, methods, MIME types, and pandas integration.
- Remove unused imports (io, os, tempfile, Path)
- Sort imports according to PEP 8
- Use 'is' instead of '==' for type comparison
- Remove unused exception variable
Hi Rakshitha! Thanks for the PR. Audio:
Ensembling:
- Add AudioDtype handling to task_instructions.py for multimodal prompts
- Update context_formatter and user_message_formatter for audio inputs
- Add n_sample, ensemble, temperature params to sem_filter for test-time scaling
- Integrate Ensemble class for multi-sample aggregation (PR lotus-data#209 alignment)
- Add per-run rollout fields to SemanticFilterOutput for detailed analysis
Addresses feedback on PR lotus-data#243
This PR implements audio data support and test-time scaling features for LOTUS, enhancing multimodal processing and accuracy.
Added a section for type of change and checklist to PR description.
Hi @harshitgupta412, thanks for the detailed feedback!
2. Ensembling (Aligning with PR #209)
The PR description has been updated to reflect these integration details. Thank you.
harshitgupta412 left a comment
Please add test, docs and examples on how to use the new functionalities of sem filter
additional_cot_instructions: str = "",
n_sample: int = 1,
ensemble: EnsembleStrategy | None = None,
temperature: float = 1.0,
The temperature is already specified directly on the model, so I don't think it is needed here.
Done! Removed temperature from sem_filter parameters. Users can configure this in the model settings instead.
Added:
- examples/ensembling_example.py - 4 usage examples
- tests/test_sem_filter_ensembling.py - Integration tests for RawOutputs, SemanticFilterOutput, and EnsembleConfig
The explanations, raw_outputs, and answer should be returned for each run, along with the ensembled answer.
Addressed! The new SemanticFilterOutput stores all per-run data in _raw_outputs (a RawOutputs dataclass) and outputs contains the final aggregated result.
It should be available in the dataframe as well. The columns will look like:
|<input cols> | raw_output_1 | explanation_1 | parsed_output_1 | raw_output_2 | expl_2 | parsed_output_2 ... | ensemble_answer (suffix specified by user) |
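The requested layout can be sketched roughly as follows. This is only an illustration, not the code in sem_filter.py: the helper name `build_ensemble_record`, its parameters, and the choice to append the user suffix directly to `ensemble_answer` are assumptions.

```python
from typing import Any, Dict, Sequence


def build_ensemble_record(
    inputs: Dict[str, Any],
    raw_outputs: Sequence[str],
    explanations: Sequence[str],
    parsed_outputs: Sequence[bool],
    ensemble_answer: bool,
    suffix: str = "",
) -> Dict[str, Any]:
    """Flatten one row's per-run results into the column layout above.

    Hypothetical helper: run columns are numbered from 1, and the
    user-supplied suffix is appended to the ensemble column name.
    """
    if not (len(raw_outputs) == len(explanations) == len(parsed_outputs)):
        raise ValueError("per-run lists must all have the same length")

    record = dict(inputs)  # input columns come first
    runs = zip(raw_outputs, explanations, parsed_outputs)
    for i, (raw, explanation, parsed) in enumerate(runs, start=1):
        record[f"raw_output_{i}"] = raw
        record[f"explanation_{i}"] = explanation
        record[f"parsed_output_{i}"] = parsed
    record[f"ensemble_answer{suffix}"] = ensemble_answer
    return record


record = build_ensemble_record(
    inputs={"text": "The movie was great"},
    raw_outputs=["True", "False"],
    explanations=["positive wording", "ambiguous"],
    parsed_outputs=[True, False],
    ensemble_answer=True,
    suffix="_filter",
)
```

A list of such records can be passed to `pandas.DataFrame(records)`, which keeps the key order as the column order.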
- Refactor EnsembleConfig: remove n_samples/temperature, add weights/default
- Add RawOutputs dataclass for per-run data organization
- Update SemanticFilterOutput with backward-compat properties
- Update sem_filter to accept Ensemble object, remove temperature param
- Add examples/ensembling_example.py with usage demos
- Add tests/test_sem_filter_ensembling.py for integration tests
Addresses Harshit's feedback on PR lotus-data#243
Hi @harshitgupta412, I've addressed your feedback:
Thank you.
@@ -0,0 +1,56 @@
## Purpose
Please remove this file.
Removed in this commit.
also please add tests in
…SCRIPTION.md, add tests
- Modified sem_filter.py to expose raw_output_i, explanation_i, parsed_output_i columns for n_sample > 1
- Removed PR_DESCRIPTION.md as requested
- Added test_filter_ensembling in lm_tests.py
- Added test_filter_operation_audio in multimodality_tests.py
The DataFrame now includes per-run columns: Implemented! When using Done! Added:
assert expected_result == list(zip(joined_df["image"], joined_df["element"]))

@pytest.mark.parametrize("model", get_enabled("gpt-4o-audio-preview"))
gpt-4o-audio-preview needs to be enabled.
Done. I enabled gpt-4o-audio-preview in multimodality_tests.py and updated the test to use a valid WAV file input. The test test_filter_operation_audio now passes locally!
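For reference, a small valid WAV input for a test like this can be generated with the standard library alone. This is a sketch of one way to do it, not necessarily how the fixture in multimodality_tests.py is built; the function name `make_sine_wav` and its defaults are assumptions.

```python
import io
import math
import struct
import wave


def make_sine_wav(
    duration_s: float = 0.1,
    freq_hz: float = 440.0,
    sample_rate: int = 16000,
) -> bytes:
    """Return the bytes of a mono, 16-bit PCM WAV file holding a sine tone."""
    n_frames = round(duration_s * sample_rate)
    # Half-amplitude sine wave, packed as little-endian signed 16-bit samples.
    frames = b"".join(
        struct.pack(
            "<h",
            int(32767 * 0.5 * math.sin(2 * math.pi * freq_hz * i / sample_rate)),
        )
        for i in range(n_frames)
    )

    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)   # mono
        wav.setsampwidth(2)   # 2 bytes per sample = 16-bit
        wav.setframerate(sample_rate)
        wav.writeframes(frames)
    return buffer.getvalue()


wav_bytes = make_sine_wav()
```

Writing `wav_bytes` to a temporary `.wav` path gives the test a file with a well-formed RIFF header instead of placeholder bytes.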
@harshitgupta412, could you please review the changes?
Hello @harshitgupta412, could you please review this work? I am also happy to help with other issues or contribute to the project in any other way; please let me know. Thank you.
This PR adds two highly requested features to LOTUS:
Changes:
Feature 1: Test-Time Scaling (lotus/sem_ops/ensembling.py)
Adds ensemble-based test-time scaling strategies for improving semantic operator accuracy:
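As a rough illustration of the four strategy names used in this PR (majority_vote, weighted_average, consensus, confidence_threshold), the sketch below aggregates boolean answers from several samples. It is not the code in lotus/sem_ops/ensembling.py: the function signatures, the tie-breaking rule, and the `threshold`, `min_confidence`, and `default` parameters are assumptions made for the example.

```python
from collections import Counter
from typing import Sequence


def majority_vote(votes: Sequence[bool]) -> bool:
    """Return True when True votes strictly outnumber False votes.

    Ties resolve to False (an assumption for this sketch).
    """
    counts = Counter(votes)
    return counts[True] > counts[False]


def weighted_average(
    votes: Sequence[bool], weights: Sequence[float], threshold: float = 0.5
) -> bool:
    """Return True when the weighted share of True votes reaches the threshold."""
    if len(votes) != len(weights):
        raise ValueError("votes and weights must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    score = sum(w for v, w in zip(votes, weights) if v) / total
    return score >= threshold


def consensus(votes: Sequence[bool], default: bool = False) -> bool:
    """Return the shared answer only when every sample agrees, else the default."""
    if votes and all(v == votes[0] for v in votes):
        return votes[0]
    return default


def confidence_threshold(
    votes: Sequence[bool],
    confidences: Sequence[float],
    min_confidence: float = 0.7,
    default: bool = False,
) -> bool:
    """Majority vote over the samples whose confidence meets the minimum."""
    kept = [v for v, c in zip(votes, confidences) if c >= min_confidence]
    return majority_vote(kept) if kept else default
```

With three samples `[True, True, False]`, `majority_vote` keeps the row, while `consensus` falls back to its default because the samples disagree.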
Feature 2: Audio Data Support (lotus/dtype_extensions/audio.py)
Extends LOTUS to support audio data processing:
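The base64 step mentioned in the commits can be sketched as below. This is an illustration only, not the AudioArray implementation: the helper names, the payload shape, and the exact MIME strings chosen for each extension are assumptions (for example, `.m4a` is mapped to `audio/mp4` here).

```python
import base64
from pathlib import Path
from typing import Dict

# Extension-to-MIME table for the formats listed in this PR (assumed values).
AUDIO_MIME_TYPES: Dict[str, str] = {
    ".wav": "audio/wav",
    ".mp3": "audio/mpeg",
    ".mp4": "audio/mp4",
    ".m4a": "audio/mp4",
    ".flac": "audio/flac",
    ".ogg": "audio/ogg",
    ".webm": "audio/webm",
}


def audio_mime_type(path: str) -> str:
    """Look up the MIME type from the file extension, case-insensitively."""
    suffix = Path(path).suffix.lower()
    if suffix not in AUDIO_MIME_TYPES:
        raise ValueError(f"unsupported audio format: {suffix!r}")
    return AUDIO_MIME_TYPES[suffix]


def encode_audio_bytes(data: bytes, path: str) -> Dict[str, str]:
    """Base64-encode raw audio bytes and pair them with their MIME type."""
    return {
        "mime_type": audio_mime_type(path),
        "data": base64.b64encode(data).decode("ascii"),
    }


payload = encode_audio_bytes(b"abc", "clip.mp3")
```

Keeping the extension table in one place means an unsupported format fails early with a clear error instead of reaching the model call.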
Tests
Work done by: Ireddi Rakshitha & Yaswanth Devavarapu