GitHub - PatchouliTIS/FireredASR-vLLM: Naive implementation of FireredASR in vLLM.

Easy, fast, and cheap LLM serving for everyone

Summary

Based on vLLM-0.10.0

This repo is a specialized adaptation of vLLM tailored to the original FireRedASR-LLM model architecture and input parameters, and it contains many hard-coded elements. Significant work remains before it can be merged into the main vLLM branch:

- Modify the FireRedASR-LLM model files to match the standard loading procedure in vLLM
- Modify the input format to support raw feature data
- Remove the separate fireredasr directory in vllm/model_executor/models

Getting Started

1. Run tools/merge_lora_weights.py from the FireRedASR-LLM-L directory to get the complete Qwen2-7B LLM with the LoRA weights merged in.
2. Run tools/save_tokenizer.py to get the specific tokenizer of the Qwen2-7B model.
3. In the FireRedASR-LLM-L directory, point the Qwen2-7B-Instruct soft link at Qwen2-7B-Instruct-LoRA.
4. Copy tools/fireredasr_config_template.json into the FireRedASR-LLM-L directory as FireRedASR-LLM-L/config.json.
5. Install vLLM from source.

Visit the official documentation to learn more.

Recommended environment:
- flash-attn==2.8.3
- torch==2.7.1
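
The soft-link and config-copy steps above can also be scripted. The following is a minimal sketch, not part of this repo: the function name `prepare_model_dir` and its arguments are hypothetical, and it assumes the link named Qwen2-7B-Instruct lives inside FireRedASR-LLM-L and points at the merged Qwen2-7B-Instruct-LoRA directory, wherever that directory was written.

```python
import shutil
from pathlib import Path


def prepare_model_dir(model_dir, merged_llm_dir, config_template):
    """Create the Qwen2-7B-Instruct soft link and copy config.json.

    All three arguments are paths chosen by the caller (assumptions):
      model_dir       -- the FireRedASR-LLM-L directory
      merged_llm_dir  -- the Qwen2-7B-Instruct-LoRA directory
      config_template -- tools/fireredasr_config_template.json
    """
    model_dir = Path(model_dir)
    merged_llm_dir = Path(merged_llm_dir).resolve()

    link = model_dir / "Qwen2-7B-Instruct"
    if link.is_symlink():
        # Replace a stale link so the function can be re-run safely.
        link.unlink()
    elif link.exists():
        # Never delete a real file or directory on the user's behalf.
        raise FileExistsError(f"{link} exists and is not a soft link")
    link.symlink_to(merged_llm_dir, target_is_directory=True)

    # Copy the template under the name the steps above call for.
    config_path = model_dir / "config.json"
    shutil.copyfile(config_template, config_path)
    return link, config_path
```

A call might look like `prepare_model_dir("FireRedASR-LLM-L", "Qwen2-7B-Instruct-LoRA", "tools/fireredasr_config_template.json")`, with the paths adjusted to the local layout.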

Simple Example

See examples/fireredasr_vllm_example.py.

Sampling Parameters

| Parameter | Default | Description |
| --- | --- | --- |
| `max_tokens` | `min(2048, len(audio))` | Maximum number of tokens to generate (should be adjusted to the actual length of the audio file) |
| `min_tokens` | 0 | Minimum number of tokens to generate |
| `temperature` | 0.1 | Sampling temperature |
| `top_p` | 1.0 | Top-p (nucleus) sampling |
| `repetition_penalty` | 1.05 | Penalty for repeating tokens |
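
As an illustration of the defaults listed above, the sketch below collects them into keyword arguments. The helper name `default_sampling_kwargs` is hypothetical, and `audio_len` stands in for whatever length measure `len(audio)` refers to in this repo, which the table does not specify.

```python
def default_sampling_kwargs(audio_len, max_cap=2048):
    """Return the documented defaults as a plain dict.

    audio_len is an assumed stand-in for len(audio); the unit is not
    specified here, so treat it as a caller-supplied length.
    """
    return {
        # Cap generation at 2048 tokens, or fewer for short audio.
        "max_tokens": min(max_cap, audio_len),
        "min_tokens": 0,
        "temperature": 0.1,
        "top_p": 1.0,
        "repetition_penalty": 1.05,
    }


# With vLLM installed, the dict can be passed to SamplingParams:
#   from vllm import SamplingParams
#   params = SamplingParams(**default_sampling_kwargs(audio_len))
```

Keeping the defaults in a plain dict makes it easy to override a single value, such as `max_tokens` for a long recording, before constructing the sampling parameters.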

