MCTS does not perform well enough (a comparison with CoT and best-of-N)

Hi,

Thanks for the work. I have a question about the performance of MCTS compared to best-of-N on the MATH500 dataset with the Qwen2.5-Math-7B-Instruct model. In my experiments, MCTS did not achieve a higher _majority_vote_ score than best-of-N. My configuration and the comparative results are below. Given MCTS's more complex search structure, I expected it to outperform best-of-N, which samples reasoning paths directly. Do you have any suggestions for improving the MCTS results?

Thanks.


Table 1. Comparative results of different reasoning techniques.

| method | majority_vote |
| --- | --- |
| CoT | 0.836 |
| best-of-N | 0.876 |
| MCTS | 0.872 |
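
To put the gap in Table 1 in perspective, here is a rough sanity check that converts the accuracies into problem counts and a binomial standard error. It assumes all 500 MATH500 problems were scored and treats problems as independent; the helper names are only for illustration and are not part of the repo. Since both methods are evaluated on the same problems, a paired comparison would be more precise, so this is only an approximation.

```python
import math

N_PROBLEMS = 500  # assumption: the full MATH500 set was evaluated

# majority_vote accuracies copied from Table 1
scores = {"CoT": 0.836, "best-of-N": 0.876, "MCTS": 0.872}


def solved(acc, n=N_PROBLEMS):
    """Number of problems answered correctly at a given accuracy."""
    return round(acc * n)


def std_error(acc, n=N_PROBLEMS):
    """Binomial standard error of an accuracy measured on n problems."""
    return math.sqrt(acc * (1.0 - acc) / n)


for name, acc in scores.items():
    print(f"{name}: {solved(acc)}/{N_PROBLEMS} solved, SE ~ {std_error(acc):.3f}")

gap = solved(scores["best-of-N"]) - solved(scores["MCTS"])
print(f"best-of-N minus MCTS: {gap} problems")
```

Under these assumptions the difference between best-of-N and MCTS comes to two problems, which is well inside one standard error (about 0.015), so a single run may not be enough to separate the two methods.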

Table 2. Parameter settings used in the experiments.

| parameter | value |
| --- | --- |
| temperature | 0.7 |
| num_sequence | 8 |
| max_new_tokens | 2048 |
| num_worker | 32 |
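
For clarity, this is what I mean by majority voting over the `num_sequence` sampled answers: a minimal sketch, assuming the final answers have already been extracted as normalized strings. The function `majority_vote` here is a hypothetical helper written for illustration, not the repo's actual aggregation code.

```python
from collections import Counter


def majority_vote(answers):
    """Return the most frequent answer; ties go to the answer seen first."""
    if not answers:
        raise ValueError("need at least one sampled answer")
    counts = Counter(answers)
    # most_common orders equal counts by first occurrence
    return counts.most_common(1)[0][0]


# Made-up example with 8 samples, matching num_sequence = 8 in Table 2
samples = ["42", "41", "42", "7", "42", "41", "42", "42"]
print(majority_vote(samples))
```

With this kind of aggregation, the score depends only on the final answers of the sampled paths, not on how the search produced them.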

### **System Info**
Operating System = Linux
Python version = 3.10
Hardware = A40
