Question about web benchmark implementation and tool integration in AtomMem

Hi, thanks for the great work!

I’ve been reading both the paper and the codebase carefully, and I have a question regarding the **web benchmark part (Asearcher / GAIA / WebWalkerQA)**.

From the paper, it seems that:

* AtomMem is evaluated on multi-turn web tasks
* The agent is equipped with tools like **search engine** and **URL reader**
* The environment allows up to 40 tool calls per task

However, in the current codebase, I can only find implementations related to **long-context QA (e.g., document chunking, memory + retrieval pipeline)**. I wasn’t able to locate:

1. The implementation of the **web environment wrapper**
2. The definition of **external tools** (e.g., search / URL reader)
3. How different web datasets (Asearcher / GAIA / WebWalkerQA) are **unified into a common interaction protocol**
4. The rollout loop for multi-turn **tool-augmented interaction**

So I’m wondering:

* Is the web benchmark part **not included in the current release**?
* Or is it implemented in another repository / branch?
* If it exists, could you point me to the relevant modules?

Additionally, I’m particularly interested in how you handle:

* Tool abstraction (e.g., unified XML action → actual API call)
* Environment feedback format (observation construction after each tool call)
* Compatibility across different web benchmarks
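
To make the question concrete, here is a minimal sketch of what I imagine the XML action → tool call → observation step might look like. Everything in it is my own assumption for illustration, not taken from the AtomMem code: the tag names `search` and `read_url`, the `<observation>` wrapper, the stub tools, and the `step` helper are all hypothetical. Only the 40-call budget comes from my reading of the paper.

```python
import re
from typing import Callable, Dict, Tuple


# Hypothetical stub tools; a real setup would call a search API / page fetcher.
def stub_search(query: str) -> str:
    return f"[stub search results for: {query}]"


def stub_read_url(url: str) -> str:
    return f"[stub page content of: {url}]"


# Hypothetical registry mapping XML tag names to tool functions.
TOOLS: Dict[str, Callable[[str], str]] = {
    "search": stub_search,
    "read_url": stub_read_url,
}

# Matches any <tag>argument</tag> pair in the model output.
ACTION_RE = re.compile(r"<(?P<tool>\w+)>(?P<arg>.*?)</(?P=tool)>", re.DOTALL)

MAX_TOOL_CALLS = 40  # per-task budget, as I understand it from the paper


def step(model_output: str, calls_used: int) -> Tuple[str, int]:
    """Parse the first known tool action, run it, and wrap the result.

    Returns the observation string and the updated tool-call count.
    """
    if calls_used >= MAX_TOOL_CALLS:
        return "<observation>tool budget exhausted</observation>", calls_used
    for match in ACTION_RE.finditer(model_output):
        tool = TOOLS.get(match.group("tool"))
        if tool is not None:
            result = tool(match.group("arg").strip())
            return f"<observation>{result}</observation>", calls_used + 1
    # No tag matched a registered tool (e.g. only reasoning tags were present).
    return "<observation>no valid action found</observation>", calls_used


if __name__ == "__main__":
    obs, used = step("Let me look this up. <search>AtomMem paper</search>", 0)
    print(obs, used)
```

Is the actual design roughly along these lines, with one shared dispatcher across Asearcher / GAIA / WebWalkerQA, or does each benchmark have its own wrapper?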

Thanks again for the excellent paper — the idea of modeling memory as a decision process is very inspiring!

Best regards.

