Hi, thanks for the great work!
I’ve been reading both the paper and the codebase carefully, and I have a question regarding the web benchmark part (Asearcher / GAIA / WebWalkerQA).
From the paper, it seems that:
- AtomMem is evaluated on multi-turn web tasks
- The agent is equipped with tools like search engine and URL reader
- The environment allows up to 40 tool calls per task
However, in the current codebase, I can only find implementations related to long-context QA (e.g., document chunking, memory + retrieval pipeline). I wasn’t able to locate:
- The implementation of the web environment wrapper
- The definition of external tools (e.g., search / URL reader)
- How different web datasets (Asearcher / GAIA / WebWalkerQA) are unified into a common interaction protocol
- The rollout loop for multi-turn tool-augmented interaction
So I’m wondering:
- Is the web benchmark part not included in the current release?
- Or is it implemented in another repository / branch?
- If it exists, could you point me to the relevant modules?
Additionally, I’m particularly interested in how you handle:
- Tool abstraction (e.g., unified XML action → actual API call)
- Environment feedback format (observation construction after each tool call)
- Compatibility across different web benchmarks
Thanks again for the excellent paper — the idea of modeling memory as a decision process is very inspiring!
Best regards.
Hi, thanks for the great work!
I’ve been reading both the paper and the codebase carefully, and I have a question regarding the web benchmark part (Asearcher / GAIA / WebWalkerQA).
From the paper, it seems that:
However, in the current codebase, I can only find implementations related to long-context QA (e.g., document chunking, memory + retrieval pipeline). I wasn’t able to locate:
So I’m wondering:
Additionally, I’m particularly interested in how you handle:
Thanks again for the excellent paper — the idea of modeling memory as a decision process is very inspiring!
Best regards.