Question about web benchmark implementation and tool integration in AtomMem #2

Description

@GenSouKa1

Hi, thanks for the great work!

I’ve been reading both the paper and the codebase carefully, and I have a question regarding the web benchmark part (Asearcher / GAIA / WebWalkerQA).

From the paper, it seems that:

  • AtomMem is evaluated on multi-turn web tasks
  • The agent is equipped with tools such as a search engine and a URL reader
  • The environment allows up to 40 tool calls per task

However, in the current codebase, I can only find implementations related to long-context QA (e.g., document chunking, memory + retrieval pipeline). I wasn’t able to locate:

  1. The implementation of the web environment wrapper
  2. The definition of external tools (e.g., search / URL reader)
  3. How different web datasets (Asearcher / GAIA / WebWalkerQA) are unified into a common interaction protocol
  4. The rollout loop for multi-turn tool-augmented interaction

So I’m wondering:

  • Is the web benchmark part not included in the current release?
  • Or is it implemented in another repository / branch?
  • If it exists, could you point me to the relevant modules?

Additionally, I’m particularly interested in how you handle:

  • Tool abstraction (e.g., unified XML action → actual API call)
  • Environment feedback format (observation construction after each tool call)
  • Compatibility across different web benchmarks
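To make the question concrete, here is a rough sketch of what I imagine the tool layer might look like. It is only my guess, not taken from the paper or the codebase: the tag names (`search`, `read_url`), the stub tools, the `step` function, and the `<observation>` wrapper are all hypothetical. The only detail taken from the paper is the 40-call budget.

```python
import re
from typing import Callable, Dict, Tuple

# Hypothetical stub tools; a real setup would call a search API / page fetcher.
def fake_search(query: str) -> str:
    return f"[search results for: {query}]"

def fake_read_url(url: str) -> str:
    return f"[page content of: {url}]"

# Assumed mapping from XML tag name to tool implementation.
TOOLS: Dict[str, Callable[[str], str]] = {
    "search": fake_search,
    "read_url": fake_read_url,
}

# Matches <name>argument</name> pairs in the model output.
ACTION_RE = re.compile(r"<(\w+)>(.*?)</\1>", re.DOTALL)

MAX_TOOL_CALLS = 40  # per-task budget mentioned in the paper


def step(model_output: str, calls_used: int) -> Tuple[str, int]:
    """Parse one XML-style action, run the matching tool, build an observation.

    Returns the observation string and the updated tool-call count.
    """
    if calls_used >= MAX_TOOL_CALLS:
        return "<observation>tool call budget exhausted</observation>", calls_used
    for name, arg in ACTION_RE.findall(model_output):
        if name in TOOLS:
            result = TOOLS[name](arg.strip())
            return f"<observation>{result}</observation>", calls_used + 1
    # Tags that are not registered tools (e.g. reasoning tags) are ignored.
    return "<observation>no valid action found</observation>", calls_used


if __name__ == "__main__":
    obs, used = step("I should look this up. <search>AtomMem paper</search>", 0)
    print(obs, used)
```

If the actual design differs (for example, JSON function calls instead of XML tags, or per-benchmark observation formats), I would appreciate a pointer to where that logic lives.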

Thanks again for the excellent paper. The idea of modeling memory as a decision process is very inspiring!

Best regards.
