docs: add open-source docs for running, operating, and configuring MEE Node #193
fichiokaku merged 6 commits into develop from
See `.env.example` in this directory. Main options:

- **Server**: `SERVER_HOST` (default `127.0.0.1`), `SERVER_PORT` (default `3000` in code; `.env.example` uses `5000` to match the node’s default)
- **Chains / RPCs**: For each chain you need, set either `{CHAIN}_RPC` (primary RPC with debug/trace) or `{CHAIN}_FORK_RPC` (e.g. for Anvil fork). Examples: `ETHEREUM_RPC`, `BASE_RPC`, `ETHEREUM_FORK_RPC`, etc.
Can you please mention what to do if the RPC doesn't support token detection? Fall back to fork mode. This happens for some chains, and Anvil is the best choice for such chains.
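The env lookup quoted above can be sketched in TypeScript as follows. This is a minimal illustration, not the node's code. `readServerConfig` and `resolveChainRpc` are hypothetical names. The sketch assumes that the `{CHAIN}` prefix is the upper-cased chain name and that a primary `{CHAIN}_RPC` wins when both variables are set. When no primary RPC is configured, it falls back to fork mode, as the comment suggests.

```typescript
// Hypothetical sketch of the env lookup described in the README excerpt.
// Assumptions: {CHAIN} is the upper-cased chain name, and {CHAIN}_RPC
// takes precedence over {CHAIN}_FORK_RPC when both are set.
type Env = Record<string, string | undefined>;

interface ServerConfig {
  host: string;
  port: number;
}

type RpcSource =
  | { mode: "primary"; url: string } // RPC with debug/trace support
  | { mode: "fork"; url: string }; // e.g. an Anvil fork for chains whose RPC lacks it

function readServerConfig(env: Env): ServerConfig {
  // Defaults quoted above: SERVER_HOST=127.0.0.1, SERVER_PORT=3000 (in code).
  const host = env["SERVER_HOST"] ?? "127.0.0.1";
  const port = Number(env["SERVER_PORT"] ?? "3000");
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`Invalid SERVER_PORT: ${env["SERVER_PORT"]}`);
  }
  return { host, port };
}

function resolveChainRpc(env: Env, chain: string): RpcSource {
  const prefix = chain.toUpperCase();
  const primary = env[`${prefix}_RPC`];
  if (primary) return { mode: "primary", url: primary };
  // Fall back to fork mode when no primary RPC is configured.
  const fork = env[`${prefix}_FORK_RPC`];
  if (fork) return { mode: "fork", url: fork };
  throw new Error(`Set ${prefix}_RPC or ${prefix}_FORK_RPC`);
}
```

In a Node.js process, `process.env` can be passed directly as the `env` argument.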
## Operational notes

- **RPC at boot**: The service builds one RPC provider per configured chain at startup. If **any** chain’s RPC (or Anvil fork) fails during init, the process can **exit** and may restart in a loop until the RPC is fixed. Use stable RPCs; for exotic or unreliable chains, consider a minimal instance with only the chains you need. See [Dependencies — RPC and boot behavior](../../docs/dependencies.md#rpc-and-boot-behavior).
- **Broken pipe**: Occasional "broken pipe" errors have been observed (e.g. during infra upgrades); root cause is not fully clarified. This does not block MEE Node execution; the node falls back to default gas when the token service is unavailable.
I don't think we need to document this. Instead, we need to fix it or identify the root cause from an infra perspective.
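The all-or-nothing startup described in the **RPC at boot** note can be illustrated with a small sketch. It is not the service's implementation. `initProviders` and the `connect` callback are hypothetical, and a thrown error stands in for a failing RPC. In this model, one failing chain aborts startup. Under a supervisor that restarts on exit, that becomes the restart loop described above.

```typescript
// Hypothetical sketch: build one provider per configured chain at startup.
// If any connect() call throws, the whole boot fails, mirroring the
// "one bad RPC can prevent startup" behavior described above.
interface Provider {
  chain: string;
  url: string;
}

type Connect = (chain: string, url: string) => Provider;

function initProviders(
  rpcs: Record<string, string>,
  connect: Connect,
): Map<string, Provider> {
  const providers = new Map<string, Provider>();
  for (const [chain, url] of Object.entries(rpcs)) {
    try {
      providers.set(chain, connect(chain, url));
    } catch (err) {
      // No partial startup: surface the failing chain and abort.
      const reason = err instanceof Error ? err.message : String(err);
      throw new Error(`RPC init failed for ${chain}: ${reason}`);
    }
  }
  return providers;
}
```

Running a minimal instance amounts to passing only the chains you need in `rpcs`, so an unreliable chain cannot block the others.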
- **When this service fails**: The MEE Node still executes supertransactions using the **default gas limit from the SDK**, which is sufficient for many flows. Complex flows may fail with insufficient gas. See [Dependencies — Impact on execution](../../docs/dependencies.md#impact-on-execution-when-the-token-service-fails).
As per our discussion, a new improvement has been added: a secondary cache now also exists on the MEE Node. In this case, the node should still process token overrides for frequently used tokens, or for tokens that have already been used at least once by anyone on the respective MEE Node.
docs/architecture.md
- Consume jobs from the **simulator queue** (BullMQ) for that chain.
- Run simulation (e.g. `eth_call` with state overrides).
- Depend on token-storage-detection for ERC20 balance slots.
This information is wrong: it confuses pre-simulation with the simulator.
docs/architecture.md
- Enqueues **simulator** jobs (per chain / userOp batch) so simulations can run.

2. **Simulator**
   - Picks simulator jobs from Redis (BullMQ).
   - For ERC20 state overrides, calls **Token Storage Detection** service to get balance storage slot per token/chain.
   - Runs simulation (e.g. via RPC manager).
   - On success, marks userOps as simulated; batcher listens for completed jobs.

3. **Batcher**
   - Runs in the master.
   - Listens for simulator and executor “completed” events.
   - Groups simulated userOps per chain into batches under the chain’s batch gas limit.
   - Enqueues **executor** jobs (one job per batch per chain).
Would it be better to stay high-level instead of diving deep into the technical details?
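The Batcher step quoted above groups simulated userOps per chain into batches under the chain's batch gas limit, with one executor job per batch. It can be sketched as a greedy, order-preserving packer. The `SimulatedUserOp` shape and `batchUserOps` name are hypothetical, and the node's actual grouping policy may differ.

```typescript
// Hypothetical sketch of per-chain batching under a batch gas limit.
// Greedy and order-preserving: start a new batch when the next userOp
// would push the current batch over the chain's limit.
interface SimulatedUserOp {
  hash: string;
  chainId: string;
  gasLimit: number;
}

function batchUserOps(
  ops: SimulatedUserOp[],
  batchGasLimitByChain: Record<string, number>,
): Map<string, SimulatedUserOp[][]> {
  const batches = new Map<string, SimulatedUserOp[][]>();
  const used = new Map<string, number>(); // gas used by each chain's open batch
  for (const op of ops) {
    const limit = batchGasLimitByChain[op.chainId];
    if (limit === undefined) throw new Error(`No batch gas limit for chain ${op.chainId}`);
    if (op.gasLimit > limit) throw new Error(`userOp ${op.hash} exceeds the batch gas limit`);
    const chainBatches = batches.get(op.chainId) ?? [];
    const open = chainBatches[chainBatches.length - 1];
    const openGas = used.get(op.chainId) ?? 0;
    if (open && openGas + op.gasLimit <= limit) {
      open.push(op);
      used.set(op.chainId, openGas + op.gasLimit);
    } else {
      chainBatches.push([op]); // each batch would become one executor job
      used.set(op.chainId, op.gasLimit);
    }
    batches.set(op.chainId, chainBatches);
  }
  return batches;
}
```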
docs/architecture.md
- **Storage keys** (see `build-redis-key.ts`):
  - `storage:quote:{hash}:data`, `storage:quote:{hash}:user-ops`
  - `storage:user-op:{hash}:data`, `storage:user-op:{hash}:custom-fields`
  - `storage:cache:{key}`
I don't think we need to document these details.
I initially understood these to be high-level setup and configuration docs.
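For reference, the key layout quoted above can be expressed as small builder functions. This is an illustrative sketch only. The real helpers live in `build-redis-key.ts`, and the function names here are hypothetical.

```typescript
// Hypothetical key builders mirroring the documented Redis key layout.
const quoteDataKey = (hash: string): string => `storage:quote:${hash}:data`;
const quoteUserOpsKey = (hash: string): string => `storage:quote:${hash}:user-ops`;
const userOpDataKey = (hash: string): string => `storage:user-op:${hash}:data`;
const userOpCustomFieldsKey = (hash: string): string =>
  `storage:user-op:${hash}:custom-fields`;
const cacheKey = (key: string): string => `storage:cache:${key}`;
```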
docs/chain-configuration.md
| **`rpcs`** | string[] | Yes* | RPC URLs for this chain. Used by RPC manager for simulation and execution. At least one must be set in `rpcs` or via `sharedNodeConfigs` (see below). |
| **`sharedNodeConfigs`** | array | No | Optional list of node-specific overrides; each can provide `rpcs`. Used when multiple node IDs share config but use different RPCs. |

\*Either `rpcs` or `sharedNodeConfigs[].rpcs` must supply at least one RPC.
I would advise against documenting `sharedNodeConfigs`. It is only required for us, not for everyone else, because we keep a single secret for all our nodes.
- **`price.type: "fixed"`** — constant `value` and `decimals` in config. No extra chain or RPC.
- **`price.type: "oracle"`** — price is read from a **Chainlink-style aggregator** (e.g. `latestRoundData`, `decimals`) on a specific chain. You must set **`price.chainId`** and **`price.oracle`** (contract address).

**Important:** If `price` is `"oracle"` and references a **chainId**, that chain **must exist in your chains config** and have working RPCs. The node calls `RpcManagerService.executeRequest(chainId, ...)` to read the oracle contract. If that chain is not configured, the request will fail and native/payment pricing can break.
Some of the code-level examples can be avoided in the docs here.
docs/chain-configuration.md
### Gas and chain type (gas estimator, L1/L2)

| Field | Type | Required | Description / impact |
It would be good to add a Defaults column showing each field's default value.
docs/chain-configuration.md
| **`simulator.numWorkers`** | number | No | Number of simulator thread workers for this chain. Default from env `DEFAULT_NUM_SIMULATOR_WORKERS_PER_CHAIN` or schema. |
| **`simulator.workerConcurrency`** | number | No | Concurrency per simulator worker. Default from `DEFAULT_SIMULATOR_WORKER_CONCURRENCY`. |
| **`simulator.stalledJobsRetryInterval`** | number (ms) | No | Interval for retrying stalled simulation jobs. |
| **`simulator.rateLimitMaxRequestsPerInterval`** | number | No | Rate limit: max requests per interval. |
| **`simulator.rateLimitDuration`** | number | No | Rate limit interval (e.g. seconds). |
| **`simulator.traceCallRetryDelay`** | number (ms) | No | Delay before retrying a failed trace/simulation call. |
Some of these values come from env variables, not chain config files. We have to clearly differentiate chain-level config from node-level env variables.
docs/chain-configuration.md
| **`executor.stalledJobsRetryInterval`** | number (ms) | No | Interval for retrying stalled execution jobs. |
| **`executor.rateLimitMaxRequestsPerInterval`** | number | No | Rate limit: max requests per interval. |
| **`executor.rateLimitDuration`** | number | No | Rate limit interval. |
docs/chain-configuration.md
| **`paymentTokens`** | array | Yes (non-empty for execution chains) | List of supported payment tokens (e.g. USDC) for this chain. |
| **`paymentTokens[].name`** | string | Yes | Token name. |
| **`paymentTokens[].address`** | address | Yes | Token contract address. |
| **`paymentTokens[].symbol`** | string | Yes | Token symbol. |
| **`paymentTokens[].price`** | object | Yes | Same shape as native `price`: `{ type: "fixed", value, decimals }` or `{ type: "oracle", chainId, oracle }`. |
| **`paymentTokens[].permitEnabled`** | boolean | No | Whether permit (signature-based approval) is supported. |

If **any** payment token uses **`price.type: "oracle"`** with a **`chainId`**, that chain must be present in your chains config (same rule as for native price).
We also have to document the arbitrary token support feature.
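The oracle rule quoted in this file can be checked before starting the node. It applies to the native `price` and to every `paymentTokens[].price`: any chain referenced by an oracle price must exist in the chains config. A minimal sketch of such a check follows. The types mirror the field shapes in the tables above. The `number` types for `value` and `decimals`, and the `findMissingOracleChains` helper itself, are assumptions and not part of the node.

```typescript
// Hypothetical pre-flight check for oracle price references.
type Price =
  | { type: "fixed"; value: number; decimals: number } // value/decimals types assumed
  | { type: "oracle"; chainId: string; oracle: string }; // Chainlink-style aggregator

interface PaymentToken {
  name: string;
  address: string;
  symbol: string;
  price: Price;
  permitEnabled?: boolean;
}

interface ChainConfig {
  chainId: string;
  price: Price; // native token price
  paymentTokens: PaymentToken[];
}

// Lists each oracle price whose chainId is missing from the chains config.
function findMissingOracleChains(chains: ChainConfig[]): string[] {
  const configured = new Set(chains.map((c) => c.chainId));
  const missing: string[] = [];
  for (const chain of chains) {
    const prices: Array<[string, Price]> = [
      ["price", chain.price],
      ...chain.paymentTokens.map((t): [string, Price] => [`paymentTokens[${t.symbol}].price`, t.price]),
    ];
    for (const [field, price] of prices) {
      if (price.type === "oracle" && !configured.has(price.chainId)) {
        missing.push(`${chain.chainId}: ${field} -> chain ${price.chainId}`);
      }
    }
  }
  return missing;
}
```

An empty result means every oracle reference points at a configured chain. The addresses in any example config would be placeholders.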
docs/chain-configuration.md

---

## Adding a new chain (step-by-step)
It would be good to cover funding the master EOA in this process, according to the worker configuration.
docs/dependencies.md
### Known issues: broken pipe

The token server has been observed to **fail occasionally with a "broken pipe" error** in some environments (e.g. during infra or cluster upgrades). The root cause is not fully clarified; it may be related to connection lifecycle or load balancer behavior. This does **not** affect STX (supertransaction) execution directly: if the token service fails, the node falls back to default gas limits (see below). Tracking and hardening this behavior is recommended while the team is available.
Let's remove this; we can collaborate with Takwa to fix it instead.
docs/operations.md
### Token Storage Detection unreachable or errors

- **Symptom**: Simulations that need token balance overrides fail (e.g. “Token overrides failed” or “SlotNotFound”). Health may show token-slot-detection as unhealthy for some chains.
Please adjust this content based on the new changes; refer to the previous comments.
vr16x left a comment
The documentation has good coverage, but it sometimes dives too deep into code snippets and similar details.
This can make it very hard for new operators to run the node. The setup should be a straightforward, progressive, step-by-step process that is easy to follow.
- **When this service fails**: The MEE Node still executes supertransactions using the **default gas limit from the SDK**, which is sufficient for many flows. Complex flows may fail with insufficient gas. When the service is unavailable, the node can fall back to an **in-memory cache** of balance storage slots: tokens that were successfully resolved at least once (to detect their storage slot) are cached. This cache is **non-persistent** (lost on node restart). See [Dependencies — Impact on execution](../../docs/dependencies.md#impact-on-execution-when-the-token-service-fails).
It's not an in-memory cache but a permanent Redis cache.
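The fallback discussed in this thread can be sketched as follows. Slot lookups go to the Token Storage Detection service. Slots resolved at least once are kept in a permanent Redis cache on the MEE Node. If neither source has the slot, the node uses the SDK's default gas limit.

`SlotCache`, `DetectSlot`, and `resolveBalanceSlot` are hypothetical names. A `Map` stands in for Redis only so the example runs on its own. Network calls are modeled as synchronous functions for brevity. The sketch consults the cache only after the service call fails; the thread does not say whether the node checks the cache first.

```typescript
// Hypothetical sketch of balance-slot resolution with a cache fallback.
// The node's cache is described as a permanent Redis cache; a Map-backed
// class stands in for it here so the example runs on its own.
interface SlotCache {
  get(key: string): string | undefined;
  set(key: string, slot: string): void;
}

class MapSlotCache implements SlotCache {
  private readonly entries = new Map<string, string>();
  get(key: string): string | undefined {
    return this.entries.get(key);
  }
  set(key: string, slot: string): void {
    this.entries.set(key, slot);
  }
}

// Stand-in for a Token Storage Detection call; throws when the service is down.
type DetectSlot = (chainId: string, token: string) => string;

type SlotResult =
  | { source: "service"; slot: string }
  | { source: "cache"; slot: string }
  | { source: "default-gas" }; // no override: use the SDK default gas limit

function resolveBalanceSlot(
  chainId: string,
  token: string,
  detectSlot: DetectSlot,
  cache: SlotCache,
): SlotResult {
  const key = `${chainId}:${token.toLowerCase()}`;
  try {
    const slot = detectSlot(chainId, token);
    cache.set(key, slot); // remember it for later outages
    return { source: "service", slot };
  } catch {
    const cached = cache.get(key);
    return cached !== undefined ? { source: "cache", slot: cached } : { source: "default-gas" };
  }
}
```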
docs/architecture.md
- Consume jobs from the **simulator queue** for that chain (async batch simulation after quote).
- Run simulation; they use token-storage-detection for ERC20 balance slots when processing queued jobs.
Token slot detection never happens in the simulation phase. It is only used in the pre-simulation and gas estimation phases.
docs/architecture.md
- Distinct from **pre-simulation**: quote requests can run a quick pre-simulation in the API for gas estimation; the simulator workers handle the full simulation pass that confirms userOps before execution.
This statement is misleading. Pre-simulations fill the on-chain state gap with state overrides and attempt to perform full simulations for gas estimation and calldata validity.
The execution simulation does the same job but without state overrides to fill the gap. Instead, it orchestrates for the on-chain conditions to be met before the execution phase.
Please adjust this.
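To make the pre-simulation point concrete: filling the on-chain state gap typically means passing a state override set to `eth_call`. For example, the call can overwrite the ERC20 balance storage slot that the Token Storage Detection service resolves.

The sketch below only builds the JSON-RPC request body, using the `stateDiff` override format accepted by Geth-style nodes. `buildPreSimulationCall` is hypothetical, and the storage key is assumed to be precomputed. For a Solidity mapping, that key is a keccak256 hash of the holder address and the mapping's slot index; this sketch does not compute it.

```typescript
// Hypothetical sketch: build an eth_call request that overrides one ERC20
// balance storage slot, as a pre-simulation might do to fill the state gap.
interface CallObject {
  from?: string;
  to: string;
  data: string;
}

// Left-pad a hex quantity to a 32-byte storage word.
function toWord(hex: string): string {
  const digits = hex.replace(/^0x/, "");
  if (!/^[0-9a-fA-F]{1,64}$/.test(digits)) throw new Error(`Not a 32-byte hex value: ${hex}`);
  return "0x" + digits.padStart(64, "0");
}

function buildPreSimulationCall(
  call: CallObject,
  token: string,
  balanceSlotKey: string, // storage key, assumed precomputed
  balanceHex: string, // balance to inject, as a hex quantity
  id = 1,
) {
  return {
    jsonrpc: "2.0",
    id,
    method: "eth_call",
    params: [
      call,
      "latest",
      // State override set: patch only the given slot of the token contract.
      { [token]: { stateDiff: { [toWord(balanceSlotKey)]: toWord(balanceHex) } } },
    ],
  };
}
```

Per the comment above, the execution simulation would send the same call without the override set and instead wait for the on-chain conditions to be met.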
- Client sends the signed quote (same hash as stored).
- Node loads quote from Redis, validates signature and deadline.
- If simulations are not yet done, execution is driven by the same queues; the API may wait or return once the execution job is accepted/done depending on implementation.
5. **Execute** — Client sends the signed quote; node loads from Redis, validates, and execution is driven by the same queues until the execution job is done.
This point is irrelevant here, since the intention is to explain how the end-to-end pipeline works.
Can we remove this 5th point or reorder it?
docs/architecture.md
So: **Redis** is the backbone for queues (BullMQ), quote/userOp storage, and cache; **Token Storage Detection** is only used during simulation to build correct state overrides.

**Redis** backs queues, quote/userOp storage, and cache. **Token Storage Detection** is used during simulation (both pre-simulation in the API and in simulator workers) to build correct ERC20 state when needed.
The token service is not used in simulator workers.
docs/chain-configuration.md
| Field | Type | Required | Default | Description / impact |
|-------|------|----------|--------|----------------------|
| **`chainId`** | string (numeric) | Yes | — | Chain id (e.g. `"1"`, `"8453"`). Must match the key in the config file/directory. |
| **`name`** | string | Yes | — | Human-readable name (logs, errors, gas estimator). |
| **`rpcs`** | string[] | Yes | — | RPC URLs for this chain. Used by RPC manager for simulation and execution. At least one required. |
The Default column is not filled in here.
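The three required fields in the table above lend themselves to a quick pre-flight check before starting the node. The sketch assumes that each chain entry is looked up under a key equal to its chain id, as the `chainId` row implies. `validateChainEntry` is a hypothetical helper, not part of the node.

```typescript
// Hypothetical pre-flight check for the required chain config fields.
interface ChainEntry {
  chainId?: unknown;
  name?: unknown;
  rpcs?: unknown;
}

// Returns a list of problems; an empty list means the entry looks valid.
function validateChainEntry(key: string, entry: ChainEntry): string[] {
  const errors: string[] = [];
  if (typeof entry.chainId !== "string" || !/^\d+$/.test(entry.chainId)) {
    errors.push('chainId must be a numeric string, e.g. "8453"');
  } else if (entry.chainId !== key) {
    errors.push(`chainId ${entry.chainId} does not match config key ${key}`);
  }
  if (typeof entry.name !== "string" || entry.name.trim() === "") {
    errors.push("name is required");
  }
  const rpcs = entry.rpcs;
  if (!Array.isArray(rpcs) || rpcs.length === 0 || !rpcs.every((u) => typeof u === "string" && u !== "")) {
    errors.push("rpcs must contain at least one URL");
  }
  return errors;
}
```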
docs/chain-configuration.md
| **`executor.pollInterval`** | number (ms) | No | Schema default | Poll interval when waiting for transaction receipt. |
| **`executor.stalledJobsRetryInterval`** | number (ms) | No | Schema default | Interval for retrying stalled execution jobs. |
| **`executor.rateLimitMaxRequestsPerInterval`** | number | No | Schema default | Rate limit: max requests per interval. |
| **`executor.rateLimitDuration`** | number | No | Schema default | Rate limit interval. |
| **`executor.workerFunding`** | string (ether) | No | e.g. `"0.001"` | Target balance used when funding worker EOAs (e.g. via disperse). |
| **`executor.workerFundingThreshold`** | string (ether) | No | Schema default | **Minimum native balance** each worker (or master when used as worker) must have to be considered healthy. Worker must have at least this balance to trigger an EVM transaction; it effectively **limits the maximum executable call gas limit** for that chain. Set high enough for the largest transaction you expect to execute. |
| **`executor.workerCount`** | number | No | From env / schema | Number of worker EOAs to use for execution on this chain. **0** = use master only; otherwise between **1** and **`MAX_EXTRA_WORKERS`** (env). Higher values allow more parallel execution. Workers are derived from `NODE_ACCOUNTS_MNEMONIC` or `NODE_ACCOUNTS_PRIVATE_KEYS`. |
This explains the object-based properties but doesn't show which env variables set them. I think these keys need to be replaced with env variables, and the node can handle them internally however it wants.
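The worker rules in the table above can be illustrated with a few helpers:

- the valid range for `workerCount`;
- `workerFundingThreshold` as the minimum balance for a healthy worker;
- the way a worker's balance caps the largest executable transaction.

This is a sketch, not the node's health check, and all names are hypothetical. Balances are plain `number`s in ether for readability; real code should use integer wei (for example `bigint`).

```typescript
// Hypothetical sketch of the worker EOA rules described above.
interface WorkerAccount {
  address: string;
  balanceEth: number; // native balance in ether (illustrative; use integer wei in practice)
}

// workerCount: 0 = master only; otherwise 1..MAX_EXTRA_WORKERS.
function validateWorkerCount(workerCount: number, maxExtraWorkers: number): void {
  if (!Number.isInteger(workerCount) || workerCount < 0 || workerCount > maxExtraWorkers) {
    throw new Error(`workerCount must be 0..${maxExtraWorkers}, got ${workerCount}`);
  }
}

// A worker counts as healthy when its balance is at least workerFundingThreshold.
function healthyWorkers(workers: WorkerAccount[], thresholdEth: number): WorkerAccount[] {
  return workers.filter((w) => w.balanceEth >= thresholdEth);
}

// Largest gas limit a balance can pay for at a given gas price (ignores L1 data fees).
function maxAffordableGas(balanceEth: number, gasPriceGwei: number): number {
  return Math.floor((balanceEth * 1e9) / gasPriceGwei);
}
```

With a threshold of `"0"`, every worker counts as healthy regardless of balance. That is why the threshold should be sized for the largest transaction you expect.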
docs/chain-configuration.md
| **`paymasterVerificationGasLimit`** | Global default | Paymaster verification gas. |
| **`senderCreateGasLimit`** | Global default | Gas for sender contract creation (when initCode is set). |
| **`baseVerificationGasLimit`** | Global default | Base verification gas. |
| **`fixedHandleOpsGas`** | Global default | Fixed gas for handleOps. |
| **`perAuthBaseCost`** | Global default | Per-signature/auth base cost (e.g. EIP-7702). |
Same here: these should be env variables rather than object properties used internally by the node.
docs/dependencies.md
- If the token service is down or returns errors, the node may still **execute** supertransactions: the **default gas limit from the SDK** is used, which is sufficient for many flows.
- **Complex flows** (e.g. with custom token logic or higher gas needs) may **fail with insufficient gas** when token slot detection is unavailable, because simulation cannot refine gas estimates.
- The node can fall back to an **in-memory, non-persistent cache** of balance storage slots: only tokens that were successfully resolved at least once (to detect their slot) are cached. After a restart, the cache is empty until those tokens are requested again while the service is up.
Same as the previous comment: please fix this statement about an in-memory, non-persistent cache.
docs/run-and-maintain.md
- **Native token** — When the user pays in the chain’s native coin (e.g. ETH).
- **Configured payment tokens** — When the user pays in a token listed in the chain’s `paymentTokens` (e.g. USDC). The payment userOp transfers that token to the fee beneficiary.
- **Arbitrary tokens** — When arbitrary payment is enabled (see section 7). The user pays in a token that is not in `paymentTokens`; the token is still received at the fee beneficiary. The node does not swap it; the operator is responsible for swapping and rebalancing.
Let's add context that an arbitrary token cannot be a random mock token or a dead meme token. Only tokens supported by the swap providers are accepted, which keeps the node safe against random dead tokens.
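The three payment options quoted above can be sketched as a classifier. It also applies the constraint from this comment: an arbitrary token is accepted only if a swap provider supports it. The names below are hypothetical:

- `classifyPaymentToken`;
- the `PaymentPolicy` shape and its `isSwappable` predicate;
- the `"native"` sentinel.

How the node actually queries swap providers is not described here.

```typescript
// Hypothetical sketch of payment-token classification.
type PaymentKind = "native" | "configured" | "arbitrary";

interface PaymentPolicy {
  configuredTokens: string[]; // addresses from the chain's paymentTokens
  arbitraryEnabled: boolean;
  isSwappable: (token: string) => boolean; // stand-in for a swap-provider lookup
}

const NATIVE = "native"; // hypothetical sentinel for the chain's native coin

function classifyPaymentToken(token: string, policy: PaymentPolicy): PaymentKind {
  if (token === NATIVE) return "native";
  const t = token.toLowerCase();
  if (policy.configuredTokens.some((a) => a.toLowerCase() === t)) return "configured";
  // Arbitrary tokens must be enabled AND supported by a swap provider,
  // so mock or dead tokens are rejected.
  if (policy.arbitraryEnabled && policy.isSwappable(t)) return "arbitrary";
  throw new Error(`Unsupported payment token: ${token}`);
}
```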
Chains and RPCs used by the node must support:

- **`debug_traceCall`** — Used for simulation (e.g. tracing handleOps). If the RPC does not support it, simulation may fail; for the Token Storage Detection service, use **fork mode** (e.g. Anvil) for such chains. See [Token Storage Detection README](../apps/token-storage-detection/README.md).
`debug_traceCall` should always be supported here, irrespective of its use in the token service.
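Since `debug_traceCall` must always be available, an operator can probe each RPC before adding it. The sketch builds a probe request and interprets the response. It assumes the Geth-style parameter order: call object, block tag, tracer config. It treats JSON-RPC error code `-32601` ("Method not found" in the JSON-RPC 2.0 specification) as unsupported. Providers may signal a disabled method with other codes, so any error deserves a closer look. The function names are hypothetical.

```typescript
// Hypothetical sketch: probe whether an RPC supports debug_traceCall.
interface JsonRpcResponse {
  jsonrpc: "2.0";
  id: number;
  result?: unknown;
  error?: { code: number; message: string };
}

function buildTraceCallProbe(to: string, id = 1) {
  return {
    jsonrpc: "2.0",
    id,
    method: "debug_traceCall",
    // Empty calldata to an address; the tracer config object is left empty.
    params: [{ to, data: "0x" }, "latest", {}],
  };
}

type ProbeVerdict = "supported" | "unsupported" | "error";

function interpretProbe(res: JsonRpcResponse): ProbeVerdict {
  if (res.error === undefined) return "supported";
  if (res.error.code === -32601) return "unsupported"; // JSON-RPC "Method not found"
  return "error"; // e.g. rate limits or a provider-specific "method disabled" error
}
```

The request body can be POSTed to the RPC URL with any HTTP client, using `Content-Type: application/json`.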
**`TRUSTED_GAS_TANK_ADDRESS`** (env) is the address of a **trusted gas tank**. When the **payment userOp** of a supertransaction is a **sponsored** payment (userOps indicate sponsorship) and the **sender** of that payment userOp is this trusted gas tank address:

- The node treats it as **trusted sponsorship**.
- **Simulation of the payment userOp is skipped** (signature is still verified against the expected gas tank owner).
- The node **executes all other userOps** in the supertransaction **without requiring fees from the user** — the “payment” is considered covered by the trusted tank.
Can you document self-hosted sponsorship for these cases? Operators might not have the same sponsorship setup as ours.
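The trusted gas tank rule quoted above can be summarized as a decision function. It is a sketch only. `planPaymentUserOp` and the `PaymentUserOp` and `PaymentPlan` shapes are hypothetical. The case-insensitive address comparison is an assumption, since hex addresses may arrive in checksummed or lower-case form. As the text states, the trusted path still verifies the signature against the expected gas tank owner.

```typescript
// Hypothetical sketch of the TRUSTED_GAS_TANK_ADDRESS rule.
interface PaymentUserOp {
  sender: string;
  sponsored: boolean; // userOps indicate sponsorship
}

interface PaymentPlan {
  verifySignature: boolean;
  simulatePayment: boolean;
  feeCoveredByTrustedTank: boolean;
}

function planPaymentUserOp(op: PaymentUserOp, trustedGasTank: string | undefined): PaymentPlan {
  const trusted =
    op.sponsored &&
    trustedGasTank !== undefined &&
    op.sender.toLowerCase() === trustedGasTank.toLowerCase();
  if (trusted) {
    // Trusted sponsorship: skip payment simulation, keep the signature check,
    // and execute the other userOps without charging the user.
    return { verifySignature: true, simulatePayment: false, feeCoveredByTrustedTank: true };
  }
  // Anything else goes through the normal payment simulation path.
  return { verifySignature: true, simulatePayment: true, feeCoveredByTrustedTank: false };
}
```

For a self-hosted setup, an operator would point `TRUSTED_GAS_TANK_ADDRESS` at a gas tank they control. In this sketch, leaving it unset disables the trusted path entirely.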
docs/chain-configuration.md
| — | `executor.pollInterval` | number (ms) | `1000` | Poll interval when waiting for transaction receipt. |
| — | `executor.stalledJobsRetryInterval` | number (ms) | `5000` (5 s) | Interval for retrying stalled execution jobs. |
| — | `executor.rateLimitMaxRequestsPerInterval` | number | `100` | Rate limit: max requests per interval. |
| — | `executor.rateLimitDuration` | number (s) | `1` | Rate limit interval in seconds. |
| — | `executor.workerFunding` | string (ether) | `"0.001"` | Target balance used when funding worker EOAs (e.g. via disperse). |
| — | `executor.workerFundingThreshold` | string (ether) | `"0"` | **Minimum native balance** each worker (or master when used as worker) must have to be considered healthy. Limits the maximum executable call gas limit for that chain; set high enough for the largest transaction you expect. |
Some of these are env variables; it would be good to explicitly name the env variable for each.
vr16x left a comment
It has improved a lot; approving this PR.
Summary
This PR adds documentation so the MEE Node can be run and operated as an open-source project: setup, dependencies (Redis + Token Storage), operations, and full chain configuration (including price oracles).
Changes
README
- … `maxmemory` + `maxmemory-policy`) so cache doesn’t grow unbounded; linked to `docs/dependencies.md`.
- … `.env.example`.
- … `start`, `start:dev`, build + `start:prod`.
- … `docker run`; Token Storage image reference.
- … `/v1/info`, `/v1/quote`, `/v1/quote-permit`, `/v1/exec`, `/v1/explorer/:hash`).
- … `/v1/info`, logs, shutdown; link to operations doc.

New docs
- `docs/architecture.md` — Process model (cluster + workers), quote → storage → simulator → batcher → executor flow, Redis usage, health checks, and how config is pushed to workers.
- `docs/dependencies.md` — Redis: Role, config, running (Docker/compose/production), eviction policy (e.g. `allkeys-lru`), health check. Token Storage: Role, API, config, running (source/Docker). Token Storage specifics: Adding new chains (code change: enum + `FromStr` + RPC env); RPC/boot behavior (one bad RPC can prevent startup); known "broken pipe" issue; impact when service fails (default gas used, complex flows may fail); soft health check. Summary table for both dependencies.
- `docs/operations.md` — Startup order (Redis → Token Storage → node), health checks, config/secrets, logs, graceful shutdown, handling Redis/Token Storage failures, scaling (API workers, EOA workers, Redis, Token Storage), Docker notes, troubleshooting checklist.
- `docs/chain-configuration.md` — Where chain config is loaded (`CUSTOM_CHAINS_CONFIG_PATH`, default paths), directory vs single-file layout. Price oracles: Native and payment token prices can be `fixed` or `oracle`. Requirement: Any chain referenced by a price oracle (`price.chainId`) must exist in chain config with valid RPCs, even if not used for execution. Full field reference: Identification/RPC, contracts (`entryPointV7`, `pmFactory`, `disperse`), batcher, gas/chain type (`type`, `eip1559`, `gasPriceMode`, `l1ChainId`, `feeHistoryBlockTagOverride`), native `price` (fixed vs oracle), paymaster funding, confirmations, simulator (numWorkers, concurrency, rate limit, retries), executor (poll interval, rate limit), `gasLimitOverrides`, `paymentTokens` (with price and permit). Step-by-step "adding a new chain", minimal JSON example, Token Storage difference, pointer to schema and defaults in code.

Token Storage app README
- `apps/token-storage-detection/README.md`: API, config, adding new chains (enum + `FromStr` + RPC env), run locally, Docker. Operational notes: RPC at boot (one failing RPC can crash/restart loop), broken pipe, fallback when service fails (default gas). Links to main repo's dependencies and chain-configuration docs.

Testing
Checklist