# Large heap performance

Large heaps are increasingly common for modern Java applications. Eclipse MAT
*can* be used to analyze heaps of 100GB or more, but it needs some tuning to
get the best performance.

## MAT processing overview

MAT handles a heap dump in two stages.

1. The first time a heap dump is loaded, MAT reads the entire heap to build an index.

2. Subsequent queries use the index to quickly find relevant objects.

Building the index is the most resource-intensive step, and uses a significant
amount of memory, CPU, and disk IO. After the index has been built, viewing
the data typically requires far fewer resources.

### Indexing stages

1. Pass 1: scan the dump file (e.g. hprof) front to back to identify the offset of every object.
   This pass depends mostly on fast linear IO.

2. Pass 2: re-scan the dump file to build more detailed object mapping data.
   This pass depends on a mix of linear and random IO.

3. Fill: a logical pass to restructure and identify data per "Object", including
   forward and backward references between objects, histogram counts,
   approximate object sizes, and other object metadata. This pass can be
   heavily parallelized given plenty of CPU cores.

4. Dominator tree: the most memory-intensive pass. It requires roughly 6 arrays,
   each with one 4-byte entry per object, to track the dominator tree. For a
   hypothetical heap with 2 billion objects, MAT requires
   2bn * 6 * 4 bytes = 48GB of memory for these 6 arrays alone, on top of other
   overheads. This pass is currently single-threaded.
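
The dominator tree arithmetic above can be sketched as a small calculation. This is only an illustration of the estimate quoted in the list, not MAT's actual code: the class and method names are hypothetical, and the figures of 6 arrays and 4 bytes per entry are taken directly from the text.

```java
public class DominatorMemoryEstimate {

    // Assumptions from the text above: roughly 6 arrays, one 4-byte entry per object.
    static final int ARRAY_COUNT = 6;
    static final int BYTES_PER_ENTRY = Integer.BYTES;

    // Returns the approximate bytes needed for the dominator tree arrays alone.
    static long estimateBytes(long objectCount) {
        return objectCount * ARRAY_COUNT * BYTES_PER_ENTRY;
    }

    public static void main(String[] args) {
        long objectCount = 2_000_000_000L; // hypothetical heap with 2 billion objects
        long gigabytes = estimateBytes(objectCount) / 1_000_000_000L;
        System.out.println("Dominator tree arrays: about " + gigabytes + "GB");
    }
}
```

The result excludes the other overheads mentioned above, so treat it as a lower bound when choosing `-Xmx`.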

#### Indexing setup for large heaps

To allocate more memory for the index build, increase the JVM heap size in
`MemoryAnalyzer.ini`:

```
# Default is 1024m (MB). On a machine with 128GB of RAM, this lets MAT use 100GB.
-Xms100g
-Xmx100g
```

Use `ParseHeapDump.sh` for a headless parse. This is helpful when running on a
dedicated machine with plenty of RAM and disk space. On a smaller machine, it
helps to maximise the heap space available for the index build.

- The generated `*.index` and `*.threads` files can be copied (e.g. with `scp`)
  to the local machine and placed alongside the dump file (e.g. `.hprof`).

- If doing this, ensure that the index files are in the same directory as the
  dump file AND that the file dates of the index files are newer than that of
  the dump file; otherwise the index will be dropped and rebuilt on the next open.

- See more in [headless and programmatic](./headless_and_programmatic).
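
After copying index files between machines, it can be worth checking the file dates before opening the dump. The following is a minimal sketch of such a check, not part of MAT: the class and method names are hypothetical, it assumes every `*.index` and `*.threads` file in the dump's directory belongs to that dump, and it treats an index file with the same timestamp as the dump as not newer, which may be stricter than MAT's own comparison.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.FileTime;
import java.util.ArrayList;
import java.util.List;

public class IndexFreshnessCheck {

    // True only when the index file is strictly newer than the dump file.
    static boolean isNewer(FileTime indexTime, FileTime dumpTime) {
        return indexTime.compareTo(dumpTime) > 0;
    }

    // Lists *.index and *.threads files beside the dump that are not newer than it.
    static List<Path> staleFiles(Path dump) throws IOException {
        FileTime dumpTime = Files.getLastModifiedTime(dump);
        Path dir = dump.toAbsolutePath().getParent();
        List<Path> stale = new ArrayList<>();
        try (DirectoryStream<Path> files =
                Files.newDirectoryStream(dir, "*.{index,threads}")) {
            for (Path file : files) {
                if (!isNewer(Files.getLastModifiedTime(file), dumpTime)) {
                    stale.add(file);
                }
            }
        }
        return stale;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical usage: pass the path to the dump file as the first argument.
        for (Path file : staleFiles(Paths.get(args[0]))) {
            System.out.println("Not newer than the dump: " + file);
        }
    }
}
```

If the program prints nothing, every index file it found is newer than the dump.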

### Discard policy (2 billion objects)

For really large heaps, you may need a discard policy if the heap has more than
2 billion objects. This is currently an architectural limitation of MAT: it uses
Java arrays internally, and Java arrays are indexed by 32-bit signed integers.

You should apply this if you see the following error during indexing:

```
[..........java.lang.OutOfMemoryError: Requested length of new long[2,147,483,640] exceeds limit of 2,147,483,639.
Consider enabling object discard, see Window > Preferences > Memory Analyzer > Enable discard
```

See more in [the FAQ](./faq), and [the online help](https://help.eclipse.org/latest/index.jsp?topic=%2Forg.eclipse.mat.ui.help%2Ftasks%2Fconfigure_mat.html&anchor=task_configure_mat__discard).
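
To make the limit concrete, the sketch below compares an object count against the maximum array length quoted in the error message above, which equals `Integer.MAX_VALUE - 8`. The class and method names are hypothetical, and the check is a simplification: it assumes one array entry per object, whereas MAT's internal arrays may be sized differently.

```java
public class ArrayLimitCheck {

    // Limit quoted in the error message above: Integer.MAX_VALUE - 8 = 2,147,483,639.
    static final long MAX_ARRAY_LENGTH = Integer.MAX_VALUE - 8L;

    // Simplified check: one array entry per object would not fit in a single Java array.
    static boolean exceedsArrayLimit(long objectCount) {
        return objectCount > MAX_ARRAY_LENGTH;
    }

    public static void main(String[] args) {
        long objectCount = 2_500_000_000L; // hypothetical heap with 2.5 billion objects
        if (exceedsArrayLimit(objectCount)) {
            System.out.println("Object count exceeds the array limit; consider enabling discard.");
        } else {
            System.out.println("Object count fits in a single array.");
        }
    }
}
```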

### Viewing performance

For the most part, viewing performance relies on:

- plenty of free heap space for uncompressed index data,

- fast random IO to seek and read the various index files for some query
  types, and,

- fast random IO to support reading any data from the dump file itself.

    - for example, queries that search on String contents require fast random
      IO to scan through the string data in the dump file.

## Help Wanted

We always appreciate help from the community in improving parsing
performance. Even if you're not a developer, you can help by reporting specific
bugs or slow queries, or by suggesting improvements. If you are a developer, you
can contribute directly by submitting a pull request with your changes. Even
submitting performance profiling data that you've collected is helpful.