# Large heap performance

Large heaps are increasingly common in modern Java applications. Eclipse MAT *can* handle heap dumps of 100 GB or more, but it needs some tuning to perform well.

## MAT processing overview

MAT handles a heap dump in two stages:

1. The first time a heap dump is opened, MAT reads the entire dump to build an index.
2. Subsequent queries use that index to find the relevant objects quickly.

Building the index is the most resource-intensive step: it uses a significant amount of memory, CPU, and disk IO. Once the index has been built, viewing the data typically needs far fewer resources.

### Indexing stages

1. Pass 1: scan the dump file (e.g. `.hprof`) front to back to record the offset of every object. This pass depends mostly on fast linear IO.
2. Pass 2: re-scan the dump file to build more detailed object mapping data. This pass depends on a mix of linear and random IO.
3. Fill: a logical pass that restructures the data per object, including forward and backward references between objects, histogram counts, approximate object sizes, and other object metadata. This pass can be heavily parallelized when plenty of CPU cores are available.
4. Dominator tree: the most memory-intensive pass, and currently single-threaded. It needs about six arrays, each with one 4-byte entry per object, to track the dominator tree. For a hypothetical heap with 2 billion objects, MAT needs 2bn × 6 × 4 bytes = 48 GB for these six arrays alone, on top of other overheads.

#### Indexing setup for large heaps

Give the index build more memory by raising the JVM heap settings in `MemoryAnalyzer.ini` (JVM arguments go after the `-vmargs` line):

```
# default is -Xmx1024m; on a machine with 128 GB of RAM, this allows a 100 GB heap
-Xms100g
-Xmx100g
```

Use `ParseHeapDump.sh` for a headless parse. This is useful for running the index build on a dedicated machine with plenty of RAM and disk space.
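
The dominator-tree estimate from the indexing stages can be turned into a quick sizing check before choosing `-Xmx`. This is a minimal sketch, not part of MAT: the class and method names are hypothetical, it assumes exactly six arrays of 4-byte entries per object as described above, and it ignores every other indexing overhead, so treat the result as a lower bound. The object-count limit is taken from the `OutOfMemoryError` quoted under the discard policy section, and GB here means 10^9 bytes, matching the 48 GB figure.

```java
// Hypothetical helper, NOT part of Eclipse MAT: a rough lower bound on the
// memory used by the dominator-tree arrays during indexing.
class DominatorMemoryEstimate {

    // Assumption from the docs: about 6 arrays, 4 bytes per entry, one entry per object.
    static final int DOMINATOR_ARRAYS = 6;
    static final int BYTES_PER_ENTRY = 4;

    // Array length limit quoted in MAT's "Consider enabling object discard" error.
    static final long MAX_ARRAY_LENGTH = 2_147_483_639L;

    // Bytes needed by the dominator-tree arrays alone (other overheads ignored).
    static long dominatorArrayBytes(long objectCount) {
        return objectCount * DOMINATOR_ARRAYS * BYTES_PER_ENTRY;
    }

    // True when the object count no longer fits in a single Java array.
    static boolean needsDiscard(long objectCount) {
        return objectCount > MAX_ARRAY_LENGTH;
    }

    public static void main(String[] args) {
        // Estimated object count; defaults to the 2 billion example from the text.
        long objects = args.length > 0 ? Long.parseLong(args[0]) : 2_000_000_000L;
        long bytes = dominatorArrayBytes(objects);
        // Maximum heap of the JVM running this sketch (roughly its -Xmx), not MAT's.
        long maxHeap = Runtime.getRuntime().maxMemory();
        System.out.printf("objects=%,d dominator arrays ~%.1f GB (this JVM's heap limit %.1f GB)%n",
                objects, bytes / 1e9, maxHeap / 1e9);
        if (needsDiscard(objects)) {
            System.out.println("More objects than a Java array can index: consider enabling discard.");
        }
    }
}
```

For example, running the class with an argument of `2000000000` reports about 48.0 GB for the dominator arrays. Set MAT's `-Xmx` comfortably above that figure, since the other indexing structures need memory too.
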
A headless parse also helps on a smaller machine, because it leaves more of the available memory for the index build.

- The generated `*.index` and `*.threads` files can be copied (e.g. with `scp`) to the local machine and placed alongside the dump file (e.g. `.hprof`).
- If you do this, make sure the index files are in the same directory as the dump file AND that their modification times are newer than the dump file's; otherwise MAT discards the index and rebuilds it the next time the dump is opened.
- See more in [headless and programmatic](./headless_and_programmatic).

### Discard policy (2 billion objects)

For really large heaps with more than about 2 billion objects, you may need a discard policy. This is currently an architectural limitation of MAT: it uses Java arrays internally, and Java arrays are indexed by 32-bit `int`s, so a single array cannot hold more than about 2^31 entries. Apply a discard policy if you see the following error during indexing:

```
[..........java.lang.OutOfMemoryError: Requested length of new long[2,147,483,640] exceeds limit of 2,147,483,639. Consider enabling object discard, see Window > Preferences > Memory Analyzer > Enable discard
```

See more in [the FAQ](./faq) and [the online help](https://help.eclipse.org/latest/index.jsp?topic=%2Forg.eclipse.mat.ui.help%2Ftasks%2Fconfigure_mat.html&anchor=task_configure_mat__discard).

### Viewing performance

Viewing performance mostly relies on:

- plenty of free heap space for uncompressed index data, and
- fast random IO, to read object data from the dump file itself and, for some query types, to seek and read the various index files. For example, queries that search on String contents need fast random IO to scan the string data in the dump file.

## Help Wanted

We always appreciate help from the community in improving parsing performance. Even if you're not a developer, you can help by reporting specific bugs or slow queries, or by suggesting improvements.
If you are a developer, you can contribute directly by submitting a pull request with your changes; submitting performance profiling data that you have collected is also helpful.