DRAFT: ADBDEV-7181 - Orphaned files removal #1631
Draft
whitehawk wants to merge 22 commits into adb-6.x-dev from
C doesn't have any sort of built-in understanding of a pointer relative to some arbitrary base address, but dynamic shared memory segments can be mapped at different addresses in different processes, so any sort of shared data structure stored within a dynamic shared memory segment can't use absolute pointers. We could use something like Size to represent a relative pointer, but then the compiler provides no type-checking. Use stupid macro tricks to get some type-checking. Patch originally by me. Concept suggested by Andres Freund. Recently resubmitted as part of Thomas Munro's work on dynamic shared memory allocation. Discussion: 20131205144434.GG12398@alap2.anarazel.de Discussion: CAEepm=1z5WLuNoJ80PaCvz6EtG9dN0j-KuHcHtU6QEfcPP5-qA@mail.gmail.com (cherry picked from commit fbc1c12)
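The idea can be illustrated with a simplified sketch. This is not the header added by the commit: the macro bodies, the `Segment` struct, and the `relptr_survives_remap` helper are assumptions made for illustration, and offset 0 is assumed to stand for NULL. A second mapping of the segment is simulated by copying its bytes to another address.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A relative pointer to `type`. The relptr_type member exists only so the
 * macros can recover the pointed-to type; only the offset is ever stored. */
#define relptr(type) union { type *relptr_type; size_t relptr_off; }

/* Store val as an offset from base. The sizeof operand is never evaluated;
 * it only makes the compiler diagnose a val of the wrong pointer type. */
#define relptr_store(base, rp, val) \
    ((void) sizeof((rp).relptr_type == (val)), \
     (rp).relptr_off = ((val) == NULL ? 0 : \
                        (size_t) ((char *) (val) - (char *) (base))))

/* Resolve against the caller's own base address (offset 0 means NULL here). */
#if defined(__GNUC__)
#define relptr_access(base, rp) \
    ((__typeof__((rp).relptr_type)) \
     ((rp).relptr_off == 0 ? NULL : (char *) (base) + (rp).relptr_off))
#else
#define relptr_access(base, rp) \
    ((void *) ((rp).relptr_off == 0 ? NULL : (char *) (base) + (rp).relptr_off))
#endif

/* Hypothetical shared structure: the pointer lives inside the segment. */
typedef struct Segment
{
    relptr(int) favorite;
    int         values[8];
} Segment;

/* Returns 1 if the relative pointer still resolves after a "remap". */
int
relptr_survives_remap(void)
{
    Segment a;
    Segment b;
    int    *target = &a.values[3];
    int    *p;

    memset(&a, 0, sizeof(a));
    a.values[3] = 42;
    relptr_store(&a, a.favorite, target);

    /* Same bytes at a different address, as in another process. */
    memcpy(&b, &a, sizeof(a));
    p = relptr_access(&b, b.favorite);

    return p == &b.values[3] && *p == 42;
}
```

An absolute `int *` stored in `a` would still point into `a` after the copy; the offset resolves correctly against either base.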
This is intended as infrastructure for a full-fledged allocator for dynamic shared memory. The interface looks a bit like a real allocator, but only supports allocating and freeing memory in multiples of the 4kB page size. Further, to free memory, you must know the size of the span you wish to free, in pages. While these limitations make it unsuitable as an allocator in and of itself, it still serves as very useful scaffolding for a full-fledged allocator. Robert Haas and Thomas Munro. This code is mostly the same as my 2014 submission, but Thomas fixed quite a few bugs and made some changes to the interface. Discussion: CA+TgmobkeWptGwiNa+SGFWsTLzTzD-CeLz0KcE-y6LFgoUus4A@mail.gmail.com Discussion: CAEepm=1z5WLuNoJ80PaCvz6EtG9dN0j-KuHcHtU6QEfcPP5-qA@mail.gmail.com (cherry picked from commit 13e14a7)
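The shape of such an interface can be sketched with a toy page manager. Everything here is an assumption for illustration: the `toy_fpm_*` names, the fixed 16-page pool, and the first-fit bitmap search are stand-ins, not the data structures used by the commit. The point is only that spans are handed out in whole pages and the caller must pass the span length back when freeing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_NPAGES 16           /* pool size in pages (hypothetical) */

typedef struct ToyPageManager
{
    bool used[TOY_NPAGES];      /* one flag per page, no per-span metadata */
} ToyPageManager;

void
toy_fpm_init(ToyPageManager *fpm)
{
    memset(fpm->used, 0, sizeof(fpm->used));
}

/* First-fit search for npages consecutive free pages. */
bool
toy_fpm_get(ToyPageManager *fpm, size_t npages, size_t *first_page)
{
    size_t run = 0;

    if (npages == 0)
        return false;
    for (size_t i = 0; i < TOY_NPAGES; i++)
    {
        run = fpm->used[i] ? 0 : run + 1;
        if (run == npages)
        {
            size_t start = i + 1 - npages;

            for (size_t j = start; j <= i; j++)
                fpm->used[j] = true;
            *first_page = start;
            return true;
        }
    }
    return false;
}

/* The caller supplies the span length; the manager does not remember it. */
void
toy_fpm_put(ToyPageManager *fpm, size_t first_page, size_t npages)
{
    assert(first_page + npages <= TOY_NPAGES);
    for (size_t i = first_page; i < first_page + npages; i++)
        fpm->used[i] = false;
}

/* Returns 1 if the allocation sequence behaves as described in the comments. */
int
toy_fpm_demo(void)
{
    ToyPageManager fpm;
    size_t a, b, c;

    toy_fpm_init(&fpm);
    if (!toy_fpm_get(&fpm, 4, &a) || !toy_fpm_get(&fpm, 8, &b))
        return 0;
    if (toy_fpm_get(&fpm, 8, &c))   /* only 4 pages remain */
        return 0;
    toy_fpm_put(&fpm, a, 4);
    if (toy_fpm_get(&fpm, 8, &c))   /* 8 pages free, but not contiguous */
        return 0;
    toy_fpm_put(&fpm, b, 8);
    return toy_fpm_get(&fpm, TOY_NPAGES, &c) && c == 0;
}
```

Because no size is recorded per span, a layer on top has to track span lengths itself, which is why this serves as scaffolding rather than as a general-purpose allocator.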
If you have previously pinned a segment and decide that you don't actually want to keep it around until shutdown, this new API lets you remove the pin. This is pretty trivial except on Windows, where it requires closing the duplicate handle that was used to implement the pin. Thomas Munro and Amit Kapila, reviewed by Amit Kapila and by me. (cherry picked from commit 0fda682) Changes from original commit: removed API changes and Windows compatibility code to keep binary compatibility with 6.x
Programmers discovered decades ago that it was useful to have a simple interface for allocating and freeing memory, which is why malloc() and free() were invented. Unfortunately, those handy tools don't work with dynamic shared memory segments because those are specific to PostgreSQL and are not necessarily mapped at the same address in every cooperating process. So invent our own allocator instead. This makes it possible for processes cooperating as part of parallel query execution to allocate and free chunks of memory without having to reserve them prior to the start of execution. It could also be used for longer lived objects; for example, we could consider storing data for pg_stat_statements or the stats collector in shared memory using these interfaces, rather than writing them to files. Basically, anything that needs shared memory but can't predict in advance how much it's going to need might find this useful. Thomas Munro and Robert Haas. The original code (of mine) on which Thomas based his work was actually designed to be a new backend-local memory allocator for PostgreSQL, but that hasn't gone anywhere - or not yet, anyway. Thomas took that work and performed major refactoring and extensive modifications to make it work with dynamic shared memory, including the addition of appropriate locking. Discussion: CA+TgmobkeWptGwiNa+SGFWsTLzTzD-CeLz0KcE-y6LFgoUus4A@mail.gmail.com Discussion: CAEepm=1z5WLuNoJ80PaCvz6EtG9dN0j-KuHcHtU6QEfcPP5-qA@mail.gmail.com (cherry picked from commit 13df76a) Changes from original commit: removed extra argument from dsm_create() call
The comments in dsa.c suggested that areas were owned by resource owners, but it was not in fact tracked explicitly. The DSM attachments held by the dsa were owned by resource owners, but not the area itself. That led to confusion if you used one resource owner to attach or create the area, but then switched to a different resource owner before allocating or even just accessing the allocations in the area with dsa_get_address(). The additional DSM segments associated with the area would get owned by a different resource owner than the initial segment. To fix, add an explicit 'resowner' field to dsa_area. It replaces the 'mapping_pinned' flag; resowner == NULL now indicates that the mapping is pinned. This is arguably a bug fix, but I'm not backpatching because it doesn't seem to be a live bug in the back branches. In 'master', it is a bug because commit b8bff07daa made ResourceOwners more strict so that you are no longer allowed to remember new resources in a ResourceOwner after you have started to release it. Merely accessing a dsa pointer might need to attach a new DSM segment, and before this commit it was temporarily remembered in the current owner for a very brief period even if the DSA was pinned. And that could happen in AtEOXact_PgStat(), which is called after the owner is already released. Reported-by: Alexander Lakhin Reviewed-by: Alexander Lakhin, Thomas Munro, Andres Freund Discussion: https://www.postgresql.org/message-id/11b70743-c5f3-3910-8e5b-dd6c115ff829%40gmail.com (cherry picked from commit postgres/postgres@a8b330f)
This covers basic calls within a single backend process, and also calling dsa_allocate() or dsa_get_address() while in a different resource owner. The latter case was fixed by the previous commit. Discussion: https://www.postgresql.org/message-id/11b70743-c5f3-3910-8e5b-dd6c115ff829%40gmail.com (cherry picked from commit postgres/postgres@325f540) Changes from original commit: test was adapted to 6.x
1. Implement a redo module for pending deletes. This module is responsible for operations related to the redo process:
   - Inserting an XLOG_PENDING_DELETE record into the xlog when the checkpointer requests it.
   - Parsing XLOG_SMGR_CREATE records to retrieve the relfilenodes that should be added to the pending deletes hash table on redo.
   - Replaying XLOG_PENDING_DELETE by adding items to the pending deletes redo hash table.
   - Removing nodes from the pending deletes redo hash table for committed or aborted transactions.
   - Dropping orphaned files based on the redo hash table of pending deletes at the end of the recovery process.
2. Add a unit test for the module.
3. Add the GUC 'gp_track_pending_delete'.

Ticket: ADBDEV-7304
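The bookkeeping described above can be sketched as a small table keyed by relfilenode. This is a minimal illustration, not the module's code: the `ToyRedoTable` type, the `toy_redo_*` functions, the fixed-size array in place of a hash table, and the plain `unsigned` identifiers are all assumptions.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_PENDING 32

/* Stand-in for the redo hash table: parallel arrays of (xid, relfilenode). */
typedef struct ToyRedoTable
{
    unsigned xid[TOY_MAX_PENDING];
    unsigned node[TOY_MAX_PENDING];
    size_t   n;
} ToyRedoTable;

/* Called when replaying a create record or a pending-delete record.
 * A node seen in both record types is tracked only once. */
int
toy_redo_add(ToyRedoTable *t, unsigned xid, unsigned node)
{
    for (size_t i = 0; i < t->n; i++)
        if (t->node[i] == node)
            return 1;
    if (t->n == TOY_MAX_PENDING)
        return 0;
    t->xid[t->n] = xid;
    t->node[t->n] = node;
    t->n++;
    return 1;
}

/* Called when replaying a commit or abort: the transaction finished, so its
 * files are no longer candidates for orphan removal. */
void
toy_redo_forget_xid(ToyRedoTable *t, unsigned xid)
{
    size_t kept = 0;

    for (size_t i = 0; i < t->n; i++)
    {
        if (t->xid[i] != xid)
        {
            t->xid[kept] = t->xid[i];
            t->node[kept] = t->node[i];
            kept++;
        }
    }
    t->n = kept;
}

/* Whatever is left at the end of recovery belongs to transactions that never
 * finished; those are the files to drop. Returns 1 if the result matches. */
int
toy_redo_demo(void)
{
    ToyRedoTable t = { {0}, {0}, 0 };

    toy_redo_add(&t, 100, 1);   /* xid 100 creates node 1 */
    toy_redo_add(&t, 101, 2);   /* xid 101 creates nodes 2 and 3 */
    toy_redo_add(&t, 101, 3);
    toy_redo_add(&t, 101, 2);   /* node 2 seen again in a pending-delete record */
    toy_redo_forget_xid(&t, 100);   /* xid 100 commits; xid 101 never finishes */

    return t.n == 2 && t.node[0] == 2 && t.node[1] == 3;
}
```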
The module contains functions to maintain doubly linked lists of (RelFileNodePendingDelete, transaction id) pairs for all backends in shared memory. The shared memory can be initialized using the PdlShmemSize and PdlShmemInit functions. A backend can add pairs to and remove pairs from its own list using PdlShmemAdd and PdlShmemRemove respectively. The lists of all backends can be obtained in a format suitable for XLOG_PENDING_DELETE using PdlXLogShmemDump. When a backend exits, the module cleans up its pending deletes list. The size argument is removed from PdlXLogShmemDump, because the caller can calculate this value from the function's return value. Ticket: ADBDEV-7303
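A doubly linked list makes removal cheap when the caller keeps a handle to the node it added. The sketch below shows that pattern for a single backend's list. It is an assumption-laden illustration: the `toy_pdl_*` names are hypothetical, `malloc` stands in for shared memory, and locking is omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct ToyPdlNode
{
    unsigned           relfilenode;
    unsigned           xid;
    struct ToyPdlNode *prev;
    struct ToyPdlNode *next;
} ToyPdlNode;

typedef struct ToyPdlList
{
    ToyPdlNode *head;
    size_t      count;
} ToyPdlList;

/* Push a pair at the head and return the node so the caller can remove it
 * later without searching. */
ToyPdlNode *
toy_pdl_add(ToyPdlList *list, unsigned relfilenode, unsigned xid)
{
    ToyPdlNode *n = malloc(sizeof(*n));

    if (n == NULL)
        return NULL;
    n->relfilenode = relfilenode;
    n->xid = xid;
    n->prev = NULL;
    n->next = list->head;
    if (list->head != NULL)
        list->head->prev = n;
    list->head = n;
    list->count++;
    return n;
}

/* Unlink in constant time using the node's own prev/next links. */
void
toy_pdl_remove(ToyPdlList *list, ToyPdlNode *n)
{
    if (n->prev != NULL)
        n->prev->next = n->next;
    else
        list->head = n->next;
    if (n->next != NULL)
        n->next->prev = n->prev;
    list->count--;
    free(n);
}

/* Copy up to cap relfilenodes into out, head first; returns the number copied. */
size_t
toy_pdl_dump(const ToyPdlList *list, unsigned *out, size_t cap)
{
    size_t i = 0;

    for (const ToyPdlNode *n = list->head; n != NULL && i < cap; n = n->next)
        out[i++] = n->relfilenode;
    return i;
}

/* Returns 1 if removing the middle node leaves the other two in order. */
int
toy_pdl_demo(void)
{
    ToyPdlList  list = { NULL, 0 };
    unsigned    out[4];
    ToyPdlNode *a = toy_pdl_add(&list, 10, 7);
    ToyPdlNode *b = toy_pdl_add(&list, 11, 7);
    ToyPdlNode *c = toy_pdl_add(&list, 12, 7);
    int         ok;

    if (a == NULL || b == NULL || c == NULL)
        return 0;
    toy_pdl_remove(&list, b);
    ok = toy_pdl_dump(&list, out, 4) == 2 && out[0] == 12 && out[1] == 10;
    toy_pdl_remove(&list, a);
    toy_pdl_remove(&list, c);
    return ok && list.count == 0 && list.head == NULL;
}
```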
Problem description: The XLOG_SMGR_CREATE WAL record doesn't contain information about the relstorage of the created relation. The orphaned files removal feature needs the relstorage info; otherwise it can't properly handle the removal of all orphaned files for AO tables. Fix: To store the relation's relstorage in the xlog record and keep backward compatibility with previous versions, we introduce a new xlog record type, XLOG_SMGR_CREATE_PDL. This new record type contains info about relstorage and is used instead of XLOG_SMGR_CREATE. In addition, this patch updates 'log_smgrcreate()': it now creates an XLOG_SMGR_CREATE_PDL record and flushes it right after creation (otherwise, in case of a crash after file creation, the file may be orphaned). No special tests are presented in this patch, as the added functionality will be tested later together with the other parts of the orphaned files removal feature. At this point, it is enough to pass the current standard test set.
Extend the PendingRelDelete structure. It now stores shmemPtr, a pointer to the corresponding node of the shared pending deletes list. The pointer is filled in RelationCreateStorage from the return value of PdlShmemAdd. The pointer is set to invalid in RelationDropStorage, because dropping storage can't lead to an orphaned relfilenode. Replace pfree for the pendingDeletes list entry with a new PendingRelDeleteFree function, which calls PdlShmemRemove before pfree when shmemPtr is valid. Add initialization of shared memory for the storage_pending_deletes module. Increase the number of LWLocks by MaxBackends to add LWLocks for the module. Fix the Assert in the dsm_create function, because the module is also used in stand-alone mode, for example, when initdb runs 'postgres --single'. No special tests are presented in this patch, as the added functionality will be tested later together with the other parts of the orphaned files removal feature. At this point, it is enough to pass the current standard test set. Ticket: ADBDEV-7410
Tests are not added, because xlog_desc is used for debug purposes only. Ticket: ADBDEV-7409
…1547) When a table was created right after the beginning of a transaction, the XLOG_SMGR_CREATE_PDL record was added with InvalidTransactionId, because XLogInsert calls GetCurrentTransactionIdIfAny and the transaction id had not been assigned at that moment. Get the transaction id before the log_smgrcreate function is called. No special tests are presented in this patch, as the added functionality will be tested later together with the other parts of the orphaned files removal feature. At this point, it is enough to pass the current standard test set. Ticket: ADBDEV-7458
Problem description: After the following scenario:
1. the primary segment started a transaction;
2. the primary segment created a relation;
3. the primary segment did a checkpoint;
4. the mirror created a restartpoint by timeout;
5. both primary and mirror crashed and then recovered;
the primary removed the orphaned file, but the mirror didn't.

Root cause: At step 5, the mirror started replaying WAL from the restartpoint, so it never saw the XLOG_SMGR_CREATE_PDL record (which was written at the moment of table creation, before the restartpoint). The information about the table's relfilenode is also stored in the XLOG_PENDING_DELETE WAL record, but the mirror skipped the processing of that record.

Fix: Enable processing of the XLOG_PENDING_DELETE record by the mirror. This record is created on each checkpoint, so if the mirror starts from a restartpoint, it will be one of the first records to replay. The mirror will obtain the relfilenode information from this record and will remove the orphaned file.
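The effect of the replay start position can be shown with a toy WAL. This is a sketch under stated assumptions, not the patch's code: the `ToyWalRecord` type, the `toy_replay` function, and the one-node-per-record layout are hypothetical simplifications.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { TOY_SMGR_CREATE, TOY_PENDING_DELETE, TOY_XACT_END } ToyRecType;

typedef struct ToyWalRecord
{
    ToyRecType type;
    unsigned   xid;
    unsigned   node;            /* unused for TOY_XACT_END */
} ToyWalRecord;

#define TOY_MAX_TRACKED 8

/* Replay wal[start..nrecords) and return how many relfilenodes remain tracked
 * as orphan candidates at the end. handle_pending_delete selects whether the
 * checkpoint-time TOY_PENDING_DELETE records are processed or skipped. */
size_t
toy_replay(const ToyWalRecord *wal, size_t nrecords, size_t start,
           int handle_pending_delete, unsigned *orphans)
{
    unsigned xids[TOY_MAX_TRACKED];
    unsigned nodes[TOY_MAX_TRACKED];
    size_t   n = 0;

    for (size_t i = start; i < nrecords; i++)
    {
        const ToyWalRecord *r = &wal[i];

        if (r->type == TOY_SMGR_CREATE ||
            (r->type == TOY_PENDING_DELETE && handle_pending_delete))
        {
            int known = 0;

            for (size_t j = 0; j < n; j++)
                if (nodes[j] == r->node)
                    known = 1;
            if (!known && n < TOY_MAX_TRACKED)
            {
                xids[n] = r->xid;
                nodes[n] = r->node;
                n++;
            }
        }
        else if (r->type == TOY_XACT_END)
        {
            size_t kept = 0;

            for (size_t j = 0; j < n; j++)
            {
                if (xids[j] != r->xid)
                {
                    xids[kept] = xids[j];
                    nodes[kept] = nodes[j];
                    kept++;
                }
            }
            n = kept;
        }
    }
    for (size_t j = 0; j < n; j++)
        orphans[j] = nodes[j];
    return n;
}

/* xid 7 creates node 500, a checkpoint logs it as pending, then both crash. */
int
toy_restartpoint_demo(void)
{
    const ToyWalRecord wal[] = {
        { TOY_SMGR_CREATE,    7, 500 },
        { TOY_PENDING_DELETE, 7, 500 },   /* written at the checkpoint */
    };
    unsigned out[TOY_MAX_TRACKED];

    /* Replay starts after the create record and skips the pending-delete
     * record: nothing is tracked, so the file would be left behind. */
    if (toy_replay(wal, 2, 1, 0, out) != 0)
        return 0;
    /* Same start position, pending-delete record processed: node 500 found. */
    return toy_replay(wal, 2, 1, 1, out) == 1 && out[0] == 500;
}
```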
This patch:
- Adds cleanup of orphaned files to `StartupXLOG()`.
- Adds the function RemovePendingDeletesForPreparedTransactions(), which removes the xids of prepared transactions from the redo pending deletes before the orphaned files cleanup is performed.

The orphaned files removal feature must take prepared transactions into account and remove their xids from the redo pending deletes, because otherwise some crash scenarios (for example, the 'crash_recovery_dtm' isolation2 test) would drop files that shouldn't be dropped. No special tests are added. For now, it is enough to pass the standard test set. In addition, abi-check is fixed.
The tests check that no orphaned files are left for any access method, before and after a checkpoint, on segments and on the coordinator. Ticket: ADBDEV-7305
…ns (#1737) This patch introduces additional test cases for the orphaned files removal feature, covering the case when the primary segment goes down completely without immediate crash recovery and mirror promotion happens. In addition, this patch updates the ignore file for the abi check, as the baseline has changed.