develop [CBRD-24026] Speedup redo recovery by reducing time spent on…

https://jira.cubrid.org/browse/CBRD-24026

Investigation, phase 1:
- during log_recovery_redo it was found (via perf) that logtb_find_tran_index takes a lot of time because it does a sequential search for the index of a trid across all the transactions found in the log
- only to, afterwards, free that transaction index.

Investigation, phase 2:
- this freeing of a transaction index in log_recovery_redo looked suspect
- subsequent investigation revealed that the checkpoint mechanism considers, at the point it is executed, all transactions that it finds in the transaction table, regardless of their state
- thus, the checkpoint ends up picking transactions which are actually committed (i.e. the commit/abort log record has been added) but have not yet been cleaned up from the transaction table (note that there might be a race condition in this area) and adds them to the LOG_END_CHKPT log entry that it writes
- at recovery analysis, the following log entries for a given transaction are found (an example): LOG_MVCC_UNDOREDO_DATA, LOG_MVCC_DIFF_UNDOREDO_DATA, LOG_COMMIT, LOG_END_CHKPT
- the last but one of these will clear the transaction from the transaction table
- the last one will actually re-add an entry in the transaction table for the same transaction (via the succession of calls: log_recovery_analysis - log_rv_analysis_record - log_rv_analysis_end_checkpoint - logtb_rv_find_allocate_tran_index)
- this entry is the one that was being cleared in log_recovery_redo on the LOG_COMMIT/LOG_ABORT switch branch.
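
Illustration of the sequence above, as a minimal sketch and not the actual CUBRID code: the record type names mirror the log records listed in the example, while LogRecord, run_analysis and the std::set used as a transaction table are hypothetical stand-ins introduced only for this note.

```cpp
#include <set>
#include <vector>

// Hypothetical, simplified stand-ins for the log records and the transaction
// table; only the record type names come from the example above.
enum class RecType { MVCC_UNDOREDO_DATA, MVCC_DIFF_UNDOREDO_DATA, COMMIT, END_CHKPT };

struct LogRecord
{
  RecType type;
  int trid;                     // transaction that wrote the record (ignored for END_CHKPT)
  std::vector<int> chkpt_trids; // transactions captured by the checkpoint (END_CHKPT only)
};

// Replays a simplified analysis pass and returns the trids left in the table.
std::set<int> run_analysis (const std::vector<LogRecord> &log)
{
  std::set<int> tran_table;
  for (const LogRecord &rec : log)
    {
      switch (rec.type)
        {
        case RecType::MVCC_UNDOREDO_DATA:
        case RecType::MVCC_DIFF_UNDOREDO_DATA:
          tran_table.insert (rec.trid); // transaction is seen as active
          break;
        case RecType::COMMIT:
          tran_table.erase (rec.trid);  // commit clears the transaction
          break;
        case RecType::END_CHKPT:
          // the checkpoint snapshot re-adds every transaction it captured,
          // including ones whose commit record was already processed
          tran_table.insert (rec.chkpt_trids.begin (), rec.chkpt_trids.end ());
          break;
        }
    }
  return tran_table;
}
```

Feeding this sketch the four records from the example (with the checkpoint listing the committed trid) leaves that trid in the table after analysis, which is the stale entry that the redo pass then had to look up and free.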

Implementation:
- make the checkpoint mechanism not pick up transactions which have had their commit/abort log record entry added
- a new log_tdes.commit_abort_lsa flag is added, initialized to NULL_LSA and set to the LSA of the commit/abort log record once that is added - this is done under log_prior lock in prior_lsa_next_record_internal
- make the checkpoint mechanism not pick up transactions which have had their commit/abort added, based on the flag - in logpb_checkpoint_trans
- in log_recovery_redo only assert that there are no LOG_COMMIT/ABORT log records, unless required to stop_at a certain time (restore)
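
Rough sketch of the checkpoint filter described above, under stated assumptions: commit_abort_lsa and NULL_LSA are the names used in this message, but the struct layouts, on_commit_abort_appended and checkpoint_trans are hypothetical simplifications, not the real prior_lsa_next_record_internal or logpb_checkpoint_trans. Locking is omitted; in the real code the field is set under the log_prior lock.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified LSA; pageid == -1 stands for "null" in this sketch.
struct log_lsa
{
  std::int64_t pageid;
  std::int16_t offset;
  bool is_null () const { return pageid == -1; }
};
const log_lsa NULL_LSA = { -1, -1 };

// Hypothetical, reduced transaction descriptor.
struct log_tdes
{
  int trid;
  log_lsa commit_abort_lsa = NULL_LSA; // NULL_LSA until the commit/abort record is appended
};

// Stand-in for the point where the commit/abort record gets its LSA.
// Assumption of this sketch: the field is set at most once per transaction.
void on_commit_abort_appended (log_tdes &tdes, const log_lsa &record_lsa)
{
  assert (tdes.commit_abort_lsa.is_null ());
  tdes.commit_abort_lsa = record_lsa;
}

// Collects the trids a checkpoint would include: transactions whose
// commit/abort record has already been appended are skipped.
std::vector<int> checkpoint_trans (const std::vector<log_tdes> &tran_table)
{
  std::vector<int> picked;
  for (const log_tdes &tdes : tran_table)
    {
      if (!tdes.commit_abort_lsa.is_null ())
        {
          continue; // already committed/aborted, only awaiting cleanup
        }
      picked.push_back (tdes.trid);
    }
  return picked;
}
```

With such a filter, a transaction that is still in the table only because cleanup has not run yet never reaches the checkpoint record, so recovery analysis has nothing to re-add for it.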

cristiarg

Push event #10156.1 passed

CUBRID/cubrid:.travis.yml@8c09972
dist: trusty
language: cpp
env: CC=gcc-8 CXX=g++-8
os:
- linux
addons:
  apt:
    sources:
      - kalakris-cmake
      - ubuntu-toolchain-r-test
    packages:
      - gcc-8
      - g++-8
      - cmake
      - systemtap-sdt-dev
      - libelf-dev
script:
  - cmake -E make_directory build
  - cmake -E chdir build cmake ..
  - cmake --build build