Supported database versions

If you are using MariaDB 10.3 and later, Percona XtraBackup is not supported and you should follow Mariabackup’s documentation instead. See https://mariadb.com/kb/en/percona-xtrabackup-overview/#compatibility-with-mariadb.

Also note that MariaDB and MySQL have different GTID implementations, and the following guide is only for MySQL 8.0.

Preparations

  1. Enable log_bin, log_slave_updates (only needed on slave), gtid_mode and enforce_gtid_consistency.
    1. - and - can be used interchangeably in MySQL configuration.
    2. If not already enabled, SET @@GLOBAL.ENFORCE_GTID_CONSISTENCY = WARN; first to check if there are any incompatible queries.
    3. You can enable gtid_mode online by following this documentation on MySQL 5.7.6 or later: https://dev.mysql.com/doc/mysql-replication-excerpt/5.7/en/replication-mode-change-online-enable-gtids.html
  2. server_id must be different on master and slave database.
  3. If hostname:port read from database configuration is not directly reachable from your orchestrator instance, you may need to configure report_host and report_port to let orchestrator find your instance via "MySQLHostnameResolveMethod": "@@report_host".
  4. Binary log is stored in select @@global.log_bin_basename; which normally reside in the data directory.
  5. Revoke unused SUPER and CONNECTION_ADMIN privileges and and make sure no one is actively writing data with these privileges.
    1. List all grants with these privileges with SELECT GRANTEE, PRIVILEGE_TYPE FROM INFORMATION_SCHEMA.USER_PRIVILEGES WHERE PRIVILEGE_TYPE IN ('SUPER', 'CONNECTION_ADMIN'); https://dev.mysql.com/doc/refman/8.0/en/privileges-provided.html#dynamic-privileges-migration-from-super
    2. SELECT user FROM INFORMATION_SCHEMA.PROCESSLIST; to list active users and their privileges.
    3. Note that static global privileges like SUPER are unaffected for a connected client. These changes take effect only in sessions for subsequent connections.
    4. Since MySQL 5.7.8, you could also set UseSuperReadOnly in orchestrator’s config to ensure no one can write data.
[mysqld]
server_id=2
report_host=192.168.1.x
report_port=3306
 
log_bin
log_slave_updates
 
gtid_mode=ON
enforce-gtid-consistency=ON

Backup and restore

Percona XtraBackup and MySQL version must match. For example, use Percona XtraBackup 8.0 for MySQL 8.0 database backup.

Preparations

  1. docker pull percona/percona-xtrabackup:8.0.35-32 on both the source and destination servers.
  2. Prepare the target database, started once with your my.cnf config to make sure it works, then shut it down and empty its data directory.
  3. Check database connectivity from source to destination and vice versa for reverse replication.
  4. Make sure there is sufficient disk space on the source server to keep the binary log files needed for replication.

Backup

NOTE

If you have custom InnoDB options, you may need to mount the my.cnf config file to /etc/my.cnf in the backup container to let XtraBackup respect them. The following example skips this for brevity.

WARNING

Note your server’s bandwidth limit, especially if this is a server serving production traffic.

# WARNING: docker will store a copy of the logs, make sure you have log rotation set up, or else run the pipeline inside your container.
docker run --name pxb --volumes-from mysqldb --link mysqldb:mysql -it --rm --user root percona/percona-xtrabackup:8.0.35-32 /bin/bash
 
microdnf install nmap-ncat
 
xtrabackup --backup --stream=xbstream --throttle=4 --datadir=/var/lib/mysql --host=mysql --port=3306 --user=root --password=mysecretpassword | tee >(sha1sum > source_xt_checksum) | nc desthost 9999 && echo SUCCESS
 
cat source_xt_checksum
  1. --throttle=4 limits the bandwidth used to 40 MB/s. There are some overhead, so be conservative.
  2. docker run -t mixes stdout and stderr, but in our case we are streaming stdout elsewhere, so it’s fine.
  3. Don’t use stdout of docker run as a pipeline for data. The Docker daemon stores a copy of it, wasting disk I/O and space.
  4. If you are using bind mount for the data directory, use that instead of --volumes-from mysqldb.
  5. With nmap-ncat, you don’t need to specify -q 5 to let nc terminate the connection automatically upon EOF.

Restore

# Start a new container and stop it for data recovery
docker run -d -p 3306:3306 --name mysqldb -v /path/to/datadir:/var/lib/mysql -e MYSQL_ROOT_PASSWORD=mysecretpassword mysql:8.0
docker stop mysqldb
 
# Receive and prepare the backup
nc -l 9999 | tee >(sha1sum > destination_checksum) > /path/to/backupdir/backup.xbstream
 
# Start an interactive shell with xtrabackup CLI
# NOTE: Destructive commands like `rm -rf` should be ran interactively in the container to discard shell history.
docker run -v /path/to/backupdir:/backup -v /path/to/datadir:/var/lib/mysql -it --rm --user root percona/percona-xtrabackup:8.0.35-32 /bin/bash
 
# Extract
xbstream -xv -C /backup/20250415 < /backup/backup.xbstream
# Prepare
xtrabackup --prepare --target-dir=/backup/20250415
 
# Restore
rm -rf /var/lib/mysql/*
# Use --copy-back --parallel=2 if you have sufficient space and time for doing another copy
xtrabackup --move-back --datadir=/var/lib/mysql --target-dir=/backup/20250415
chown -R mysql:mysql /var/lib/mysql/
 
exit
 
# Start the database and check its logs
docker start mysqldb
docker logs -f mysqldb

Replication

On master

Replace $replicapass with a strong password and 192.168.1.% with your trusted network prefix.

CREATE USER 'repl'@'192.168.1.%' IDENTIFIED BY '$replicapass';
GRANT REPLICATION SLAVE ON *.* TO 'repl'@'192.168.1.%';

Once replication is properly set up, which you could verify with show slave hosts\G, you may execute PURGE BINARY LOGS to free some disk space, but don’t use RESET MASTER as it could break replication.

RESET MASTER is not intended to be used while any replicas are running. The behavior of RESET MASTER when used while replicas are running is undefined (and thus unsupported), whereas PURGE BINARY LOGS may be safely used while replicas are running.

On slave

-- Set read_only
set @@global.read_only=on; -- Updates form clients who have the SUPER privilege are exempted, but you should avoid doing that during replication.
 
-- Check Executed_Gtid_Set
show master status\G
-- Only execute these if gtid does not match and you have confidence that data is not corrupt.
RESET MASTER;
SET @@global.gtid_purged='<gtid_string_found_in_xtrabackup_binlog_info>';
 
-- Needed if source was a replica
RESET SLAVE ALL;
 
-- Set up replication
CHANGE MASTER TO
MASTER_HOST = '...',
MASTER_PORT = ...,
MASTER_USER = '...',
MASTER_PASSWORD = '...',
MASTER_AUTO_POSITION=1,
GET_MASTER_PUBLIC_KEY=1;
start slave;
 
show slave status\G
show warnings\G
 
show processlist\G -- run on both master and slave DB
show slave hosts\G -- run on master only

Note that if the caching_sha2_password authentication plugin is used on master, you need to add GET_MASTER_PUBLIC_KEY=1 to the CHANGE MASTER TO statement.

Orchestrator

First, follow https://github.com/percona/orchestrator/blob/master/docs/install.md to set up a backend MySQL server for orchestrator.

Alternatively, you could use SQLite as the backend database for orchestrator. Make a copy of conf/orchestrator-sample-sqlite.conf.json and configure relevant fields as follows.

{
  // Backend DB
  "BackendDB": "sqlite",
  "SQLite3DataFile": "/path/to/orchestrator.sqlite3",
 
  // Security
  "AuthenticationMethod": "basic",
  "HTTPAuthUser": "dba_team",
  "HTTPAuthPassword": "xxxxx", // replace it with a strong password
 
  // Discovery
  "MySQLHostnameResolveMethod": "@@report_host",
 
  // Replication
  "UseSuperReadOnly": true,
}

For databases in the topology, grant privileges on the master and let it propagate through replication. Change orch_topology_password to a strong password and orch_host to the host running orchestrator.

CREATE USER 'orchestrator'@'orch_host' IDENTIFIED BY 'orch_topology_password';
GRANT SUPER, PROCESS, REPLICATION SLAVE, RELOAD ON *.* TO 'orchestrator'@'orch_host';
GRANT SELECT ON mysql.slave_master_info TO 'orchestrator'@'orch_host';
GRANT SELECT ON ndbinfo.processes TO 'orchestrator'@'orch_host'; -- Only for NDB Cluster
FLUSH PRIVILEGES;

Graceful master promotion

https://github.com/percona/orchestrator/blob/master/docs/topology-recovery.md#graceful-master-promotion

In a graceful takeover:

  • either the user or orchestrator choose an existing replica as the designated new master.
  • orchestrator ensures the designated replica takes over its siblings as intermediate master
  • orchestrator turns the master to be read-only (possibly also super-read-only)
  • orchestrator makes sure your designated server is caught up with replication.
  • orchestrator promotes your designated server as the new master.
  • orchestrator turns promoted server to be writable.
  • orchestrator demotes the old master and places it as a direct replica of the new master
  • if possible, orchestrator sets replication user/password for the demoted master
  • in the graceful-master-takeover-auto variant (see following), orchestrator starts replication on demoted master.

The operation can take a few seconds, during which time your app is expected to complain, seeing that the master is read-only.

Invoke graceful takeover via:

  • Web interface: drag a direct master’s replica onto the left half of the master’s box. The web interface uses the graceful-master-takeover variation; the replication on demoted master will not kick in.

Safety notes:

  • super-read-only is used if UseSuperReadOnly is set in orchestrator config.
  • Once the server has advertised itself as --read-only, MySQL blocks ongoing transactions’ COMMIT from completing.

Operation procedure:

  1. Make sure replication lag is within an acceptable range.
  2. Drag the replica to the starting point, and drop when it says “PROMOTE AS MASTER.”
  3. Confirm the dialog.

After it stabilizes, the now-replica database should be in read-only mode and have an empty Slave_SQL_Running_State in show slave status\G.

You could start replication in the reverse direction with start slave; on the now-replica database, and downtime status of the database shown on orchestrator will end automatically.

Automated recovery of master failure

RecoverMasterClusterFilters and RecoverIntermediateMasterClusterFilters controls auto-recovery. In the example configuration files, auto-recovery is disabled with a pattern that you would not normally use in production.

Manual master promotion

With the slave caught up to near-instant, set master to read-only.

set @@global.read_only=on;

Compare Executed_Gtid_Set from show master status\G on the master and show slave status\G on the slave. If they match, stop replication and disable read_only mode on slave.

-- on slave DB
stop slave;
set @@global.read_only=off;

Clients can now connect to the then-slave DB, which is acting as the master now.

References