| PostgreSQL 9.0.9 Documentation | ||||
|---|---|---|---|---|
| Prev | Up | Chapter 25. High Availability, Load Balancing, and Replication | Next | |
Continuous archiving can be used to create a high availability (HA) cluster configuration with one or more standby servers ready to take over operations if the primary server fails. This capability is widely referred to as warm standby or log shipping.
The primary and standby server work together to provide this capability, though the servers are only loosely coupled. The primary server operates in continuous archiving mode, while each standby server operates in continuous recovery mode, reading the WAL files from the primary. No changes to the database tables are required to enable this capability, so it offers low administration overhead compared to some other replication solutions. This configuration also has relatively low performance impact on the primary server.
Directly moving WAL records from one database server to another is typically described as log shipping. PostgreSQL implements file-based log shipping, which means that WAL records are transferred one file (WAL segment) at a time. WAL files (16MB) can be shipped easily and cheaply over any distance, whether it be to an adjacent system, another system at the same site, or another system on the far side of the globe. The bandwidth required for this technique varies according to the transaction rate of the primary server. Record-based log shipping is also possible with streaming replication (see Section 25.2.5).
It should be noted that the log shipping is asynchronous, i.e., the WAL records are shipped after transaction commit. As a result, there is a window for data loss should the primary server suffer a catastrophic failure; transactions not yet shipped will be lost. The size of the data loss window in file-based log shipping can be limited by use of the archive_timeout parameter, which can be set as low as a few seconds. However such a low setting will substantially increase the bandwidth required for file shipping. If you need a window of less than a minute or so, consider using streaming replication (see Section 25.2.5).
Recovery performance is sufficiently good that the standby will typically be only moments away from full availability once it has been activated. As a result, this is called a warm standby configuration which offers high availability. Restoring a server from an archived base backup and rollforward will take considerably longer, so that technique only offers a solution for disaster recovery, not high availability. A standby server can also be used for read-only queries, in which case it is called a Hot Standby server. See Section 25.5 for more information.
It is usually wise to create the primary and standby servers so that they are as similar as possible, at least from the perspective of the database server. In particular, the path names associated with tablespaces will be passed across unmodified, so both primary and standby servers must have the same mount paths for tablespaces if that feature is used. Keep in mind that if CREATE TABLESPACE is executed on the primary, any new mount point needed for it must be created on the primary and all standby servers before the command is executed. Hardware need not be exactly the same, but experience shows that maintaining two identical systems is easier than maintaining two dissimilar ones over the lifetime of the application and system. In any case the hardware architecture must be the same — shipping from, say, a 32-bit to a 64-bit system will not work.
In general, log shipping between servers running different major PostgreSQL release levels is not possible. It is the policy of the PostgreSQL Global Development Group not to make changes to disk formats during minor release upgrades, so it is likely that running different minor release levels on primary and standby servers will work successfully. However, no formal support for that is offered and you are advised to keep primary and standby servers at the same release level as much as possible. When updating to a new minor release, the safest policy is to update the standby servers first — a new minor release is more likely to be able to read WAL files from a previous minor release than vice versa.
In standby mode, the server continuously applies WAL received from the master server. The standby server can read WAL from a WAL archive (see restore_command) or directly from the master over a TCP connection (streaming replication). The standby server will also attempt to restore any WAL found in the standby cluster's pg_xlog directory. That typically happens after a server restart, when the standby replays again WAL that was streamed from the master before the restart, but you can also manually copy files to pg_xlog at any time to have them replayed.
At startup, the standby begins by restoring all WAL available in the archive location, calling restore_command. Once it reaches the end of WAL available there and restore_command fails, it tries to restore any WAL available in the pg_xlog directory. If that fails, and streaming replication has been configured, the standby tries to connect to the primary server and start streaming WAL from the last valid record found in archive or pg_xlog. If that fails or streaming replication is not configured, or if the connection is later disconnected, the standby goes back to step 1 and tries to restore the file from the archive again. This loop of retries from the archive, pg_xlog, and via streaming replication goes on until the server is stopped or failover is triggered by a trigger file.
Standby mode is exited and the server switches to normal operation, when a trigger file is found (trigger_file). Before failover, any WAL immediately available in the archive or in pg_xlog will be restored, but no attempt is made to connect to the master.
Set up continuous archiving on the primary to an archive directory accessible from the standby, as described in Section 24.3. The archive location should be accessible from the standby even when the master is down, i.e. it should reside on the standby server itself or another trusted server, not on the master server.
If you want to use streaming replication, set up authentication on the primary server to allow replication connections from the standby server(s); that is, provide a suitable entry or entries in pg_hba.conf with the database field set to replication. Also ensure max_wal_senders is set to a sufficiently large value in the configuration file of the primary server.
Take a base backup as described in Section 24.3.2 to bootstrap the standby server.
To set up the standby server, restore the base backup taken from primary server (see Section 24.3.3). Create a recovery command file recovery.conf in the standby's cluster data directory, and turn on standby_mode. Set restore_command to a simple command to copy files from the WAL archive.
Note: Do not use pg_standby or similar tools with the built-in standby mode described here. restore_command should return immediately if the file does not exist; the server will retry the command again if necessary. See Section 25.4 for using tools like pg_standby.
If you want to use streaming replication, fill in primary_conninfo with a libpq connection string, including the host name (or IP address) and any additional details needed to connect to the primary server. If the primary needs a password for authentication, the password needs to be specified in primary_conninfo as well.
If you're setting up the standby server for high availability purposes, set up WAL archiving, connections and authentication like the primary server, because the standby server will work as a primary server after failover. You will also need to set trigger_file to make it possible to fail over. If you're setting up the standby server for reporting purposes, with no plans to fail over to it, trigger_file is not required.
If you're using a WAL archive, its size can be minimized using the archive_cleanup_command parameter to remove files that are no longer required by the standby server. The pg_archivecleanup utility is designed specifically to be used with archive_cleanup_command in typical single-standby configurations, see Section F.22. Note however, that if you're using the archive for backup purposes, you need to retain files needed to recover from at least the latest base backup, even if they're no longer needed by the standby.
A simple example of a recovery.conf is:
standby_mode = 'on' primary_conninfo = 'host=192.168.1.50 port=5432 user=foo password=foopass' restore_command = 'cp /path/to/archive/%f %p' trigger_file = '/path/to/trigger_file' archive_cleanup_command = 'pg_archivecleanup /path/to/archive %r'
You can have any number of standby servers, but if you use streaming replication, make sure you set max_wal_senders high enough in the primary to allow them to be connected simultaneously.
Streaming replication allows a standby server to stay more up-to-date than is possible with file-based log shipping. The standby connects to the primary, which streams WAL records to the standby as they're generated, without waiting for the WAL file to be filled.
Streaming replication is asynchronous, so there is still a small delay between committing a transaction in the primary and for the changes to become visible in the standby. The delay is however much smaller than with file-based log shipping, typically under one second assuming the standby is powerful enough to keep up with the load. With streaming replication, archive_timeout is not required to reduce the data loss window.
If you use streaming replication without file-based continuous archiving, you have to set wal_keep_segments in the master to a value high enough to ensure that old WAL segments are not recycled too early, while the standby might still need them to catch up. If the standby falls behind too much, it needs to be reinitialized from a new base backup. If you set up a WAL archive that's accessible from the standby, wal_keep_segments is not required as the standby can always use the archive to catch up.
To use streaming replication, set up a file-based log-shipping standby server as described in Section 25.2. The step that turns a file-based log-shipping standby into streaming replication standby is setting primary_conninfo setting in the recovery.conf file to point to the primary server. Set listen_addresses and authentication options (see pg_hba.conf) on the primary so that the standby server can connect to the replication pseudo-database on the primary server (see Section 25.2.5.1).
On systems that support the keepalive socket option, setting tcp_keepalives_idle, tcp_keepalives_interval and tcp_keepalives_count helps the primary promptly notice a broken connection.
Set the maximum number of concurrent connections from the standby servers (see max_wal_senders for details).
When the standby is started and primary_conninfo is set correctly, the standby will connect to the primary after replaying all WAL files available in the archive. If the connection is established successfully, you will see a walreceiver process in the standby, and a corresponding walsender process in the primary.
It is very important that the access privileges for replication be set up so that only trusted users can read the WAL stream, because it is easy to extract privileged information from it. Standby servers must authenticate to the primary as a superuser account. So a role with the SUPERUSER and LOGIN privileges needs to be created on the primary.
Client authentication for replication is controlled by a pg_hba.conf record specifying replication in the database field. For example, if the standby is running on host IP 192.168.1.100 and the superuser's name for replication is foo, the administrator can add the following line to the pg_hba.conf file on the primary:
# Allow the user "foo" from host 192.168.1.100 to connect to the primary # as a replication standby if the user's password is correctly supplied. # # TYPE DATABASE USER CIDR-ADDRESS METHOD host replication foo 192.168.1.100/32 md5
The host name and port number of the primary, connection user name, and password are specified in the recovery.conf file. The password can also be set in the ~/.pgpass file on the standby (specify replication in the database field). For example, if the primary is running on host IP 192.168.1.50, port 5432, the superuser's name for replication is foo, and the password is foopass, the administrator can add the following line to the recovery.conf file on the standby:
# The standby connects to the primary that is running on host 192.168.1.50 # and port 5432 as the user "foo" whose password is "foopass". primary_conninfo = 'host=192.168.1.50 port=5432 user=foo password=foopass'
     An important health indicator of streaming replication is the amount
     of WAL records generated in the primary, but not yet applied in the
     standby. You can calculate this lag by comparing the current WAL write
     location on the primary with the last WAL location received by the
     standby. They can be retrieved using
     pg_current_xlog_location on the primary and the
     pg_last_xlog_receive_location on the standby,
     respectively (see Table 9-56 and
     Table 9-57 for details).
     The last WAL receive location in the standby is also displayed in the
     process status of the WAL receiver process, displayed using the
     ps command (see Section 27.1 for details).