好得很程序员自学网

<tfoot draggable='sEl'></tfoot>

PostgreSQL+Pgpool实现HA主备切换的操作

PostgreSQL 流复制实现 HA 主备切换

环境说明和主机规划

操作系统 主机名 主机 角色 端口
CentOS 7 master 10.0.0.11 PG-Master 54321
CentOS 7 slave 10.0.0.12 PG-Slave 54321
CentOS 7 pool 10.0.0.13 pgpool 54321

基础环境配置(所有主机操作)

配置HOSTS

echo -e "10.0.0.11 master\n10.0.0.12 slave\n10.0.0.13 pool" >> /etc/hosts # 执行一次即可

配置统一的时间(若已配置,请忽略)

?

1

2

3

yum install -y ntpdate && ntpdate ntp1.aliyun.com

echo -e "# sync time from ntp1.aliyun.com\n5 * * * * /usr/sbin/ntpdate ntp1.aliyun.com > /dev/null 2>&1

" >> /var/spool/cron/root # 写入定时任务,执行一次即可

创建postgres用户

useradd postgres && echo "your_password" | passwd --stdin postgres

配置免密钥登陆

?

1

2

3

4

5

6

su - postgres

ssh-keygen -t rsa -f /home/postgres/.ssh/id_rsa -P ""

cd ~/.ssh/

ssh-copy-id postgres@master # 三台主机执行

scp authorized_keys postgres@slave:~/.ssh # 只在master主机执行

scp authorized_keys postgres@pool:~/.ssh # 只在master主机执行

安装Postgresql数据库(PG9.6)

yum install -y https://download.postgresql.org/pub/repos/yum/9.6/redhat/rhel-7-x86_64/pgdg-centos96-9.6-3.noarch.rpm

yum install -y postgresql96-server postgresql96-contrib postgresql96 postgresql96-libs

创建统一的目录结构

mkdir /data1/pg_{data,bin,logs} -p

chown -R postgres.postgres /data1/

修改系统变量

?

1

2

3

4

5

6

7

vi /etc/profile #增加以下内容

export PGHOME=/usr/pgsql-9.6/

export PGDATA=/data1/pg_data

export PGPORT=54321

export PATH=$PATH:$PGHOME/bin

# 生效

source /etc/profile

PostgreSQL流复制结构(master和slave主机操作)

master主机操作

初始化系统

/usr/pgsql-9.6/bin/postgresql96-setup initdb

vi /usr/lib/systemd/system/postgresql-9.6.service

修改postgresql-9.6.service

内容如下:

?

1

2

3

4

5

# Include the default config:

.include /usr/lib/systemd/system/postgresql-9.6.service

 

[Service]

Environment=PGDATA=/data1/pg_data

重启PG服务

?

1

2

3

4

systemctl daemon-reload

su - postgres -c '/usr/pgsql-9.6/bin/initdb -D /data1/pg_data'

systemctl restart postgresql-9.6

systemctl enable postgresql-9.6.service

修改系统配置(以下用postgres用户操作)

?

1

2

3

4

5

6

7

8

9

10

cp /data1/pg_data/pg_hba.conf{,.bak}

cat >/data1/pg_data/pg_hba.conf<<EOF

local all     all             trust

host all     all       10.0.0.11/32   trust

host all     all       10.0.0.12/32   trust

host all     all       0.0.0.0/0    md5

host all     all       ::1/128     trust

host replication  stream_replication  0.0.0.0/0    md5

EOF

#host replication  stream_replication  0.0.0.0/0    md5 为流复制用户

64G

?

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

21

22

cp /data1/pg_data/postgresql.conf{,.bak}

cat >/data1/pg_data/postgresql.conf<<EOF

listen_addresses = '*'

port = 54321

max_connections = 256

shared_buffers = 16GB

effective_cache_size = 48GB

work_mem = 64MB

maintenance_work_mem = 2GB

min_wal_size = 2GB

max_wal_size = 4GB

checkpoint_completion_target = 0.9

wal_buffers = 16MB

default_statistics_target = 100

wal_level = hot_standby

wal_log_hints = on

max_wal_senders = 1

hot_standby = on

logging_collector = on

log_directory = 'pg_log'

EOF

#操作完记得重启 pg_ctl restart

128G

?

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

listen_addresses = '*'

port = 54321

max_connections = 256

shared_buffers = 32GB

effective_cache_size = 96GB

work_mem = 128MB

maintenance_work_mem = 2GB

min_wal_size = 2GB

max_wal_size = 4GB

checkpoint_completion_target = 0.9

wal_buffers = 16MB

default_statistics_target = 100

wal_level = hot_standby

wal_log_hints = on

max_wal_senders = 1

hot_standby = on

logging_collector = on

log_directory = 'pg_log'

在主库中创建流复制用户(stream_replication)和PGPool用户(srcheck)

CREATE USER stream_replication replication LOGIN CONNECTION LIMIT 5 ENCRYPTED PASSWORD 'your_password';

CREATE USER srcheck replication LOGIN CONNECTION LIMIT 5 ENCRYPTED PASSWORD 'your_password';

修改主库pg_hba.conf文件(已操作见cat >/data1/pg_data/pg_hba.conf<<EOF)

host replication stream_replication 0.0.0.0/0 md5

slave主机操作

初始化系统

/usr/pgsql-9.6/bin/postgresql96-setup initdb

vi /usr/lib/systemd/system/postgresql-9.6.service

修改postgresql-9.6.service

内容如下:

?

1

2

3

4

5

# Include the default config:

.include /usr/lib/systemd/system/postgresql-9.6.service

 

[Service]

Environment=PGDATA=/data1/pg_data

重启PG服务

systemctl daemon-reload

基础备份复制到备库服务器

rm -rf /data1/pg_data # 如果没有重要数据可操作,主要为同步主库路径

su - postgres -c 'pg_basebackup -D $PGDATA --format=p -h master -p 54321 -U stream_replication -W'

修改备库配置信息

cp $PGHOME/share/recovery.conf.sample $PGDATA/recovery.conf

vi $PGDATA/recovery.conf

增加以下内容

?

1

2

3

4

5

6

7

standby_mode= 'on'

primary_conninfo = 'host=master port=54321 user=stream_replication password=your_password'

restore_command = ''

recovery_target_timeline = 'latest'

# 重启PG服务

systemctl restart postgresql-9.6

systemctl enable postgresql-9.6.service

验证

主节点执行

?

1

2

3

create table test (id int4, create_time timestamp (0) without time zone);

insert into test values (1, now());

select * from test;

备节点执行

select * from test;

其他查询

进入测试数据库test,主库上执行如下命令返回f,备库上返回t。 select pg_is_in_recovery();

执行如下命令查看快照,它返回主库记录点、备库记录点;主库每增加一条写入,记录点的值就会加1。

select txid_current_snapshot();

执行如下命令可以查看主备同步状态。

select * from pg_stat_replication;

字段state显示的同步状态有:startup(连接中)、catchup(同步中)、streaming(同步);字段sync_state显示的模式有:async(异步)、sync(同步)、potential(虽然现在是异步模式,但是有可能升级到同步模式)。

主备切换

假设主库崩溃了,备库如何从只读状态切换为读写状态呢?只要把备库的postgresql.conf中hot_standby修改为off,并且删除recovery.conf,然后重启库就可以提供服务了。

PGPool2(pool主机操作)

安装PGPool2

?

1

2

3

yum install -y http://www.pgpool.net/yum/rpms/3.6/redhat/rhel-7-x86_64/pgpool-II-release-3.6-1.noarch.rpm

yum -y install pgpool-II-pg96 pgpool-II-pg96-debuginfo pgpool-II-pg96-devel pgpool-II-pg96-extensions

systemctl enable pgpool.service #开启自动启动

添加Pgpool-II运行用户

?

1

2

3

useradd postgres # 环境准备时已操作

chown -R postgres.postgres /etc/pgpool-II

chown -R postgres.postgres /var/run/pgpool/

配置pool_hba.conf

cp /etc/pgpool-II/pool_hba.conf{,.bak}

vi /etc/pgpool-II/pool_hba.conf

增加内容

host all all 0.0.0.0/0 md5

配置pcp.conf

主节点登陆后执行:

?

1

2

3

4

5

6

7

8

postgres=# select rolname,rolpassword from pg_authid;

   rolname  |    rolpassword   

--------------------+-------------------------------------

  pg_signal_backend |

  srcheck   | md5662c10f61b27a9ab38ce69157186b25f

  postgres   | md5d3612d57ee8d4c147cf27b11e3a0974d

  stream_replication | md59279ef6b904bc483e4f85e6d44cfc0ed

(4 rows )

vi /etc/pgpool-II/pool_passwd

增加SQL执行结果的内容,形式为$rolname:$rolpassword例如:

srcheck:md5662c10f61b27a9ab38ce69157186b25f

或者:

pg_md5 -u postgres your_password

vi /etc/pgpool-II/pcp.conf ## 加入 postgres:上一命令的输出

配置pgpool.conf

cp /etc/pgpool-II/pgpool.conf{,.bak}

vi /etc/pgpool-II/pgpool.conf

内容如下:

?

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

21

22

23

24

25

26

27

28

29

30

31

32

33

34

35

36

37

38

39

40

41

42

43

44

45

46

47

48

49

50

51

52

53

54

# CONNECTIONS

 

listen_addresses = '*'

port = 54321

socket_dir = '/var/run/pgpool'

pcp_listen_addresses = '*'

pcp_port = 9898

pcp_socket_dir = '/var/run/pgpool'

 

# - Backend Connection Settings -

 

backend_hostname0 = 'master'

backend_port0 = 54321

backend_weight0 = 1

backend_data_directory0 = '/data1/pg_data'

backend_flag0 = 'ALLOW_TO_FAILOVER'

 

backend_hostname1 = 'slave'

backend_port1 = 54321

backend_weight1 = 1

backend_data_directory1 = '/data1/pg_data'

backend_flag1 = 'ALLOW_TO_FAILOVER'

 

# - Authentication -

 

enable_pool_hba = on

pool_passwd = 'pool_passwd'

 

# FILE LOCATIONS

 

pid_file_name = '/var/run/pgpool/pgpool.pid'

logdir = '/data1/pg_logs'

 

replication_mode = off

load_balance_mode = on

master_slave_mode = on

master_slave_sub_mode = 'stream'

 

sr_check_period = 5

sr_check_user = 'srcheck'

sr_check_password = '123456'

sr_check_database = 'postgres'

 

# HEALTH CHECK 健康检查

 

health_check_period = 10

health_check_timeout = 20

health_check_user = 'srcheck'

health_check_password = '123456'

health_check_database = 'postgres'

 

# FAILOVER AND FAILBACK

 

failover_command = '/data1/pg_bin/failover_stream.sh %H'

failover_stream.sh脚本

?

1

2

3

4

5

6

vim /data1/pg_bin/failover_stream.sh

chmod 777 /data1/pg_bin/failover_stream.sh

chmod u+s /sbin/ifconfig

chmod u+s /usr/sbin

pgpool -n -d -D > /data1/pg_logs/pgpool.log 2>&1 & ## 启动

pgpool -m fast stop ## 关闭

failover_stream.sh内容:

?

1

2

3

4

5

6

7

8

9

10

11

#! /bin/sh

# Failover command for streaming replication.

# Arguments: $1: new master hostname.

 

new_master=$1

trigger_command= "$PGHOME/bin/pg_ctl promote -D $PGDATA"

 

# Prompte standby database .

/usr/bin/ssh -T $new_master $trigger_command

 

exit 0;

登陆设置

当执行pgpool -n -d -D > /data1/pg_logs/pgpool.log 2>&1 &后可查看集群状态:

?

1

2

3

4

5

6

7

8

[postgres@pool pgpool-II]$ psql -p 54321 -h 10.0.0.13 -U srcheck -d postgres

 

postgres=# show pool_nodes;

  node_id | hostname | port | status | lb_weight | role | select_cnt | load_balance_node | replication_delay

---------+----------+-------+--------+-----------+---------+------------+-------------------+-------------------

  0  | master | 54321 | up  | 0.500000 | primary | 0   | false     | 0

  1  | slave | 54321 | up  | 0.500000 | standby | 0   | true     | 0

(2 rows )

如果未发现集群状态,请在master和slave主机分别执行以下操作:

?

1

2

3

[postgres@pool ~]$ pcp_attach_node -d -U postgres -h pool -p 9898 -n 0

[postgres@pool ~]$ pcp_attach_node -d -U postgres -h pool -p 9898 -n 1

#详情查询命令pcp_attach_node

HA切换

模拟master主机宕机

Master端:

?

1

2

3

[postgres@master ~]$ pg_ctl stop

waiting for server to shut down.... done

server stopped

当前集群状态

?

1

2

3

4

5

6

7

8

9

10

[postgres@pool ~]$ psql -p 54321 -h 10.0.0.13 -U srcheck -d postgres

psql (9.6.1)

Type "help" for help.

 

postgres=# show pool_nodes;

  node_id | hostname | port | status | lb_weight | role | select_cnt | load_balance_node | replication_delay

---------+----------+------+--------+-----------+---------+------------+-------------------+-------------------

  0  | master | 5432 | down| 0.500000 | standby | 0    | false | 0

  1  | slave  | 5432 | up  | 0.500000 | primary | 0    | true | 0

(2 rows )

发现master已经是standby了,且down机了

修改master,启动

当master主机宕机后,此时slave主机PG数据库成为主库,修改master成为slave的从库即可

?

1

2

3

4

5

[postgres@master ~]$ vim recovery.conf

standby_mode= 'on'

primary_conninfo = 'host=slave port=54321 user=stream_replication password=your_password'

restore_command = ''

recovery_target_timeline = 'latest'

同步时间线

?

1

2

3

4

5

6

7

#如果报时间线冲突落后,先停掉pg服务,然后执行同步时间线,否知直接看状态

[postgres@master ~]$ pg_rewind --target-pgdata=/data1/pg_data --source-server='host=slave port=54321 user=postgres dbname=postgres'

servers diverged at WAL position 0/5000098 on timeline 1

rewinding from last common checkpoint at 0/5000028 on timeline 1

Done!

# 重新启动数据库

[postgres@master ~]$ pg_ctl start

再次查看当前状态

?

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

[postgres@pool ~]$ psql -p 54321 -h 10.0.0.13 -U srcheck -d postgres

postgres=# show pool_nodes;

  node_id | hostname | port | status | lb_weight | role | select_cnt | load_balance_node | replication_delay

---------+----------+------+--------+-----------+---------+------------+-------------------+-------------------

  0  | master | 5432 | down| 0.500000 | standby | 0    | false | 0

  1  | slave  | 5432 | up  | 0.500000 | primary | 0    | true | 0

(2 rows )

 

#注意虽然master已经启动了,但是还是down,需要手动将master节点添加进pgpool,master的node_id是0,所以-n 0

[postgres@pool ~]$ pcp_attach_node -d -U postgres -h pool -p 54321 -n 0

#提示输入密码,输入pcp管理密码

#查看当前状态

[postgres@pool ~]$ psql -p 54321 -h 10.0.0.13 -U srcheck -d postgres

postgres=# show pool_nodes;

  node_id | hostname | port | status | lb_weight | role | select_cnt | load_balance_node | replication_delay

---------+----------+------+--------+-----------+---------+------------+-------------------+-------------------

  0  | master | 5432 | up | 0.500000 | standby | 0    | false | 0

  1  | slave  | 5432 | up  | 0.500000 | primary | 0    | true | 0

(2 rows )

现在两个节点都是up了。

主从两节点pgpool健康检查脚本(pgpool_check.sh)

说明:此脚本是基于PGpool只安装到master和slave两个主机上的情况下使用,在master主机有了pgpool进程后,可在slave主机执行sh pgpool_check.sh & 即可

?

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

#! /bin/bash

# Check Master host pgpool-process

 

while true

do

  pgcount=$(nmap 10.0.0.11|egrep '9898|9999' |wc -l)

 

  if [ $pgcount -eq 2 ] ; then

   echo 'Master host pgpool is GOOD!!!' > /dev/ null 2>&1

  else

   echo -e "Master host pgpool is \033[31m BAD!!! \033[0m"

   echo -e "Master host pgpool is \033[31m BAD!!! \033[0m"

   echo -e "Master host pgpool is \033[31m BAD!!! \033[0m"

   echo -e "SYSTEM WILL DO THE SHELL : \033[34m su - postgres -c 'pgpool -n -d -D > /data1/pg_logs/pgpool.log 2>&1 &' \033[0m"

   su - postgres -c 'pgpool -n -d -D > /data1/pg_logs/pgpool.log 2>&1 &'

   pgport=$(netstat -lntup|egrep '9898|9999' |wc -l)

   [ $pgport -gt 0 ] && echo -e "Slave host pgpool is \033[32m RUNNING!!! \033[0m"

   exit 0

  fi

done

以上为个人经验,希望能给大家一个参考,也希望大家多多支持服务器之家。如有错误或未考虑完全的地方,望不吝赐教。

原文链接:https://blog.csdn.net/weixin_42509278/article/details/81484655

查看更多关于PostgreSQL+Pgpool实现HA主备切换的操作的详细内容...

  阅读:50次