heartbeat configuration

lvinie


Blog categories:

linux
tool

1. Downloading the software

Download heartbeat-1.2.3.tar.gz and its dependencies (if the development toolchain is already installed, you usually only need libnet-1.1.0-1.fr.c.1.um.1.i386.rpm).

www.linux-ha.org

2. Installing the software

rpm -ivh libnet-1.1.0-1.fr.c.1.um.1.i386.rpm

tar -xzf heartbeat-1.2.3.tar.gz

cd heartbeat-1.2.3

./configure

make

make install

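3. Configuring the software

If no install prefix is specified, the configuration files live in /usr/local/etc/ha.d/ by default, and the related executables in /usr/local/lib/heartbeat/. There are three main configuration files: authkeys, ha.cf and haresources. The purpose and configuration of each is described below.

Before going into the configuration, here is a rough outline of how heartbeat works. Its core consists of two parts: heartbeat monitoring and resource takeover. Heartbeat monitoring can run over network links and serial ports, and it supports redundant links. The current 1.2.3 release only supports monitoring and failover between 2 nodes (release 2 will support more nodes, but it is still under development). The nodes send each other messages reporting their current state. If no message is received from the peer within the configured time, the peer is considered failed, and the resource takeover module is started to take over the resources or services that were running on the peer.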
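The timeout logic described above can be sketched in a few lines of shell. This is only an illustration of the idea, not heartbeat's actual implementation: the function name peer_state is made up, the thresholds borrow the warntime and deadtime values from the sample ha.cf in section 3.2, and heartbeat's exact boundary handling may differ.

```shell
#!/bin/sh
# Illustrative only: heartbeat performs this check internally.
# Thresholds in seconds, taken from the sample ha.cf values.
WARNTIME=10
DEADTIME=30

# peer_state SECONDS_SINCE_LAST_HEARTBEAT
# Prints alive, late, or dead for the given silence interval.
peer_state() {
    elapsed=$1
    if [ "$elapsed" -ge "$DEADTIME" ]; then
        echo dead      # resource takeover would start at this point
    elif [ "$elapsed" -ge "$WARNTIME" ]; then
        echo late      # a "late heartbeat" warning would be logged
    else
        echo alive
    fi
}

peer_state 2
peer_state 12
peer_state 31
```

With a keepalive of 2 seconds, a healthy peer is never silent for long, so only a real outage (or a badly overloaded link) pushes the elapsed time past deadtime.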

3.1 authkeys

heartbeat's authentication configuration file

# Authentication file. Must be mode 600

# Must have exactly one auth directive at the front.

# auth send authentication using this method-id

# Then, list the method and key that go with that method-id

# Available methods: crc sha1, md5. Crc doesn't need/want a key.

# You normally only have one authentication method-id listed in this file

# Put more than one to make a smooth transition when changing auth

# methods and/or keys.

# sha1 is believed to be the "best", md5 next best.

# crc adds no security, except from packet corruption.

# Use only on physically secure networks.

#auth 1

#1 crc

#2 sha1 HI!

#3 md5 Hello!

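The comments are clear, but to summarize: this file is used for authentication between the two cluster nodes. The method and the key (if any) must be identical on every node in the cluster. Three methods are currently available: md5, sha1 and crc. crc provides no authentication; it only detects corrupted packets. sha1 and md5 each need a key. As the file's own comments note, sha1 is believed to be the strongest and md5 the next best, so sha1 is the usual recommendation.

To use sha1, uncomment the auth directive in authkeys and set it to 2, then uncomment the corresponding 2 sha1 line and replace the key after it with your own (it must be identical on both nodes). After saving, set the file mode to 600, otherwise heartbeat will fail to start. The command is: chmod 600 authkeys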
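As a rough sketch of those steps, the snippet below writes an authkeys file that selects sha1 and sets mode 600. The HA_DIR variable and the random key are assumptions made for illustration: by default the file lands in a scratch directory so the snippet is safe to try, and any shared secret of your choosing would work in place of the generated key.

```shell
#!/bin/sh
# Sketch: create an authkeys file using sha1 (method-id 2).
# HA_DIR defaults to a scratch directory; set it to the real config
# directory (for example /usr/local/etc/ha.d) when you are ready.
HA_DIR="${HA_DIR:-$(mktemp -d)}"
AUTHKEYS="$HA_DIR/authkeys"

# 32 random bytes rendered as 64 hex characters.
KEY=$(head -c 32 /dev/urandom | od -An -tx1 | tr -d ' \n')

# Create the file without group/other permissions from the start.
umask 077
cat > "$AUTHKEYS" <<EOF
auth 2
2 sha1 $KEY
EOF
chmod 600 "$AUTHKEYS"

ls -l "$AUTHKEYS"
```

Because the key must match on both nodes, generate the file once and copy it to the other node (for example with scp) rather than running the snippet on each machine.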

3.2 ha.cf

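This is heartbeat's main configuration file. Because the file is long, my notes are written inline next to the relevant options. To enable a configuration option (directive), just remove the leading comment marker (#).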
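Since almost everything in the sample file is commented out, it can help to list only the directives that are actually active. The helper below is a small sketch; the function name active_directives and the demo file content are made up for illustration.

```shell
#!/bin/sh
# active_directives FILE
# Prints every line that is neither blank nor a comment, i.e. the
# directives heartbeat will actually read from the file.
active_directives() {
    grep -Ev '^[[:space:]]*(#|$)' "$1"
}

# Demo on a small stand-in file (hypothetical content).
demo=$(mktemp)
cat > "$demo" <<'EOF'
# keepalive: how long between heartbeats?
#keepalive 2

logfacility local0
auto_failback on
EOF

active_directives "$demo"
rm -f "$demo"
```

On a real installation you would call it as active_directives /usr/local/etc/ha.d/ha.cf (adjust the path to your install prefix).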

# There are lots of options in this file. All you have to have is a set

# of nodes listed {"node ...} one of {serial, bcast, mcast, or ucast},

# and a value for "auto_failback".

# ATTENTION: As the configuration file is read line by line,

# THE ORDER OF DIRECTIVE MATTERS!

# In particular, make sure that the udpport, serial baud rate

# etc. are set before the heartbeat media are defined!

# debug and log file directives go into effect when they

# are encountered.

# All will be fine if you keep them ordered as in this example.

# Note on logging:

# If any of debugfile, logfile and logfacility are defined then they

# will be used. If debugfile and/or logfile are not defined and

# logfacility is defined then the respective logging and debug

# messages will be logged to syslog. If logfacility is not defined

# then debugfile and logfile will be used to log messages. If

# logfacility is not defined and debugfile and/or logfile are not

# defined then defaults will be used for debugfile and logfile as

# required and messages will be sent there.

# File to write debug messages to (heartbeat's debug output)

#debugfile /var/log/ha-debug

# File to write other messages to (heartbeat's regular log)

#logfile /var/log/ha-log

# Facility to use for syslog()/logger

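# If the log files above are not defined, log messages are sent to syslog facility local0 (which typically ends up in /var/log/messages).

# If none of the three logging directives is defined, heartbeat by default creates ha-debug and ha-log under /var/log and logs there.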

logfacility local0

# A note on specifying "how long" times below...

# The default time unit is seconds

# 10 means ten seconds

# You can also specify them in milliseconds

# 1500ms means 1.5 seconds

# keepalive: how long between heartbeats?

# Interval between heartbeat packets. The default unit is seconds; to use milliseconds,

# append the ms suffix, e.g. 1500ms means 1.5 s

#keepalive 2

# deadtime: how long-to-declare-host-dead?

# If you set this too low you will get the problematic

# split-brain (or cluster partition) problem.

# See the FAQ for how to use warntime to tune deadtime.

# How long the peer may stay silent before it is declared dead

#deadtime 30

# warntime: how long before issuing "late heartbeat" warning?

# See the FAQ for how to use warntime to tune deadtime.

# How long to wait for a heartbeat before logging a late-heartbeat warning

#warntime 10

# Very first dead time (initdead)

# On some machines/OSes, etc. the network takes a while to come up

# and start working right after you've been rebooted. As a result

# we have a separate dead time for when things first come up.

# It should be at least twice the normal dead time.

# Dead time used right after startup, to allow for the time the network needs to come up

#initdead 120

# What UDP port to use for bcast/ucast communication?

# UDP port used for broadcast/unicast communication

#udpport 694

# Baud rate for serial ports...

# Baud rate for serial communication

#baud 19200

# The serial device to use; on Linux this is /dev/ttyS0 (or ttyS1, ttyS2, ttyS3, ...)

# serial serialportname ...

#serial /dev/ttyS0 # Linux

#serial /dev/cuaa0 # FreeBSD

#serial /dev/cua/a # Solaris

# What interfaces to broadcast heartbeats over?

# Network interface(s) used for broadcast heartbeats

#bcast eth0 # Linux

#bcast eth1 eth2 # Linux

#bcast le0 # Solaris

#bcast le1 le2 # Solaris

# Set up a multicast heartbeat medium

# mcast [dev] [mcast group] [port] [ttl] [loop]

# [dev] device to send/rcv heartbeats on

# [mcast group] multicast group to join (class D multicast address

# 224.0.0.0 - 239.255.255.255)

# [port] udp port to sendto/rcvfrom (set this value to the

# same value as "udpport" above)

# [ttl] the ttl value for outbound heartbeats. this effects

# how far the multicast packet will propagate. (0-255)

# Must be greater than zero.

# [loop] toggles loopback for outbound multicast heartbeats.

# if enabled, an outbound packet will be looped back and

# received by the interface it was sent on. (0 or 1)

# Set this value to zero.

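# For multicast, this sets the interface to use, the multicast group address to join (within 224.0.0.0 - 239.255.255.255), the UDP port, the TTL (time to live, i.e. how many router hops a packet may cross), and whether loopback is enabled (that is, whether the sender also receives the packets it sent out)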

#mcast eth0 225.0.0.1 694 1 0

# Set up a unicast / udp heartbeat medium

# ucast [dev] [peer-ip-addr]

# [dev] device to send/rcv heartbeats on

# [peer-ip-addr] IP address of peer to send packets to

# For unicast, this sets the network interface to use and the IP address of the peer

#ucast eth0 192.168.1.2

# About boolean values...

# Any of the following case-insensitive values will work for true:

# true, on, yes, y, 1

# Any of the following case-insensitive values will work for false:

# false, off, no, n, 0

# auto_failback: determines whether a resource will

# automatically fail back to its "primary" node, or remain

# on whatever node is serving it until that node fails, or

# an administrator intervenes.

# The possible values for auto_failback are:

# on - enable automatic failbacks

# off - disable automatic failbacks

# legacy - enable automatic failbacks in systems

# where all nodes do not yet support

# the auto_failback option.

# auto_failback "on" and "off" are backwards compatible with the old

# "nice_failback on" setting.

# See the FAQ for information on how to convert

# from "legacy" to "on" without a flash cut.

# (i.e., using a "rolling upgrade" process)

# The default value for auto_failback is "legacy", which

# will issue a warning at startup. So, make sure you put

# an auto_failback directive in your ha.cf file.

# (note: auto_failback can be any boolean or "legacy")

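# Decides what happens to a resource once its primary (owning) node recovers: either it migrates back to the primary node,

# or it keeps running on the current node until that node fails.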

auto_failback on

# Basic STONITH support

# Using this directive assumes that there is one stonith

# device in the cluster. Parameters to this device are

# read from a configuration file. The format of this line is:

# stonith <stonith_type> <configfile>

# NOTE: it is up to you to maintain this file on each node in the

# cluster!

# For clusters with shared resources: STONITH fencing is used to protect data consistency

#stonith baytech /etc/ha.d/conf/stonith.baytech

# STONITH support

# You can configure multiple stonith devices using this directive.

# The format of the line is:

# stonith_host <hostfrom> <stonith_type> <params...>

# <hostfrom> is the machine the stonith device is attached

# to or * to mean it is accessible from any host.

# <stonith_type> is the type of stonith device (a list of

# supported drives is in /usr/lib/stonith.)

# <params...> are driver specific parameters. To see the

# format for a particular device, run:

# stonith -l -t <stonith_type>

# Note that if you put your stonith device access information in

# here, and you make this file publically readable, you're asking

# for a denial of service attack ;-)

# To get a list of supported stonith devices, run

# stonith -L

# For detailed information on which stonith devices are supported

# and their detailed configuration options, run this command:

# stonith -h

#stonith_host * baytech 10.0.0.3 mylogin mysecretpassword

#stonith_host ken3 rps10 /dev/ttyS1 kathy 0

#stonith_host kathy rps10 /dev/ttyS1 ken3 0

# Watchdog is the watchdog timer. If our own heart doesn't beat for

# a minute, then our machine will reboot.

# NOTE: If you are using the software watchdog, you very likely

# wish to load the module with the parameter "nowayout=0" or

# compile it without CONFIG_WATCHDOG_NOWAYOUT set. Otherwise even

# an orderly shutdown of heartbeat will trigger a reboot, which is

# very likely NOT what you want.

# This directive sets up the watchdog timer: if the node produces no heartbeat of its own for one minute,

# the node reboots

#watchdog /dev/watchdog

# Tell what machines are in the cluster

# node nodename ... -- must match uname -n

# Lists the nodes in the cluster. Note: each node name must match the output of uname -n

#node ken3

#node kathy

# Less common options...

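# Treats 10.10.10.254 as a pseudo-cluster-member

# Used together with ipfail below...

# The ping directive, and the ping_group directive further down, create pseudo-cluster members. They must be used together with the ipfail setup below. Their purpose is to monitor link connectivity:

# if a cluster node cannot reach the pseudo-member, that node is not entitled to take over resources or services, and it releases the resources it holds.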

#ping 10.10.10.254

# Treats 10.10.10.254 and 10.10.10.253 as a pseudo-cluster-member

# called group1. If either 10.10.10.254 or 10.10.10.253 are up

# then group1 is up

# Used together with ipfail below...

#ping_group group1 10.10.10.254 10.10.10.253

# Processes started and stopped with heartbeat. Restarted unless

# they exit with rc=100

# Defines processes that are started and stopped together with heartbeat

#respawn userid /path/name/to/run

#respawn hacluster /usr/lib/heartbeat/ipfail

# Access control for client api

# default is no access

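# Controls which users and groups may use the client API under a given client name (such as the ipfail process started above)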

#apiauth client-name gid=gidlist uid=uidlist

#apiauth ipfail gid=haclient uid=hacluster

###########################

# The options below are rarely used and are not covered in detail here

# Unusual options.

###########################

# hopfudge maximum hop count minus number of nodes in config

#hopfudge 1

# deadping - dead time for ping nodes

#deadping 30

# hbgenmethod - Heartbeat generation number creation method

# Normally these are stored on disk and incremented as needed.

#hbgenmethod time

# realtime - enable/disable realtime execution (high priority, etc.)

# defaults to on

#realtime off

# debug - set debug level

# defaults to zero

#debug 1

# API Authentication - replaces the fifo-permissions-based system of the past

# You can put a uid list and/or a gid list.

# If you put both, then a process is authorized if it qualifies under either

# the uid list, or under the gid list.

# The groupname "default" has special meaning. If it is specified, then

# this will be used for authorizing groupless clients, and any client groups

# not otherwise specified.

#apiauth ipfail uid=hacluster

#apiauth ccm uid=hacluster

#apiauth ping gid=haclient uid=alanr,root

#apiauth default gid=haclient

# message format in the wire, it can be classic or netstring, default is classic

#msgfmt netstring
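Pulling the directives above together, a minimal two-node ha.cf could look like the sketch below. The node names (ken3, kathy) and the interface (eth0) are simply the ones used in the sample file; the HA_CF variable and the value_of helper are made up for illustration, and the timing check assumes all values are given in plain seconds (no ms suffix).

```shell
#!/bin/sh
# Sketch of a minimal two-node ha.cf, written to a scratch file.
# Replace the node names with your own `uname -n` names and the
# interface with the one that carries your heartbeat traffic.
HA_CF="${HA_CF:-$(mktemp)}"

# udpport is set before the bcast medium, as the sample file requires.
cat > "$HA_CF" <<'EOF'
logfacility local0
keepalive 2
warntime 10
deadtime 30
initdead 120
udpport 694
bcast eth0
auto_failback on
node ken3
node kathy
EOF

# value_of DIRECTIVE: print the first value set for DIRECTIVE in $HA_CF.
value_of() {
    awk -v d="$1" '$1 == d { print $2; exit }' "$HA_CF"
}

# The sample comments ask for initdead to be at least twice deadtime.
if [ "$(value_of initdead)" -ge $(( 2 * $(value_of deadtime) )) ]; then
    echo "timing ok"
else
    echo "initdead should be at least 2 x deadtime" >&2
fi
```

The same file would be used on both nodes; only an ucast setup needs a per-node difference, because each node then names its peer's address.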

3.3 haresources

heartbeat's resource configuration file

# This is a list of resources that move from machine to machine as

# nodes go down and come up in the cluster. Do not include

# "administrative" or fixed IP addresses in this file.

# <VERY IMPORTANT NOTE>

# The haresources files MUST BE IDENTICAL on all nodes of the cluster.

# The node names listed in front of the resource group information

# is the name of the preferred node to run the service. It is

# not necessarily the name of the current machine. If you are running

# auto_failback ON (or legacy), then these services will be started

# up on the preferred nodes - any time they're up.

# If you are running with auto_failback OFF, then the node information

# will be used in the case of a simultaneous start-up, or when using

# the hb_standby {foreign,local} command.

# BUT FOR ALL OF THESE CASES, the haresources files MUST BE IDENTICAL.

# If your files are different then almost certainly something

# won't work right.

# </VERY IMPORTANT NOTE>

# We refer to this file when we're coming up, and when a machine is being

# taken over after going down.

# You need to make this right for your installation, then install it in

# /etc/ha.d

# Each logical line in the file constitutes a "resource group".

# A resource group is a list of resources which move together from

# one node to another - in the order listed. It is assumed that there

# is no relationship between different resource groups. These

# resource in a resource group are started left-to-right, and stopped

# right-to-left. Long lists of resources can be continued from line

# to line by ending the lines with backslashes ("\").

# These resources in this file are either IP addresses, or the name

# of scripts to run to "start" or "stop" the given resource.

# The format is like this:

#node-name resource1 resource2 ... resourceN

# If the resource name contains an :: in the middle of it, the

# part after the :: is passed to the resource script as an argument.

# Multiple arguments are separated by the :: delimiter

# In the case of IP addresses, the resource script name IPaddr is

# implied.

# For example, the IP address 135.9.8.7 could also be represented

# as IPaddr::135.9.8.7

# THIS IS IMPORTANT!! vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv

# The given IP address is directed to an interface which has a route

# to the given address. This means you have to have a net route

# set up outside of the High-Availability structure. We don't set it

# up here -- we key off of it.

# The broadcast address for the IP alias that is created to support

# an IP address defaults to the highest address on the subnet.

# The netmask for the IP alias that is created defaults to the same

# netmask as the route that it selected in the step above.

# The base interface for the IPalias that is created defaults to the

# same interface as the route that it selected in the step above.

# If you want to specify that this IP address is to be brought up

# on a subnet with a netmask of 255.255.255.0, you would specify

# this as IPaddr::135.9.8.7/24 .

# If you wished to tell it that the broadcast address for this subnet

# was 135.9.8.210, then you would specify that this way:

# IPaddr::135.9.8.7/24/135.9.8.210

# If you wished to tell it that the interface to add the address to

# is eth0, then you would need to specify it this way:

# IPaddr::135.9.8.7/24/eth0

# And this way to specify both the broadcast address and the

# interface:

# IPaddr::135.9.8.7/24/eth0/135.9.8.210

# The IP addresses you list in this file are called "service" addresses,

# since they're the publicly advertised addresses that clients

# use to get at highly available services.

# For a hot/standby (non load-sharing) 2-node system with only

# a single service address,

# you will probably only put one system name and one IP address in here.

# The name you give the address to is the name of the default "hot"

# system.

# Where the nodename is the name of the node which "normally" owns the

# resource. If this machine is up, it will always have the resource

# it is shown as owning.

# The string you put in for nodename must match the uname -n name

# of your machine. Depending on how you have it administered, it could

# be a short name or a FQDN.

#-------------------------------------------------------------------

# Simple case: One service address, default subnet and netmask

# No servers that go up and down with the IP address

#just.linux-ha.org 135.9.216.110

#-------------------------------------------------------------------

# Assuming the administrative addresses are on the same subnet...

# A little more complex case: One service address, default subnet

# and netmask, and you want to start and stop http when you get

# the IP address...

#just.linux-ha.org 135.9.216.110 http

#-------------------------------------------------------------------

# A little more complex case: Three service addresses, default subnet

# and netmask, and you want to start and stop http when you get

# the IP address...

#just.linux-ha.org 135.9.216.110 135.9.215.111 135.9.216.112 httpd

#-------------------------------------------------------------------

# One service address, with the subnet, interface and bcast addr

# explicitly defined.

#just.linux-ha.org 135.9.216.3/28/eth0/135.9.216.12 httpd

#-------------------------------------------------------------------

# An example where a shared filesystem is to be used.

# Note that multiple arguments are passed to this script using

# the delimiter '::' to separate each argument.

#node1 10.0.0.170 Filesystem::/dev/sda1::/data1::ext2

# Regarding the node-names in this file:

# They must match the names of the nodes listed in ha.cf, which in turn

# must match the `uname -n` of some node in the cluster. So they aren't

# virtual in any sense of the word.

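That is the haresources file. Its job is to define the resources or services for the cluster you deploy. Each effective line has the following format:

node-name resource1 resource2 ... resourceN

Here node-name is the name of one of the cluster nodes and must match the output of uname -n.

Each resource in the group resource1 resource2 ... resourceN that follows is a shell script. The scripts are searched for in /etc/init.d/ and /usr/local/etc/ha.d/resource.d (the latter depends on where heartbeat was installed). heartbeat provides a very good framework for extending resources: to control a resource type of your own, you only need to write a shell script that supports the start and stop arguments. The resource scripts that heartbeat currently ships can be found in the paths listed above.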
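A custom resource script following that start/stop contract might look like the sketch below. Everything named here is hypothetical: the script name myservice, the STATE_FILE marker that stands in for a real daemon, and the status branch, which the text above does not mention and is included only as a convenience for checking the result.

```shell
#!/bin/sh
# Hypothetical resource script "myservice": a sketch of the start/stop
# contract. Saved in resource.d, it could be referenced from
# haresources as, for example:
#   ken3 10.0.0.170 myservice
# STATE_FILE stands in for a real daemon so the sketch stays runnable.
STATE_FILE="${STATE_FILE:-/tmp/myservice.running}"

myservice() {
    case "$1" in
        start)
            touch "$STATE_FILE"      # replace with the real start command
            ;;
        stop)
            rm -f "$STATE_FILE"      # replace with the real stop command
            ;;
        status)
            if [ -e "$STATE_FILE" ]; then echo running; else echo stopped; fi
            ;;
        *)
            echo "usage: myservice {start|stop|status}" >&2
            return 1
            ;;
    esac
}

# When installed as a standalone script, dispatch on the arguments
# instead of running this demo:  myservice "$@"
myservice start
myservice status
myservice stop
myservice status
```

Because resources in a group are started left to right and stopped right to left, placing the script after the service address means the address is up before the service starts and is released only after the service has stopped.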


2011-09-30 17:02

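Comment 4, lvinie, 2013-07-05

Ray_Mysterio wrote:

I ran into a problem when installing heartbeat. I have two virtual machines (primary: 192.168.1.55, standby: 192.168.1.58), and the virtual IP is 192.168.1.87.
After installation, I can only reach 192.168.1.87 in a browser from machine 55 itself or, after shutting down machine 55, from machine 58 itself. It only works locally; other machines cannot reach 192.168.1.87. What could be the cause?

Sorry, I had not logged in for a long time and only just saw this comment.

Which subnet are the other machines on? Check whether they are on the same subnet as the virtual IP, and whether firewall settings or something similar are causing the problem.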
