On October 31, Changshan noticed that Veritas replication was not working properly. The status at that time was as follows:
# ./vvrstate
Replicated Data Set: repvg
Primary:
HostName: vvr440 <localhost>
RvgName: repvg
DgName: data
datavol_cnt: 1
srl: srl_vol
RLinks:
name=rlk_vvr240_repvg, detached=off, synchronous=override
Secondary:
HostName: vvr240
RvgName: repvg
DgName: data
datavol_cnt: 1
srl: srl_vol
RLinks:
name=rlk_vvr440_repvg, detached=off, synchronous=override
Fri Oct 31 07:58:36 GMT 2008
VxVM VVR vxrlink INFO V-5-1-4348 DCM is in use on rlink rlk_vvr240_repvg. DCM contains 12834400 Kbytes.
VxVM VVR vxrlink INFO V-5-1-4348 DCM is in use on rlink rlk_vvr240_repvg. DCM contains 12834400 Kbytes.
VxVM VVR vxrlink INFO V-5-1-4348 DCM is in use on rlink rlk_vvr240_repvg. DCM contains 12834400 Kbytes.
VxVM VVR vxrlink INFO V-5-1-4348 DCM is in use on rlink rlk_vvr240_repvg. DCM contains 12834400 Kbytes.
VxVM VVR vxrlink INFO V-5-1-4348 DCM is in use on rlink rlk_vvr240_repvg. DCM contains 12834400 Kbytes.
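Whether the DCM backlog is draining or still growing can be read from these messages by comparing successive values. The following is only a sketch: the function names `dcm_kbytes` and `dcm_trend` are made up for illustration, and it assumes each vxrlink message arrives on a single line (not wrapped by the terminal).

```shell
#!/bin/sh
# Extract the DCM size (in Kbytes) from every "DCM contains N Kbytes" message on stdin.
dcm_kbytes() {
    sed -n 's/.*DCM contains \([0-9][0-9]*\) Kbytes.*/\1/p'
}

# Compare the first and the last sample seen on stdin and print
# "growing", "shrinking", "flat", or "unknown" (fewer than two samples).
dcm_trend() {
    dcm_kbytes | awk '
        NR == 1 { first = $1 + 0 }
        { last = $1 + 0 }
        END {
            if (NR < 2)            print "unknown"
            else if (last > first) print "growing"
            else if (last < first) print "shrinking"
            else                   print "flat"
        }'
}
```

The status lines collected over several minutes can be saved to a file and piped through `dcm_trend`. A result of "growing" over a long period suggests the secondary is not catching up.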
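Judging from the status output above, things looked much the same as in the normal state, except that the amount of data in the DCM kept growing. At the time I assumed the synchronization simply needed a while, so I kept watching for 10 minutes and found that the amount of data was still increasing. I immediately asked Changshan to help me log in to v440 and checked the situation: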
# vradmin printrvg
Replicated Data Set: repvg
Primary:
HostName: vvr440 <localhost>
RvgName: repvg
DgName: data
Secondary:
HostName: vvr240
RvgName: repvg
DgName: data
# vxdisk -o alldgs list
DEVICE       TYPE            DISK         GROUP        STATUS
c1t0d0s2     auto:none       -            -            online invalid
c1t1d0s2     auto:none       -            -            online invalid
c1t2d0s2     auto:none       -            -            online invalid
c1t3d0s2     auto:none       -            -            online invalid
c3t5d0s2     auto:cdsdisk    c3t5d0       data         online
c3t5d1s2     auto:cdsdisk    c3t5d1       data         online
c3t5d2s2     auto:cdsdisk    c3t5d2s2     v440fs       online
# vxprint -g data -v
TY NAME             ASSOC          KSTATE   LENGTH     PLOFFS  STATE   TUTIL0  PUTIL0
v  ora_vol1         repvg          ENABLED  314558464  -       ACTIVE  -       -
v  srl_vol          repvg          ENABLED  209702912  SRL     ACTIVE  -       -
# vxprint
Disk group: v440fs
TY NAME             ASSOC          KSTATE   LENGTH     PLOFFS  STATE   TUTIL0  PUTIL0
dg v440fs           v440fs         -        -          -       -       -       -
dm c3t5d2s2         c3t5d2s2       -        1077868288 -       -       -       -
v  v440fs_vol1      fsgen          ENABLED  1077866496 -       ACTIVE  -       -
pl v440fs_vol1-01   v440fs_vol1    ENABLED  1077866496 -       ACTIVE  -       -
sd c3t5d2s2-01      v440fs_vol1-01 ENABLED  1077866496 0       -       -       -

Disk group: data
TY NAME             ASSOC          KSTATE   LENGTH     PLOFFS  STATE   TUTIL0  PUTIL0
dg data             data           -        -          -       -       -       -
dm c3t5d0           c3t5d0s2       -        209704704  -       -       -       -
dm c3t5d1           c3t5d1s2       -        314560256  -       -       -       -
rv repvg            -              ENABLED  -          -       ACTIVE  -       -
rl rlk_vvr240_repvg repvg          CONNECT  -          -       ACTIVE  -       -
v  ora_vol1         repvg          ENABLED  314558464  -       ACTIVE  -       -
pl ora_vol1-01      ora_vol1       ENABLED  314558464  -       ACTIVE  -       -
sd c3t5d1-01        ora_vol1-01    ENABLED  314558464  0       -       -       -
pl ora_vol1-02      ora_vol1       ENABLED  LOGONLY    -       ACTIVE  -       -
sd c3t5d0-02        ora_vol1-02    ENABLED  512        LOG     -       -       -
pl ora_vol1-03      ora_vol1       ENABLED  LOGONLY    -       ACTIVE  -       -
sd c3t5d1-02        ora_vol1-03    ENABLED  512        LOG     -       -       -
v  srl_vol          repvg          ENABLED  209702912  SRL     ACTIVE  -       -
pl srl_vol-01       srl_vol        ENABLED  209702912  -       ACTIVE  -       -
sd c3t5d0-01        srl_vol-01     ENABLED  209702912  0       -       -       -
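The LENGTH values in the vxprint output can be converted to sizes as a sanity check. This sketch assumes LENGTH is expressed in 512-byte sectors (the usual vxprint default, but an assumption here); `sectors_to_gib` is a helper name invented for the example.

```shell
#!/bin/sh
# Convert a vxprint LENGTH value (assumed to be 512-byte sectors)
# to GiB, printed with two decimals.
sectors_to_gib() {
    awk -v s="$1" 'BEGIN { printf "%.2f\n", s * 512 / (1024 * 1024 * 1024) }'
}

sectors_to_gib 209702912   # srl_vol  -> 99.99
sectors_to_gib 314558464   # ora_vol1 -> 149.99
```

Under that assumption, srl_vol works out to 99.99 G, which agrees with the SRL size that vradmin repstatus reports for this RVG.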
# vradmin -g data repstatus repvg
Replicated Data Set: repvg
Primary:
Host name: vvr440
RVG name: repvg
DG name: data
RVG state: enabled for I/O
Data volumes: 1
SRL name: srl_vol
SRL size: 99.99 G
Total secondaries: 1
Secondary:
Host name: vvr240
RVG name: repvg
DG name: data
Data status: consistent, behind
Replication status: logging to DCM (needs dcm resynchronization)
Current mode: asynchronous
Logging to: DCM (contains 12841600 Kbytes) (SRL protection logging)
Timestamp Information: N/A
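The fields that matter most in this output are "Data status" and "Replication status". As a rough sketch, they can be pulled out of saved `vradmin repstatus` output like this; `repstatus_field` and `needs_resync` are hypothetical helper names, and the parsing assumes the "Name: value" layout shown above.

```shell
#!/bin/sh
# Print the value of the first "Name: value" line on stdin whose name equals $1.
# Note: names such as "Host name" occur for both Primary and Secondary;
# only the first occurrence is returned.
repstatus_field() {
    awk -v key="$1" '
        !found {
            line = $0
            sub(/^[ \t]+/, "", line)            # drop indentation
            if (index(line, key ":") == 1) {    # line starts with "key:"
                sub(/^[^:]*:[ \t]*/, "", line)  # keep only the value
                print line
                found = 1
            }
        }'
}

# Succeed (exit status 0) when the replication status asks for a DCM resynchronization.
needs_resync() {
    case "$(repstatus_field "Replication status")" in
        *"needs dcm resynchronization"*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, `vradmin -g data repstatus repvg | needs_resync && echo "DCM resync pending"` would flag the state shown above.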
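From the output above, the replicated volume group repvg is in a normal state, the data volume ora_vol1 and the log volume srl_vol are normal, and the two disk groups data and v440fs are normal. I then logged in to v240 and checked the volumes and disk groups there; they were normal as well. At that point I wanted to stop and restart Veritas replication to see whether it would recover, but replication could not be stopped, as shown below: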
# vradmin -g data stoprep repvg
VxVM VVR vradmin WARNING V-5-52-92 Secondary data volumes will become out-of-date.
vradmin: Continue with stoprep (y/n)? y
Message from Primary:
VxVM VVR vxrlink ERROR V-5-1-6467 Data volumes are in use.
If detaching is necessary, use the force detach option (-f). Before restarting replication a complete synchronization of the secondary data volumes must then be performed.
# vradmin -g data -a startrep repvg
Message from Primary:
VxVM VVR vxrlink ERROR V-5-1-3531 Rlink rlk_vvr240_repvg is already attached
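Later I called Engineer Zhang at Ziqiyuan (资旗源) and described the situation. He said that with a large amount of data the synchronization takes some time, and told us to keep watching. After a day or two the amount of data was still growing, so I called Ziqiyuan again. They said it might be a software fault and suggested rebooting the server. We were worried that v440 might not come back up after a reboot (one of the system boot disks on v440 has already developed bad blocks), so I explained the situation to Mr. Yan, and we stopped and started Veritas replication once more; the result was the same as above. That day we also tried a forced stop, which failed too (in fact because the force command we used was wrong). We suspected the cause was that the database was running and Veritas was still writing data, so we decided to shut down the database at noon the next day (November 4) and then stop and start Veritas replication to see what happened.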
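At noon on November 4, although the storage IP address 192.168.1.60 could be pinged from the servers, I suspected that agent communication between v240, v440 and the storage might have stopped because of a software fault. (Earlier, when I logged in to Navisphere at http://10.0.1.64, the hosts view showed "v240[10.0.1.61] agent is not reachable", and an unmanageable state had appeared before as well. I had asked Changshan at the time whether it was normal and he said it was, but I did not check it myself, so it may already have been abnormal by then. While "agent is not reachable" was showing, I logged in to v440 and v240, changed to /opt/Navisphere/bin and ran navicli -h 10.0.1.64 getagent; neither host returned any information.) I also wanted to keep the risk of this intervention as low as possible, so I worked out the following procedure: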
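1 Since the 240 can be shut down and started normally, shut down the 240 first.
2 Power off the storage connected to the 240.
3 Power on the storage connected to the 240.
4 Power on the 240 server, then watch the replication status from the 440. If it is normal, the cause really was a soft communication fault with the 240.
If it is not normal, stop the Oracle database, then stop and start Veritas replication and check whether the replication status returns to normal.
If it does, the cause was a Veritas software fault. If the problem is still not solved, go to step 5.
5 Reboot the 440 server and its storage.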
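At noon I had Changshan shut down the 240. When I got to the machine room I saw that one disk in the storage connected to the 240 showed an amber light, and the power supply and fan module at the back showed an amber light too. I powered the storage off and on again; the amber light on that disk went out, while the power supply and fan module stayed amber. I later found that the amber lights were caused by the power feed module (it had already been faulty before, but at that time the storage still worked normally when connected through the power supply and fan module). I isolated the power feed module so that the storage ran normally, started the storage, started the 240, and logged in to Navisphere to check. The message "v240[10.0.1.61] agent is not reachable" had disappeared and the normal entry "v240[10.0.1.61]" was shown, so the agent problem was solved. The replication status was still abnormal and Veritas replication still could not be stopped, so we stopped the database and used the forced stop command:
# vradmin -g data -f stoprep repvg
This time replication stopped. We then started it again with:
# vradmin -g data -a startrep repvg
which put it into automatic synchronization. After a night of replication, the secondary was in sync. A forced stop should also have worked without stopping the database; the command simply had not been used correctly a few days earlier. The current state is as follows: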
Replicated Data Set: repvg
Primary:
HostName: vvr440 <localhost>
RvgName: repvg
DgName: data
datavol_cnt: 1
srl: srl_vol
RLinks:
name=rlk_vvr240_repvg, detached=off, synchronous=override
Secondary:
HostName: vvr240
RvgName: repvg
DgName: data
datavol_cnt: 1
srl: srl_vol
RLinks:
name=rlk_vvr440_repvg, detached=off, synchronous=override
Wed Nov 5 10:51:14 GMT 2008
VxVM VVR vxrlink INFO V-5-1-4467 Rlink rlk_vvr240_repvg is up to date
VxVM VVR vxrlink INFO V-5-1-4467 Rlink rlk_vvr240_repvg is up to date
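The two commands that finally resolved the problem can be wrapped in a small dry-run script. This is only a sketch of the sequence described above, not a tested procedure: `resync_rvg` and the `RUN` variable are names made up for the example, and by default the script only prints the commands instead of running them.

```shell
#!/bin/sh
# RUN=echo (the default) prints the commands; set RUN to an empty string
# in the environment to really execute them.
RUN=${RUN-echo}
DG=data      # disk group, as in the transcripts above
RVG=repvg    # replicated volume group

resync_rvg() {
    # Force-stop replication. Per the vradmin warning, the secondary
    # data volumes become out of date and need a full synchronization.
    $RUN vradmin -g "$DG" -f stoprep "$RVG" || return 1
    # Restart replication with automatic synchronization (-a).
    $RUN vradmin -g "$DG" -a startrep "$RVG" || return 1
}
```

Calling `resync_rvg` in dry-run mode prints both vradmin commands for review. When run for real, stoprep may ask for confirmation, as it did in the earlier transcript.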