Exadata Upgrade: From 11.2.3.1.0 to 11.2.3.3.0 (3) Upgrading the Storage Nodes

Before upgrading the image on the storage nodes, the environment needs to be checked. All of the commands below are run from a compute node.

1. Check root user SSH equivalence across the cell nodes

[root@gxx2db01 tmp]# dcli -g all_group -l root date
gxx2db01: Sat Sep  6 12:14:41 CST 2014
gxx2db02: Sat Sep  6 12:14:40 CST 2014
gxx2cel01: Sat Sep  6 12:14:41 CST 2014
gxx2cel02: Sat Sep  6 12:14:41 CST 2014
gxx2cel03: Sat Sep  6 12:14:41 CST 2014
[root@gxx2db01 tmp]# dcli -g cell_group -l root 'hostname -i'
gxx2cel01: 10.100.84.104
gxx2cel02: 10.100.84.105
gxx2cel03: 10.100.84.106

2. Check the disk_repair_time attribute of the disk groups

[grid@gxx2db02 ~]$ sqlplus / as sysasm
SQL*Plus: Release 11.2.0.3.0 Production on Sat Sep 6 12:20:14 2014
Copyright (c) 1982, 2011, Oracle.  All rights reserved.

Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options
SQL> select dg.name,a.value from v$asm_diskgroup dg, v$asm_attribute a
where dg.group_number=a.group_number and a.name='disk_repair_time';  
NAME              VALUE
-------          -----
DATA_GXX2       3.6h
DBFS_DG         3.6h
RECO_GXX2       3.6h

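The current value is 3.6 hours, which is the default. It is raised mainly so that the upgrade cannot outlast it: once a cell's grid disks have been offline for longer than disk_repair_time, ASM drops them from the disk groups. If that happens, the disks have to be added back to the disk groups manually after the upgrade. Here the value is set to 24 hours.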

SQL> alter diskgroup DATA_GXX2 set attribute 'disk_repair_time'='24h';
Diskgroup altered.

SQL> alter diskgroup DBFS_DG set attribute 'disk_repair_time'='24h';
Diskgroup altered.

SQL> alter diskgroup RECO_GXX2 set attribute 'disk_repair_time'='24h';
Diskgroup altered.

SQL> select dg.name,a.value from v$asm_diskgroup dg, v$asm_attribute a
where dg.group_number=a.group_number and a.name='disk_repair_time';  
NAME              VALUE
-------          -----
DATA_GXX2        24h
DBFS_DG          24h
RECO_GXX2        24h
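
The before-and-after query can also be turned into a quick gate. The following is a minimal sketch, not part of the original procedure: `below_repair_time` is a hypothetical helper that reads NAME/VALUE rows like the ones above and prints every disk group whose disk_repair_time is shorter than a required number of hours. It assumes values carry an `h` or `m` suffix, and treats a bare number as hours.

```shell
# Hypothetical helper (not from the original post): print the disk groups
# whose disk_repair_time is shorter than a required number of hours.
# Reads "NAME VALUE" rows on stdin; header and separator rows are skipped.
below_repair_time() {
  local min_hours="$1"
  awk -v min="$min_hours" '
    NF == 2 && $2 ~ /^[0-9.]+[hHmM]?$/ {
      value = $2
      unit = "h"                      # assumption: no suffix means hours
      if (value ~ /[hHmM]$/) {
        unit = tolower(substr(value, length(value)))
        value = substr(value, 1, length(value) - 1)
      }
      hours = (unit == "m") ? value / 60 : value + 0
      if (hours < min + 0) print $1
    }'
}
```

With the query output saved to a file (for example a hypothetical `repair_time.txt`), `below_repair_time 24 < repair_time.txt` should print nothing once every disk group has been altered.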

3. Check the operating system kernel version

[root@gxx2db01 tmp]# dcli -g all_group -l root 'uname -a'
gxx2db01: Linux gxx2db01.gx.csg.cn 2.6.18-274.18.1.0.1.el5 #1 SMP Thu Feb 9 19:07:16 EST 2012 x86_64 x86_64 x86_64 GNU/Linux
gxx2db02: Linux gxx2db02.gx.csg.cn 2.6.18-274.18.1.0.1.el5 #1 SMP Thu Feb 9 19:07:16 EST 2012 x86_64 x86_64 x86_64 GNU/Linux
gxx2cel01: Linux gxx2cel01.gx.csg.cn 2.6.18-274.18.1.0.1.el5 #1 SMP Thu Feb 9 19:07:16 EST 2012 x86_64 x86_64 x86_64 GNU/Linux
gxx2cel02: Linux gxx2cel02.gx.csg.cn 2.6.18-274.18.1.0.1.el5 #1 SMP Thu Feb 9 19:07:16 EST 2012 x86_64 x86_64 x86_64 GNU/Linux
gxx2cel03: Linux gxx2cel03.gx.csg.cn 2.6.18-274.18.1.0.1.el5 #1 SMP Thu Feb 9 19:07:16 EST 2012 x86_64 x86_64 x86_64 GNU/Linux
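
Comparing five long lines by eye is error-prone, so the comparison can be scripted. This is a small sketch under stated assumptions: `kernel_releases` is a hypothetical helper, and it relies on the `uname -a` layout shown above, where the kernel release is the fourth whitespace-separated field once dcli has added its `host:` prefix.

```shell
# Hypothetical helper: list the distinct kernel releases found in
# dcli-prefixed `uname -a` output read from stdin.
# Fields: $1 = "host:", $2 = "Linux", $3 = node name, $4 = kernel release.
kernel_releases() {
  awk '$2 == "Linux" { print $4 }' | sort -u
}
```

Piping the dcli output through `kernel_releases` should yield exactly one line when all nodes match; more than one line means at least one node runs a different kernel.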

4. Check the operating system release

[root@gxx2db01 tmp]# dcli -g all_group -l root 'cat /etc/oracle-release'
gxx2db01: Oracle Linux Server release 5.7
gxx2db02: Oracle Linux Server release 5.7
gxx2cel01: Oracle Linux Server release 5.7
gxx2cel02: Oracle Linux Server release 5.7
gxx2cel03: Oracle Linux Server release 5.7

5. Check the image version

[root@gxx2db01 tmp]# dcli -g all_group -l root 'imageinfo'
gxx2db01:
gxx2db01: Kernel version: 2.6.18-274.18.1.0.1.el5 #1 SMP Thu Feb 9 19:07:16 EST 2012 x86_64
gxx2db01: Image version: 11.2.3.1.0.120304
gxx2db01: Image activated: 2002-05-03 22:47:44 +0800
gxx2db01: Image status: success
gxx2db01: System partition on device: /dev/mapper/VGExaDb-LVDbSys1
gxx2db01:
gxx2db02:
gxx2db02: Kernel version: 2.6.18-274.18.1.0.1.el5 #1 SMP Thu Feb 9 19:07:16 EST 2012 x86_64
gxx2db02: Image version: 11.2.3.1.0.120304
gxx2db02: Image activated: 2012-05-03 11:29:41 +0800
gxx2db02: Image status: success
gxx2db02: System partition on device: /dev/mapper/VGExaDb-LVDbSys1
gxx2db02:
gxx2cel01:
gxx2cel01: Kernel version: 2.6.18-274.18.1.0.1.el5 #1 SMP Thu Feb 9 19:07:16 EST 2012 x86_64
gxx2cel01: Cell version: OSS_11.2.3.1.0_LINUX.X64_120304
gxx2cel01: Cell rpm version: cell-11.2.3.1.0_LINUX.X64_120304-1
gxx2cel01:
gxx2cel01: Active image version: 11.2.3.1.0.120304
gxx2cel01: Active image activated: 2012-05-03 03:00:13 -0700
gxx2cel01: Active image status: success
gxx2cel01: Active system partition on device: /dev/md6
gxx2cel01: Active software partition on device: /dev/md8
gxx2cel01:
gxx2cel01: In partition rollback: Impossible
gxx2cel01:
gxx2cel01: Cell boot usb partition: /dev/sdm1
gxx2cel01: Cell boot usb version: 11.2.3.1.0.120304
gxx2cel01:
gxx2cel01: Inactive image version: 11.2.2.3.5.110815
gxx2cel01: Inactive image activated: 2011-10-19 16:15:42 -0700
gxx2cel01: Inactive image status: success
gxx2cel01: Inactive system partition on device: /dev/md5
gxx2cel01: Inactive software partition on device: /dev/md7
gxx2cel01:
gxx2cel01: Boot area has rollback archive for the version: 11.2.2.3.5.110815
gxx2cel01: Rollback to the inactive partitions: Possible
gxx2cel02:
gxx2cel02: Kernel version: 2.6.18-274.18.1.0.1.el5 #1 SMP Thu Feb 9 19:07:16 EST 2012 x86_64
gxx2cel02: Cell version: OSS_11.2.3.1.0_LINUX.X64_120304
gxx2cel02: Cell rpm version: cell-11.2.3.1.0_LINUX.X64_120304-1
gxx2cel02:
gxx2cel02: Active image version: 11.2.3.1.0.120304
gxx2cel02: Active image activated: 2012-05-03 02:59:52 -0700
gxx2cel02: Active image status: success
gxx2cel02: Active system partition on device: /dev/md6
gxx2cel02: Active software partition on device: /dev/md8
gxx2cel02:
gxx2cel02: In partition rollback: Impossible
gxx2cel02:
gxx2cel02: Cell boot usb partition: /dev/sdm1
gxx2cel02: Cell boot usb version: 11.2.3.1.0.120304
gxx2cel02:
gxx2cel02: Inactive image version: 11.2.2.3.5.110815
gxx2cel02: Inactive image activated: 2011-10-19 16:26:30 -0700
gxx2cel02: Inactive image status: success
gxx2cel02: Inactive system partition on device: /dev/md5
gxx2cel02: Inactive software partition on device: /dev/md7
gxx2cel02:
gxx2cel02: Boot area has rollback archive for the version: 11.2.2.3.5.110815
gxx2cel02: Rollback to the inactive partitions: Possible
gxx2cel03:
gxx2cel03: Kernel version: 2.6.18-274.18.1.0.1.el5 #1 SMP Thu Feb 9 19:07:16 EST 2012 x86_64
gxx2cel03: Cell version: OSS_11.2.3.1.0_LINUX.X64_120304
gxx2cel03: Cell rpm version: cell-11.2.3.1.0_LINUX.X64_120304-1
gxx2cel03:
gxx2cel03: Active image version: 11.2.3.1.0.120304
gxx2cel03: Active image activated: 2012-05-03 02:58:38 -0700
gxx2cel03: Active image status: success
gxx2cel03: Active system partition on device: /dev/md6
gxx2cel03: Active software partition on device: /dev/md8
gxx2cel03:
gxx2cel03: In partition rollback: Impossible
gxx2cel03:
gxx2cel03: Cell boot usb partition: /dev/sdm1
gxx2cel03: Cell boot usb version: 11.2.3.1.0.120304
gxx2cel03:
gxx2cel03: Inactive image version: 11.2.2.3.5.110815
gxx2cel03: Inactive image activated: 2011-10-19 16:26:59 -0700
gxx2cel03: Inactive image status: success
gxx2cel03: Inactive system partition on device: /dev/md5
gxx2cel03: Inactive software partition on device: /dev/md7
gxx2cel03:
gxx2cel03: Boot area has rollback archive for the version: 11.2.2.3.5.110815
gxx2cel03: Rollback to the inactive partitions: Possible

[root@gxx2db01 tmp]# dcli -g all_group -l root 'imagehistory'
gxx2db01: Version                              : 11.2.3.1.0.120304
gxx2db01: Image activation date                : 2002-05-03 22:47:44 +0800
gxx2db01: Imaging mode                         : fresh
gxx2db01: Imaging status                       : success
gxx2db01:
gxx2db02: Version                              : 11.2.3.1.0.120304
gxx2db02: Image activation date                : 2012-05-03 11:29:41 +0800
gxx2db02: Imaging mode                         : fresh
gxx2db02: Imaging status                       : success
gxx2db02:
gxx2cel01: Version                              : 11.2.2.3.5.110815
gxx2cel01: Image activation date                : 2011-10-19 16:15:42 -0700
gxx2cel01: Imaging mode                         : fresh
gxx2cel01: Imaging status                       : success
gxx2cel01:
gxx2cel01: Version                              : 11.2.3.1.0.120304
gxx2cel01: Image activation date                : 2012-05-03 03:00:13 -0700
gxx2cel01: Imaging mode                         : out of partition upgrade
gxx2cel01: Imaging status                       : success
gxx2cel01:
gxx2cel02: Version                              : 11.2.2.3.5.110815
gxx2cel02: Image activation date                : 2011-10-19 16:26:30 -0700
gxx2cel02: Imaging mode                         : fresh
gxx2cel02: Imaging status                       : success
gxx2cel02:
gxx2cel02: Version                              : 11.2.3.1.0.120304
gxx2cel02: Image activation date                : 2012-05-03 02:59:52 -0700
gxx2cel02: Imaging mode                         : out of partition upgrade
gxx2cel02: Imaging status                       : success
gxx2cel02:
gxx2cel03: Version                              : 11.2.2.3.5.110815
gxx2cel03: Image activation date                : 2011-10-19 16:26:59 -0700
gxx2cel03: Imaging mode                         : fresh
gxx2cel03: Imaging status                       : success
gxx2cel03:
gxx2cel03: Version                              : 11.2.3.1.0.120304
gxx2cel03: Image activation date                : 2012-05-03 02:58:38 -0700
gxx2cel03: Imaging mode                         : out of partition upgrade
gxx2cel03: Imaging status                       : success
gxx2cel03:

6. Check the OFA version

[root@gxx2db01 tmp]# dcli -g all_group -l root 'rpm -qa | grep ofa'
gxx2db01: ofa-2.6.18-274.18.1.0.1.el5-1.5.1-4.0.58
gxx2db02: ofa-2.6.18-274.18.1.0.1.el5-1.5.1-4.0.58
gxx2cel01: ofa-2.6.18-274.18.1.0.1.el5-1.5.1-4.0.58
gxx2cel02: ofa-2.6.18-274.18.1.0.1.el5-1.5.1-4.0.58
gxx2cel03: ofa-2.6.18-274.18.1.0.1.el5-1.5.1-4.0.58

7. Check the hardware model

[root@gxx2db01 tmp]# dcli -g all_group -l root 'dmidecode -s system-product-name'
gxx2db01: SUN FIRE X4170 M2 SERVER
gxx2db02: SUN FIRE X4170 M2 SERVER
gxx2cel01: SUN FIRE X4270 M2 SERVER
gxx2cel02: SUN FIRE X4270 M2 SERVER
gxx2cel03: SUN FIRE X4270 M2 SERVER

8. Check the alert logs on the cell nodes

gxx2cel01: 36    2014-08-29T08:54:27+08:00       info            "This is a test trap"
gxx2cel02: 40_1  2014-08-28T20:01:24+08:00       warning         "Oracle Exadata Storage Server failed to auto-create cell disk and grid disks on the newly inserted physical disk. Physical Disk : 20:4  Status        : normal  Manufacturer  : SEAGATE  Model Number  : ST360057SSUN600G  Size          : 600G  Serial Number : E4CK7V  Firmware      : 0B25  Slot Number   : 4  "
gxx2cel02: 41    2014-08-29T08:54:04+08:00       info            "This is a test trap"
gxx2cel03: 27_3  2014-08-13T18:28:11+08:00       clear           "Hard disk replaced.  Status        : NORMAL  Manufacturer  : HITACHI  Model Number  : HUS1560SCSUN600G  Size          : 600G  Serial Number : K7UL6N  Firmware      : A700  Slot Number   : 11  Cell Disk     : CD_11_gxx2cel03  Grid Disk     : DATA_GXX2_CD_11_gxx2cel03, RECO_GXX2_CD_11_gxx2cel03, DBFS_DG_CD_11_gxx2cel03"
gxx2cel03: 28    2014-08-29T08:54:43+08:00       info            "This is a test trap"
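
Most of these entries are test traps or cleared events; the one that deserves attention before patching is the warning on gxx2cel02. A rough filter for that kind of review could look like the sketch below. `alerts_needing_review` is a hypothetical helper, and it assumes the column layout shown above (host prefix, alert id, timestamp, severity, message).

```shell
# Hypothetical helper: keep only alert entries whose severity is
# "warning" or "critical". Input is dcli-prefixed alert history on stdin.
# Fields: $1 = "host:", $2 = alert id, $3 = timestamp, $4 = severity.
alerts_needing_review() {
  awk '$4 == "warning" || $4 == "critical" { print $1, $2, $4 }'
}
```

The filter does not pair a warning with a later `clear` entry for the same alert, so each hit still needs to be looked at by hand.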

9. Check for offline grid disks

[root@gxx2db01 tmp]# dcli -g cell_group -l root "cellcli -e \"LIST GRIDDISK ATTRIBUTES name WHERE asmdeactivationoutcome != 'Yes'\""
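
The query lists only grid disks whose asmdeactivationoutcome is not 'Yes', so empty output is the healthy result. Because "no output" is easy to overlook in a script, the check can be made explicit. A minimal sketch, with `all_griddisks_safe` as a hypothetical helper name:

```shell
# Hypothetical helper: succeed (exit status 0) only when stdin contains
# no non-blank lines, i.e. the query above returned no grid disks.
all_griddisks_safe() {
  ! grep -q '[^[:space:]]'
}
```

Piping the dcli output into `all_griddisks_safe || echo "grid disks need attention"` makes the result visible before moving on.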

10. Verify that the cell network configuration is consistent with cell.conf

[root@gxx2db01 tmp]# dcli -g cell_group -l root /opt/oracle.cellos/ipconf -verify
gxx2cel01: Verifying of Exadata configuration file /opt/oracle.cellos/cell.conf
gxx2cel01: Done. Configuration file /opt/oracle.cellos/cell.conf passed all verification checks
gxx2cel02: Verifying of Exadata configuration file /opt/oracle.cellos/cell.conf
gxx2cel02: Done. Configuration file /opt/oracle.cellos/cell.conf passed all verification checks
gxx2cel03: Verifying of Exadata configuration file /opt/oracle.cellos/cell.conf
gxx2cel03: Done. Configuration file /opt/oracle.cellos/cell.conf passed all verification checks
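
With more cells the output gets long, and a cell that failed verification is easy to miss. The sketch below shows one way to list such cells. `cells_failing_ipconf` is a hypothetical helper, and it assumes that a successful run always prints the "passed all verification checks" line seen above.

```shell
# Hypothetical helper: print every host that appears in the dcli output
# on stdin but never reports "passed all verification checks".
cells_failing_ipconf() {
  awk '
    NF == 0 { next }
    { host = $1; sub(/:$/, "", host); seen[host] = 1 }
    /passed all verification checks/ { ok[host] = 1 }
    END { for (h in seen) if (!(h in ok)) print h }' | sort
}
```

An empty result means every cell that answered also passed.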

11. Stop CRS and the storage node services

[root@gxx2db01 tmp]# dcli -g dbs_group -l root "/u01/app/11.2.0.3/grid/bin/crsctl stop crs -f"
[root@gxx2db01 tmp]# dcli -g dbs_group -l root "ps -ef | grep d.bin"
[root@gxx2db01 tmp]# dcli -g cell_group -l root "cellcli -e alter cell shutdown services all"
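
The `ps -ef | grep d.bin` check always matches its own grep process, so its output is never truly empty. A small sketch of a stricter filter follows; `remaining_crs_processes` is a hypothetical helper that drops the grep lines and prints only real `*.bin` daemons that are still running.

```shell
# Hypothetical helper: from dcli-prefixed `ps -ef` output on stdin, print
# the clusterware daemons (names ending in d.bin) that are still running,
# ignoring the grep command that produced the listing.
remaining_crs_processes() {
  grep 'd\.bin' | grep -v 'grep' || true
}
```

If this prints anything after `crsctl stop crs -f`, the stack is not fully down and the cell services should not be stopped yet.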

12. Unpack the installation media and the plugins

[root@gxx2db01 ExaImage]# unzip p16278923_112330_Linux-x86-64.zip
[root@gxx2db01 ExaImage]# unzip -d patch_11.2.3.3.0.131014.1/plugins/ p17938410_112330_Linux-x86-64.zip -x Readme.txt
[root@gxx2db01 ExaImage]# chmod +x patch_11.2.3.3.0.131014.1/plugins/*

13. Clean up the environment left by previous patchmgr runs

[root@gxx2db01 patch_11.2.3.3.0.131014.1]# ./patchmgr -cells /tmp/cell_group -reset_force
2014-09-06 13:48:44 +0800 DONE: reset_force

[root@gxx2db01 patch_11.2.3.3.0.131014.1]# ./patchmgr -cells  /tmp/cell_group -cleanup
2014-09-06 13:49:51 +0800 DONE: Cleanup

14. Run the prerequisite check

[root@gxx2db01 patch_11.2.3.3.0.131014.1]# ./patchmgr -cells /tmp/cell_group -patch_check_prereq
2014-09-06 14:27:26 +0800        :Working: DO: Check cells have ssh equivalence for root user. Up to 10 seconds per cell ...
2014-09-06 14:27:27 +0800        :SUCCESS: DONE: Check cells have ssh equivalence for root user.
2014-09-06 14:27:27 +0800        :Working: DO: Initialize files, check space and state of cell services. Up to 1 minute ...
2014-09-06 14:27:49 +0800        :SUCCESS: DONE: Initialize files, check space and state of cell services.
2014-09-06 14:27:49 +0800        :Working: DO: Copy, extract prerequisite check archive to cells. If required start md11 mismatched partner size correction. Up to 40 minutes ...
2014-09-06 14:28:17 +0800 Wait correction of degraded md11 due to md partner size mismatch. Up to 30 minutes.

2014-09-06 14:28:18 +0800        :SUCCESS: DONE: Copy, extract prerequisite check archive to cells. If required start md11 mismatched partner size correction.
2014-09-06 14:28:18 +0800        :Working: DO: Check prerequisites on all cells. Up to 2 minutes ...
2014-09-06 14:29:01 +0800        :SUCCESS: DONE: Check prerequisites on all cells.
2014-09-06 14:29:01 +0800        :Working: DO: Execute plugin check for Patch Check Prereq ...
2014-09-06 14:29:01 +0800 :INFO: Patchmgr plugin start: Prereq check for exposure to bug 17854520 v1.1. Details in logfile /backup/ExaImage/patch_11.2.3.3.0.131014.1/patchmgr.stdout.
2014-09-06 14:29:01 +0800 :INFO: This plugin checks dbhomes across all nodes with oracle-user ssh equivalence, but only for those known to the local system. dbhomes that exist only on remote nodes must be checked manually.
2014-09-06 14:29:01 +0800 :SUCCESS: No exposure to bug 17854520 with non-rolling patching
2014-09-06 14:29:01 +0800        :SUCCESS: DONE: Execute plugin check for Patch Check Prereq.
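
Before starting the patch itself, it is worth confirming that no step in this output reported a failure. The following is a rough sketch only: `summarize_patchmgr` is a hypothetical helper, and matching failures by the substring "fail" (case-insensitive) is an assumption, since the exact wording patchmgr uses for a failed step does not appear in this run.

```shell
# Hypothetical helper: summarize patchmgr output read from stdin.
# Counts ":SUCCESS:" lines, echoes any line containing "fail" in any
# letter case (assumed failure marker), and exits non-zero if one exists.
summarize_patchmgr() {
  awk '
    /:SUCCESS:/ { ok++ }
    tolower($0) ~ /fail/ { bad++; print "FAILED> " $0 }
    END {
      printf "success=%d failed=%d\n", ok, bad
      exit (bad > 0 ? 1 : 0)
    }'
}
```

It can be fed either the captured screen output or the patchmgr.stdout log file named in the plugin message above.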

15. Upgrade the storage nodes

[root@gxx2db01 patch_11.2.3.3.0.131014.1]# ./patchmgr -cells /tmp/cell_group -patch
NOTE Cells will reboot during the patch or rollback process.
NOTE For non-rolling patch or rollback, ensure all ASM instances using
NOTE the cells are shut down for the duration of the patch or rollback.
NOTE For rolling patch or rollback, ensure all ASM instances using
NOTE the cells are up for the duration of the patch or rollback.

WARNING Do not start more than one instance of patchmgr.
WARNING Do not interrupt the patchmgr session.
WARNING Do not alter state of ASM instances during patch or rollback.
WARNING Do not resize the screen. It may disturb the screen layout.
WARNING Do not reboot cells or alter cell services during patch or rollback.
WARNING Do not open log files in editor in write mode or try to alter them.

NOTE All time estimates are approximate. Timestamps on the left are real.
NOTE You may interrupt this patchmgr run in next 60 seconds with control-c.


2014-09-06 14:32:49 +0800        :Working: DO: Check cells have ssh equivalence for root user. Up to 10 seconds per cell ...
2014-09-06 14:32:50 +0800        :SUCCESS: DONE: Check cells have ssh equivalence for root user.
2014-09-06 14:32:50 +0800        :Working: DO: Initialize files, check space and state of cell services. Up to 1 minute ...
2014-09-06 14:33:32 +0800        :SUCCESS: DONE: Initialize files, check space and state of cell services.
2014-09-06 14:33:32 +0800        :Working: DO: Copy, extract prerequisite check archive to cells. If required start md11 mismatched partner size correction. Up to 40 minutes ...
2014-09-06 14:34:00 +0800 Wait correction of degraded md11 due to md partner size mismatch. Up to 30 minutes.


2014-09-06 14:34:01 +0800        :SUCCESS: DONE: Copy, extract prerequisite check archive to cells. If required start md11 mismatched partner size correction.
2014-09-06 14:34:01 +0800        :Working: DO: Check prerequisites on all cells. Up to 2 minutes ...
2014-09-06 14:34:43 +0800        :SUCCESS: DONE: Check prerequisites on all cells.
2014-09-06 14:34:43 +0800        :Working: DO: Copy the patch to all cells. Up to 3 minutes ...
2014-09-06 14:35:15 +0800        :SUCCESS: DONE: Copy the patch to all cells.
2014-09-06 14:35:17 +0800        :Working: DO: Execute plugin check for Patch Check Prereq ...
2014-09-06 14:35:17 +0800 :INFO: Patchmgr plugin start: Prereq check for exposure to bug 17854520 v1.1. Details in logfile /backup/ExaImage/patch_11.2.3.3.0.131014.1/patchmgr.stdout.
2014-09-06 14:35:17 +0800 :INFO: This plugin checks dbhomes across all nodes with oracle-user ssh equivalence, but only for those known to the local system. dbhomes that exist only on remote nodes must be checked manually.
2014-09-06 14:35:17 +0800 :SUCCESS: No exposure to bug 17854520 with non-rolling patching
2014-09-06 14:35:18 +0800        :SUCCESS: DONE: Execute plugin check for Patch Check Prereq.
2014-09-06 14:35:18 +0800 1 of 5 :Working: DO: Initiate patch on cells. Cells will remain up. Up to 5 minutes ...
2014-09-06 14:35:30 +0800 1 of 5 :SUCCESS: DONE: Initiate patch on cells.
2014-09-06 14:35:30 +0800 2 of 5 :Working: DO: Waiting to finish pre-reboot patch actions. Cells will remain up. Up to 45 minutes ...
2014-09-06 14:36:30 +0800 Wait for patch pre-reboot procedures


2014-09-06 15:03:13 +0800 2 of 5 :SUCCESS: DONE: Waiting to finish pre-reboot patch actions.
2014-09-06 15:03:13 +0800        :Working: DO: Execute plugin check for Patching ...
2014-09-06 15:03:13 +0800        :SUCCESS: DONE: Execute plugin check for Patching.
2014-09-06 15:03:13 +0800 3 of 5 :Working: DO: Finalize patch on cells. Cells will reboot. Up to 5 minutes ...
2014-09-06 15:03:33 +0800 3 of 5 :SUCCESS: DONE: Finalize patch on cells.
2014-09-06 15:03:33 +0800 4 of 5 :Working: DO: Wait for cells to reboot and come online. Up to 120 minutes ...
2014-09-06 15:04:33 +0800 Wait for patch finalization and reboot

||||| Minutes left 076

2014-09-06 16:01:39 +0800 4 of 5 :SUCCESS: DONE: Wait for cells to reboot and come online.
2014-09-06 16:01:39 +0800 5 of 5 :Working: DO: Check the state of patch on cells. Up to 5 minutes ...
2014-09-06 16:02:14 +0800 5 of 5 :SUCCESS: DONE: Check the state of patch on cells.
2014-09-06 16:02:14 +0800        :Working: DO: Execute plugin check for Post Patch ...
2014-09-06 16:02:14 +0800 :INFO: /backup/ExaImage/patch_11.2.3.3.0.131014.1/plugins/001-post_11_2_3_3_0 - 17718598: Correct /etc/oracle-release.
2014-09-06 16:02:14 +0800 :INFO: /backup/ExaImage/patch_11.2.3.3.0.131014.1/plugins/001-post_11_2_3_3_0 - 17908298: Preserve password quality policies where applicable.
2014-09-06 16:02:15 +0800        :SUCCESS: DONE: Execute plugin check for Post Patch.

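Once the upgrade script is running, it prints a series of Working and SUCCESS lines. If any step reports Failed, the upgrade is interrupted and the problem has to be resolved before continuing. The storage nodes reboot automatically during the upgrade; from the compute node this shows up in the log as "SUCCESS: DONE: Wait for cells to reboot and come online." The script usually takes more than an hour and a half to finish on the compute node (in this run, from 14:32 to 16:02). Afterwards, check the image version to confirm that the upgrade succeeded. The network connection must stay up for the whole run, because the upgrade is driven from the compute node. It is therefore best to run the upgrade inside a VNC session, so that a dropped terminal does not cause unpredictable problems.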
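
The final version check can reuse the `imageinfo` command from step 5. As a sketch, the hypothetical helper `cells_not_at_version` below reads dcli-prefixed `imageinfo` output and prints every cell whose active image version does not start with the target release. It assumes the "Active image version:" line keeps the format shown in step 5.

```shell
# Hypothetical helper: print "host version" for every cell whose active
# image version does not begin with the target release given as $1.
# Input is dcli-prefixed `imageinfo` output on stdin. The match is
# case-sensitive, so "Inactive image version:" lines are not picked up.
cells_not_at_version() {
  local target="$1"
  awk -v target="$target" '
    /Active image version:/ {
      host = $1; sub(/:$/, "", host)
      if (index($NF, target) != 1) print host, $NF
    }'
}
```

For example, `dcli -g cell_group -l root 'imageinfo' | cells_not_at_version 11.2.3.3.0` should print nothing once all three cells are on the new image.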
