记一次ora.asm服务无法启动和GPnP Get Item的问题

版权声明:本文为Buddy Yuan原创文章,未经允许不得转载。原文地址:记一次ora.asm服务无法启动和GPnP Get Item的问题
昨天晚上去打DB PSU,这是一套11.2.0.3 RAC。很快我就把数据库和集群全部停止,然后把补丁安装完了。我心想还是蛮顺利的啊,结果在重启集群的时候,突然发现集群没办法起来。通过crsctl check crs可以发现无法启动crsd的服务,而我去查看crsd.log的时候发现根本没有任何记录。一直是空的。然后我通过ocrcheck命令检查会一直hang住,执行crsctl query css votedisk确是好的,很快都能查到。然后我又执行了kfed去读取asm磁盘,发现也是能读的。通过进一步检查,发现是ora.asm服务启动不成功。进一步登陆到asm实例上查看asm的alert日志,发现能读取spfile文件,并且实例是能正常启动的。我通过查看asm的日志发现,磁盘组并没有走到最后一步mount成功的地方。如下所示:

SQL> ALTER DISKGROUP ALL MOUNT /* asm agent call crs *//* {0:0:2} */ 
NOTE: Diskgroups listed in ASM_DISKGROUPS are
 	 DG_FRA
 	 DG_DATA
NOTE: Diskgroup used for Voting files is:
 	 DG_OCR
Diskgroup with spfile:DG_OCR
Diskgroup used for OCR is:DG_OCR
NOTE: cache registered group DG_DATA number=1 incarn=0x50ec3256
NOTE: cache began mount (not first) of group DG_DATA number=1 incarn=0x50ec3256
NOTE: cache registered group DG_FRA number=2 incarn=0x50fc3257
NOTE: cache began mount (not first) of group DG_FRA number=2 incarn=0x50fc3257
NOTE: cache registered group DG_OCR number=3 incarn=0x50fc3258
NOTE: cache began mount (not first) of group DG_OCR number=3 incarn=0x50fc3258
NOTE: Assigning number (1,3) to disk (/dev/diskgroup/dg_data4)
NOTE: Assigning number (1,2) to disk (/dev/diskgroup/dg_data3)
NOTE: Assigning number (1,1) to disk (/dev/diskgroup/dg_data2)
NOTE: Assigning number (1,0) to disk (/dev/diskgroup/dg_data1)
NOTE: Assigning number (2,0) to disk (/dev/diskgroup/dg_fra)
NOTE: Assigning number (3,0) to disk (/dev/diskgroup/dg_ocr)
GMON querying group 1 at 4 for pid 23, osid 128640
NOTE: cache opening disk 0 of grp 1: DG_DATA_0000 path:/dev/diskgroup/dg_data1
NOTE: F1X0 found on disk 0 au 2 fcn 0.0
NOTE: cache opening disk 1 of grp 1: DG_DATA_0001 path:/dev/diskgroup/dg_data2
NOTE: cache opening disk 2 of grp 1: DG_DATA_0002 path:/dev/diskgroup/dg_data3
NOTE: cache opening disk 3 of grp 1: DG_DATA_0003 path:/dev/diskgroup/dg_data4
NOTE: cache mounting (not first) external redundancy group 1/0x50EC3256 (DG_DATA)

可以看到日志前面都是正常能够读盘的,中间做了一些其他的动作,在最后面出现下列错误:

NOTE: detached from domain 3
NOTE: cache dismounted group 3/0x50FC3258 (DG_OCR) 
NOTE: cache ending mount (fail) of group DG_OCR number=3 incarn=0x50fc3258
NOTE: cache deleting context for group DG_OCR 3/0x50fc3258
GMON dismounting group 3 at 9 for pid 23, osid 128640
NOTE: Disk  in mode 0x8 marked for de-assignment
ERROR: diskgroup DG_OCR was not mounted

接下来我通过asm实例进入到数据库查了一下,在正常实例的磁盘组是mounted状态,而异常的数据库实例里面磁盘组的状态是mounting状态。也就是正在mount的状态。这里的一个疑点就是怎么mount不上。
为了把这个疑点弄清楚,我做了一个全局的hang分析。这里必须要做全局的才能知道磁盘组为什么无法mount。

*** 2018-10-26 04:36:34.280
===============================================================================
HANG ANALYSIS:
  instances (db_name.oracle_sid): +asm.+asm2, +asm.+asm1
  oradebug_node_dump_level: 3
  analysis initiated by oradebug
  os thread scheduling delay history: (sampling every 1.000000 secs)
    0.000000 secs at [ 04:36:34 ]
      NOTE: scheduling delay has not been sampled for 0.184529 secs    0.000000 secs from [ 04:36:30 - 04:36:35 ], 5 sec avg
    0.000000 secs from [ 04:35:34 - 04:36:35 ], 1 min avg
    0.000000 secs from [ 04:31:34 - 04:36:35 ], 5 min avg
===============================================================================
 
Chains most likely to have caused the hang:
 [a] Chain 1 Signature: 'GPnP Get Item'<='DFS lock handle'
     Chain 1 Signature Hash: 0xe08712d6
 [b] Chain 2 Signature: 'GPnP Get Item'<='rdbms ipc reply'<='enq: DD - contention'
     Chain 2 Signature Hash: 0xec6b634f
 [c] Chain 3 Signature: 'GPnP Get Item'<='rdbms ipc reply'<='enq: DD - contention'
     Chain 3 Signature Hash: 0xec6b634f
 
===============================================================================
Non-intersecting chains:
 
-------------------------------------------------------------------------------
Chain 1:
-------------------------------------------------------------------------------
    Oracle session identified by:
    {
                instance: 2 (+asm.+asm2)
                   os id: 88049
              process id: 23, oracle@KSPX-DB2 (TNS V1-V3)
              session id: 1427
        session serial #: 3
    }
    is waiting for 'DFS lock handle' with wait info:
    {
                      p1: 'type|mode'=0x43490005
                      p2: 'id1'=0x3c
                      p3: 'id2'=0x2
            time in wait: 49 min 17 sec
           timeout after: never
                 wait id: 107
                blocking: 0 sessions
             current sql: ALTER DISKGROUP ALL MOUNT /* asm agent call crs *//* {0:0:2} */
             short stack: ksedsts()+461<-ksdxfstk()+32<-ksdxcb()+1876<-sspuser()+112<-__sighandler()<-semtimedop()+10<-skgpwwait()+160<-ksliwat()+1865<-kslwaitctx()+163<-kjusuc()+3619<-ksigeti()+3781<-ksbcic_int()+21066<-ksbcic()+12<-kfgbSendPeelWait()+53<-kfgFinalizeMount()+4736<-kfgscFinalize()+1405<-kfgForEachKfgsc()+193<-kfgsoFinalize()+135<-kfgFinalize()+396<-kfxdrvMount()+4781<-kfxdrvEntry()+2218<-opiexe()+20368<-opiosq0()+2948<-kpooprx()+274<-kpoal8()+829<-opiodr()+916<-ttcpip()+2242<-opitsk()+1673<-opiino()+966<-opiodr()+916<-opidr wait history: * time between current wait and wait #1: 0.000149 sec 1. event: 'rdbms ipc reply' time waited: 0.000338 sec wait id: 106 p1: 'from_process'=0x12 p2: 'timeout'=0x7fffffff * time between wait #1 and #2: 0.000054 sec 2. event: 'kfk: async disk IO' time waited: 0.000182 sec wait id: 105 p1: 'count'=0x1 p2: 'intr'=0x0 p3: 'timeout'=0xffffffff * time between wait #2 and #3: 0.000157 sec 3. event: 'rdbms ipc reply' time waited: 0.000372 sec wait id: 104 p1: 'from_process'=0x12 p2: 'timeout'=0x7fffffff } and is blocked by => Oracle session identified by:
    {
                instance: 1 (+asm.+asm1)
                   os id: 15643
              process id: 18, oracle@KSPX-DB1 (RBAL)
              session id: 1117
        session serial #: 1
    }
    which is waiting for 'GPnP Get Item' with wait info:
    {
            time in wait: 221831 min 33 sec
           timeout after: never
                 wait id: 124844244
                blocking: 44 sessions
             current sql: 
             short stack: ksedsts()+461<-ksdxfstk()+32<-ksdxcb()+1876<-sspuser()+112<-__sighandler()<-__poll()+47<-sgipcwWaitHelper()+5271<-sgipcwWait()+475<-gipcWaitOsd()+281<-gipcInternalWait()+14965<-gipcWaitF()+3234<-gipcInternalRecvSync()+8749<-gipcRecvSyncF()+3343<-clsgpnpm_gipcGets()+297<-clsgpnpm_receiveMsg()+408<-clsgpnpm_exchange()+1059<-clsgpnp_profileCallUrlInt()+3568<-clsgpnp_getProfileEx()+325<-clsgpnp_dbmsGetItem()+263<-kggpnpAttrGet()+1272<-kfdParseProfileString()+885<-kfdDiscoverShallow()+2023<-kfgbDriver()+1745<-ksbabs()+
            wait history:
              * time between current wait and wait #1: 0.000007 sec
              1.       event: 'GPnP Initialization'
                 time waited: 0.018700 sec
                     wait id: 124844243       
              * time between wait #1 and #2: 0.000207 sec
              2.       event: 'CSS initialization'
                 time waited: 0.000027 sec
                     wait id: 124844242       
              * time between wait #2 and #3: 0.000062 sec
              3.       event: 'rdbms ipc message'
                 time waited: 0.002787 sec
                     wait id: 124844241       p1: 'timeout'=0x12c
    }

通过全局的hang分析,我们可以发现,在节点2 asm启动之后,运行了”ALTER DISKGROUP ALL MOUNT”的命令,而现在这个命令没有执行成功,被阻塞了40多分钟。它是被节点1的rbal进程阻塞的,而该进程正在等待一个叫 “GPnP Get Item”的等待事件。那么既然我们知道问题的原因了,我们就可以开始在mos上查找相关资料了。通过查看文档Diskgroup Mount Hangs with RBAL Waiting on ‘GPnP Get Item’ and ‘enq: DD – contention’ (文档 ID 1375505.1),发现这个问题是这是由未发布的bug:12398300引起的。这个问题的最根本原因就是gpnp进程stuck了,然后就会导致rbal进程产生等待。解决办法其实很简单,就是干掉节点1的gpnp进程就会释放。而干掉gpnp进程集群不会重启,它只会单独的把该进程重启。
通过kill gpnp之后,故障恢复,节点2的集群顺利启动。
这里有一个技巧,以后针对ora.asm服务无法启动的问题,其实都可以使用全局的hang分析。

PostgreSQL识别阻塞会话

版权声明:本文为Buddy Yuan原创文章,未经允许不得转载。原文地址:PostgreSQL识别阻塞会话
在Oracle当中,一般使用v$session中的获取阻塞信息(blocking_session),而在PostgreSQL中,我们来看看怎么识别阻塞会话。我们首先来做个测试。

postgres=# create table t1(id int);
CREATE TABLE

现在我们来给表增加一个字段,我们使用begin来开始事务,但是我们不用end块来结束。

postgres=# begin;
BEGIN
postgres=# alter table t1 add column name text;
ALTER TABLE

此时在另外一个节点执行插入就会hang住,一直等待,因为修改的DDL语句的事务仍然没有结束。

postgres=# insert into t1(id) values(1);

我们可以使用postgresql中的pg_stat_activity视图,可以看到它并不像oracle中的v$session视图,有blocking_session字段。而pg是通过backend_type来实现的。

postgres=# \d pg_stat_activity
                      View "pg_catalog.pg_stat_activity"
      Column      |           Type           | Collation | Nullable | Default
------------------+--------------------------+-----------+----------+---------
 datid            | oid                      |           |          |
 datname          | name                     |           |          |
 pid              | integer                  |           |          |
 usesysid         | oid                      |           |          |
 usename          | name                     |           |          |
 application_name | text                     |           |          |
 client_addr      | inet                     |           |          |
 client_hostname  | text                     |           |          |
 client_port      | integer                  |           |          |
 backend_start    | timestamp with time zone |           |          |
 xact_start       | timestamp with time zone |           |          |
 query_start      | timestamp with time zone |           |          |
 state_change     | timestamp with time zone |           |          |
 wait_event_type  | text                     |           |          |
 wait_event       | text                     |           |          |
 state            | text                     |           |          |
 backend_xid      | xid                      |           |          |
 backend_xmin     | xid                      |           |          |
 query            | text                     |           |          |
 backend_type     | text                     |           |          |


postgres=# select datname,pid,usename,wait_event_type,wait_event,state,query from pg_stat_activity where backend_type = 'client backend'
and pid != pg_backend_pid();
 datname  | pid  | usename  | wait_event_type | wait_event |        state        |                query
----------+------+----------+-----------------+------------+---------------------+--------------------------------------
 postgres | 3231 | postgres | Client          | ClientRead | idle in transaction | alter table t1 add column name text;
 postgres | 3386 | postgres | Lock            | relation   | active              | insert into t1(id) values(1);

(2 rows)

我们可以很容易看到阻塞的对象和他们执行的SQL语句。当然这个是会话数比较少的情况,当会话数比较多,通过查看pg_stat_activity来识别阻塞的会话会变得很棘手。

当您想知道PostgreSQL中当前持有或者是授权那些锁的时候,可以查看pg_locks,主要看RowExclusiveLock和AccessExclusiveLock。

postgres=# \d pg_locks
                   View "pg_catalog.pg_locks"
       Column       |   Type   | Collation | Nullable | Default
--------------------+----------+-----------+----------+---------
 locktype           | text     |           |          |
 database           | oid      |           |          |
 relation           | oid      |           |          |
 page               | integer  |           |          |
 tuple              | smallint |           |          |
 virtualxid         | text     |           |          |
 transactionid      | xid      |           |          |
 classid            | oid      |           |          |
 objid              | oid      |           |          |
 objsubid           | smallint |           |          |
 virtualtransaction | text     |           |          |
 pid                | integer  |           |          |
 mode               | text     |           |          |
 granted            | boolean  |           |          |
 fastpath           | boolean  |           |          |

postgres=# select locktype,database,relation,pid,mode,granted from pg_locks where pid != pg_backend_pid();
   locktype    | database | relation | pid  |        mode         | granted
---------------+----------+----------+------+---------------------+---------
 virtualxid    |          |          | 3386 | ExclusiveLock       | t
 virtualxid    |          |          | 3231 | ExclusiveLock       | t
 relation      |    13806 |    16434 | 3231 | AccessExclusiveLock | t
 transactionid |          |          | 3231 | ExclusiveLock       | t
 relation      |    13806 |    16434 | 3386 | RowExclusiveLock    | f
 relation      |    13806 |    16439 | 3231 | AccessExclusiveLock | t
 relation      |    13806 |    16437 | 3231 | ShareLock           | t
(7 rows)

上面,可以看到会话3386尝试访问行级别锁,但是没有授予。这个是当前插入的会话。我们可以把pg_locks与pg_database和pg_class,通过加入pid来获取更多的信息,如下所示:

postgres=# select b.locktype,d.datname,c.relname,b.pid,b.mode from pg_locks b,pg_database d,pg_class c where b.pid in (3386,3231) and b.database = d.oid and b.relation = c.oid;
 locktype | datname  | relname | pid  |        mode
----------+----------+---------+------+---------------------
 relation | postgres | t1      | 3231 | AccessExclusiveLock
 relation | postgres | t1      | 3386 | RowExclusiveLock
(2 rows)

那么我们如何识别阻塞会话呢?很简单,使用pg_blocking_pids函数传递被堵塞的会话,就可以看到阻塞者。

postgres=# select pg_blocking_pids(3386);
 pg_blocking_pids 
------------------
 {3231}
(1 row)

然后如果想杀掉这个会话可以使用pg_terminate_backend函数传入阻塞的源头会话。然后就可以插入成功了。

postgres=# select pg_terminate_backend(3231);
 pg_terminate_backend 
----------------------
 t

PostgreSQL中如何获取对象的DDL语句

版权声明:本文为Buddy Yuan原创文章,未经允许不得转载。原文地址:PostgreSQL中如何获取对象的DDL语句
我们在Oracle当中,可以使用dbms_metadata.get_ddl()方法去查看创建的语句。也可以使用expdp/impdp生成元数据dmp文件,在转换成sqlfile去查看。但是在大PG里面我们应该怎么做呢?

postgres=# create table t1(id int,name varchar(30));
CREATE TABLE
postgres=# insert into t1(id,name) values (generate_series(1,1000000),'test'||generate_series(1,1000000));
INSERT 0 1000000
postgres=# select count(*) from t1;
  count
---------
 1000000
(1 row)

postgres=# create unique index idx1 on t1(id);
CREATE INDEX

postgres=# create index idx2 on t1(name);
CREATE INDEX

postgres=# create view v1 as select id from t1;
CREATE VIE

postgres=# alter table t1 add constraint con1 check (id< 2000000);
ALTER TABLE

create function add(int,int) returns int
as
'select $1 + $2;'
language sql
immutable returns null on null input;

postgres=# select add(1,2);
 add
-----
   3
(1 row)

PostgreSQL附带了一组函数,具体可以查看https://www.postgresql.org/docs/current/static/functions-info.html,一些函数可以获得对象的定义。如获取视图的定义可以使用pg_get_viewde,获取触发器可以使用pg_get_triggerdef,获取函数可以使用pg_get_functiondef,获取约束可以使用pg_get_constraintdef。

postgres=# select pg_get_viewdef('v1');
 pg_get_viewdef
----------------
  SELECT t1.id +
    FROM t1;
(1 row)

postgres=# SELECT conname, pg_get_constraintdef(r.oid, true) as definition FROM pg_constraint r WHERE r.conrelid = 't1'::regclass;
 conname |      definition
---------+----------------------
 con1    | CHECK (id < 2000000)
(1 row)

postgres=# SELECT proname,pg_get_functiondef(a.oid) FROM pg_proc a WHERE a.proname = 'add';
 proname |                   pg_get_functiondef
---------+---------------------------------------------------------
 add     | CREATE OR REPLACE FUNCTION public.add(integer, integer)+
         |  RETURNS integer                                       +
         |  LANGUAGE sql                                          +
         |  IMMUTABLE STRICT                                      +
         | AS $function$select $1 + $2;$function$                 +
         |
(1 row)

postgres=# select pg_get_indexdef('idx1'::regclass);
                    pg_get_indexdef
--------------------------------------------------------
 CREATE UNIQUE INDEX idx1 ON public.t1 USING btree (id)
(1 row)

但是这些功能里面没有获取表定义的。唯一的办法是使用pg_dump。 使用pg_dump我们可以把表还有索引的语句都dump出来。这里使用-s选项(schema only)和-t选项(tables)。

-bash-4.2$ pg_dump -s -t t1 postgres | egrep -v "^--|^$"
SET statement_timeout = 0;
SET lock_timeout = 0;
SET idle_in_transaction_session_timeout = 0;
SET client_encoding = 'UTF8';
SET standard_conforming_strings = on;
SELECT pg_catalog.set_config('search_path', '', false);
SET check_function_bodies = false;
SET client_min_messages = warning;
SET row_security = off;
SET default_tablespace = '';
SET default_with_oids = false;
CREATE TABLE public.t1 (
    id integer,
    name character varying(30),
    CONSTRAINT con1 CHECK ((id < 2000000))
);
ALTER TABLE public.t1 OWNER TO postgres;
CREATE UNIQUE INDEX idx1 ON public.t1 USING btree (id);
CREATE INDEX idx2 ON public.t1 USING btree (name);

当然还可以使用pl/perl语言的扩展来实现这件事。

postgres=# create extension plperlu;
CREATE EXTENSION

postgres=# \dx
                                List of installed extensions
    Name     | Version |   Schema   |                      Description
-------------+---------+------------+-------------------------------------------------------
 pageinspect | 1.6     | public     | inspect the contents of database pages at a low level
 plpgsql     | 1.0     | pg_catalog | PL/pgSQL procedural language
 plperlu     | 1.0     | pg_catalog | PL/PerlU untrusted procedural language
(3 rows)

postgres=# CREATE OR REPLACE FUNCTION system(text) RETURNS text
AS 'my $cmd=shift; return `cd /tmp;$cmd`;' LANGUAGE plperlu;
CREATE FUNCTION

postgres=# select system('pg_dump -s -t t1 ddl | egrep -v "^--|^$"');
                    system
----------------------------------------------------------------
SET statement_timeout = 0;                                     +
SET lock_timeout = 0;                                          +
SET idle_in_transaction_session_timeout = 0;                   +
SET client_encoding = 'UTF8';                                  +
SET standard_conforming_strings = on;                          +
SELECT pg_catalog.set_config('search_path', '', false);        +
SET check_function_bodies = false;                             +
SET client_min_messages = warning;                             +
SET row_security = off;                                        +
SET default_tablespace = '';                                   +
SET default_with_oids = false;                                 +
CREATE TABLE public.t1 (                                       +
    id integer,                                                +
    name character varying(30),                                +
    CONSTRAINT con1 CHECK ((id < 2000000))                     +
);                                                             +
ALTER TABLE public.t1 OWNER TO postgres;                       +
CREATE UNIQUE INDEX idx1 ON public.t1 USING btree (id);        +
CREATE INDEX idx2 ON public.t1 USING btree (name);             +

Ansible入门学习

版权声明:本文为Buddy Yuan原创文章,未经允许不得转载。原文地址:ANSIBLE入门学习
当我们的基础设备和环境变得越来越多的时候,我们需要一个工具来帮助我们管理服务器,无论是物理机还是虚拟机。Ansible这个工具可以帮助我们进行配置、管理和大规模进行部署。而且Ansible是基于ssh来执行的,不需要在其他服务器上安装agent类的插件。现在就开始来入门。
当前我的系统是Ubuntu 16.04.4 LTS,将作为ansible的管理机器。使用virtualbox的网卡,IP地址是192.168.56.1。而我的虚拟机使用的是centos系统,IP地址是192.168.56.91。这个虚拟机将要是被我Ansible管理的机器。
第一步,我们需要安装ansible,我这里直接使用

postgres@postgres-N65S01:~$ sudo apt-get install ansible
[sudo] password for postgres: 
Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following additional packages will be installed:
  ieee-data python-crypto python-ecdsa python-httplib2 python-jinja2 python-markupsafe python-netaddr python-paramiko python-pkg-resources python-selinux python-six python-yaml
Suggested packages:
  sshpass python-crypto-dbg python-crypto-doc python-jinja2-doc ipython python-netaddr-docs python-setuptools
The following NEW packages will be installed:
  ansible ieee-data python-crypto python-ecdsa python-httplib2 python-jinja2 python-markupsafe python-netaddr python-paramiko python-pkg-resources python-selinux python-six python-yaml
0 upgraded, 13 newly installed, 0 to remove and 95 not upgraded.
Need to get 2,967 kB of archives.
After this operation, 17.9 MB of additional disk space will be used.
Do you want to continue? [Y/n] y

安装完成之后,通过查看一下版本命令看看是否正常运行。

postgres@postgres-N65S01:~$ ansible --version
ansible 2.0.0.2
  config file = /etc/ansible/ansible.cfg
  configured module search path = Default w/o overrides

在开始使用Ansible之前(因为我们不想使用root),所以两个主机都会创建一个组并创建一个用户。同时还要生成密钥。生成密钥非常重要。因为我们将使用无密码ssh身份验证从控制主机到托管主机进行通信。

postgres@postgres-N65S01:~$ sudo groupadd ansible
postgres@postgres-N65S01:~$ sudo useradd -g ansible ansible
postgres@postgres-N65S01:~$ sudo passwd ansible
Enter new UNIX password: 
Retype new UNIX password: 
passwd: password updated successfully
postgres@postgres-N65S01:/home$ su - ansible
Password: 
$ ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/home/postgres/.ssh/id_rsa): 
Enter passphrase (empty for no passphrase): 
Enter same passphrase again: 
Your identification has been saved in /home/postgres/.ssh/id_rsa.
Your public key has been saved in /home/postgres/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:z0mFh7iuQig7vGlg55GA8i9LyysG9X32fiR41+cdvEQ postgres@postgres-N65S01
The key's randomart image is:
+---[RSA 2048]----+
|                 |
|         . o     |
|.       . o o    |
|o..      . o   E |
|.o.o..  S..  .o  |
|+.o+o ..++o.o .+.|
|=o+o.  o.o++  .o+|
|+*ooo  .  . .  .o|
|o=*+ ..  ...     |
+----[SHA256]-----+

为了不需要密码来控制,我们需要将ssh公钥文件从控制主机复制到托管主机:

$ ssh-copy-id -i .ssh/id_rsa.pub ansible@192.168.56.91
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: ".ssh/id_rsa.pub"
The authenticity of host '192.168.56.91 (192.168.56.91)' can't be established.
ECDSA key fingerprint is SHA256:/+Re8LQTEBXAvC2rNaTpKiuO5vAL+4yBZvRa3soV0zs.
Are you sure you want to continue connecting (yes/no)? yes
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
ansible@192.168.56.91's password: 

Number of key(s) added: 1

Now try logging into the machine, with:   "ssh 'ansible@192.168.56.91'"
and check to make sure that only the key(s) you wanted were added.

当然我们还需要添加主机设置,这样控制机才知道它应管理哪些主机。因为我们使用ansible用户单独管理,而不是root,这里把权限修改一下。

postgres@postgres-N65S01:/etc$ ls -l ansible/*
-rw-r--r-- 1 root root 10301 1月  15  2016 ansible/ansible.cfg
-rw-r--r-- 1 root root   982 8月  21 23:13 ansible/hosts
postgres@postgres-N65S01:/etc$ sudo chown -R ansible:ansible /etc/ansible/*
postgres@postgres-N65S01:/etc$ su - ansible
Password: 
$ cat /etc/ansible/hosts
# This is the default ansible 'hosts' file.
#
# It should live in /etc/ansible/hosts
#
#   - Comments begin with the '#' character
#   - Blank lines are ignored
#   - Groups of hosts are delimited by [header] elements
#   - You can enter hostnames or ip addresses
#   - A hostname/ip can be a member of multiple groups

# Ex 1: Ungrouped hosts, specify before any group headers.

#green.example.com
#blue.example.com
#192.168.100.1
#192.168.100.10

# Ex 2: A collection of hosts belonging to the 'webservers' group

#[webservers]
#alpha.example.org
#beta.example.org
#192.168.1.100
#192.168.1.110

# If you have multiple hosts following a pattern you can specify
# them like this:

#www[001:006].example.com

# Ex 3: A collection of database servers in the 'dbservers' group

#[dbservers]
#
#db01.intranet.mydomain.net
#db02.intranet.mydomain.net
#10.25.1.56
#10.25.1.57

# Here's another example of host ranges, this time there are no
# leading 0s:

#db-[99:101]-node.example.com

这里给出了一系列的示例,我们这里设置如下,括号中的名称是所谓的组名。这意味着在Ansible命令中引用“pg-servers”,会将组名解析为为服务器地址。让我们做一个基本测试。

[pg-servers]
192.168.56.91

$ ansible pg-servers -a "/bin/echo I love ansible"
192.168.56.91 | SUCCESS | rc=0 >>
I love ansible

$ ansible pg-servers -a "/bin/mkdir -p aaa"
192.168.56.91 | SUCCESS | rc=0 >>

postgres@postgres-N65S01:/etc$ ssh ansible@192.168.56.91
ansible@192.168.56.91's password: 
Last failed login: Wed Oct 24 23:41:56 CST 2018 from 192.168.56.1 on ssh:notty
There was 1 failed login attempt since the last successful login.
Last login: Wed Oct 24 23:41:45 2018 from 192.168.56.1
[ansible@db ~]$ ls -lrt
total 0
drwxrwxr-x. 2 ansible ansible 6 10月 24 23:41 aaa

可以看到ansible执行了我们相要的操作。非常酷。

PostgreSQL创建表和索引,磁盘会发生什么情况?

版权声明:本文为Buddy Yuan原创文章,未经允许不得转载。原文地址:PostgreSQL创建表和索引,磁盘会发生什么情况?
当我们在PostgreSQL中创建一个表或一个索引时,磁盘上的文件是怎么创建的,我们来研究一下?

postgres=# select version();
                                                 version                                                 
---------------------------------------------------------------------------------------------------------
 PostgreSQL 10.4 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-28), 64-bit
(1 row)

首先我们来创建一个表。接下来我们使用pg_relation_filepath函数,获取到该表对应的文件名。

postgres=# create table t1(name varchar(30));
CREATE TABLE
postgres=# select pg_relation_filepath('t1');
 pg_relation_filepath 
----------------------
 base/13806/16384
(1 row)

或者我们也可以使用oid2name程序,指定数据库名字和表名即可。

-bash-4.2$ ./oid2name 
All databases:
    Oid  Database Name  Tablespace
----------------------------------
  13806       postgres  pg_default
  13805      template0  pg_default
      1      template1  pg_default

-bash-4.2$ ./oid2name -d postgres -t t1
From database "postgres":
  Filenode  Table Name
----------------------
     16384          t1

此时我们可以查看该文件.

-bash-4.2$ ls -lrt $PGDATA/base/13806/16384
-rw-------. 1 postgres postgres 0 Oct 24 11:18 /var/lib/pgsql/10/data/base/13806/16384

可以看到文件字节大小是0,此时并没有任何数据,没有以“_fsm”后缀的Free Space Mapping文件和以”vm”为后缀的visibility map文件。

-bash-4.2$ ls -lrt $PGDATA/base/13806/16384*
-rw-------. 1 postgres postgres 0 Oct 24 11:18 /var/lib/pgsql/10/data/base/13806/16384

接下来我们试下在这个表上创建索引会发生什么情况。

postgres=# create index idx1 on t1 (name);
CREATE INDEX
postgres=# select pg_relation_filepath('idx1');
 pg_relation_filepath 
----------------------
 base/13806/16387
(1 row)

此时可以发现创建的文件是有大小的。可以看到这个paag是一个元数据的page。

-bash-4.2$ ls -lrt $PGDATA/base/13806/16387*
-rw-------. 1 postgres postgres 8192 Oct 24 15:38 /var/lib/pgsql/10/data/base/13806/16387

我们可以通过创建扩展包pageinspect工具来研究下page(物理文件的单位,默认大小为8K)。

postgres=# create extension pageinspect;
CREATE EXTENSION
postgres=# SELECT * FROM bt_metap('idx1');
 magic  | version | root | level | fastroot | fastlevel 
--------+---------+------+-------+----------+-----------
 340322 |       2 |    0 |     0 |        0 |         0
(1 row)

postgres=# SELECT * FROM bt_page_stats('idx1',0);
ERROR:  block 0 is a meta page

如果我们在插入一条数据看看,可以看到文件有了大小。用strings可以看到里面的内容。

postgres=# insert into t1 values('buddy');
INSERT 0 1

-bash-4.2$ ls -lrt $PGDATA/base/13806/16384*
-rw-------. 1 postgres postgres 8192 Oct 24 15:53 /var/lib/pgsql/10/data/base/13806/16384

-bash-4.2$ strings /var/lib/pgsql/10/data/base/13806/16384
buddy

此时还是没有Free Space Mapping文件和visibility map文件。需要vacuum一下。此时文件的大小变成了8k。

postgres=# vacuum t1;
VACUUM

-bash-4.2$ ls -lrt $PGDATA/base/13806/16384*
-rw-------. 1 postgres postgres  8192 Oct 24 15:54 /var/lib/pgsql/10/data/base/13806/16384
-rw-------. 1 postgres postgres  8192 Oct 24 15:57 /var/lib/pgsql/10/data/base/13806/16384_vm
-rw-------. 1 postgres postgres 24576 Oct 24 15:57 /var/lib/pgsql/10/data/base/13806/16384_fsm

oracle18c新特性-用户NO AUTHENTICATION认证方式和Proxy user连接

版权声明:本文为Buddy Yuan原创文章,未经允许不得转载。原文地址:oracle18c新特性-用户NO AUTHENTICATION认证方式和Proxy user连接

使用Oracle18.3,我们可以在没有密码的情况下创建用户。实际上出于安全原因,没有人可以直接连接到应用程序的架构是一个非常好的功能。一个很好方法是使用代理连接,实际上连接为t1用户但使用t2用户的密码,。下面我们就来研究测试一下。
创建T1用户,赋予create session,resource权限;

create user t1 identified by t1;
grant create session,resource to t1;

创建T2用户,只赋予create session权限;

create user t2 identified by t2;
grant create session to t2;

赋予允许proxy连接的权限.

alter user t1 grant connect through t2;

使用代理连接的方式如下,这样就不用输入t1用户的密码,就连上了t1用户

Usage:  ::= [][/][@]

SQL> connect t2[t1]/t2@ORCLPDB1
Connected.
SQL> select sys_context('USERENV','SESSION_USER') as session_user,sys_context('USERENV','SESSION_SCHEMA') as session_schema,sys_context('USERENV','PROXY_USER') as proxy from dual;

SESSION_USER      SESSION_SCHEMA      PROXY
----------------- ------------------- ------------
T1                T1                  T2

当我们把T1账号锁定一下,会发现使用代理的连接会被锁定。

SQL> alter user t1 account lock;
User altered.

SQL> connect t2[t1]/t2@ORCLPDB1
ERROR:
ORA-28000: The account is locked.

搞明白代理用户的连接方式,接下来我们来研究NO AUTHENTICATION认证。这里先把t1和t2用户删除。

以下是18c的新特性。使用NO AUTHENTICATION,可以创建一个没有密码的用户。也没有办法登陆这个用户。

NO AUTHENTICATION Clause
Use the NO AUTHENTICATION clause to create a schema that does not have a password and cannot be logged into. This is intended for schema only accounts and reduces maintenance by removing default passwords and any requirement to rotate the password.

SQL> create user t1 no authentication;
User created.

SQL> grant create session to t1; 
Grant succeeded

SQL> connect t1@ORCLPDB1
Enter password: 
ERROR:
ORA-01017: invalid username/password; logon denied

可以看到这里是没办法按登录的。需要输入密码。我们可以使用proxy连接试试看能不能登陆。

SQL> create user t2 identified by t2;
User created.

SQL> alter user t1 grant connect through t2;
User altered.

SQL> connect t2[t1]/t2@ORCLPDB1
Connected.

可以看到这里可以通过proxy用户进行连接了。即使t1用户没有密码也可以连接。当然这里有个问题,就是这么做有什么作用?我们的一个想法是,让t1用户里面有数据,然后t1用户是没有密码且不能登录的。然后我们通过代理t2用户对t1用户下的数据进行查询和其他操作。这样我们就影藏了数据库的用户。全程通过代理用户做所有的操作。我们来测试一下。
1.先删除t1和t2用户

drop user t1 cascade;
drop user t2 cascade;

2.创建t1,t2用户,t1使用no authentication认证方式。,同时赋权让t2成为proxy用户。

SQL> create user t1 no authentication;
User created.
 
SQL> grant create session , create table to t1;
Grant succeeded.

SQL> create user t2 identified by t2;
User created.

SQL> alter user t1 grant connect through t2;
User altered.

3.测试使用proxy方式连接。

SQL> connect t2[t1]/t2@ORCLPDB1
Connected.

4.直接登陆,需要输入密码,此时无法登陆

SQL> connect t1/t1@ORCLPDB1
ERROR:
ORA-01017: invalid username/password; logon denied

5.授权sysdba给用户t1,发现管理权限不能被授权给该用户,因为该用户使用的是no authentication认证方式。

SQL> connect sys/oracle@ORCLPDB1 as sysdba;
Connected.

SQL> grant sysdba to t1;
grant sysdba to t1
*
ERROR at line 1:
ORA-40366: Administrative privilege cannot be granted to this user.

6.把no authentication认证方式修改成密码认证。此时就可以授予sysdba权限了

SQL> alter user t1 identified by t1; 
User altered.
 
SQL> grant sysdba to t1;
Grant succeeded.

7.再次修改成no authentication认证方式,此时是不行的,不能把sysdba权限用户设置成no authentication认证方式,收回sysdba权限,就又能成功了。

SQL> alter user t1 no authentication;
alter user t1 no authentication
*
ERROR at line 1:
ORA-40367: An Administrative user cannot be altered to have no authentication
type.

SQL> revoke sysdba from t1;
Revoke succeeded.
 
SQL> alter user t1 no authentication;
User altered.

8.再次取消no authentication认证方式,在t1用户下创建表和插入数据.

SQL> connect sys/oracle@ORCLPDB1 as sysdba;
Connected.

SQL> alter user t1 identified by t1;

SQL> connect t1/t1@ORCLPDB1
Connected.

SQL> create table t1 (name varchar2(32));
Table created.
 
SQL> insert into t1 values('buddy');
1 row created.
 
SQL> commit;
Commit complete.

9.再次重新设置no authentication认证方式,通过proxy方式进行连接,可以查看表的数据和插入数据。此时虽然表属于t1用户,但是t1用户是没有密码和不能登录的,只能通过t2用户使用proxy连接方式登录操作。

SQL> connect sys/oracle@ORCLPDB1 as sysdba;
Connected.

SQL> alter user t1 no authentication;
User altered.

SQL> connect t2[t1]/t2@ORCLPDB1
Connected.

SQL> select * from t1;
NAME
--------------------------------
buddy

SQL> insert into t1 values('buddy99');
1 row created.

SQL> commit;
Commit complete.

SQL> select object_name, object_type, owner from all_objects where object_name ='T1';

OBJECT_NAME          OBJECT_TYPE             OWNER
-------------------- ----------------------- ------------------------------
T1                   TABLE                   T1

那么我们能不能使用审计来审计是什么用户干的呢,能不能看到代理用户干的?

SQL> connect sys/oracle@ORCLPDB1 as sysdba;
Connected.

SQL> create audit policy t1_user_audit_policy privileges create table when 'SYS_CONTEXT(''USERENV'',''SESSION_USER'') = ''T1''' evaluate per session container=current;
Audit policy created

SQL> audit policy t1_user_audit_policy whenever successful;
Audit succeeded.

SQL> connect t2[t1]/t2@ORCLPDB1
Connected.
SQL> create table a1 (name varchar2(30));
Table created.

查看审计unified_audit_trail表,可以看到DBPROXY_USERNAME字段不为空就是代理用户T2创建的。

SQL> connect sys/oracle@ORCLPDB1 as sysdba;
Connected.

SQL> select object_name,event_timestamp, dbusername, dbproxy_username from unified_audit_trail where action_name = 'CREATE TABLE';
OBJECT_NAME                    EVENT_TIMESTAMP                DBUSERNAME                     DBPROXY_USERNAME
------------------------------ ------------------------------ ------------------------------ ------------------------------
A1                             23-OCT-18 10.21.33.073649 PM   T1                             T2

使用PostgreSQL分析操作系统命令及文件

版权声明:本文为Buddy Yuan原创文章,未经允许不得转载。原文地址:使用PostgreSQL分析操作系统命令及文件
通过在Linux安装了sysstat,可以实现自动采集性能数据存放在/var/log/sa中。要开启自动采集,需要开启syssata服务,在centos 7当中使用下列命令:

systemctl enable sysstat
systemctl start sysstat

当然自动采集是10分钟一次,为了实验方便,我修改了配置文件调整成1分钟1次。

[root@db sa]# more /etc/cron.d/sysstat 
# Run system activity accounting tool every 10 minutes
*/1 * * * * root /usr/lib64/sa/sa1 1 1
# 0 * * * * root /usr/lib64/sa/sa1 600 6 &
# Generate a daily summary of process accounting at 23:53
53 23 * * * root /usr/lib64/sa/sa2 -A

接下来我们运行sar -f命令查看自动生成的文件.

[root@db sa]# sar -f sa22 | head -20
Linux 3.10.0-693.21.1.el7.x86_64 (db.gowithme.tw) 	2018年10月22日 	_x86_64_	(1 CPU)

21时48分59秒       LINUX RESTART
21时59分54秒       LINUX RESTART

22时00分01秒     CPU     %user     %nice   %system   %iowait    %steal     %idle
22时01分01秒     all      0.02      0.00      0.20      0.03      0.00     99.75
22时02分01秒     all      0.02      0.00      0.12      0.02      0.00     99.85
22时03分01秒     all      0.00      0.00      0.07      0.02      0.00     99.92
22时04分01秒     all      0.00      0.00      0.07      0.00      0.00     99.93
22时05分01秒     all      0.02      0.00      0.08      0.00      0.00     99.90
22时06分01秒     all      0.00      0.00      0.10      0.02      0.00     99.88
22时07分02秒     all      0.02      0.00      0.07      0.00      0.00     99.92
22时08分01秒     all      0.00      0.00      0.07      0.02      0.00     99.91
22时09分01秒     all      0.02      0.00      0.07      0.02      0.00     99.90
22时10分01秒     all      0.00      0.00      0.07      0.00      0.00     99.93
22时11分01秒     all      0.02      0.00      0.07      0.00      0.00     99.92
22时12分01秒     all      0.00      0.00      0.07      0.02      0.00     99.92
22时13分01秒     all      0.00      0.00      0.08      0.00      0.00     99.92

接下来我们就可以使用postgresql将OS命令的结果存储到变量中.

-bash-4.2$ psql 
psql (10.4)
Type "help" for help.

然后查询变量,就可以查看到os的数据.
postgres=# \set varls `ls -la`
postgres=# select :'varls';
                               ?column?                               
----------------------------------------------------------------------
 total 20                                                            +
 dr-xr-xr-x.  18 root    root     239 Oct 22 20:33 .                 +
 dr-xr-xr-x.  18 root    root     239 Oct 22 20:33 ..                +
 lrwxrwxrwx.   1 root    root       7 Mar 31  2018 bin -> usr/bin    +
 dr-xr-xr-x.   5 root    root    4096 Mar 31  2018 boot              +
 drwxr-xr-x.  18 root    root    3040 Oct 22 21:33 dev               +
 drwxr-xr-x.  76 root    root    8192 Oct 22 21:45 etc               +
 drwxr-xr-x.   3 root    root      21 Mar 31  2018 home              +
 lrwxrwxrwx.   1 root    root       7 Mar 31  2018 lib -> usr/lib    +
 lrwxrwxrwx.   1 root    root       9 Mar 31  2018 lib64 -> usr/lib64+
 drwxr-xr-x.   2 root    root       6 Nov  5  2016 media             +
 drwxr-xr-x.   2 root    root       6 Nov  5  2016 mnt               +
 drwxr-xr-x.   3 root    root      39 Mar 31  2018 opt               +
 dr-xr-xr-x. 113 root    root       0 Oct 22 21:33 proc              +
 dr-xr-x---.   3 root    root     163 Mar 31  2018 root              +
 drwxr-xr-x.  23 root    root     700 Oct 22 21:45 run               +
 lrwxrwxrwx.   1 root    root       8 Mar 31  2018 sbin -> usr/sbin  +
 drwxr-xr-x.   2 root    root       6 Nov  5  2016 srv               +
 dr-xr-xr-x.  13 root    root       0 Oct 22 21:33 sys               +
 drwxrwxrwt.   7 root    root     140 Oct 22 21:45 tmp               +
 drwxr-xr-x.  16 root    root     199 Jun 28 10:50 usr               +
 drwxr-xr-x.   1 vagrant vagrant 4096 Oct 22 21:23 vagrant           +
 drwxr-xr-x.  19 root    root     267 Mar 31  2018 var
(1 row)

接下来我们在把sar的结果放在var_iostat变量里面.

postgres=# \set var_iostat `sar -f /var/log/sa/sa22`
postgres=# select :'var_iostat';
                                            ?column?                                             
-------------------------------------------------------------------------------------------------
 Linux 3.10.0-693.21.1.el7.x86_64 (db.gowithme.tw)       10/22/2018      _x86_64_        (1 CPU)+
                                                                                                +
 09:48:59 PM       LINUX RESTART                                                                +
                                                                                                +
 09:59:54 PM       LINUX RESTART                                                                +
                                                                                                +
 10:00:01 PM     CPU     %user     %nice   %system   %iowait    %steal     %idle                +
 10:01:01 PM     all      0.02      0.00      0.20      0.03      0.00     99.75                +
 10:02:01 PM     all      0.02      0.00      0.12      0.02      0.00     99.85                +
 10:03:01 PM     all      0.00      0.00      0.07      0.02      0.00     99.92                +
 10:04:01 PM     all      0.00      0.00      0.07      0.00      0.00     99.93                +
 10:05:01 PM     all      0.02      0.00      0.08      0.00      0.00     99.90                +
 10:06:01 PM     all      0.00      0.00      0.10      0.02      0.00     99.88                +
 10:07:02 PM     all      0.02      0.00      0.07      0.00      0.00     99.92                +
 10:08:01 PM     all      0.00      0.00      0.07      0.02      0.00     99.91                +
 10:09:01 PM     all      0.02      0.00      0.07      0.02      0.00     99.90                +
 10:10:01 PM     all      0.00      0.00      0.07      0.00      0.00     99.93                +
 10:11:01 PM     all      0.02      0.00      0.07      0.00      0.00     99.92                +
 10:12:01 PM     all      0.00      0.00      0.07      0.02      0.00     99.92                +

由于我们只对iowait这一列感兴趣,我们可以使用awk进行过滤。

postgres=# \set var_iostat `sar -f /var/log/sa/sa22 | egrep -v "^$|Average|Linux|LINUX|CPU" | awk -F " " '{print $7}'`
postgres=# select :'var_iostat';
 ?column? 
----------
 0.03    +
 0.02    +
 0.02    +
 0.00    +
 0.00    +
 0.02    +
 0.00    +
 0.02    +
 0.02    +
 0.00    +
 0.00    +
 0.02    +
 0.00    +
 0.02    +

如果我们把这个插入到表中,结果会是一个varchar列,而不是多列。这对于我们分析数据不是很好,接下来我们需要做一下行列转换,先把这么多行转换成一行数据,并用,分割开。

postgres=# \set var_iostat `sar -f /var/log/sa/sa22 | egrep -v "^$|Average|Linux|LINUX|CPU" | awk -F " " '{print $7}' | sed ':a;N;$!ba;s/\n/,/g'`
postgres=# select :'var_iostat';
                                                                                                                                                            ?column?                                                                                          
                                                                   
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------
 0.03,0.02,0.02,0.00,0.00,0.02,0.00,0.02,0.02,0.00,0.00,0.02,0.00,0.02,0.00,0.00,0.00,0.00,0.02,0.00,0.00,0.02,0.00,0.02,0.02,0.00,0.02,0.00,0.02,0.02,0.00,0.00,0.02,0.00,0.00,0.00,0.00,0.00,0.02,0.00,0.00,0.00,0.02,0.00,0.00,0.02,0.00,0.02,0.02,0.00,0.0
0,0.00,0.02,0.00,0.02,0.00,0.07,0.00,0.02,0.00,0.00,0.02,0.00,0.02
(1 row)

现在我们把上述这个变量用逗号分割,然后转换成数组。

postgres=# select string_to_array(:'var_iostat',',')::numeric[];
                                                                                                                                                          string_to_array                                                                                     
                                                                     
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
---------------------------------------------------------------------
 {0.03,0.02,0.02,0.00,0.00,0.02,0.00,0.02,0.02,0.00,0.00,0.02,0.00,0.02,0.00,0.00,0.00,0.00,0.02,0.00,0.00,0.02,0.00,0.02,0.02,0.00,0.02,0.00,0.02,0.02,0.00,0.00,0.02,0.00,0.00,0.00,0.00,0.00,0.02,0.00,0.00,0.00,0.02,0.00,0.00,0.02,0.00,0.02,0.02,0.00,0.
00,0.00,0.02,0.00,0.02,0.00,0.07,0.00,0.02,0.00,0.00,0.02,0.00,0.02}
(1 row)

接下来存储到表里面,我们就可以查询出当前使用率最大值和最小值做一些统计.

postgres=# create table tab_iowaits (varday date, iowaits numeric[]);
postgres=# insert into tab_iowaits values (current_date, string_to_array(:'var_iostat',',')::numeric[]);
postgres=# with tab as ( select unnest(iowaits) vals, varday from tab_iowaits where varday = current_date ) select max(vals) from tab;
 max  
------
 0.07
(1 row)

postgres=# with tab as ( select unnest(iowaits) vals, varday from tab_iowaits where varday = current_date ) select min(vals) from tab;
 min  
------
 0.00
(1 row)