치명적인 Hash Group By 버그

10g 부터 group by 시에 느린성능의 Sort Group by 가 사라지고 빠른성능의 Hash Group By 가 등장 했다.
하지만 현재시점(10.2.0.3)에서 항상 Hash Group By 가 동작되는 것은 아니다.
성능면에서 배치 SQL 혹은 Migration 작업시에 몇억건의 데이터를 sort 하게되면 견딜수 없다.
주로 Sort Group by 는 insert - select - group by 상황에서 발생한다.
opt_param 힌트로도 해결되지 않았으며 Only select 문또는
CTAS (Create table as Select)문에서는 발생하지 않는걸로 확인됬다.
아래는 간단한 테스트를 진행 하고 현상황에서 Sort Group by 를 피할수 있는 해법을 제공한다.


1.테스트 환경
버젼 : 10gR2(10.2.0.3) ,
Optimizer mode :all_rows
관련 파라미터 : _gby_hash_aggregation_enabled = true
계정 : scott/tiger


2.Select Test

analyze table dept compute statistics;

analyze table emp compute statistics;

select /*+ use_hash(b a) */ empno, ename, job, mgr, hiredate, sal, comm, a.deptno
from
(select max(EMPNO) as empno, max(ENAME) as ename, max(JOB) as job, max(MGR) as mgr,
max(HIREDATE) as hiredate, max(SAL) as sal, max(COMM) as comm, deptno
from emp
group by DEPTNO) a,
dept b
where a.deptno = b.deptno;



Execution Plan
--------------------------------------------------------------------------------
0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=6 Card=4 Bytes=356)
1 0 HASH JOIN (Cost=6 Card=4 Bytes=356)
2 1 VIEW (Cost=4 Card=4 Bytes=348)
3 2 HASH (GROUP BY) (Cost=4 Card=4 Bytes=128) --> 정상적인 hash group by
4 3 TABLE ACCESS (FULL) OF 'EMP' (TABLE) (Cost=3 Card=14 Bytes=448)
5 1 INDEX (FULL SCAN) OF 'PK_DEPT' (INDEX (UNIQUE)) (Cost=1 Card=4 Bytes=8)

3.CTAS Test

-- CTAS 시에 inline view 가 merge 되므로 상황을 재현하기위해 no_merge 힌트 사용

create table new_emp as
select /*+ use_hash(b a) */ empno, ename, job, mgr, hiredate, sal, comm, a.deptno
from
(select /*+ no_merge */
max(EMPNO) as empno, max(ENAME) as ename, max(JOB) as job, max(MGR) as mgr,
max(HIREDATE) as hiredate, max(SAL) as sal, max(COMM) as comm, deptno
from emp
group by DEPTNO) a,
dept b
where a.deptno = b.deptno;


Execution Plan
--------------------------------------------------------------------------------
0 CREATE TABLE STATEMENT Optimizer=ALL_ROWS (Cost=8 Card=4 Bytes=356)
1 0 LOAD AS SELECT OF 'NEW_EMP'
2 1 HASH JOIN (Cost=7 Card=4 Bytes=356)
3 2 VIEW (Cost=5 Card=4 Bytes=348)
4 3 HASH (GROUP BY) (Cost=4 Card=4 Bytes=128) --> 정상적인 hash group by
5 4 TABLE ACCESS (FULL) OF 'EMP' (TABLE) (Cost=3 Card=14 Bytes=448)
6 2 INDEX (FULL SCAN) OF 'PK_DEPT' (INDEX (UNIQUE)) (Cost=1 Card=4 Bytes=8)


4.Insert-Select-group by Test

--2번/3번 테스트 에서 나타나지 않았던 Sort Group By 가 Insert 시에만 나타난다.

insert into emp
select /*+ use_hash(b a) */ empno, ename, job, mgr, hiredate, sal, comm, a.deptno
from
(select max(EMPNO) as empno, max(ENAME) as ename, max(JOB) as job, max(MGR) as mgr,
max(HIREDATE) as hiredate, max(SAL) as sal, max(COMM) as comm, deptno
from emp
group by DEPTNO) a,
dept b
where a.deptno = b.deptno;

Execution Plan
--------------------------------------------------------------------------------

0 INSERT STATEMENT Optimizer=ALL_ROWS (Cost=6 Card=4 Bytes=356)
1 0 HASH JOIN (Cost=6 Card=4 Bytes=356)
2 1 VIEW (Cost=4 Card=4 Bytes=348)
3 2 SORT (GROUP BY) (Cost=4 Card=4 Bytes=128) --> 비정상적인 Sort group by
4 3 TABLE ACCESS (FULL) OF 'EMP' (TABLE) (Cost=3 Card=14 Bytes=448)
5 1 INDEX (FULL SCAN) OF 'PK_DEPT' (INDEX (UNIQUE)) (Cost=1 Card=4 Bytes=8)


5.해법

1) merge 힌트사용: 두개의 테이블이 merge 되는것이 유리한 경우


insert into emp
select /*+ merge(a) use_hash(b a) */ empno, ename, job, mgr, hiredate, sal, comm, a.deptno
from
(select max(EMPNO) as empno, max(ENAME) as ename, max(JOB) as job, max(MGR) as mgr, max(HIREDATE) as hiredate, max(SAL) as sal, max(COMM) as comm, deptno
from emp
group by DEPTNO) a,
dept b
where a.deptno = b.deptno;

Execution Plan
--------------------------------------------------------------------------------

0 INSERT STATEMENT Optimizer=ALL_ROWS (Cost=6 Card=4 Bytes=184)
1 0 HASH (GROUP BY) (Cost=6 Card=4 Bytes=184) --> 정상적인 hash group by
2 1 HASH JOIN (Cost=5 Card=14 Bytes=644)
3 2 INDEX (FULL SCAN) OF 'PK_DEPT' (INDEX (UNIQUE)) (Cost=1 Card=4 Bytes=56)
4 2 TABLE ACCESS (FULL) OF 'EMP' (TABLE) (Cost=3 Card=14 Bytes=448)

2)Migration 인 경우는 insert-select 대신에 CTAS 사용

3)With 구문과 Merterialize 힌트사용

insert into emp
with max_emp as
(select /*+ materialize */
max(EMPNO) as empno, max(ENAME) as ename, max(JOB) as job,
max(MGR) as mgr, max(HIREDATE) as hiredate, max(SAL) as sal,
max(COMM) as comm, DEPTNO
from emp
group by DEPTNO),
max_dept_emp as
( select /*+ use_hash(b a) */
empno, ename, job, mgr, hiredate, sal, comm, a.deptno
from max_emp a,
dept b
where a.deptno = b.deptno )
select * from max_dept_emp;

Execution Plan
--------------------------------------------------------------------------------

0 INSERT STATEMENT Optimizer=ALL_ROWS (Cost=8 Card=4 Bytes=356)
1 0 TEMP TABLE TRANSFORMATION
2 1 LOAD AS SELECT OF 'EMP'
3 2 HASH (GROUP BY) (Cost=4 Card=4 Bytes=128) --> 정상적인 hash group by
4 3 TABLE ACCESS (FULL) OF 'EMP' (TABLE) (Cost=3 Card=14 Bytes=448)
5 1 HASH JOIN (Cost=4 Card=4 Bytes=356)
6 5 VIEW (Cost=2 Card=4 Bytes=348)
7 6 TABLE ACCESS (FULL) OF 'SYS.SYS_TEMP_0FD9D6616_4EBAE980' (TABLE (TEMP)) (Cost=2 Card=4 Bytes=128)
8 5 INDEX (FULL SCAN) OF 'PK_DEPT' (INDEX (UNIQUE)) (Cost=1 Card=4 Bytes=8)


6.결론
적절한 방법을 사용하여 대용량 Group By 시에 Sort 를 피할수 있는방법을 제시하였으나
더뛰어난 solution 이 있을수 있으므로 여러가지 테스트 후에 적용하기 바란다.

'Oracle' 카테고리의 다른 글

DML 과 PARALLEL의 관계  (1) 2008.05.20
힌트로 오라클 파라미터를 바꿀수는 없을까?  (0) 2008.04.27
DML Error Logging 기능의 활용방안  (0) 2008.04.02
Posted by extremedb
,