Merge Statement and Sequences

Merge Statement and Sequences

In my last project I had to create a merge statement which should insert or update rows in dependency of the source data. The behavior of the sequence was completely different compared to the normal use. But let’s explain my case:

In detail I have a dimension table (AGG_ASSETS) where I store several assets and a further import table (IMP_ASSETS). Each asset should become unique and will get his own ID. So I’ve created a table as follows:

CREATE TABLE AGG_ASSETS
(    
    ASSET_ID 		NUMBER,
    ASSET_NAME 		VARCHAR2(256 BYTE),
    HOSTS 			VARCHAR2(1024 BYTE),
    DATACENTER 		VARCHAR2(256 BYTE),
    INSERT_DATE 	DATE DEFAULT SYSTIMESTAMP,
    LASTFOUND_DATE 	DATE,
    LASTUPDATE_DATE DATE,
    HOSTS_AMOUNT 	NUMBER,
     PRIMARY KEY (ASSET_ID)
);

and a sequence to fill column ASSET_ID

CREATE SEQUENCE SEQ_ASSET_ID 
  INCREMENT BY 1 
  START WITH 100 
  NOCACHE;

Next I’d created the merge statement. If the asset isn’t available it will be inserted, if yes it will updated with the latest values:

MERGE INTO AGG_ASSETS T
    USING
      (
        SELECT 
          MAX(COLLECTION_TIME) AS COLLECTION_TIME,
          lower(ASSET_NAME) AS ASSET_NAME,    
          lower(LISTAGG(HOST_NAME, ',') WITHIN GROUP (ORDER BY HOST_NAME)) AS HOSTS,
          MAX( DATACENTER ),
          COUNT(*) AS HOSTS_AMOUNT
        FROM IMP_ASSETS
        WHERE TRUNC(COLLECTION_TIME) = (SELECT MAX(TRUNC(COLLECTION_TIME) ) FROM IMP_ASSETS)
        GROUP BY ASSET_NAME  
      ) S
      ON (S.ASSET_NAME = T.ASSET_NAME)
    WHEN MATCHED THEN
      UPDATE SET  T.HOSTS           = COALESCE( S.HOSTS, T.HOSTS ),
                  T.HOSTS_AMOUNT    = COALESCE( S.HOSTS_AMOUNT, T.HOSTS_AMOUNT ),
                  T.LASTFOUND_DATE  = COLLECTION_TIME,
                  T.LASTUPDATE_DATE = SYSDATE
    WHEN NOT MATCHED THEN
      INSERT (
        ASSET_ID,
        ASSET_NAME,
        HOSTS,
        DATACENTER,
        HOSTS_AMOUNT
      )
      VALUES ( 
        SEQ_ASSET_ID.NEXTVAL,  --using the sequence as usual
        S.CLUSTER_NAME,
        S.HOSTS,
        S.DATACENTER,
        S.HOSTS_AMOUNT
      );

 

It ran with success. The curious fact was that the sequence didn’t incremented by 1 instead it was 12. To be honest this is not harmful and the application users will never aware of this ID-jump but what was the reason because usually it should increment by 1 as defined ?

[table width=”300px” style=”background-color:#fff;font-size:10px;font-family:helvetica;”]

ASSET_ID  ,  ASSET_NAME,    HOSTS ,   HOSTS_AMOUNT ,   DATACENTER
125 [attr style=”background-color:yellow”],sappd02cl-clust,sappd05hr:sappd06hr:sappd07hr:sappd08hr,4 ,sidea
124 [attr style=”background-color:yellow”],sappd01-04,sappd01hr:sappd02hr:sappd03hr:sappd04hr,4,sideb
112,sapo04cl,sapo13hr:sapo14hr:sapo15hr:sapo16hr,4 ,sidea
111,sapo03cl,sapo09hr:sapo10hr:sapo11hr:sapo12hr,4 ,sideb
110,sapo02cl-clust,sapo05hr:sapo06hr:sapo07hr:sapo08hr,4  ,sidea
109,sapo01h-clust, sapo01hr:sapo02hr:sapo03hr:sapo04hr ,4  ,sideb
108,sapdb02cl,sapdb05hr:sapdb06hr:sapdb07hr:sapdb08hr ,4  ,sidea
[/table]

From the Oracle Documentation I read this:

For each row merged by a MERGE statement. The reference to NEXTVAL can appear in the merge_insert_clause or the merge_update_clause or both. The NEXTVALUE value is incremented for each row updated and for each row inserted, even if the sequence number is not actually used in the update or insert operation. If NEXTVAL is specified more than once in any of these locations, then the sequence is incremented once for each row and returns the same value for all occurrences of NEXTVAL for that row.

For avoiding those sequence jumps the solution is to encapsulate the sequence-call in an extra function. This will prevent the merge statement to do the ID-jump.

CREATE OR REPLACE FUNCTION F_SEQ_NEXTVAL (p_SEQ_NAME  IN  VARCHAR2)
  RETURN NUMBER
IS
  v_SEQ_VAL  NUMBER;
BEGIN
  EXECUTE IMMEDIATE 'select ' || p_SEQ_NAME || '.nextval from dual'
     INTO v_SEQ_VAL;
 
  RETURN v_SEQ_VAL;
END F_SEQ_NEXTVAL;
MERGE INTO AGG_ASSETS T
    USING
      (
        SELECT 
          MAX(COLLECTION_TIME) AS COLLECTION_TIME,
          lower(ASSET_NAME) AS ASSET_NAME,    
          lower(LISTAGG(HOST_NAME, ',') WITHIN GROUP (ORDER BY HOST_NAME)) AS HOSTS,
          MAX( DATACENTER ),
          COUNT(*) AS HOSTS_AMOUNT
        FROM IMP_ASSETS
        WHERE TRUNC(COLLECTION_TIME) = (SELECT MAX(TRUNC(COLLECTION_TIME) ) FROM IMP_ASSETS)
        GROUP BY ASSET_NAME  
      ) S
      ON (S.ASSET_NAME = T.ASSET_NAME)
    WHEN MATCHED THEN
      UPDATE SET  T.HOSTS           = COALESCE( S.HOSTS, T.HOSTS ),
                  T.HOSTS_AMOUNT    = COALESCE( S.HOSTS_AMOUNT, T.HOSTS_AMOUNT ),
                  T.LASTFOUND_DATE  = COLLECTION_TIME,
                  T.LASTUPDATE_DATE = SYSDATE
    WHEN NOT MATCHED THEN
      INSERT (
        ASSET_ID,
        ASSET_NAME,
        HOSTS,
        DATACENTER,
        HOSTS_AMOUNT
      )
      VALUES ( 
        F_SEQ_NEXTVAL( 'SEQ_ASSET_ID' ),  --using self-made function
        S.CLUSTER_NAME,
        S.HOSTS,
        S.DATACENTER,
        S.HOSTS_AMOUNT
      );

After re-executing the statement with the function “F_SEQ_NEXTVAL”, the IDs were provided without jumps.

[table width=”300px” style=”background-color:#fff;font-size:10px;font-family:helvetica;”]

ASSET_ID  ,  ASSET_NAME,    HOSTS ,   HOSTS_AMOUNT ,   DATACENTER
114 [attr style=”background-color:yellow”],sappd02cl-clust,sappd05hr:sappd06hr:sappd07hr:sappd08hr,4 ,sidea
113 [attr style=”background-color:yellow”],sappd01-04,sappd01hr:sappd02hr:sappd03hr:sappd04hr,4,sideb
112,sapo04cl,sapo13hr:sapo14hr:sapo15hr:sapo16hr,4 ,sidea
111,sapo03cl,sapo09hr:sapo10hr:sapo11hr:sapo12hr,4 ,sideb
110,sapo02cl-clust,sapo05hr:sapo06hr:sapo07hr:sapo08hr,4  ,sidea
109,sapo01h-clust, sapo01hr:sapo02hr:sapo03hr:sapo04hr ,4  ,sideb
108,sapdb02cl,sapdb05hr:sapdb06hr:sapdb07hr:sapdb08hr ,4  ,sidea
[/table]

Schreibe einen Kommentar