GoldenGate 12.2.0.1 New Features: Metadata Encapsulation

Basics

GoldenGate before 12.2.0.1 didn't store table structure in trail files, and in some cases this was a problem: when we received a trail file, we couldn't tell the source table structure from it. We had only records containing field values, with no field names. So we had to choose how to handle this. There were two ways:

– Assume that the source table is the same as the target table (ASSUMETARGETDEFS). In this case the first value from the trail record was written to the first field of the target table, the second value to the second field, and so on. But this method doesn't work in the general case, where the source and target tables differ.

– Extract table definitions from the source and give them to the replicat (SOURCEDEFS). In this case trail files were parsed and applied according to a static source table definition stored in a file. This method gave us flexibility but increased the amount of manual work: when the source table structure changed, we had to regenerate the DEF file or risk losing data consistency.
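To make the weakness of the positional approach concrete, here is a minimal Python sketch. It is a toy model, not GoldenGate code: the function `apply_positionally` and the column lists are hypothetical and only illustrate what "first value to first field" means when the target column order differs.

```python
# Toy model of positional apply (the ASSUMETARGETDEFS idea): a trail record
# carries values only, so the target's own column order is used to name them.
# The function and the column names are hypothetical.

def apply_positionally(record_values, target_columns):
    """Pair the i-th value with the i-th target column."""
    return dict(zip(target_columns, record_values))

source_columns = ["DEPARTMENT_ID", "DEPARTMENT_NAME", "LOCATION_ID"]
record = [10, "Administration", 1700]  # values in source column order

# Identical structure: every value lands in the right column.
same_target = apply_positionally(record, source_columns)

# Target with two columns swapped: values are silently misassigned.
different_target = apply_positionally(
    record, ["DEPARTMENT_ID", "LOCATION_ID", "DEPARTMENT_NAME"]
)

print(same_target)
print(different_target)
```

In the second call the department name ends up under `LOCATION_ID`, which is exactly the kind of silent mismatch that a definition file was meant to prevent.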

Now we don't need ASSUMETARGETDEFS and SOURCEDEFS anymore. GoldenGate automatically transfers definitions in the trail files. Moreover, it retransfers a definition when the table structure changes (the retransfer is triggered only by the next DML on that table) and after a switch to a new trail file (again, only with the first DML on the table).
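The retransfer rules can be sketched as a small simulation. This is only an illustration of the behaviour described here, under the assumption that a definition is written immediately before the DML that needs it; the `TrailWriter` class and its methods are hypothetical and have nothing to do with GoldenGate internals.

```python
# Toy simulation, not GoldenGate code: a writer that emits a table definition
# record (TDR) before the first DML for a table in each trail file, and again
# before the next DML after the table structure changes.

class TrailWriter:
    def __init__(self):
        self.files = [[]]   # trail files, each a list of records
        self.defined = {}   # table -> column list already written to the current file

    def switch_file(self):
        """Start a new trail file; definitions are re-sent on the first DML."""
        self.files.append([])
        self.defined = {}

    def write_dml(self, table, columns, values):
        """Write one DML record, preceded by a TDR when one is needed."""
        if self.defined.get(table) != columns:
            self.files[-1].append(("TDR", table, tuple(columns)))
            self.defined[table] = list(columns)
        self.files[-1].append(("DML", table, tuple(values)))

w = TrailWriter()
cols = ["ID", "NAME"]
w.write_dml("DEPARTMENTS", cols, [1, "a"])            # TDR + DML
w.write_dml("DEPARTMENTS", cols, [2, "b"])            # DML only
new_cols = cols + ["CITY"]                            # structure change
w.write_dml("DEPARTMENTS", new_cols, [3, "c", "x"])   # TDR + DML
w.switch_file()
w.write_dml("DEPARTMENTS", new_cols, [4, "d", "y"])   # TDR + DML in the new file

kinds = [[rec[0] for rec in f] for f in w.files]
print(kinds)
```

Note that nothing is written at the moment of the structure change or the file switch itself; the definition appears only when the next DML for that table arrives.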

There are two new special record types for transferring definitions: DDR (Database Definition Record) and TDR (Table Definition Record). The first describes the database; the second describes the table structure.

How does it look in trail files?

I created two extracts: the first wrote data in the 12.1 trail format and the second in the 12.2 trail format:

I did two inserts and opened the trail files using logdump. The first trail file was version 12.1. I ran the count command on it:

So we see that the trail file contains 4 records: 2 Inserts, 1 RestartOK and 1 Others (which is really the file header). Nothing interesting.

Let's do the same exercise for the trail file in 12.2 format.

Now we see that two new records were added. The first is the metadata record for the database (ORCL) and the second is the metadata record for the table (ORCL.GGTEST.DEPARTMENTS). Let's look at the details. There is a new command in LOGDUMP, SCANFORMETADATA. I will position logdump at the beginning of the trail file and find the next metadata record:

So the first metadata record is a DDR (Database Definition Record). Its version is 1, the database type is Oracle, the time zone is GMT+03:00, and so on. This is important information: from the DDR GoldenGate can work out, for example, how to convert the character set before applying data to the target.

Let’s search for the next metadata record.

The next metadata record is a TDR (Table Definition Record). Its structure is compatible with the DEF file. We can see a description of each field at the beginning.

Benefits

1. We can use MAP in extract and data pump. For example, this configuration is now possible:

In this case the TDR reflects the mapping: the TDR record contains the mapped table name:

2. If the source and target table structures differ, all fields are matched by name. Missing fields are skipped automatically. For example, I have the following table structures:

Source Target

And a simple mapping (map orcl.ggtest.*, target ggtest.*) just works.
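Matching by name can be sketched in a few lines of Python. This is a toy illustration of the idea, assuming the source column names travel with the data (as the TDR makes possible); the function `map_by_name` and the column lists are hypothetical and do not reproduce the tables used in this test.

```python
# Toy illustration of name-based matching: source column names arrive with
# the data, so values are matched to target columns by name, not by position.
# The function and the column names are hypothetical.

def map_by_name(source_columns, values, target_columns):
    """Keep only the values whose column name also exists in the target."""
    row = dict(zip(source_columns, values))
    return {col: row[col] for col in target_columns if col in row}

source_columns = ["DEPARTMENT_ID", "DEPARTMENT_NAME", "MANAGER_ID"]
target_columns = ["DEPARTMENT_NAME", "DEPARTMENT_ID", "LOCATION_ID"]
values = [10, "Administration", 200]

mapped = map_by_name(source_columns, values, target_columns)
print(mapped)
```

`MANAGER_ID` has no counterpart in the target and is dropped, while `LOCATION_ID` has no source value and is simply not set by the mapping. The different column order on the target does not matter.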

3. Correct column mapping after DDL is applied (adding or removing a column). For example, I turned on source DDL capture and have the following replicat parameter file:

I ran the following statements on the source:

And everything works as expected. I got the following rows on the target:

Cost

Everything has its cost. The main question I hear again and again: what is the cost of passing DDR and TDR records through trail files? What is the trail file size increase?

Of course there is a small overhead from the DDR and TDR records, but it is not big. For example, the size of the TDR for the DEPARTMENTS table is 86 bytes. So we lose 86 bytes for table DEPARTMENTS in every file. Even if the file size is 500 MB, 86 bytes is almost nothing.

But there is another side effect: rows in 12.2 format take less space than in 12.1 format. This is because a 12.2 trail file doesn't contain the table name in every DML record; each record references a special ID that uniquely identifies the TDR record in the trail file. This completely offsets the additional cost of the DDR and TDR. Moreover, the 12.2 trail file ends up smaller than the 12.1 trail file.
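A quick back-of-the-envelope calculation shows the scale. The 86-byte TDR and the 500 MB file size come from the text; the per-record saving `SAVED_PER_RECORD` is a purely hypothetical figure chosen for illustration, since the actual saving was not measured here.

```python
import math

TDR_BYTES = 86                    # TDR size for DEPARTMENTS, from the text
TRAIL_BYTES = 500 * 1024 * 1024   # a 500 MB trail file

# Share of the trail file taken by one TDR.
overhead_fraction = TDR_BYTES / TRAIL_BYTES
print(f"TDR overhead: {overhead_fraction:.2e} of the file")

# ASSUMPTION: bytes saved per DML record by referencing the TDR by ID
# instead of repeating the table name. Hypothetical value, not measured.
SAVED_PER_RECORD = 20

# Number of DML records after which the TDR has paid for itself.
break_even_records = math.ceil(TDR_BYTES / SAVED_PER_RECORD)
print(f"Break-even after {break_even_records} records")
```

Under that assumption the TDR pays for itself after a handful of DML records on the table, and every record after that makes the 12.2 file smaller than its 12.1 equivalent.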

Summary

Amazing feature! It makes our life much simpler and lets us solve some issues that were unresolvable before. I also see this feature as a base for cross-platform DDL and for developing a GoldenGate Adapter for Big Data technologies with flexible data structures.
