Oracle DataSource for Apache Hadoop (OD4H): introduction



Hadoop is increasingly becoming part of the enterprise data warehouse family. But family members should be connected to each other. Sometimes we need to access Hadoop from Oracle Database; sometimes Hadoop users need enterprise data stored in Oracle Database.

Hive has a very interesting concept: external tables, which (through a storage handler) let you plug in Java classes that access an external database and present its data as a native Hive table.

Oracle Datasource for Apache Hadoop (formerly Oracle Table Access for Apache Hadoop) turns Oracle Database tables into a Hadoop data source (i.e., an external table), enabling direct and consistent HiveQL/Spark SQL queries as well as direct Hadoop API access. Applications can join master data or dimension data in Oracle Database with data stored in Hadoop. Additionally, data can be written back to Oracle Database after processing.

Oracle Datasource for Apache Hadoop optimizes query execution plans using predicate and projection pushdown and partition pruning. Database table access is performed in parallel based on the selected split pattern, using smart and secure connections (Kerberos, SSL, Oracle Wallet), and is regulated by both Hadoop (i.e., maximum concurrent tasks) and Oracle DBAs (i.e., max pool size).
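To make the pushdown idea more concrete, here is a minimal Python sketch. It is not OD4H code and does not use any OD4H API: the function name, the SALES table, its columns, and the simple range-based splitting are all assumptions made up for illustration. It only shows the general principle: each parallel task asks the database for just the columns it needs (projection pushdown), lets the database apply the filter (predicate pushdown), and reads only its own slice of the table (one split).

```python
# Illustrative only: hypothetical names, not the OD4H implementation.

def build_split_queries(table, columns, predicate, key, low, high, num_splits):
    """Return one SQL string per split over key values in [low, high)."""
    if num_splits < 1:
        raise ValueError("num_splits must be at least 1")
    # Ceiling division so that the splits cover the whole key range.
    step = -(-(high - low) // num_splits)
    queries = []
    start = low
    while start < high:
        end = min(start + step, high)
        queries.append(
            # Projection pushdown: select only the requested columns.
            f"SELECT {', '.join(columns)} FROM {table} "
            # Predicate pushdown: the filter is evaluated by the database,
            # and the key range restricts this task to its own split.
            f"WHERE ({predicate}) AND {key} >= {start} AND {key} < {end}"
        )
        start = end
    return queries


# Hypothetical table and columns, split four ways on CUST_ID.
queries = build_split_queries(
    "SALES", ["CUST_ID", "AMOUNT"], "AMOUNT > 100", "CUST_ID", 0, 1000, 4
)
for query in queries:
    print(query)
```

In a real deployment the split pattern, the degree of parallelism, and the connection pool size are configuration choices rather than hand-written SQL; the sketch only shows why pushing work down to the database reduces the amount of data that has to travel to Hadoop.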


GoldenGate Cloud Service (GGCS): Configure GoldenGate to replicate data

GoldenGate Cloud Service is part of Oracle’s PaaS portfolio. From a technical perspective, it is just standard GoldenGate deployed on a VM in Oracle Cloud, so the same proven architecture works in the cloud.

GGCS can be used for a range of use cases, from zero-downtime migration to real-time DWH feeding. More use cases, such as big data and data pipeline feeding, are on the way.

So what do you need to use GoldenGate Cloud Service? You should have:

  • a database instance in the cloud (DBaaS or ExadataCS)
  • a subscription for GoldenGate Cloud Service
  • a storage cloud service (it is used for backup)

GGCS is currently available as a Non-Metered service. With the Non-Metered service, you pay even when your GoldenGate instance is down.

Soon GGCS will also be available as a Metered service, so it will be possible to pay on a per-hour basis. This capability will open up new use cases such as Dev/Test cloud environment synchronization. Just imagine you have a database in the cloud for testing purposes, and you need to synchronize it periodically (every week or month) with the production database. You don’t need GGCS running all the time; you can run it for 2 hours every Sunday to apply the captured data. This approach can save a lot of money.
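To put a rough number on that saving, here is a small Python sketch of the arithmetic. It is a simplification under stated assumptions: it treats the Non-Metered service as if every hour of the week were paid for, and the Metered service as if only running hours were paid for. Actual Oracle pricing is not modeled, and the function name is made up for this example.

```python
# Simplified comparison of paid hours per week; real pricing is not modeled.

HOURS_PER_WEEK = 24 * 7


def weekly_billable_hours(run_hours_per_week, metered):
    """Return the hours paid for in one week under the assumed model."""
    if not 0 <= run_hours_per_week <= HOURS_PER_WEEK:
        raise ValueError("run_hours_per_week must be between 0 and 168")
    # Metered: pay only while the instance runs.
    # Non-metered: pay for the whole week, even when the instance is down.
    return run_hours_per_week if metered else HOURS_PER_WEEK


non_metered_hours = weekly_billable_hours(2, metered=False)
metered_hours = weekly_billable_hours(2, metered=True)
ratio = non_metered_hours / metered_hours

print(f"Non-metered: {non_metered_hours} paid hours per week")
print(f"Metered:     {metered_hours} paid hours per week")
print(f"Ratio:       {ratio:.0f}x fewer paid hours with the Sunday-only schedule")
```

Under these assumptions, the 2-hour Sunday run pays for 2 hours a week instead of 168. The real saving depends on how the hourly rate compares with the subscription price.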


Oracle Database Cloud Service: Create database

Oracle Cloud provides several Oracle Database offerings. You can choose from:

  • a single-schema-based service
  • a virtual machine with a fully configured and running Oracle Database instance
  • an Exadata service with all the database features

You can look into details here:

Here we will talk about Database as a Service, not about the Schema or Exadata services. My final goal is to create a database for GoldenGate replication, which is a separate service. OK, let’s start.

