Oracle Enterprise Data Quality 12

With my new customer architect hat on, I’m looking across a broad range of applications and services to help my organisation be more effective and productive. For a large retail organisation like ours, customer data quality is key to our ability to talk to our customers effectively. Imagine sending letters out to customers – in the digital age, that’s a pretty expensive channel but one which is required nonetheless. If we get the address wrong, or we have duplicate entries for a customer, we end up with a failed communication (really bad for our customer) or duplicate communications (annoying for our customer and bad for our bottom line). Oracle Enterprise Data Quality can help us resolve such problems.

Quickstats Profiler on the S_CONTACT table

I’m running Oracle Customer Hub (the Customer Master Data Management solution formerly known as UCM or Universal Customer Master), which is a Siebel vertical product that orchestrates and governs a process to take data in from sources, cleanse, enrich, match and de-duplicate, before publishing to consuming systems.

EDQ is a critical component of this solution, as it provides the cleanse, enrich and match capabilities of our end-to-end MDM process. It’s also an extremely impressive and feature-rich product and I’m going to explore some of its features over the coming months. First things first, let’s download and install Oracle EDQ release 12:

  1. First up, you’ll need a Java JRE to run the front end components, so download and install a 64-bit JRE 8 from Oracle’s Java home
  2. Now, let’s get a trial copy of EDQ installed on a VM so we can have a bit of a mess around. Skipping the usual eDelivery route, head straight to the EDQ product page to download the product
  3. I’ve gone for the standalone 12.2.1.0 Windows installer version, since I’m running a Windows Server 2012 R2 VM, but you can sit the newer 12.2.2.0 installation on top of a WebLogic server running on whatever host OS you choose
  4. Once downloaded, run the installer and simply follow the instructions – it’s really straightforward for an Oracle product
  5. You’ll find a shortcut to the Launchpad – this is where you’ll find shortcuts to all of the features of the platform

    The EDQ Launchpad

  6. Launch Director, where you’ll set up EDQ to point to your data source and carry out activities. If you’re prompted to open a JNLP file, and your installation does not automatically associate this with Java Web Start, then navigate to your Java installation folder and associate it with “javaws.exe”
  7. The default username and password are dnadmin/dnadmin – be sure to change the password to something memorable. Log in and you’re ready to go

    EDQ Director

Simple as that! Next time, we’ll connect EDQ up to our Siebel server database (you know, the one with all the lovely sample DB data in it) and run some profiling jobs.

 

EDQ and Siebel – Batch DeDuplication

Having set up Siebel and Oracle Enterprise Data Quality (from now on known as EDQ), I wanted to put it through its paces.

Real Time De-duplication works like a charm, provided you kick off the Real Time jobs in Director and have your Web Service URLs set up correctly.

Batch De-duplication, however, uses a different mechanism (JMX) and the out-of-the-box installation and configuration doesn’t quite leave you in a position to run batch dedupe through the Siebel Client.

After a really useful conversation with Mike, Nick and Richard at Oracle (experts in EDQ and its integration into Siebel), I was able to make appropriate changes to the configuration to enable batch de-duplication. My heartfelt thanks go out to them all for their dedication and commitment to helping lowly developers like myself!

JMX Port Configuration

By default, EDQ on Windows configures JMX to listen on port 9005. However, by default the Siebel Connector is configured to look on port 8090.

To rectify:

  1. On the Siebel Server, modify the jmxserver entry in dnd.properties to match the JMX port specified in director.properties on the EDQ server. For example:

    jmxserver = hostname:9005

  2. There is no need to restart anything. The next job to use the DQ Connector will automatically re-read the configuration
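
If you want to confirm that the connector is pointing at a port that is actually listening, a quick check from the Siebel Server host can save some head scratching. The following is a minimal Python sketch and not part of EDQ or the Siebel connector: the helper names are my own, the parser only handles simple key = value lines, and a successful TCP connection only proves that something is listening on that port, not that it is the JMX interface.

```python
import socket


def read_properties(text):
    """Parse simple 'key = value' lines, skipping blanks and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def jmx_endpoint(props):
    """Split the 'jmxserver' value (host:port) into a (host, port) tuple."""
    host, port = props["jmxserver"].rsplit(":", 1)
    return host, int(port)


def port_is_open(host, port, timeout=3.0):
    """Return True if a plain TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Example using the entry shown above; in practice, pass in the text of
# your own dnd.properties file and then call port_is_open(host, port).
sample = "jmxserver = hostname:9005"
host, port = jmx_endpoint(read_properties(sample))
print(host, port)
```

Calling port_is_open(host, port) with the values read from your real dnd.properties should return True once the ports agree and EDQ is running.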

JMX Interface

By default, the JRE used by EDQ publishes the JMX interface on localhost (127.0.0.1). This may be down to my setup on VirtualBox, so it may or may not cause you a problem. However, if you see errors in the connector log relating to connecting to JMX, you may be experiencing this problem.

To resolve:

  1. Create a new file called jre.properties in the same folder as the director.properties file on the EDQ server. Using the default installation, this will be in:

    C:\Program Files\Datanomic\dnDirector\config

  2. Within the file, add the following configuration item:

    java.rmi.server.hostname = <EDQ HostName>

  3. Restart the Datanomic Application Server service
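
Before settling on a value for java.rmi.server.hostname, it is worth checking what that name actually resolves to, since a name that maps to 127.0.0.1 would leave you with the same problem. Here is a small Python sketch of that check; the function names are hypothetical, it only looks at IPv4 addresses, and it assumes you run it on the machine whose view of the name you care about (the EDQ server, the Siebel Server, or both).

```python
import ipaddress
import socket


def is_loopback(address):
    """Return True if the given IP address string is a loopback address."""
    return ipaddress.ip_address(address).is_loopback


def resolves_to_loopback(hostname):
    """Resolve hostname (IPv4 only) and report whether any address is loopback."""
    infos = socket.getaddrinfo(hostname, None, socket.AF_INET)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # for IPv4 the sockaddr is (address, port).
    return any(is_loopback(info[4][0]) for info in infos)
```

If resolves_to_loopback returns True for your EDQ host name, using that name in jre.properties is unlikely to help; a name or address that other machines can reach is what you are after.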

Test Batch DeDuplication

Testing is now straightforward:

  1. From within Siebel, navigate to Site Map > Administration – Server Management > Jobs
  2. Create a new job, using the ‘Batch Account match’ template
  3. Submit the job and await completion
  4. Navigate to Site Map > Administration – Data Quality > Duplicate Accounts
  5. See your deduplicated data and merge!

Having now used EDQ alongside Siebel, I am really, really impressed. Previous DQ attempts have felt really clunky but EDQ fits really nicely alongside Siebel. The real time deduplication works well and is very easy to configure. Batch cleansing and deduplication also work flawlessly, once the tweaks above have been applied.

I get the impression that Oracle are really committed to this software as a solution, too. Whereas SSA-NAME5 and ISS seemed like stopgap solutions, EDQ is feeling like an integrated technology and something that Oracle are building into their Fusion and Siebel roadmaps. Here’s hoping!

OEDQ and Siebel – Configuring SSL

UPDATED: Following some really useful feedback from a very kind gentleman from Oracle, it has been noted that EDQ will actually allow client applications, such as Siebel, to invoke the Web Services over standard HTTP. As such, there is no need to configure Siebel to use SSL as described below. Simply specify the HTTP URLs in dnd.properties, using the default 9002 port. For example:

httpprefix = http://OEDQ9-VM:9002/dndirector/webservices

One thing that really stumped me was getting Siebel to talk to OEDQ via the predefined Web Services.

Unlike other configuration that I’m used to, the OEDQ Web Service URLs are not stored or mastered in Siebel – they are defined implicitly when you install OEDQ. You merely tell Siebel where to find them via the config file. This causes a problem as the Web Services are configured to use https (SSL). When you add a new Account or Contact, you’ll see an error in the adapter log.

The way around this is rather convoluted, but interesting and effective nonetheless:

Generate a Self Certified SSL Certificate

  1. Connect to your OEDQ server machine
  2. Note that the current Apache keystore resides, by default, in <Install Folder>\Datanomic\dnDirector\tomcat\6.0\conf\dncert.p12
  3. To make things simple, we’re simply going to create a new keystore, so that we can revert to default at any point. We’ll keep the same name and keystore type (PKCS12) so we don’t have to reconfigure Apache
    • This is easily changed in server.xml, though!
  4. Rename the existing dncert.p12 file
  5. Open a command prompt and CD to your Java JRE bin folder
  6. Execute the following command line to generate a new keystore and certificate:
  7. Obviously, you’ll need to substitute your OEDQ installation folder location into the command line above
  8. You’ll be prompted for some information – the thing to do here is use the machine name when asked for your First Name and Last Name
  9. Go into Services and restart the ‘Datanomic Application Server’ service

That takes care of Apache.

Test the Certificate

  1. On the Siebel Server host, fire up Internet Explorer and navigate to the OEDQ Web Services URL (by default: https://<HOST>:9004/dndirector/webservices)
  2. You’ll get a certificate error. Continue to the web site then click the ‘Certificate Error’ button in the top right of IE, next to the address bar, and select ‘View Certificates’
  3. Click ‘Install Certificate’
  4. Using the wizard, place the certificate in the following store: “Trusted Root Certification Authorities”
  5. Click OK then close IE
  6. Reopen, navigate to the URL above and notice that the nature of the certificate has changed. We’re now ready to tell the Siebel adapter to use the new certificate

Tell the JRE instance to trust the new certificate

  1. On your Siebel Server host, download the certificate from the OEDQ server by opening IE and going to Tools > Internet Options > Content > Certificates > Trusted Root Certification Authorities
  2. Select your OEDQ certificate and select ‘Export’
  3. Use the wizard to export a ‘DER Encoded binary X.509 (.CER)’ certificate to your root drive (C:\root.cer)
  4. When you set up the OEDQ adapter in Siebel, you set a ‘javalib’ property in the ‘dnd.properties’ file – note down this location
  5. Open a command prompt and navigate to the bin folder of this Java instance
  6. Execute the following command to import the certificate into the Java keystore:

    • Note that the default keystore password is ‘changeit’
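
As an alternative to exporting the certificate through IE (steps 1 to 3 above), the file can also be pulled straight from the OEDQ server with a few lines of Python. This is a rough sketch rather than anything from the Oracle documentation: the function names are my own, port 9004 is the default HTTPS port mentioned earlier, and the call deliberately does not validate the certificate, so only use it against a server you trust. It may also fail if the server only offers TLS versions that your Python build refuses. You still need to import the resulting file into the Java keystore as described in step 6.

```python
import ssl


def fetch_certificate_der(host, port=9004):
    """Download the server's certificate (unvalidated) and return it as DER bytes."""
    pem = ssl.get_server_certificate((host, port))
    return ssl.PEM_cert_to_DER_cert(pem)


def save_certificate(host, path, port=9004):
    """Write the server certificate to 'path' in DER-encoded (.cer) form."""
    with open(path, "wb") as fh:
        fh.write(fetch_certificate_der(host, port))
```

For example, save_certificate("OEDQ9-VM", r"C:\root.cer") would write the same kind of file that the IE export wizard produces, assuming OEDQ9-VM is your OEDQ host.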


That’s it! You should now be able to invoke the real time Web Services within Siebel to deduplicate Account and Contact data.

Any problems, please use the comments box below and I’ll see what I can do to help.

OEDQ and Siebel – Configure OEDQ

UPDATED: I’ve had some really useful feedback from a kindly gentleman from Oracle. I’m informed that there is no requirement for an Oracle client installation on the OEDQ server if using an Oracle staging area: EDQ uses JDBC to make the connection itself. In that respect, you must also specify the database SID, not the Service Name, when you configure dnd.properties and the Staging Area database. I’ve updated the article to reflect these comments and would like to give my sincere thanks to the person who contacted me.

Having set up the Siebel Server configuration, we need to tweak OEDQ to start matching and cleansing data from our Siebel system.

Once again, Oracle have this covered with some detailed installation instructions. The basic steps are:

  1. Install an Oracle Database instance to store match results. I created a simple, 11g Enterprise instance on the OEDQ server, a database named OEDQ and a user account called OEDQ
  2. Copy the contents of ‘config.zip’ into the OEDQ installation folder
    • You MUST restart the OEDQ Server at this point, via the Services control panel. If you do not, the .dxi import step will fail
  3. Import the edq-cds-9.0.x.dxi file via Director
    • Check that all the jobs and processes from the import file have successfully been created in your repository. If not, check step 2
  4. Run the appropriate SQL to create the temporary tables in the database instance
  5. Update the ‘Batch Data Staging Area’ entry, in Director, to point to your staging database
  6. Revisit the dnd.properties file on your Siebel Server to verify and update appropriate configuration options, specifically the database host, instance name and schema as created above
  7. Run the ‘Real Time’ jobs in OEDQ Director, so that the Web Services invoked by Siebel can access the OEDQ functionality
    • If the jobs fail to start, again check step 2

I came across an issue with the #Database Settings section of dnd.properties. Note that the correct format for the Oracle database connection is:

oracle:sid@host[:port]/user/pw

  • The :port may be omitted if the port is the default 1521
  • The /pw may be omitted if the password is the same as the username
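
To make the defaulting rules concrete, here is a small Python sketch that parses a connection string of this shape and fills in the omitted parts. It is purely illustrative: the OracleTarget type and parse_connection function are hypothetical names, it reflects only the format described above, and it does not attempt to handle unusual characters in passwords the way the connector itself might.

```python
import re
from typing import NamedTuple


class OracleTarget(NamedTuple):
    sid: str
    host: str
    port: int
    user: str
    password: str


# oracle:sid@host[:port]/user[/pw]
_PATTERN = re.compile(
    r"^oracle:(?P<sid>[^@]+)@(?P<host>[^:/]+)(?::(?P<port>\d+))?"
    r"/(?P<user>[^/]+)(?:/(?P<pw>.+))?$"
)


def parse_connection(value):
    """Parse 'oracle:sid@host[:port]/user/pw', applying the documented defaults."""
    match = _PATTERN.match(value.strip())
    if match is None:
        raise ValueError("not in the form oracle:sid@host[:port]/user/pw")
    user = match.group("user")
    return OracleTarget(
        sid=match.group("sid"),
        host=match.group("host"),
        port=int(match.group("port") or 1521),  # default listener port
        user=user,
        password=match.group("pw") or user,  # password defaults to username
    )
```

So a value such as oracle:OEDQ@dbhost/OEDQ is read as SID OEDQ on dbhost, port 1521, with OEDQ as both the username and the password.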

And that’s almost it in terms of set up of the two applications.

One last step is required, however: to get Real Time DeDuplication working, we need to tweak the Apache configuration so that our Siebel instances can talk to the OEDQ Web Server via SSL. I’ll cover this in my next post.

OEDQ and Siebel – Configure Siebel

Now that we have a working instance of OEDQ, we’re going to look at how we integrate it into Siebel.

Again, Oracle really have got the documentation right this time: the instructions are extremely detailed and simple to follow. There are two stages: configure Siebel and configure OEDQ.

The initial configuration of Siebel is really straightforward and you should simply follow these steps in the installation guide:

  1. Copy the DLL file to the Siebel Server machine
  2. Create and configure the dnd.parms file
  3. Copy across appropriate JAR files
  4. Configure the dnd.properties file – this will be covered in more detail in the next post
  5. Enable the ‘Data Quality’ component group and restart the Siebel Server
  6. Configure the Data Quality matching parameters on the Data Quality component and the Object Manager component in use (I’m using Public Sector)
  7. Create Job Templates for batch Account, Contact and Address cleansing
  8. Configure the ‘Data Quality Administration’ options
  9. Set up ‘Third Party Administration’ options, including both the Field Mappings and Vendor Parameters – arduous, but really straightforward
  10. Enable User Preferences

I’m in the process of writing a Business Service, in the style of the Oracle Policy Automation, to populate the OEDQ seed data – I’ll update the post with a SIF file once I’m done.

That’s all you need to do on the Siebel side for now but we’re still not ready to test. Next, we’ll look at configuring OEDQ to work alongside Siebel and finish up the Siebel config, allowing us to test.