Monday, September 24, 2007

P of EAA: Active Record

http://www.martinfowler.com/eaaCatalog/activeRecord.html

I got referred to this from a blog entry about PHP and Ruby. I decided to take a quick look at it and make some observations. I guess this would be the first in a series of Design Pattern reviews on Martin Fowler's bliki.

I have seen this pattern employed in some of my projects (especially Curam projects), it looks like a merge between a Java Bean with database methods. However, I don't think this is a good pattern to follow.

The problem with this pattern are:

  • you are doing more than one thing in a single class
  • the class encapsulates on single record, so what do you do about a list?
  • the class knows about the way it is going to be stored

I would recommend breaking this apart into two separate classes: a table module and a POJO. The table module is responsible for CRUD and list retrieval operations on the POJO and can be replaced with many different implementations.

The POJO need not represent a single row either, but it can contain an entire graph containing everything about the POJO.

e.g., A case can contain all it's evidences and a list of it's participants.

However, you do need some tool support for this like Hibernate which can do lazy fetching so you don't kill your database.

Conclusion: Anti-pattern

Although, I disagree with this pattern, it does not mean that this is the beginning of a bashing Martin Fowler, I do agree with almost everything he says.

Sunday, September 23, 2007

Generating a random SIN

The Canadian Social Insurance Number is something that would validate against the Luhn algorithm. Since SINs may be used as input data for some applications, I have translated the code that generates a number that passes the Luhn Algorithm in Wikipedia to something that can be put into into Selenium's user-extensions.js.

Selenium.prototype.getRandomSIN = function() {
    var length = 9;
    var digits = new Array()
    for(var i = 0; i < length - 1; ++i) {
        digits[i] = Math.floor(Math.random() * 9)
    }
    var sum = 0;
    var alt = true;
    for(var i = length - 2; i >= 0; --i) {
        if (alt) {
            var temp = digits[i];
            temp *= 2;
            if (temp > 9) {
                temp -= 9;
            }
            sum += temp;
        } else {
            sum += digits[i];
        }
        alt = !alt;
    }
    var modulo = sum % 10;
    if(modulo > 0) {
        digits[length-1] = 10 - modulo;
    }
    return digits.join("");
};

Saturday, September 22, 2007

Application and Reference Data Upgrades

This is a rethink of my previous data retention strategies with some application deployment considerations thrown in.

This approach should be ok with most DBAs as using DMXes to retain data may be too foreign a concept for most managed operations.  The deployment will not be automated using tools and procedures that are not developed by the deployment team so they can reuse what they have been doing in other applications and not have something different for your application.

The approach should work on any application, but Curam applications would benefit from the approach anyway.

Objectives

The following are the objectives I am trying to achieve in my approach.

  • Data is retained from one build to the next.
  • Support changes in "Reference Data" loads.
  • Application Deployment should be performed using standard methods and not some specialized batch file file not coming from the deployment team.
  • Infinitely repeatable in terms of number of environment supported.

Requirements

  • You require at least three deployment environments to test on.
  • You have a tool that would compare two different databases for both structure and data.

Environments

Environment 0 - This environment is where your developers perform their Integration Test.  This is the first shared environment that the application would get deployed to.  Data does not get retained in this environment.  Automatic application deployment and database resets are performed only to this level.  In Curam speak this is the only place where DMX files are used.

Environment 1 - This environment is where builds that are defined as stable and you have your test team performing their tests on.  Data is retained in this environment.

Environment 2 - This can be your final environment or your user acceptance environment.  Obviously data better be retained in this environment.  Applications and database changes fully tested on Environment 1 are the only ones moved over to this environment.

Environment n - As this approach is infinitely repeatable, we can take as many environments each taking application and database changes from Environment n - 1.

Approach

The first thing you need to perform is set a base line.  This tasks can be performed when at least two environments are put in place.  Before deploying the code base from environment 0 to environment 1 for the first time, the environment 0 data is reset to only contain the reference data.  The DBA then makes a backup or snapshot of the database state at that point - we call the backup Database Snapshot Build 0.  The deployer also takes the EAR of the current application to prepare it for Environment 1 - we call the EAR file EAR Build 0.

Environment 1 then gets it's database restored from Database Snapshot Build 0.  It then has EAR Build 0 deployed to it.

Now we have our base line ready for the next build.

So let's say the next build, Build z (where z > 0), is ready to be tested and deployed by the testers in Environment 1.  The following things happen:

  1. Developers smoke test Build z on environment 0.
  2. The database is reset to only contain reference data in environment 0.
  3. DBA takes a Database Snapshot Build z.
  4. DBA uses his database difference tool to determine what has changed between Database Snapshot Build z and Database Snapshot Build z-1.  The tool would generate the proper SQL change scripts.
  5. DBA restores the database in environment 0 to Database Snapshot Build z-1.
  6. DBA applies the SQL change scripts against the database in environment 0.
  7. The DBA verifies that the table changes relatively match Database Snapshot z.  I say relatively as there may be slight differences especially since the SQL change scripts may only alter the table and not be able to drop columns.
  8. The DBA then makes a backup of the database of environment 1.
  9. The DBA applies their SQL change script against the database.  If it works out, great, if not the SQL change script may need to be hand extended to support further data changes.
  10. EAR Build z is deployed on Environment 1.
  11. A smoke test on environment 1 is performed.  If there is data corruption, the database is reverted to its backup and the SQL change script may need to be hand extended to support further data changes.
  12. At this point Environment 1 has been updated and its data has been retained.

Now let's say we want to deploy Build z (where z > 0)the to the final environment, let's call it environment n (where n > 1) as it can be repeated as many times as possible.

  1. In Environment n we perform a database backup.
  2. In Environment n we apply the database change script from Environment n-1.  If it works out, great, if not the SQL change script may need to be hand extended to support further data changes.  Most likely reason would be Primary Key IDs for reference data may be clashing in which case we need to remap the clashing IDs to other IDs while maintaining referential integrity.
  3. EAR Build z is deployed on Environment n.
  4. A smoke test on environment n is performed.  If there is data corruption, the database is reverted to its backup and the SQL change script may need to be hand extended to support further data changes.  Most likely the scripts need to be modified to perform data cleanup tasks.
  5. At this point Environment n has been updated to build z and its data has been retained.

Conclusion

I think I have covered the objectives I have set out earlier.  With the exception of Environment 0 which is the environment were we bootstrap all the changes.

  • Any data entered is retained from one build to the next due to the fact we only apply patch changes to the database rather than resetting the whole thing.
  • The "Reference Data" is loaded up properly as the patch changes contain the reference data changes as well.
  • Application Deployment is performed using standard deployment procedures, there are no specialized scripts required.
  • There is no technical limit as to the number of environment levels that you can deploy as the deployment logic is recursive.

The toughest part may be locating a tool that would perform the database diffs that your DBA team would be comfortable with.  It is best to let the DBA choose the tool and they can get pretty pricey, but it will all be worth it come your second release to production.

Wednesday, September 05, 2007

What do you mean "Duplicate record on insert"?

One of the things you may eventually end up doing is doing a build dataextract to capture some of your test or reference data for locations and organization structures. So what happens when you run your tests again?

infrastructure:RUN_ID_DUPLICATE_RECORD: Duplicate record on insert.

Odd problem eh? I think the reason for this is the algorithm Curam uses when generating Unique IDs. Not sure exactly what they use, but it still has a chance of getting duplicates.

Currently the way I fix this is to do a search and replace on the IDs in the DMX files. The approach I take is

  1. load up the DMX file whose ID I want to change in Excel
  2. copy the ID column to another worksheet
  3. Add another column with the formula =1000 + ROW()
  4. Fill the column down.
  5. Add another column with the formula ="perl -pi.bak -e ""s/"&A1&"/"&B1&"/g"" filename1.dmx filename2.dmx"
  6. Fill the column down.
  7. Then I copy and paste the new column to a Command Prompt window and let it do the search and replace.
I guess Curam should improve their algorithm for the unique ID generator. I do hope they are using SecureRandom at least.