<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/css" href="/stylesheets/rss.css"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:trackback="http://madskills.com/public/xml/rss/module/trackback/">
  <channel>
    <title>RealityForge.org: Scripting Databases</title>
    <link>/articles/2005/11/23/scripting-databases</link>
    <language>en-us</language>
    <ttl>40</ttl>
    <description>A little short for a storm trooper</description>
    <item>
      <title>Scripting Databases</title>
      <description>&lt;p&gt;I have always considered &lt;a href="http://www.agiledata.org/essays/dataModeling101.html"&gt;data modeling&lt;/a&gt; to be one of the most critical aspects of software development. A well-designed data model can outlive the specific software product it was designed for and become a valuable asset to the organization. In the ideal scenario the data model will &lt;a href="http://www.agiledata.org/essays/evolutionaryDevelopment.html"&gt;evolve&lt;/a&gt; and &lt;a href="http://www.martinfowler.com/articles/evodb.html"&gt;adapt&lt;/a&gt; with the organization as requirements change.&lt;/p&gt;


	&lt;p&gt;However, as a software developer I have rarely worked in an environment where I needed a deep understanding of any particular vendor&amp;#8217;s database implementation. Recently I have been developing database-centric software on a number of different platforms. It may be that I am missing something, but I have yet to find a decent database-centric scripting language.&lt;/p&gt;


	&lt;p&gt;Consider the following problem that I was tackling a month ago. We have a central database server and an application server to access the database. Database writes &lt;strong&gt;MUST&lt;/strong&gt; occur through the application server to maintain data integrity, due to limitations at our software layer. We also support distributed servers that can periodically synchronize with the central server. The synchronization process requires heavy use of business logic to detect and resolve conflicts in synchronized data. Some data may also come from other external systems such as personnel or payroll. This data needs to be cleaned and converted into our schema before being synchronized with the central server.&lt;/p&gt;


	&lt;p&gt;I was tasked with automating the synchronization from an external system to our central server. I also needed a test run of the synchronization prior to the real run, so that the process could be stopped if synchronization would fail. This involved the following steps:&lt;/p&gt;


	&lt;ol&gt;
	&lt;li&gt;Import, convert and clean the data from the external database into the &lt;span class="caps"&gt;INCOMING&lt;/span&gt; database&lt;/li&gt;
		&lt;li&gt;Back up &lt;span class="caps"&gt;INCOMING&lt;/span&gt; database&lt;/li&gt;
		&lt;li&gt;Back up &lt;span class="caps"&gt;CENTRAL&lt;/span&gt; database&lt;/li&gt;
		&lt;li&gt;Restore &lt;span class="caps"&gt;CENTRAL&lt;/span&gt; database into &lt;span class="caps"&gt;TEST&lt;/span&gt; database&lt;/li&gt;
		&lt;li&gt;Run synchronization between &lt;span class="caps"&gt;INCOMING&lt;/span&gt; and &lt;span class="caps"&gt;TEST&lt;/span&gt;. This involves:
	&lt;ol&gt;
	&lt;li&gt;Start &lt;span class="caps"&gt;TEST&lt;/span&gt; application&lt;/li&gt;
		&lt;li&gt;Start &lt;span class="caps"&gt;INCOMING&lt;/span&gt; application and initiate synchronization with &lt;span class="caps"&gt;TEST&lt;/span&gt;&lt;/li&gt;
		&lt;li&gt;Shut down &lt;span class="caps"&gt;TEST&lt;/span&gt; application&lt;/li&gt;
	&lt;/ol&gt;
	&lt;/li&gt;
		&lt;li&gt;If synchronization in the previous step was successful then synchronize between &lt;span class="caps"&gt;INCOMING&lt;/span&gt; and &lt;span class="caps"&gt;CENTRAL&lt;/span&gt; servers. This involves:
	&lt;ol&gt;
	&lt;li&gt;Start &lt;span class="caps"&gt;INCOMING&lt;/span&gt; server and initiate synchronization with &lt;span class="caps"&gt;CENTRAL&lt;/span&gt;&lt;/li&gt;
	&lt;/ol&gt;
	&lt;/li&gt;
		&lt;li&gt;Back up &lt;span class="caps"&gt;CENTRAL&lt;/span&gt; database&lt;/li&gt;
	&lt;/ol&gt;
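

	&lt;p&gt;The steps above can be sketched in a general-purpose scripting language. What follows is a minimal Python sketch and not the implementation described in this post: the step callables and the names run_pipeline and synchronize are hypothetical placeholders. It only shows the control flow, where the rehearsal against &lt;span class="caps"&gt;TEST&lt;/span&gt; must succeed before &lt;span class="caps"&gt;CENTRAL&lt;/span&gt; is touched.&lt;/p&gt;

```python
# Sketch only: each step is a hypothetical placeholder callable standing in
# for a backup, restore or synchronization action.

def run_pipeline(steps, log):
    """Run (name, callable) steps in order; stop at the first failure."""
    for name, step in steps:
        log("starting: " + name)
        try:
            step()
        except Exception as exc:
            log("failed: " + name + " (" + str(exc) + ")")
            return False
        log("finished: " + name)
    return True


def synchronize(rehearsal_steps, real_steps, log=print):
    """Rehearse against TEST first; run the real steps only if that works."""
    if not run_pipeline(rehearsal_steps, log):
        log("rehearsal failed, CENTRAL left untouched")
        return False
    return run_pipeline(real_steps, log)
```

	&lt;p&gt;Passing the log function in as a parameter keeps progress reporting separate from the steps themselves, so it could write to a console, a file or another database.&lt;/p&gt;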


	&lt;p&gt;At each step along the way we need to log information about progress into another database, as the process can take several hours. If an error occurs we need to inform the appropriate party.&lt;/p&gt;


	&lt;p&gt;I ended up implementing this as a stored procedure in Microsoft &lt;span class="caps"&gt;SQL &lt;/span&gt;Server. This is not without its problems. For starters, it is tied to a specific vendor&amp;#8217;s database server (and possibly a specific database server version). Secondly, there are a large number of ugly code hacks. To execute an external process in &lt;span class="caps"&gt;SQL &lt;/span&gt;Server you need to create a job with the command and then start the job. I then poll a system table every 10 seconds until the job has completed, using &lt;a href="http://www.acm.org/classics/oct95/"&gt;GOTOs&lt;/a&gt;.&lt;/p&gt;
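

	&lt;p&gt;For comparison, here is how the same poll-until-done loop might look in a scripting language with structured loops. This is a hedged Python sketch: wait_for_job, is_done and max_polls are hypothetical names, and is_done stands in for whatever check reports that the job has completed.&lt;/p&gt;

```python
import time


def wait_for_job(is_done, interval=10.0, max_polls=360, sleep=time.sleep):
    """Poll is_done() until it returns True or max_polls checks are used up.

    Returns True if the job completed, False if we gave up waiting.
    """
    for _ in range(max_polls):
        if is_done():
            return True
        sleep(interval)  # wait before checking the job status again
    return False
```

	&lt;p&gt;Bounding the number of polls means a job that never finishes cannot hang the whole run, and the injectable sleep makes the loop easy to exercise without real delays.&lt;/p&gt;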


	&lt;p&gt;If that was not bad enough, I needed to come up with a mechanism to log progress messages to a different database. My problem was that if an error occurred during a number of the steps, a transaction rollback was issued which reverted all the log messages. The only way I could find to get around this was to open another connection to &lt;span class="caps"&gt;SQL &lt;/span&gt;Server using the &lt;a href="http://www.sqlteam.com/item.asp?ItemID=9093"&gt;SQL-DMO&lt;/a&gt; &lt;span class="caps"&gt;COM&lt;/span&gt; object. The &lt;span class="caps"&gt;COM&lt;/span&gt; object was only used to write log entries, and as it was a different connection its writes would not be rolled back when the main transaction rolled back. &lt;b&gt;Ugly&lt;/b&gt;!!&lt;/p&gt;
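

	&lt;p&gt;The underlying idea, writing log entries over a second connection so that they survive a rollback of the main transaction, can be illustrated outside &lt;span class="caps"&gt;SQL &lt;/span&gt;Server. The following is a minimal Python sketch using the standard sqlite3 module with two in-memory databases. It is an assumption-laden illustration and not the SQL-DMO approach itself: the table names staff and progress and the functions open_databases, write_log and run_step are all hypothetical.&lt;/p&gt;

```python
import sqlite3


def open_databases():
    """Create a work database and a separate log database (both in memory)."""
    work = sqlite3.connect(":memory:")
    log = sqlite3.connect(":memory:")
    work.execute("CREATE TABLE staff (name TEXT NOT NULL)")
    work.commit()
    log.execute("CREATE TABLE progress (message TEXT)")
    log.commit()
    return work, log


def write_log(log, message):
    """Commit each log entry immediately on the log connection."""
    log.execute("INSERT INTO progress (message) VALUES (?)", (message,))
    log.commit()


def run_step(work, log, name, step):
    """Run step(work) in a transaction; log progress on the other connection."""
    write_log(log, "starting " + name)
    try:
        step(work)
        work.commit()
    except Exception as exc:
        work.rollback()  # undoes the step's writes, but not the log entries
        write_log(log, "failed " + name + ": " + str(exc))
        return False
    write_log(log, "finished " + name)
    return True
```

	&lt;p&gt;Because the log connection commits independently, a failed step leaves its log entries in place while its own data changes are rolled back.&lt;/p&gt;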


These uglies occurred when I was just automating the process. When you get down to the data manipulation and synchronization it gets even less appealing. The code to extract data from the external database and clean it prior to putting it into &lt;span class="caps"&gt;INCOMING&lt;/span&gt; is contained within:
	&lt;ul&gt;
	&lt;li&gt;an &lt;span class="caps"&gt;XML&lt;/span&gt; document defining transformation rules&lt;/li&gt;
		&lt;li&gt;auxiliary &lt;span class="caps"&gt;SQL&lt;/span&gt; scripts to support non-standard rules&lt;/li&gt;
		&lt;li&gt;a lookup table in another database&lt;/li&gt;
	&lt;/ul&gt;


The code to synchronize the data between multiple applications is placed within:
	&lt;ul&gt;
	&lt;li&gt;another &lt;span class="caps"&gt;XML&lt;/span&gt; document defining consistency rules&lt;/li&gt;
		&lt;li&gt;custom Java code to support non-standard rules&lt;/li&gt;
	&lt;/ul&gt;


	&lt;p&gt;It is not a pretty sight.&lt;/p&gt;


	&lt;p&gt;Admittedly, if the system were to be rewritten from scratch the whole process would probably be a lot cleaner. But even then, I was skeptical that there was a &lt;i&gt;nice&lt;/i&gt; way to implement this. The software would need to be able to define a domain model with rules that contain both imperative/procedural elements (i.e. Java or some other imperative language) and declarative elements (i.e. &lt;span class="caps"&gt;SQL&lt;/span&gt; and some data constraint language).&lt;/p&gt;


	&lt;p&gt;Previously I had thought that the best path to tackle this problem was to use some sort of &lt;a href="http://www.martinfowler.com/bliki/DomainSpecificLanguage.html"&gt;Domain Specific Language&lt;/a&gt; to define the declarative aspects of the data model and then define the procedural elements using a language like Java. I have used this approach with success before. I defined the static model characteristics and data constraints in an &lt;span class="caps"&gt;XML&lt;/span&gt; document and then used &lt;a href="http://jakarta.apache.org/velocity/"&gt;Velocity&lt;/a&gt; to generate the Java code that was enhanced with procedural elements.&lt;/p&gt;


	&lt;p&gt;Recently I have been playing with &lt;a href="http://www.rubyonrails.org/"&gt;Ruby on Rails&lt;/a&gt; and I have been re-evaluating my position. Rails has the &lt;a href="http://ar.rubyonrails.com/"&gt;ActiveRecord&lt;/a&gt; library that allows you to define model classes (using the &lt;a href="http://www.martinfowler.com/eaaCatalog/activeRecord.html"&gt;Active Record pattern&lt;/a&gt; as described by Martin Fowler). These model classes can define &lt;a href="http://ar.rubyonrails.com/classes/ActiveRecord/Validations.html"&gt;validations&lt;/a&gt; that offer a pseudo-constraint language for the data. It also offers support for defining &lt;a href="http://ar.rubyonrails.com/classes/ActiveRecord/Associations/ClassMethods.html"&gt;associations&lt;/a&gt; and &lt;a href="http://ar.rubyonrails.com/classes/ActiveRecord/Aggregations/ClassMethods.html"&gt;aggregations&lt;/a&gt; between different active record elements and is generally a nice and easy toolkit to use to access relational data. If you need to escape to &lt;span class="caps"&gt;SQL&lt;/span&gt; for performance or conceptual reasons then that is &lt;a href="http://api.rubyonrails.com/classes/ActiveRecord/Base.html#M000691"&gt;possible&lt;/a&gt; with few hassles.&lt;/p&gt;


	&lt;p&gt;Even more recently I discovered &lt;a href="http://jamis.jamisbuck.org/articles/2005/09/27/getting-started-with-activerecord-migrations"&gt;migrations&lt;/a&gt; in Rails, which make it possible to incrementally modify your database schema as your application evolves. You can add or remove columns, tables, indexes, etc., all the while preserving and migrating data as per application requirements. To upgrade to the latest schema you need only run the &amp;#8220;migrate&amp;#8221; rake task and be done with it.&lt;/p&gt;
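

	&lt;p&gt;To make the idea of incremental, versioned schema changes concrete, here is a minimal Python sketch using the standard sqlite3 module. It is not the Rails migrations API and does not try to mimic it: the MIGRATIONS list, the staff table and the migrate function are hypothetical, and the schema version is tracked with SQLite&amp;#8217;s user_version pragma as an assumption of this sketch.&lt;/p&gt;

```python
import sqlite3

# Hypothetical migrations, applied in order. The statement at position N
# brings the schema from version N to version N + 1.
MIGRATIONS = [
    "CREATE TABLE staff (id INTEGER PRIMARY KEY, name TEXT NOT NULL)",
    "ALTER TABLE staff ADD COLUMN email TEXT",
    "CREATE INDEX staff_name_idx ON staff (name)",
]


def migrate(conn, migrations=MIGRATIONS):
    """Apply every migration not yet recorded; return the resulting version."""
    current = conn.execute("PRAGMA user_version").fetchone()[0]
    pending = migrations[current:]
    for version, statement in enumerate(pending, start=current + 1):
        conn.execute(statement)
        # Record the new schema version so this step is never repeated.
        conn.execute("PRAGMA user_version = %d" % version)
        conn.commit()
    return conn.execute("PRAGMA user_version").fetchone()[0]
```

	&lt;p&gt;Running migrate a second time is a no-op, and existing rows are kept when a later migration adds a column.&lt;/p&gt;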


	&lt;p&gt;This makes Rails, or more specifically ActiveRecord, a very strong contender for my toolkit of choice to script databases. It would make it possible to avoid vendor-specific stored procedures or &lt;span class="caps"&gt;SQL&lt;/span&gt;, to a certain degree, and make it much easier to develop software to manipulate schemas and data.&lt;/p&gt;


	&lt;p&gt;The only negative is that it is in Ruby and I have a greater understanding of the Java language. Then again, maybe Ruby does not require the breadth of understanding Java does &amp;#8211; it is much simpler to just get stuff done.&lt;/p&gt;


	&lt;p&gt;Maybe Ruby is the next Java.&lt;/p&gt;</description>
      <pubDate>Wed, 23 Nov 2005 11:07:00 +1100</pubDate>
      <guid isPermaLink="false">urn:uuid:1daa5b7926980b37de5bc49349bb54ac</guid>
      <author>Peter Donald</author>
      <link>http://www.realityforge.org/articles/2005/11/23/scripting-databases</link>
      <category>Rails</category>
      <category>Java</category>
      <trackback:ping>http://www.realityforge.org/articles/trackback/6</trackback:ping>
    </item>
  </channel>
</rss>
