Dumping database to YAML fixtures

Posted by Peter Donald Wed, 14 Dec 2005 02:28:00 GMT

I finally got fed up with the bugs in Rails 1.0 handling of mysql connections and have decided to move to postgres. I have talked about the move and even migrated my DDL to the database agnostic schema language. The one thing I had not yet thought about was how to move my data.

After doing a little bit of asking and searching I decided to just dump my data out the databse to fixtures and then reload these fixtures. This is surprisingly simple using the rake task.

desc 'Dump a database to yaml fixtures. '
task :dump_fixtures => :environment do
  path = ENV['FIXTURE_DIR'] || "#{RAILS_ROOT}/data"

  ActiveRecord::Base.establish_connection(RAILS_ENV.to_sym)
  ActiveRecord::Base.connection.
             select_values('show tables').each do |table_name|
    i = 0
    File.open("#{path}/#{table_name}.yml", 'wb') do |file|
      file.write ActiveRecord::Base.connection.
          select_all("SELECT * FROM #{table_name}").inject({}) { |hash, record|
      hash["#{table_name}_#{i += 1}"] = record
        hash
      }.to_yaml
    end
  end
end

desc "Reset Database data to that in fixtures that were dumped"
task :load_dumped_fixtures => :environment do
  require 'active_record/fixtures'
  ActiveRecord::Base.establish_connection(RAILS_ENV.to_sym)
  path = ENV['FIXTURE_DIR'] || "#{RAILS_ROOT}/data"
  Dir.glob("#{path}/*.{yml}").each do |fixture_file|
    Fixtures.create_fixtures(path, File.basename(fixture_file, '.*'))
  end
end
This is Mysql specific due to the use of select_values('show tables') but apparently sqlite usues select_values('.table') and postgres uses the following.
 select_values(<<-end_sql
SELECT c.relname
 FROM pg_class c
   LEFT JOIN pg_roles r     ON r.oid = c.relowner
   LEFT JOIN pg_namespace n ON n.oid = c.relnamespace
 WHERE c.relkind IN ('r','')
   AND n.nspname IN ('myappschema', 'public')
   AND pg_table_is_visible(c.oid)
end_sql

This worked like a charm except when my data contained embedded ERB directives because when rails loads the fixtures it attempts to evaluate the fixture as an ERB script. In this scenario I just needed to nip into the read_fixture_files method in $RUBY_HOME\activerecord-1.13.2\lib\active_record\fixtures.rb and comment out the erb rendering while I imported my data.

Update: The code for this can be found is available in dump_fixtures.rake

Posted in  | no comments | no trackbacks

Many-to-many rails goodness with the through directive

Posted by Peter Donald Mon, 05 Dec 2005 06:15:00 GMT

The has_many :through option will rock your socks off! (Or at least make things a little easier) Read on to understand why!

Rails has the ability to declare many-to-many relationships between ActiveRecord objects using the has_and_belongs_to_many macro. HABTM relationships as they are affectionately known, require a third join table that contains the keys of the two domain objects.

Consider the example where a Student can be enrolled in 1 or more Subjects and each Subject can have 1 or more Students. This would be modelled by the following domain classes.

class Student < ActiveRecord::Base
  has_and_belongs_to_many :subjects
  ...
end

class Subject < ActiveRecord::Base
  has_and_belongs_to_many :students
  ...
end

With the sql DDL looking something like;


CREATE TABLE students (
  id INT NOT NULL AUTO_INCREMENT,
  name VARCHAR(100) NOT NULL,
  ...
  PRIMARY KEY (id) 
) ENGINE = InnoDB;

CREATE TABLE subjects (
  id INT NOT NULL AUTO_INCREMENT,
  name VARCHAR(100) NOT NULL,
  ...
  PRIMARY KEY (id) 
) ENGINE = InnoDB;

CREATE TABLE students_subjects (
  student_id INT NOT NULL,
  subjects_id INT NOT NULL,
  FOREIGN KEY (student_id) REFERENCES students(id),
  FOREIGN KEY (subject_id) REFERENCES subjects(id)
) ENGINE = InnoDB;

At some point you may want to add in some information about the HABTM relationship such as the year that the student enrolled in the subject. This can be done using push_with_attributes and adding an extra ‘year’ column to the SQL DDL.
class Subject < ActiveRecord::Base
  has_and_belongs_to_many :students
  ...
  def enrol(student)
    students.push_with_attributes(student, :year => Time.now.year)
  end
end

Join tables with attributes tend to become ugly fast and it is rare that a few days don’t go by on the rails mailing list without someone asking for features that imply they are using join tables as a crutch for a missing domain object. Enrolment would be a good choice for the example above.

However if we introduce an Enrolment object, the naive approach of accessing the students of the subject (or vice versa) is extremely inefficient as you will first hit the enrolments table before loading from the students table (or conversly the subjects table). A more efficient way using hand crafted SQL would be the following

class Subject < ActiveRecord::Base
  has_many :enrolments
  ...
  def students
    find_by_sql(
      "SELECT students.* " +
      "FROM subjects, enrolments, students " +
 "WHERE enrolments.student_id = students.id AND + "
      "      enrolments.subject_id = #{id}"
    )
  end
end

This pattern is likely to be duplicated across many domain objects. Luckily David Heinemeier Hansson mentioned that a :through option will be supported on the the has_many macro that allows you to replace the above code with;

class Subject < ActiveRecord::Base
  has_many :enrolments
  has_many :students, :through => :enrolments
  ...
end

Rails just keeps getting better and better!

Posted in  | no comments | no trackbacks

Validations for non-ActiveRecord Model Objects

Posted by Peter Donald Fri, 02 Dec 2005 11:48:00 GMT

Rails provides support for validating form input if the form is backed by an ActiveRecord. The application I am currently working on has a form that has a large number of input parameters but is not persisted to the database. I still wanted to use the ActiveRecord Validations as they make my life easier but I did not know if there was an simple way to do this.

Initially I created a dummy table in the database with just an id field and made my model object sub-class ActiveRecord. I could then use the validations with all the fields I had defined using attr_accessor. This looked something like;

class Search < ActiveRecord::Base

  attr_accessor :user_name, :email, :locator

  validates_length_of :user_name, 
                      :within => 6..20, 
 :too_long => "pick a shorter name", 
                      :too_short => "pick a longer name"
  validates_format_of :email, 
                     :with => /^([^@\s]+)@((?:[-a-z0-9]+\.)+[a-z]{2,})$/i
  validates_numericality_of :locator
  ...
end

In my controller I created the Search object in the same way that I created all the other model objects but I never called save. Instead I called the valid? method to check whether the model passed all the validations. If the model is not valid the @search.errors object is populated with all the errors.

class NavigatorController < ApplicationController

  def search
    @search = Search.new(params[:search])
    if @search.valid?
    ...
    end
  end
  ...
end

Of course this left a bad taste in my mouth as it is a seriously ugly hack that requires an empty table in the database just to get form validation working. So I began to look at what I needed to do to implement an ActiveForm object. I was not looking forward to this task as I had read on the rails mailing list that the Validations were intermingled with ActiveRecord::Base and difficult to untangle.

This could not be further from the truth. The first thing I did was create a new ActiveForm class and include ActiveRecord::Validations. This caused a few errors as the ActiveRecord::Validations class attempts to call alias_method for methods that do not exist in ActiveForm. I implement these methods (save and update_attribute) so that they raise a NotImplementedError exception. Then I attempt to call the valid? method but it calls the new_record? method which I implement to return true. To view the errors in the view using the standard helper methods I need to implement the human_attribute_name method. These changes seem to get basic validations working.

The only validations that are not working are validates_uniqueness_of and validates_numericality_of. validates_uniqueness_of is not expected to work as it accesses the database so I just make it raise a NotImplementedError exception. validates_numericality_of does not work as it relies on a method named ”#{attr_name}_before_type_cast” for each attribute named “attr_name”. This is an artifact of the type coercion that ActiveRecord performs on input parameters. ActiveRecord will convert an input parameter from a string to an integer if the underlying database record stores the field as an integer. As this does not occur with ActiveForm I just duplicated the method and replaced ”#{attr_name}_before_type_cast” with ”#{attr_name}”.

The only functionality that ActiveForm was missing was the ability to create a model object from a hash. As ActiveForm does not need to do any type coercion this is as simple as

  def initialize(attributes = nil)
   if attributes
      attributes.each do |key,value|
        send(key.to_s + '=', value)
      end
    end
    yield self if block_given?
  end

At this stage ActiveForm is in a usable state and it took less than 20 minutes. It only took that long because I needed to restart webrick for each change (not to mention the fact that I had never looked at ActiveRecord before). Isn’t ruby/rails great?

To get this working grab the active_form.rb file and place it in the app/models directory. You can then make your model objects extend ActiveForm and use them like regular ActiveRecord objects.

I cleaned up a few warts of ActiveForm like overriding methods you should not be calling (save!, save_with_validation, create!, validate_on_create, validate_on_update). I hope to get motivated enough to send a patch that enables this style of functionality in the core once edge rails is working for me again.

Update:

It seems there is already a HowTo on the rails wiki that describes a similar technique. However rather than duplicating validates_numericality_of they handle the calls to ”#{attr_name}_before_type_cast” by implementing a method_missing method which I incorporated to cleanup my code.

Update on 12th Dec 2005

Today I decided that I needed to add reloading of ActiveForm subclasses and this is done with the following code chunk.

require 'dispatcher'
class Dispatcher
  class << self
    if ! method_defined?(:form_original_reset_application!) 
      alias :form_original_reset_application! :reset_application!
      def reset_application!
        form_original_reset_application!
        Dependencies.remove_subclasses_for(ActiveForm) if defined?(ActiveForm)
      end
    end
  end
end

Download: active_form.rb

Update on 1st March 2006

Available as a plugin at

svn: http://www.realityforge.org/svn/code/active-form/trunk

Posted in  | 3 comments | no trackbacks

Proxy to Lighttpd for Rails apps

Posted by Peter Donald Wed, 30 Nov 2005 01:12:00 GMT

Over the last few weeks I have been looking at what is required to deploy rails applications in an environment with a peak load of about 30 requests a second but with an average rate of 1 request per second.

I am running Apache and initially I was hesitant to try fastscgi due to the horror stories that I read about on the rails mailing list. I initially tried mod_ruby but it was not speedy and it consumed large chunks of memory. I assume this was due to to the ruby interpreter being added to each process.

Next I migrated to fastcgi which was not anywhere near as painful as I had heard. I followed the directions in Apache tuning for Rails and FastCGI and I had it up and running in no time. Very occasionally I am seeing fastcgi processes that are left alive after a web server reload. Apparently Lighttpd does not have this problem but I need to stick with Apache2 because of applications running on the server.

I want to try an approach that I read about on the rails mailing list that involves proxying requests to Lighttpd from Apache2 using configuration like the following. This would then allow me to run a separate Lighttpd on to handle the rail requests.


<VirtualHost *:80>
    ServerName napts.realityforge.org
    ProxyPass / http://napts.realityforge.org:8080/
    ProxyPassReverse / http://napts.realityforge.org:8080/
</VirtualHost>

I also noticed an easy way to host multiple rails applications on one hostname. Previously I had created a new hostname for each rails application ala napts.realityforge.org, iplan.realityforge.org, etc.


<VirtualHost *:80>
    ServerName www.realityforge.org
    ProxyPass /napts/ http://www.realityforge.org:8080/
    ProxyPassReverse /napts/ http://www.realityforge.org:8080/
</VirtualHost>

And then in my rails application I need to add the following to my environment.rb

ActionController::AbstractRequest.relative_url_root = "/napts"

Pretty neat! and it clears up many of my remaining issues with rails.

Posted in  | no comments | no trackbacks

Scripting Databases

Posted by Peter Donald Wed, 23 Nov 2005 00:07:00 GMT

I have always considered data modeling as the one of the most critical aspects of software development. A well designed data model can outlive the specific software product it was designed for and provide a valuable asset to the organization. In the ideal scenario the data model will evolve and adapt with the organization as requirements change.

However, as a software developer I have rarely worked in an environment where I needed a deep understanding of any particular vendors database implementation. Recently I have been developing database-centric software on a number of different platforms. It may be that I am missing something but I have yet to find a decent database centric scripting language.

Consider the following problem that I was tackling a month ago. We have a central database server and application server to access the database. Database writes MUST occur through the application server to maintain data integrity due to limitations at our software layer. We also support distributed servers that can periodically synchronize with the central server. The synchronization process requires heavy use of buisness logic to detect and resolve conflicts in synchronized data. Some data may also come from other external systems such as personal or payroll. This data needs to be cleaned and converted into our schema before being synchronized with the central server.

I was tasked with automating the synchronization from an external system to our central server. I also needed to have a test run through the synchronization prior to the real run to stop the process if synchronization would fail. This involved the following steps

  1. Import, convert and clean the data from external database into INCOMING database
  2. Backup INCOMING database
  3. Backup CENTRAL database
  4. Restore CENTRAL database into TEST database
  5. Run synchronization between INCOMING and TEST. This involves;
    1. Startup TEST application
    2. Start INCOMING application and initiate synchronization with TEST
    3. Shutdow TEST application
  6. If synchronization in previous step was successful then synchronize between INCOMING and CENTRAL servers. This involves;
    1. Start INCOMING server and initiate synchronization with CENTRAL
  7. Backup CENTRAL database

At each step along the way we need to log information about progress into another database as the process can take several hours. If an error occurs we need to inform appropriate party.

I ended up implementing this as a stored procedure in Microsoft SQL Server. This is not without it’s problems. For starters it is tied to a specific vendors database server (and possibly a specific database server version). Secondly there is a large number of ugly code hacks. To execute external processes in SQL Server you need to create a job with the command then start the job. Then I poll a system table every 10 seconds until the job has completed using GOTO’s .

If that was not bad enough, I needed to come up with a mechanism to log progress messages to a different database. My problem was that if an error occured during a number of the steps a transaction roll back was issued which reverted all the log messages. The only way I could find to get around this was to open another connection to SQL Server using the SQL-DMO COM object. The COM object only used to write log entries and as it was a different connection it would not be rolled back when the main transaction rolled back. ugly!!

These uglies occur when I was just automating the process. When you get down to the data manipulation and synchronization it gets even less appealing. The code to extract data from the external database and clean it prior to putting it into INCOMING is contained within
  • an xml document defining transformation rules
  • auxilliary SQL scripts to support non-standard rules
  • a look-up-table in another database
The code to synchronize the data between multiple applications is placed within
  • another xml document defining consistancy rules
  • custom java code to support non-standard rules

It is not a pretty sight.

Admittedly if the system was to be rewritten from scratch the whole process would probably be a lot cleaner. But even then, I was skeptical that there was a nice way to implement this. The software would need to be able to define a domain model with rules that contain both imperative/procedural (ie java or some other imperative language) and declarative elements (ie sql and some data constraint language).

Previously I had thought that the best path to tackle this problem was to use some sort of Domain Specific Language to define the declarative aspects of the data model and then define the procedural elements using a language like Java. I have used this approach with success before. I defined the static model characteristerics and data constraints in an XML document and then used Velocity to generate the java code that was enhanced with procedural elements.

Recently I have been playing with Ruby on Rails and I have been re-evaluating my position. Rails has the ActiveRecord library that allows you to define model classes (using the Active Record pattern as described by Martin Fowler). These model classes can define validations that offer a psuedo-constraint language for the data. It also offers support for defining associations, and aggregations between different active record elements and is generally a nice and easy toolkit to use to access relational data. If you need to escape to SQL for performance or conceptual reasons then that is possible with few hassles.

Even more recently I discovered migrations in rails that make it possible to incrementally modify your database schema as your application evolves. You can add or remove columns, tables, indexes etc all the while preserving and migrating data as per application requirements. To upgrade to the latest schema you need only run the “migrate” rake task and be done with it.

This makes rails or more specifically ActiveRecord a very strong contender for my toolkit of choice to script database It would make it possible to avoid vendor specific stored procedures or SQL, to a certain degree and make it much easier to develop software to manipulate schemas and data.

The only negative is that it is in ruby and I have a greater understanding of the java language. Then again maybe ruby does not require the breadth of understanding java does – ir is much simpler to just get stuff done.

Maybe ruby is the next java.

Posted in ,  | no comments | no trackbacks

Older posts: 1 2 3 4