
The OPEN 2010 conference was very well organized and had many interesting talks.

The low-level, infrastructure side was presented by people from Red Hat, VMware and IBM. Buzzwords like “SaaS” and “PaaS” were all over the place, together with the recent hot topic of virtualization.

On the application level there were interesting presentations about Django and PostgreSQL, and of course our own presentations on Ruby, Rails and NoSQL, along with a longer, 2-hour “Introduction to Ruby” workshop.

The food was great too. That’s important ;).

We had 4 sessions there:

The Modern Approach

Using Cloud Computing, Web Services and Open-Source Tools to Build your Startup

The Modern Approach to Startups

Michael gave the keynote about the modern way to build startups:

WTF is NoSQL?

when, why, and how

WTF is NoSQL?

Vitaly gave a session about NoSQL databases.

Ruby on Rails

Web development that doesn’t hurt.

Ruby on Rails, Web Development that Doesn't Hurt

Boris gave an introductory session about Ruby on Rails.

Workshop

Ruby/Rails introduction

After lunch we also hosted a longer (2-hour) Ruby and Rails workshop presented by Vitaly.

Note: this blog post was updated in 2012 with screenshots and direct links to slides

Simple Object persistency library for Cassandra

Motivation & History (you can skip it :)

I was developing a multiplayer online game for a client (TBD: link when released :) and we decided to use Cassandra for its performance and scaling benefits. Also, the game’s internal data structures mapped very well to key-value semantics.

I did some research but couldn’t find anything that was ready at the time to be used for development.

I did find the BigRecord project, which sort of supported Cassandra, but I got the impression that Cassandra support was ‘bolted on’ after the fact, and their way of running a Java Cassandra driver inside JRuby and interfacing with it through DRb was … can’t quite find the word for it, let’s call it ‘awkward’ :).

So I started to work on my own library, “stealing” liberally from BigRecord, which at least at the time was very close to ActiveRecord with just some parts of the code commented out or replaced.

Somewhere midway through the implementation I found the CassandraObject project by NZKoz. Now that was a much better one. So much better that for some time I considered dumping everything I had done and using it as-is. After some thinking I decided against it though. The primary reason was that its data model was quite different from what I intended to use for the game. In CassandraObject all attributes are stored as columns in a simple ColumnFamily, and indexes and associations are stored in separate ColumnFamilies. This has the benefit of not restricting the number of associated objects, but it does require additional DB queries to access them.

I wanted to use Supercolumns instead, and store attributes and associations in different supercolumns for the given key. This has the benefit of being able to fetch all the data at once, but does restrict the number of associated objects. Since in my intended use-case the number of associations was rather small I decided to continue the development.

But this doesn’t mean I didn’t use CassandraObject to “steal” some code from it too :). There were too many good ideas to pass by. I ended up copying a lot of code from it and throwing away all of the BigRecord heritage. Maybe someday I’ll find a way to ‘combine’ SmallRecord back into CassandraObject. Meanwhile though I’m going to work on this one.

(Ugh, that ended up being a rather long explanation :)

Status

This is work in progress. It is currently used in a client project and works well, but it still has rough edges and might have problems in your specific environment. The code base is quite small and modular though, so you should have no problems jumping in and fixing or extending it for your use case.

Data Model

This library is intended to be used with Supercolumn Families only (for now :).

A model’s attributes are stored inside the “attributes” supercolumn, with the attributes themselves being columns inside it. Associations are stored as separate supercolumns, with each associated id being a column inside.

Example (json notation):

users: {
  "1": {
    "attributes": {
      "name": "Vitaly Kushner",
      "company": "Astrails"
    },
    "account_ids": {
      "123": "1",
      "456": "1"
    }
  }
}

accounts: {
  "123": {
    "url": "http://astrails.com",
    "username": "vitaly",
    "password": "234987234509827345"
  },
  "456": {
    "url": "http://rubyonrails.org",
    "username": "vitaly",
    "password": "3084573945873945"
  }
}

As you can see, we have a one-to-many association here. But contrary to how it is handled in ActiveRecord, we don’t store the user_id in the account ‘record’; instead we store all the account_ids in the user record. Otherwise we would have no way of querying user.accounts except for a full scan of the accounts ‘table’.
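To make the lookup path concrete, here is a minimal sketch using plain Ruby hashes, with no Cassandra involved. The `USERS` and `ACCOUNTS` constants and the `accounts_for` helper are hypothetical names used only for illustration; they are not part of SmallRecord.

```ruby
# Plain-hash stand-ins for the two column families from the example above.
USERS = {
  "1" => {
    "attributes"  => { "name" => "Vitaly Kushner", "company" => "Astrails" },
    "account_ids" => { "123" => "1", "456" => "1" }
  }
}

ACCOUNTS = {
  "123" => { "url" => "http://astrails.com",    "username" => "vitaly" },
  "456" => { "url" => "http://rubyonrails.org", "username" => "vitaly" }
}

# Hypothetical helper: one read of the user row yields the ids,
# then the accounts are fetched by key. No scan of ACCOUNTS is needed.
def accounts_for(user_id)
  ids = USERS.fetch(user_id).fetch("account_ids").keys
  ids.map { |id| ACCOUNTS.fetch(id) }
end

urls = accounts_for("1").map { |account| account["url"] }
```

The column names of the "account_ids" supercolumn carry the information here; the column values are just placeholders.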

Implementation

Connecting to the DB

SmallRecord will look for the file config/small_record.yml:

production:
  adapter: cassandra
  host: 127.0.0.1
  port: 9160
  keyspace: astrails 

development:
  adapter: mock

test:
  adapter: mock

Notice the :mock adapter. This is just a simple Cassandra emulation using an in-memory Ruby hash. This is what I’m using for development and testing. It doesn’t require Cassandra to be running. The emulation is not 100% accurate of course, but it does the job, and so far I haven’t had any bugs related to the differences between the mock and the real thing. Just remember that if you run a development server with the mock, all the data will be gone once you restart. But that is probably not such a bad thing for development. Or it is. You decide.
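For a rough idea of what such an emulation can look like, here is a minimal sketch of an in-memory store laid out like a supercolumn family. The `MockStore` class and its `insert`/`get` methods are hypothetical and are not the actual SmallRecord mock adapter.

```ruby
# Hypothetical in-memory stand-in for a supercolumn family store.
# Layout: column_family -> key -> supercolumn -> column -> value
class MockStore
  def initialize
    @data = Hash.new { |hash, column_family| hash[column_family] = {} }
  end

  # Merge the given supercolumns into the row, creating it if needed.
  def insert(column_family, key, supercolumns)
    row = (@data[column_family][key] ||= {})
    supercolumns.each do |name, columns|
      (row[name] ||= {}).merge!(columns)
    end
    nil
  end

  # Return a shallow copy of the whole row, or of one supercolumn in it.
  # Missing rows and supercolumns come back as empty hashes.
  def get(column_family, key, supercolumn = nil)
    row = @data[column_family].fetch(key, {})
    supercolumn ? row.fetch(supercolumn, {}).dup : row.dup
  end
end

store = MockStore.new
store.insert(:users, "1", "attributes" => { "name" => "Vitaly" })
store.insert(:users, "1", "attributes" => { "company" => "Astrails" })
attrs = store.get(:users, "1", "attributes")
```

Since everything lives in one Ruby object, the data disappears with the process, which matches the restart behaviour described above.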

Basics

You define your models the usual way:

class User < SmallRecord::Base
  ...
end

user = User.new :foo => "bar"
user.save
User.find(user.id)
User.first

Attributes

What is different from ActiveRecord is that you have to tell SmallRecord about all your attributes, since it can’t infer them from the database schema like ActiveRecord does. There is no database schema, duh! The attributes support mostly came from CassandraObject but has changed quite a bit since then. One day I’ll document the differences :)

class User < SmallRecord::Base
  attribute :name
  attribute :age, :type => :integer
  attribute :created_at, :type => :time
end

ActiveModel

This library is built upon Rails’s ActiveModel, pulling in many of the familiar features of ActiveRecord. The following is supported:

Callbacks

before_save :do_something

The following callbacks are supported:

:before_init,
:after_init,
:before_find,
:after_find,
:before_save,
:after_save,
:before_create,
:after_create,
:before_update,
:after_update,
:before_destroy,
:after_destroy,
:before_validation,
:before_validation_on_create,
:before_validation_on_update

You can also define your own callbacks:

class User < SmallRecord::Base
  define_callbacks :after_activation

  after_activation :send_confirmation

  def activate!
    ...
    run_callbacks(:after_activation)
  end
end

Dirty attributes

>> user.changed
=> []
>> user.name = "foo"
=> "foo"
>> user.changed
=> ["name"]
>> user.name_changed?
=> true

When saving an object it will only save changed attributes:

>> user.save
=> true
>> user.name = "qwe"
=> "qwe"
>> user.save
  User Insert (0.000043)   insert(aa421ea0-c407-46fe-986f-09b2d749b1be, {"attributes"=>{"name"=>"\"qwe\"", "schema_version"=>"0"}}, {})
=> true
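The idea of writing only what changed can be sketched in plain Ruby. This is an illustration under stated assumptions, not SmallRecord’s code (which relies on ActiveModel’s dirty tracking): the `TrackedRecord` class and its `writes` log are hypothetical.

```ruby
# Hypothetical record that remembers which attributes changed since the last save.
class TrackedRecord
  attr_reader :writes

  def initialize(attributes = {})
    @attributes = attributes.dup
    @changed = []
    @writes = []   # stands in for the database: one entry per write
  end

  def [](name)
    @attributes[name]
  end

  def []=(name, value)
    return value if @attributes[name] == value   # unchanged, nothing to track
    @changed << name unless @changed.include?(name)
    @attributes[name] = value
  end

  def changed
    @changed.dup
  end

  # Write only the changed attributes, then reset the change list.
  def save
    unless @changed.empty?
      @writes << @changed.map { |name| [name, @attributes[name]] }.to_h
      @changed.clear
    end
    true
  end
end

user = TrackedRecord.new("name" => "foo", "age" => 30)
user["name"] = "qwe"
user.save   # records a write containing only "name"
user.save   # nothing changed, so nothing is written
```

After these calls `user.writes` holds a single entry with just the "name" attribute; the untouched "age" is never sent again.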

Validations

class User < SmallRecord::Base
  validates_presence_of :name
end

Associations

Association support in SmallRecord is rather basic. Only has_many is supported at the moment (feel free to add more :).

class User < SmallRecord::Base
  has_many :accounts
end

user.accounts
user.accounts.create
user.create_account
user.account_ids
user.accounts.first

SmallRecord tries hard to do the minimal required amount of work. The association is loaded lazily, only when it is really needed.
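Lazy loading in general can be sketched as a collection that defers its query until someone first iterates over it. The `LazyCollection` class below is a hypothetical illustration, not the association proxy SmallRecord actually uses.

```ruby
# Hypothetical lazy collection: the loader block runs on first access only.
class LazyCollection
  include Enumerable

  def initialize(&loader)
    @loader = loader
    @records = nil
  end

  def loaded?
    !@records.nil?
  end

  # Enumerable builds first, map, to_a and friends on top of each.
  def each(&block)
    records.each(&block)
  end

  private

  # Run the loader once and memoize its result.
  def records
    @records ||= @loader.call
  end
end

queries = 0
accounts = LazyCollection.new do
  queries += 1   # counts how many times the "database" is hit
  ["astrails.com", "rubyonrails.org"]
end
```

Merely creating `accounts` runs no query; the first call such as `accounts.first` triggers the loader, and later calls reuse the memoized result.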

Migrations

Migrations are very different from what you are used to with ActiveRecord (this too comes from CassandraObject).

You see, there might be a LOT of records in a Cassandra DB, to the point of it being quite impractical to run a full migration. Instead each ‘record’ contains its schema version and we migrate it on read if it’s outdated, i.e. if we load a record into memory with a schema_version that is less than the one currently defined in the code, we will migrate this record. If you save the record after that, it will be saved with the updated version.

Migrations are defined using blocks:

class User
  migrate 1 do |attrs|
    attrs[:foo] = attrs.delete(:bar)
  end
  ...
end
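The migrate-on-read idea can be sketched independently of the library. The `Migrator` class and its `upgrade` method below are hypothetical names for illustration, and string keys are an assumption; this is not how SmallRecord implements `migrate` internally.

```ruby
# Hypothetical registry of per-version migration blocks.
class Migrator
  def initialize
    @migrations = {}
  end

  # Register the block that upgrades a record to `version`.
  def migrate(version, &block)
    @migrations[version] = block
  end

  def current_version
    @migrations.keys.max || 0
  end

  # Apply every migration newer than the record's stored version, in order.
  # Works on a copy, so the hash that was read stays untouched.
  def upgrade(attrs)
    stored = attrs.fetch("schema_version", 0).to_i
    result = attrs.dup
    @migrations.keys.sort.each do |version|
      next if version <= stored
      @migrations[version].call(result)
      result["schema_version"] = version
    end
    result
  end
end

migrator = Migrator.new
migrator.migrate(1) { |attrs| attrs["foo"] = attrs.delete("bar") }

old_record = { "bar" => "baz", "schema_version" => "0" }
migrated = migrator.upgrade(old_record)
```

Records already at the current version pass through unchanged, so calling `upgrade` on every read stays cheap.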

More

Read the code :)

TODO

There are a couple of things that I want to fix first:

  • The elephant in the room is the total lack of testing! Well, in the project I’m extracting this from, the test coverage is quite high, so all the code was implicitly tested, but now that this is a separate project I need to add some specs.

  • There is this ugly read/write_data business. Apart from the bad naming (I still can’t think of a good one) all the supercolumns except for the ‘attributes’ are not managed. They are currently written directly into db on every change. Need to unify the ‘dirty’ handling in attributes with the rest of supercolumns. For that I think I’ll need to drop the Dirty mixin from the ActiveModel and just roll my own.

  • Documentation. This is also lacking at the moment and you will need to look at the code.

  • Need to research the possibility of merging with CassandraObject. Though I’m not sure this is practical.

Copyright

Copyright (c) 2010 Astrails Ltd. See LICENSE for details.

About a week ago some 15 people gathered in the People and Computers offices, thanks to Raphael Fogel.

Jerry Cheung, a nice guy from Outspokes, told everyone how Outspokes was built from the inside and shared his view on building JavaScript-intensive applications with Rails as a backend.

Outspokes uses fancy JavaScript to allow in-browser collaboration between development, design and client teams to request changes, track progress and report problems on an ongoing project. I will definitely try it out.

Boris Nadion from Astrails (that’s us in case you were wondering) told the story of our own MarkupSlicer, a free-to-use project we wrote to simplify creating ERB/HAML layouts and partials out of the HTML markup we get from our slicing team.

Vitaly Kushner, also from Astrails, gave a nice intro presentation about Cassandra, our NoSQL database of choice. We are working with a yet-to-be-disclosed client on a very technologically challenging project, and Cassandra is one of the many interesting solutions we are working with (you can expect a case study on this project in a couple of months, as soon as it goes public).

Cassandra Intro presentation

We just started a project for a client that involves Cassandra.

If you’ve been living under a rock and don’t know what Cassandra is, let me tell you :)

Cassandra is a “second-generation distributed database” that was built for web scale.

It is one of the many distributed NoSQL databases that have been appearing everywhere lately, like mushrooms after a heavy rain :).

What sets Cassandra apart is that it comes from a recognizable entity - Facebook.

But I digress.

This is not meant to be a Cassandra introduction; there are enough of those on the net. I just created a new NoSQL section on this blog where I’m going to post various tidbits of information about Cassandra (and probably others) as I learn them while working on this new project.

Here is the first one: Cassandra gem is just an installer

If you are on Mac OS X and interested in Cassandra, you probably know that it’s just a gem installation away (almost):

gem install cassandra

The first thing to note, though, is that this will not install Cassandra. It will install a Cassandra installer!

I got bitten by this when I took my laptop with me to my daughter’s dancing class. You see, parents are not allowed “in the room” so as not to interfere with the process :), so I have 45 minutes to find myself something to do each time. I installed the cassandra gem at home and intended to play with Cassandra while there.

Not so fast.

When I tried to run cassandra_helper cassandra, which is supposed to start a test Cassandra instance, it tried to connect to a GitHub repository to download and install the actual database.

Duh!

And the 2nd one: use Java Preferences

When I got back and finally built Cassandra I got the following message when starting it for the first time:

~ > cassandra_helper cassandra
Set the CASSANDRA_INCLUDE environment variable to use a non-default cassandra.in.sh and friends.
(in /Library/Ruby/Gems/1.8/gems/cassandra-0.5.6.2)
You need to configure your environment for Java 1.6.
If you're on OS X, just export the following environment variables:
  JAVA_HOME="/System/Library/Frameworks/JavaVM.framework/Versions/1.6/Home"
  PATH="/System/Library/Frameworks/JavaVM.framework/Versions/1.6/Home/bin:$PATH"

The first thing to note is that just typing JAVA_HOME="/System/Library/Frameworks/JavaVM.framework/Versions/1.6/Home" in the terminal won’t help: that only sets a shell variable, which is not passed on to the programs you start.

You need to export it:

export JAVA_HOME="/System/Library/Frameworks/JavaVM.framework/Versions/1.6/Home"
PATH="/System/Library/Frameworks/JavaVM.framework/Versions/1.6/Home/bin:$PATH"

No need to export PATH as it’s already exported.

But there is a better and simpler way!

Start “Java Preferences” (find it in /Applications/Utilities, or just use Spotlight):

Java Preferences: Java 6 2nd

Then reorder the entries in the bottom “Java Applications” section so that Java 6 will be the 1st:

Java Preferences: Java 6 1st

Now Cassandra starts right away without any exports:

~ > cassandra_helper cassandra
Set the CASSANDRA_INCLUDE environment variable to use a non-default cassandra.in.sh and friends.
(in /Library/Ruby/Gems/1.8/gems/cassandra-0.5.6.2)
CASSANDRA_HOME: /Users/vitaly/cassandra/server
CASSANDRA_CONF: /Library/Ruby/Gems/1.8/gems/cassandra-0.5.6.2/conf
Listening for transport dt_socket at address: 8888
...

Cool, now go write your killer application!