A SQL Developer's Guide To NewSQL

Accommodating Undefined Properties

Our goal is quite simple: we don't want to update table schemas whenever a new property pops up in our ever growing data set. This is especially true when the newly found properties only exist in a very tiny portion of the whole data instances. In other words, we want the relational databases to accommodate undefined properties just like NoSQL (say MongoDB) does.

That's exactly what newsql is about. We'll explain how you can use newsql to achieve that. First let's start with "insert". Assuming you have a table "House" defined as below:

CREATE TABLE House (
  id     bigint not null auto_increment,
  rooms  smallint,
  price  int,
  addr   varchar(256),
  zip    varchar(5),
  primary key (id)
)
engine = InnoDB;

Now you want to insert a new house entry like the following:

var  data = {
        rooms: 3,
        price: 450000,
        pool: true
     };

newsql.insert('House', data, function(err, pk) {
    // if success, pk will be like
    // {id: id_number_of_the_new_entry}
});

Like a magic, the "pool" property which is not defined in the "House" table gets stored in the "House" table. What is more, you can actually query on this "pool" property:

newsql.find('House', {pool: true}, function(err, list) {
    // list should contain houses with swimming pool
});

So newsql not just help you accommodate "undefined" properties, it makes both defined and not-defined properties look the same and behave the same.

"Promote" A Property

Even though you can accommodate any properties in a table, you should still caution against the possible performance hits caused by undefined properties.

Undefined properties are serialized into a JSON object string and stored in a specific column. Querying or updating those undefined properties will require newsql to parse the JSON string and then perform the needed operations. So it's imaginable that undefined properties will cost more CPU cycles to process. If an undefined property exists in a good portion of the whole data instances (say 40%), you may consider to "promote" that property as a "defined" table column. By doing so, you'll get an instant performance boost.

To "promote" a perperty to become a table column, you can use the indexProperty() function. The function takes four parameters:

  • tableName: name of the table.

  • propName: name of the property. It will also become the name of a table column.

  • propType: this parameter defines the data type of the property. The propType parameter is a JSON object with the following three properties:

    • type: data type of a property. Possible values are 'boolean', 'integer', 'number', and 'string'. This property must be specified or error will occur.
    • format: provides additional information about the data type of a property. If the data type is 'integer', format can be 'int8', 'int16', or 'int64' and those will be mapped to 'tinyint', 'smallint' and 'bigint' respectively. If the data type is 'number', format can be 'double', 'float' or 'decimal(n,s)'. For 'string' data type, format can be 'text'.
    • maxLength: if data type is 'string', this property can be used to specify the maximum length of a string property. For example, {type: 'string', maxLength: 32} means it's a string property with length no longer than 32. That's actually what we call varchar(32) in SQL.

When you call the indexProperty() function, you do not just define a table column. newsql will also migrate property values serialized in JSON strings to the table column. So if your table grows big, it may take a while to perform the indexProperty() function.

Since you can promote a property, you can also down grade a property by calling the removeIndex() function. That is you can drop a table column and move the data back to be part of the serialized JSON. It's different from dropping a table column. When you drop a table column, you lose its data. When you call removeIndex(), you move the data stored in a table column to be part of the serialzied JSON. The data is still availble, and you can use newsql to query them.

Table Join

Table join is arguably the Archilles heel of NoSQL databases. With newsql, joining tables is simple and is not very different from what you do with relational databases. However, because you may involve "undefined" properties in your newsql commands or actions, there are a few things to note.

  1. You cannot put "undefined" properties in the conditional expression of the "ON" clause. For example,

     FROM House AS hs JOIN Schools AS sch
     ON hs.zip = sch.zip
    

    is not allowed if "zip" is not a defined property of "House" or "Schools".

  2. Because every property name could be a legal one under newsql, you'll have to prefix columns with their table name. Any column without a table prefix will be treated as properties of the main table. This semantics is different from SQL. In SQL, column names are not required to be prefixed with their table name as long as there are no ambiguities.

Conclusions

Even though newsql allows developers to save undefined properties into a table, storing data to relational databases without any planning or schema is not encouraged. Developers should define a set of "core" properties for a table and maintain it as necessary. The better use case of newsql is to deal with some rarely used properties. They do exist and you have to store them, but you don't want to pollute your table schema with those rarely used properties. With newsql, it's possible to save all data properties of every data instance while still keep table schema clean.

newsql also gives developers more flexibilities in dealing with data. You can promote and down grade data properties when you're more clear about what your data look like.

newsql tries to make both defined and not-defined data properties look the same and behave the same. With that, newsql can provide a clean, simple and flexible programming model to developers. To meet the challenges of today's web applications, newsql can help developers to get an upper hand in dealing with the ever growing data.