Writing a SQL parser

Well… not exactly writing a parser but generating one.
For the task I’ve been working on recently I needed to have (inside my java code) a data model representing database tables, so I could perform data normalization according to it.

“CREATE TABLE” sql statement is a natural fit for representing the data model. The only thing I had to decide on is parser. How do I parse SQL? Can I use something off-the-shelf? I asked myself and it turns out yes I can!
I took a look at Javacc and then at Antlr. Antlr seemed more, I would say, mature, more documentation, tutorials, sample grammars, etc.

Antlr (ANother Tool for Language Recognition)

is a language tool that provides a framework for constructing recognizers, compilers, and translators from grammatical descriptions containing actions in a variety of target languages.

BTW, the data I was going to parse looked like this:

CREATE TABLE table_one (
    customer_number integer NOT NULL,
    address character varying(30)
);

CREATE TABLE table_two (
    id integer NOT NULL,
    city character varying(50)
);

I wrote following grammar for it:

grammar CreateTable;

/*------------------------------------------------------------------
 * PARSER RULES
 *------------------------------------------------------------------*/

table_list :
    (table_def)*;

table_def :
    'create' 'table' table_name table_element_list SEMICOLON;

table_name :
    (schema DOT)? table;

table_element_list :
    LEFT_PAREN table_element (COMMA table_element)* RIGHT_PAREN;

table_element :
    column_name data_type_def (column_constraint)?;

data_type_def :
    data_type lenght_constraint?;

schema : ID;

table : ID;

column_name :ID;

data_type : ID:

length_constraint :
    LEFT_PAREN NUMBER RIGHT_PAREN;

/*------------------------------------------------------------------
 * LEXER RULES
 *------------------------------------------------------------------*/

LEFT_PAREN : '(';
RIGHT_PAREN : ')';
COMMA : ',';
SEMICOLON : ';';
DOT	:	 '.';
NUMBER 	:	(DIGIT)+;	
ID  : (('a'..'z'|'A'..'Z' | '_') ((DIGIT)*))+ ;
NEWLINE:'\r'? '\n' ;
WS : ( '\t' | ' ' | '\r' | '\n' | '\u000C' )+ 	{ $channel = HIDDEN; } ;
fragment DIGIT :   '0'..'9' ;

This is all fine but I want my parser to return data right away like:

List<Table> tables = parser.table_list();
List<Column> cols = table.getColumns();
Column col = table.getColumn("id");
col.getName();
col.getType();
col.getConstriant();
col.getLength();

So we have to made a few changes to our parser rules in the grammar.

grammar CreateTable;

@header {
import java.util.ArrayList;
import java.util.List;

}

table_list returns [List tables] 
	: {$tables = new ArrayList();}	(table_def { $tables.add($table_def.tbl); } )*  ;
	

table_def returns [Table tbl]
	: 'create' 'table' table_name 
		table_element_list {
			$tbl = new Table(String.valueOf($table_name.text), $table_element_list.cols);
		}
		SEMICOLON ;

table_name 
	:	(schema DOT)? table  ;

table_element_list returns [List cols]
	: {$cols = new ArrayList();} LEFT_PAREN te=table_element {$cols.add($te.col);} (COMMA te=table_element {$cols.add($te.col);})*   RIGHT_PAREN ;

table_element returns [Column col]
	:  column_name data_type_def (column_constraint)? {$col = new Column($column_name.cn, $data_type_def.tpe, $column_constraint.text, $data_type_def.len);};

column_name returns [String cn]
	:	ID {$cn = String.valueOf($ID.text); };

column_constraint
	:	'not' 'null';

data_type_def returns [String tpe, Integer len]
	: data_type {$tpe = $data_type.tp;} length_constraint? {$len = $length_constraint.len;} 	 
	;
	
data_type returns [String tp]
	: 
	((('character' 'varying')  | 'varchar') {$tp = "varchar";}
		| ('bit' 'varying' | 'varbit') {$tp = "varbit";}
		| ('double' 'precision' | 'float8') {$tp = "float8"; }
		| ('character' | 'char') {$tp = "char";}
		| ('integer' | 'int' | 'int4') {$tp = "integer";}
		| ID {$tp = $ID.text;}
		);
	

length_constraint returns [Integer len] 
	: LEFT_PAREN NUMBER RIGHT_PAREN {$len = Integer.valueOf($NUMBER.text);} 	;
	
schema : ID;
table :	ID;

/*------------------------------------------------------------------
 * LEXER RULES
 *------------------------------------------------------------------*/

LEFT_PAREN : '(';
RIGHT_PAREN : ')';
COMMA : ',';
SEMICOLON : ';';
DOT	:	 '.';
NUMBER 	:	(DIGIT)+;	
ID  : (('a'..'z'|'A'..'Z' | '_') ((DIGIT)*))+ ;
NEWLINE:'\r'? '\n' ;
WS : ( '\t' | ' ' | '\r' | '\n' | '\u000C' )+ 	{ $channel = HIDDEN; } ;
fragment DIGIT :   '0'..'9' ;

That’s basically it. I have a parser that I wanted and I think it took me less than I would have spent writing my own.
There are many sample grammars. Have a look maybe there is existing one for the language you’re about to parse. :)

P.S. The exact “CREATE TABLE” SQL statement is slightly more complex that I assume in the grammar.

Emacs tip: bulk operations on buffers

When I use Emacs for my work, I usually have dozens of buffers open: every file I edited, plus different consoles, process outputs etc. Emacs allows to handle this volume easily because it doesn’t have stupid tabs which I need to scroll and click, but instead every buffer is accessible by entering a couple of chars from its name.

The main problem I encountered is that when I’m working simultaneously on several versions of the same project, I’m starting to have many similarly named buffers. I already set up uniquify plugin to have file names prepended with directory names, but the risk of changing a file from other version of project is still high.

The obvious solution to get rid of unneeded buffers is to close Emacs and open again, but for Emacs this is analogous to OS reboot—ideally, you don’t need this. The better solution is to close all files that were opened from a specific directory. What is the easiest way to do this? As I’ve learned from Emacs wiki, the Ibuffer plugin offers a great interface for such operations.

So, how I delete all buffers that has files opened from a particular location? First, I’m opening Ibuffer:

M-x ibuffer-list-buffers

A fancy colored table of buffers is shown. Now I’m marking buffers having a filename to satisfy a specified regular expression criteria: % f and then pressing D to delete them. That’s all!

dbext.vim cheat sheet

I’ve created a cheat sheet for dbext Vim plugin. Hope somebody may find it useful.
About dbext:

While editing your SQL (and without leaving Vim) you can execute database commands, run queries, display results, and view database objects. dbext understands various programming languages, and can parse and prompt the user for [host] variables and execute the resulting statement. See below for more details.

dbext-screenshot3
Download.

Today I had this nice idea…

I wanted to implement some simple persistence support for my objects that probably contain at most 5-7 fields and the total number of objects would be at most 20-30. I didn’t want to put database and ORM in place for such a small task.

So I thought I’m going to develop something like:

#persons.properties
name = John
lastName = Smith
age = 28

Where I can:

PropertiesX properties = new PropertiesX().load("persons.properties");
Person person = properties.mapTo(Person.class);

My Person class:

@PropertyObject
public class Person {
	private String name;
	private String lastName;
	private int age;
	
	@PropertyMethod
	public void setName(@Property(key="name") String name) {
		this.name = name;
	}
	......
}

Or if I want to get all objects I would do something like:

#persons.properties

# all person's ids
persons = 0, 1, 2

person.0.name = James
person.0.lastName = Smith
person.0.age = 28

person.1.name = John
person.1.lastName = Smith
person.1.age = 23

person.2.name = Joe
person.2.lastName = Doe
person.2.age = 34

And java code would look like

// where second argument is the property prefix and the third is the array of person ids from the property file.
Person[] persons = properties.mapToArray(Person.class, "person", "persons");

//or I could do something like the following where ids is arbitrary list of ids :
Person[] persons = properties.mapToArray(Person.class, "person", ids);

Then I realized that property file with 30 objects might look a kind of ugly and googled for “Java JSON”.
And then I lost fun I was going to have for today. I found this – a simple library to serialize objects to Xml or JSON and back again.

Fun with parentheses

Oh… I almost forgot what fun it is to learn functional programming. During one of the semesters in university, we took “Functional programming” course. It was a new world of something that I hadn’t seen before, so unusual, so charming.

That new world of lists and operations on them was something “weird” because it wasn’t similar to what I had known before. After that semester in the university I almost haven’t touched Lisp nor any other function language until recently.

The subject of functional programming appeared few times on my horizon a couple of weeks ago. I decided to recap what I knew almost ten years ago and learn something new.

If you want to quickly grasp what functional programming is I would recommend “The little Schemer”.
This book is only about two hundred pages which is good and has lots’ of blank space. :) However it is very valuable book, written in an interesting way of questions on the left side of a page and answers on the right. The book is full of small tasks that you solve as you read and if you can’t don’t worry there is an answer on the next page. That’s a fun book indeed.

 

For those who has read “The little Schemer” there are “The Seasoned Schemer” and “The Reasoned Schemer” which I haven’t done yet. These books give more insight into the language and cover more advanced topics. They are written in the same manner of series of questions and answers which I have much fun with solving as I’m reading first book. I’m sure I’m going to get these books soon.

 

The last one that I’d like to mention is a book that available online, and I would say, the most robust book about functional programming and Scheme, I know of is “Structure and Interpretation of Computer Programs”.

It teaches not so Scheme language but programming itself using Scheme as a language to express “pure” ideas.

“The Structure and Interpretation of Computer Programs” is the entry-level subject in computer science at the Massachusetts Institute of Technology. It is required of all students at MIT who major in electrical engineering or in computer science, as one-fourth of the “common core curriculum,” which also includes two subjects on circuits and linear systems and a subject on the design of digital systems. We have been involved in the development of this subject since 1978, and we have taught this material in its present form since the fall of 1980 to between 600 and 700 students each year. Most of these students have had little or no prior formal training in computation, although many have played with computers a bit and a few have had extensive programming or hardware-design experience.

If you already know a couple of object oriented programming languages and would like to learn something new I would suggest to learn one of Lisp’s dialects. It’s mind opening experience and it’s fun. :)

Thanks to Mikhail for pointing out those books to me.

Movies about software developers and entrepreneurs, Pt. 2

Preface

Watching how the current economy crises brings the NASDAQ down (it is fluctuating around 1550 as I write these words), the Dot-com bubble crises comes to mind. We have a good opportunity to look back on these times by watching movies about dot-com start-ups. In my bucket I’ve got these: the recently released fiction movie August, and two dot-com-times documentaries: e-Dreams and Startup.com.

August

Fiction pseudo-documentary movies always have a drawback of over-emphasizing presentation side and blowing out facts. August is no exception. Here we have glamorous interiors of a Landshark “web-something” company packed with funny-looking Macs of 2001, over-emotional actors’ play, and lack of understanding what the heck the Landshark’s business is really about. 

Josh Harnett playing Tom Sterling, the Landshark’s co-founder, tries hard to perform like Tom Cruise playing Steve Jobs, but he really lacks charisma and passion that have both of them. The only rousing moment for me was Tom’s speech on “eSymposium” event where he says the following: “Are we making the world a less sucky place or more sucky? How do we every day impacting the suckage?” By the way, a good question for everybody in the IT industry to ask themselves.

Two things that are interesting in this movie besides the speech. First: it emphasizes the difference in a behavior of founders and employees in company’s hard times. Founders are struggling, employees are thinking about an escape. Second: relatives and family comprise an essential part of entrepreneurs and developers life. This point is often missing in blogs, which speak more about technologies and “cool stuff”.

e-Dreams

This is a documentary, so here the company’s business model is explained clearly. Kozmo.com performs a 1-hour delivery of books, CDs, etc. after a purchase in an Internet-shop. The coolness of the idea flows down from founders onto all employees and drives them up. This helps the company to gain a momentum and spin up.

As a programmer, I found really boring that the whole movie is about money. The only justification for this is that Kozmo.com really raised lots of money — more than $250 million, and it’s even without an IPO!

Also, the movie lacks material about families or private life of co-founders. We only see the mother of Joseph Park (co-founder) telling how proud she is of her son. But the theme of founders vs. employees is traced in e-Dreams maybe even better than in August. When the company perceives problems with cash, interviews with delivery bikers are interleaved with shootings of top-management meeting. This a really embarrassing moment.

Startup.com

Had I to advice a single movie to watch, I would choose this one. Startup.com achieves a good balance between being spectacular and delivering facts. Telling the story of govWorks.com startup, it covers all-important problems like searching for investors (as it is observed by one of participants, “VC isn’t a synonym for charity”), work vs. life disbalance in founders’ life, competitors activities, product release, etc.

To some extent, Startup.com movie is a real-world, “proper” version of August. What lacks coverage in the movie is the theme of employees. There’s almost no participation of them on the scene. E.g. close to the govWorks.com collapse, we only see short notes about company’s employees headcount, and they go down from 233 to 50 almost in a blink of an eye.

Conclusion

Personally, I enjoyed watching these movies. If you like startup success stories told by Paul Graham and Joel Spolsky, you can also be interested in watching complementary videos about startup failures.

Don’t let Xml be your API

It’s interesting to see how many hours java developers spend these days writing Xmls. I can hardly imagine writing java server-side applications without a single xml document nowadays. All those configuration files, deployment descriptors, schema files, mappings, etc. Has anybody tried to count those hours?
The things are getting worse when you work within SOA. The number of xml-based specifications and standards to deal with is just tremendous. When too much goes to Xml you start loosing some neat features that you already might be accustomed to like autocomplete/codesense or re-factorings in your favorite IDE. Some IDEs solve that problem providing support for widely used frameworks and tools like Spring.
But things are getting scattered all over the place, now you have to look into a bunch of files to figure out what the code does and what the all dependencies are. It’s not only Java anymore.

For the last few years I’ve been mostly working with SOA, writing back-end logic and providing SOAP interface to it. The most common approach these days to write web-services, I think, is using frameworks like Axis. It allows you to develop services from top-down or bottom-up. First one implies that you have a WSDL document for which you write/generate your skeletons classes and then wire them with the business logic. Second one goes from your classes to the WSDL. The framework hides all the web-services interaction and communication details. And you basically operate in RPC-style model.

In my two previous projects we decided to go another way. We chose document-oriented type of interaction between our web-services where the document in SOAP represents a complete unit of information and may be completely self-describing.
We’ve been using Soatoolkits (Alex: this is a plug for you) which is really great tool for rapid development that supports JSR 181, XPath 2.0 and provides some other neat features.

I can turn my java class into a web-service very quickly:

@WebMethod
public void recover(@XmlParam(xpath = "//Email/text()") final String email,
         @XmlParam(xpath = "//Answer/text()") final String securityQuestionAnswer,
         @XmlParam(xpath = BODY_XP) final Xml xmlBody) throws Fault {
    if (xmlBody != null) {
        XmlValidator.validateSenderInput(xmlBody, GET_SEC_QUESTION_VALID_SCHM);

All three parameters in the method signature get their values assigned in runtime when the message comes in to the service.
The third param xmlBody is an Xml class instance that contains following xml:

<soap-env:Body>
 <Email>user@gmail.com</Email>
 <Answer>yellow</Answer>
</soap-env:Body> 

If you want that method to call directly from API you certainly can do that just passing null for xmlBody as it is used in that method only for incoming SOAP validation according to our Schematron rules.

But… here things can go wrong. Some people started using it like:

@WebMethod
public String getWhatImLookingFor(@XmlParam(xpath = BODY_XP) Xml xmlBody) throws Fault {
    try {
        return whatImLookingForManger.getTheThing(xmlBody);

Thus an xml in SOAP body goes all the way down to the business logic where it is parsed and required values are extracted from the xml.
That is totally wrong to me. Now your business object’s interface is an xml document that I have no idea about. What structure of that xml is, what elements are required, what types of those fields are and so on and so forth until I go deeply inside that object and try to figure out based on Xpath statements what the xml looks like. And I’m not talking about testing and how odd it is when you have to create that xml in first place and then pass it to the method that needs just two values from it. You loose to much and gain too little passing an xml around in your classes.

So what’s the solution?

Try to make your services the only one place that deals with Xml directly. Parse messages in the services, get parameters you need in native java data-types or map Xml directly to an abstract data structure and pass it to your internal business logic. Now you have clear API, you have type safety, basically you have created a contract. Another benefit is that your code is testable. When in a couple of months you’ll get back to your or someone else’s code you can see the contract clearly, all the parameters and types. Hopefully, you won’t be pulling hairs out of your head trying to figure out what xml document is expected in the methods and classes.