Saturday, August 22, 2009

What is QEP?

In the context of database programming, QEP is an acronym for Query Execution Plan.

The database server analyzes every SQL query and plans how to use indexes and order tables to produce the result in the most efficient way.

You can get a report of the QEP for a SELECT query using the EXPLAIN command in MySQL. This is an important tool to analyze your SQL queries and detect inefficiencies.
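
Here's a minimal sketch of what reading a plan looks like. It is not MySQL: it uses Python's built-in sqlite3 module and SQLite's EXPLAIN QUERY PLAN as a stand-in, only because that runs without a server, and the report format differs from what MySQL's EXPLAIN prints. The table `users`, the index `idx_users_email`, and the helper `query_plan` are made-up names for illustration.

```python
import sqlite3

def query_plan(conn, sql, params=()):
    """Return the human-readable detail strings of SQLite's plan for a query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # Each row is (id, parent, notused, detail); the detail text is the last column.
    return [row[-1] for row in rows]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT)")

sql = "SELECT * FROM users WHERE email = ?"

# Without an index on email, the plan reports a scan of the whole table.
plan_before = query_plan(conn, sql, ("a@example.com",))

# After adding an index, the plan for the same query names that index.
conn.execute("CREATE INDEX idx_users_email ON users (email)")
plan_after = query_plan(conn, sql, ("a@example.com",))

print(plan_before)
print(plan_after)
```

Comparing the plan before and after a schema change is the basic workflow: the query text stays the same, and the report tells you whether the server found a cheaper way to run it.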

Consultant Ronald Bradford uses the acronym QEP freely, but this term is not ubiquitous and doesn't appear in the MySQL documentation. Googling for QEP yields some confusing results:

Remember: if you are using an uncommon acronym in your writing, define it when you use it. Even if you have used the acronym in past blog posts, define it again in a new blog post. Or at least link to the original post where you defined it.

Thursday, June 18, 2009

Free Software vs. Gratis Software

A lot of folks are unclear on the subtleties of free software and open source. Mike Hogan writes a blog article, "Is Hybrid Licensing of OSS Hypocrisy?", to try to shed some light on this. With respect, I think he has missed part of it.

We're talking about two orthogonal things here. One is open-source versus closed-source, and the other is whether we charge money for software licenses or not. As Mike points out, the former is about development methodology. The latter is about business model.

There ought to be nothing wrong with charging money for open-source software. In fact, there isn't -- even the GPL permits this. This is related to the origin of open-source in the Free Software movement. "Free software" to them is about what the user is licensed to do with that software, not whether they paid anything to license it. In fact, some companies package free software and charge for it (presumably with some added value).

The marketing interpretation of "free software" is the "free chips & salsa" concept Mike mentions. It's a way of encouraging the market to adopt this product. It's a loss leader, usually followed by upselling the customer with other products or services.

There's also a case for no-charge versions of closed-source software. Typically these have either limited features ("crippleware") or they expire after a limited time (e.g. "demo").

Few companies have found a way to use open source methods to develop full products they then charge money for. But this could simply be because it's hard to drive adoption of any software product, regardless of whether it's open-source or closed-source.

On the issue of hybrid licensing, I see this as no hypocrisy; I see it as more freedom. If I develop some code, offer it under the GPL license, and you use my code as part of your project, then you are obligated to license your project with a GPL-compatible license. This is termed the "viral" nature of the GPL license, and it's clearly intended to promote the free software movement.

What if you don't want to adopt a GPL-compatible license for your project? Well, no one is forcing you to use my code. But my code is really amazingly good, and you want it. You want it so much that you're willing to give me money if I grant you a license to use it under different terms. If I'm willing to do that, now you have more freedom -- you can choose to contribute your own code to the body of free software in the world, or you can choose not to. But the latter choice may have a different price tag associated with it.

This should still promote the principles of the free software movement. It would be wrong to charge someone for their freedom. But in the hybrid license model, you can avoid paying for a license simply by joining the movement, by spreading the freedom. If you want to stick to your closed-source model, you can pay for the privilege of using my code in that way.

I don't see any hypocrisy in a software maker using a hybrid licensing model, as long as they are consistent and honest about it.

Thursday, June 04, 2009

I'm Speaking on SQL at OSCON

OSCON 2009

Early Registration has been extended to June 23. Save up to $250!

Enter my friends-of-speaker discount code "os09fos" when you register, and save an additional 20%! Just because you read my blog.

Practical Object-Oriented Models in SQL

Wednesday July 22, 5:20pm.

SQL is from Mars, Objects are from Venus.

This talk is for software developers who know SQL but are stuck trying to implement common object-oriented structures in an SQL database. Mimicking polymorphism, extensibility, and hierarchical data in the relational database paradigm can be confusing and awkward, but they don't have to be.
  • Polymorphism: Suppose your blog supports comments, but then your comments need to reference multiple types of content, for example news, blog articles, and videos. What then?
  • Extensibility: We’ve all designed customizable software, allowing customers to extend a data model with new data attributes. See how to design flexible systems, while using efficient SQL queries.
  • Hierarchies: Tree-structured data relationships are common, but working with trees in SQL usually implies recursive queries. There are a few solutions to solve this more cleanly.
  • ActiveRecord Dos and Don'ts: Web development frameworks have popularized the use of design patterns, but when it comes to multi-table queries, complex views, and assignment of OO responsibilities, ActiveRecord falls short as a one-size-fits-all Domain Model.

BoF: Meet Authors from Pragmatic Bookshelf

Wednesday July 22, 7:00pm

Gather with published and upcoming authors of programming books from the industry favorite publisher, Pragmatic Bookshelf. Join this informal chat about programming, writing books, job hunting, and career development.

  • Author introductions, books, OSCON presentations.
  • Experiences working with a publisher.
  • How does authoring a book aid a tech career?
  • What tech books would you like to see?

Pragmatic Bookshelf authors attending OSCON include:

  • Ian Dees is presenting “Testing iPhone Apps with Ruby and Cucumber” at OSCON (Wednesday 10:45am). Ian authored the book “Scripted GUI Testing with Ruby.”
  • Bill Karwin is presenting “Practical Object-Oriented Models in SQL” at OSCON (Wednesday 5:20pm). Bill is currently writing a book “SQL Antipatterns.”
  • Other Prag authors are attending OSCON, and plan to be at this BoF.

Thursday, May 21, 2009


The photo above illustrates (by counter-example) an important characteristic of a normalized database: each logical "type" of attribute belongs in a separate column.

Just because three values happen to be numeric doesn't mean it makes sense to SUM() them together. But if dissimilar attributes are stored in the same column, it's tempting to treat them as compatible in this way.

This also shows a fallacy of the Entity-Attribute-Value antipattern. In this design, all attribute values are stored in a single column.

CREATE TABLE EntityAttributeValue (
entity        VARCHAR(20) NOT NULL,
attribute     VARCHAR(20) NOT NULL,
value         VARCHAR(1000) NOT NULL,
PRIMARY KEY (entity, attribute)
);

INSERT INTO EntityAttributeValue (entity, attribute, value)
VALUES
('New Cuyama', 'Population',          '562'),
('New Cuyama', 'Ft. above sea level', '2150'),
('New Cuyama', 'Established',         '1951');

SELECT SUM(value) FROM EntityAttributeValue
WHERE entity = 'New Cuyama';

The Entity-Attribute-Value design does not support or conform to rules of database normalization.

To be clear, the proper way to design a database is to put different attributes in different columns. Use column names, not strings, to identify the attributes.

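CREATE TABLE Cities (
 city_id          SERIAL PRIMARY KEY,
 city_name        VARCHAR(100) NOT NULL,
 population       INT UNSIGNED NOT NULL,
 feet_altitude    SMALLINT UNSIGNED NOT NULL,
 year_established SMALLINT UNSIGNED NOT NULL );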
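
To see the difference in practice, here's a rough sketch that builds both designs side by side. It uses Python's sqlite3 module rather than MySQL so that it runs anywhere, and the column types are simplified accordingly. The table name `Cities` and the columns `feet_altitude` and `year_established` are assumptions chosen for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Entity-Attribute-Value design: every value shares one column.
conn.execute("""
    CREATE TABLE EntityAttributeValue (
        entity    VARCHAR(20) NOT NULL,
        attribute VARCHAR(20) NOT NULL,
        value     VARCHAR(1000) NOT NULL,
        PRIMARY KEY (entity, attribute)
    )""")
conn.executemany(
    "INSERT INTO EntityAttributeValue (entity, attribute, value) VALUES (?, ?, ?)",
    [("New Cuyama", "Population", "562"),
     ("New Cuyama", "Ft. above sea level", "2150"),
     ("New Cuyama", "Established", "1951")])

# Nothing stops us from adding people, feet, and a year together.
eav_total = conn.execute(
    "SELECT SUM(value) FROM EntityAttributeValue WHERE entity = ?",
    ("New Cuyama",)).fetchone()[0]

# One column per attribute: each value keeps its own name and type.
conn.execute("""
    CREATE TABLE Cities (
        city_id          INTEGER PRIMARY KEY,
        city_name        TEXT NOT NULL,
        population       INTEGER NOT NULL,
        feet_altitude    INTEGER NOT NULL,
        year_established INTEGER NOT NULL
    )""")
conn.execute(
    "INSERT INTO Cities (city_name, population, feet_altitude, year_established) "
    "VALUES (?, ?, ?, ?)", ("New Cuyama", 562, 2150, 1951))
population = conn.execute(
    "SELECT population FROM Cities WHERE city_name = ?",
    ("New Cuyama",)).fetchone()[0]

print(eav_total, population)
```

The first query happily returns a total, but the number means nothing. In the second design you would have to write `population + feet_altitude + year_established` on purpose to make the same mistake.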

Monday, April 20, 2009

SQL Antipatterns Strike Back! Slides

I presented my tutorial at the MySQL Conference & Expo today. I had fun preparing it and presenting it, and I got many good questions and comments from the audience. Thanks to everyone for coming and participating!

I have uploaded my slides with a Creative Commons 3.0 license to my SlideShare account:

For those who did not get to see my tutorial, I'm presenting some selections from it during a 45-minute session at the MySQL Camp on Wednesday at 2:00pm, under the title "Practical Object-Oriented Models in SQL."

See you next year!

Oracle buying Sun

Stunning news today that Oracle has offered to buy Sun Microsystems. This is sending the MySQL community reeling, as they begin their MySQL Conference & Expo today. Everyone's talking about whether this change is good for MySQL.

My first thought is: it's not over till it's over. These deals have a way of falling through at the last minute. Just look at Microsoft's attempts to acquire Yahoo!. I'm not saying that it will fall through in this case. Just don't count it as a done deal until the agreements are signed, and the shareholders and the SEC have their say.

My second thought is: it depends on how well the two companies can integrate. In any acquisition, there's a merger not only of assets but of goals, strategies, and corporate culture. Not to mention people. People matter.

I worked for InterBase during part of the 1990s. InterBase was an RDBMS that was acquired by Borland in 1991, as part of their acquisition of Ashton-Tate. Borland was very interested in Ashton-Tate for its dBase product, but Borland also got InterBase in the deal (InterBase had been acquired only a couple of months before Ashton-Tate's merger with Borland). InterBase wasn't in Borland's strategy and it wasn't what they valued as part of the acquisition. As a result, it was an unwanted step-child for over ten years (despite having revenue matching Borland C++Builder).

What does this tell us about Oracle's plans for MySQL? Nothing for certain. My point is that it depends on what Oracle values as part of the acquisition. Is it Java? Is it the line of enterprise hardware? Is it ZFS or OpenSolaris or NetBeans or Glassfish? Any of these are likely candidates. But MySQL does not jump to the head of the list as the likeliest "jewel in the crown" that motivated Oracle to make this offer.

Tuesday, April 07, 2009

I hate IBM System i

In his blog, Vadim reports that a new pluggable storage engine for MySQL has appeared in the source tree, to support IBM DB2 for i as a back-end.

This reminds me that I hate the IBM System i platform (aka IBM Power Systems, aka iSeries, aka AS/400).

Don't get me wrong -- I'm sure it's terrific technology. I'm sure IBM supports many businesses with it and they're happy (although I do wonder why they need to keep re-branding the product line). But my fate is not aligned favorably with respect to System i.

At several companies I have worked for, the business development people struck an ill-conceived deal with IBM, to "support System i." Meetings were had. Agreements were signed. Commissions were paid.

Then it came time to do the work and fulfill the partner agreement. At my last job, my manager came to me and said, "by the way, in your spare time, make sure your work-in-progress supports the IBM platform."

I know nothing about the IBM System i. I have never seen one face-to-face. I have never seen any documentation for it. I enjoy command-line interfaces, but using the System i made MPE/iX seem friendly.

By the way, in spite of the "Universal" moniker, DB2 on the System i is, as far as I can tell, a completely different database implementation, with the brand name "DB2" tacked on as an afterthought.

Here are some suggestions for the System i business development folks at IBM: when you make a deal with small companies to support your platform, make sure they have enough machines to do development and testing. Include electronic documentation so everyone can have access to it. Perhaps even offer some training as part of the deal. And then ask your new partner for a project plan that details such things as:
  • Which products they promise to support on the System i.
  • When they promise to do the work and have the solution ready.
  • Who they will assign to do the work, not ask to do it in their spare time.

Sunday, March 29, 2009

Virtually Speaking

The word virtually is overused. In many cases using the word virtually simply means not. For example:

I have finished virtually all of my homework.

This new surgical procedure is virtually pain-free.

In Modern American Usage, Bryan A. Garner counts virtually as a weasel-word. Weasel-words are so named because of the habit of weasels to attack birds' nests, and eat their eggs by sucking the meat out, leaving an empty shell. Likewise, words such as virtually "have the effect of rendering uncertain or hollow the statements in which they appear."

So be careful using virtually, or other words that similarly diminish the words around them. Other weasel-words commonly used by writers today include significantly, obviously, very, and quite.

Wednesday, March 25, 2009

Hello EclipseCon 2009

No, I am not attending EclipseCon -- but my smiling face apparently was on Tuesday. StackOverflow founder and CodingHorror blogger Jeff Atwood emailed me to let me know he displayed my StackOverflow user profile page during his keynote at EclipseCon.

I don't know what the context was in which he showed my profile. Maybe he just needed an example of an SQL geek who has too much time on his hands.

I hope a video of the keynote will be made available. If I do find one, I'll link to it in this blog.

Thursday, March 19, 2009

Parrot Web Framework?

Wondering if the following idea could be feasible:
  • Architect a web framework that emphasizes Inversion of Control.
  • Implement core web framework in Parrot (now that this dynamic language platform has released its 1.0).
  • Voila! A web framework that supports any language implemented for Parrot platform.
  • Developers write plugins in any language: Python, Ruby, PHP, Perl6, Lua, C, or any other language supported on Parrot.
Although the Parrot platform is now 1.0, specific language implementations are still in various stages of development. Realistically, I would guess that it'll be some years before the architecture above is ready for production.

Wednesday, March 11, 2009

How do the Proxy, Decorator, Adaptor, and Bridge Patterns differ?

A user recently asked:
I was looking at the Proxy Pattern, and to me it seems an awful lot like the Decorator, Adaptor, and Bridge Patterns. Am I misunderstanding something? What's the difference? Why would I use the proxy pattern versus the others? How have you used them in the past in real world projects?
Proxy, Decorator, Adapter, and Bridge are all variations on "wrapping" a class. But their uses are different.
  • Proxy could be used when you want to lazy-instantiate an object, or hide the fact that you're calling a remote service, or control access to the object.
  • Decorator is also called "Smart Proxy." This is used when you want to add functionality to an object, but not by extending that object's type. This allows you to do so at runtime.
  • Adapter is used when you have an abstract interface, and you want to map that interface to another object which has a similar functional role, but a different interface.
  • Bridge is very similar to Adapter, but we call it Bridge when you define both the abstract interface and the underlying implementation. I.e. you're not adapting to some legacy or third-party code, you're the designer of all the code but you need to be able to swap out different implementations.
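
The four roles above can be sketched in a few lines each. This is only an illustration in Python, not a canonical implementation; every class name here (`Report`, `LazyReportProxy`, `UppercaseDecorator`, `LegacyPrinter`, `PrinterAdapter`, `Document`, and the two formatters) is invented for the example.

```python
class Report:
    """The real object; imagine it is expensive to construct."""
    def render(self):
        return "report"

class LazyReportProxy:
    """Proxy: same interface, but defers creating the real object."""
    def __init__(self):
        self._real = None
    def render(self):
        if self._real is None:
            self._real = Report()  # lazy instantiation on first use
        return self._real.render()

class UppercaseDecorator:
    """Decorator: wraps any renderer at runtime and adds behavior."""
    def __init__(self, inner):
        self._inner = inner
    def render(self):
        return self._inner.render().upper()

class LegacyPrinter:
    """Stands in for third-party code with a different interface."""
    def print_document(self):
        return "legacy output"

class PrinterAdapter:
    """Adapter: maps the render() interface onto LegacyPrinter."""
    def __init__(self, printer):
        self._printer = printer
    def render(self):
        return self._printer.print_document()

# Bridge: we design both sides, and the abstraction (Document)
# holds an implementation (a formatter) that can be swapped out.
class PlainFormat:
    def format(self, text):
        return text

class BracketFormat:
    def format(self, text):
        return "[" + text + "]"

class Document:
    def __init__(self, text, formatter):
        self._text = text
        self._formatter = formatter
    def render(self):
        return self._formatter.format(self._text)

outputs = [
    UppercaseDecorator(LazyReportProxy()).render(),
    PrinterAdapter(LegacyPrinter()).render(),
    Document("memo", BracketFormat()).render(),
]
print(outputs)
```

Note that all four wrappers expose the same `render()` method. The structure is nearly identical in each case; what differs is the intent, which is why the patterns are so easy to confuse.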

I'm posting to my blog the questions I've answered on StackOverflow, which earned the "Good Answer" badge. This was my answer to "How do the Proxy, Decorator, Adaptor, and Bridge Patterns differ?"

Tuesday, March 10, 2009

Can I Use Example Code from Internet Q&A Sites?

A user recently asked:

  • A developer is working on a project and encounters a problem.
  • They ask a question on the internet somewhere (ie
  • Someone answers their question and provides a nice code snippet that just about does what they want.
Where does one legally stand if the developer includes the code verbatim in their project's code?

I know I've done this before...and I'm sure others have too...but I'd really like to know what the legal or ethical answer is to this question.

Note: never make business decisions based on legal advice you read on the internet -- including mine! Confirm this with a qualified legal professional.

StackOverflow seems to offer its content under the Creative Commons Attribution-Share Alike 2.5 license. See the logo and link at the bottom of the page.

This means you are free to copy, distribute, and transmit any content you see here, or to remix or adapt the content.

However, you must attribute the work in the manner specified by the author (though I can't find where this is specified on StackOverflow), and if you alter or build upon the content, you may distribute it only under a compatible license. So it's similar to LGPL, in that respect.

I would recommend that you do not copy code or other content from StackOverflow verbatim, if you need to use it in a commercial project.

  • Should one use the example code and move on?

    No. You need to be aware of the license and comply with it, or in theory you could get your project into trouble. Realistically, this is very unlikely. But it is still possible to cause problems.

  • Should one use the example code and provide a comment referencing its origin?

    Yes, attribution is required by the Creative Commons license used by StackOverflow.

  • Should one inform the provider of the example code that they've used their code?

    You are not obligated to do so, so it would be a courtesy and that's up to you.

  • Should one not use the example code at all and use the basic idea to create your own code?

    Yes, this would be a conservative and safe policy. Plus, it's always better to understand how the code works, which you would have to in order to write code with similar functionality.

  • Is it ok to use said example code in proprietary closed-source projects?

    No. The license would require you to make your project available under the same Creative Commons Share Alike (or compatible) license.

  • Is it ok to use said example code in proprietary but internal-use-only projects?

    Technically, yes, because like GPL & LGPL, the licensing requirement only activates when you distribute a derived work. But how can you be sure that code from the internal-use-only project will never be duplicated into a product that gets distributed? Do you plan to annotate code fragments within your internal projects? I would not recommend relying on this.

  • Are the legal implications currently undefined?

    No, they are spelled out in the Creative Commons license mentioned above.

I'm posting to my blog the questions I've answered on StackOverflow, which earned the "Good Answer" badge. This was my answer to "What legal issues can I run into if I use example code (say from stackoverflow) in my projects?"

Quantity Over Quality

Alex Netkachov recently reported a list of micro-optimizations for PHP. Several other bloggers (Sebastian, Maggie, Pádraic) responded with appropriate messages, reminding people that proper application design usually counts more than micro-optimizations.

They are all correct.

When I was an intern, I emailed a C compiler developer, to ask a question that had occurred to me regarding optimization: which is faster, ++i or i++? Assuming either form will work in my case, as in the increment expression in a for() loop. His response (paraphrased):
"By emailing me this question, you have already wasted more computing resources than you will ever save by choosing one form over the other during your entire programming career."
(I'm still not sure if he meant to emphasize that the performance difference between the two expressions was extremely small, or that he didn't think very highly of my career prospects. I'll prefer to assume the former.)

A list of performance factoids like those listed by Alex is missing the guidance and wisdom that software developers need to judge their importance. All of the responses from other bloggers are similarly qualitative, instead of quantitative.

I know that it's hard to make quantitative statements with regard to optimization.
  • How much benefit can I get by replacing print with echo? It depends on how much printing you do in a given application -- and also what else you're doing in that application.
  • Can I benefit from caching page output or results of resource-consuming functions? Probably, but not if the content is 100% dynamic and must be re-calculated for each request.
  • Which of these micro-optimizations should I employ with greater priority? Which is the best use of my development time?
These micro-optimization tips are interesting and worth knowing, but they should also be taken with a grain of salt. Their importance varies, depending on the nature of your application. There are no magic words that are guaranteed to double performance in every application.

Finding the best way to optimize your code is your job as a software developer. You must use scientific measurement, as well as good judgment, experience, and intuition to get your job done most effectively.
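
As one way to start measuring, here's a small sketch using Python's standard `timeit` module (the tips under discussion are about PHP, so treat this as an analogy rather than a PHP benchmark). The two functions being compared and the `measure` helper are hypothetical examples; the workload sizes are arbitrary assumptions.

```python
import timeit

def concat_with_join(n):
    """Build a string of n numbers with a single join."""
    return "".join(str(i) for i in range(n))

def concat_with_plus(n):
    """Build the same string by repeated concatenation."""
    s = ""
    for i in range(n):
        s += str(i)
    return s

def measure(func, n=500, repeat=3, number=100):
    """Return the best (lowest) time in seconds for `number` calls of func(n)."""
    return min(timeit.repeat(lambda: func(n), repeat=repeat, number=number))

timings = {f.__name__: measure(f) for f in (concat_with_join, concat_with_plus)}

# Print the candidates from fastest to slowest on this machine.
for name, seconds in sorted(timings.items(), key=lambda kv: kv[1]):
    print(f"{name}: {seconds:.4f}s")
```

The numbers will vary by machine and interpreter version, which is rather the point: measure in your own application before deciding that a micro-optimization is worth your development time.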

Thursday, March 05, 2009

Accepting a job that failed The Joel Test

A user recently asked:
I'm about to accept a job offer for a company that has failed The Joel Test with flying colors.

Now, my question is how do I improve the conditions there. I am positive that within a few months I will be able to make a difference.

But where do I start? And how?

Don't view yourself as the "new sheriff in town" who's here to clean it all up in one year. The habits they have settled into have been a long time forming.

Watch and listen, and ask questions about the most severe and recurring pain points. Find out what bad habits have actually caused loss of work, late nights, quality problems, or lost customers. Try to quantify the cost of these bad habits.

Then at some point talk to your boss in a one-on-one meeting and make a proposal for how you could mitigate one specific risk that seems to be their biggest problem. It could be almost anything on the Joel Test, but I'd guess it's most likely to be one of:

  • No source code control means the code is a mess, with lots of "commented out" sections. Can't track which code changes were made for a given bug. It's hard to do major features in parallel with ongoing maintenance. No way to roll back changes. No way to track which developer did what changes.

  • No build process means some code changes exist only on the live server. Developers are constantly pushing and pulling code to and from the live server. No one has a development environment that's in sync with the live code, so it's hard to reproduce bugs.

  • No bug database means some tasks "fall through the cracks" from time to time. Customers report bugs that fall into a black hole. Managers don't know what's being worked on. Employees have no record of their work when it comes time for annual reviews.

When presenting the solutions, don't try to justify them with abstract concepts like "best practices" or "it's the industry standard way" or anything so intangible. If those were enough to motivate this company, they would have done it by now.

Instead, focus on what is their deciding factor. I'd guess it's probably related to how much time and money it costs the business to use best practices, versus how much it can save them. But you should find out if this is really their reason. It'll take some setup work to establish these tools and practices, but you can explain the recurring benefits for quality, productivity, and predictability of the development work. All those can contribute to the business' bottom line.

In one year, you'll be doing extremely well if you can make just one change to help them. It'll take a lot of patience to overcome a development culture that has been building for so long. Keep in mind that the rest of the team isn't there by coincidence -- they may actually be compatible with that level of disorganization.

I'm posting to my blog the questions I've answered on StackOverflow, which earned the "Good Answer" badge. This was my answer to "Accepting a job that failed The Joel Test"

Wednesday, February 25, 2009

Unit Test Coverage

S.Lott writes in his blog about unit test code-coverage: how much is enough?

Effective tests should account not only for code paths, but also input values and other application state or external environment that may affect the behavior.

For example, it may be easy to get 100% code coverage from tests for a function like the following:
divide(x, y) { return x/y; }
But unless you test for division-by-zero (when the parameter y is zero), you haven't tested sufficiently.

The code-coverage metric doesn't reveal when you've tested a good variety of input values. It only tests if your tests have visited the given lines of code, not what values were in each variable at the time.
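
Here's a minimal sketch of that gap, written in Python instead of the pseudocode above. The test function names are made up for the example, and the tests are called directly so the snippet runs without any test framework.

```python
def divide(x, y):
    return x / y

def test_divide_typical():
    # This single call executes every line of divide(): 100% line coverage.
    assert divide(6, 3) == 2

def test_divide_by_zero():
    # Same lines, different input value. A line-coverage report looks
    # identical with or without this test.
    try:
        divide(1, 0)
    except ZeroDivisionError:
        return "raised"
    return "no error"

test_divide_typical()
zero_result = test_divide_by_zero()
print(zero_result)
```

The first test alone earns a perfect coverage score, yet only the second one tells you what the function does with the input most likely to break it.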

Likewise for other application state besides input parameters. Values in other application objects, the contents of databases or files, or the operating system environment can all affect the proper functioning of a class or function that you're testing. These variations are not measured by code-coverage metrics.

It could be argued that if you're testing for external state, you aren't doing unit testing by its strict definition; you're doing functional or system testing. Nevertheless, most people rely chiefly on unit testing tools, because automated unit testing tools that generate code-coverage metrics are pretty easy to use.

While it's a worthwhile goal to try to get high code-coverage in your unit-testing, a score of 100% doesn't guarantee that you've tested enough. Likewise, a score below 100% isn't necessarily an indication of inadequate testing. Code-coverage is therefore not a goal in itself; it's one way of measuring one type of testing.

Friday, February 06, 2009

How Do You Reward Good Clients?

A user recently asked:

I find when I get a 'good client' things go so much smoother on a project (there even seems to be less bugs - weird?). I have a habit of rewarding good behavior from anyone (even if its just a simple thank you).

I am interested to know what sort of things you guys do, and even how you feel about good client behavior.

It would be nice if "good clients" were simply "normal clients," and bad clients were those you avoid working for.

If you want to give them a more explicit and material reward, you could give them a "good customer" discount on the next project. My invoices specify a penalty for late payment, so I suppose you could also offer a small discount (2%) for extra-prompt payment.

Probably the best reward is to give them your best value for their dollar, and continue doing business with them. Work with them with equal respect, communication, and commitment to quality and value. Plus occasional free support, or advice on projects you're not actually working on for them, etc.

It goes both ways. They know when they've got a good contractor who delivers quality work on time and charges fairly. They want to continue the relationship when they get good results. So a good customer won't quibble about nickel-and-dime line items on invoices, pester you about delivery dates, or question your technology choices. They'll accept your rates as economical if you actually give them good work, instead of skipping to another consultant who charges less but does poor work. They'll also refer other clients your way (and probably not the ones they know will be annoying for you).

Really bad clients will get a bit more of a cold shoulder:

  • "I'd really like to bid on your new project but I'm swamped with other work this quarter."
  • "That'll be a rush order so I'm going to have to charge you a premium."
  • "Sorry, I was traveling for a few days and didn't get your RFP."

Unfortunately, bad clients are probably least able to read between the lines when they get this sort of message. It's much easier for them to believe that all contractors are difficult to work with, than to accept that it's they who are difficult.

I'm posting to my blog the questions I've answered on StackOverflow, which earned the "Good Answer" badge. This was my answer to "How do you reward your clients for good behavior?"

Wednesday, January 21, 2009

Splitting a String in Perl

A user recently asked:

How do I take a string in Perl and split it up into an array with entries two characters long each?

Ultimately I want to turn something like this

F53CBBA476

into an array containing

F5 3C BB A4 76

This was my answer:
@array = ( $string =~ m/../g );
The pattern-matching operator behaves in a special way in a list context in Perl. It processes the operation iteratively, matching the pattern against the remainder of the text after the previous match. Then the list is formed from all the text that matched during each application of the pattern-matching.
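
For comparison, here's a rough Python analog of the same idea, using `re.findall` from the standard library to collect successive non-overlapping matches. The function name `split_pairs` is made up, and the sample string is simply the expected output with the spaces removed.

```python
import re

def split_pairs(s):
    """Return successive non-overlapping two-character matches of s."""
    return re.findall(r"..", s)

chunks = split_pairs("F53CBBA476")
print(chunks)
```

As with the Perl version, a trailing odd character is silently dropped, because the pattern needs two characters to match.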

I'm posting to my blog the questions I've answered on StackOverflow, which earned the "Good Answer" badge. This was my answer to "How can I split a string into chunks of two characters each in Perl?"

Understanding Unfamiliar Databases

A user recently asked:

What kind of approaches and techniques can you employ to become familiar with an existing database if you are tasked with supporting and/or modifying it? How can you easily and effectively ramp up your knowledge of a database you have never seen before?
Here was my reply:

  • The first thing I do is create an Entity-Relationship Diagram (ERD). Sometimes you can simply describe the metadata with command-line tools but to save time there are some tools that can generate a diagram automatically.

  • Second, examine each table and column to make sure I learn the meaning of what it stores.

  • Third, examine each relationship and make sure I understand how the tables relate to one another.

  • Fourth, read any views or triggers to understand custom data integrity enforcement or cascading operations.

  • Fifth, read any stored procedures. Also read SQL access privileges if there are such.

  • Sixth, read through parts of the application code that use the database. That's where some additional business rules and data integrity rules are enforced.
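
The first few steps can be partly automated by asking the database to describe itself. Here's a sketch under some loud assumptions: it targets SQLite through Python's sqlite3 module (other databases expose metadata differently, for example through INFORMATION_SCHEMA), and the tables `Articles` and `Comments` and the helper `describe_schema` are invented for the example.

```python
import sqlite3

def describe_schema(conn):
    """Map each table name to its columns and outgoing foreign keys."""
    schema = {}
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # table_info rows: (cid, name, type, notnull, dflt_value, pk)
        columns = [(row[1], row[2])
                   for row in conn.execute(f'PRAGMA table_info("{table}")')]
        # foreign_key_list rows: (id, seq, table, from, to, ...)
        foreign_keys = [(row[3], row[2], row[4])
                        for row in conn.execute(f'PRAGMA foreign_key_list("{table}")')]
        schema[table] = {"columns": columns, "foreign_keys": foreign_keys}
    return schema

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Articles (article_id INTEGER PRIMARY KEY, title TEXT)")
conn.execute("""
    CREATE TABLE Comments (
        comment_id INTEGER PRIMARY KEY,
        article_id INTEGER REFERENCES Articles (article_id),
        body       TEXT
    )""")

schema = describe_schema(conn)
for table, info in schema.items():
    print(table, info["columns"])
    for from_col, ref_table, to_col in info["foreign_keys"]:
        print(f"  {table}.{from_col} -> {ref_table}.{to_col}")
```

The printed column list and foreign-key arrows are the raw material for an ERD. They won't tell you what the data means, though, which is why the later steps still involve reading code and asking people.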

I'm posting to my blog the questions I've answered on StackOverflow, which earned the "Good Answer" badge. This was my answer to "What are the Best Ways to Understand an Unfamiliar Database?"

Why Should You Use an ORM?

A user recently asked for good arguments in favor of using Object/Relational Mapping technology:

If you were to motivate [sic] the "pro's" of why you would use an ORM to management/client, what would the reasons be?

Try and keep one reason per answer so that we can see what gets voted up as the best reasons.

I offered four answers. The first three got the most votes, but my last answer got little interest.

  1. Speeding development. For example, eliminating repetitive code like mapping query result fields to object members and vice-versa.
  2. Making data access more abstract and portable. ORM implementation classes know how to write vendor-specific SQL, so you don't have to.
  3. Supporting OO encapsulation of business rules in your data access layer. You can write (and debug) business rules in your application language of preference, instead of clunky trigger and stored procedure languages.
  4. Generating boilerplate code for basic CRUD operations (Create, Read, Update, Delete). Some ORM frameworks can inspect database metadata directly, read metadata mapping files, or use declarative class properties.
There are lots of other reasons for and against using ORM frameworks. Generally, I'm not a fan of ORMs, because their benefits don't seem to make up for their complexity and tendency to perform slowly. Their chief value is in reducing the time taken in repetitive development tasks.
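
To make "repetitive development tasks" concrete, here's a sketch of the kind of row-to-object mapping an ORM takes off your hands, done by hand in a dozen lines. It is not any real ORM's API: the `Bug` class, the `Bugs` table, and the `fetch_all` helper are hypothetical, and it uses Python's sqlite3 module so that it runs standalone.

```python
import sqlite3
from dataclasses import dataclass, fields

@dataclass
class Bug:
    bug_id: int
    summary: str
    status: str

def fetch_all(conn, cls, sql, params=()):
    """Run a query and build one `cls` instance per row, matching columns by name."""
    cursor = conn.execute(sql, params)
    names = [d[0] for d in cursor.description]
    wanted = {f.name for f in fields(cls)}
    return [cls(**{n: v for n, v in zip(names, row) if n in wanted})
            for row in cursor]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Bugs (bug_id INTEGER PRIMARY KEY, summary TEXT, status TEXT)")
conn.execute("INSERT INTO Bugs (summary, status) VALUES ('crash on save', 'OPEN')")

bugs = fetch_all(
    conn, Bug,
    "SELECT bug_id, summary, status FROM Bugs WHERE status = ?", ("OPEN",))
print(bugs)
```

A small helper like this removes the field-by-field copying while leaving the SQL in plain view, which is one way to get the main time saving without adopting a full framework.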

Hibernate, for example, is about 800,000 lines of code (Java and XML), and it's complex enough that I doubt it's easier to learn or to use than SQL. Besides, there seem to be fundamental tasks, such as a simple JOIN, that are impossible to do through the entity interface. Please correct me if I'm wrong, but I've been searching tutorials and examples and I haven't found a way to fetch a joined result set from two entities without writing a custom query in HQL (Hibernate's abstract version of SQL).

I was also led to a blog by Glenn Block, titled "Ten advantages of an ORM (Object Relational Mapper)." I disagree with Block on several points. He cites some traits of ORMs as advantages where I see them as defects. He also cites features that are not specific to ORMs; they could be achieved with any type of data access library.

Update: Upon request, here are some specific comments on Glenn Block's list of advantages of an ORM:

1. Facilitates implementing the Domain Model pattern

Not necessarily. I can design Domain Model classes containing plain SQL as easily as I can design classes that operate on the database via an ORM layer. Keep in mind that ActiveRecord is not a Domain Model.

2. Huge reduction in code.

Depends. When executing simple CRUD operations against a single table, yes. When executing complex queries, most ORM implementations fail spectacularly compared to the simplicity of using SQL queries.

3. Changes to the object model are made in one place.

This is not a benefit of an ORM. Many people use ORM interfaces inexpertly, so when the database structure changes, they still have to update many places in their application to reflect the change. But instead of redesigning SQL queries, they have to redesign usage of the ORM. There is no net win. They could structure their application using plain SQL queries and still be as likely to achieve the benefit of DRY.

4. Rich query capability.

Absolutely wrong.

5. You can navigate object relationships transparently.

This is definitely a negative rather than a positive. When you want a result set to include rows from dependent tables, do a JOIN. Doing the "lazy-load" approach, executing additional SQL queries internally when you reference columns of related tables, is usually less efficient. Leaving it up to the ORM internals deprives you of the opportunity to decide which solution is better.

6. Data loads are completely configurable ...

This is not a benefit of an ORM. It is actually easier to achieve this using plain SQL.

7. Concurrency support.

Again, not a benefit of an ORM.

8. Cache management.

This has nothing to do with using an ORM. I can cache data using SQL.

9. Transaction management and Isolation.

Also has nothing to do with using an ORM versus a more direct DAL.

10. Key Management.
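
To illustrate point 5, here is a rough sketch comparing the number of SQL statements issued by a lazy-load style of access with a single JOIN. It assumes Python's built-in sqlite3 module; the CountingDb wrapper, the authors and books tables, and the function names are hypothetical, and the lazy version only imitates what an ORM might do internally rather than reproducing any particular framework.

```python
import sqlite3


class CountingDb:
    """Thin wrapper that counts how many SQL statements are sent."""

    def __init__(self, conn):
        self.conn = conn
        self.count = 0

    def query(self, sql, params=()):
        self.count += 1
        return self.conn.execute(sql, params).fetchall()


def titles_with_authors_lazy(db):
    result = []
    for book_id, author_id, title in db.query(
            "SELECT id, author_id, title FROM books ORDER BY id"):
        # One extra query per book, issued when the related author is first touched
        (name,) = db.query("SELECT name FROM authors WHERE id = ?", (author_id,))[0]
        result.append((title, name))
    return result


def titles_with_authors_join(db):
    # The same result set, fetched in one statement
    return db.query(
        "SELECT b.title, a.name FROM books b "
        "JOIN authors a ON a.id = b.author_id ORDER BY b.id")


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE books (
        id INTEGER PRIMARY KEY,
        author_id INTEGER REFERENCES authors(id),
        title TEXT
    );
    INSERT INTO authors VALUES (1, 'Codd'), (2, 'Date');
    INSERT INTO books VALUES
        (1, 1, 'A Relational Model'),
        (2, 2, 'An Introduction to Database Systems'),
        (3, 2, 'SQL and Relational Theory');
""")

lazy_db = CountingDb(conn)
lazy_result = titles_with_authors_lazy(lazy_db)
join_db = CountingDb(conn)
join_result = titles_with_authors_join(join_db)
print(lazy_db.count, "queries (lazy) vs", join_db.count, "query (JOIN)")
```

With three books, the lazy version sends four statements where the JOIN sends one, and the gap grows with the number of rows. Whether that matters depends on the workload, which is exactly the decision the ORM takes out of your hands.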

I'm posting to my blog the questions I've answered on StackOverflow, which earned the "Nice Answer" or "Good Answer" badges. This was my answer to "Why Should You Use An ORM?"

Tuesday, January 20, 2009

Is This Legal?

A user recently asked a question about GPL compatibility with his company's commercial software offerings:

I work for a software / design firm and I recently found out that our "in house" CMS is actually [based on software] licensed under the GPL Ver 2. I would like to know if it is ethical / legal to be selling this to clients.

Don't act on any legal advice you read on a forum like StackOverflow -- including mine. :-)

Here's a passage about GPL from Wikipedia (emphasis mine):

The terms and conditions of the GPL are available to anybody receiving a copy of the work that has a GPL applied to it ("the licensee"). Any licensee who adheres to the terms and conditions is given permission to modify the work, as well as to copy and redistribute the work or any derivative version. The licensee is allowed to charge a fee for this service, or do this free of charge. This latter point distinguishes the GPL from software licenses that prohibit commercial redistribution. The FSF argues that free software should not place restrictions on commercial use, and the GPL explicitly states that GPL works may be sold at any price.

However, if your company is distributing the software under another license not compatible with GPL, then they're violating their license.

I'm posting to my blog the questions I've answered on StackOverflow, which earned the "Good Answer" badge. This was my answer to "Is This Legal? (GPL Software/ Licensing Issues)"

Wednesday, January 14, 2009

Learn to Program in 21 Days

A user recently asked:

Has anyone "learned how to program in 21 days?"

I'm not a fan of these learn how to program in X amount of days books. Some even boast, learn how to program in 24 hours. This is a joke and an insult to me as a software engineer who went through a rigorous discipline in computer science and mathematics.

So a question to the community, have you benefited from these become a programmer quick books?

No, it's impossible to learn how to program in 24 hours or 21 days.

See "Teach Yourself Programming in Ten Years," an article by Peter Norvig (Director of Research at Google, Inc.).

If you already have good fundamental skills at programming, and you just need a tutorial-style book to guide you through learning a new API, then these kinds of books may be helpful.

Even then, the level of expertise will be shallow. It will take many months (at least) to become really proficient. But the quick-introduction books are useful to give you a taste of the range of functionality in a language or API.

I'm posting to my blog the questions I've answered on StackOverflow, which earned the "Good Answer" badge. This was my answer to "Has anyone 'learned how to program in 21 days?'"

Best. Perl Script. Ever.

A user recently asked:

What has been your best programming experience so far?

The most successful program I've ever written was this Perl script:
' ','@')[$y**2-20*$y+($_**2)/3<0]),(0..30)),),(0..24));
print join("\n", map(reverse($_).$_, @l)), "\n";

I wrote this for a woman I was dating in 2001. Writing a Perl script for my girlfriend is not as geeky as it sounds, at least in this case. She's also a software developer, and she was taking a Perl class at the time.

I consider this script a great success because she married me in 2007!

I'll leave it as an exercise for the reader to run the script in a console window and see its output (I promise it's not a Trojan Horse or any other kind of evil trick).

I'm posting to my blog the questions I've answered on StackOverflow, which earned the "Good Answer" badge. This was based on my answer to "What is your best programming experience?"

Tuesday, January 13, 2009

The Next-Gen Databases

A user recently asked:

I'm learning traditional Relational Databases (with PostgreSQL) and doing some research I've come across some new types of databases. CouchDB, Drizzle, and Scalaris to name a few, what is going to be the next database technologies to deal with?

SQL is a language for querying and manipulating relational databases. SQL is dictated by an international standard. While the standard is revised periodically, it seems to always work within the relational database paradigm.

Here are a few new data storage technologies that are getting attention currently:

  • CouchDB is a non-relational database. They call it a document-oriented database.
  • Amazon SimpleDB is also a non-relational database accessed in a distributed manner through a web service. Amazon also has a distributed key-value store called Dynamo, which powers some of its S3 services.
  • Dynomite and Kai are open source solutions inspired by Amazon Dynamo.
  • BigTable is a proprietary data storage solution used by Google, and implemented using their Google File System technology. Google's MapReduce framework uses BigTable.
  • Hadoop is an open-source technology inspired by Google's MapReduce, and serving a similar need, to distribute the work of very large scale data stores.
  • Scalaris is a distributed transactional key/value store. Also not relational, and does not use SQL. It's a research project from the Zuse Institute in Berlin, Germany.
  • RDF is a standard for storing semantic data, in which data and metadata are interchangeable. It has its own query language SPARQL, which resembles SQL superficially, but is actually totally different.
  • Vertica is a highly scalable column-oriented analytic database designed for distributed (grid) architecture. It does claim to be relational and SQL-compliant. It can be used through Amazon's Elastic Compute Cloud.
  • Greenplum is a high-scale data warehousing DBMS, which implements both MapReduce and SQL.
  • XML isn't a DBMS at all, it's an interchange format. But some DBMS products work with data in XML format.
  • ODBMS, or Object Databases, are for managing complex data. There don't seem to be any dominant ODBMS products in the mainstream, perhaps because of lack of standardization. Standard SQL is gradually gaining some OO features (e.g. extensible data types and tables).
  • Drizzle is a relational database, drawing a lot of its code from MySQL. It includes various architectural changes designed to manage data in a scalable "cloud computing" system architecture. Presumably it will continue to use standard SQL with some MySQL enhancements.
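
As a rough illustration of the difference in access style between a key/value store and a relational database, the sketch below models a key/value store as a plain Python dict holding JSON documents and compares it with a SQLite table. This is a toy under stated assumptions: the keys, the users table, and the variable names are hypothetical, and the real products listed above add their own indexing, distribution, and query features that a dict does not have.

```python
import json
import sqlite3

# Key/value style: opaque values looked up by key
kv_store = {
    "user:1": json.dumps({"name": "Ada", "city": "London"}),
    "user:2": json.dumps({"name": "Grace", "city": "New York"}),
    "user:3": json.dumps({"name": "Alan", "city": "London"}),
}

# Fetching by key is direct
ada = json.loads(kv_store["user:1"])

# Searching by an attribute means scanning every value in application code
londoners_kv = sorted(
    doc["name"]
    for doc in map(json.loads, kv_store.values())
    if doc["city"] == "London"
)

# Relational style: the same data in a table, queried declaratively with SQL
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [(1, "Ada", "London"), (2, "Grace", "New York"), (3, "Alan", "London")],
)
londoners_sql = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM users WHERE city = ? ORDER BY name", ("London",))
]

print(londoners_kv, londoners_sql)
```

Both approaches give the same answer here. The trade-off is that the key/value model is simple to partition across many servers, while the relational model keeps ad hoc queries inside the database.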

Relational databases have weaknesses, to be sure. People have been arguing that the relational model doesn't handle all data modeling requirements since the day it was first introduced.

Year after year, researchers come up with new ways of managing data to satisfy special requirements: either requirements to handle data relationships that don't fit into the relational model, or else requirements of high-scale volume or speed that demand data processing be done on distributed collections of servers, instead of central database servers.

Even though these advanced technologies do great things to solve the specialized problem they were designed for, relational databases are still a good general-purpose solution for most business needs. SQL isn't going away.

I'm posting to my blog the questions I've answered on StackOverflow, which earned the "Good Answer" badge. This was my answer to "The Next-Gen Databases."

Saturday, January 10, 2009

Verifying a Company Uses Best Practices

A user recently asked how to use the Joel Test in an interview, to confirm that a software company practices what they preach with regard to professional software development habits:

I've got an interview with a company that claims to score a 12 on the Joel Test. [...] What are some ways of determining if they really implement all 12 points? Are there any particular questions I can ask?

It's reasonable to say, "show me." Ask them for examples and concrete details of their support for the Joel Test subjects. Since they claim they score all 12 points, they are obviously proud of it. People tend to like to show off, so they'll probably be eager to share more details.

If you ask more specific questions, it'll become apparent from their descriptions whether they really have those good practices.

We can think of many specific follow-up questions to the basic questions. Each Joel Test question comes first in the items below, and my follow-ups, er, follow:

  1. Do you use source control? What source control system do you use? Why did you pick that one? What is your branch/release policy? What are your tag naming conventions? Do you organize your tree by code vs. tests at the top with all modules under each directory, or do you organize by module at the top with code and tests under each module directory?
  2. Can you make a build in one step? What tools do you use to make builds? How long does it take to go from a clean checkout to an installation image? What would it take to modify the build? Is it integrated into your testing harness? What would it take to duplicate a build environment? Are the build scripts and tools also under source control?
  3. Do you make daily builds? What software testing tools do you use for daily builds? Do you use a Continuous Integration tool? If so, which one? How do you identify who "broke the build?" What is your test coverage?
  4. Do you have a bug database? What bug tracker software do you use? Why did you pick that one? What customizations did you apply to it? Can you show me trends of rate of bugs logged or bugs fixed per month? How does a change in source control get associated with the relevant bug?
  5. Do you fix bugs before writing new code? What is your bug triage process? Who is involved in prioritizing bugs? How many bugs did you fix in the last release of your product? Do you do bug hunts with bounties for finding critical bugs?
  6. Do you have an up-to-date schedule? Can I see it? How far are you ahead of/behind schedule right now? How do you do estimating? How accurate a method has that turned out to be?
  7. Do you have a spec? Can I read one? Do you have a spec template? Can I see that? Who writes the specs? Who reviews and approves the specs?
  8. Do programmers have quiet working conditions? Can I see the cubicle or work area for the position I'm interviewing for? (or an equivalent work area)
  9. Do you use the best tools money can buy? What tools do you use? Are you up to date on versions? What tools do you want that you don't have yet? Why not?
  10. Do you have testers? How many? Can I meet one? Do testers do black-box or white-box testing?
  11. Do new candidates write code during their interview? What code would you like me to write? What are you looking for by seeing my code?
  12. Do you do hallway usability testing? How frequently? Can I see a report documenting one of your usability testing sessions? Can you give me an example of something you changed in the product as a result of usability testing?

Beware if their answers to the specific follow-up questions are evasive like, "um yeah, we are committed to doing more best practices and we'll be looking to you to help us effect changes toward that goal." If they're so committed to it, why don't they have anything to show for it yet? Probably because like many companies, when the schedule is in jeopardy, following "best practices" goes out the window.

I'm posting to my blog the questions I've answered on StackOverflow, which earned the "Good Answer" badge. This was my answer to "Administering the Joel Test."

Friday, January 09, 2009

Do I really need version control?

A user recently asked:

I read all over the internet (various sites and blogs) about version control. How great it is and how all developer NEED to use it because is a god bless.

Here is the question: do I really need this? ... I usually work alone (freelancer) and I had no client that asked me to use svn (but never is too late for this, right?). So, should I start and struggle to learn to use svn (or something similar?) Or it's just a waste of time?

Here's a scenario that may illustrate the usefulness of source control even if you work alone.

Your client asks you to implement an ambitious modification to the website. It'll take you a couple of weeks, and involve edits to many pages. You get to work.

You're 50% done with this task when the client calls and tells you to drop what you're doing to make an urgent but more minor change to the site. You're not done with the larger task, so it's not ready to go live, and the client can't wait for the smaller change. But he also wants the minor change to be merged into your work for the larger change.

Maybe you are working on the large task in a separate folder containing a copy of the website. Now you have to figure out how to do the minor change in a way that can be deployed quickly. You work furiously and get it done. The client calls back with further refinement requests. You do this too and deploy it. All is well.

Now you have to merge it into the work in progress for the major change. What did you change for the urgent work? You were working too fast to keep notes. And you can't just diff the two directories easily now that both have changes relative to the baseline you started from.
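
The diff problem in this scenario can be shown in a few lines. The sketch below uses Python's built-in difflib module on three hypothetical versions of one page (the baseline, the half-finished redesign, and the urgent fix); the page contents and the changed_lines helper are invented for illustration. It is not how any version control tool is implemented, only a demonstration of why a recorded baseline matters.

```python
import difflib

# Hypothetical page contents, one list entry per line
baseline = ["<h1>Shop</h1>", "<p>Welcome</p>", "<p>Contact: a@example.com</p>"]
# Work in progress on the large change
redesign = ["<h1>Shop</h1>", "<nav>Home | Products</nav>", "<p>Welcome</p>",
            "<p>Contact: a@example.com</p>"]
# The urgent fix, made in a separate copy and already deployed
hotfix = ["<h1>Shop</h1>", "<p>Welcome</p>", "<p>Contact: b@example.com</p>"]


def changed_lines(old, new):
    """Return only the added and removed lines of a unified diff."""
    diff = list(difflib.unified_diff(old, new, lineterm="", n=0))
    # Skip the two file-header lines and the @@ hunk markers
    return [line for line in diff[2:] if not line.startswith("@@")]


# With the baseline available, the urgent change is isolated cleanly
hotfix_only = changed_lines(baseline, hotfix)

# Without it, comparing the two working copies mixes both sets of edits
mixed = changed_lines(redesign, hotfix)

print(hotfix_only)
print(mixed)
```

The first comparison shows only the contact-address change. The second also reports the navigation bar as a removal, even though nobody removed it, so you can no longer tell which differences belong to the urgent fix.
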
The above scenario shows that source control can be a great tool, even if you work solo. Source control can solve many problems for you, such as the following:

  • You can use branches to work on longer-term tasks and then merge the branch back into the main line when it's done.
  • You can compare whole sets of files to other branches or to past revisions to see what's different.
  • You can track work over time (which is great for reporting and invoicing by the way).
  • You can recover any revision of any file based on date or on a milestone that you defined.

For solo work, Subversion is recommended. CVS is all but antiquated, and Git is more useful for distributed teams. A good book is Pragmatic Version Control Using Git by Travis Swicegood.

I'm posting to my blog the questions I've answered on StackOverflow, which earned the "Good Answer" badge. This was my answer to "Do I really need version control?"