Thursday, December 27, 2007

Idea for "pure" mysqlnd interface

When I first heard about the mysqlnd project at the MySQL Camp in 2006, somehow I thought it was going to be 100% PHP code. I can't remember now whether I made an incorrect assumption or whether they planned to write it in PHP but changed their mind.

What we have today is a DLL that is a drop-in replacement for libmysql. This has a few nice advantages. The replacement library will be API-compatible with all three PHP extensions for MySQL (mysql, mysqli, and pdo_mysql), so no one has to change their PHP application code. Also, libmysqlnd is licensed under the PHP License, so it can be distributed with PHP core without any license conflict. All in all, it's a good solution that will help both PHP and MySQL.

But what about my (mistaken) impression of a MySQL client written in PHP, instead of implemented as a DLL? Such a client would implement the MySQL client/server protocol in PHP code.

It's certainly possible that implementing this protocol in a language like PHP would be hopelessly inefficient and unsuitable for real-world development. But it would have a few nice advantages:
  • Providing a platform-independent MySQL client.
  • Deploying easily in hosting environments with no access to configure php.ini.
  • Allowing PHP debugging tools to inspect at the protocol level.
  • Serving as a model for similar drivers for Perl, Ruby, Python, etc.
  • Licensing with the same license as the respective language, e.g. PHP License, Artistic License, or GPL.

Writing a driver in the host language has precedent. A JDBC type 4 driver is supposed to implement the vendor's client/server protocol in 100% Java code, so that the RDBMS server cannot distinguish it from the vendor's own client library.
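To give a flavor of what a host-language driver involves, here is a minimal Python sketch of only the lowest layer of the MySQL client/server protocol: packet framing. Each packet begins with a 3-byte little-endian payload length and a 1-byte sequence number, followed by the payload; a query is sent as the command byte COM_QUERY (0x03) followed by the SQL text. The function names are hypothetical, and the sketch deliberately leaves out everything else a real driver needs: the handshake, authentication, result-set parsing, and splitting payloads of 16 MB or more across several packets.

```python
from typing import Tuple

MAX_SINGLE_PAYLOAD = 0xFFFFFF  # a length of 0xFFFFFF signals a payload split across packets
COM_QUERY = 0x03               # command byte that precedes SQL text in a query packet


def pack_packet(payload: bytes, sequence_id: int) -> bytes:
    """Frame a payload: 3-byte little-endian length, then a 1-byte sequence id."""
    if len(payload) >= MAX_SINGLE_PAYLOAD:
        # Real drivers split large payloads into multiple packets; this sketch does not.
        raise ValueError("payload too large for a single packet in this sketch")
    header = len(payload).to_bytes(3, "little") + bytes([sequence_id & 0xFF])
    return header + payload


def unpack_packet(buffer: bytes) -> Tuple[int, bytes, bytes]:
    """Return (sequence_id, payload, remaining_bytes) for the first packet in buffer."""
    if len(buffer) < 4:
        raise ValueError("incomplete packet header")
    length = int.from_bytes(buffer[0:3], "little")
    sequence_id = buffer[3]
    end = 4 + length
    if len(buffer) < end:
        raise ValueError("incomplete packet payload")
    return sequence_id, buffer[4:end], buffer[end:]


# A query packet as a client would send it after a successful handshake.
packet = pack_packet(bytes([COM_QUERY]) + b"SELECT 1", sequence_id=0)
print(packet)  # b'\t\x00\x00\x00\x03SELECT 1'
```

Even this small piece shows why performance is a fair concern: every byte of every result would pass through interpreted code like this.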

So for now, this is just an idle idea. Please don't ask me where to download this imaginary software; as far as I know, it doesn't exist!

Monday, December 10, 2007

How to Save $100 Million

Last night I listened to an interesting interview on public radio, relating a story from The New Yorker magazine about Dr. Peter Pronovost saving millions of dollars and hundreds of patients' lives in Michigan hospitals.

How did he do it? He taught doctors and nurses to use checklists to avoid mistakes in the intensive care units of hospitals. Mistakes that could put patients' health or lives at risk.

What's interesting about this story is that it's an extremely low-technology solution to a type of problem that exists in virtually every field. In this case, it applies to medical care, but it applies just as easily to manufacturing. In Japan, they call it poka-yoke, or mistake-proofing. Check out this book too: "Mistake Proofing: Designing Errors Out".

Do checklists and similar techniques mean hamstringing the creative process in these fields? Absolutely not! On the contrary, effective use of checklists can free our attention from repetitive details, so that we can devote more of our energy to innovation and creativity. We don't need to keep the details of well-understood procedures in our short-term memory if we write down the steps so that we can follow them effortlessly, or delegate the work to a teammate.

Why not go the extra step and create technology to automate those procedures? Because a checklist doesn't necessarily remove the need for human attention, good judgment, and critical thinking. Some steps may require analysis, or may be performed conditionally based on the result of a previous step. It's usually very expensive to make a machine that does that kind of analysis.

Procedures are inexpensive to modify and prototype when humans perform them. It might turn out that the whole procedure is incorrect and needs to be rethought. If automation technology had been developed for that procedure, the cost of developing that technology would be wasted. If the procedure were merely a checklist, we would just need to retrain the operations staff, and voilà!

Notice that books like David Allen's "Getting Things Done" focus on non-technological methods for staying organized and keeping things from falling through the cracks.

If checklists and other easy organizational techniques are such a good idea, why don't we use them more? In the article, Dr. Pronovost remarked that it's surprising it has taken so long for hospitals to adopt his methods, and that if there were a drug that achieved the same positive results, it would be mandatory in every hospital. A clue to the explanation lies in the phrase "if there were a drug." Follow the money! The solution that is marketed most aggressively is not the one that is most cost-effective; it's often the one that is least cost-effective, because its vendor stands to make the most money from it.

Thursday, November 29, 2007

In Support of the Relational Model

Every few years, a wave of people claim that the relational model falls short of modeling the data structures of modern applications. Recently this has included some really smart people, such as:

- Jim Starkey
- Brian Aker
- MIT researchers

This trend of criticizing the relational model started almost immediately after E. F. Codd published his seminal paper, "A Relational Model of Data for Large Shared Data Banks," in 1970.

Some data modeling tasks are just complex by nature, and it is necessarily a difficult, time-consuming analysis to figure out how to structure the storage for a given application in a manner that is both logically complete and also efficient.

The complexity does not originate from the technology; the complexity originates from nontrivial real-world scenarios, which are full of complexity, inconsistency, and special-case exceptions. It isn't that the database modeling technology is insufficient, it's that the job is genuinely hard.

This is an example of what I call a Waterbed Problem. When you're lying on a waterbed, and you push down with your hand to make part of the bed lower, the water inside is displaced and rises in some other area. Even if you get a lot of friends to help you with the waterbed (okay, this is starting to get racy, but don't worry), the water is still mostly incompressible, so no matter how hard you push, you only displace the water; you can't make it shrink.

The waterbed analogy applies when we try to simplify a computer organization problem. We "push down" on the complexity of the task by inventing new methods for describing the problem. Sometimes this leads to nifty things like domain-specific languages. But generally there is a trade-off. We make one part of the problem easier to solve, at the cost of making another part of the problem less flexible, or harder to solve. The analogy is that the complexity rises in some other area, in response to our attempt to push the complexity down in the area we're working on currently.

We might even create a simple solution for the problem at hand, but the new solution is simply unable to address some related problems. This might be analogous to a leak in the waterbed, but I've probably exhausted this analogy by now.

Flexibility of the data modeling solution is not the only criterion. It must also support rigidity where needed. If you can't define the structure of your data model enough to disallow invalid data, then you're asking for trouble. If the data management is just a pool of data elements and relationships, and any enforcement of structure depends on application logic, then your data integrity is vulnerable to bugs in that application, and to invalid changes made through a different client application.
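Here is a small sketch of that kind of rigidity, using Python's built-in sqlite3 module as a stand-in for any relational DBMS. The accounts and orders tables and the try_insert helper are hypothetical. The point is that the CHECK, NOT NULL, and FOREIGN KEY constraints live in the schema, so they reject invalid rows no matter which client application sends them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when this is on
conn.execute("""
    CREATE TABLE accounts (
        account_id INTEGER PRIMARY KEY,
        status     TEXT NOT NULL CHECK (status IN ('active', 'closed'))
    )""")
conn.execute("""
    CREATE TABLE orders (
        order_id   INTEGER PRIMARY KEY,
        account_id INTEGER NOT NULL REFERENCES accounts (account_id),
        quantity   INTEGER NOT NULL CHECK (quantity > 0)
    )""")


def try_insert(sql: str, params: tuple) -> bool:
    """Return True if the database accepted the row, False if a constraint rejected it."""
    try:
        with conn:  # commits on success, rolls back if the statement fails
            conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


print(try_insert("INSERT INTO accounts VALUES (?, ?)", (1, "active")))  # True
print(try_insert("INSERT INTO accounts VALUES (?, ?)", (2, "bogus")))   # False: CHECK fails
print(try_insert("INSERT INTO orders VALUES (?, ?, ?)", (1, 99, 5)))    # False: no account 99
print(try_insert("INSERT INTO orders VALUES (?, ?, ?)", (2, 1, 0)))     # False: quantity must be > 0
print(try_insert("INSERT INTO orders VALUES (?, ?, ?)", (3, 1, 5)))     # True
```

If the same rules lived only in application code, a second client that skipped them could insert every one of the rejected rows.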

I believe the relational model strikes a good balance between flexibility and rigidity, and that's why it has been a good choice for general-purpose data modeling for so long. Other techniques have advantages in certain areas, but they always come with offsetting disadvantages in other areas.

There is no magic bullet that can make the data modeling process easier. Not object databases, not hierarchical databases, not semantic databases, not network databases. Different data modeling paradigms can describe the data relationships differently, but they can't make the problem simpler.

To respond to the smart folks who say that the relational model is inadequate, I concede that the relational model is the worst form of DBMS except for all those other forms that have been tried from time to time (with apologies to Winston Churchill).

Saturday, November 03, 2007

Less code vs. good code

Alex Netkachov, Vidyut Luther, and Richard Heyes discuss the pros and cons of writing code that is short. Here are some thoughts from me.

In general, I don't think turning three lines of code into one line makes an application better, or more readable, or prettier, or whatever your goal is. The logical extension of this is to switch to writing Perl code:


#!/usr/bin/perl
# Valentine.pl, copyright 2001-2007 Bill Karwin
map( ( $r=$_, map( ( $y=$r-$_/3, $l[24-$r].=(' ','@')[$y**2-20*$y+($_**2)/3<0] ), (0..30) ), ), (0..24) );
print join("\n", map(reverse($_).$_, @l)), "\n";


The problem with such super-compressed code is that it becomes harder to write, harder to debug, and harder to maintain. Imagine being a newly hired developer assigned to a project that contains code all written like the above.

Of course, we're not talking about code as obfuscated as the above. But taking a step in the direction of compressed code for the sake of compressed code, instead of for some functional improvement, is a step toward making the code less maintainable.

For example, think about reading diffs in your revision control system when a bug fix changes such a compressed line of code. It's harder to see the nature of the bug fix than it would be in a more expanded style of coding. For this reason, I prefer to write expanded code, even when writing Perl code!
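To illustrate, here is one way the Valentine one-liner might look in expanded form. It is a Python transliteration written for this example, not the original Perl, but it follows the same logic: for each row r (0 to 24) and column x (0 to 30), it computes y = r - x/3, marks the cell with "@" when y**2 - 20*y + x**2/3 < 0, builds half of each line, and then mirrors it. The function and constant names are my own.

```python
from typing import List

ROWS = 25     # the Perl version iterates $r over 0..24
COLUMNS = 31  # ...and the inner $_ over 0..30


def is_inside_heart(row: int, column: int) -> bool:
    """The same inequality the Perl one-liner tests for each cell."""
    y = row - column / 3
    return y**2 - 20 * y + column**2 / 3 < 0


def valentine_lines() -> List[str]:
    half_lines = [""] * ROWS
    for row in range(ROWS):
        for column in range(COLUMNS):
            # Row 0 is the bottom of the picture, so fill the list from the end.
            cell = "@" if is_inside_heart(row, column) else " "
            half_lines[ROWS - 1 - row] += cell
    # Mirror each half-line so the picture is symmetric.
    return [half[::-1] + half for half in half_lines]


print("\n".join(valentine_lines()))
```

It's several times longer, but a bug fix to the inequality would show up in a diff as a change to one clearly named function.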

The ternary operator is another good example. Vidyut argues that the ternary operator should not exist. It's true that it can be abused like any programming construct, but it is definitely useful to have the ternary operator in the language. It is a simple if/else construct that produces a value. The conventional if/else construct is a statement, so it doesn't produce a value. Alex gives an example that is a perfect use of the ternary operator.


$message = ($age < 16) ? 'Welcome!' : 'You are too old!';


But notice I said simple. The ternary operator is inappropriate to use for any if/else blocks that contain anything more complex than a single expression. What if the application requires that we log the event when someone who is too old attempts to access the system?

One can try to do acrobatics in code to accomplish two steps in one expression, but this is not a good practice because if a bug occurs in either step or the requirements change, you end up breaking both.


// Wrong
$message = ($age < 16) ? 'Welcome!' : log_and_return_message('You are too old!');


Once either the positive or negative block becomes anything other than a single expression, you do need to change to using an if/else construct. So if you are tempted to use a ternary operator because it's shorter or prettier, consider whether there is any likely scenario in which you would have to restructure it anyway. If so, that's probably a good reason to use if/else instead of a ternary expression.
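Here is a minimal sketch of both cases, transliterated into Python, where the conditional expression (a if condition else b) plays the role of PHP's ternary operator. The function names and the use of the standard logging module are assumptions for illustration; the example above doesn't say how the event should be logged.

```python
import logging

logger = logging.getLogger("access")  # hypothetical logger name


def simple_message(age: int) -> str:
    # Each branch is a single expression, so a conditional expression fits well.
    return "Welcome!" if age < 16 else "You are too old!"


def message_with_logging(age: int) -> str:
    # One branch now needs two steps, so use an ordinary if/else.
    if age < 16:
        return "Welcome!"
    else:
        logger.warning("Rejected access attempt, age %d", age)
        return "You are too old!"


print(message_with_logging(12))  # Welcome!
print(message_with_logging(40))  # You are too old!
```

Each step in the negative branch is now on its own line, so a later change to the logging can't accidentally break the message.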

Thursday, November 01, 2007

Enabling the Success of a Software Team

There are three "must haves" for excellent managers, which I look for when I work for a manager, and which I try to live up to when I work as a manager.

I decided to write down these thoughts after reading Jeremy Cole's blog this week, which has some great advice about ways to attract, motivate, and retain expert MySQL DBA architects, and after hearing Cal Evans' podcast "Attracting Talent" on the PHP Abstract site earlier this summer.

This brings me to my ideas about how to manage excellent software developers. There are many things a manager has to do to be effective at leading a team, but to boil it down to a single principle, I like to say that the manager's job is to enable the success of their team.

Here is my list of three management responsibilities to support the success of the team:

1. Give clear and achievable assignments

The first step to making your team successful is to tell them what you want them to do. What is the goal of the project? What does the resulting software need to do? Are there constraints on schedule or technology? Who is the audience? Who approves the final deliverable? These and other high-level questions must be answered, even if all the fine details are still in "discovery phase."

Counter-example: in a classic Dilbert comic strip, the boss asks Dilbert to work on a new project. Dilbert says, "great, I'm ready, what's the project?" The boss says, "It's not all worked out yet, so you just start coding, and I'll stand here looking over your shoulder. If you do something wrong, I'll scream."

The assignment must be achievable. That doesn't mean softball: giving a developer a challenge is a good thing. Many people thrive on this, and sometimes they can pull a rabbit out of a hat and surprise everyone (including themselves). But it does no good to give someone a task that is truly impossible; this just sets them up for certain failure.

Also, the assignment must be consistent, or the manager must at least acknowledge clearly when it changes. We all know that project requirements tend to evolve, and we are okay with that. But a manager who implies that the developer should have anticipated the change is being disrespectful. Or worse, I've seen managers claim that the changed requirements were what was "intended" (though not stated) from the beginning, and that it's the developer's fault for not inferring this information. What can one say about this behavior? Let's just say that the manager is not fulfilling his or her responsibility to make the team successful.

2. Provide the resources needed to be successful

I have a pretty broad definition of "resources" in this context, including hardware and software tools, enough time to complete the assignment, access to people who are needed such as IT support staff or subject matter experts, any existing technology or research that is part of the desired solution, etc.

Counter-example: I once was told to set up a testing environment, but we had no server on which to install it. The VP's solution was to tell me to use VMware and then I'd have as many servers as I need. But we still needed a real server on which to run the VMware software, and we had none. This is an example of being told to make bricks without straw.

Another counter-example: a manager who won't authorize a $250 expense for a commercial-quality code-analysis tool, but would rather let highly paid developers spend weeks debugging elusive issues. That's not a smart use of time or money. Sure, one doesn't want expenses to get out of control, but being either too stingy or too frivolous is likely to put the team's success at risk.

3. Give constructive feedback

The manager must communicate clearly and deliberately, instead of assuming "no news is good news."

Feedback doesn't need to be full of hollow affirmations or cheerleading; it should let the developer know how close he or she is to the goal of success. Also, if the developer is off-target, it's important to communicate about this and correct it as early as possible. Most people naturally want to do a good job, and being allowed to do the wrong job for weeks is sure to discourage them once they learn the truth.

Ultimately, when the team completes an assignment, a manager should tell them so, and tell them how well the result meets expectations. An important part of enabling a team's success is letting them know when they have achieved it.

Tuesday, October 30, 2007

Proposals for MySQL Conference

I submitted proposals for the MySQL Conference & Expo.

SQL AntiPatterns II

I thought it would be a no-brainer to do a sequel of my 2007 talk, "SQL AntiPatterns". That talk was very well attended, thanks to Jay Pipes' endorsement in his guide to the conference. It's not hard to come up with all-new content for a sequel!

Topics in this presentation:
* Corrupt your data by storing images in files instead of BLOB fields.
* Kill your query performance using the HAVING clause.
* Use the FLOAT datatype and lose money.
* Add an "id" column to every table -- whether it needs one or not.
* Prepare queries using parameters for identifiers and keywords.
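As a quick illustration of the FLOAT topic, here is a small Python sketch (not material from the talk itself). Python's float is an IEEE double, while a MySQL FLOAT column stores single precision, which is even less exact; the decimal module stands in for an exact DECIMAL column. The to_float32 helper is hypothetical.

```python
import struct
from decimal import Decimal


def to_float32(value: float) -> float:
    """Round-trip a value through 4-byte single precision, as a FLOAT column stores it."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


# Add up ten charges of 10 cents using binary floating point...
float_total = 0.0
for _ in range(10):
    float_total += 0.10

# ...and using exact decimal arithmetic.
decimal_total = Decimal("0.00")
for _ in range(10):
    decimal_total += Decimal("0.10")

print(float_total)       # 0.9999999999999999
print(decimal_total)     # 1.00
print(to_float32(0.10))  # 0.10000000149011612
```

Ten dimes should make exactly one dollar; only the decimal version says so.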

Designing Models and Such: using MySQL in MVC Applications

I recently finished working for Zend Technologies, where I spent a year developing the database access components for the Zend Framework.

A database like MySQL is an integral part of virtually every web application. This talk describes practical ways to leverage MySQL in your project, to meet goals of development productivity, application performance, and security.

Model-View-Controller (MVC) is a popular architecture pattern for web applications, but it may be novel to PHP developers. Designing Models in an MVC application is the subject of many questions, so this talk will focus on these issues.

Examples use the Zend Framework web application library for PHP 5.

Topics:
* Designing database-backed Model classes for MVC applications
* Caching data and metadata appropriately
* Storing authentication credentials in a database
* Configuration management and testing issues
* Logging application events to a database

The audience for this talk is assumed to know object-oriented programming concepts in PHP 5.

Tuesday, October 23, 2007

Leaving Zend

I've worked at Zend for the past 13 months, heading up an open source project called the Zend Framework. Zend Framework is a library of PHP 5 classes providing simple, object-oriented solutions for most features common to modern web applications. I was the project manager as well as developing a lot of code, tests and documentation, and engineering the product releases through its 1.0 release.

When I joined Zend in September 2006, that project had made a few "Preview Releases", but it was losing momentum. My assignment was to organize the project, finish development of the 50+ components in the library, make regular beta releases to demonstrate progress, and to move the product to a general 1.0 release as rapidly as possible.

To achieve this goal, I knew we had to manage the scope of the project carefully. In any project there's always tension among cost, scope, schedule, and quality (CSSQ). The project already had a bare-bones budget, we had high standards for quality (who doesn't?), and Zend placed a very high priority on making a general 1.0 release as soon as possible. So the only thing left to control was the scope.

We were blessed with an enthusiastic user community, but this meant that we had dozens of people submitting feature requests and proposals for new components every week. Though the ideas were genuinely very attractive, there was simply no way to add them to the project scope without consequences for the schedule or quality of the project. And there were some features that we wished we'd had time to implement before we had to reach that 1.0 milestone.

Some members of the user community voiced objections to the emphasis on schedule over feature set. I hope they understand that we were following the priorities of Zend, which is, after all, the sponsor of the project.

We released Zend Framework 1.0 on June 30, 2007, and followed it with a couple of bug-fix releases in July and September. Zend Framework is accelerating in popularity, with over 2.3 million downloads to date. I feel proud to have contributed to a successful web application component library. It's very satisfying to see so many people using code I worked on, making their own projects more successful.

I was honored to work with a great team of software developers. I learned a lot from Darby, Matthew, and Alexander as we worked together, about technical subjects, productivity, and teamwork. Those guys were great to work with, and I hope to work with them again someday. There are a lot of other fine people at Zend and in the Zend Framework developer community, but I worked most closely with those three.

I completed my assignment at Zend successfully. But Zend and I were unable to define a next objective for me. That usually means it's time to declare victory and move on, so I gave notice and I finished working there last week.

During the time I was on the project, the Zend Framework increased by:

  • 104,000 lines of PHP code in its core library;

  • 100,000 lines of PHP code in its unit tests;

  • 3,945 unit test functions (overall code coverage from tests is 84%);

  • 40,000 lines of documentation;

  • 200 new community contributors.

Sunday, January 28, 2007

SHA2() patch for MySQL 5.0

I've created a patch for MySQL 5.0.33 to provide a function SHA2().
Download it here:

http://www.karwin.com/sha2.patch.gz

It really just calls out to the OpenSSL library for the digest functions. So you have to build MySQL from source with OpenSSL support enabled.

You can use the function in SQL syntax like:

SELECT SHA2('message', 256);

The second argument is 224, 256, 384, or 512, depending on what digest algorithm you want to use. If you pass 0 as the second argument, it uses SHA-256.
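If you want to check results from SQL against an independent implementation, here is a minimal Python sketch using the standard hashlib module, following the argument convention described above (224, 256, 384, or 512, with 0 meaning SHA-256). Two details are assumptions of this sketch, not taken from the patch: it returns the digest as a lowercase hexadecimal string, the way MySQL's SHA1() and MD5() do, and it returns None (analogous to SQL NULL) for any other length.

```python
import hashlib
from typing import Optional

# Digest length in bits -> hashlib constructor; 0 is treated as SHA-256.
_DIGESTS = {
    0: hashlib.sha256,
    224: hashlib.sha224,
    256: hashlib.sha256,
    384: hashlib.sha384,
    512: hashlib.sha512,
}


def sha2(message: str, hash_length: int) -> Optional[str]:
    """Hex digest of message, mimicking SHA2(message, hash_length) in SQL."""
    constructor = _DIGESTS.get(hash_length)
    if constructor is None:
        return None  # assumption: unsupported lengths behave like SQL NULL
    # Assumption: the string is hashed as UTF-8 bytes (identical for ASCII input).
    return constructor(message.encode("utf-8")).hexdigest()


print(sha2("message", 256))
```

Comparing this output with SELECT SHA2('message', 256) is a quick sanity check after building the patched server.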

This is my first code contribution to MySQL. I'd be grateful if someone wants to review it and let me know if it needs any changes.

UPDATE 2/5/2007: I re-packaged the patch, excluding more of the MySQL generated files. Thanks to Stewart Smith @ MySQL for the suggestion.

UPDATE 12/3/2010: MySQL 5.5.8 has been released for General Availability, including my SHA2() patch. Happy day!