Saturday, January 26, 2008

PDO v2 CLA issues

Wez Furlong posted a request for discussion about the future of PDO, proposing that the project adopt a CLA (Contributor License Agreement) to manage contributions. Some of the questions raised since then suggest misunderstandings about how CLAs work.

As you may know, I worked for about a year as the project manager of the Zend Framework. I was involved in administering the CLA process for that project and managing community contributions. I'll describe my experiences on that project and give my understanding of the CLA issues. I must say, however, that I'm not a lawyer, nor have I consulted a lawyer about these issues.

I should also say that I have been participating in the PDO v2 discussions. I was an employee of Zend when they started organizing meetings to discuss PDO v2. I have since left Zend, but I'm continuing to participate, now as a community developer. I'd really like to see the DBMS vendors get involved, because it would add a lot of much-needed developer resources to the PDO project.

Discussions and Patches

Questions have come up about how to apply the terms of a CLA to code patches submitted via an issue tracker, a mailing list, or a chatroom. Do these require that the author sign the CLA? In the Zend Framework project, the answer is yes. Otherwise, that code cannot be incorporated into the project.

If the author of a patch was not a CLA signer, we could do one of two things:

1. Ask the author to sign the CLA and then grant that specific patch to the project retroactively. We did this on several occasions, and in general it was not a problem.

2. Someone else writes a different, original solution to the problem the patch addressed, and submits this new contribution under the terms of the CLA.

The latter was often necessary anyway, because the patch author might not understand some of the code architecture, or their solution covered only some cases. For example, when I maintained Zend_Db, sometimes a patch was submitted to fix an issue in one DBMS-specific adapter class, but the issue really affected all DBMS brands. So I wrote a new solution in the abstract superclass instead. Thus I did not use the contributor's patch verbatim; I wrote an alternative solution to address the same goal.

Discussions, bug reports, and feature requests are not considered to be IP and thus are not subject to the terms of the CLA. This applies to talk on the issue tracker, the mailing lists, chatrooms, or IM. It also applies to face-to-face meetings, users groups, conferences, etc. Anyone can offer comments, criticisms, wishlists, etc. to the project without conflicting with the terms of the CLA.

When describing a feature request, people often illustrate the desired usage with code. This code shows usage, not implementation, so it's not likely to become part of the project, and the CLA does not apply to this type of contribution. However, if that usage code becomes part of the project in the form of documentation or demo scripts, then the CLA does apply.

A reasonable guideline is that any contribution checked into the project's source control (CVS) is subject to the terms of the CLA. This applies to code, test code, test data, docs, build scripts, README files, etc. -- anything that can be copyrighted and that gets included with the project.

In the Zend Framework project, we also required written proposals and specifications to be submitted under the terms of the CLA. These documents were not checked into source control (though one could argue that they should be), but they still required the assurance that the contributor was not violating someone else's IP rights to that material. This seemed like a good policy for proposals and specs, since these documents often contain prototype code.

Documentation and Tests are Subject to the CLA

Anything that can be copyrighted is considered intellectual property (IP). If that IP becomes part of the project, it must be treated the same way as code contributions. That is, the contributor must assure that this IP is something he has the right to contribute, and that he's not illicitly copying someone else's work. The CLA often uses the term "contribution" instead of simply "code" because its terms apply to more types of contributions than code alone.

For example, in the Zend Framework project, everyone who is granted privileges to commit to the Subversion repository or to post proposals on the wiki must first sign a CLA. Even volunteers who translate the English documentation into other languages must sign a CLA.

Later in this post I may say "code" for simplicity, but what I say applies to all IP contributions in the project.

The CLA Does Not Prohibit Code Reuse

The CLA does not require that every contribution be original work. It does require that the contributor agree to contribute only IP that they have rights to. If the contributor is also the copyright holder of that work, this is relatively straightforward, but they could also be contributing non-original work if they have the permission of its owner.

The point has been made that a CLA-governed project cannot build on other OSS code. Yes and no. In practice it's rare to incorporate code of any significant size from a non-CLA-governed source, because most OSS projects cannot guarantee that all their contributions have been made in a manner compatible with a CLA. But this isn't the fault of a CLA; it's just a result of the organic way most OSS projects grow.

CLA Does Not Exclude Community Involvement

One assertion is that one needs to sign the CLA to view the PDO v2 specification. This is incorrect. The current PDO v1 spec is online now, and my understanding is that the PDO v2 spec will be open too. Similarly, speculation that one needs to sign a CLA to view the source code is false. Perhaps people are confusing a CLA with an NDA (non-disclosure agreement).

Some people believe that the use of a CLA blocks the community from being involved with the project, or that the work occurs in mysterious smoky rooms behind closed doors. This is also not true. Non-CLA signers can give feedback and discussion -- but the actual code and other IP must be written by people who have signed a CLA.

Here's a hypothetical scenario: any community member can read the specification or the code and say, "I don't like the way it handles feature X. It fails to account for case Y." The contributor of the spec says, "Okay, I've edited the spec with case Y in mind. Does that satisfy your issue?" The community member responds, "Yes, that's good."

See? The community stays involved, and their feedback is heeded. The Zend Framework is a good example. Hundreds of developers who never signed the CLA have filed bugs or feature requests in the issue tracker, or asked questions on the mailing lists.

CLA Does Not Restrict OSS Freedom

One assertion that has been made is that requiring a CLA opposes the spirit of free software, since it places conditions on contributions. I would point out that not everyone has commit privileges to the PHP project. Contributions are carefully vetted, discussed, and reviewed. Many are flat out rejected, sometimes for subjective or inconsistent reasons. No one is complaining about this process, and I'm not either. But it should be noted that PHP is not a free-for-all. There are good reasons to filter contributions for the sake of quality.

The tradition of PHP includes an unwritten assumption that contributors grant their work freely, and do not expect compensation or the right to dictate special terms of use for their contribution. There's also an assumption that contributors are not plagiarizing code or other IP. In fact, there have been some recent cases where code had to be removed or reimplemented to avoid IP conflicts. So PHP does have a commitment to respect other people's copyrights and licenses, and to preserve clean IP in the project.

Given that, is it such a bad thing for contributors to make their agreement with those traditions explicit? To promise that you contribute only IP that you have a right to contribute, and that you do so freely and do not expect compensation, seems very consistent with the spirit of OSS.

Another tenet of OSS is that anyone can create derived works. It does not mean that a given project must accept every contribution. The proposed PDO License permits creating derived works, and it even explicitly states that the derived work may use a different license, which in my mind supports the spirit of free software (I'm not a GPL zealot).

CLA Does Not Make Contributors Legally Responsible

Another assertion that has been made about the CLA is that by signing something, the contributor becomes "legally responsible" for their code. Of course you are responsible for not plagiarizing the code you contribute, but that's true regardless of whether the project uses a CLA or not.

The other interpretation of "legally responsible" is in regards to liability for damages if the code is defective. In fact, the CLA has a clause by which the contributor disclaims responsibility for the code. This "AS-IS" clause (the part in all-caps) is common in software licenses.

But without a CLA, there is no such agreement between the original contributor and the project. In principle, the project itself could sue the contributor for a bug that resulted in damages (though my understanding is that since neither PHP nor PDO is incorporated as a legal entity, the project cannot initiate a lawsuit). Having a CLA between the contributor and the project makes it clear that the contributor offers no warranty for his code. Thus the contributor has more protection by using a CLA than by not using one.

CLA Does Not Protect Contributors From Being Sued

Wouldn't that be clever, to sign a form like that! "I hereby certify that no one can sue me." It's nonsense to expect a piece of paper to give this guarantee.

Yet this is the argument some people use against CLAs: that a CLA doesn't protect them from being sued if they contribute code that conflicts with someone else's patent or copyright. People who use this objection have gotten it backwards. A CLA doesn't protect you as a contributor from being sued -- it is your agreement that you won't sue other people who use your code.

This is beneficial to you as a contributor because it goes both ways. There are other contributors writing code for the project. If these other contributors have also signed the CLA, they have agreed to grant their work to the project (subject to the terms of the license). They've stated that they won't come back later and demand other terms for using their contribution. Any protection you get is not because you have signed the CLA, it's because all the other contributors have signed the CLA.

The purpose of the CLA is not to protect you if you write code that conflicts with someone else's IP. That's your responsibility. Keep in mind that this is no different if you contribute to a project that has no CLA process.

CLA is Not the Commercial Vendors' Plot to Control PHP

This is incredibly cynical, and it doesn't even make any business sense. The commercial DBMS vendors have demonstrated their commitment to OSS by contributing to many projects. Their interest is in making it attractive for developers using PHP to adopt their latest DBMS technology, by ensuring that their cutting-edge features are supported by PDO.

But to do this, they need some assurance that by participating in the PDO project, they won't become exposed to other contributions that contain "tainted" IP. It's the job of the legal services professionals in each of these companies to protect them from such risk. I'm sure it's fresh in their minds in the wake of the SCO-Linux controversies.

Tuesday, January 15, 2008

MacBook Air - Almost as Thin as a Sinclair ZX80

I just followed the photos and text coverage of today's keynote at MacWorld. Steve Jobs unveiled the MacBook Air, described as the world's thinnest notebook computer. I suddenly recalled being at a computer graphics user group at UCSC, circa 1982.

One of the hobbyists in that group had a Sinclair ZX80, a kit computer from the UK that was notable as the first personal computer available for under £100. What makes me think of that early PC in relation to Apple's new notebook is that the ZX80 was even smaller and thinner than a MacBook Air. We usually think of early PCs as chunky monstrosities like the Osborne 1 or the Commodore PET that nonetheless cost over $1500.

It's interesting to recall that more than 25 years ago, the $100 portable computer was a reality, though with far less computing power than a modern alarm clock.

Saturday, January 05, 2008

"Pure" mysqlnd interface feedback

After I posted my idea about a pure PHP mysqlnd driver, I received an email from Ulf Wendel of the MySQL AB team, who works on the mysqlnd driver. He told me I could post his comments since he doesn't have a Blogger account. I've included his comments below as a blockquote, followed by my own response.

Ulf Wendel writes:
What is "native"?

PHP and the Borg [1] seem to be relatives. Whenever a real PHP hacker finds a useful C library in the universe, he speaks the famous words "resistance is futile". A little later, the C library has become a PHP extension and is available to PHP. This is one of the secrets of the early success of PHP: PHP is extensible. Extensions allow all the PHP drones to use the collective power of C libraries. As of today the PHP Function Reference shows 189 extensions [2]. Guess how many of them are written in C and how many of them are based on a C library...

To make it short: with PHP you can choose between C and PHP. Same with Lua: a scripting language (Lua) and C (the language Lua is implemented in) can be mixed whenever appropriate. That's the nature and the secret of both PHP and Lua. Whenever your coding bees hit a limitation of the easy-to-use scripting language, you drop down to the C level. Implementing the MySQL Client/Server Protocol [3] is such an example: in PHP it would be slow.

Compare that to Java. Is it common to extend the Java programming language? No, not really; very few people start hacking a Java virtual machine. And once you have hacked one virtual machine, what about portability to the other virtual machines out there? Alternatives? Well, if you enjoy cross-compiling, maybe... That's why you would never want to write a JDBC driver in any language but Java. And Java is more of a compiled language than PHP, so it is fast enough.

Native for PHP can mean both C and PHP. If it's C, as with mysqlnd, you have to ask what external dependencies exist. The MySQL native driver for PHP (mysqlnd) gets all the infrastructure it needs from PHP itself. The driver is part of PHP, therefore mysqlnd runs on all platforms that run PHP.

Why use C?

You list some disadvantages of choosing C:

a) Platform independence not given

Mysqlnd runs on all platforms that run PHP. Mysqlnd is part of PHP; in particular, it uses:

- PHP memory management (limits really work now!)
- PHP data structures (less copy operations and memory savings)
- PHP Streams

b) Communication protocol inspection not possible

The MySQL native driver for PHP does use PHP Streams. PHP Streams feature hooks, aka Stream Filters [4]. With a little hacking you could expose the internal stream to userland (PHP scripts). However, we favour MySQL Proxy [5] and therefore have not implemented it.

To sum up: technically it's possible, but we have not enabled it. Tell us why you need it inside PHP and you have a fair chance of seeing it implemented.

Last but not least: have you ever worked on a raw binary network stream in PHP? Do you really want to know the details of the protocol, which Andrey learned to love while he was implementing it? Or do you want to use something that is already there and might be the future standard: MySQL Proxy [5]?

c) Deployment problems with no access to php.ini

First, mysqlnd lays the foundation for enabling MySQL support by default: no license issues, no version issues, no external library dependencies. It's up to php.net and its community to decide whether mysqlnd should be added to the default PHP configuration.

Second, if you choose a hosting service that does not configure and compile PHP the way you need it, you are doing something wrong, honestly.

Anyway: all the above is minor stuff. The main reason is maximum integration into PHP for the best performance and easy deployment with no license issues.

[1] http://en.wikipedia.org/wiki/Borg_%28Star_Trek%29
[2] http://www.php.net/manual/en/funcref.php
[3] http://forge.mysql.com/wiki/MySQL_Internals_ClientServer_Protocol
[4] http://www.php.net/manual/en/ref.stream.php
[5] http://forge.mysql.com/wiki/MySQL_Proxy

Now my response to Ulf's comments:

First, thank you very much Ulf for your reply to my blog posting. I appreciate getting information "from the source" and it's important to get more information about libmysqlnd out to the community.

I want to reiterate that I think libmysqlnd is the right solution for the PHP community, given the requirements of providing a high-performance, high-quality extension with a PHP-compatible license. I look forward to libmysqlnd being part of the standard PHP distribution if the PHP community approves it.

I'm not lobbying to change libmysqlnd! I'm just supposing that a MySQL connector written in PHP code might also be interesting, even if it were not the preferred connector for MySQL server. It would be useful in a few circumstances, and it could also serve as a debugging tool.
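To make the debugging-tool idea a bit more concrete, here is a minimal sketch of the first job any connector written in a scripting language would have: splitting the raw byte stream into protocol packets. It's written in Python rather than PHP, and it assumes only the basic packet framing of the MySQL Client/Server Protocol, where each packet begins with a 3-byte little-endian payload length followed by a 1-byte sequence id. The names `Packet`, `iter_packets`, and `make_packet` are hypothetical, and the sketch deliberately ignores payloads split across multiple packets, compression, and SSL.

```python
import struct
from typing import Iterator, NamedTuple


class Packet(NamedTuple):
    seq: int        # 1-byte sequence id from the packet header
    payload: bytes  # packet body, without the 4-byte header


HEADER_LEN = 4  # 3-byte little-endian length + 1-byte sequence id


def iter_packets(buf: bytes) -> Iterator[Packet]:
    """Split a captured byte stream into MySQL protocol packets (sketch).

    Stops at the first incomplete packet. Does not reassemble payloads
    that are split across several packets.
    """
    pos = 0
    while pos + HEADER_LEN <= len(buf):
        # Pad the 3-byte length to 4 bytes so struct can read it as a uint32.
        (length,) = struct.unpack("<I", buf[pos:pos + 3] + b"\x00")
        seq = buf[pos + 3]
        start = pos + HEADER_LEN
        end = start + length
        if end > len(buf):
            break  # incomplete packet: more bytes would be needed
        yield Packet(seq, buf[start:end])
        pos = end


def make_packet(seq: int, payload: bytes) -> bytes:
    """Hypothetical helper: frame a payload the same way, for testing."""
    return len(payload).to_bytes(3, "little") + bytes([seq]) + payload


if __name__ == "__main__":
    # Client-to-server stream: a COM_QUERY (0x03) then a COM_QUIT (0x01).
    stream = make_packet(0, b"\x03SELECT 1") + make_packet(0, b"\x01")
    for pkt in iter_packets(stream):
        print(pkt.seq, pkt.payload)
```

Everything past this framing layer (handshake, authentication, result-set decoding) is where the real work and the real performance cost would be, which is exactly the part libmysqlnd does in C.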

The performance advantage of C over PHP is important. It stands to reason that an implementation of the MySQL protocol in a scripting language would be quite a bit slower.

However, it would be interesting to try it and measure the actual difference in performance, if for no other reason than to understand exactly how much performance advantage is achieved by using C. I understand from a user's comment that libraries exist in other scripting languages that implement the MySQL protocol: Net::MySQL for Perl, Ruby/MySQL, and an unreleased Python library.