Monday, January 21, 2013

Webinar on PHP and MySQL Replication


Using MySQL replication gives you an opportunity to scale out read queries. However, MySQL replication is asynchronous; the slave may fall behind.

This Wednesday, January 23, 2013, I'll be presenting a free webinar about using MySQL replication on busy PHP web sites. Register here: http://www.percona.com/webinars/readwrite-splitting-mysql-and-php


Applications have variable tolerance for data being out of sync on slaves, so we need methods for the application to query slaves only when their data are within tolerance. I describe the levels of tolerance, and give examples and methods for choosing the right tolerance level in your application. 

This talk shows the correct ways to check when the slave is safe to query, and how to architect your PHP application to adapt dynamically when the slave is out of sync.

I'll also demonstrate an extension to the popular PHP Doctrine database access library, to help application developers using MySQL to make use of read slaves as effectively as possible.

Please join me in this free webinar this Wednesday!

Tuesday, November 20, 2012

C Pointers Explained, Really

While I was in college, a friend of mine complained that he was confused while programming in C, struggling to learn the syntax for pointers.

He gave the example of something like: *x=**p++ being ugly and unreadable, with too many operations layered on each other, making it hard to tell what was happening.  He said he had done a bit of programming with assembly language, but he wasn't accustomed to the nuances of C.

I wrote the following explanation on our student message board, and I got a lot of good feedback.  Some people said that they had been programming in C for years, but not until they read my post did they finally understand pointers.  So here it is, unearthed from my backups and slightly edited.  I hope it helps someone again...


Message 1956 (8 left): Thu Jan 25 1990  2:44am
From: Bill! (easterb@ucscb)
Subject: Okay


Well, if you know assembly, you have a head start
on many of the cis freshpersons here.  You at least know
about memory maps:  RAM is a long long array of bytes.
It helped me to learn about pointers if I kept this in mind.
For some reason, books and instructors talking about
pointers want to overlook this.


When I have some code:

main()
{
int n;
int *p;


There is a place in my memory that looks like this:

           :
Address:   :
        |-----|
  0x5100|     |   n is an integer, one machine word big
        |-----|
  0x5104|     |   p is a pointer, also one word big
        |-----|
  0x5108|     |   other unused memory
        |-----|
           :
           :

Let's give these variables some values.
I set n to be the number 151.

        n = 151;

I set the pointer p to point to the integer n.

        p = &n;

That says, "the value of the variable p is assigned the
address of the variable n".

             :
Address:     :     Value at that address:
          |----|
  0x5100  | 151|  n
          |----|
  0x5104  |5100|  p
          |----|
  0x5108  |   ?|
          |----|
             :
             :

Now I want to print out the value of n, in two ways.

        printf("n is %d.\n", n);
        printf("n is %d.\n", *p);

The * operator says, "give me the object at the following address."
The object's type is the type that the pointer was declared as.
So, since we declared "int *p", the object pointed at will be
_assumed_ by C to be an int.  In this case, we were careful to
make this coincide with what we were pointing at.

Now I want to print out the memory address of n.

        printf("n is located at $%x.\n", &n);
        printf("n is located at $%x.\n", p);

The & operator says, "tell me the address where the following object
starts."  In this case, it is hex 5100 (I put a '$' before it, to
conform to the Assembly notation I am used to).
Notice the _value_ of p is an address.

Hm.  Does p have an address?  Sure.  It is a variable, and all
variables have their own address.  The address of p is hex 5104.

        printf("p is located at $%x.\n", &p);

Here we are taking the address of a pointer variable, using the & operator.

main()
{
char name[] = "Bill";
char *p;
int *q;

Now we have an array to play with.  Here's how memory looks now:

        |---|
 0x5100 |'B'|  "name" is an address constant that has value hex 5100
        |---|
 0x5101 |'i'|  char: 1 byte
        |---|
 0x5102 |'l'|  char: 1 byte
        |---|
 0x5103 |'l'|  char: 1 byte
        |---|
 0x5104 |\0 |  char: 1 byte
        |---|
 0x5105 |   |  p is a pointer: 1 word
        |---|
 0x5109 |   |  q is a pointer: 1 word
        |---|

        p = name;

We set p to the value of name.  Now p has value hex 5100 too.
We can use the * dereferencing operator on p, and get the
character 'B' as a result.

Now what happens if I do this:

        ++p;

The pointer p is incremented.  What value does it have now?
Hex 5101.  Pretty simple.

Now let's try something irresponsible:

        q = name;

But q is a pointer to int!  If we dereference q, it will take
the word (typically 4 bytes) beginning at address "name" (which
is hex 5100) and try to convert it to an int.  'B', 'i', 'l', 'l'
converted to an int will be some large number, dependent on the
byte ordering of your machine's architecture.  On ucscb,
it becomes 1114205292.  (to see how, line up the binary representation
of the ascii values for those 4 characters, and then run the 32 bits
together, and convert that resultant binary number as an integer.)

What we have just seen here is a key issue of pointers that I
mentioned earlier:  C assumes that what they are pointing at
is an object of the type that the pointer was designed to point at.
It is up to the programmer to make sure this happens correctly.

        ++q;

The int pointer is incremented.  What value does it have now?
Hex 5104.  Huh?!?  The answer is simple if you accept the above
paragraph.  It gets incremented by the size of the object it
_thinks_ it is pointing at.  It's an int pointer, so incrementing
it makes it advance a number of bytes equal to the size of an int.

Now print the dereferenced value of q (i.e. the value of the object
q is pointing to).  Well, it's pointing at a null byte, and then
the first 3 bytes of the char *p.  Now we're all messed up.
Nice going.  Try to convert _that_ to an integer representation.
Well actually, C will do it happily.  But it'll be another weird number.


main()
{
int n;

        n = 151;
        f(n);
}

f(x)
int x;
{
        printf("%d.\n", x);
}

Here is a simple program that passes an int "by value".
That is, it copies the value of n into the new variable x!

        |---|
 0x5100 |151|  n is an integer
        |---|
 0x5104 |151|  x is another integer
        |---|

When we mention x, we are using the value at location 5104,
and we can change it, read it, whatever, and it won't affect n,
the int at location 5100.

But what if we want to have f() modify the value and then
have that new value be available in main()?  C does this by
passing the variable "by reference".

main()
{
int n;

        n = 151;
        f(&n);
}

f(x)
int *x;
{
        printf("%d.\n", *x);
        *x = 451;
}

Pass the _address_ of n, and declare x as a _pointer_ to int.
Actually, this is still passing by value, but the value being
passed is the address, not the number.

        |----|
 0x5100 | 151|  n is an integer
        |----|
 0x5104 |5100|  x is a pointer to int
        |----|

Now in f(), when we make use of *x, we are referring to the
value at location 5100.  This is the location of n.
After the assignment "*x = 451;", this is what we have:

        |----|
 0x5100 | 451|  n is an integer
        |----|
 0x5104 |5100|  x is a pointer to int
        |----|

x still points to location 5100, but we have changed the value
of the object at that location.




Well, those are the basics.
You mentioned things like "*x=**p++" being ugly and unreadable.
Well, yeah, but here is a diagram that may help:

        |----|
 0x5100 |   0|  here is a word in memory with initial value 0. no variable name
        |----|
 0x5104 |  12|  here is a value, a word in memory.  no variable name.
        |----|
 0x5108 |5104|  Here is an int pointer, pointing at the previous word.
        |----|
 0x510c |5108|  here is p, a pointer to int pointer.
        |----|
 0x5110 |5100|  here is x, a pointer.  guess where it's pointing.
        |----|

First let's see what p and x were declared as:
int *x;    /* pointer to int */
int **p;   /* pointer to pointer.  The subordinate pointer is a pointer to int.*/

You should know now what "*x" means.  It means, "the value of location 5100."
And you know what "*p" means, "the value of location 5108".
Now that value is another address!  Okay, let's dereference that
address: "**p" and we find (by the declaration) an int.

Now "*x = **p" looks like, "this int at 5100 gets the value of
that int at 5104."

And what does "**p++" mean?  Well, ++ binds tighter than *, so this
is equivalent to:  *( *( p++ ) )
Or, "pointer to pointer to int, and by the way, after we're done,
p has been incremented.  But we looked where it was pointing
before it got incremented, so we don't care.  Let the next statement
worry about it."



This content is copyright 2012 by Bill Karwin.  I'll share it under the terms of the Creative Commons License, Attribution-NonCommercial-ShareAlike 3.0 Unported.

Thursday, April 15, 2010

Don't Put the Cart Before the Horse

April 2nd I made this undiplomatic statement (funny how Twitter practically encourages being provocative):

#ZF 2.0 is a great example of second-system syndrome.

Matthew Weier O'Phinney and I have a good working relationship. I think his work on the Zend Framework project has been amazing, both from a technology perspective and a marketing perspective.

[Photo: Matthew and Bill]

So when Matthew asked me to clarify my Tweet, I was happy to reply, in the spirit of constructive criticism. These thoughts apply to many projects--not just ZF--so I thought they would be of general interest. Here's the content of my reply:

When I've reviewed project proposals or business plans, one thing I often advise people is that you can't describe the value of a project in terms of how you implemented it. Users don't want to hear about how you used XML, or dependency injection, or unit tests, or agile methodology, or whatever. They want to hear what they can do with this product.

After reading the roadmap for ZF 2.0, I observed that a great majority of the planned changes are refactoring and internal architectural changes. These are worthwhile things to do, but the roadmap says very little about the feature set, or the value to users.

What I'm saying is that implementation does not drive requirements. That's putting the cart before the horse.

I admit that for a developer framework, this line is more blurry than in other products. Your users do care about the architecture more than they would for a traditional application. But that still doesn't account for the emphasis on implementation changes in the roadmap, and the lack of specific feature objectives.

For instance, some goals for the controller are described in a list of four bullet items: lightweight, flexible, easy to extend, and easy to create and use custom implementations (which sounds close to easy to extend). Then it jumps right into implementation plans.

So how flexible does it need to be, and in what usage scenarios? What does lightweight mean? How will you know when it's lightweight? Are there benchmark goals you're hoping to meet?

Another example is namespacing. Yes, using namespaces allows you to use shorter class names. Is that the bottleneck for users of ZF 1.x? Do you need to create a namespace for every single level of the ZF tree to solve this? Would that be the best solution to the difficulties of using ZF 1.x?

The point is that the way to decide on a given implementation is to evaluate it against a set of requirements. You haven't defined the requirements, or else you've defined the requirements in terms of a desired implementation.

My view is that requirements and implementation are decoupled; a specific implementation should never be treated as one of the requirements, only a means of satisfying the requirements.

Regards,
Bill Karwin

Wednesday, April 14, 2010

SQL Injection Slides Posted

I gave a presentation today at the MySQL Conference & Expo 2010, titled SQL Injection Myths and Fallacies. Thanks to everyone who came to my talk! I appreciate your interest in learning to develop more secure applications. SQL Injection is a serious threat to web applications, and it's only going to get worse. It's incumbent on you as software developers to learn how to write secure code!

My slides are now online in two places: on the MySQL Conference website, and at SlideShare.net/billkarwin.

I also handed out cards for a 20% discount on my upcoming book, SQL Antipatterns. One chapter in my book is devoted to SQL Injection risks and methods for defending against them. You can pre-order the hardcopy book and receive it as soon as it ships. You can also get the downloadable beta e-book right away, and receive an update when the editing is done.

I left a stack of the leftover discount cards on the collateral table in the hallway. If you didn't get one, you'll have another chance when I talk at the PHP TEK-X conference in Chicago in May!

Thursday, April 01, 2010

Announcing Awk on Rails

Awk on Rails is a new kind of web application development framework, with a distinction that no other framework has: Awk on Rails is fully POSIX compliant.

Awk on Rails brings the best practices of modern web application development to the ALAS stack (Apache, Linux, Awk, Shell). This stack is entirely new to the field of web development, yet already brings decades of maturity.

  • Installation is a breeze -- in fact, it's unnecessary, because Awk on Rails uses commands and tools already provided by your operating system.

  • Develop web applications that leverage the power of high-speed interprocess I/O pipelining, utilizing POSIX regular expressions to optimize request routing through common gateway interfaces.

  • Generate your Awk on Rails application code--using awk! A sophisticated script-based front-end called wreak takes care of it for you.

  • You get unlimited flexibility to customize the base application scripts, using your choice of development environment: vi or emacs.

  • SQL? We got NoSQL! We don't need no stinking SQL! Tired of being confused by relational databases? Manage your data in an "X-treme" non-relational data store exclusive to Awk on Rails. It's called Hammock, and it's based on the POSIX key-value system NDBM. To initialize your data store, it's as simple as running the command: wreak hammock.

  • Design and render application views using the simple and popular M4 language. We all know we need to keep application design separate and free from logic. Awk on Rails can make sure this happens!

  • Embedded source code documentation is easy using a custom macro package. Create ready-to-typeset manuals with one simple command: nroff -Mawkdoc.

  • Awk on Rails comes with example applications to get you started, including a blogging & content management platform AwkWord, and a syndication provider AWRY.

  • Does it scale? Of course! Thanks to the power of Moore's Law, you'll stay ahead of the curve over the long haul.

  • Development, deployment, and distribution are all powered by a convenient set of three distinct software licenses. No other framework supports this many licenses! Contributing back to the Awk on Rails project? You get to sign and submit a fourth license -- at no charge!

You will soon be able to download source for Awk on Rails and join its development community, at the social source repository SCCSHub.net. As soon as we figure out whether the licenses allow us to distribute our own source code, you may be able to use it in your projects too!

Look for future Awk on Rails developments and announcements in 2010.* Also look for an innovative cloud computing extension to Awk on Rails, called VaporWare.

Awk on Rails: Not Really Rapid, Not Exactly Agile, More Like Dodgy.

* Awk on Rails comes with no guarantee of release dates or timeliness of announcements. Check your calendars.

Wednesday, March 24, 2010

Rendering Trees with Closure Tables

I got a comment from a reader about the Naive Trees section of my presentation SQL Antipatterns Strike Back. I've given this presentation at the MySQL Conference & Expo in the past.

I'd also like to mention that I've developed these ideas into a new book, SQL Antipatterns: Avoiding the Pitfalls of Database Programming. The book is now available in Beta and for pre-order from Pragmatic Bookshelf.

Here's the reader's question:
I would like to ask if there's a way I can dump all the hierarchies in a single query using a closure table? For example I have a following tree:

rootree
  - 1stbranch
  - midbranch
    - corebranch
      - leafnodes
  - lastbranch
    - lastleaf

and I want to display it like:

rootree -> 1stbranch
rootree -> midbranch
rootree -> midbranch -> corebranch
rootree -> midbranch -> corebranch -> leafnodes
rootree -> lastbranch
rootree -> lastbranch -> lastleaf

The Closure Table is a design for representing trees in a relational database by storing all the paths between tree nodes. Using the reader's example, one could define and populate two tables like this:

drop table if exists closure;
drop table if exists nodes;

create table nodes (
  node int auto_increment primary key,
  label varchar(20) not null
);

insert into nodes (node, label) values
  (1, 'rootree'),
  (2, '1stbranch'),
  (3, 'midbranch'),
  (4, 'corebranch'),
  (5, 'leafnodes'),
  (6, 'lastbranch'),
  (7, 'lastleaf');

create table closure (
  ancestor int not null,
  descendant int not null,
  primary key (ancestor, descendant),
  foreign key (ancestor) references nodes(node),
  foreign key (descendant) references nodes(node)
);

insert into closure (ancestor, descendant) values
  (1,1), (1,2), (1,3), (1,4), (1,5), (1,6), (1,7),
  (2,2),
  (3,3), (3,4), (3,5),
  (4,4), (4,5),
  (5,5),
  (6,6), (6,7),
  (7,7);

What we need to do is find all the descendants of the root node 1, then for each of these descendant nodes, list its ancestors in order, separated by an arrow. We can use MySQL's useful GROUP_CONCAT() function to build this list for us.

select group_concat(n.label order by n.node separator ' -> ') as path
from closure d
join closure a on (a.descendant = d.descendant)
join nodes n on (n.node = a.ancestor)
where d.ancestor = 1 and d.descendant != d.ancestor
group by d.descendant;

Here's the output in the MySQL client. It looks like what the reader asked for:

+-------------------------------------------------+
| path                                            |
+-------------------------------------------------+
| rootree -> 1stbranch                            |
| rootree -> midbranch                            |
| rootree -> midbranch -> corebranch              |
| rootree -> midbranch -> corebranch -> leafnodes |
| rootree -> lastbranch                           |
| rootree -> lastbranch -> lastleaf               |
+-------------------------------------------------+

I do assume for the purposes of ordering that all of a node's ancestors have a lower node number. You could alternatively add a pathlength column to the closure table and sort by that.

The Closure Table design is nice compared to the Nested Sets (or Preorder Traversal) design, because it supports the use of referential integrity. The EXPLAIN report shows that, by using indexes, the MySQL query optimizer does a pretty good job on it (I've omitted a few columns for brevity):

+-------+--------+-------------------+--------------------------+
| table | type   | ref               | Extra                    |
+-------+--------+-------------------+--------------------------+
| d     | range  | NULL              | Using where; Using index |
| a     | ref    | test.d.descendant |                          |
| n     | eq_ref | test.a.ancestor   |                          |
+-------+--------+-------------------+--------------------------+

Thursday, February 18, 2010

Speaking on SQL Injection at MySQL Conference

O'Reilly MySQL Conference & Expo 2010

I'm speaking this year at the MySQL Conference & Expo 2010 in Santa Clara. Be sure to get your early registration discount by Feb 22! If you miss that deadline, get 25% off with this discount code: mys10fsp

I'm presenting a talk on SQL Injection Myths and Fallacies. This may seem like a topic that's been done to death, but it's important for all developers to understand it. This reminds me of a story:

My mother volunteers with the League of Women Voters. One of their activities is helping college students register to vote. So every year they set up a table on campus and help young people fill out the forms.
One day one of the women expressed frustration: "We've been doing this for ten years! When are these students going to learn how to register to vote for themselves?!"
The rest of the group looked at her blankly. Finally someone said calmly, "You realize that every year a new class of students becomes eligible to vote, right?"
The woman who complained felt suitably embarrassed.

I'm going to cover the basics about SQL Injection, but I'll also show how much of the advice about SQL Injection (even advice from noted security experts) misses the whole picture. I'll also give some new techniques for remedies that I seldom see in books or blogs. Come on by!