Iterator design pattern in php (revisited)

In this post I'll try to sketch a number of related ideas about working with sequential collections, all of them expressed in php. Enumerations, enumerables, interators, iteratees and their cousins have been tortured to death, but as Eric Meijer and his team working on LINQ and Rx frameworks for .net show, it pays off doing it systematically. I'll revisit the Iterator and Subject Observer design patterns and will derive them systematically from first principles.

In php iterators come handy for all kinds of work on sequences - usually using

foreach( $sequence in $key,$value) { ... }

The problem with foreach is that it is not composable, but there are a lot of functions and operations on iterators which are, at least in theory. I will try to show how to achieve composability by, first, composing the processing of each individual step and, later, by composing iterators. This can lead to easily reconfigurable applications and a more declarative style of programming.

Streams

Let's start with streams as the general idea for sequential collections. A stream is a possibly infinite sequence, which can be traversed step by step, some steps could be skipped. This idea is borrowed from haskell stream fusion.

A stream is a sequence of steps, which implement the step protocol, i.e. they are in one of the three states

  • Yield - yield a value
  • Skip - skip this step
  • Done - no more processing, end of sequence

It is a sum type, or, if you come from the C tradition, a union type. The states will be distinguished using a tag. For generality, the state of the stream (the rest of the stream itself) will be threaded through a step via a stream variable. Note that the stream is only passed through the steps and not touched by any step processor at all. The code below uses magic methods to add flexibility. The use of the renew method is not efficient, but it is simpler, more readable, and suffices for the purposes of this blog post.

<?php

interface iStep 
{
   function __construct($tag = Done, $val = Null, $stream = Null);
   function tag();
   function value();
   function stream();
}

//a class implementing the Step protocol
class Step implements iStep
{
   private $tag, $val, $stream;
   function __construct($tag = Done, $val = Null, $stream = Null)
   {
      $this->renew( $tag, $val, $stream );
   }

   //all the getters
   function tag() { return $this->tag; }
   function value() { return $this->val; }
   function stream() { return $this->stream; }


   //magic hooks for admin purposes

   //an array of callback methods
   private $callbacks = array();

   //add or replace an existing callback
   function overload( $name, $fun ) {
       $this->callbacks[$name]=$fun;
       return $this;
   }
   
   //the magic __call method
   function __call($name, $args) {
       $args = array_merge(array($this),$args);
       return call_user_func_array($this->callbacks[$name], $args);
   }
   
   //renew is used to overwrite in place the step values
   //instead of creating a new object, preserves the current
   //callbacks
   function renew($tag = Done, $val = Null, $stream = Null) {
        //dispatch on tag
     switch($tag) {
         case Done:  $this->tag = Done; break;
         case Yield: if($val != Null && $stream != Null) 
                {
                    $this->tag   = Yield;
                    $this->val  = $val;
                    $this->stream = $stream;
                } else {
                    throw new Exception("Step Construction Error, missing a required argument to Yield");
                }
                break;
         case Skip:  if($val != Null) 
                {
                    $this->tag   = Yield;
                    $this->stream  = $stream;
                } else {
                    throw new Exception("Step Construction Error, missing a required argument to Skip");
                }
                break;
         default: 
                    throw new Exception("Step Construction Error, invalid type tag");
         }
     return $this;
   }

}

The above is an example definition of the Step protocol and a class implementing it. In truth the iStep interface is redundant, since all implementations should have the same functionality. Of course, the different steps types (yeild, skip, done) can be implemented as classes, but that would be overkill, it is verbose enough as it is. Once created a step cannot be changed, which explains the private label on the variables and the missing setters.

You probably wonder - Why bother with this complex machinery? Let's see an example of pipelining several step processing functions:


//apply a function $f to the value of a step returning a new step,
//with the same stream state
function map($step, $f) {
     //get the next step
     switch( $step->tag() ){
         //unfortunately before php5.3 there are no lambdas, 
         //so we can't postpone the creation of the next step
         //therefore we do it here. 
         //Should watch out for interactions with exceptions
         case Done:  
              return $step->renew(Done);
         case Yield: 
              return $step->renew(Yield, $f($step->value()), $step->stream());
         case Skip:  
              return $step->renew(Skip, $step->stream());
         default:    
              throw new Exception("a value of wrong step type passed to map ".var_export($step, true) );
     }
}

//filter a step - if the value of a step does not satisfy 
//the predicate $cond, skip further processing of this step
function filter($step, $cond) {
     switch( $step->tag() ){
         case Done:  return $step->renew(Done);
         case Yield: 
                if($cond($step->value) ) 
                { 
                     return $step->renew(Yield, $step->value, $step->stream); 
                } else { 
                     return $step->renew(Skip, $step->stream);
                }
         case Skip:  return $step->renew(Skip, $step->stream);
     }
}

$step = new Step();

//id - return any value given
function id($i) { return $i;}
//the always successful predicate 
function t( $v ) { return true; }

$step = $step->overload("map", "map")->overload("filter","filter")
$result=$step->map("id")->filter("t");

You probably notice the method chaining in the above example. This is a nice and quite intuitive way to represent a processing pipeline. One disadvantage of the above step processing code is that the processing pipeline cannot be short circuited. In my opinion these cons are outweighed by far by the benefits of the flexibility offered. If short circuiting is required, it can be signalled by exceptions, but that is too much to go into for this blog post.

To summarise - the above code implements a single step (state) of stream processing. A step pipeline, is a composition of simple step processors. Step processors don't touch the stream state.

What about the stream type? It should encapsulate the iterative stream processing. Its responsibilities include getting a new step from the stream (using the threaded stream state) and resource management - allocating and deallocating resources for the stream (for example file handles, sockets, etc...)

interface iStream {
   function next();
   function current();
   function done();
}

The next() function moves the stream into the next state. It could be a next database row, next file record, etc... The function current() returns the current Step, as defined above. done() is a predicate, returning true at end of stream.

Iterators

Unsurprisingly, the iStream interface is very similar to the Iterator interfaces as found in SPL. They implement, albeit slightly differently, the Iterator design pattern. In order to use a stream iterator, in places where the Iterator interfaces is needed, a small wrapper is required (translating Iterator::next -> iStream::next, Iterator::rewind -> nop...).

The advantage of the stream processing interface over straight SPL Iterators is composability and code reuse. One can prepare a number of small stream processing functions similar to map and filter and pipeline them to get complex processors. Using a converter it can be driven by a foreach and cousins.

In follow ups, I'll expand on the use of iterators, observers, observables, callbacks and aspect oriented programming.

Powered by Drupal, an open source content management system