If you've wrestled with metaclasses, gotten stuck on asynchronous programming in Twisted, or been worn out by object-oriented programming built on multiple dispatch, take heart: PEAK combines elements of all of these into a component programming framework. Like Twisted's, PEAK's documentation -- though voluminous -- is difficult to penetrate. Nonetheless, there is something very noteworthy about this project led by Python luminary Phillip J. Eby; and, I think, an opportunity for very productive, exceptionally high-level application development.
The PEAK package consists of many subpackages for different purposes. Some of the important subpackages are peak.binding, peak.config, peak.naming, and peak.storage. Most of those names are self-explanatory: peak.binding is used for flexible connection of components; peak.config lets you store "lazily immutable" data, which is useful for declarative application programming; peak.naming lets you create globally unique identifiers for (networked) resources; and peak.storage, as the name suggests, lets you manage databases and persistent content.
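If you want to poke at these subpackages yourself, the conventional entry point is the peak.api module. This is a sketch of my understanding of the PEAK convention; details may vary between versions:

# Hedged sketch: PEAK conventionally exposes its subpackages
# (lazily) through peak.api; exact behavior may vary by version
from peak.api import binding, config, naming, storage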
For the purposes of this article, however, we will focus on protocols and interfaces -- in particular, on the PyProtocols package, which is available separately and provides infrastructure for the other PEAK subpackages. A version of PyProtocols is included in PEAK itself, but for now I'm interested in the standalone protocols package. I'll return to other parts of PEAK in later installments.
What's the deal?
In the abstract, a protocol is just a set of behaviors that objects agree to follow. Strongly typed programming languages -- including Python -- have a collection of basic types, each of which has a guaranteed set of behaviors: an integer knows how to multiply itself by another number; a list knows how to iterate over its contents; a dictionary knows how to look up a value for a key; a file knows how to read and write bytes; and so on. The set of behaviors you can expect from a built-in type constitutes the protocol it implements. An object that formalizes such a protocol is called an interface.
For the standard types, it is not too difficult to enumerate all the behaviors they implement (although these vary slightly from one version of Python to another and, of course, from one programming language to another). At the edges, however -- for objects belonging to custom classes -- it is harder to say what ultimately constitutes "dictionary-like" or "file-like" behavior. In most cases, a custom object that implements only a subset -- even a rather small subset -- of, say, the methods of the built-in dict type is sufficiently "dictionary-like" for the purposes at hand. Still, it would be nice to be able to state explicitly what an object must be able to do in order to be used by a function, module, class, or framework. That is (in part) what the PyProtocols package does.
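To make this concrete, here is a small illustration of my own (the class and its names are hypothetical, not from PyProtocols): an object can be "dictionary-like enough" while implementing only a sliver of dict's methods.

class SparseGrid:
    """Implements only a sliver of dict's behaviors: item access and .items()"""
    def __init__(self):
        self._cells = {}
    def __getitem__(self, key):
        return self._cells.get(key, 0)   # unset cells default to 0
    def __setitem__(self, key, value):
        self._cells[key] = value
    def items(self):
        return self._cells.items()

# "Dictionary-like" enough for any code that only indexes cells and
# iterates .items(), even though it lacks .keys(), .update(), len(), etc.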
In programming languages with static type declarations, using data in a new context usually requires casting or converting it from one type to another. In other languages, conversions happen implicitly, depending on the context, and these are called coercions. Python has both explicit conversions and implicit coercions, with the former being used more often ("explicit is better than implicit"). You can add a floating-point number to an integer and get a more general floating-point number, but if you want to turn the string "3.14" into a number, you need the explicit constructor float("3.14").
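A quick illustration of the difference (plain Python, nothing PEAK-specific):

# Implicit coercion: mixed arithmetic quietly promotes the int
total = 1 + 2.5            # -> 3.5, a float

# Explicit conversion: strings never coerce to numbers on their own;
# "3.14" + 1 raises a TypeError, so we must convert explicitly
value = float("3.14") + 1  # -> 4.14, a float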
PyProtocols has a feature called "adaptation", which is similar to the somewhat unorthodox computer-science concept of "partial typing". Adaptation may also be thought of as a kind of souped-up coercion. If an interface defines a set of required capabilities (that is, object methods), then an object that does not already do "everything that is required" needs adaptation -- implemented via the adapt() function -- to provide the required capabilities. Obviously, if you have an explicit conversion function that turns an object of type X into an object of type Y (where Y implements some interface IY), that function can serve to adapt X to the protocol IY. However, adaptation in PyProtocols can do much more than that. For example, even if you've never explicitly written a conversion from type X to type Y, adapt() can often deduce a way to make X provide the capabilities required by IY (that is, find intermediate conversions from X to interface IZ, from IZ to IW, and then from IW to IY).
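Schematically, with hypothetical types and protocols of my own naming (the concrete mechanics follow in the next section):

# Hypothetical: no direct X -> IY adapter is ever declared ...
declareAdapter(x_to_z, provides=[IZ], forTypes=[X])        # X  -> IZ
declareAdapter(z_to_w, provides=[IW], forProtocols=[IZ])   # IZ -> IW
declareAdapter(w_to_y, provides=[IY], forProtocols=[IW])   # IW -> IY

# ... yet this still succeeds: adapt() composes the declared pieces
# into an X -> IZ -> IW -> IY path on its own
y_ish = adapt(X(), IY)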
Declare interfaces and adapters
There are a number of different ways to create interfaces and adapters in PyProtocols, and the PyProtocols documentation describes these techniques in great detail -- many of which will not be covered in this article. We'll get into some of the details next, but I thought it would be useful to give a minimalist example of actual PyProtocols code here.
As an example, I decided to create a Lisp-like serialization of Python objects. The representation is not exact Lisp syntax, and I don't much care about the precise pros and cons of the format. The idea was simply to create a function that does a job similar to the repr() function or the pprint module, while being both noticeably different from those serializers and more easily extensible/customizable. A very un-Lisp-like choice was made for illustrative purposes: mappings are treated as a much more basic data structure than sequences (Python's tuples and lists are treated as mappings keyed on consecutive integers). Here is the code:
lispy.py PyProtocols definitions
from protocols import *
from cStringIO import StringIO

# Like unicode, & even support objects that don't explicitly support ILisp
ILisp = protocolForType(unicode, ['__repr__'], implicit=True)

# Class for interface, but no methods specifically required
class ISeq(Interface):
    pass

# Class for interface, extremely simple mapping interface
class IMap(Interface):
    def items():
        "A requirement for a map is to have an .items() method"

# Define function to create a Lisp-like representation of a mapping
def map2Lisp(map_, prot):
    out = StringIO()
    for k, v in map_.items():
        out.write("(%s %s) " % (adapt(k, prot), adapt(v, prot)))
    return "(MAP %s)" % out.getvalue()

# Use this func to convert an IMap-supporting obj to ILisp-supporting obj
declareAdapter(map2Lisp, provides=[ILisp], forProtocols=[IMap])

# Note that a dict implements an IMap interface with no conversion needed
declareAdapter(NO_ADAPTER_NEEDED, provides=[IMap], forTypes=[dict])

# Define and use func to adapt an InstanceType obj to the ILisp interface
from types import InstanceType
def inst2Lisp(o, p):
    return "(CLASS '(%s) %s)" % (o.__class__.__name__, adapt(o.__dict__, p))
declareAdapter(inst2Lisp, provides=[ILisp], forTypes=[InstanceType])

# Define a class to adapt an ISeq-supporting obj to an IMap-supporting obj
class SeqAsMap(object):
    advise(instancesProvide=[IMap],
           asAdapterForProtocols=[ISeq])
    def __init__(self, seq, prot):
        self.seq = seq
        self.prot = prot
    def items(self):
        # Implement the IMap-required .items() method
        return enumerate(self.seq)

# Note that list, tuple implement an ISeq interface w/o conversion needed
declareAdapter(NO_ADAPTER_NEEDED, provides=[ISeq], forTypes=[list, tuple])

# Define a lambda func to adapt str, unicode to ILisp interface
declareAdapter(lambda s, p: "'(%s)" % s,
               provides=[ILisp], forTypes=[str, unicode])

# Define a class to adapt several numeric types to ILisp interface
# Return a string (ILisp-supporting) directly from instance constructor
class NumberAsLisp(object):
    advise(instancesProvide=[ILisp],
           asAdapterForTypes=[long, float, complex, bool])
    def __new__(klass, val, proto):
        return "(%s %s)" % (val.__class__.__name__.upper(), val)
In the code above, I've declared a number of adapters in a number of different ways. In some cases, the code converts one interface to another; in other cases, the type itself adapts directly to another interface. I hope you'll notice a few things about the code: (1) it doesn't create any adapters from lists or tuples to the ILisp interface; (2) it doesn't explicitly declare adapters for the int numeric type; and (3) it doesn't declare adapters directly from dict to ILisp for that matter. Here's how the code will adapt() various Python objects:
test_lispy.py Object Serialization
from lispy import *
from sys import stdout, stderr

toLisp = lambda o: adapt(o, ILisp)

class Foo:
    def __init__(self):
        self.a, self.b, self.c = 'a', 'b', 'c'

tests = ["foo bar",
         {17:2, 33:4, 'biz':'baz'},
         ["bar", ('f','o','o')],
         1.23,
         (1L, 2, 3, 4+4j),
         Foo(),
         True,
        ]
for test in tests:
    stdout.write(toLisp(test)+'\n')
At runtime, we get:
test_lispy.py Serialization results
$ python2.3 test_lispy.py
'(foo bar)
(MAP (17 2) ('(biz) '(baz)) (33 4) )
(MAP (0 '(bar)) (1 (MAP (0 '(f)) (1 '(o)) (2 '(o)) )) )
(FLOAT 1.23)
(MAP (0 (LONG 1)) (1 2) (2 3) (3 (COMPLEX (4+4j))) )
(CLASS '(Foo) (MAP ('(a) '(a)) ('(c) '(c)) ('(b) '(b)) ))
(BOOL True)
Some explanation of the output will help. The first line is fairly simple: we declared an adapter from strings directly to ILisp, so the call adapt("foo bar", ILisp) just returns the result of the lambda function. The next line is only slightly more complicated. There is no direct adapter from dict to ILisp; but dict needs no adapter at all to provide IMap (we declared NO_ADAPTER_NEEDED), and we do have an adapter from IMap to ILisp. Similarly for the lists and tuples that follow: they provide ISeq with no conversion, ISeq can be adapted to IMap, and IMap to ILisp. PyProtocols figures out these adaptation paths by itself, and the whole remarkable process happens behind the scenes. An old-style instance goes through much the same process as a string or an IMap-supporting object: we declared a direct adaptation to ILisp for it.
But wait a minute. What about all the ints used in our dict and tuple objects? The number types long, complex, float, and bool have explicit adapters, but int has none. The trick here is that an int object already has a __repr__() method; because ILisp was declared with implicit=True support, an object's existing __repr__() method quietly counts as support for the ILisp interface. That is also why, as a built-in type, ints are rendered as unadorned numerals, without an uppercase type label (such as LONG).
Adaptation Protocol
Let's look more explicitly at what the adapt() function does. In our example, we used the declaration API to implicitly set up a collection of factories for adaptation. This API has several levels. The "primitives" of the declaration API are the functions declareAdapterForType(), declareAdapterForObject(), and declareAdapterForProtocol(). The example above did not use these, but rather some higher-level APIs such as declareImplementation(), declareAdapter(), adviseObject(), and protocolForType(). In one case, we saw the "magical" advise() declaration inside a class body. The advise() function supports a large number of keyword arguments that configure the purpose and role of the advised class. You can also advise() a module object.
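For a sense of the higher-level calls, here is a sketch; the function names are PyProtocols' own, but ISomeAPI, LegacyThing, and singleton are hypothetical placeholders of mine:

from protocols import advise, adviseObject, declareImplementation

# In a module body: declare that the module itself provides an interface
advise(moduleProvides=[ISomeAPI])

# Declare that instances of an existing class provide IMap,
# without touching the class or writing an adapter
declareImplementation(LegacyThing, instancesProvide=[IMap])

# Declare support for one particular object rather than a whole type
adviseObject(singleton, provides=[ILisp])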
You don't need to use the declaration API, however, to create adaptable objects or interfaces that know how to adapt() themselves. Let's look at the signature of a call to adapt() and then see what happens under the hood. A call to adapt() looks like this:
The call signature of adapt()
adapt(component, protocol [, default [, factory]])
This says that you want component to be adapted to the interface protocol. If default is specified, it can be returned when no wrapper object or modification of the component is possible. If factory is specified as a keyword argument, a conversion factory is used as a last resort to produce the wrapper or modification. But let's back up a bit and look at the full sequence of actions (as simplified code) that adapt() attempts:
A hypothetical implementation of adapt()
if isinstance(component, protocol):
    return component
elif hasattr(component, '__conform__'):
    return component.__conform__(protocol)
elif hasattr(protocol, '__adapt__'):
    return protocol.__adapt__(component)
elif default is not None:
    return default
elif factory is not None:
    return factory(component, protocol)
else:
    raise NotImplementedError
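In practice, the default argument gives you a graceful fallback instead of an error. A small usage sketch, where obj is any hypothetical candidate object:

# If no adaptation path to ILisp exists, hand back None rather than raising
lispval = adapt(obj, ILisp, default=None)
if lispval is None:
    print "cannot render", obj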
Calls to adapt() should maintain certain properties (though this is advice to programmers rather than something the library enforces). Calls to adapt() should be idempotent: for an object x and a protocol P, we want adapt(x,P) == adapt(adapt(x,P),P). At a high level, this serves much the same purpose as having the __iter__() method of an iterator return self. Basically, you don't want re-adapting something to a protocol it already supports to produce fluctuating results.
It is also worth noting that adaptation can be lossy. To make an object conform to an interface, it may be inconvenient or impossible to keep all the information needed to re-create the original object. That is, in general, for an object x and protocols P1 and P2: adapt(x,P1) != adapt(adapt(adapt(x,P1),P2),P1).
Before we conclude, let's look at another test script that takes advantage of the low-level behavior of adapt():
test_lispy2.py Object serialization
from lispy import *

class Bar(object):
    pass

class Baz(Bar):
    def __repr__(self):
        return "Represent a "+self.__class__.__name__+" object!"

class Bat(Baz):
    def __conform__(self, prot):
        return "Adapt "+self.__class__.__name__+" to "+repr(prot)+"!"

print adapt(Bar(), ILisp)
print adapt(Baz(), ILisp)
print adapt(Bat(), ILisp)
print adapt(adapt(Bat(), ILisp), ILisp)

$ python2.3 test_lispy2.py
<__main__.Bar object at 0x65250>
Represent a Baz object!
Adapt Bat to WeakSubset(<type 'unicode'>,('__repr__',))!
'(Adapt Bat to WeakSubset(<type 'unicode'>,('__repr__',))!)
The results show that our design fails the goal of idempotence: adapting an already-adapted Bat gains an extra layer of quoting. It might be a good exercise to improve the design in this respect. Note, however, that an ILisp description necessarily loses information from the original object (which is fine).
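One hypothetical way to repair this (my own sketch, not part of lispy.py): have the adapters return a string subclass whose __conform__() recognizes ILisp, so that re-adaptation becomes a no-op:

class LispExpr(str):
    """A str that already knows it provides ILisp (hypothetical)"""
    def __conform__(self, protocol):
        if protocol is ILisp:
            return self   # re-adapting returns self: idempotent

# If every adapter (map2Lisp, inst2Lisp, the str/unicode lambda, ...)
# wrapped its result in LispExpr, then for any x:
#     adapt(x, ILisp) == adapt(adapt(x, ILisp), ILisp)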
Concluding remarks
It strikes me that PyProtocols has a few things in common with other "exotic" topics covered in this column. Most notably, the declaration API is declarative (as opposed to imperative). Instead of spelling out the steps and switches needed to perform an action, declarative programming states what should hold, and leaves it to the library or compiler to work out how to accomplish it. The names "declare*()" and "advise*()" come from this idea.
I've also found that programming with PyProtocols feels somewhat like programming with multiple dispatch, specifically with the module I presented in an earlier installment. Compared to PyProtocols' determination of adaptation paths, my own module performs a relatively simple resolution to decide which ancestor classes are relevant for dispatch. Both libraries, however, encourage a similar kind of modularity in programming: many small functions or classes that perform "pluggable" tasks, without being locked into a rigid class hierarchy. In my opinion, this style has real advantages.