Factory classmethods in Python

I haven’t seen a great deal of practical documentation about using classmethods as factories in Python (which is arguably the most important use of a classmethod, IMO). This post hopes to fill in that gap.

Simple Example of classmethod Factory

A good example of this is not too hard to find, but here is a simple one of a class method being used as a factory. Let’s say you have the following vector class, and you want to be able to make a new vector using Vector(x,y,z) or Vector.from_cyl(ρ,θ,z) (as usual, I’m exploiting Python 3’s unicode variable names; change these to plain text if you are using Python 2). Here is how we would do that, with a repr added to make the display easier to see:

from math import sin, cos, pi as π

class Vector(object):
    def __init__(self, x, y, z):
        self.x = x
        self.y = y
        self.z = z
    
    @classmethod
    def from_cyl(cls, ρ, θ, z):
        return cls(ρ*cos(θ), ρ*sin(θ), z)
    
    def __repr__(self):
        return "<{0.__class__.__name__}({0.x}, {0.y}, {0.z})>".format(self)
Vector(1,2,3)
Vector.from_cyl(1,π/4,1)

A few take away points:

  • Class methods take the class as the first argument, traditionally cls.
  • Class methods that are factories call cls(), and return the new instance they created.
  • Both __new__ and __init__ ran when cls() was called.
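One consequence of calling cls() instead of hard-coding Vector is that the factory keeps working for subclasses. Here is a minimal sketch of that; LabeledVector is a hypothetical subclass made up for illustration, and the variable names are plain ASCII so it also runs on Python 2:

```python
from math import sin, cos, pi, isclose

class Vector(object):
    def __init__(self, x, y, z):
        self.x = x
        self.y = y
        self.z = z

    @classmethod
    def from_cyl(cls, rho, theta, z):
        # cls is whichever class the method was called on,
        # so subclasses get instances of their own type.
        return cls(rho * cos(theta), rho * sin(theta), z)

class LabeledVector(Vector):
    """Hypothetical subclass, only here to show what cls refers to."""
    label = "labeled"

v = Vector.from_cyl(1, pi / 2, 1)
w = LabeledVector.from_cyl(1, pi / 2, 1)
print(type(v).__name__)  # Vector
print(type(w).__name__)  # LabeledVector
```

Had from_cyl returned Vector(...) directly, w would have been a plain Vector.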

Avoiding __init__

This is fine, but there are two more features that we often need. First, we might not want to call the __init__ function at all. Calling it works in the above example, but we might not always be able to transform our inputs back into the original parameters. If we can’t do that, it is actually possible in Python to create a new instance of a class without calling __init__ using:

cls.__new__(cls, ...)

where the ellipsis stands for the parameters of the __new__ function (often this is simply cls.__new__(cls)).

Take the following example, where we’ve reversed the class, so that cylindrical is now default:

from math import sin, cos, pi as π

class Vector(object):
    def __init__(self, ρ, θ, z):
        self.x = ρ*cos(θ)
        self.y = ρ*sin(θ)
        self.z = z
    
    @classmethod
    def from_xyz(cls, x, y, z):
        ob = cls.__new__(cls)
        ob.x = x
        ob.y = y
        ob.z = z
        return ob
    
    def __repr__(self):
        return "<{0.__class__.__name__}({0.x}, {0.y}, {0.z})>".format(self)
Vector(1,π/4,1)
Vector.from_xyz(1,2,3)

Though it should be obvious now, I’ll point out that each class method using this technique is responsible for doing anything that the __init__ function normally does. In this case, self.x, self.y, and self.z all must be set manually. If we forgot to set self.z, for example, bad things might happen when we call other methods.
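To make the "bad things" concrete, here is a small sketch of a factory that forgets one attribute. Both from_xy_broken and norm2 are hypothetical names invented for this illustration:

```python
class Vector(object):
    def __init__(self, x, y, z):
        self.x = x
        self.y = y
        self.z = z

    @classmethod
    def from_xy_broken(cls, x, y):
        # Hypothetical buggy factory: skips __init__ and forgets to set z.
        ob = cls.__new__(cls)
        ob.x = x
        ob.y = y
        return ob

    def norm2(self):
        # Squared length; needs all three attributes to exist.
        return self.x**2 + self.y**2 + self.z**2

v = Vector.from_xy_broken(1, 2)  # no error here
try:
    v.norm2()
except AttributeError as err:
    print("Failed late, far from the factory:", err)
```

The instance is created without complaint, and the error only shows up later, when some other method touches the missing attribute.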

Auto-selecting factory

__init__ Method

This is something I do a lot that is still difficult with the above trick. Let’s say you want to have an auto-selecting factory function. For example, if you have .from_csv(filename) and .from_excel(filename) functions, you might want to make a .from_any function that bases its choice of loading function on the extension of filename. This is easy to do with normal factory functions unless you decide that you’d like your __init__ function to act like .from_any. Then, you’ll need to write two functions for each factory function (this only uses one from_ method pair for clarity):

class FromFile(object):
    def __init__(self, filename):
        if filename[-3:] == 'csv':
            self._from_csv(filename)
        else:
            self.file = 'Not a valid format'
        
    @classmethod
    def from_csv(cls, filename):
        self = cls.__new__(cls)
        self._from_csv(filename)
        return self
        
    def _from_csv(self, filename):
        self.file = 'I got {0} from csv!'.format(filename)
        
    def __repr__(self):
        return "<{0.__class__.__name__}({0.file})>".format(self)
FromFile('this.file')
FromFile("this.csv")
FromFile.from_csv("this.is")
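With more than one format, the if/else chain in __init__ can be swapped for a lookup table keyed on the extension. This is only a sketch of one way to do it: the _loaders dictionary, the _from_excel helper, and the '.xlsx' extension are assumptions added for illustration, and the "loading" is still just a string:

```python
import os

class FromFile(object):
    def __init__(self, filename):
        # Pick the private loader from the file extension.
        ext = os.path.splitext(filename)[1].lower()
        loader = self._loaders.get(ext)
        if loader is None:
            self.file = 'Not a valid format'
        else:
            loader(self, filename)

    @classmethod
    def from_csv(cls, filename):
        self = cls.__new__(cls)  # skip __init__, force the csv loader
        self._from_csv(filename)
        return self

    @classmethod
    def from_excel(cls, filename):
        self = cls.__new__(cls)  # skip __init__, force the excel loader
        self._from_excel(filename)
        return self

    def _from_csv(self, filename):
        self.file = 'I got {0} from csv!'.format(filename)

    def _from_excel(self, filename):
        self.file = 'I got {0} from excel!'.format(filename)

    # Hypothetical dispatch table; the values are plain functions,
    # so they are called as loader(self, filename).
    _loaders = {'.csv': _from_csv, '.xlsx': _from_excel}

    def __repr__(self):
        return "<{0.__class__.__name__}({0.file})>".format(self)

print(FromFile("this.csv"))
print(FromFile("this.xlsx"))
print(FromFile.from_excel("this.is"))
```

Each format still needs its pair of functions, but adding one no longer means touching __init__.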

__new__ Method

One other way we could do this would be to change __new__ itself. This is technically what we want anyway; __new__ is where new instances of a class get created.

The downside to this method is that we are limited in our use of __init__ with arguments (if you have an __init__, it needs to accept everything that __new__ accepts). __init__ will run only when we call the class normally, so it will not run when a factory function builds the instance through __new__ directly. That forces you to put any common init code in a separate function and call it manually from each factory function, and from __new__ too, if necessary.

class FromFile(object):
    def __new__(cls, filename):
        if filename is None:
            return super(FromFile,cls).__new__(cls)
        elif filename[-3:] == 'csv':
            return cls.from_csv(filename)
        else:
            self = super(FromFile,cls).__new__(cls)
            self.file = 'Not a valid format'
            self._common_init_code()
            return self
    
    def __init__(self, *args, **kargs):
        print("This was called directly with", *args)
    
    def _common_init_code(self):
        print("All correct creations of this class will print this line!")
    
    @classmethod
    def from_csv(cls, filename):
        self = super(FromFile,cls).__new__(cls)
        self.file = 'I got {0} from csv!'.format(filename)
        self._common_init_code()
        return self
        
    def __repr__(self):
        return "<{0.__class__.__name__}({0.file})>".format(self)
FromFile('this.file')
All correct creations of this class will print this line!
This was called directly with this.file
FromFile("this.csv")
All correct creations of this class will print this line!
This was called directly with this.csv
FromFile.from_csv("this.is")
All correct creations of this class will print this line!
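The rule behind that output is that Python calls __init__ for you only when the class itself is called, and not when __new__ is called by hand. A stripped-down sketch, using a hypothetical Tracker class that just records whether __init__ ran:

```python
class Tracker(object):
    """Hypothetical class that records whether __init__ ran."""

    def __new__(cls, value):
        self = super(Tracker, cls).__new__(cls)
        self.init_ran = False
        return self

    def __init__(self, value):
        self.init_ran = True

    @classmethod
    def make(cls, value):
        # Calls __new__ directly, so __init__ is never invoked.
        return cls.__new__(cls, value)

a = Tracker(1)       # __new__, then __init__
b = Tracker.make(1)  # __new__ only
print(a.init_ran, b.init_ran)  # True False
```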

There are other ways to generate objects of certain classes; subclassing is a valid method, as is using a factory function, or even metaclasses. (For metaclasses, this article is hard to beat.) Several of these methods cause the type of the object to not match the name used to create it (for example, numpy.array() returns a numpy.ndarray), but they are still commonly used.
