Defining, Construct & Init for Classes

defining classes: constructors and init

python vs kotlin Syntax

# syntax is:
class ():
    def __init__(self, ):
        # put init code here
#now example 1
class Fred:
    def __init__(self, var1, var2):
       self.var1 = var1
       self.var2 = var3
       self.container = []
#and example 2
class Fred(BaseClass):
    def __init__(self, var1, var2):
       super().__init__(var1)
       self.var1 = var1
       self.var2 = var3
       self.container = []
           # rest of init here

Hopefully the above python code is self explanatory. Example 2 adds a base class.  I do not deal with multiple inheritance at this time, and will devote a specific page at some future time as for python it gets complex, and for kotlin it requires additional concepts.  Now here is the kotlin ‘imitate python’ equivalent to example 1.

//syntax is:
//
 class (): 
//
// example 1a:bad version- to be python like - not best kotlin
class Fred{
    val var1: Int
    val var2: String
    val container: MutableList

    constructor(var1: Int, var2:String){
        this.var1 = var1
        this.var2 = var2
        this.container = mutableListOf()
    }
}
// example 1b: still bad version- to be python like again- not best kotlin
//  move 'constructor' to class definition, now contructor body is 'init'
class Fred constructor(var1: Int, var2:String){
    val var1: Int
    val var2: String
    val container: MutableList

    init{
        this.var1 = var1
        this.var2 = var2
        this.container = mutableListOf()
    }
}

// example 1c:better - 'constructor' keyword omitted and defines variables
// in the primary constructor
class Fred(val var1: Int, val var2: String){
    val container: MutableList

    init{
        container = mutableListOf()
    }
}

First the ‘not the best kotlin way’ examples. Is constructor or init the best equivalent python init? The first example keeps the class definition more similar to python, and uses a constructor to perform the role of the python init. In kotlin, a class can have multiple different constructors to enable constructing an object from different types. So there could be another constructor accepting String in place of Int for the parameters.

But where the python code simply assigns to self.var1 without first declaring var1, in kotlin all variables must be declared, so example aside from the declarations of the three instance variables (var1, var2 and container) and constructor in place of init, 1a above looks almost directly like the python version. However, in this form, there is more code than the python version.

Version 1b above moves the constructor(var1: Int, var2:String to the class declaration. Doing this makes this the default constructor for the class, but the body of the constructor method cannot be on this line declaring the class so the body of the constructor is now called init, and the class declaration reads: class Fred constructor(var1: Int, var2:String).
init is the special reserved word to identify the block of code which is the body of the default constructor.

So example 1b is very similar to example 1a, but introduces the concept of a default constructor and the init block.

Example 1c introduces some improvement. A common pattern is that values to the constructor (init) as saved as instance variables. Simply adding var or val in the constructor means the parameter is the declaration, plus this result in the constructor automatically saving the values passed in. So we lose 4 lines of code as unnecessary (two declarations, plus 2 assignments). We can also omit the word ‘constructor’ for the primary constructor except for some rare special cases. So now the code is almost as brief as the python code. But there are still optimisations to come.

// example 1d:best
class Fred(val var1: Int, val var2: String){
    val container = mutableListOf()
}
// example 2
class Fred(val var1: Int, val var2: String):BaseClass(Var1){
    val container = mutableListOf()
}

So for example 1d, the code is now more concise than python, despite the declaration of variables. Yes, container does now look like a python class variable, but this is how instance variables are in kotlin. So the code is brief with types, and perhaps more so than the code without types. This is because the remaining code in the kotlin constructor prior to this step, initialisation of container, can happen at the declaration of container. So no Normally, all code needed in a constructor is setting initial values, so normally no init block is needed as initial values at the definition, either automatically in the case of default constructor parameters, or at the definition in the main block of other instance variables.

Example 2 covers a class with a base class, just for completeness, to have the syntax covered.

Advertisements

Background & History of OO Languages

Background

Python, Kotlin or Java, which best supports Object Oriented Programming (OOP)?

History.

Simula 67 (in 1967) was probably the first Object Oriented Language.  It was a great language, but wow was it slow!  And slow even on the massive mainframe computers it lived on.

The origin of C++ was to add the power of Object Oriented Programming to the very efficient language C, while still delivering real performance. In first reference book on C++ was released in 1985 (tired of waiting for everyone to have access to the internet?) at a time when micro-computers were popular, but far slower than even old mainframe computers.  The result was a language with great control over how objects were created, but in reality necessarily severely compromised by the features to deliver performance.

Java began life as Oak in 1991, but was only used by the original team until 1995 and then had a very short gestation to reaching 1.0 in 1996.  Java delivered greater ease of Object Oriented Programming than C++, and with good performance even with the added layer of a virtual machine.  While the language is hardly truly Object Oriented throughout,  it struck winning formulae with a balance of Object Oriented and performance.

Python had origins also in 1991, but did not really get public exposure until python 2.0 in the year 2000, and did not get full object orientation until ‘new style classes’ were introduced at the end of 2001. Computing power had moved a long way between 1996 and 2001 (consider the 150Mhz 1995 Pentium pro vs the 1.3Ghz 2000 Pentium 4),  and python was targeting easy of programming over ultimate performance anyway.  As you can guess as a result python was able to be Object Oriented throughout instead of just having an outer veneer of being Object Oriented.  But yes, still slower than Java.

Of course for Kotlin, designed in 2011 and having version 1.0 released in 2016, having Object Oriented structure throughout and adding features for functional programming as well was not even really a challenge. But slower than Java? Sometimes, but a 2016 technology compiler that ran run in a 2016 computer with gigabytes of memory available (windows 3.1, the common version at the time of Java release, required 1MB of Ram!) can optimise and produce almost the same code.  Oh yes, the compiler is slower than a Java compiler can be, but is it a problem?

OOP Myths.

But Java has created some OOP myths with almost a generation of programmers learning OOP on Java.

“This program cannot be True OOP, where are the getters and setters!”

That encapsulation requires getters and setters is actually true.  That you have to write your own getters and setters just to access and store data is not true.  Both python and kotlin automatically provide getters and setters without the programmer noticing unless the program requires non-standard behaviour from the getters and setters. Java requires the program writes their own getters and setters in order to allow breaking the rules of OO for performance reasons.

True OOP requires and functions and data to be inside a class so every thing is an object!

True OOP does have everything as an object.  Functions, all data types, everything.  For performance reasons Java broke this rule.  Understandable, but breaking the rules of OOP does not make the language more OO.  Functions should be ‘first class functions’ and in java they were not, for performance reasons.  Putting the non-OOP functions inside an object at least gave them an Object Wrapper, and the same applies for Java primitive data types.  Modern OO languages do not need to wrap functions or data in a class because functions and data are always already objects.

Python dataclasses: A revolution

Python data classes are a new feature in Python 3.7, which is currently in Beta 4 and scheduled for final release in June 2018.  However, a simple pip install brings a backport to Python 3.6.  The name dataclass sounds like this feature is specifically for classes with data and no methods, but in reality, just as with the Kotlin dataclass, the Python dataclass is for all classes and more.

I suggest that the introduction of the dataclass will transform the Python language, and in fact signal a more significant change than the move from Python 2 to Python 3.

The Problems Addressed By Dataclasses

There are two negatives with Python 3.6 classes.

‘hidden’ Class Instance Variables

The style rules for Python suggest that all instance variables should be initialised (assigned to some value) in the class __init__() method.  This at least allows scanning __init__ to reverse engineer a list of class instance variables. Surely an explicit declaration of the instance variables is preferable?

A tool such as PyCharm scans the class __init__() method to find all assignments, and then any reference to an object variable that was not found in the __init__() method is flagged as an error.

However, having to discover the instance variables for a class by scanning for assignments in the __init__() method is a poor substitute for scanning a more explicit declaration.  The body of the __init__() method essentially becomes part of the class statement.

The dataclass provides for much cleaner class declarations.

Class Overhead for Simple Classes

Creating basic classes in Python is too tedious. One solution is for programmers to use dictionaries as classes – but this is bad programming practice. Other solutions include namedtuples, the Struct class from the ObjDict package or the attrs package. With this number of different solutions, it is clear that people are looking for a solution.

The dataclass provides a cleaner, and arguably more powerful syntax than any of those alternatives, and provides the stated Python goal of one single clear solution.

Dataclass Syntax Example

Below is an arbitrary class to implement an XY coordinate and provide addition and a __repr__ to allow simple printing.

class XYPoint:

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __add__(self, other):
        new = XYPoint(self.x, self.y)
        new.x += other.x
        new.y += other.y
        return new

    def __repr__(self):
        printf(f"XYPoint(x={self.x},y={self.y})")

Now the same functionality using a data class:

@dataclass
class XYPoint:
        x:float
        y:float

    def __add__(self, other):
        new = XYPoint(self.x, self. y)
        new.x += other.x
        new.y += other.y

The dataclass is automatically provided with an __init__() method and a __repr__() method.

The class declaration now has the instance variables declared at the top of the class more explicitly.

The type of ‘x’ and ‘y’ are declared as float above, although to exactly match the previous example, they should be of type Any but float may be more precise, and more clearly illustrates that annotation is usually a type.

The dataclass merely requires the class variables to be annotated using variable annotation.  ‘Any‘ provides a type completely open to any type, as is traditional with Python.  In fact, the type is currently just for documentation and is not type checked, so you could state ‘str‘ and supply an ‘int‘, and no warning or error is raised.

As you can see from the example, dataclass does not mean the class implemented is a data only class, but rather the class contains some data which almost all classes do. There are code savings for simple classes, mostly around the __init__ and __repr__ and other simple operations with the data, but the cleaner declaration syntax could be considered the main benefit and is useful for any class.

When to Use Dataclasses

The Candidates

The most significant candidates are any Python class and any dictionary used in place of a class.

Other examples are namedtuples and Struct classes from the ObjDict package.

Code using the attrs package can migrate to the more straightforward dataclass which has improved syntax at the expense of losing attrs compatibility with older Python versions.

Performance Considerations

There is no significant change to performance by using a dataclass in place of a regular Python class. A namedtuple could be slightly faster for some uses of immutable, ‘method free’ objects, but for all practical purposes, a dataclass introduces no performance overhead, and although there may be a reduction in code, this is insignificant. This video shows the results of actual performance tests.

Compatibility Limitations?

The primary compatibility constraint is that dataclasses require Python 3.6 or higher. With Python 3.6 being released in 2016, most potential deployments are well supported, leaving the main restriction as, not being available in Python 2.

The only other compatibility limitation applies to classes with existing type annotated class variables.

Any class which can limit support to Python 3.6+, and does not have type annotated class variables, can add the dataclass decorator without compatibility problems.

Just adding the dataclass decorator does not break anything, but without then adding data fields also, it does not bring significant new functionality either. ?? But the compatibility means adding data fields can be incremental as desired, with no step to ensure compatibility. ??

New capabilities do not guarantee delivery without code to make use of those capabilities. Unless the class is part of a library merely getting a spec upgrade, conversion to dataclasses makes most sense either when refactoring for readability or when code makes use of one or more functions made available by converting to a dataclass.

The ‘free’ Functionality

In addition to the clean syntax, the features provided automatically to dataclasses are:

  • Class methods generated automatically if not already defined
    • __init__ method code to save parameters
    • __repr__ to allow quick display of class data, e.g. for informative debugging
    • __eq__ and other comparison methods
    • __hash__ allowing a class to function as a dictionary key
  • Helper functions (see PEP for details)
    • fields() returns a tuple of the fields of the dataclass
    • asdict() returns a dictionary of the class data fields
    • astuple() returns a tuple of the dataclass fields
    • make_dataclass() as a factory method
    • replace() to generate a modified clone of a dataclass
    • is_dataclass
  • New Standardized Metadata
    • more information in a standard form for new methods and classes

Dataclass Full Syntax & Implementation

How Dataclasses Work

dataclass is based on the dataclass decorator. This decorator inspects the class, generates relevant metadata, then adds the required methods to the class.

The first step is to scan the class __annotations__ data. The __annotations__ data has an entry for each class level variable provided with an annotation.  Since variable annotations only appeared in Python 3.6, and class level variables are not common, there is no significant amount of legacy code with annotated class level variables.

This list is scanned for actual values of these class level variables which are of the type field. Values of type field can contain additional data for building the metadata which is stored in __dataclass_fields__ and  __dataclass_params__. Once these two metadata dictionaries are built, the standard methods are then added if they are not already present in the class. Note while an __init__() method blocks the very desirable boilerplate removing automatic __init__ method, simply renaming __init__ to __post_init__ allows retaining any code desired in an __init__, and removing the distracting boilerplate.

This process means that any class level variables that are not decorated are ignored by the dataclass decorator and not impacted by the move to a data class.

Converting Class

Consider the previous example, which was very simple. Real classes have default __init__ parameters, instance variables that are not passed to __init__, and code that will not be replaced with the automatic __init__. Here is a slightly more complicated contrived example to cover those complications with a straightforward use case.

This example adds a class level variable, last_serial_no, just to have an example of a working, class level variable, which allows a counter of each instance of the class.

Also added is serial_no which holds a serial number for each instance of the class.  Although it makes more sense to always increment the serial number by 1, an optional __init__ parameter allows incrementing by another value, showing how to deal with __init__ parameters which cannot be processed by the default __init__ method.

class XYPoint:

    last_serial_no = 0

    def __init__(self, x, y=0, skip=1):
        self.x = x
        self.y = 0
        self.serial_no = self.__class__.last_serial_no + skip
        self.__class__.last_serial_no = self.serial_no

    def __add__(self, other):
        new = XYPoint(self.x, self. y)
        new.x += other.x
        new.y += other.y
        return new

    def __repr__(self):
        printf(f"XYPoint(x={self.x},y={self.y})")

Now the same functionality using a dataclass.

from dataclasses import dataclass, field, InitVar

@dataclass
class XYPoint:
    last_serial_no = 0
    x: float
    y: float = 0
    skip: InitVar[int] = 1
    serial_no: int = field(init=False)

    def __post_init__(self, skip):
        self.serial_no = self.last_serial_no + self.skip
        self.__class__.last_serial_no = self.serial_no

    def __add__(self, other):
        new = XYPoint(self.x, self. y)
        new.x += other.x
        new.y += other.y

The class level variable without annotation needs no change. The __init__ parameter that is not also an instance variable has the InitVar type wrapper. This ensures it is passed through to __post_init__  which provides all __init__ logic that is not automatic.

The serial number is an instance variable or field that is not in the init, and to change default settings for a field, just assign a value to the field (which can still include a default value as a parameter to the field).

I think this example covers every realistic use requirement to convert any existing class.

Types and Python

Dataclasses are based on usage of annotations. As noted in annotations, there is no requirement that annotations be types. The reason for providing annotations was primarily driven by the need to allow for third-party type hinting.

Dataclasses do give the first use of annotations (and by implication, potentially types) in the Python standard libraries.

Annotating with None or docstrings is possible. There are many in the Python community adamant that types will never be required, nor become the convention. I do see optional types slowly creeping in though.

Issues and Considerations

It is possible there are some issues with existing classes which use class level variables and instance variables, but none have been found so far, which this leaves this section as mostly ‘to be added’ (check back).

Conclusion

There is a strong case that all classes, as well as namedtuples, and even other data not currently implemented as a class and some other constructs better implemented as classes, should move to dataclasses. For small classes, and all classes start as small classes, there is the advantage of saving some boilerplate code. Reducing boilerplate code makes it easier to maintain, and ultimately more readable.

Ultimately, the main benefit is any class written using a dataclass is more readable and maintainable than without dataclasses. Converting existing classes is as simple as renaming.

Python Class vs Instance Variables

Python class variables offer a key tool for building libraries with DSL type syntax, as well as data shared between all instances of the class. This page explains how Python class variables work and gives some examples of usage.

  • Class & Instance Variables
    • Instances vs the class object
    • Instance Variables
    • Class Variables
  • Using Class and Instance Variables

Class & Instance Variables

Instances vs the class object

In object oriented programming there is the concept of a class, and the definition of that class is the ‘blueprint’ for objects of that class.  The class definition is used to create objects which are instances of the class. In Python, for each class, there is an additional object for the class itself.  This object for the class itself, is an object of the class ‘type’. So if there is a class ‘Foo’ with two instanced objects ‘a’ and ‘b’, created by a = Foo() and b = Foo(), this creates objects a and b of class Foo. In Python, the code declaring the class Foo does the equivalent of, for example, Foo = type(). This Foo object can be manipulated at run time, the object can be inspected to discover things about the class, and the object can even be changed to alter the class itself at run time.

Instance Variables

Consider the following  code  as run in Idle (Python 3.6):


class Foo:
    def __init__(self):
         self.var = 1
>>> a = Foo()
>>> type(a)
'class: __man__'.Foo
>>type(Foo)
'class: type'
>>'var' in a.__dict__
True
>>>var in Foo.__dict__
False
>>> a.var
1
>>> Foo.var
Traceback (most recent call last):
  File "", line 1, in
AttributeError: type object 'Foo' has no attribute 'var'

Note the results of type(a) compared to type(Foo). 'var' appears in the a.__dict__ , but not in the Foo.__dict__ within Foo. Further, a.var gives a value of 1 while Foo.var returns an error.

This is all quite straightforward, and is as would be expected.

Class Variables

Now consider this code as run in  Idle that has a class variable in Idle (Python 3.6):


class Foo:
    var = 1
>>> a = Foo()
>>'var' in a.__dict__
False
>>>'var' in Foo.__dict__
True

>>> Foo.var
1
>>> a.var
1
>>> a.var = 2
>>> a.var
2
>>> Foo.var
1
>>> a.__class__.var
1

All as would be expected, the __dict__ results are reversed, this time the class Foo initially has var, and the instance a does not.

But even though a does not have a var attribute,  a.var returns 1, because when there is no instance variable var, Python will return a class level variable if one is present with the same name.

However, assignment does not fall back to class level variables, so setting a.var = 2 actually creates an instance variable var and reviewing the __dict__ data now reveals  both that the class level object and instance each have var. Once an instance variable is present, the class level variable is hidden from access using a.var which will now access the instance variable. In this way, code can add an instance variable replacing the value of class variable which provides what can be effectively a default value until the instance value is set.

Simple Usage Example

Consider the following Python class:


class XYPoint:

    scale = 2.0
    serial_no = 0

    def __init__(self, x, y):
        self.serial_no = self.serial_no + 1
        self.__class__.serial_no = self.serial_no
        self.x = x
        self.y = y

    def scaled_vector(self):
        vector = (self.x**2 + self.y**2)**.5
        return vector * self.scale

   def set_scale(self, scale):
        self.__class__.scale = scale

    def __repr__(self):
        return f"XYPoint#{self.serial_no}(x={self.x},y={self.y})"

The class encloses an (x,y) coordinate, and has a method scaled_vector() which calculates a vector (using Pythagoras theorem) from that coordinate, and then scales the vector from the class variable scale.

If the class level variable scale is changed, automatically all XYPoint objects will return vectors using the new scale.

Being a class variable, scale can be read as self.scale but must be set as in the set_scale() method by self.__class__.scale = scale.

The other use case illustrated is the instance counter serial_no, which provides each instance of XYPoint a unique, incrementing serial_no.

Python Annotations

Python syntax allows for two flavours of annotations:

Introduction.

Both flavours of annotation work in the same manner.

They build a dictionary called __annotations__ which stores the list of annotations for a function, a module or a class.

It is common practice to annotate with types, such as int or str, but Python language implementation allows any valid expression. Using types can make Python look like other languages which have similar syntax and require types, and is one of the motivations for annotations in Python. Third-party tools may report expressions which are not types as errors, but Python itself currently allows any expression.

The Python dataclass now makes use of annotations.  Outside of this use, if you are not using an external type validator like mypy there seems little incentive to bother with type annotations, but  if you are going to document what type a variable should be, then annotation is the optimum solution.

The following code illustrates variable annotation at class level, module level, and local level:


>>> class Foo:

    class_var1: "Annotation"
    class_var2: "Another" + " annotation" = 3
    class_int: int

    def func(self):
        local1 : int
        local2 : undeclared_var
        self.variable : undeclared * 2 = 7
>>> module_var: Foo = Foo()

>>> module_var2: 3*2 = 7
>>> module_var3: "another" + " " + "one"
>>> Foo.__annotations__
{'class_var1': 'Annotation', 'class_var2': 'Another annotation',
'class_int':  }
>>> __annotations__
{ 'module_var': , 'module_var2': 6, 'module_var3': 'another one'}
>>> f = Foo()
>>> f.func()
>>> f.variable
7

Class Variables: The code annotates 3 identifiers at Foo scope (these identifiers, and the annotations all then appear in Foo.__annotations__. Note that only class_var2 is actually a variable and will appear in a dir() for Foo.  class_var1 and class_int appear in __annotations__ but are not actually created as variables.

Module Variables: Three module_var variables annotated at the module level, and all appear in __annotations__, and again module_var3 does not appear in globals as annotation itself does not actually create the variable, it solely creates the entry in __annotations__.  (module_var and module_var2 are assigned values, so are actual variables).

Local & Instance Variables: The func within the Foo class illustrates two local annotations, one of which uses an undeclared_var. This use of an undeclared identifier would generate an error with either class or module variables, in which case the expression is evaluated for the relevant __annotations__ dictionary. The expressions for local and instance variables annotations are not evaluated. At this stage, I have not found where, or how, the annotation data is stored.

The PEP for variable annotations is available here. Note the stated goal is to enable third party type checking utilities, even though the implementation does not restrict annotations to types. The non-goals are also very interesting.

While the practice of using annotations only with valid types might be best practice, it is worth understanding the compiler does not require this.

Function annotations: Introduced in Python 3.0 (2008)

Here is an example of function annotation:


>>> def func(p1: int, p2: "this is also an int" + " but ...") -> float:
	return p1 + p2

>>> func.__annotations__
{'p1': , 'p2': 'this is also an int but ...', 'return':  }

The expression following the ‘:‘ (colon) character (or the ‘->' symbol) is evaluated and the result stored in the __annotations__ dictionary.

The PEP is available here, but it is the fundamentals section that is the most highly recommended reading.