Kotlin Limitations vs Python

social-media-failThis page will serve as repository of limitations found when attempting to move kotlin code to python code. Some limitations are a natural consequence of moving to static types, others are effectively inherited from Java.

The list here is based on placing those I find most significant at the top, and the further down the list the less significant.

base class parameter pass through

background

Kotlin, like python, allows for optional parameters, a feature not found in Java.  A consequence of this feature is that libraries with heavily used classes can over time evolve to have a large number of optional parameters in the most heavily used methods.  A larger number of parameters should only occur in code in a language that has optional parameters, as without the parameters being optional, every extra parameter  would introduce overhead on every call.  These methods with a large number of parameters are often frequently used methods, and the goal is that well chosen defaults will mean that most of these optional parameters are only provided in exceptional cases.  While a typical instance of the class may require only 2 or 3 parameters, while there may be as many as 20 to chose from.

To find examples, I simply thought: “what are the most widely used python packages I can think of?”.  The first three that occurred to me to check were sqlalchemy,  attr, and Django, and for each of these I looked for the most basic usage cases I could find. The first examples I found are:

  • sqlalchemy:   Table Class  20 optional parameters (see ‘parameters’)
  • attr: atrr.s() function – 9 optional parameters (see attr.s function)
  • Django: Field class – 22 optional parameters (see __init__)

While such a large number of parameters should not normally occur in Java due to lack of optional parameters, I think these example establish that the pattern is common in well regarded python packages.

The Problem.

Consider the Django Fieldclass . This class serves as a base class for several other classes, such as  TextField, TimeField, BinaryField etc.  To code examples like these cleanly, some type of language mechanism is needed which effectively achieves:  “include all fields from the base class constructor (or other method from the base class) as also parameters to this class unless specifically declared in this class“.

Python uses the *args,**kwargs system, which for several reasons is not an appropriate solution for kotlin, but is at least a solution.  There are some more elegant solutions possible for kotlin, and perhaps one will be added at a later time.

In Python the code for defining the BinaryField Class is as follows (not exact code for simplicity):

class BinaryField(Field):

    def __init__(self, *args, **kwargs):
        kwargs['editable'] = False
        super().__init__(*args, **kwargs)
        // other init code goes here

while in kotlin (slightly changed for name conventions and simplicity) the code becomes:

class BinaryField(
        verboseName:String?=null, name:String?=null, primaryKey:Boolean=false,
        maxLength:Int?=null, unique:Boolean=false, blank:Boolean=false,
        nulled:Boolean=false, dbIndex:Boolean=false, rel:String?=null,
        default:Any?=null, //editable:Boolean=true, - force 'false' for editable
        serialize:Boolean=true,
        uniqueForYear:Int?=null, choices:String?=null, helpText:String="",
        dbColumn:Int?=null, dbTablespace:Int?=null, autoCreated:Boolean=false,
        validators:List<Any>?=null, errorMessages:String?=null
 ):Field(verboseName=verboseName, name=name, primaryKey=primaryKey,
        maxLength=maxLength, unique=unique, blank=blank, nulled=nulled,
        dbIndex=dbIndex, rel=rel, default=default, editable=false,
        serialize=serialize, uniqueForYear=uniqueForYear, choices=choices,
        helpText=helpText, dbColumn=dbColumn,
        dbTablespace=dbTablespace, autoCreated=autoCreated,
        validators=validators, errorMessages=errorMessages) {
  // class code here
}

Clearly, the call to the base constructor will be much shorter if not using named parameters, which is a choice, but in a list this long I would use named parameters.

The code (or almost identical code) will be repeated TextField, TimeField and the over 12 other fields that inherit from Field. Any update to the parameter list for the base Field class is tedious to say the least.

This is a case where kotlin requires boilerplate that is not required in python. What is needed is some way to say “accept all parameters to the base default constructor not specifically named in the base call, and pass these parameters through“. Given this problem will not occur in Java libraries which have no default parameters, it may be some time before kotlin designers consider this, if ever.  In the mean time, messy.

Constructor Calls

(to be added: by Nov 13)

Intricate, preset order, class data initialisation

Kotlin has what can appear a rather strange way of implementing the overriding of properties when extending classes.  The result is that when extending classes, the behaviour of initcode can be unexpected.  The way to avoid this is to use lazy properties in place of initialising during init.

open class Base{
    val simple = 3
    open val open = 3
    open val openGet = 3
    open val getter get()= 3
    open val getOpen get()= 3
    open val getDelg by GetDelg(3)
    init {
        println("base simple $simple open $open openG $openGet "+
                "getOpen $getOpen getter $getter "+
                " getDelg $getDelg")
    }
    //open fun add(a:Int,b:Int) = a + b
}

class SubClass:Base(){
    override val open = 4
    override val openGet get()= 4
    override val getter get() = 4
    override val getOpen = 4
    //override val getDelg by GetDelg(4)  //uncomment for null pointer
    init {
        println("sub simple $simple open $open openG $openGet "+
                "getOpen $getOpen getter $getter "+
                " getDelg $getDelg")

    }
}
class GetDelg(val value:Int){
    operator fun getValue(thisRef: Any?, property: KProperty<*>): Int {
        return value
    }
}

The print from instancing a SubClass object is:

base simple 3 open 0 openG 4 getOpen 0 getter 4 getDelg 3
sub simple 3 open 4 openG 4 getOpen 4 getter 4 getDelg 3

 

Open is 0 in the base class init because the code to set the value has not yet been run, but not it is not 3 as you would expect.

openG, a value overridden by a get() method, perhaps unexpected returns the override value in both init() methods

getOpen, a get() method in the base overridden by a simple initialisation, behaves as a simple initialised value overridden by a new initialise, which is to be unitialised in the base init() method

getter() , a get() method overidden by another get() method returns the override value as does openG

getDelg() actually generates a null pointer exception if called during  the base init() method, as the overridden value has not been calculated

Note: part of this behaviour is base on the fact that overridden properties are actually new distinct properties, so the do not inherit values from base class property which is still accessible via super.<name>.  This means, counterintuitively, that open in the base init() method, returns 0while super.open in the subclass init()will return 3

I will update with more on the use of lazy to avoid this issue, but the main point is to think carefully before initialising values in a base class that are open.

*args, **kwargs (to be added)

A specific use of *args and **kwargs has already been covered in base class parameter pass through.

Outside of that specific use, kotlin does have effective equivalents to most use cases, but may depend on the reflections library, which is not available on all platforms at this time.

vararg parameters capture the equivalent to a *args list, and allows for using the list for calling in a manner very similar to python.

callBy provide most of the functionality of **kwargs when used for calling functions, but some code is needed to map parameter names to KParameters.  A link to such code may be added to this page is someone asks 🙂

For cases where it is desired to capture parameters in map for, using a map actually makes better sense in every case I have found, but I will update this further if I find a case where this is not true.

Advertisements

Machine Code, Global memory, the Stack and the Heap

Today we computers that have multiple ‘cores’ (in fact complete processors) on a single chip. But one way to understand the fundamentals of computers is to analyse far simpler computers from the past.  This section will review those very fundamental concepts in term of a hypothetical programmable calculator, one of the very first ever popular ‘mini-computers’, the PDP-8 from 1965, and the Intel 8080 cpu from 1974 that still influences the instructions of intel Xenon and ‘Core’ processors in 2017 .

The JavaScript (with Kotlin source code) version of the ‘hypothetical programmable calculator’ is available here.  I will also add links to emulators for the PDP 8 and 8080.

Sections:

Machine Code and Global Memory

sdl453269107_1375355122_image1-1310fTo understand how memory works, it can be useful to consider how the CPU itself works, and what follows from that is how memory works.  Languages build on the underlying principles and automate using the different types of memory.

This page uses a ‘hypothetical programmable calculator’ (Magic Calculator) to illustrate the principles of how instructions are executed and how global memory works with the instructions described below.

 

Hypothetical Programmable Calculator

How does a CPU work?

The example of the ‘hypothetical programming calculator is used for ‘how does a  ‘Central Processing Unit’ (CPU) work.  The working memory inside a CPU is the CPU ‘registers’, and the value displayed on a calculator screen is very analogous to simple computer with a main register.  For this exercise, consider this displayed value as the ‘a’ register (Many original computer did have an ‘a’ register, or ‘accumulator’ register, and some even provided a continuous display of the value of this register).  To add two numbers on a calculator, we enter the first number to our display or ‘a’ register, then activate ‘+’, which has to store the operation as plus, and save the first number into a second register which we can call ‘b’. Then we enter the second number into the ‘a’ register (or display), and with the ‘=’ we do the stored operation with add ‘a’ and ‘b’ and leaves the result in a.

So we have some program steps, but how do we ‘run’ these steps? Well first we need some memory.

Program Memory and program counter

Many calculators have one or more ‘memories’.  Our programmable calculator is going to have 100 memories!  The simplest calculators have one memory, and you can save from the ‘a’ register to the memory, or load from the memory to the ‘a’ register.  On some calculators you can even add the ‘a’ register into the memory, but I digress. The big thing with our programmable calculator, is that values in memories represent instructions.  Number ‘1’ when used as instruction could be our ‘+’, number ‘2’ our ‘=’ and number ‘3’ could mean ‘set a,n’  to set a from value in the memory following the instruction.   To make this work, we need a new register, a ‘Program Counter’ register for pointing to instructions.  Every time we load an instruction, or load information with the ‘Program Counter’, the program counter increases by 1.

So our program to add 7 and 8 (in memory locations 0, 1, 2, 3, 4, 5, 6 )now looks like:

  • 3  7  1  3  8  2  0  (enter this string into the emulator ‘code’ field)

The steps are:

  1. The “program counter (PC) starts at zero so the instruction at zero, (3- load a) is run, and this instruction loads the next value from the memory location specified PC register (and again adds one to the register), so the result is the ‘7’ from location ‘1’ is loaded into ‘a’ and 7 is displayed.
  2. The PC register is now 2 (increased to 1 after loading the ‘3’ – load instruction, and again increased to 2 as the load instruction loaded the ‘7’ from location 1.  The plus instruction sets operation register to ‘add’ and copies the ‘7’ from the ‘a’ register to the ‘b’ register.
  3. The ‘load’ instruction (3) from location ‘3’ is loaded from the program counter and this instruction then loads the ‘8’ from memory location 4 into ‘a’ register
  4. the ‘=’ instructions (2) from memory location ‘5’ is loaded and this causes the ‘7’ from ‘b’ to be added to ‘a’ so the calculator then display our answer: ’15’
  5. the ‘stop’ instruction (0) from memory location 6 causes our program to stop.

This simple example illustrates how a program actually runs in a computer. The main memory can have both data and instructions.

Adding global variables: the instructions.

Currently the binary program for the ‘programmable calculator’  just does the equivalent of  ‘7 + 8’ in python.

This is only useful because we can see the ‘a’ register on the calculator display.  The equivalent of ‘7+8’ being useful in ‘idle’, because idle prints the answer. Now consider the program ‘answer = 7 + 8’.  This program stores the answer in a variable.  The previous program is stored in  7 memory locations, so there is lots of free memory locations for variables.   If we plan to use half of the memories for code, and half for variables, then all memories below 50 would hold code and numbers used inside code, and memories 50 and above would be for variables.

None of the current instructions use variables, so consider  two new instructions, load a,(n) and  save a,(n) to load ‘a’ register from the memory location we want, or save the ‘a’ register. The ‘load’ (instruction code 4)  and ‘save’ (instruction code 5) will both use the memory following the instruction to specify which memory is to be loaded or saved.

Currently the ‘Magic Calculator’ does not support these last two instructions(load and save), but if desired for experimentation, this could be added.

 

OOP vs Functional: No contest!

Moving to kotlin, the question can arise: “should I program in OOP (Object Oriented Programming) style, or move to Functional programming?”.  This page examines reviews what Object Oriented programming really means, what functional programing is, and outlines how there is no incompatibility if the correct approach to OOP is in place, and the two paradigms are best when combined.

Functional Programming vs Programming with functions.

What is FP?

Almost all modern programs use functions, but just using functions is not Functional programming.  What is required is functions that are ‘functional’ in the mathematical sense, not programming in the ‘using functions’ sense.  A mathematical function, or ‘pure function’ operates on the supplied arguments and returns a result and does nothing else. No ‘side effects’. Nothing changed by the function, no internal variables altered that will result a future call of the same function dealing with different values.

The best test for a ‘pure function’ is: will the result, and all aspects of calling the function be identical every time the function is invoked?

Limitations of ‘pure’ Functional Programing.

Any ‘side effect’ of calling the function disqualifies the function from being a ‘pure function’, as the only output from the function is the result of the function.  The reality is a program without any ‘side effects’ is actually useless, as input/output is generally a side effect.  The answer can to a problem may be able to be calculated using ‘pure functions’, but actually showing the answer is going to require input/output.  So while a program may incorporate ‘pure functions’, no program will be entirely functional program.   The second important point is that ‘pure functions’ can be built in any language.

Language requirements.

Languages that supports functional programming give tools to make it easy to see what code will qualify as a ‘pure function’, and what does not.

OOP vs non OOP programming.

What is OOP?

OOP means different things to different people.  To some the ‘class’ statement is the essence of object oriented programing, yet while ‘javacript’ has no class statement, but supports object oriented programming.

A more useful view of object oriented to think of ‘classes’ simply as ‘types’.  A type permits operations on that type. These operations are ‘methods’ of that type, that is computing that can be done with that type.  When different types (e.g. ‘int’ and ‘string’) each have their own interpretation of the same operation (e.g ‘+’ or plus) then we have one of the anchors of being object oriented: polymorphism.  As almost every language has several different types which each have their own version of ‘+’, if follows that every language has some elements of object oriented. The core concepts of object oriented is present in almost every program, but object oriented programing is more than just some exposure to the concepts.  An object oriented program does not just use the type system and which will already have some degree of object oriented approach, rather, and object oriented program extends the type system creating its own ‘types’ or ‘classes’.  Further, a an object oriented program, is about using this approach as the basis of the solution.

The limitations of pure Object Oriented

Pure object oriented is usually stated to have four ingredients:

  • polymorphism: the ability of different classes to effectively all act the same in some way but implementation their own version of one or more functions or operations to operate in an equivalent way
  • encapsulation: the implementation of the class (or type) is coded within the class and  code using the class has no need to consider how the class is implemented and will not be impacted if the implementation changes
  • inheritance: classes or types can inherit or ‘sublcass’ from a ‘superclass’.  This enables both reuse of code, and is a key tool for building polymorphism when more than one class subclasses from the same parent has its own implementation of the methods of the superclass
  • abstraction: this is giving the class an interface designed around the logical use of the class, and not simply a reflection of the mechanics of the underlying code

Of these four usually quoted points, all but one are mostly about techniques to do OOP well, while encapsulation is really a test of OOP or not OOP. ‘Pure’ OOP code be built from purely from Objects with the code encapsulated within the classes.

The language Java provides a perfect illustration of why ‘pure’ OOP is a mistake. Where efficiency dictates, the despatch tables within classes required for encapsulation are a performance killer.  For performance.  low level, close to the computer objects are best being reduced by the compiler to direct inline code.  This does not stop such low level object classes adhering to the object model, or at least linguistically, being objects.  However Java even goes as far a simply dropping all pretence of objects, or Object Oriented code for ‘native types’.  In contrast to Java itself, the Java must write all code in classes.  The simple example of the ‘hello world’ which results in a pointless class wrapper containing the ‘main’ method, illustrates how even enforcing there are classes, does not create an Object based approach, and that for some code, there is no object based approach.

Language Requirements.

In python, a programmer could implement their own class system using dictionaries of methods and properties. But creating new objects requires allocating memory, keeping track of when an object is no longer needed and freeing that memory. In fact some ‘object oriented’ languages (like c++) require programs to manually control garbage collection (returning memory to ‘free memory’ when objects are no longer needed).  Every feature needed for OOP can be implemented with no special features at all, however is most languages ‘DIY’ would be laborious and distract from the application. No special feature is needed to do OO, but to do OO well, a good syntax and feature set is essential.

Functional vs OOP: No contest!

The optimal FP approach, is avoid side effects wherever possible.  The optimal OOP approach is for all data in the program to be an object.  FP sounds like focusing on the purity of the logic steps, OOP is about focusing on the data.  But FP programs still need data, and operations on data, and regarding that data as objects and empowering the data makes sense. There is no problem combining both paradigms in a single program.

Some argue that the OO philosophy encourages programming in a ‘state-driven’ manner that is the antithesis of FP, but this argument ignores the foundations of the best FP languages have OO based type-systems.

OO itself makes it easy to break FP principles, but also can make it easy to follow FP principles.  Kotlin does give more tools to support using OO together with FP style than python does, but this can be done with either language.  As a simplistic example, a ‘val’ in kotlin is a variable which can never be reassigned, with python a programmer can still have variable which is never reassigned, but the programmer will need document this and check themselves that the rule is followed.

Misguided OOP vs the OOP and FP combination.

Recently, I was speaking with a programmer who declared, “Object Oriented is dead, Functional Programming is the correct way!” He had a link to a video describing how OOP programs could break all the principles of FP.  OOP programs that behave this way are simply bad OOP programs! I have not found that video again, but I can recommend this (rather long) video by ‘uncle’ Bob Martin on functional programming (see 50:00 minutes in for the points on OOP with functional programming).  Misguided OOP tries to force everything, including functions, into objects.  A classic example of this is with Java where even the main function of a program has to be wrapped within an object that adds nothing.  Kotlin moves from the misguided OOP of Java in many ways, for example by making functions first class objects.  As this guide progresses, the theme of Kotlin adapting true, more mature OOP into the system, and in a Functional Programming compatible way, while still maintaining java compatibility, is repeated over and over.

Variables and Objects

Introduction

The page follows on from the languages page, which reviewed basic concepts of the principles of how programming languages work, by further exploring how ‘variables’ programming languages work.  Understanding this, helps understand the philosophies of python and kotlin.

In this page, the language ‘c’ is discussed, but there is no need to know how to program in ‘c’ any more than there was a need to program in machine code or assembler for the last chapter.  It is just that explaining how ‘c’ works allows explaining some simple examples.

Topics:

  • Variables and ‘static memory’.
  • Variables and dynamic memory
  • Objects and memory.
  • More about objects.

Variables and Static memory.

In a language such as the original ‘C’ language, a byte variable requires a single byte of memory to be reserved for that variable. The original ‘int’ variable required 2 bytes, and a string required one byte for each character plus one extra byte to mark the end of the string. If a string was to hold a variable number of characters, then the variable could be declared (as in ‘char name[80];’ to declare a variable called name able to hold 80 char-acters) with the maximum allowed size, and then either a separate variable declared to record the current length of ‘name’, or a special character at the end to indicate the end of the characters in use.

A new concept added in ‘c’ was the ‘pointer variable’.  This variable would not hold program information such as a ‘char’ or an ‘int’ or a string, but would hold the number of the memory location (the ‘address’) where information is stored. Consider two ‘int’ variables ‘a’ and ‘b’ and a pointer variable ‘p’. The variable ‘p’ could hold the address of ‘a’ or ‘b’.  Operations could operate on ‘p’, or they could operate on the memory location pointed to by ‘p’.  This allows the same code to using ‘p’ to read or change either ‘a’ or ‘b’ depending on where ‘p’ is ‘pointing’ at the time.

All of these variables, including the pointer type, require a fixed amount of memory so the compiler could calculate the size of the block of memory to hold all the variables.  These variable could be called ‘static’ variables as they use the same place in memory throughout the program. Every time a variable is declared, the space needed to hold that variable is reserved by the language.

A program with only static variables only requires a very low powered computer chip, and is the program in your toaster is very likely to work with only static variables.

Variables and dynamic memory: malloc().

Now consider the more complex problem of a linked list, with a variable number of elements. We could simply declare a variable with the maximum possible size for the most extreme circumstance for the program.  This ‘maximum ever’ size would likely almost never eventuate, meaning that normally there a lot of wasted memory.

In ‘c’ the answer can come from a the ‘malloc’ function,  which manages a pool of memory space, and allows allocating space from that came from a function q’ (memory allocate) where the program could ask the operating system for a block of memory of a given size at run time.  The memory could then vary according to requirements.  When the memory is allocated, the location of the memory could be stored in a ‘pointer variable’ and the program do whatever it wishes with the pointer. This for the first time allows true variable length variables. If we are finished with our list we can call ‘free’ to allow the space to be reused for any other data.  We could later call malloc again asking for a different size, and this time our data may be in a different place.  This makes our data ‘dynamic’ as where in memory our data is located, and how much space the data uses can change.

From the diagram, it can be seen that our static ‘int’ variable ‘a’ is held in static memory, but the dynamic ‘int’ variable ‘b’ is actually held in dynamic memory, so getting the value is a two step process.  First get the pointer, then use the pointer to get the value.  Dynamic variables make the program run slower, but they give greater flexibility.

Objects and Memory

Now consider objects. Objects are instanced at run time, so the easiest way to handle them is by simply holding a ‘pointer’ in static memory and creating the object in dynamic memory and then pointing to the object.  A simple test for this in python is to try this code:

>>> c = [ 1, 2, 3 ]
>>> d = c
>>> c[1] = 4
>>> print(d)
[1, 4, 3]

It become clear that like that wonderful hand drawn diagram, both ‘c’ and ‘d’ share the same memory, so changing ‘c’ also changes ‘d’.

Every variable in python is in fact a ‘pointer’ or ‘reference’ to an object in dynamic memory. No exceptions.  You may wonder “but if i set ‘a = 3’ and ‘b=a’ then add ‘2’ to a ‘a = a + 2’, then ‘b’ does not change!”.  This is because ‘a = a + 2’ creates a new dynamic memory value ‘5’ and then ‘a’ is set to point to this new value, while b still points to ‘3’.

More on this in ‘basics’.