Machine Code and Global Memory

To understand how memory works, it can be useful to consider how the CPU itself works, because how memory works follows from that.  Languages build on the underlying principles and automate the use of the different types of memory.

This page uses a ‘hypothetical programmable calculator’ (Magic Calculator) to illustrate the principles of how instructions are executed and how global memory works with the instructions described below.

 

Hypothetical Programmable Calculator

How does a CPU work?

The example of the ‘hypothetical programmable calculator’ is used to illustrate how a ‘Central Processing Unit’ (CPU) works.  The working memory inside a CPU is the CPU ‘registers’, and the value displayed on a calculator screen is closely analogous to a simple computer with one main register.  For this exercise, consider this displayed value as the ‘a’ register (many early computers did have an ‘a’ register, or ‘accumulator’ register, and some even provided a continuous display of the value of this register).  To add two numbers on a calculator, we enter the first number into our display or ‘a’ register, then activate ‘+’, which has to store the operation as plus, and save the first number into a second register which we can call ‘b’. Then we enter the second number into the ‘a’ register (or display), and with the ‘=’ we perform the stored operation, adding ‘a’ and ‘b’ and leaving the result in ‘a’.

So we have some program steps, but how do we ‘run’ these steps? Well first we need some memory.

Program Memory and program counter

Many calculators have one or more ‘memories’.  Our programmable calculator is going to have 100 memories!  The simplest calculators have one memory, and you can save from the ‘a’ register to the memory, or load from the memory to the ‘a’ register.  On some calculators you can even add the ‘a’ register into the memory, but I digress. The big thing with our programmable calculator is that values in memories can represent instructions.  Number ‘1’ when used as an instruction could be our ‘+’, number ‘2’ our ‘=’, and number ‘3’ could mean ‘set a,n’, which sets ‘a’ from the value in the memory following the instruction (number ‘0’ will mean ‘stop’).   To make this work, we need a new register, a ‘Program Counter’ register for pointing to instructions.  Every time we load an instruction, or load information using the ‘Program Counter’, the program counter increases by 1.

So our program to add 7 and 8 (in memory locations 0, 1, 2, 3, 4, 5, 6) now looks like:

  • 3  7  1  3  8  2  0  (enter this string into the emulator ‘code’ field)

The steps are:

  1. The ‘program counter’ (PC) starts at zero, so the instruction at location 0 (3 – ‘set a,n’) is run. This instruction loads the next value from the memory location specified by the PC register (and again adds one to the register), so the ‘7’ from location 1 is loaded into ‘a’ and 7 is displayed.
  2. The PC register is now 2 (increased to 1 after loading the ‘3’ instruction, and again increased to 2 as that instruction loaded the ‘7’ from location 1), so the ‘+’ instruction (1) at location 2 is loaded next.  The plus instruction sets the operation register to ‘add’ and copies the ‘7’ from the ‘a’ register to the ‘b’ register.
  3. The ‘set a,n’ instruction (3) at location 3 is loaded using the program counter, and this instruction then loads the ‘8’ from memory location 4 into the ‘a’ register.
  4. The ‘=’ instruction (2) at memory location 5 is loaded, and this causes the ‘7’ from ‘b’ to be added to ‘a’, so the calculator then displays our answer: ‘15’.
  5. The ‘stop’ instruction (0) at memory location 6 causes our program to stop.
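The steps above can be sketched as a small Python emulator. This is only an illustration and not the Magic Calculator's actual code: the function name `run` and the variable names are assumptions, and only the instruction codes (0 stop, 1 ‘+’, 2 ‘=’, 3 ‘set a,n’) are taken from the text.

```python
def run(memory):
    """Run a calculator program and return the final value of the 'a' register."""
    a = 0             # the displayed value, or accumulator
    b = 0             # second register, filled by '+'
    operation = None  # the stored operation, set by '+'
    pc = 0            # program counter

    while True:
        instruction = memory[pc]  # fetch the instruction at the program counter
        pc += 1                   # every fetch steps the program counter by 1
        if instruction == 0:      # stop
            return a
        elif instruction == 1:    # '+': store the operation and copy a to b
            operation = "add"
            b = a
        elif instruction == 2:    # '=': perform the stored operation
            if operation == "add":
                a = a + b
        elif instruction == 3:    # set a,n: load the value following the instruction
            a = memory[pc]
            pc += 1
        else:
            raise ValueError(f"unknown instruction {instruction} at location {pc - 1}")


program = [3, 7, 1, 3, 8, 2, 0]
print(run(program))  # prints 15
```

Note that the memory list holds both the instructions (3, 1, 3, 2, 0) and the data (7, 8), and only the program counter decides which is which.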

This simple example illustrates how a program actually runs in a computer. The main memory can have both data and instructions.

Adding global variables: the instructions.

Currently the binary program for the ‘programmable calculator’  just does the equivalent of  ‘7 + 8’ in python.

This is only useful because we can see the ‘a’ register on the calculator display.  It is the equivalent of ‘7+8’ being useful in ‘idle’ only because idle prints the answer. Now consider the program ‘answer = 7 + 8’.  This program stores the answer in a variable.  The previous program is stored in 7 memory locations, so there are lots of free memory locations for variables.   If we plan to use half of the memories for code, and half for variables, then all memories below 50 would hold code and numbers used inside code, and memories 50 and above would be for variables.

None of the current instructions use variables, so consider two new instructions, ‘load a,(n)’ and ‘save a,(n)’, to load the ‘a’ register from the memory location we want, or save the ‘a’ register to it. The ‘load’ (instruction code 4) and ‘save’ (instruction code 5) will both use the memory following the instruction to specify which memory is to be loaded or saved.

Currently the ‘Magic Calculator’ does not support these last two instructions (load and save), but if desired for experimentation, this could be added.
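For experimenting without the Magic Calculator, here is a rough Python sketch of how ‘load’ and ‘save’ might behave. The function name `run_with_variables` and the choice of location 50 for ‘answer’ are assumptions for illustration; the instruction codes 4 and 5 and the 100 memories come from the text.

```python
def run_with_variables(program, size=100):
    """Run a program in a memory of `size` locations and return that memory."""
    memory = list(program) + [0] * (size - len(program))
    a = b = 0
    operation = None
    pc = 0
    while True:
        instruction = memory[pc]
        pc += 1
        if instruction == 0:              # stop
            return memory
        elif instruction == 1:            # '+'
            operation, b = "add", a
        elif instruction == 2:            # '='
            if operation == "add":
                a = a + b
        elif instruction in (3, 4, 5):
            operand = memory[pc]          # the memory following the instruction
            pc += 1
            if instruction == 3:          # set a,n: the operand is the value itself
                a = operand
            elif instruction == 4:        # load a,(n): the operand is a location
                a = memory[operand]
            else:                         # save a,(n): the operand is a location
                memory[operand] = a
        else:
            raise ValueError(f"unknown instruction {instruction}")


# answer = 7 + 8, with 'answer' assumed to live in memory location 50
memory = run_with_variables([3, 7, 1, 3, 8, 2, 5, 50, 0])
print(memory[50])  # prints 15
```

The difference between ‘set a,n’ and ‘load a,(n)’ is the difference between a literal in the code and a global variable: the first uses the number following the instruction, the second uses that number as the location of the value.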

 


OOP vs Functional: No contest!

Moving to kotlin, the question can arise: “should I program in OOP (Object Oriented Programming) style, or move to Functional programming?”.  This page reviews what Object Oriented programming really means, what functional programming is, and outlines how there is no incompatibility if the correct approach to OOP is in place, and how the two paradigms are best when combined.

Functional Programming vs Programming with functions.

What is FP?

Almost all modern programs use functions, but just using functions is not Functional programming.  What is required is functions that are ‘functional’ in the mathematical sense, not programming in the ‘using functions’ sense.  A mathematical function, or ‘pure function’, operates on the supplied arguments and returns a result and does nothing else. No ‘side effects’. Nothing is changed by the function, and no internal variables are altered in a way that would result in a future call of the same function dealing with different values.

The best test for a ‘pure function’ is: will the result, and all aspects of calling the function be identical every time the function is invoked?
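As a small illustration of that test in Python (both function names are made up for this sketch), the first function below gives an identical result on every call, while the second alters a variable outside itself and so fails the test:

```python
def double(x):
    """Pure: the result depends only on the argument, and nothing else changes."""
    return x * 2


running_total = 0

def add_to_total(x):
    """Not pure: each call alters a variable that affects future calls."""
    global running_total
    running_total += x
    return running_total


print(double(4), double(4))              # prints 8 8
print(add_to_total(4), add_to_total(4))  # prints 4 8
```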

Limitations of ‘pure’ Functional Programming.

Any ‘side effect’ of calling the function disqualifies the function from being a ‘pure function’, as the only output from the function is the result of the function.  The reality is a program without any ‘side effects’ is actually useless, as input/output is generally a side effect.  The answer to a problem may be able to be calculated using ‘pure functions’, but actually showing the answer is going to require input/output.  So while a program may incorporate ‘pure functions’, no program will be an entirely functional program.   The second important point is that ‘pure functions’ can be built in any language.

Language requirements.

Languages that support functional programming give tools to make it easy to see what code will qualify as a ‘pure function’, and what will not.

OOP vs non OOP programming.

What is OOP?

OOP means different things to different people.  To some the ‘class’ statement is the essence of object oriented programming, yet ‘javascript’ originally had no class statement and still supported object oriented programming.

A more useful view of object oriented is to think of ‘classes’ simply as ‘types’.  A type permits operations on that type. These operations are ‘methods’ of that type, that is, computing that can be done with that type.  When different types (e.g. ‘int’ and ‘string’) each have their own interpretation of the same operation (e.g. ‘+’ or plus) then we have one of the anchors of being object oriented: polymorphism.  As almost every language has several different types which each have their own version of ‘+’, it follows that every language has some elements of object oriented. The core concepts of object oriented are present in almost every program, but object oriented programming is more than just some exposure to the concepts.  An object oriented program does not just use the existing type system, which will already have some degree of object oriented approach; rather, an object oriented program extends the type system, creating its own ‘types’ or ‘classes’.  Further, an object oriented program is about using this approach as the basis of the solution.
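A short Python sketch of both points: the built in types already give ‘+’ different meanings, and a program can extend the type system with its own type that supplies its own ‘+’. The `Money` class is a hypothetical example invented for this illustration.

```python
class Money:
    """A hypothetical new type with its own interpretation of '+'."""

    def __init__(self, cents):
        self.cents = cents

    def __add__(self, other):
        # '+' for Money adds the amounts and produces a new Money
        return Money(self.cents + other.cents)

    def __repr__(self):
        return f"Money({self.cents})"


print(1 + 2)                    # int version of '+': prints 3
print("1" + "2")                # str version of '+': prints 12
print(Money(150) + Money(275))  # our own version of '+': prints Money(425)
```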

The limitations of pure Object Oriented

Pure object oriented is usually stated to have four ingredients:

  • polymorphism: the ability of different classes to effectively all act the same in some way, with each implementing its own version of one or more functions or operations so that they operate in an equivalent way
  • encapsulation: the implementation of the class (or type) is coded within the class, and code using the class has no need to consider how the class is implemented and will not be impacted if the implementation changes
  • inheritance: classes or types can inherit or ‘subclass’ from a ‘superclass’.  This enables reuse of code, and is also a key tool for building polymorphism, when more than one class subclasses from the same parent and each has its own implementation of the methods of the superclass
  • abstraction: this is giving the class an interface designed around the logical use of the class, and not simply a reflection of the mechanics of the underlying code
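The four ingredients can be seen together in a few lines of Python. The `Shape`, `Circle` and `Square` classes are invented for this sketch, and the leading underscore is only the python convention for ‘private’, not something the language enforces.

```python
import math


class Shape:
    """Abstraction: callers ask for an area, not for radii or side lengths."""

    def __init__(self, name):
        self._name = name            # encapsulation (by convention only)

    def area(self):
        raise NotImplementedError

    def describe(self):
        return f"{self._name} with area {self.area():.2f}"


class Circle(Shape):                 # inheritance: reuses describe() from Shape
    def __init__(self, radius):
        super().__init__("circle")
        self._radius = radius

    def area(self):                  # polymorphism: Circle's own version of area()
        return math.pi * self._radius ** 2


class Square(Shape):
    def __init__(self, side):
        super().__init__("square")
        self._side = side

    def area(self):                  # polymorphism: Square's own version of area()
        return self._side ** 2


for shape in (Circle(1), Square(2)):
    print(shape.describe())
# prints:
# circle with area 3.14
# square with area 4.00
```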

Of these four usually quoted points, all but one are mostly about techniques to do OOP well, while encapsulation is really a test of OOP or not OOP. ‘Pure’ OOP code would be built purely from objects, with the code encapsulated within the classes.

The language Java provides a perfect illustration of why ‘pure’ OOP is a mistake. Where efficiency matters, the despatch tables within classes required for encapsulation are a performance killer.  For performance, low level, close to the computer objects are best reduced by the compiler to direct inline code.  This does not stop such low level object classes adhering to the object model, or at least linguistically, being objects.  However Java even goes as far as simply dropping all pretence of objects, or Object Oriented code, for ‘native types’.  In contrast to Java itself, the Java programmer must write all code in classes.  The simple example of ‘hello world’, which results in a pointless class wrapper containing the ‘main’ method, illustrates how even enforcing that there are classes does not create an Object based approach, and that for some code, there is no object based approach.

Language Requirements.

In python, a programmer could implement their own class system using dictionaries of methods and properties. But creating new objects requires allocating memory, keeping track of when an object is no longer needed and freeing that memory. In fact some ‘object oriented’ languages (like c++) require programs to manually control garbage collection (returning memory to ‘free memory’ when objects are no longer needed).  Every feature needed for OOP can be implemented with no special features at all, however in most languages ‘DIY’ would be laborious and distract from the application. No special feature is needed to do OO, but to do OO well, a good syntax and feature set is essential.
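To give a feel for the ‘DIY’ route, here is a minimal sketch of a class system built only from dictionaries and functions. All the names (`make_counter_class`, `new`, `call`) are invented for this illustration, and it covers only object creation and method despatch, not inheritance.

```python
def make_counter_class():
    """Build a 'class' as a plain dictionary of methods."""
    def init(obj, start):
        obj["count"] = start

    def increment(obj):
        obj["count"] += 1
        return obj["count"]

    return {"init": init, "increment": increment}


def new(cls, *args):
    """Create an 'object': a dictionary of properties that remembers its class."""
    obj = {"class": cls}
    cls["init"](obj, *args)
    return obj


def call(obj, method, *args):
    """Despatch a method call by looking the method up in the object's class."""
    return obj["class"][method](obj, *args)


Counter = make_counter_class()
counter = new(Counter, 10)
print(call(counter, "increment"))  # prints 11
print(call(counter, "increment"))  # prints 12
```

It works, but `call(counter, "increment")` is clearly more laborious than `counter.increment()`, which is the point about needing good syntax to do OO well.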

Functional vs OOP: No contest!

The optimal FP approach is to avoid side effects wherever possible.  The optimal OOP approach is for all data in the program to be an object.  FP sounds like focusing on the purity of the logic steps, while OOP is about focusing on the data.  But FP programs still need data, and operations on data, and regarding that data as objects and empowering the data makes sense. There is no problem combining both paradigms in a single program.

Some argue that the OO philosophy encourages programming in a ‘state-driven’ manner that is the antithesis of FP, but this argument ignores that the best FP languages have OO based type-systems at their foundations.

OO itself makes it easy to break FP principles, but also can make it easy to follow FP principles.  Kotlin does give more tools to support using OO together with FP style than python does, but this can be done with either language.  As a simplistic example, a ‘val’ in kotlin is a variable which can never be reassigned; with python a programmer can still have a variable which is never reassigned, but the programmer will need to document this and check themselves that the rule is followed.
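One way to document the ‘never reassigned’ rule in python is the `Final` annotation from the standard `typing` module (python 3.8 and later). This is only a sketch of the idea: the names are made up, and python itself does not enforce the rule at run time; an external type checker such as mypy is needed to report a reassignment.

```python
from typing import Final

MAX_RETRIES: Final = 3   # documented as never reassigned; only a type checker enforces it


def retries_left(used: int) -> int:
    """Pure function that reads, but never changes, the constant."""
    return MAX_RETRIES - used


print(retries_left(1))   # prints 2
```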

Misguided OOP vs the OOP and FP combination.

Recently, I was speaking with a programmer who declared, “Object Oriented is dead, Functional Programming is the correct way!” He had a link to a video describing how OOP programs could break all the principles of FP.  OOP programs that behave this way are simply bad OOP programs! I have not found that video again, but I can recommend this (rather long) video by ‘uncle’ Bob Martin on functional programming (see 50:00 minutes in for the points on OOP with functional programming).  Misguided OOP tries to force everything, including functions, into objects.  A classic example of this is with Java, where even the main function of a program has to be wrapped within a class that adds nothing.  Kotlin moves from the misguided OOP of Java in many ways, for example by making functions first class objects.  As this guide progresses, the theme of Kotlin adopting true, more mature OOP into the system, and in a Functional Programming compatible way, while still maintaining Java compatibility, is repeated over and over.

Java: The king is dying

(originally written 23 May 2017)
TL;DR … just read the headings in that case.
Java is the king.  In a similar way that the queen of England is ‘the queen’. Java is not everyone’s king, and some have another king, but even they may see java as the most recognised king.  Certainly, Java is the ‘king’ with the health problems.

Birth: 1995, An innovation, free software with slick professional marketing at the right time!

Java was born in San Francisco as a child of James Gosling in 1995, in the age before the internet had really taken off or allowed the collaboration we know today. The promise of the language was ‘write once, run anywhere’ and the language was designed to be ‘fully object oriented’.  The truth is Java did not deliver on either promise, but Java did bring the innovation of an efficient and stable specification for the intermediate code interpreter referred to as the ‘Java Virtual Machine’. ‘C’ was the dominant language, and the early ‘C++’ extension did a very messy job of facilitating OO development, so there was a vacuum to fill.
Sun embarked on a mission to try and deliver ‘write once, run anywhere’.  They approached netscape about running in the browser, and when the answer was ‘it won’t work’, they asked ‘well could we call what would work java?’.  They approached Patrice Peyret about running in a smart card and ended up buying his company and naming another ‘non-java’ as JavaCard. They approached Cardsoft and STIP on another platform but people tried to really make it java and so it died.  But they never achieved ‘java everywhere’ except in having something ‘java’ in the name.  The OO design was that it forced applications to use objects, yet the language itself had no first class functions, and most variables were not objects. Messy, because design was in a vacuum compared to languages today.
But Java became cool.  A cool logo.  Named after a remote ‘exotic’ island with a reputation as an origin of coffee. Fun in a world of boring ‘Basic’, and ‘C’ and ‘Delphi’. ‘Java One’ conferences were the event to attend.  Sun was already ‘cool’ and Java bathed in that and in the marketing, and became the language a hipster would program in. Lack of anyone pushing a real alternative, together with great marketing and the language being free, resulted in Java taking off driven by its own momentum. Java became king.

Meanwhile….: 2000-2009 lost years in language, java holds on

By 2000, Java is already starting to decline from the peak. Microsoft now has its ‘own java’ in C#, but with Microsoft not cool at the time and C# still new, the strongest competitor to java is still the cumbersome C++.  The rapid coding scripting languages perl and php are on the rise, and the better ‘scripting’ languages like Python and Ruby gain some traction, but they still have ‘rough edges’. All these are regarded as ‘scripting languages’ that are not for real programmers, suffer performance limitations (particularly on artificial benchmarks), and more importantly, programs are difficult to distribute unless distributed as source.  The reality is businesses like yahoo, google and others are all built using the competitive advantage of rapid coding with scripting languages, and as their software runs on servers the software distribution is not an issue. So the tech giants grow using newer technologies, but the ‘IT Departments’ stay with Java. By late 2004/2005 the secret of companies like google is not only revealed, google actually hires the original developer of python.  Around this time python jumps in popularity and Java declines, but the ‘quick and dirty’ php is still a favourite with millions of simple web sites.  From 2008 through to 2012 android has halted the demise of java, the python 2/3 problem is in full swing, and nothing seems ready to fill the void.

The plotters gather: 2010 – 2017 contenders mount challenge

Everyone realises java has problems, but with no clear successor to the throne, there is a void.  Scripting languages are still working on their limitations, and python is in the 2/3 phase.  Scala (first released as a beta in 2004) feels it is ready for the main game, but it is joined by Clojure (2007) and then GoSu, Frege, Kotlin, Ceylon, Fantom … so many contenders that it is hard to choose.  All are an improvement on Java, but which one improves on the others?  Plus there are new native languages R, Go, Swift, and perhaps moving back to native is better?  Too many choices, so procrastination wins in many cases.

May 2017: A successor announced!

Google endorses Kotlin as a first class language on android.  This may seem insignificant, but this is google anointing Kotlin as a successor to Java on Android.  Google is even promoting just how much of an improvement Kotlin is over Java, showcasing the savings in code of data classes over Java and other examples.  Not just any old language is being ‘pushed aside’… the ‘king’ is pushed aside.  When Apple anointed Swift as the successor to objective-c, there was no real use of objective-c other than by Apple, and the language was far from a ‘king’.  Apple anointed Swift, but as Swift was their own project, the anointment met with some scepticism from many developers, just as would have happened if Google chose ‘go’ on Android.  What is interesting is just what happened to objective-c popularity.  See graph for the popularity of Objective C following Swift announcements.
[Graph: popularity of Objective C following the Swift announcements]

The Future Impact: A new king, but who will it be?

Expect Kotlin to jump into the top 50 on TIOBE within around 1 month, and become the highest ranked Java alternative very soon.  Kotlin will enter PYPL and Java will come under threat of losing its top position on that scale.
Overall Java will have a significant dip, and Kotlin will have a ‘bullet’ following the announcement.  Ok, Java will not dip to the same extent objective-c did, but it will be significant.  So Java will have a first phase of ‘dip’ because of the announcement, and then a second dip as a consequence of the first dip reinforcing that Java is now ‘out of favour’ as a language.   Similarly Kotlin will get at least two ‘boosts’.  Most likely within a year it will be on all top 10 lists, and possibly even ahead of the 3yr old Swift, although Apple is really doing some pushing with swift.
The real impact is that unlike Swift, which ‘poached’ programmers from Objective-C, Kotlin has the potential to poach from Java, C, C++, Python, Javascript, and even others. There may even be enough impact on challengers to Java’s crown that these also lose some share and delay the inevitable abdication of Java, but Kotlin being endorsed as effectively a better choice than Java has the potential to really shake up the industry.  Where it goes will depend on future decisions that may be likely, but not set in stone, just as google anointing Kotlin was likely, but not set in stone.  The less predictable part was that google would fund Kotlin moving to its own non-profit (but likely well subsidised) entity.
So now we have a single language already targeting the JVM, the browser, and native code right down to Arduino with LLVM. This is new ground for computing.  Kotlin promises not write once run anywhere, but write key code once and reuse everywhere that code is relevant.  Only CLR is missing as a target, although there is a windows solution emerging.  How an ever more open Microsoft will respond is not known, but JetBrains is already producing a .NET IDE!   Another big question is whether a taste of success will derail JetBrains!
The future does have questions, but this anointment of kotlin has the potential for a real change to the industry.  There will be a new king, and while Kotlin is an anointed heir to Java, it is not clear who the new king will be yet. While Kotlin may one day take the throne, it is unlikely to be the very next king.

Lists, Dictionaries Iterations & More

Python uses dictionary, list and tuple to hold collections of information, with set also available but not quite as common.

Kotlin provides List and Map and their mutable forms, MutableList and MutableMap, as the main solutions for collections.  Again, Set and MutableSet are available, matching the python ‘set’.  ‘List’ is equivalent to python ‘tuple’, MutableList to python ‘list’ and MutableMap to python ‘dictionary’.  In kotlin, there are many other options, some inherited from java, but they all have a logical role:

tuple -> List, Pair, data class

The python tuple is an immutable list, and the simple ability to use tuples  has a special role in the language with any expression with a comma creating a tuple.

Background: The good and bad of the ubiquitous python tuple

The good and bad of tuples can be seen in this:

>>> a, b = 1, 2
>>> a, b = b, a
>>> a, b
(2, 1)

The good is that tuples have almost zero syntactic overhead: all you need is a ‘,’ (comma). The bad is that it can be confusing when there is a tuple, as opposed to another use of a ‘,’ in python.  Note that on the left of the assignment, it looks like a tuple, but the left of an assignment is special syntax and not a tuple.  While a tuple can hold values, it cannot hold names.  So a construct like the one below

>>> a, b = 1, 2
>>> my_tuple = a, b
>>> *(my_tuple), = 6, 7
>>> a, b  # see, a and b are not changed
(1, 2)

does not assign to ‘a’ and ‘b’, because those names do not appear on the left of the assignment. In this case the ‘,’ is not indicating a tuple, but indicating ‘destructured assignment’.  Not every ‘,’ in python indicates a tuple, even though a comma alone may indicate a tuple.  The comma syntax being brief is very useful most of the time, but can lead to some quirks as to whether a ‘,’ means a tuple or has another use.

>>> a = 3
>>> type(a)
<class 'int'>
>>> a = 3,
>>> type(a) #just one comma and a is tuple
<class 'tuple'>
>>> def test(a):
...     print(type(a))
...
>>> test(1,) # does the comma mean a tuple or not?
<class 'int'>
>>> test((1)) # add brackets to make a tuple?
<class 'int'>
>>> test((1,),) # finally, now we have a single parameter that is a tuple
<class 'tuple'>

coding python tuples in kotlin

Kotlin List is close to a direct replacement for tuple, but unlike tuples and lists in python, there is no special syntax using special brackets: use “: List” for the type, and “listOf()” to instance the list.

var (a1,b1) = listOf(1,2) // destructured assign a1 = 1, b1 = 2
var (a2,b2) = Pair(1,2)  // alternative destructured assign
data class XY(val x:Int, val y:Int)
var (a3,b3) = XY(1,2)  // destructured assign using data class

Note the ‘listOf()’ syntax to create a literal, but the type is ‘List’ in a declaration.

Even in a declaration, a list can be used as a ‘drop in replacement’ for a tuple.  The syntax of declaring from a list is not as brief as python, and it is not really within the kotlin ‘idiom’ to use List for a destructured declaration.

In reality, a destructured declaration is a clear indicator that each element of the data is distinct in nature, and not really a collection.  The most common alternative to a data class in kotlin for destructured declarations is the ‘Pair’ or ‘Triple’, which are actually data classes, but without usage specific property names.

So while List is a direct equivalent to tuple, consider Pair or even a data class as the best substitute depending on the usage.

list -> MutableList

The python list can also be used in situations that are not really usage as a collection, but this tends to occur less than with tuples. If the indexes to the list are literals, or the items in the list are not all of the same type, then consider if the list should be replaced by something other than a MutableList.  A true collection will be indexed mostly within loops. Use ‘mutableListOf()’ to instance a MutableList type.

common list operations:

val myList = mutableListOf(1,2) // myList cannot be reassigned, but the list is mutable
val added = myList.add(33)  // add the value 33 to the (end of the) list
//  note 'add' returns true if successful
myList.add(1,55) // insert into list at index 1 (2nd location)
//  list is now 1,55,2,33
myList.remove(55)  // find the first entry matching 55 and remove it (returns true if successful)
//  list is now 1,2,33
val popped = myList.removeAt(2) // remove the value found at index 2 (here 33) and return that value

dictionary -> Map or MutableMap (or data class)

The direct replacement for dictionary is the MutableMap, but with no python equivalent to the Map, dictionaries are often used as maps.  If the dictionary is declared with literal values in place, then a Map (declared as mapOf(key to value)) will be the replacement.  If the dictionary is declared empty, then a MutableMap (declared with mutableMapOf()) is likely to be the substitute.   Take care that dictionaries with literal strings for keys are probably really object substitutes and best replaced by a data class.

Common map operations.

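val myMap = mutableMapOf<String,Int>()
//declared an empty map, type cannot be inferred without data
myMap.keys      // a property, not a function call: equivalent to python .keys()
myMap.values    // also a property: equivalent to python .values()
myMap.toList()  // list of Pair(key, value): equivalent to python .items()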

‘*’ and ‘**’

In python, any iterable can be used to provide positional parameters to a function using the ‘*’ prefix, and any dictionary can be used to provide keyword arguments using the ‘**’ prefix.  Kotlin has the concept of ‘vararg’ parameters, which can have values provided by the ‘*’ prefix (the spread operator) applied to an array, much as with python.  However, there is currently no equivalent to the ‘**’.  (More notes on the ‘*’ to be added.)
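For reference, the python side of this looks like the following sketch (the function and variable names are made up for the example):

```python
def describe(first, second, sep=", ", end="."):
    """Join two values using the given separator and ending."""
    return f"{first}{sep}{second}{end}"


values = ["tea", "coffee"]               # any iterable can supply positional arguments
options = {"sep": " and ", "end": "!"}   # a dictionary can supply keyword arguments

print(describe(*values))             # prints tea, coffee.
print(describe(*values, **options))  # prints tea and coffee!
```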

list comprehension and other iterations vs map, filter, reduce

Python added map, filter and reduce together with lambda around 1994, and then list comprehensions around 2000 with python 2.0. In that link Guido (creator of python) presents strong arguments for list comprehensions, but notes that some people have suggested that limitations of the python lambda syntax are part of why comprehensions are favoured in python.  An argument is also presented on performance advantages, which in reality applies to python, but not to languages such as kotlin where compilation can support inline functions and other optimizations. In such languages, efficiencies depend on the compiler, not the technique itself.

Iterations vs map filter reduce, in both performance and appeal, is an argument that comes down to implementation details and personal taste.  Python has much stronger implementation of iterations in terms of both performance and appeal.  Kotlin has a map implementation with more performance than python list comprehensions,  but appeal is a personal choice.  Clearly, kotlin has better lambda than python, and that gives better map and filter, but comparing kotlin map and filter to the preferred python technique of iterations, cannot objectively produce a winner.   Just take the move from iterations in python to map and filter in kotlin with an open mind.

# first python
new_list = [map_func(it) for it in old_list]
//now kotlin
val newList = oldList.map{ mapFunc(it) }
#python for squaring list
new_list = [it * it for it in old_list]
//and kotlin
val newList = oldList.map{ it*it }

# now version filtering out odd numbers
new_list = [it * it for it in old_list if it % 2 == 0]
//kotlin
val newList = oldList.filter{ it % 2 == 0 }.map{ it*it }

With kotlin, a more complex, multi statement expression is possible without resorting to a ‘mapFunc’ (an external function to calculate the new value), but the two systems are similar.
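For comparison, python can also be written in the map and filter style using lambda. The sketch below (with a made up `old_list`) shows both python forms giving the same result; which reads better is the matter of taste discussed above.

```python
old_list = [1, 2, 3, 4, 5, 6]

# the preferred python technique: a list comprehension
with_comprehension = [it * it for it in old_list if it % 2 == 0]

# the same calculation in map and filter style, using lambda
with_map_filter = list(
    map(lambda it: it * it, filter(lambda it: it % 2 == 0, old_list))
)

print(with_comprehension)  # prints [4, 16, 36]
print(with_map_filter)     # prints [4, 16, 36]
```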

Python added dictionary and set comprehensions in python 2.7, in around 2010. There is no tuple comprehension as such: a comprehension in round brackets in place of square brackets is a generator expression, which can be passed to tuple() to produce a tuple, and otherwise reads identically to a list comprehension.  In kotlin, map syntax is unchanged between MutableList (python list equivalent) and List (python tuple equivalent).

Where the code is clearly cleaner for python is for a dictionary comprehension.  Starting with a map to produce a new map is a case I have yet to find clean kotlin code for:

#python
new_dict = {k + '2': v * 2 for k, v in old_dict.items()}
// kotlin
val newMap = oldMap.toList().associateBy({it.first+"2"},{it.second*2})

There is more optimised code for mapping the values of a map, or mapping the keys, but mapping both at once, as you can see from above, is a little clumsy.  There may be a cleaner solution, but if so, I have not yet found it.

Sets

Sets are largely the same in both languages, with kotlin once again adding an immutable variant. Sets are not used as substitutes for other things, and the uses of a set are generally the same in both languages.  Common set operations:

val mySet = mutableSetOf<Int>()
mySet.add(3) // add value 3 to set

Arrays.

Lists, Maps and Sets, as well as data class objects, are all objects stored as described in the variables and objects page: a reference is stored in static (or stack) memory to an object in dynamic memory.  The ultimate in flexibility, but not the ultimate in performance.

Arrays are effectively an alternative to a List.  Fixed in size, and constructed in place. This fixed size list has storage of data directly in static memory, at the expense of flexibility.  For Arrays of basic types, think of allocating a block of memory to hold a fixed number of items of the basic type.  If there are 10 four byte integers, then 40 bytes are required, and each integer can be addressed directly with no lookup required.  For other types, the static memory will be a block of references to each object in dynamic memory.

In addition to efficiency, Arrays also allow interoperability with Java programs which make use of these data structures.  I will return to this section, but for now, these are features which have no direct equivalence in python.

Data Classes: Alternative to ‘Faux Collections’?

In coding solutions to problems, the choice of how to store the data can be between objects and lists and dictionaries.   Kotlin data classes can change which is the best choice.  This page examines just how tuples, dictionaries and lists can be used as a ‘faux class’, and when to drop the ‘faux class’.


The ‘struct’ problem: precursor to class?

The ‘c’ language has the concept of a ‘struct’, which is a container for related, but not homogeneous, data.  Consider the following information about a person:

  • first name
  • last name
  • age
  • city

As long as the age is kept in string form, ‘c’ could keep this information as an ‘array’ of 4 strings, but referring to last name as ‘person[1]’ is far from ideal, and there is that problem of needing to keep age as a string.   The struct provides an improved solution, with descriptive names for the elements and types for each individual field within the struct.  In c, structs can be passed by value (which means copied) as well as by reference, but comparison, ‘toString’ or other functions all have to be built separately.   The real lesson here is that every possible data type has a set of required methods.  In essence:  all data is an object.

Java: forced class hypocrisy

Java is a strange mixture.  The language was designed at a time when object oriented programming was seen as the ‘magic bullet’ to end all problems in programming. C++ provided objects bolted on to the language ‘C’, but Java sought to have ‘pure’ object oriented programs. It got the message wrong, however, and decided that ‘pure object oriented’ meant all code must be in classes, and missed that ‘all data is an object’.  The result is a language that is not really object oriented, but forces all code to be part of a class, even though the implementation of Java does not even follow this edict itself.

Background: “one obvious solution” as a barrier to object oriented programming in python.

Python itself started out with a very object oriented underlying structure, while allowing a procedural style for code written in Python.   Python appears to follow the plan that beginner programmers can embrace a procedural style, and so allows code to be procedural, often hiding object oriented underpinnings behind procedural ‘syntactic sugar’.

Programmers can learn Python with no concept of OOP, then later learn OOP as they advance.  The language concentrates on ‘one obvious way to code’, which requires that things done in a procedural way for learners should still appear procedural at all times. If you want one obvious way to solve a problem and the language allows a solution without OOP, then, at least conceptually, an object oriented solution is not that one way.

In Python, allowing beginners to code solutions without using objects usually means allowing solutions that substitute lists, tuples, named tuples and dictionaries for data which might ideally be represented as a ‘struct’ or an object.

In contrast, there has been no real work in the language to make it attractive to solve simple data requirements using classes.  Would this provide more than one logical way to solve a problem?  So data as an object still remains hard work. To start a useful object, an ‘__init__’ method, a ‘__str__’ method and a ‘__repr__’ method are all required just for basic functionality. Contrast this with named tuples, where all is done automatically!

The result is a language that allows those who have not learnt object oriented concepts to progress as far as possible without ever declaring a class. Learning classes can wait, and although all code is built using a language with great object oriented foundations, ‘faux objects’ built around collections (lists, tuples, dictionaries) are prevalent in Python code.

‘faux collections’: python objects that appear as collections to the programmer.

But list, tuple, namedtuple and dictionary can all be used to describe data which is not really a collection, pretending that objects which are not collections are collections. The danger in programming is to forget that these ‘faux collections’ are not really collections.  The ‘named tuple’, where each item in the ‘collection’ has its own name, is inherently designed for use purely as a ‘faux collection’.

List, tuple, named tuple and dictionary types are all described as collections. The concept of a collection is that all members of the collection are the same in nature.  But it is possible to use these types very effectively to describe things which are not really collections at all.  Consider some data read from a file to describe some people.  Each line of the file has ‘first name’, ‘last name’, age, and city.

So two lines of the file might be:

  • bill, smith, 23, new york
  • tom, jones, 21, san Francisco

This file represents a true collection of ‘people’ because each line holds data which is the same in nature.  The first person or the ‘nth’ person are all people.  Every element in the collection has in common that it is a person.  But what do ‘first name’ and ‘age’ have in common?  The ‘collection’ of ‘first name’, ‘last name’, ‘age’ and ‘city’ can be held in a collection, but this is a ‘faux collection’.

In Python:

people = []
with open("names") as lines:
    for line in lines:
        # split each line into [first name, last name, age, city]
        people.append([field.strip() for field in line.split(",")])

This would generate a list of people, but each person would be a list, where person[0] is the first name, person[1] is the last name, etc.   So each line is using a collection for what really would be better as an object.  We could instead have a dictionary for each person, so that person['first_name'] == 'bill' for our first person, and this may be more self-documenting than person[0].
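
The dictionary alternative might look something like the sketch below. The FIELDS tuple and the parse_person helper are hypothetical names chosen for this example, and the two sample lines stand in for the contents of the file so that the sketch runs on its own.

```python
FIELDS = ("first_name", "last_name", "age", "city")  # hypothetical key names


def parse_person(line):
    """Turn one comma-separated line into a dictionary for one person."""
    values = [field.strip() for field in line.split(",")]
    person = dict(zip(FIELDS, values))
    person["age"] = int(person["age"])  # age no longer has to stay a string
    return person


# Sample lines in the shape described above, standing in for the file.
lines = [
    "bill, smith, 23, new york",
    "tom, jones, 21, san francisco",
]

people = [parse_person(line) for line in lines]
first_name = people[0]["first_name"]
```

Here ‘people’ is a true collection, while each dictionary inside it is still a ‘faux collection’: the keys name fields of one person rather than members of a group.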

Python even gives named tuples, and each ‘person’ could be a named tuple.

>>> from collections import namedtuple
>>> Person=namedtuple("Person", "first_name last_name age city")
>>> person=Person("bill","smith",19,"new york")
>>> person
Person(first_name='bill', last_name='smith', age=19, city='new york')
>>> person.age
19

The named tuple works much like a simple class, with the limitation that all values are immutable. Like the ‘c’ struct, the elements again have names, but more methods, such as the equivalent of ‘toString()’, are already available.
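
A short sketch of what that immutability means in practice, reusing the Person named tuple from the session above. The variable names changed, older and same are only for illustration.

```python
from collections import namedtuple

Person = namedtuple("Person", "first_name last_name age city")
person = Person("bill", "smith", 19, "new york")

# Assigning to a field is refused, because a named tuple is immutable.
try:
    person.age = 20
    changed = True
except AttributeError:
    changed = False

# The way to 'change' a value is to build a new tuple with _replace().
older = person._replace(age=20)

# Comparison and the string form come for free.
same = person == Person("bill", "smith", 19, "new york")
text = str(person)
```

The original ‘person’ still has age 19 after this; only ‘older’ carries the new value.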

A frequent request with Python is for a ‘named list’, mirroring ‘named tuple’, to work just like a regular class.  But why not just make a class?  The reason is that a class definition requires a lot more code, with an __init__, a __str__ and a __repr__ increasing the one or two lines required to declare our named tuple into around 11 lines of code!
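
For comparison, a hand-written class covering the same ground might look like the sketch below. The exact formats returned by __repr__ and __str__ are choices made for this example, not a standard.

```python
class Person:
    def __init__(self, first_name, last_name, age, city):
        self.first_name = first_name
        self.last_name = last_name
        self.age = age
        self.city = city

    def __repr__(self):
        return (f"Person(first_name={self.first_name!r}, "
                f"last_name={self.last_name!r}, "
                f"age={self.age!r}, city={self.city!r})")

    def __str__(self):
        return f"{self.first_name} {self.last_name}, {self.age}, {self.city}"


person = Person("bill", "smith", 19, "new york")
person.age = 20  # unlike the named tuple, the fields can be changed
```

Even with all of this written out, two Person objects holding the same values still do not compare as equal; that would need an __eq__ method as well, which the named tuple provides automatically.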

The kotlin solution: data classes

Consider this alternative to representing the ‘person’ from the previous section as a list, dictionary, tuple or named tuple.

data class Person(var first_name:String, var last_name:String, var age:Int, var city:String)

In one line we can define a class with a ‘constructor’ (equivalent to python __init__), a toString(), and even an equals() comparator and a hashCode(). Using ‘val’ in place of ‘var’ reproduces the ‘namedtuple’, but as used above it delivers on the request for a ‘namedlist’.  The Python ‘namedtuple’ is really a class definition substitute, but in Kotlin we can have an actual class just as easily.  This ease of use of a class makes many of the uses of dictionaries, lists and tuples in Python redundant, and keeps the Kotlin equivalents for use specifically as collections, and not as the ‘object substitutes’ that often occur in Python.