Data Classes: Alternative to ‘Faux Collections’?

When coding a solution to a problem, data can be stored in objects, or in collections such as lists and dictionaries. Kotlin data classes can change which is the best choice. This page examines how tuples, dictionaries and lists can be used as a ‘faux class’, and when to drop the ‘faux class’.

The ‘struct’ problem: precursor to class?

The C language has the concept of a ‘struct’, which is a container for related, but not homogeneous, data. Consider the following information about a person:

  • first name
  • last name
  • age
  • city

As long as the age is kept in string form, C could hold this information as an array of 4 strings. But referring to last name as ‘person[1]’ is far from ideal, and there is still the problem of needing to keep age as a string. The struct provides an improved solution, with descriptive names for the elements and a type for each individual field within the struct. In C, structs can be passed by value (which means copied) as well as by reference, but comparison, ‘toString’ and other functions all have to be built separately. The real lesson here is that every data type has a set of required methods. In essence: all data is an object.
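The two weaknesses of the array-of-strings approach can be reproduced in a few lines. This is only a sketch in Python (the language of the later examples on this page) standing in for the C array; the variable names and the LAST_NAME constant are invented for illustration.

```python
# A person held as four strings, mimicking the C array of strings.
person = ["bill", "smith", "23", "new york"]

# Problem 1: the index carries no meaning, so a named constant is a
# common workaround (LAST_NAME is a made-up name for this sketch).
LAST_NAME = 1
last_name = person[LAST_NAME]

# Problem 2: age is text, so any arithmetic needs a conversion first.
age_next_year = int(person[2]) + 1

print(last_name, age_next_year)
```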

Java: forced class hypocrisy

Java is a strange mixture. The language was designed at a time when object oriented programming was seen as the ‘magic bullet’ to end all problems in programming. C++ provided objects bolted on to the language C, but Java sought to have ‘pure’ object oriented programs. It got the message wrong, deciding that ‘pure object oriented’ meant all code must be in classes, and missed that ‘all data is an object’. The result is a language that is not really object oriented, but forces all code to be part of a class, even though Java itself does not follow this edict: primitive types such as int are not objects.

Background: “one obvious solution” as a barrier to object oriented programming in Python

Python itself started out with a very object oriented underlying structure, while allowing a procedural style for code written in Python. Python appears to follow the plan that beginner programmers can embrace a procedural style, and allows code to be procedural, often hiding the object oriented underpinnings behind procedural ‘syntactic sugar’.

Programmers can learn Python with no concept of OOP, then learn OOP later as they advance. Because the language concentrates on ‘one obvious way to code’, things done in a procedural way for learners should still appear procedural at all times. If you want one obvious way to solve a problem and the language allows a solution without OOP, then, at least conceptually, an object oriented solution is not that one way.

In Python, allowing beginners to code solutions without using objects usually means allowing solutions that substitute lists, tuples, named tuples and dictionaries for data which might ideally be represented as a ‘struct’ or an object.

In contrast, there has been no real work in the language to make it attractive to solve simple data requirements using classes. Would this provide more than one logical way to solve a problem? So data as an object remains hard work. To make a useful class, an ‘__init__’ method, a ‘__str__’ method and a ‘__repr__’ method all have to be written just for basic functionality. Contrast this with named tuples, where all of this is done automatically!

The result is a language that allows those who have not learnt object oriented concepts to progress as far as possible without ever declaring a class. Learning classes can wait, and although code is built using a language with great object oriented foundations, ‘faux objects’ built around collections (lists, tuples, dictionaries) are prevalent in Python code.

‘faux collections’: Python objects that appear as collections to the programmer.

List, tuple, namedtuple and dictionary can all be used to describe data which is not really a collection, that is, to pretend that objects which are not collections are collections. The danger is forgetting that these ‘faux collections’ are not really collections. The ‘named tuple’, where each item in the ‘collection’ has its own name, is inherently designed for use purely as a ‘faux collection’.

List, tuple, named tuple and dictionary types are all described as collections. The concept of a collection is that all members of the collection are the same in nature.  But it is possible to use these types very effectively to describe things which are not really collections at all.  Consider some data read from a file to describe some people.  Each line of the file has ‘first name’, ‘last name’, age, and city.

So two lines of the file might be:

  • bill, smith, 23, new york
  • tom, jones, 21, san francisco

This file represents a true collection of ‘people’ because each line holds data which is the same in nature.  The first person or the ‘nth’ person are all people.  Every element in the collection has in common that it is a person.  But what do ‘first name’ and ‘age’ have in common?  The ‘collection’ of ‘first name’, ‘last name’, ‘age’ and ‘city’ can be held in a collection, but this is a ‘faux collection’.

In Python:

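people = []
with open("names") as lines:  # "names" is the data file described above
    for line in lines:
        # split on commas and trim the spaces around each field
        people.append([field.strip() for field in line.split(",")])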

This generates a list of people, but each person is itself a list, where person[0] is the first name, person[1] is the last name and so on (every field, including age, is a string). So each line is using a collection for what really would be better as an object. We could instead have a dictionary for each person, so that person['first_name'] == 'bill' for our first person, which may be more self documenting than person[0].
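The dictionary variant mentioned above might look like the following sketch. The key names in FIELDS and the in-memory sample text standing in for the ‘names’ file are assumptions made so the example runs on its own.

```python
import io

# Stand-in for the "names" file so the sketch runs without a file on disk.
sample = io.StringIO(
    "bill, smith, 23, new york\n"
    "tom, jones, 21, san francisco\n"
)

# Key names are chosen for illustration; the file itself has no header row.
FIELDS = ["first_name", "last_name", "age", "city"]

people = []
for line in sample:
    values = [field.strip() for field in line.split(",")]
    # Pair each key with the value in the same position.
    people.append(dict(zip(FIELDS, values)))

print(people[0]["first_name"])
print(people[1]["city"])
```

The names are now self documenting, but age is still the string '23', and nothing stops a misspelt key from being added later; the dictionary is still a ‘faux collection’.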

Python even provides named tuples, and each ‘person’ could be a named tuple.

>>> from collections import namedtuple
>>> Person=namedtuple("Person", "first_name last_name age city")
>>> person=Person("bill","smith",19,"new york")
>>> person
Person(first_name='bill', last_name='smith', age=19, city='new york')
>>> person.age
19

The named tuple works much like a class, with the limitation that all values are immutable. Like the C struct, the elements again have names, but methods such as the equivalent of ‘toString()’ (‘__repr__’ in Python) are already available.
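The immutability limitation can be seen directly. The sketch below reuses the Person named tuple from the session above; the variable name ‘older’ is invented for illustration.

```python
from collections import namedtuple

Person = namedtuple("Person", "first_name last_name age city")
person = Person("bill", "smith", 19, "new york")

# Fields are read-only: assignment raises AttributeError.
try:
    person.age = 20
except AttributeError:
    print("age cannot be reassigned")

# _replace builds a new tuple with one field changed; the original is untouched.
older = person._replace(age=20)
print(person.age, older.age)

# It is still a tuple underneath, so positional access and unpacking work.
first, last, age, city = person
print(person[0] == person.first_name)
```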

A frequent request with Python is for a ‘named list’, mirroring ‘named tuple’ but mutable, to work just like a regular class. But why not just make a class? The reason is that a class definition requires a lot more code: adding an __init__, a __str__ and a __repr__ turns the one or two lines required to declare our named tuple into around 11 lines of code!
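For comparison, a hand-written class with the same four fields might look like the sketch below. What __str__ prints is an assumption made for illustration; the point is only the amount of code compared with the one-line named tuple.

```python
class Person:
    def __init__(self, first_name, last_name, age, city):
        self.first_name = first_name
        self.last_name = last_name
        self.age = age
        self.city = city

    def __repr__(self):
        return (f"Person(first_name={self.first_name!r}, "
                f"last_name={self.last_name!r}, "
                f"age={self.age!r}, city={self.city!r})")

    def __str__(self):
        # An arbitrary human-readable form, chosen for this sketch.
        return f"{self.first_name} {self.last_name}, {self.age}, {self.city}"


person = Person("bill", "smith", 19, "new york")
person.age = 20  # mutable, unlike the named tuple
print(repr(person))
print(person)
```

Even at this length the class has no __eq__, so two Person objects with identical fields do not compare equal, something the named tuple provides automatically.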

The Kotlin solution: data classes

Consider this alternative to representing the ‘person’ from the previous section as a list, dictionary, tuple or named tuple.

data class Person(var first_name:String, var last_name:String, var age:Int, var city:String)

In one line we can define a class with a constructor (equivalent to Python's __init__), a toString(), and even an equals() and a hashCode(). Using ‘val’ in place of ‘var’ reproduces the ‘namedtuple’, but as used above it delivers on the request for a ‘namedlist’. The Python ‘namedtuple’ is really a class definition substitute, but in Kotlin we can have an actual class just as easily. This ease of use makes many of the ways dictionaries, lists and tuples are used in Python unnecessary in Kotlin, and keeps the Kotlin equivalents for genuine collections, rather than the ‘object substitute’ usage that often occurs in Python.
