TDD or Not TDD? That is the question!

What actually is TDD (Test Driven Development)? Is TDD Dead?

Do you use this term for when tests actually drive development, or use the label TDD for the practice of ensuring code coverage by having unit tests? TDD can be taken to mean something different from its original meaning, and there are some risks from that shift in meaning.

I was recently searching online for discussion of TDD, and was surprised to find many pages describing TDD as simply ensuring unit tests are in place, while other pages use TDD to refer to tests actually driving development. This difference in definition results in considerable confusion.

This page looks at what is accepted as best practice today, how that fits with the original meaning of TDD, the dangers and problems that can result, and already have resulted, from the shift in meaning of TDD, and what is dead and what is not dead.

Terminology

Unit Test

It is generally assumed that a reader of this page will know what a ‘unit test’ is, but for clarity, a unit test is a program function that sets up specific inputs and then calls a target software ‘unit’ in order to verify that the output of the target software unit is as expected when given those specific inputs. A software unit could be a function, a class, a module or even an overall software package.
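As a minimal sketch of that definition, here is a unit test written in Python. The language is chosen for illustration only, and both add and the test name are hypothetical rather than taken from any project discussed on this page. The test sets up specific inputs, calls the target unit, and checks the output:

```python
def add(a, b):
    """The target software 'unit': here, a single function."""
    return a + b


def test_add_two_positive_numbers():
    # Set up specific inputs...
    a, b = 2, 3
    # ...call the target unit...
    result = add(a, b)
    # ...and verify the output is as expected for those inputs.
    assert result == 5


# A test runner would normally discover and call this; here it is called directly.
test_add_two_positive_numbers()
```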

Unit Tests

‘Unit Tests’, plural, or perhaps even clearer (but longer) a ‘unit test suite’ denotes a set of unit tests that should contain sufficient individual tests to infer that a software ‘unit’ will perform as expected, for each possible combination of inputs that the software unit under test could be expected to encounter in normal use.

TDD (Test Driven Development)

There is no universally agreed meaning of TDD. There is the original meaning by Kent Beck, and some say even Kent has changed his ideas, as we all do, but the original meaning is the only one in a book, so on this page I will tend to use that original meaning, except where I specifically discuss how people take TDD to mean something different.

From the original meaning, TDD is using tests to drive development. Such tests are specifically created not to form a test suite, but to enable software design and development. Some tests created during Test Driven Development are useful for a test suite, some may become redundant once software has been developed, and the TDD process does not automatically result in a complete set of Unit Tests.

Assertion Test.

This is a term introduced here, and it can help with reading this page if nothing else. Unit tests can have one or more assertions. These assertions should together make a cohesive unit test, and that is discussed on another page. In the following examples, Uncle Bob sometimes says he is adding a new unit test, when in fact he then adds a new assertion to an existing unit test. How many assertions does it take to make a unit test? Ideally one, but in the real world it may take more. When this page refers to an assertion test, it means an individual (assertion) component of a unit test, which it could be confusing to describe as a unit test.

Common to both TDD tests & Unit Tests (Test Suites)

Tests: The Only Real Specification

What does a program actually do? It passes the tests.

Any other specification is what someone believes the program should do, not what the program actually does.

A program is measured by its tests, and the results of those tests are the only real specifications. Confusingly, sometimes design goals are described as specifications.

Consider the specification of a camera or a car. Almost all specifications are established by measuring the values that are specified, e.g. engine power in horsepower or kilowatts. Certainly, the measured value may match the value that was the design goal, but if, for example, the car had a design goal of 110 kW engine power but is actually measured to produce 105 kW, it is only the measured value, not the design goal, which can be quoted as the product specification. If the design goal was quoted as a specification, a customer would feel misled.

A program is measured by its tests, and the result of those tests is the real specification.

Easily Repeatable Automated Tests Are Best.

Some code is difficult to test automatically. How, for example, do you test a function within a program that prints? For some code it is simply far easier to run the program and see what prints. In almost all cases, a system redesign to allow an automated unit test is the only satisfactory solution. Unit tests can even be presented as a system specification.
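One possible shape for such a redesign, sketched in Python with hypothetical names (format_greeting and print_greeting are illustrations only): separate building the text from printing it, so the text can be checked directly, and capture standard output when testing the thin printing wrapper.

```python
import io
from contextlib import redirect_stdout


def format_greeting(name):
    """Pure function: builds the text, so it is trivial to test."""
    return f"Hello, {name}!"


def print_greeting(name):
    """Thin wrapper: the only part that actually prints."""
    print(format_greeting(name))


def test_format_greeting():
    assert format_greeting("Ada") == "Hello, Ada!"


def test_print_greeting():
    # Capture what would have gone to the screen and compare it.
    buffer = io.StringIO()
    with redirect_stdout(buffer):
        print_greeting("Ada")
    assert buffer.getvalue() == "Hello, Ada!\n"


test_format_greeting()
test_print_greeting()
```

Most of the behaviour now sits in the function that returns a value, and the printing wrapper is small enough that one captured-output test covers it.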

A Failing Test Before Any Production Code.

No code should ever be written without first predetermining what the code should do.  This simply means do not start a task without first deciding what constitutes completing that task. For unit tests, add the unit test before the code is in place (if the production code already exists, still run the test before including the code in the system). For TDD as originally proposed, the test should be added before the solution has been determined.

TDD vs Unit Tests

A TDD Example with ‘Uncle Bob’

The following video of a talk by Uncle Bob is very useful, but quite long, so the main points will be discussed here without the need to watch the entire video. Consider now the video from 24m05s through to 42m00s.

A total of 10 assertion tests are created. The first 9 assertion tests are best described as TDD tests, with the 10th test the only actual unit test assertion. This is because, as the story unfolds in the video, assertion tests 1 through 9 are all created without first creating the algorithm. There is no algorithm other than what emerges as a result of incrementally adjusting code to pass tests. These tests drive development of a solution to the requirements of each test. Test 10 (line 18) fits the definition of a conventional unit test. The algorithm code already exists and works before this last test is written, and this test never exists as a failing test.

In fact it could be argued that all of the first 9 assertions are no longer required once test #10 is added. It could also be argued that the first test at least helps with documentation. Perhaps even the first and second tests add to the explanation of the code, but clearly having an assertion test for every value from 1 through 9 is somewhat redundant.

At the other extreme, test cases such as factoring 0 (zero), or negative numbers, are not considered. Sufficient tests to drive the development do not automatically ensure a full set of tests for all cases, and can leave some tests that are not really required once the development is complete.
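For readers who skip the video, the following is a rough Python rendering of where the example ends up. The video uses Java; the name factors_of, the layout, and the expected values listed for 1 through 9 are assumptions made for this sketch, not a transcript.

```python
def factors_of(n):
    """Prime factors of n in ascending order, by plain trial division."""
    factors = []
    divisor = 2
    while n > 1:
        while n % divisor == 0:
            factors.append(divisor)
            n //= divisor
        divisor += 1
    return factors


def test_factors():
    # Assertions 1 to 9: each was added while failing, and drove a small code change.
    assert factors_of(1) == []
    assert factors_of(2) == [2]
    assert factors_of(3) == [3]
    assert factors_of(4) == [2, 2]
    assert factors_of(5) == [5]
    assert factors_of(6) == [2, 3]
    assert factors_of(7) == [7]
    assert factors_of(8) == [2, 2, 2]
    assert factors_of(9) == [3, 3]
    # Assertion 10: added once the algorithm already worked, so never seen failing.
    assert factors_of(2 * 2 * 3 * 3 * 5 * 7 * 11 * 11 * 13) == [2, 2, 3, 3, 5, 7, 11, 11, 13]


test_factors()
```

In this sketch factors_of(0) and factors_of(-6) both quietly return an empty list, which is behaviour that none of the ten assertions ever specified.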

Unit Test Without TDD Example

TDD or not, there is an important rule that the test should be in place before the code to be tested is in place, which enables verification that can fail, but that requirement does not make the test drive the solution. In fact, if the solution is obvious, the solution will drive the test.

Clearly, at least by the time of the example video, Uncle Bob actually knew in advance how to code the solution to prime factors. If you are Uncle Bob and already know how to code the solution, why not move directly to test #10? The advantage of using tests to drive development is that you can build up to the solution by adding new test cases, while having certainty that previous functionality still works. A solution can be developed step by step, with the increasing set of tests providing certainty that every previous step is not being broken. But what is the point of those steps if you already know the complete solution? In that case, why not just create tests that validate the overall solution?

If you have an algorithm at the outset, then you could move directly to test number 10, factorsOf(2x2x3x3x5x7x11x11x13), and bypass all the simplistic tests 1 through 9, which test cases so simple that if any of them failed, test 10 would fail anyway.

Benefits and Limitations of TDD.

Benefits

The promise of TDD is that the problem can be reduced to the simplest solution that passes the required tests. When a complete solution seems challenging, instead of being locked out by the design challenge, development can commence immediately and build the solution piece by piece. In the Uncle Bob example, a solution to factorsOf() arises from the tests without any formal design process. In the late 90s, when Kent Beck and others first developed TDD, this seemed like magic. Not only did solutions arise without a formal design process, it was said that elegant solutions could arise from testing. It seemed all solutions could be provided this way, something which most proponents (including Uncle Bob, as discussed below) have since come to realise is not true. Design driven from tests can solve problems not solved otherwise, but it simply is not an optimum solution, or even a solution, for every problem.

Limitations.

There is a famous quotation: ‘a camel is a horse designed by a committee’. The implication is that when design tasks are split, an elegant overall design can be missed. Consider the factorisation function called with 101: factorsOf(101)

The main loop will test whether every number from 11 through 100 is a factor of 101, when once 11 is reached (where 11×11 > 101) it is already clear that the number is prime. No number between 11 and 100 need be tested. Perhaps development driven by tests would never discover this inefficiency?
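One way to remove that inefficiency, sketched in Python on the assumption that the algorithm in the video is a plain trial-division loop (factors_of_bounded is a hypothetical name): stop as soon as the divisor squared exceeds what remains of n, and treat any remainder greater than 1 as prime.

```python
def factors_of_bounded(n):
    """Prime factors of n, stopping trial division at the square root of the remainder."""
    factors = []
    divisor = 2
    # If divisor * divisor > n, no divisor of this size or larger can split n further.
    while divisor * divisor <= n:
        while n % divisor == 0:
            factors.append(divisor)
            n //= divisor
        divisor += 1
    if n > 1:
        # Whatever is left has no smaller factor, so it is itself prime.
        factors.append(n)
    return factors
```

For 101 the outer loop tries only 2 through 10, then the remainder 101 is appended as prime. None of the tests in the example would fail against the slower version, which is why tests alone are unlikely to prompt this change.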

Balancing Benefits and Limitations.

A solution arrived at through tests will not always be better than a solution planned by studying the overall problem. The best approach is to consider both methods and compare solutions. Driving to a solution through tests can break through when no overall solution is clear, but in the end very few software projects are as simple overall as the factorsOf example. Most often it is only parts of the solution that will have an immediately clear solution.

Solutions where possible should start with an architecture, but as code is built and tested the results allow for redefining the architecture.

In some ways, the only difference between an immediately apparent solution and a solution driven by steps is the size of the steps. The factorsOf() project could actually be tackled as a single step, with a single test to be passed. But if the solution is not apparent, then break it into steps and incrementally add tests.

Most software projects are more significant than ‘factorsOf’ and are too large to be developed in one step before testing. They should be broken into steps, but should those steps be broken into smaller steps?

The balance between driving to a solution with staged tests and simply testing for the end result comes down to choosing the right sized steps to tackle as a single step.

The full original TDD has its place, but a more balanced development process should be taken overall.

The Three Rules of ‘TDD’?

Newton created three laws of motion. There are three laws of thermodynamics. Hey, even Isaac Asimov got to write three laws, so why not Uncle Bob? Note there are questions as to which definition of TDD these three rules apply to. But in the case of both thermodynamics and Isaac Asimov, later review resulted in a more fundamental ‘zeroth’ law, so perhaps some review of Uncle Bob’s laws is also acceptable? Uncle Bob compares his laws to procedures that surgeons treat as ‘law’. Although failure to follow the pre-surgery procedures suggests a surgeon is unprofessional, it should also be considered that following the procedures does not ensure a surgeon is a good surgeon. Following the laws for TDD alone will not ensure code is quality TDD code.

1. No production code without a failing test.

Recall that a test is a tangible specification, and at least at one level, this law should seem axiomatic. It could be translated as ‘have some specification of what you are going to code before you code, and you should not bother coding if the specification is already met’.

For example, suppose you set out to write a program that prints the national flag. Your test might be ‘when I run it, what it prints should look like the national flag’. The test is very subjective, and could be considered an ad-hoc test, and it is very hard to automate, but it is a test. There should always be a test before you write any code.

It is very important that the test is a unit test. However, in the rare cases where a unit test is not practical, having a test that is as concrete as possible is still essential. The clearer the specification, the better. A project can be started without a concrete overall specification, but at the very least each stage should be specified before that stage is commenced. The specification, and hence the test, can still have flexibility, but how much flexibility, and deciding what test(s) to apply, is critical.

I suggest this law is essential to any software development. No production code without a failing test, and unless there is a very sound reason why it is impractical, that test should be a unit test.

2. Apply tests one at a time, in the smallest increments possible

I have changed this ‘law’, and in fact still do not regard it as a clear ‘law’, but more of a goal. The goal is hard to word with the precision required for a ‘law’, and it is more difficult to determine when it is being broken or followed. The original wording from Uncle Bob, ‘You are not allowed to write any more of a unit test than is sufficient to fail, and compilation failures are failures’, has two problems. Firstly, it is open to being read as making mandatory the very part of the original Kent Beck definition of TDD that Uncle Bob is on the record as saying is ‘horseshit’ (more on this later on this page). Secondly, the wording is open to different interpretations.

The original Kent Beck definition of TDD would require strict adherence to tests driving all development, including design. The code to meet test number ‘n’ for a system (test = specification) must be in place prior to writing test number ‘n+1’ (the next specification). Strictly adhering to this principle would mean that if someone says to you, “I want a new program, and it must do these three things…” you would stop them and say, “No, wait, I can only record one specification detail at a time! Wait until the code is in place for the first thing, before considering any further functionality!”. Normal convention would suggest that if it is planned that there are three things the program should do, surely what those three things are can be written down. If you have good tools, the best way to record those three ‘things’ or specifications is to record them as tests. Then those tests can still be activated one at a time, and that is what should be done. Appropriate TDD is to activate tests on the code incrementally, one at a time, but actually recording them ahead of time should not be banned. It is still possible to amend the specifications/tests as the system develops, without banning writing down suggested specifications/tests ahead of time in any form, either as code or in any other language form.

The second problem with the ‘law’ is that words are open to interpretation. What exactly is sufficient to fail? Perhaps ‘sufficient to be used as a failing test’ makes more sense? And what does ‘write’ mean? If a future test occurs to you ahead of time, should you never write it down? In practice, there should be some way of recording tests that are not to be applied yet, even if it means commenting them out, or preferably marking them as ‘future’ or with some agreed notation. With the factorsOf() example as explained and coded in the video, one assert at a time makes sense. But if you know the solution, in which case there are too many asserts in the example, then adding all the asserts you do need, before adding code that should pass all of them immediately, simply makes sense. In fact, in the example, the last assert could be interpreted as several tests in one, but it is still practical.
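As one possible ‘agreed notation’, Python's standard unittest module lets a test be written down now and marked as skipped until it is activated. The sketch below is an illustration only: factors_of is an assumed Python rendering of the video's function, and the rule that zero should be rejected with ValueError is a hypothetical future specification, not something from the video.

```python
import unittest


def factors_of(n):
    """Prime factors of n in ascending order, by plain trial division."""
    factors = []
    divisor = 2
    while n > 1:
        while n % divisor == 0:
            factors.append(divisor)
            n //= divisor
        divisor += 1
    return factors


class FactorsOfTest(unittest.TestCase):
    def test_active_specifications(self):
        # Tests already applied to the code.
        self.assertEqual(factors_of(1), [])
        self.assertEqual(factors_of(2), [2])

    @unittest.skip("future: recorded ahead of time, not yet applied to the code")
    def test_zero_is_rejected(self):
        # Hypothetical future specification; remove the skip marker to activate it.
        with self.assertRaises(ValueError):
            factors_of(0)


def run_suite():
    """Run the tests and return the result object, which lists skipped tests."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(FactorsOfTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

The skipped test is reported on every run, so the recorded specification stays visible without yet driving any code.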

3. Once there is code that passes tests, do not progress before considering tests for other conditions of the code just added.

Ok, this is not what Uncle Bob said in his laws (although it is followed in his example).  It could be claimed that this is about sound unit tests rather than under the heading TDD, but different people have different interpretations of terminology.

Uncle Bob’s third law is stated as: You are not allowed to write any more production code than is necessary to pass the one failing test. This to me is simply restating the first law: don’t write production code without a failing test. Once the test passes, you no longer have a failing test. This rule describes what you should not do once production code passes tests, but rather than a reminder of law 1, perhaps consider what you should do once production code passes tests. What you should do is think of other tests that are needed for that code. In the factorsOf() example, Uncle Bob adds his final test exactly as described here. What other tests are needed? In this case the factorsOf(2x2x3x3x5…) test is added. This test never fails, which shows Uncle Bob actually follows this amended third law.

The Confusion: Is TDD Dead?

At least three interpretations of the term ‘TDD’ are in use, including:

  1. The Original Kent Beck Full Concept of Using Tests to Drive Development (including design)
  2. Never Code without a failing test
  3. Any Use of Unit Tests is TDD

With such variation of meaning, confusion sets in. One expert, who is using definition number 2, declares “any development not using TDD is unprofessional”. Then another expert, hearing the statement but themselves using definition #1, responds “TDD has some uses, but more elegant designs can result from not using TDD”. Then a third, non-expert, hears that second statement, but connects the statement with definition #3 and declares “experts declare that Unit Tests block the writing of quality software”.

You can see this play out over and over on the internet. You will see people claiming TDD is essential and others claiming TDD is dead, without the posters ever checking what exactly either those they are debating with or their sources actually mean by TDD.

Here is Uncle Bob declaring that a key original idea of TDD is ‘horseshit’. Promoting a new definition of TDD has the problem, as pointed out by Jim Coplien, that people will find the original definition from the books and talks defining the topic, and believe that the original idea is what they are being instructed to do.

Is TDD dead?

One of the original ideas within the original definition of TDD, that building all system architecture from tests will always produce the best solution, is indeed dead. Nothing else about the original TDD idea is dead. Unit tests are not dead, and building tests before coding is certainly not dead. Requiring all design to originate from tests is the only part of TDD that is dead. Building architecture from tests is also NOT dead, but it is now recognised that it will often not build the best architecture and is just one alternative, no longer a mandate. It has since been realised that traditional system design still makes sense, and is still needed. TDD is now usually redefined so as not to include that one dead idea, and as such TDD is not dead, just the one idea that went too far. In fact TDD is redefined to mean many different things. Redefining TDD as something new, like TDD = unit tests, and then declaring this redefined TDD dead is just confusing.

I have even seen more than one debate, as with the example already quoted from, where the against-TDD speaker effectively concedes that TDD as defined by the pro-TDD speaker does make sense, and that it is one specific part of the original definition that is dangerous. Arguments for and against TDD tend to arise from different interpretations of just what TDD actually means, and which definition different people are using.

Conclusions.

Different definitions of what TDD means are in circulation. Before considering any point of view on TDD, it is advisable to check how the source of the opinion is interpreting the term TDD. The originators of TDD did get ‘carried away’ with the capabilities, which are very useful, but those original ideas should not be made into laws.

Code should only be written with a test first identified, and unless there is a very good reason otherwise, that test should be a unit test.

Driving Development by Tests is useful, especially for specific detailed problems, but is not a practice that provides all the answers and may not answer the big picture of what is required.

In all cases, production code should only be written with a test first identified, and unless there is a good reason why not, that test should be a unit test.

Neither full TDD, nor writing code only to failing tests,  will automatically result in a full Unit Test suite.

Building DSLs: Why, When & How?

As outlined on ‘what is a DSL’, both the intent and implementation of DSLs vary considerably. The two types of internal DSL are most relevant to these pages, and how to implement an external DSL is ‘off topic’ for these pages, but the goals of external DSLs and a DSL-Full are the same, so the why is discussed.

Topics:

  • Why Create A DSL
    • Why Extend an Existing Language?
    • Why Create a New Language?
  • When
    • When to extend a language? (almost always)
    • When does an extension become a DSL? (Less often)
    • When to create a Detached DSL (very rarely)
  • How to build a DSL?
    • Names
    • DSL building Tools

     

     

Why Create (A) DSL?

Semantically there is a difference between ‘domain specific language’ and ‘a domain specific language’. Domain specific language may be just new words, but ‘a domain specific language’ implies a complete new language.

Why Extend an Existing Language?

As discussed in ‘language’, a language becomes extended in order to allow more precise and concise expression of ideas.

Having a new word which replaces a list of phrases is the human language equivalent of, for example, a function that contains several statements. With human language, if the concept conveyed by the new word is used rarely, then it may be better to state the list of phrases than to use a new word which few people will bother to learn. In the computing example, the function is not really new language if people using the function have no reason to remember exactly what the function does. A function used only once is for that reason not really new language. The function would need to be used often, or to describe a concept that is itself a building block for another concept, so that the function allows understanding of an even more complex function. If either of these tests is met, there is reason for functions or other language building blocks to be considered new language.

Why Create a new language (Kotlin DSLs).

The previous section described why to create new language that extends an existing language with new vocabulary, but creating a new language is a much bigger step, and requires further justification, especially considering that learning a new language is a far greater barrier than learning extensions to a language already known.

There are two possible reasons:

  • There may be no common language starting point for those using the new language; specifically, those learning the resulting detached DSL should not need to know the host language used to build the detached DSL.
  • The detached DSL (or semi-detached DSL) may be created to allow expression in a form already known by users of the DSL. For example, kotlinx.html is designed to reflect HTML and allow representing HTML data structures within Kotlin, and allows those data structures to look more familiar to people who already know HTML (which is itself a language).

If moving to Kotlin from Python, there are new ways to create a detached DSL, and the temptation is to overuse this new capability.  It can be important to consider the real reasons for creating a detached DSL.

When?

Can a program ever avoid extending a language?

DSL methodology suggests that almost all programs define new language, at least for the domain of that program.  This is a parallel to the way any novel defines some specific language, even if it is only the names of the central characters.  In neither of these two cases is it actually usually thought of as ‘new language’, but rather, simply the way language is always being extended.

In a novel, the equivalent to a DSL typically only results if the novel is written for a specific domain, such as with science fiction or fantasy novels where the ‘universe’ described is in some way different from a universe already familiar to the reader. A new ‘universe’ is the building of a setting for the novel, and that setting could equally be used for other novels.

Almost every program will deal with some concepts not already familiar to a given reader. While this can be minimised by keeping to language features and packages already widely known, there will still be variables and functions which need to be defined. However, even though variables are defined, well chosen names can result in no further definition being needed. For example, the variable ‘i’ when used as a loop iterator.

A program can keep new language to a minimum, but almost every program will create some new language, which means two important guidelines should be followed:

  • craft definitions so that the new definitions work linguistically
  • consider when to move new language to a separate DSL

When does ‘additional language’ become a DSL?

The formula for creating a DSL

The quick answer would be that anytime code is moved from the main application to a separate package it could be considered a DSL. However,  simply moving code to a separate package does not ensure a DSL that is actually workable as a DSL, or could be accurately described as a DSL.

There are two additional requirements of  the code moved into a separate package:

  • the code must be sufficiently independent of the ‘donor’ application to be used fully by other independent applications, and must be able to be documented and understood without the need to understand the original application
  • The code moved must be cohesive, and not simply a set of independent utilities.  Although a set of independent utilities may be shared by several applications, if such utilities are entirely general purpose in nature then they do not constitute ‘domain specific language’.  Such utility libraries can still be useful, but being general in nature require wider uptake as general language enhancements to gain acceptance across the wider use case.

 

Metaprogramming and Other Magic.

It is common to think of metaprogramming as a clear indicator of a package being best described as a DSL. Metaprogramming (and other DSL tools) can be like super powers that give a language extension supernatural abilities. The Python Briefly DSL package specifically mentions metaprogramming in the first few lines of the description. The definition of metaprogramming is where programs manipulate code as data. Since the code can be manipulated as data, this allows changing what is meant by any given code, which allows that code to take on new meaning.

Metaprogramming allows code to differ from or become detached from the host language. The reality is that metaprogramming allows creation of detached or semi-detached DSL features, but such features will most often come at the expense of the DSL being more difficult to learn. Caution is needed when using features that can appear magical, because although magic is fun, part of the fun can be that it is hard to understand. Metaprogramming can make the DSL code mimic another external DSL that people already know (as with kotlinx.html) or can allow complex actions to happen in the background, as with Kivy properties or Python dataclasses, but care is needed to ensure these features are productive, rather than just fun at the expense of complexity. Metaprogramming, or treating code as data, does not itself make a DSL. Languages such as LISP always treat code as data, so is every LISP program an implementation of a DSL?
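To make the ‘actions in the background’ point concrete, here is a small Python sketch using the standard library dataclasses module mentioned above. Point is a hypothetical class for illustration. The class body declares only two fields, yet the decorator inspects that declaration and generates methods that were never written out.

```python
from dataclasses import dataclass, fields


@dataclass
class Point:
    # Only the fields are declared; no methods are written by hand.
    x: int
    y: int


# These methods exist on the class even though the class body never defined them.
generated = [name for name in ("__init__", "__repr__", "__eq__") if name in Point.__dict__]

p = Point(1, 2)               # uses the generated __init__
same = (p == Point(1, 2))     # uses the generated __eq__
field_names = [f.name for f in fields(Point)]
```

The convenience is real, but a reader who does not know what the decorator does cannot find the behaviour by reading the class, which is the learning cost described above.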

It is the new capabilities added that make a DSL useful, not how magical it appears.

Don’t almost all programs  build a DSL? No.

Even hello world? (clearly a stretch)

You could really stretch definitions and argue that even hello world builds new vocabulary. In the case of “hello world” the only new vocabulary created is available in a shell. If the program is called “hello”, then the computer gains new syntax in ‘the shell’ such that typing ‘hello’ now does something, and prints “Hello world”. The shell gains “hello” as one new word of vocabulary. This is not a DSL as the new vocabulary is not available at development time.

The reality is the functionality provided by ‘main’ is available only to the Operating System, but every other function, or class is a new building block available to main.

The Real DSL Test: Code with Classes or functions beyond main.

For any program with functions or classes outside of main,  main can make use of those functions or classes as building blocks for the code in main.  The rest of the program defines at least some specific language or ‘mini-DSL’ that gives main an extended language to work with.

Yes, main has extended language to work with, and good program design requires that extended language is well designed, but that does not make a DSL unless the extended language can be used in a domain that is broader than a single application.
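A small Python sketch of that idea, with entirely hypothetical names and data format: parse_order and line_total are the ‘extended language’, and main reads almost like a sentence written in it.

```python
def parse_order(line):
    """New vocabulary: turn 'widget,3,2.50' into a description of one order line."""
    name, quantity, unit_price = line.split(",")
    return {"name": name, "quantity": int(quantity), "unit_price": float(unit_price)}


def line_total(order):
    """New vocabulary: the price of one order line."""
    return order["quantity"] * order["unit_price"]


def main(lines):
    # main is written in terms of the vocabulary defined above.
    orders = [parse_order(line) for line in lines]
    return sum(line_total(order) for order in orders)


total = main(["widget,3,2.50", "gadget,1,4.00"])
```

As written, this vocabulary serves only this one program. It would start to resemble a DSL only if it were packaged so that other applications in the same domain could use it.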

How to Build a DSL?

Names

Most new vocabulary is simply names. While some novels have far more extended language, almost any work of fiction will need to introduce names for characters that will come to have more meaning as more details become associated with those names. In some novels the names can seem too similar, and it is harder to remember who is who than with other novels.

There is an art to choosing good names, and naming things is hard.

The simple art of choosing good names is central to building domain specific language as much of any new language will be names.

DSL building tools

standard tools

Every language has declarations of variables and functions, and classes or types.  These allow adding verbs and nouns.

Metaprogramming

This allows writing code that itself writes or modifies other code.  This allows code to act as a modifier of other code, in the manner of  adjectives/adverbs in conventional language.

Others?

Various languages offer specific syntax such as decorators or operator overloading that are specifically designed for language extension.
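Both features can be sketched briefly in Python. Every name here (COMMANDS, command, greet, Money) is hypothetical and belongs to no real library. The decorator registers a function under a new ‘word’, and the overloaded operator gives ‘+’ a meaning for a domain type.

```python
COMMANDS = {}


def command(name):
    """Decorator factory: registers the decorated function under a new 'word'."""
    def register(func):
        COMMANDS[name] = func
        return func
    return register


@command("greet")
def greet(who):
    return f"Hello, {who}!"


class Money:
    """Operator overloading: '+' gains a meaning for amounts of money."""

    def __init__(self, cents):
        self.cents = cents

    def __add__(self, other):
        if not isinstance(other, Money):
            return NotImplemented
        return Money(self.cents + other.cents)

    def __eq__(self, other):
        return isinstance(other, Money) and self.cents == other.cents

    def __repr__(self):
        return f"Money(cents={self.cents})"


message = COMMANDS["greet"]("world")   # look the new word up and use it
total = Money(250) + Money(175)        # reads like ordinary arithmetic
```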

Conclusions

Almost every project has Domain Specific Language

Every program that contains functions and classes is building some DSL for use by the main function. So every program builds a ‘partial-DSL’, which generally only becomes a ‘true DSL’ if the functions and classes are separated into a separate package, are reusable, define functionality that is independent of any specific application, and provide toolkit functionality which can extend language capabilities.

Building DSLs: part of the fabric of programming

Defining variables, functions, and classes is a major part of programming. Even the algorithms are wrapped in functions, and then become part of the vocabulary. Almost every part of coding is building new vocabulary, and building vocabulary is a core part of extending a language, regardless of whether the resulting language then becomes a DSL. The only question is: do the language extensions provided by this program become an external DSL, or something more like a ‘glossary’ for the one project?

Language: The Core of DSL design

For both standalone DSLs and language extensions, to build a DSL is to build new language. Being restricted to a specific domain should allow the language to be small and simple, but it is still building new language. How do we keep new language intuitive and simple? This page looks at the basics for guidance.

  • What is ‘language’ anyway?
    • vocabulary
    • syntax
  • Language Evolution
    • Computer Language
    • Human Language
  • DSLs – Standard Language is only the Starting Point
    • Human Language
      • Jargon
      • Jargon Like Language in Non-Technical Domains
      • Language Augmentation for Every Domain
  • How Do We Learn Languages?
  • Why does language evolve, why do jargons emerge?
  • Conclusion

What is Language anyway?

Vocabulary.

The core of language is vocabulary.  Words provide a very concise representation of a concept or object.  For example, to most people the word ‘tree’ represents something between the dictionary definition of ‘tree’ and the Wikipedia entry for ‘tree’.  In fact there are many different meanings of tree depending on the context or ‘domain’ of the discussion.  Without a specific context, we would normally assume the botanical tree, and even in that one case, between dictionary and Wikipedia, there is so much information that a person may associate with the word tree that they could spend pages explaining all that is communicated by just one word.

Syntax.

Each language also has a syntax, or set of rules for how words are combined to represent the interactions of the concepts referred to by vocabulary.  While the vocabulary is highly dynamic, constantly changing, syntax tends to be far more stable for any given language.

Language Evolution and Continual Extension.

Computer Languages

Consider the C language.  Different people took the language along different paths, such that it was seen as necessary to create a definition of ‘Standard C’, known as ANSI C, with that standard being agreed in 1989.

The large number of extensions and lack of agreement on a standard library, together with the language popularity and the fact that not even the Unix compilers precisely implemented the K&R specification, led to the necessity of standardization. (Quoting Wikipedia)

But people not only keep making their own extensions, they also want some components of these extensions to become part of the language itself, leading to standards updates in 1999 and again in 2011.  Many things added in a new standard are already widely in use prior to adoption in the standard. Note there are proposals for a new standard for around 2021.

Every language keeps being extended.

Human Language.

It becomes clear from reading Shakespeare that human languages also evolve.  They also diverge, with English existing in the forms English (UK), English (US), English (Australia) etc., each of these adding new words every year. Common practice is for each language to have its own authority: e.g. English (UK) has the Oxford Dictionary as the official reference.  In a sense, computer languages with their standards bodies are following the same process as human languages. But human language also evolves in advance of standards, as words are only added to references like the Oxford Dictionary once they are already in use. Further, not all words are in the dictionary, which is one of the reasons there have for some time been encyclopedias, which can contain words such as company names that may be commonly used, but are from categories that are not part of the official language.  In fact references like Wikipedia become a common source of ‘what is widely understood’, even though there could be inaccuracies.

Human language, not just computer language, keeps being extended.  Again there are ‘standards’, but language extension happens naturally and in advance of standards.

DSLs – The standard is only the base

Human Language

Jargon

To again quote Wikipedia: Jargon is a type of language that is used in a particular context and may not be well understood outside that context.

The word ‘context’ can be considered a synonym for ‘domain’ as used in ‘Domain Specific Language’ (DSL).  The reality is that almost all jargon, as in the many examples Wikipedia lists, provides vocabulary which extends a language, but that vocabulary extension is insufficient for a complete conversation by itself.  That is, most communication that makes use of jargon will still take place mostly in a general language such as English, with the jargon providing extra words and extending the language.  In this way almost all jargon can be considered as an Augmentation DSL. An exception would be maths ‘jargon’, which is not based on extension of English or another general language, with maths, at least in written form, even using a different set of symbols than general languages. This clearly puts maths in the External DSL category.

Jargon Like Language in Non-technical Domains.

Is jargon always technical? Is the phrase ‘technical jargon’ an example of a tautology? There was certainly a heavy bias in the Wikipedia list of examples towards technical fields, but there is the example of the sport of cricket. Is the jargon all related to the technical aspect of cricket?

Now consider Star Wars.  Luke and Leia become words that take on a special meaning, but so do wookie, porg, Tatooine, Jedi, Sith and ‘The Force’. I feel these words are as specific to the Star Wars domain as most of the jargon examples are to their own domains.

In fact any novel creates its own specific vocabulary. In the simplest cases that vocabulary may be little more than names, but the fact still remains that you have to define some vocabulary for any literary work.  Depending on the novel, significant amounts can be specific to the novel. Consider Lord of the Rings. Not only are characters explored, but also types of creatures, new locations and an imaginary world. It can be described as a “Lord of the Rings Universe” being created.

The new vocabulary may be far short of the vocabulary of Star Wars or Lord of The Rings, and hardly a ‘jargon’ or ‘language augmentation’, but the process of introducing new words that take on meaning throughout a novel is the same regardless of how many or how few words are introduced.  The structure for introducing new words is part of language. Every work introduces words that, when read, convey meaning not known at the outset.

Language Augmentation for Every Domain.

Beyond novels, even a household has its own vocabulary, including words with meaning beyond that which would be known by any outsider.  Naturally, the names require introduction, but even simple things like the rooms have labels which the family associates with where the room is, what is in the room, and even who normally uses that room at what time. An outsider, such as a visitor, lacks the domain specific extra meaning associated with the word ‘bathroom’ and may need directions.  The domain specific knowledge and meaning that becomes added to words enables concise conversation, where it would be tedious if every word had to be explained every time.

It seems every domain has its own words. Some words, such as names like ‘Bob’, are generally known outside the domain to be a name, but within a specific domain take on a specific meaning: ‘Bob’ becomes not just a name, but a person, with a complete set of associations about that person.  Many words simply take on an extra, more defined meaning within a domain, while other words can be unique to the domain.

How Do We Learn Languages?

A significant part of the human brain has evolved specifically to process language.  While it is clear from the above reflection that we all keep learning language extensions throughout our lives, relatively few of us learn new languages once we become adults.

The message is “language extension easy, new language much harder”.

When a new concept arises, we can start by describing it in full using a combination of existing words, much like quoting a dictionary definition. Adopting a single word to replace repeating the description over and over is very natural.

It is also observed that human languages exist in related families, and learning other languages from the same family is easier than learning languages from new families.  So if you know Spanish, learning Italian may be easier than learning Hungarian.  Many computer languages are also related in families, making it easier to move from one computer language to another.  If it is necessary to build an external or detached DSL, can it be a simpler language to learn if it is related to an existing and well known DSL?

Similarly, we can learn a new word from the dictionary or encyclopedia provided we can already understand the language used in the dictionary definition or encyclopedia entry.  Extending computer languages is similar: building a definition of a new language element using the existing language is familiar, and that definition can easily be found in order to learn about the new language element.  It is when we find completely new language that understanding enough to even find the definition becomes difficult.

Why does language evolve, why do jargons emerge?

Language exists to enable communication. New language arises because it improves communication.  You can substitute any word in English with the meaning from the dictionary, but that is replacing a single word with one or more phrases, and sometimes losing some precision of exactly what is meant. The invention of a new word can do the opposite: it can allow reducing one or more phrases to a single word, and conveying even more precise meaning by doing so.

That is why new words, and new languages, emerge: creating a new single word can allow conveying meaning that would otherwise require several phrases, and the single word can even convey meaning better than several phrases.  But there is a trade off: people have to learn the new word.

Jargon arises because some new words are useful only in one specific domain, so for many people outside that specific domain there is insufficient benefit to learning those words.

The benefit to new words and additional language is the ability to communicate more precisely and concisely.  The new language ‘works’ when the benefit outweighs the effort of learning that additional language.

Conclusion.

We are on a journey towards computers with ‘natural’ language. Since the first move from machine code to assembler, the goal has been for computer programs also to be processed by humans.  Computer language continues to evolve, to express concepts specific to computing and allow humans to interact with computer systems.

Extending computer languages in ways that mirror how we extend human languages is, in practice, leveraging how the human brain processes language.

Specifically, the following points stand out from considering  human languages:

  • New words and new language allow more precise and concise expression, at the cost of learning the additional language or words
  • Defining DSLs, or computer ‘jargons’, as language extensions suits repeated use across a domain
  • Given that extending language just for a single work of fiction is a well proven model,  a single program extending language should also be highly practical.
  • External DSLs and Detached DSLs that do not extend an existing language should be used very carefully, considering the difficulty of learning completely new human languages

Python Class vs Instance Variables

Python class variables offer a key tool for building libraries with DSL type syntax, as well as data shared between all instances of the class. This page explains how Python class variables work and gives some examples of usage.

  • Class & Instance Variables
    • Instances vs the class object
    • Instance Variables
    • Class Variables
  • Using Class and Instance Variables

Class & Instance Variables

Instances vs the class object

In object oriented programming there is the concept of a class, and the definition of that class is the ‘blueprint’ for objects of that class.  The class definition is used to create objects which are instances of the class. In Python, for each class, there is an additional object for the class itself.  This object for the class itself is an object of the class ‘type’. So if there is a class ‘Foo’, then a = Foo() and b = Foo() create two objects, a and b, which are instances of class Foo. The code declaring the class Foo does the equivalent of, for example, Foo = type('Foo', (), {...}), passing the class name, its base classes and its namespace, and so creates a further object: Foo itself. This Foo object can be manipulated at run time: it can be inspected to discover things about the class, and it can even be changed to alter the class itself at run time.
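
The relationship between instances, the class object and type can be checked directly. The sketch below uses only built-in behaviour; Bar and greet are hypothetical names added for the example.

```python
# The usual class statement...
class Foo:
    var = 1


# ...and a roughly equivalent call to type(name, bases, namespace).
Bar = type("Bar", (), {"var": 1})

a = Foo()
b = Bar()

print(type(a) is Foo)     # True: a is an instance of Foo
print(type(Foo) is type)  # True: Foo itself is an instance of type
print(type(Bar) is type)  # True: so is the class built by calling type()

# The class object can be changed at run time; existing instances see it.
Foo.greet = lambda self: "hello"
print(a.greet())          # hello
```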

Instance Variables

Consider the following  code  as run in Idle (Python 3.6):


class Foo:
    def __init__(self):
        self.var = 1
>>> a = Foo()
>>> type(a)
<class '__main__.Foo'>
>>> type(Foo)
<class 'type'>
>>> 'var' in a.__dict__
True
>>> 'var' in Foo.__dict__
False
>>> a.var
1
>>> Foo.var
Traceback (most recent call last):
  File "", line 1, in <module>
AttributeError: type object 'Foo' has no attribute 'var'

Note the results of type(a) compared to type(Foo). 'var' appears in a.__dict__, but not in Foo.__dict__. Further, a.var gives a value of 1 while Foo.var raises an error.

This is all quite straightforward, and is as would be expected.

Class Variables

Now consider this code, which has a class variable, as run in Idle (Python 3.6):


class Foo:
    var = 1
>>> a = Foo()
>>> 'var' in a.__dict__
False
>>> 'var' in Foo.__dict__
True
>>> Foo.var
1
>>> a.var
1
>>> a.var = 2
>>> a.var
2
>>> Foo.var
1
>>> a.__class__.var
1

All as would be expected: the __dict__ results are reversed. This time the class Foo initially has var, and the instance a does not.

But even though a has no var entry in its own __dict__, a.var returns 1, because when there is no instance variable var, Python will return a class level variable if one is present with the same name.

However, assignment does not fall back to class level variables, so setting a.var = 2 actually creates an instance variable var and reviewing the __dict__ data now reveals  both that the class level object and instance each have var. Once an instance variable is present, the class level variable is hidden from access using a.var which will now access the instance variable. In this way, code can add an instance variable replacing the value of class variable which provides what can be effectively a default value until the instance value is set.
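
The hiding behaviour, and how to undo it, can be sketched as follows. The second instance b is added here only for comparison.

```python
class Foo:
    var = 1


a = Foo()
b = Foo()

a.var = 2                      # creates an instance variable on a only
print("var" in a.__dict__)     # True
print("var" in Foo.__dict__)   # True: the class variable is still there
print(a.var, b.var, Foo.var)   # 2 1 1

Foo.var = 10                   # b has no instance variable, so it follows the class
print(a.var, b.var)            # 2 10

del a.var                      # removes the instance variable again
print(a.var)                   # 10: lookup falls back to the class variable
```

Deleting the instance variable restores the 'default' supplied by the class, which is why this pattern works well for per-instance overrides.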

Simple Usage Example

Consider the following Python class:


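class XYPoint:

    scale = 2.0      # class variable: shared by all instances
    serial_no = 0    # class variable: number of instances created so far

    def __init__(self, x, y):
        # read the class level count, then store this instance's own number
        self.serial_no = self.serial_no + 1
        # update the class level count ready for the next instance
        self.__class__.serial_no = self.serial_no
        self.x = x
        self.y = y

    def scaled_vector(self):
        vector = (self.x**2 + self.y**2)**.5
        return vector * self.scale

    def set_scale(self, scale):
        # assign through the class so that every instance sees the new scale
        self.__class__.scale = scale

    def __repr__(self):
        return f"XYPoint#{self.serial_no}(x={self.x},y={self.y})"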

The class encloses an (x,y) coordinate, and has a method scaled_vector() which calculates the length of the vector from the origin to that coordinate (using Pythagoras' theorem), and then scales that length by the class variable scale.

If the class level variable scale is changed, automatically all XYPoint objects will return vectors using the new scale.

Being a class variable, scale can be read as self.scale but must be set as in the set_scale() method by self.__class__.scale = scale.

The other use case illustrated is the instance counter serial_no, which provides each instance of XYPoint a unique, incrementing serial_no.
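
A short usage sketch of both use cases follows. The class is repeated so that the example runs on its own; the points p1 and p2 and their coordinates are arbitrary values chosen for illustration.

```python
class XYPoint:

    scale = 2.0
    serial_no = 0

    def __init__(self, x, y):
        self.serial_no = self.serial_no + 1
        self.__class__.serial_no = self.serial_no
        self.x = x
        self.y = y

    def scaled_vector(self):
        vector = (self.x**2 + self.y**2)**.5
        return vector * self.scale

    def set_scale(self, scale):
        self.__class__.scale = scale

    def __repr__(self):
        return f"XYPoint#{self.serial_no}(x={self.x},y={self.y})"


p1 = XYPoint(3, 4)
p2 = XYPoint(6, 8)
print(p1, p2)              # XYPoint#1(x=3,y=4) XYPoint#2(x=6,y=8)
print(p1.scaled_vector())  # 10.0

p2.set_scale(1.0)          # changes the class variable, so p1 is affected too
print(p1.scaled_vector())  # 5.0
print(p2.scaled_vector())  # 10.0

p1.scale = 3.0             # plain assignment creates an instance variable on p1 only
print(p1.scaled_vector())  # 15.0
print(p2.scaled_vector())  # 10.0
```

The last two lines show why set_scale() assigns through self.__class__: assigning self.scale directly would only override the scale for one point.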

Kotlin Mobile / Kotlin Native

Kotlin is well established in mobile as a language for Android development, if not the language for Android development.

But to be a mobile development tool in general, Kotlin needs to also work for iOS. The path to iOS applications is Kotlin Native.

Kotlin Native was first released on April 1st 2017.  It was an actual early access preview and not an April Fools' joke, but did not at that time include iOS as a target.  In fact pieces of iOS support gradually started to appear by August, but see the reviews below for what it could do at which date.