Scheduling: Schedule, design, code?

Or perhaps: schedule, design, build?  Either way, while that sequence may sound like waterfall, even agile is really a repetition of this sequence over and over.  This page discusses why these three steps are problematic for software, why they must still be followed despite those problems, and how to schedule projects using a sequence that can initially appear broken.


Build Vs Engineer: Which Best Describes Software Development?

To be able to schedule the completion of a task, it is useful to consider the nature of the task and the nature of the steps within a task.

Design: Engineering Design vs Artistic Design.

The verb design can be interpreted in subtly different ways.  The design of the Rubik’s Cube by Erno Rubik is an example of engineering design, not of artistic design. But someone can make a new Rubik’s cube with a new artistic design. When engineering design is considered novel, the designer can apply for a patent, while with artistic design the designer considers their work protected by copyright.  The nature of the intellectual property created by these different design types is sufficiently different that there are different ways to protect that intellectual property.

The engineering process as described here gives some insight.  Note the iteration, with no way of being certain how many iterations are required to produce a result.  The implication is that it is impossible to be certain how long the process will take.   However, in most cases with artistic design, a single iteration is sufficient, making the process easier to predict.  Depending on the task, software design can be about artistic design, engineering design or a mixture of both. In reality, the design tasks are almost always engineering design.

Generally software is written because software to do the task does not already exist. Unlike a chair, which you might build because you need another chair, software is generally not written because you need an exact copy of software that already exists.  This means there should always be engineering design required, which in turn suggests a highly variable, hard to predict amount of time for the process.

Engineering Design: A tautology?

Engineering is generally described by the dictionary as both design and building. The dictionary definition suggests a waterfall sequence, while the flow from an engineering perspective clearly conveys a more agile approach.  Regardless of the sequence, there is both design and build; however, for engineering the substance is in the design.  Consider the phrases ‘engineer a bridge’ and ‘build a bridge’ as we use them today. To engineer a bridge is to design the bridge or some detail of the bridge, while to build a bridge implies no design work is required.  As we use ‘engineer’ in this sense today, an engineering project is considered complete when the design is complete.   Building that is necessary to test the design, or to be ready to test the design, can be part of engineering. Once a design is complete, in today’s world we then normally use ‘construction’ or simply ‘building’, and keep ‘engineering’ for when there is a new design.

Construction vs Engineering.

While both ‘construction projects’ and ‘engineering projects’ can involve building something and will need a design, the construction project is primarily about the building, and the engineering project is all about the design.  A construction company will often outsource the design to an engineering company when custom design is required, and then commence construction with all design work complete.

Consider building Swedish-designed flat-pack furniture.   All that is required of the consumer is building, or construction.  Engineering is what took place at head office to produce and test the original design. But even within that original engineering project there are simple construction components, because the design of each such component already exists, and is already tested.

Construction is where the overall design already exists and is already considered tested. Engineering is where the overall design is not yet tested, although there will normally be components of the overall design that are tested.

Software Development: Construction or Engineering?

The waterfall development process works on the principle that software development can work like construction: all design is already complete and effectively tested.

In contrast, agile is based on considering software as an engineering project, and that even components within that engineering project are themselves engineering projects.

Waterfall Advantage: Scheduling

As in a construction industry project, there is still the initial engineering/design phase, which can be very difficult to schedule. In the construction industry, the main costs and the majority of project time all take place once the design is agreed, so the variable timing of the shorter, lower-cost engineering/design phase is not that significant to the overall project.

Construction itself follows known steps, so it can (at least in theory) be accurately scheduled and costed.  In theory, applying this model to software could provide not only accurate overall scheduling but, with a set of discrete steps, accurate tracking of project progress.

Waterfall Advantages: Skill Diversity and Low Cost

Again consider the construction industry.  The design phase requires architects and engineers who are highly educated and costly, however most of the actual work can be done by less expensive labourers.  The construction phase still needs project managers and foremen, but few in proportion to the labourers.

Applying this same model to software results in software architects and business analysts for the design phase, and while the actual development will require project managers and team leaders, these will be few in relation to the more junior programmers, and these programmers can possibly be outsourced or even offshored to lower costs.

The advantages and disadvantages of Agile

Agile has one simple advantage: the engineering metaphor it implements actually does apply to software.  This gives the advantage of actually confronting reality. The waterfall approach has attractions, but as a model for software it is fundamentally flawed, since design is required during the ‘construction’ phase.  The result is that its supposed advantages are not realistic.

The disadvantage of facing reality is that 1) it becomes clear that it is impossible to guarantee a schedule for a design that is not yet complete, and the design will not be complete until the software is complete, and 2) the skill diversity approach is broken: with design needed at all stages, the low-cost ‘labourers’ will still need to make design decisions, and the impact of these decisions is significant.  So there is no exact scheduling of construction, and the separation of roles into analysts, programmers and project managers is flawed; at the very least each team, if not every team member, needs each skill.

Summary: There are question marks over whether software can really be reduced to the waterfall model, which would be the only way to enable reliable time estimates.  The implication is that in-project discoveries must impact schedules and cannot be avoided.  Something has to give.

Rethink: Scheduling engineering vs scheduling construction.

The Problem.

The mindset of construction is that at the project outset it becomes known what is to be done, and then good planning will result in an accurate timeline for the project.

However, software cannot generally be properly reduced to the construction model, leaving an engineering approach where it is never known exactly what is to be done until the goal is reached, at which point the project is basically complete.

The Proven Solution.

Consider car manufacturers.  They construct cars. In advance of manufacture they know very accurately what the specification of each car to be built will be, and how long to build each car.

However, now consider the engineering aspect of a car manufacturer: designing new cars to then be manufactured.  Design, prototyping and tooling.  As an engineering task, on the basis of the theory described here, the time to design a new car to a given set of specifications can be estimated (an educated guess), but there is insufficient information for an accurate figure.  The solution: something has to give.  Either the exact specification, or the exact amount of time, must be adjustable.  Given many manufacturers desire a new model each year, the time is not adjustable, so the result is that the specification becomes flexible.

A fixed amount of resources is allocated to engineer improvements to the current model.  All improvements ready in time become part of the new model.  There is a list of desired improvements and ideas for improvements, and these can be prioritised. The limitation is that the exact set of features that will be ready for the next model is not known at the start of the fixed-length project. Only those that can be completed in time will be in the next model. Usually there will be more features and improvements identified as desirable than can be ready in time for that next model.  A list of the features and improvements thought likely to be ready for the next model is made, and work starts on the list.  If a feature or improvement is not ready in time, it will have to wait, as the release date for the next model cannot be pushed back.

Applying the solution to software.

Agile allows taking this industrial engineering approach and applying it to software.  Projects like Ubuntu Linux, and now even Windows 10, have new software releases at fixed intervals.  The product versions are even based on dates, and those dates are declared at the project outset.  There have been two Ubuntu versions every year since October 2004 (4.10 Warty Warthog). One version in April, with a version number of the year and then ‘.04’ (April is the 4th month, so 5.04 was in April of 2005), and then another in October with ‘.10’ for the tenth month. How do they keep such reliable schedules? Features that do not make the deadline are pushed back to the next version, that’s how.
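The Ubuntu numbering scheme is simple enough to express in a few lines of code. The sketch below (the function name is invented for illustration) derives the version string from a planned release date:

```python
from datetime import date

def ubuntu_version(release: date) -> str:
    """Ubuntu version numbers encode the planned release date:
    two-digit year, a dot, then the zero-padded month."""
    return f"{release.year % 100}.{release.month:02d}"

print(ubuntu_version(date(2004, 10, 20)))  # 4.10, the first release
print(ubuntu_version(date(2005, 4, 8)))    # 5.04
```

The version number is fixed the moment the release date is declared, which is exactly why the feature set, not the date, is what flexes.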

Scheduling and Scrum.

The scrum process is like a series of mini-releases, with the completion of each sprint resulting in a new set of stories, tasks and bug fixes ready: working, tested and integrated into the system. Scrum planning can take the same type of approach, with issues that cannot be completed in time pushed back to a future sprint.

The danger is that, if not managed correctly, an individual issue could absorb the team for the entire time allocated to the sprint.  If that issue still cannot be completed, then there is a sprint with nothing completed.  The solution is to budget time for each issue.  When that budgeted time has elapsed, the issue should be reviewed.

The choices for the review are:

  1. divide this issue into parts and push what cannot be done in this sprint back to the backlog (which could even be the entire issue)
  2. push other issues to the backlog to free time in this sprint for another allocation of time to this issue
  3. both of the above: push part of the issue to the backlog, but still allow a new block of time for this now-simplified issue, and push other issues to the backlog to free up time for the reduced issue
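The review logic above can be sketched as a small decision function. This is purely illustrative: the names and the idea of measuring remaining work and sprint capacity in hours are assumptions, not part of any scrum tooling.

```python
def review_issue(remaining_work: float, sprint_time_left: float) -> str:
    """Decide what to do when an issue's budgeted time has elapsed.

    remaining_work and sprint_time_left are in the same units (e.g. hours).
    """
    if remaining_work <= 0:
        return "complete"  # budget elapsed but the issue finished: no action
    if remaining_work <= sprint_time_left:
        # choice 2: the sprint can absorb it by pushing other issues back
        return "push other issues to backlog"
    # choice 1 (or 3): split the issue, push the remainder back
    return "split issue; push remainder to backlog"

print(review_issue(remaining_work=0, sprint_time_left=8))
print(review_issue(remaining_work=4, sprint_time_left=8))
print(review_issue(remaining_work=12, sprint_time_left=8))
```

In practice the decision involves judgment about priorities, not just arithmetic; the sketch only captures the structure of the three choices.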

Conclusion: Schedule, design, code.

Sprints should be set with fixed end dates, or at least end dates that will have only a small window of variation.  As the end of the sprint approaches, new tasks are pushed back rather than started when there is insufficient time, and only tasks near completion can affect the sprint close date, within a predetermined window.

Each task can be scheduled before the task is started, and the schedule should allow a high probability the task will be complete, but this cannot be guaranteed. At the end of the scheduled time the task should be reviewed.  Either this task is pushed back, another task is pushed back, or the sprint will be late.

So it turns out that the sequence: Schedule, Design, Code, can work.


TDD or Not TDD? That is the question!

What actually is TDD (Test Driven Development)? Is TDD Dead?

Do you associate this term with tests actually driving development, or use the label TDD for the practice of ensuring code coverage by having unit tests? TDD can be taken to mean different things than the original meaning, and there are some risks from that shift in meaning.

I was recently searching online for discussion of TDD, and was surprised to find many pages describing TDD as simply ensuring unit tests are in place, but other pages using TDD to refer to tests actually driving development.  This difference in definition results in considerable confusion.

This page looks at what is accepted as best practice today, how that fits with the original meaning of TDD, the dangers and problems that do, and have already, resulted from a shift in the meaning of TDD, and what is dead and what is not dead.



Unit Test

It is generally assumed that a reader of this page will know what a ‘unit test’ is, but for clarity: a unit test is a program function that sets up specific inputs and then calls a target software ‘unit’ in order to verify that the output of the target software unit is as expected when given those specific inputs.  A software unit could be a function, a class, a module or even an overall software package.
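As a concrete illustration, here is a minimal unit test using Python’s built-in unittest framework. The add function and the test name are invented for the example:

```python
import unittest

def add(a, b):
    """The software 'unit' under test: here, a trivial function."""
    return a + b

class AddTests(unittest.TestCase):
    def test_add_two_numbers(self):
        # set up specific inputs, call the unit, verify the expected output
        self.assertEqual(add(2, 3), 5)

if __name__ == "__main__":
    unittest.main()
```

The same structure applies whether the unit is a function, a class or a whole module: arrange inputs, call the unit, assert on the output.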

Unit Tests

‘Unit Tests’, plural, or perhaps even clearer (but longer) a ‘unit test suite’, denotes a set of unit tests that should contain sufficient individual tests to infer that a software ‘unit’ will perform as expected for each possible combination of inputs that the software unit under test could be expected to encounter in normal use.

TDD (Test Driven Development)

There is no universally agreed meaning of TDD.  There is the original meaning from Kent Beck, and some say even Kent has changed his ideas, as we all do, but the original meaning is the only one in a book, so on this page I will tend to use that original meaning, except where I specifically discuss how people take TDD to mean something different.

From the original meaning, TDD is using tests to drive development. Such tests are specifically created not to form a test suite, but to enable software design and development. Some tests created during Test Driven Development are useful for a test suite, some may become redundant once software has been developed, and the TDD process does not automatically result in a complete set of Unit Tests.

Assertion Test.

This is a term introduced here, and it can help in reading this page if nothing else.  Unit tests can have one or more assertions. These assertions should together make a cohesive unit test, which is discussed on another page. In the following examples, Uncle Bob sometimes says he is adding a new unit test when in fact he then adds a new assertion to an existing unit test.  How many assertions does it take to make a unit test? Ideally one, but in the real world it may take more.  When this page refers to an assertion test, it means an individual (assertion) component of a unit test, where it could be confusing to describe that component as a unit test.

Common to both TDD tests & Unit Tests (Test Suites)

Tests: The Only Real Specification

What does a program actually do? It passes the tests.

Any other specification is what someone believes the program should do, not what the program actually does.

A program is measured by its tests, and the results of those tests are the only real specification.  Confusingly, design goals are sometimes also described as specifications.

Consider the specification of a camera, or a car. Almost all specifications are established by measuring the values being specified, e.g. engine power in horsepower or kilowatts.  Certainly, the measured value may match the value that was the design goal, but if, for example, a car had a design goal of 110kW engine power yet is measured to produce 105kW, it is only the measured value, not the design goal, which can be quoted as the product specification.  If the design goal were quoted as a specification, a customer would feel misled.

A program is measured by its tests, and the result of those tests is the real specification.

Easily Repeatable Automated Tests Are Best.

Some code is difficult to test automatically. How do you test a function that prints, for example?  For some code it is simply far easier to run the program and see what prints.  Yet in almost all cases, a system redesign to allow an automated unit test is the only satisfactory solution.  Unit tests can even be presented as a system specification.
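To make the “function that prints” case concrete: one common redesign in Python is to capture standard output during the test, so the printed text becomes a verifiable value. The greet function here is a stand-in for any code whose only output is what it prints:

```python
import io
from contextlib import redirect_stdout

def greet():
    print("hello")

def captured_output(func):
    """Run func and return everything it printed to stdout."""
    buffer = io.StringIO()
    with redirect_stdout(buffer):
        func()
    return buffer.getvalue()

# The print output is now just data that an assertion can check.
assert captured_output(greet) == "hello\n"
```

An even cleaner redesign is to have the function return the string and let the caller print it; then no output capture is needed at all.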

A Failing Test Before Any Production Code.

No code should ever be written without first predetermining what the code should do.  This simply means do not start a task without first deciding what constitutes completing that task. For unit tests, add the unit test before the code is in place (if the production code already exists, still run the test before including the code in the system). For TDD as originally proposed, the test should be added before the solution has been determined.

TDD vs Unit Tests

A TDD Example with ‘Uncle Bob’

The following video of a talk by Uncle Bob is very useful, but quite long, so the main points will be discussed here without needing to watch the entire video. Consider the video from 24m05s through to 42m00s.

A total of 10 assertion tests are created.  The first 9 assertion tests are best described as TDD tests, with the 10th the only actual unit test assertion.  This is because, as the story unfolds in the video, assertion tests 1 through 9 are all created without first creating the algorithm.  There is no algorithm other than what emerges as a result of incrementally adjusting code to pass tests. These tests drive development of a solution to the requirements of each test.  Test 10 (line 18) fits the definition of a conventional unit test: the algorithm code already exists and works before this last test is written, and this test never exists as a failing test.

In fact, it could be argued that none of the first 9 assertions are still required once test #10 is added.  At least the first test helps with documentation, and perhaps the second also adds to the explanation of the code, but clearly having an assertion test for every value from 1 through 9 is somewhat redundant.

At the other extreme, test cases such as factoring 0 (zero), or negative numbers, are not considered.  Sufficient tests to drive the development do not automatically ensure a full set of tests for all cases, and can result in some tests that are not really required once the development is complete.
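For readers who have not watched the video, the algorithm that emerges from the incremental tests is roughly the following (a Python translation; the talk uses Java, and the function and variable names here are this page’s own):

```python
def factors_of(n):
    """Prime factorisation as it emerges from the incremental TDD steps:
    repeatedly divide out each divisor, starting from 2."""
    factors = []
    divisor = 2
    while n > 1:
        while n % divisor == 0:
            factors.append(divisor)
            n //= divisor
        divisor += 1
    return factors

# Test #10: the single conventional unit-test assertion that
# exercises the whole algorithm at once.
assert factors_of(2*2*3*3*5*7*11*11*13) == [2, 2, 3, 3, 5, 7, 11, 11, 13]
```

Note that nothing in the tests forces handling of 0 or negative input, and indeed this code would loop forever on them: driving tests define the whole specification, including its gaps.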

Unit Test Without TDD Example

TDD or not, there is an important rule that the test should be in place before the code to be tested, which enables verification that can fail, but that requirement does not make the test drive the solution.  In fact, if the solution is obvious, the solution will drive the test.

Clearly, at least by the time of the example video, Uncle Bob actually knew in advance how to code the solution to prime factors. If you are Uncle Bob and already know how to code the solution, why not move directly to test #10?  The advantage of using tests to drive development is that you can build up to the solution by adding new test cases, while having certainty that previous functionality still works.  A solution can be developed step by step, with the growing set of tests providing certainty that no previous step is being broken.  But what is the point of those steps if you already know the complete solution?   In that case, why not just create a test that validates the overall solution?

If you have an algorithm at the outset, then you could move directly to test number 10, factorsOf(2x2x3x3x5x7x11x11x13), and bypass the simplistic tests 1 through 9, which cover cases so simple that if any of them failed, test 10 would fail anyway.

Benefits and Limitations of TDD.


The promise of TDD is that the problem can be reduced to the simplest solution that passes the required tests, allowing a simple solution.  When a complete solution seems challenging, instead of being locked out by the design challenge, development can commence immediately and build the solution piece by piece.  In the Uncle Bob example, a solution to factorsOf() arises from the tests without any formal design process.  In the late 90s, when Kent Beck and others first developed TDD, this seemed like magic.  Not only did solutions arise without a formal design process, they saw that elegant solutions could arise from testing. It seemed all solutions could be produced this way, something which most proponents (including Uncle Bob, as discussed below) have since come to realise is not true.  Design driven from tests can solve problems not solved otherwise, but it simply is not an optimum solution, or even a solution, for every problem.


There is a famous quotation: ‘a camel is a horse designed by a committee’. The implication is that when design tasks are split, an elegant overall design can be missed.  Consider the factorisation function called with 101:   factorsOf(101)

The main loop will test whether every number from 11 through 100 is a factor of 101, when once 11 is reached (where 11×11 > 101), it is already clear the number is prime.  No number between 11 and 100 need be tested.  Perhaps development driven by tests would never discover this inefficiency?
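A hand-designed variant that avoids this inefficiency might look like the following. This is a sketch of the standard trial-division optimisation, not code from the video: stop once the divisor squared exceeds n, because any remaining n greater than 1 must itself be prime.

```python
def factors_of(n):
    """Prime factorisation, stopping trial division at sqrt(n)."""
    factors = []
    divisor = 2
    while divisor * divisor <= n:
        while n % divisor == 0:
            factors.append(divisor)
            n //= divisor
        divisor += 1
    if n > 1:
        factors.append(n)   # the leftover factor is prime, e.g. 101 itself
    return factors

assert factors_of(101) == [101]   # no trial divisor beyond 10 is needed
assert factors_of(2*2*3*5) == [2, 2, 3, 5]
```

Whether the test-driven steps would ever lead to this shape is exactly the committee-designed-camel concern: each step only demands passing the next test, not studying the overall problem.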

Balancing Benefits and Limitations.

A solution arrived at through tests will not always be better than a solution planned by studying the overall problem.  The best approach is to consider both methods and compare solutions.  Driving to a solution through tests can break through when no overall solution is clear, but in the end very few software projects are as simple overall as the factorsOf example.  Most often, only parts of the solution will have an immediately clear solution.

Solutions should, where possible, start with an architecture, but as code is built and tested, the results allow the architecture to be refined.

In some ways, the only difference between an immediately apparent solution and a solution driven by steps is the size of the steps.  The factorsOf() project could actually be tackled as a single step, with a single test to be passed.  But if the solution is not apparent, then break it into steps and incrementally add tests.

Most software projects are more significant than factorsOf() and are too large to be developed in one step before testing.  They should be broken into steps, but should those steps be broken into smaller steps?

The balance between driving to a solution with staged tests and simply testing for the end result comes down to choosing the right sized steps to tackle as a single step.

The full original TDD has its place, but a more balanced development process should be taken overall.

The Three Rules of ‘TDD’?

Newton created three laws of motion.  There are three laws of thermodynamics.  Hey, even Isaac Asimov got to write three laws, so why not Uncle Bob?  Note there are questions as to which definition of TDD these three rules apply.  But in the case of both thermodynamics and Isaac Asimov, later review resulted in a more fundamental ‘zeroth’ law, so perhaps some review of Uncle Bob’s laws is also acceptable?  Uncle Bob compares his laws to the procedures a surgeon treats as ‘law’.  Although failure to follow pre-surgery procedures suggests a surgeon is unprofessional, it should also be considered that following the procedures does not ensure a surgeon is a good surgeon. Following the laws of TDD alone will not ensure code is quality TDD code.

1. No production code without a failing test.

Recall that a test is a tangible specification, and at least at one level this law should seem axiomatic. It could be translated as: ‘have some specification of what you are going to code before you code, and do not bother coding if the specification is already met’.

For example, suppose you set out to write a program that prints the national flag. Your test might be ‘when I run it, what it prints should look like the national flag’.  The test is very subjective, could be considered an ad-hoc test, and is very hard to automate, but it is a test.  There should always be a test before you write any code.

It is very important that the test is a unit test. However, in the rare cases where a unit test is not practical, having a test that is as concrete as possible is still essential: the clearer the specification, the better.  A project can be started without a concrete overall specification, but at the very least each stage should be specified before that stage is commenced.  The specification, and hence the test, can still have flexibility.  But how flexible, and deciding what test(s) to apply, is critical.

I suggest this law is essential to any software development. No production code without a failing test, and unless there is a very sound reason why it is impractical, that test should be a unit test.

2. Apply tests one at a time, in the smallest increments possible

I have changed this ‘law’, and in fact still do not regard it as a clear ‘law’, but more of a goal.   The goal is hard to word with the precision required for a ‘law’, and it is more difficult to determine when it is being broken or followed. The original wording from Uncle Bob, ‘You are not allowed to write any more of a unit test than is sufficient to fail, and compilation tests are failures’, has two problems: 1) it is open to reading as making mandatory the very part of the original Kent Beck definition of TDD that Uncle Bob is on record as calling ‘horseshit’ (more on this later on this page), and 2) the wording is open to different interpretations.

The original Kent Beck definition of TDD would require strict adherence to tests driving all development, including design. The code to meet test number ‘n’ for a system (test = specification) must be in place prior to writing test number ‘n+1’ (the next specification).   Strictly adhering to this principle would mean that if someone says to you, “I want a new program, and it must do these three things…”, you would stop them and say, “No, wait, I can only record one specification detail at a time!  Wait until the code is in place for the first thing before considering any further functionality!”   More normal convention would suggest that if it is planned that there are three things the program should do, surely what those three things are can be written down.  If you have good tools, the best way to record those three ‘things’, or specifications, is to record them as tests.  Those tests can still be activated one at a time, and that is what should be done.  Appropriate TDD is to activate tests on the code incrementally, one at a time, but actually recording them ahead of time should not be banned.  It is still possible to amend the specifications/tests as the system develops, without banning writing down suggested specifications/tests ahead of time in any form, either as code or in any other language.

The second problem with the ‘law’ is that the words are open to interpretation. What exactly is ‘sufficient to fail’?  Perhaps ‘sufficient to be used as a failing test’ makes more sense?  And what does ‘write’ mean?  If a future test occurs to you ahead of time, should you never write it down? In practice, there should be some way of recording that tests are not to be applied yet, even if it means commenting them out, or preferably marking them as ‘future’ or with some agreed notation.  With the factorsOf() example as explained and coded in the video, one assert at a time makes sense.  But if you know the solution, in which case there are too many asserts in the example, then adding all the asserts you do need before adding code that should pass all of them immediately simply makes sense. In fact, in the example, the last assert could be interpreted as several tests in one… but it is still practical.
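Test frameworks already provide notation for recording a test without activating it. For instance, unittest has a skip decorator (pytest has an equivalent marker). The code below is a hypothetical illustration: the negative-number case is a future specification recorded ahead of time, and the production code shown does not yet handle it:

```python
import unittest

def factors_of(n):
    """Production code so far: positive integers only."""
    factors, divisor = [], 2
    while n > 1:
        while n % divisor == 0:
            factors.append(divisor)
            n //= divisor
        divisor += 1
    return factors

class FactorsOfTests(unittest.TestCase):
    def test_composite(self):                 # an active, passing test
        self.assertEqual(factors_of(12), [2, 2, 3])

    @unittest.skip("future: activate once negative input handling is coded")
    def test_negative_numbers_raise(self):    # recorded, not yet applied
        with self.assertRaises(ValueError):
            factors_of(-4)
```

The future test is written down, visible in test reports as skipped, and can be activated the moment work starts on that specification.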

3. Once there is code that passes the tests, do not progress before considering other tests for the code just added.

OK, this is not what Uncle Bob said in his laws (although it is followed in his example).  It could be claimed that this belongs under sound unit tests rather than under the heading of TDD, but different people have different interpretations of the terminology.

Uncle Bob’s third law is stated as ‘You are not allowed to write any more production code than is necessary to pass the one failing test.’  To me this simply restates the first law: don’t write production code without a failing test.  Once the test passes, you no longer have a failing test.  This rule describes what you should not do once production code passes tests, but rather than a reminder of law 1, perhaps consider what you should do once production code passes tests.  What you should do is think of the other tests that are needed for that code.  In the factorsOf() example, Uncle Bob adds his final test exactly as described here.  What other tests are needed?  In this case, the factorsOf(2x2x3x3x5…) test is added.  That this test never fails shows Uncle Bob actually follows this amended third law.

The Confusion: Is TDD Dead?

At least three interpretations of the term ‘TDD’ are in use, including:

  1. The Original Kent Beck Full Concept of Using Tests to Drive Development (including design)
  2. Never Code without a failing test
  3. Any Use of Unit Tests is TDD

With such variation in meaning, confusion sets in.  One expert, using definition #2, declares “any development not using TDD is unprofessional”.  Then another expert, hearing the statement but themselves using definition #1, responds “TDD has some uses, but more elegant designs can result from not using TDD”.  Then a third, a non-expert, hears that second statement, but connects it with definition #3 and declares “experts declare that unit tests block the writing of quality software”.

You can see this play out over and over on the internet. You will find people claiming TDD is essential and others claiming TDD is dead… without the posters ever checking what exactly those they are debating with, or their sources, actually mean by TDD.

Here is Uncle Bob declaring that a key original idea of TDD is ‘horseshit’.  The problem with promoting a new definition of TDD, as pointed out by Jim Coplien, is that people will find the original definition from the books and talks defining the topic, and believe that original idea is what they are being instructed to do.

Is TDD dead?

One of the ideas within the original definition of TDD, that building all system architecture from tests will always produce the best solution, is indeed dead.  Nothing else about the original TDD idea is dead.  Unit tests are not dead, and building tests before coding is certainly not dead.  Requiring all design to originate from tests is the only part of TDD that is dead.  Building architecture from tests is also NOT dead, but it is now recognised that it will often not build the best architecture; it is just one alternative, no longer a mandate.   It has since been realised that traditional system design still makes sense, and is still needed.  TDD is usually now redefined to not include that one dead idea, and as such TDD is not dead, just the one idea that went too far.  In fact, TDD has been redefined to mean many different things. Redefining TDD as something new, like TDD = unit tests, and then declaring this redefined TDD dead, is just confusing.

I have even seen more than one debate, as with the example already quoted, where the against-TDD speaker effectively concedes that TDD as defined by the pro-TDD speaker does make sense, and that it is one specific part of the original definition that is dangerous.   Arguments for and against TDD tend to arise from different interpretations of just what TDD actually means, and which definition different people are using.


Different definitions of what TDD means are in circulation. Before considering any point of view on TDD, it is advisable to check how the source of the opinion is interpreting the term TDD.  The originators of TDD did get ‘carried away’ with capabilities which are very useful, but those original ideas should not be turned into laws.

Code should only be written with a test first identified, and unless there is a very good reason otherwise, that test should be a unit test.

Driving Development by Tests is useful, especially for specific detailed problems, but is not a practice that provides all the answers and may not answer the big picture of what is required.

In all cases, production code should only be written with a test first identified, and unless there is a good reason why not, that test should be a unit test.

Neither full TDD, nor writing code only to failing tests,  will automatically result in a full Unit Test suite.
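As a minimal sketch of the test-first workflow described above, in Python: the test is written before the code exists, then just enough production code is written to make it pass. The `slugify` function and its behaviour are invented for this example.

```python
# Step 1: write the test first. Running it at this point raises
# NameError, because slugify does not exist yet -- a failing test.
def test_spaces_become_hyphens():
    assert slugify("Hello World") == "hello-world"

# Step 2: write only enough production code to make the test pass.
def slugify(title):
    """Lowercase a title and replace spaces with hyphens."""
    return title.lower().replace(" ", "-")

test_spaces_become_hyphens()  # now passes silently
```

Note that, as the section above argues, this workflow drives out the detail of `slugify` but says nothing about whether a slug-based design was the right architecture in the first place.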

Building DSLs: Why, When & How?

As outlined on ‘what is a DSL’, both the intent and implementation of DSLs vary considerably.  The two types of internal DSL are most relevant to these pages, and how to implement an external DSL is ‘off topic’ for these pages, but the goals of external DSLs and a DSL-Full are the same, so the why is discussed.


  • Why Create A DSL
    • Why Extend an Existing Language?
    • Why Create a New Language?
  • When
    • When to extend a language? (almost always)
    • When does an extension become a DSL? (Less often)
    • When to create a Detached DSL (very rarely)
  • How to build a DSL?
    • Names
    • DSL building Tools



Why Create (A) DSL?

Semantically there is a difference between ‘domain specific language’ and ‘a domain specific language’. Domain specific language may be just new words, while ‘a domain specific language’ implies a complete new language.

Why Extend an Existing Language?

As discussed in language, a language becomes extended in order to allow more precise and concise expression of ideas.

Having a new word which replaces a list of phrases is the human language equivalent of, for example, a function that contains several statements.  With human language, if the concept conveyed by the new word is used rarely, then it may be better to state the list of phrases than to use a new word which few people will bother to learn.  In the computing example, the function is not really new language if people using the function have no reason to remember exactly what the function does.  A function used only once is for that reason not really new language.  The function would need to be used often, or to describe a concept that is itself a building block for another concept, so that the function allows understanding an even more complex function.  When either of these tests is met, there is reason for functions or other language building blocks to be considered new language.
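The same test can be sketched in code: a helper only earns its place as ‘vocabulary’ when it names a concept the reader will meet repeatedly. The names below (`price_with_tax`, `TAX_RATE`) are invented for this illustration.

```python
prices = [19.99, 5.00, 3.50]

# Without new 'vocabulary': the reader must decode the phrase each time.
total = sum(prices) * 1.10  # what does 1.10 mean here?

# With new 'vocabulary': one well-named function replaces the phrase.
# It is worth learning only because it is reused throughout the program.
TAX_RATE = 0.10

def price_with_tax(amount):
    """Return amount with sales tax added."""
    return amount * (1 + TAX_RATE)

total = price_with_tax(sum(prices))
```

If `price_with_tax` were called exactly once, inlining the expression might communicate just as well; used across a whole billing module, the word pays for itself.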

Why Create a new language (Kotlin DSLs).

The previous section described why to create new language which extends an existing language with new vocabulary, but creating a new language is a much bigger step, and requires further justification, especially considering that learning a new language is a far greater barrier than learning extensions to a language already known.

There are two possible reasons:

  • There may be no common language starting point for those using the new language; specifically, those learning the resulting detached DSL should not need to know the host language used to build the detached DSL.
  • The detached DSL (or semi-detached DSL) may be created to allow expression in a form already known by users of the DSL. For example, kotlinx.html is designed to reflect html and allow representing html data structures within kotlin, letting those data structures look more familiar to people who already know html (which is itself a language)
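The second reason can be sketched in Python: a tiny internal DSL whose code reads like the HTML it produces, so it looks familiar to someone who already knows HTML. The `tag`/`html`/`body`/`p` functions here are invented for the example, not from any library.

```python
def tag(name, *children):
    """Render a tag and its (already rendered) children as an HTML string."""
    inner = "".join(children)
    return f"<{name}>{inner}</{name}>"

# One thin wrapper per element makes call sites mirror HTML structure.
def html(*children): return tag("html", *children)
def body(*children): return tag("body", *children)
def p(*children): return tag("p", *children)

page = html(body(p("hello"), p("world")))
# → "<html><body><p>hello</p><p>world</p></body></html>"
```

The nesting of the function calls visually mirrors the nesting of the HTML, which is exactly the familiarity argument made above.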

If moving to Kotlin from Python, there are new ways to create a detached DSL, and the temptation is to overuse this new capability.  It can be important to consider the real reasons for creating a detached DSL.


Can a program ever avoid extending a language?

DSL methodology suggests that almost all programs define new language, at least for the domain of that program.  This is a parallel to the way any novel defines some specific language, even if it is only the names of the central characters.  In neither of these two cases is it actually usually thought of as ‘new language’, but rather, simply the way language is always being extended.

In a novel, the equivalent to a DSL typically only results if the novel is written for a specific domain, such as with science fiction or fantasy novels where the ‘universe’ described is in some way different from a universe already familiar to the reader. A new ‘universe’ is the building of a setting for the novel, and that setting could equally be used for other novels.

Almost every program will deal with some concepts not already familiar to a given reader, and while this can be minimised by keeping to language features and packages already widely known, there will still be variables and functions which need to be defined.  However, even though variables are defined, well chosen names can mean that no further definition is needed.  For example, the variable ‘i’ when used as a loop iterator.

A program can keep new language to a minimum, but almost every program will create some new language, which means two important guidelines should be followed

  • craft definitions so that they create new language that works linguistically
  • consider when to move new language to a separate DSL

When does ‘additional language’ become a DSL?

The formula for creating a DSL

The quick answer would be that any time code is moved from the main application to a separate package it could be considered a DSL. However, simply moving code to a separate package does not ensure a DSL that is actually workable as a DSL, or that could accurately be described as a DSL.

There are two additional requirements of  the code moved into a separate package:

  • the code must be sufficiently independent of the ‘donor’ application to be fully usable by other independent applications, and it must be possible to document and understand the code without the need to understand the original application
  • The code moved must be cohesive, and not simply a set of independent utilities.  Although a set of independent utilities may be shared by several applications, if such utilities are entirely general purpose in nature then they do not constitute ‘domain specific language’.  Such utility libraries can still be useful, but being general in nature they require wider uptake as general language enhancements to gain acceptance across the wider use case.


Metaprogramming and Other Magic.

It is common to think of metaprogramming as a clear indicator of a package being best described as a DSL.  Metaprogramming (and other DSL tools) can be like super powers that give a language extension supernatural abilities.  The Python Briefly DSL package specifically mentions metaprogramming in the first few lines of its description.  The definition of metaprogramming is programs manipulating code as data.  Since the code can be manipulated as data, this allows changing what is meant by any given code, which allows that code to take on new meaning.

Metaprogramming allows code to differ from, or become detached from, the host language. The reality is that metaprogramming allows creation of detached or semi-detached DSL features, but such features will most often come at the expense of the DSL being more difficult to learn.  Caution is needed when using features that can appear magical, because although magic is fun, part of the fun can be that it is hard to understand.  Metaprogramming can make DSL code mimic another external DSL that people already know (as with kotlinx.html), or can allow complex actions to happen in the background, as with kivy properties or python dataclasses, but care is needed to ensure these features are productive, rather than just fun at the expense of complexity. Metaprogramming, or treating code as data, does not of itself make a DSL. Languages such as LISP always treat code as data, so is every LISP program an implementation of a DSL?
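A small Python sketch of the kind of ‘magic’ being described: `__getattr__` intercepts attribute access, so code that looks like ordinary attribute lookup is actually being manipulated as data at runtime. The `Path` class is invented purely to illustrate the technique.

```python
class Path:
    """Build a '/'-separated path from chained attribute accesses."""

    def __init__(self, parts=()):
        self.parts = parts

    def __getattr__(self, name):
        # 'name' arrives as data: no attribute 'users' or 'profile'
        # was ever defined as code anywhere in the program.
        return Path(self.parts + (name,))

    def __str__(self):
        return "/" + "/".join(self.parts)

api = Path()
print(api.users.profile.settings)  # prints "/users/profile/settings"
```

This reads like magic precisely because the meaning of `api.users` is decided at runtime, which is also why such tricks can make a DSL harder to learn and debug.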

It is the new capabilities added that make a DSL useful, not how magical it appears.

Don’t almost all programs  build a DSL? No.

Even hello world? (clearly a stretch)

You could really stretch definitions and argue that even hello world builds new vocabulary.  In the case of “hello world” the only new vocabulary created is available in a shell. If the program is called “hello”, then the computer gains new syntax in the shell such that typing ‘hello‘ now does something: it prints “Hello world”. The shell gains “hello” as one new word of vocabulary. This is not a DSL, as the new vocabulary is not available at development time.

The reality is the functionality provided by ‘main’ is available only to the operating system, but every other function or class is a new building block available to main.

The Real DSL Test: Code with Classes or functions beyond main.

For any program with functions or classes outside of main,  main can make use of those functions or classes as building blocks for the code in main.  The rest of the program defines at least some specific language or ‘mini-DSL’ that gives main an extended language to work with.

Yes, main has extended language to work with, and good program design requires that this extended language is well designed, but that does not make a DSL unless the extended language can be used in a domain that is broader than a single application.
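The point above can be sketched in Python: everything outside `main` forms a small vocabulary, and `main` then reads almost like a sentence in the program's own language. The report-building names here are hypothetical.

```python
# These helpers are the 'mini-DSL' available to main.
def load_scores():
    """Stand-in for reading scores from a real data source."""
    return [72, 88, 95]

def passing(scores, threshold=75):
    """Keep only scores at or above the pass threshold."""
    return [s for s in scores if s >= threshold]

def summarise(scores):
    """One-line summary of a list of scores."""
    return f"{len(scores)} passing, best {max(scores)}"

def main():
    # main reads almost like domain language built from the helpers
    print(summarise(passing(load_scores())))

main()  # prints "2 passing, best 95"
```

This is a ‘mini-DSL’ rather than a true DSL: the vocabulary lives inside one application and serves only that application's domain.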

How to Build a DSL?


Most new vocabulary is simply names.  While some novels have far more extended language, almost any work of fiction will need to introduce names for characters that come to have more meaning as more details become associated with those names. In some novels the names can seem too similar, and it is harder to remember who is who than with other novels.

There is an art to choosing good names, and naming things is hard.

The simple art of choosing good names is central to building domain specific language as much of any new language will be names.
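To make the point concrete, here is a hypothetical before-and-after in Python; both functions compute the same thing, and the names are invented for the example.

```python
# Hard to read: the names carry no meaning, so the reader must
# reverse-engineer the formula to discover what is being computed.
def f(a, b):
    return a * b * 0.5

# The same logic with well chosen names needs no further definition:
# the names alone tell the reader this is the area of a triangle.
def triangle_area(base, height):
    return base * height * 0.5

assert triangle_area(10, 4) == 20.0
```

The second version adds no code and no documentation, only names, yet it is the one that successfully extends the reader's vocabulary.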

DSL building tools

standard tools

Every language has declarations of variables and functions, and classes or types.  These allow adding verbs and nouns.


metaprogramming

Metaprogramming allows writing code that itself writes or modifies other code.  This allows code to act as a modifier of other code, in the manner of adjectives/adverbs in conventional language.


special syntax

Various languages offer specific syntax, such as decorators or operator overloading, that is specifically designed for language extension.
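Both kinds of syntax can be sketched in Python. A decorator modifies a function in the manner of an adverb, and operator overloading reuses existing syntax for a new noun; the `logged` decorator and `Money` class are invented for the example.

```python
import functools

# A decorator acts like an adverb: it modifies how a function behaves
# without changing the function's own code.
def logged(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        result = func(*args, **kwargs)
        print(f"{func.__name__}{args} -> {result!r}")
        return result
    return wrapper

@logged
def greet(name):
    return f"hello {name}"

# Operator overloading lets a new noun reuse syntax readers already know.
class Money:
    def __init__(self, cents):
        self.cents = cents

    def __add__(self, other):
        return Money(self.cents + other.cents)

total = Money(150) + Money(250)  # total.cents == 400
```

In both cases the language extension leans on syntax the reader already understands, which is what makes these tools well suited to building internal DSLs.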


Almost every project has Domain Specific Language

Every program that contains functions and classes is building some DSL for use by the main function.  So every program builds a ‘partial-DSL’, which generally only becomes a ‘true DSL’ if the functions and classes are separated into a separate package, are reusable, define functionality independent of any specific application, and provide toolkit functionality which can extend language capabilities.

Building DSLs: part of the fabric of programming

Defining variables, functions, and classes is a major part of programming. Even the algorithms are then wrapped in functions, and then become part of the vocabulary.  Almost every part of coding is building new vocabulary, and building vocabulary is a core part of extending a language, regardless of whether the resulting language then becomes a DSL.  The only question is: do the language extensions provided by this program become an external DSL, or more like a ‘glossary’ for the one project?

Language: The Core of DSL design

For both standalone DSLs and language extensions, to build a DSL is to build new language. Being restricted to a specific domain should allow the language to be small and simple, but it is still building new language.  How do we keep new language intuitive and simple? This page looks at the basics for guidance.

  • What is ‘language’ anyway?
    • vocabulary
    • syntax
  • Language Evolution
    • Computer Language
    • Human Language
  • DSLs – Standard Language is only the Starting Point
    • Human Language
      • Jargon
      • Jargon Like Language in Non-Technical Domains
      • Language Augmentation for Every Domain
  • How Do We Learn Languages?
  • Why does language evolve, why do jargons emerge?
  • Conclusion

What is Language anyway?


The core of language is vocabulary.  Words provide a very concise representation of a concept or object.  For example, the word ‘tree’ represents to most people somewhere between the dictionary definition of ‘tree’ and the Wikipedia entry for ‘tree‘.  In fact there are many different meanings of tree depending on the context or ‘domain’ of the discussion.  Without a specific context, we would normally assume the botanical tree, and just in that one case, between the dictionary and Wikipedia, there is so much information a person may associate with the word tree that it could take pages of words to explain all that is communicated by just one word.


Each language also has a syntax, or set of rules for how words are combined to represent the interactions of the concepts referred to by vocabulary.  While the vocabulary is highly dynamic, constantly changing, syntax tends to be far more stable for any given language.

Language Evolution and Continual Extension.

Computer Languages

Consider the C language.  Different people took the language along different paths, such that it was seen as necessary to create a definition of ‘Standard C’ known as ANSI C, with that standard being agreed in 1989.

The large number of extensions and lack of agreement on a standard library, together with the language popularity and the fact that not even the Unix compilers precisely implemented the K&R specification, led to the necessity of standardization. (Quoting Wikipedia)

But people keep not only making their own extensions, they also want some components of those extensions to become part of the language itself, leading to standards updates in 1999 and again in 2011.  Many things added in a new standard are already widely in use prior to adoption in the standard. Note there are proposals for a new standard for around 2021.

Every language keeps being extended.

Human Language.

It becomes clear from reading Shakespeare that human languages also evolve.  They also diverge, with English existing in the forms of English (UK), English (US), English (Australia) etc., with each of these adding new words every year. Common practice is for each language to have its own authority: e.g. English (UK) has the Oxford Dictionary as the official reference.  In a sense, computer languages with their standards bodies are following the same process as human languages. But human language also evolves in advance of standards, as words are only added to references like the Oxford Dictionary once they are already in use. Further, not all words are in the dictionary, which is one of the reasons there have for some time been encyclopedias, which can contain words such as company names that may be commonly used, but are from categories that are not part of the official language.  In fact references like Wikipedia become a common source of ‘what is widely understood’, even though there can be inaccuracies.

Human languages, not just computer languages, always keep being extended.  Again there are ‘standards’, but language extension happens naturally and in advance of standards.

DSLs – The standard is only the base

Human Language


To again quote Wikipedia: Jargon is a type of language that is used in a particular context and may not be well understood outside that context.

The word ‘context‘ can be considered a synonym for ‘domain’ as used in ‘Domain Specific Language‘ (DSL).  The reality is almost all jargon, as in the many examples, provides vocabulary which extends a language, but that vocabulary extension is insufficient for a complete conversation by itself.  That is, most communication that makes use of jargon still takes place mostly in a general language such as English, with the jargon providing extra words and extending the language.  In this way almost all jargon can be considered as an Augmentation DSL. An exception would be maths ‘jargon’, which is not based on extension of English or another general language, with maths, at least in written form, even using a different set of symbols than general languages. This clearly puts maths in the External DSL category.

Jargon Like Language in Non-technical Domains.

Is jargon always technical? Is the phrase ‘technical jargon’ an example of a tautology? There is certainly a heavy bias in the Wikipedia list of examples towards technical fields, but there is the example of the sport of cricket. Is that jargon all related to the technical aspects of cricket?

Now consider Star Wars.  Luke and Leia become words that take on a special meaning, but so do wookie, porg, Tatooine, Jedi, Sith and ‘The Force’. I feel there are as many words specific to the Star Wars domain as with most of the jargon examples.

In fact any novel creates its own specific vocabulary. Although in the simplest cases that vocabulary may be little more than names, the fact remains that you have to define some vocabulary for any literary work.  Depending on the novel, significant amounts can be specific to the novel. Consider Lord of the Rings. Not only are characters explored, but also types of creatures, new locations and an imaginary world. It can be described as a “Lord of the Rings Universe” being created.

The new vocabulary of most works may fall far short of the vocabulary of Star Wars or Lord of The Rings, and hardly qualify as a ‘jargon’ or ‘language augmentation’, but the process of introducing new words that take on meaning throughout a novel is the same regardless of how many or how few words are introduced.  The structure for introducing new words is part of language. Every work introduces words that, when read, convey meaning not known at the outset.

Language Augmentation for Every Domain.

Beyond novels, even a household has its own vocabulary, including words with meaning beyond that which would be known by any outsider.  Naturally, names require introduction, but even simple things like rooms have labels that the family associates with where the room is, what is in the room, and even who normally uses that room at what time. An outsider, without the domain specific extra meaning attached to the word ‘bathroom’, may need directions.  The domain specific knowledge and meaning that becomes added to words enables concise conversation in a way that would be tedious if every word had to be explained every time.

It seems every domain has its own words. Some words, such as names like ‘Bob’, are generally known outside the domain to be a name, but within a specific domain take on a specific meaning: ‘Bob’ becomes not just a name but a person, with a complete set of associations about that person.  Many words simply take on extra, more defined meaning within a domain, while other words can be unique to the domain.

How Do We Learn Languages?

A significant part of the human brain has evolved specifically to process language.  While it is clear from the above reflection that we all keep learning language extensions throughout our lives, relatively few of us learn new languages once we become adults.

The message is: “language extension is easy, a new language is much harder”.

When a new concept arises, we can start by describing it in full using a combination of existing words, much like quoting a dictionary definition. Adopting a single word to replace repeating the description over and over is very natural.

It is also observed that human languages exist in related families, and learning other languages from the same family is easier than learning from new language families.  So if you know Spanish, learning Italian may be easier than learning Hungarian.  Many computer language families are also related, making moving from one computer language to another easier.  If it is necessary to build an external or detached DSL, can it be a simpler language to learn if it is related to an existing and well known DSL?

Similarly, we can learn a new word from the dictionary or encyclopedia, provided we can already understand the language used in the dictionary definition or encyclopedia entry.  Extending computer languages is similar: building a definition of a new language element using existing language is familiar, and that definition can easily be found in order to learn about the new language element.  It is when we find completely new language that understanding enough to even find the definition becomes difficult.

Language exists to enable communication. New language arises because it improves communication.  You can substitute any word in English with the meaning from the dictionary, but that is replacing a single word with one or more phrases, and sometimes losing some precision of exactly what is meant. The invention of a new word can do the opposite, it can allow reducing one or more phrases to a single word, and conveying even more precise meaning by doing so.

That is why new words, and new languages, emerge: because creating a new single word can allow conveying meaning that would otherwise require several phrases, and the single word can even convey meaning better than several phrases.  But there is a trade off – people have to learn the new word.

Jargon arises because some new words are useful only in one specific domain, so for many people outside that specific domain there is insufficient benefit to learning those words.

The benefit to new words and additional language is the ability to communicate more precisely and concisely.  The new language ‘works’ when the benefit outweighs the effort of learning that additional language.


We are on a journey towards computers with ‘natural’ language. Since the first move from machine code to assembler, the goal has been for computer programs to be readable by humans as well as machines.  Computer language continues to evolve, to express concepts specific to computing and allow humans to interact with computer systems.

Extending computer languages in ways that mirror how we extend human languages is, in practice, leveraging how the human brain processes language.

Specifically, the following points stand out from considering  human languages:

  • New words and new language allow more precise and concise expression, at the cost of learning the additional language or words
  • Defining DSLs or computer ‘jargons’ provides language extensions for repeated use across a domain
  • Given that extending language just for a single work of fiction is a well proven model, a single program extending language should also be highly practical.
  • External DSLs and detached DSLs that do not extend an existing language should be used very carefully, considering the difficulty of learning completely new human languages

Properties: Getters, Setters + self vs this & more


self vs this

Access to variables of an object from outside the code of the class definition (as in the previous example) is the same for python and kotlin: the code will access the name variable from the object p1.  Code inside the class must work without any actual object name, so another naming system is needed.  Python uses self to indicate the current object, and kotlin uses this.  But the python self is needed far more often than the kotlin this: in python, every access to an object variable or property requires the self. prefix, while in kotlin the this. is only needed when there is a parameter or local variable with the same name, and normally the this. can be omitted.  So there is a lot less this in kotlin than self in python.

Further, in python the first parameter to each method in a class should be self, while this is not included in the parameter list in kotlin.  Again, less this than self.

# consider this method definition in python
    def setNames(self, name):   # self as first parameter = name      # '' is the attribute, 'name' is the parameter
        self.otherName = ""
        self.fullName = name     # use name as fullName
// the same method definition in kotlin
    fun setNames(name: String) {  // no 'this' in parameter list = name      // '' is the property, 'name' is the parameter
        otherName = ""            // only one 'otherName', so no need for 'this.'
        fullName = name           // also only one 'fullName'
    }

properties: getters and setters

Traditionally, java programmers have been taught that encapsulation (a key part of OO) requires building a class so that how things work internally can be changed without affecting code using the class. To do this, ‘getters’ and ‘setters’ are required, to provide for changes to how data inside the class is used. Instead of allowing a variable to be accessed or set directly from outside the class, a getter method is created to get the value, and a setter method to set the value. The idea is that functions are already in place, ready for a possible time when getting or setting becomes more complex.
Modern languages have identified problems with this approach:

  • almost all getters and setters just get or set the value and do nothing else, so they just bloat the program
  • it is much clearer for the calling code to get the value of a variable, or have an assignment statement to set the value – even when what is happening inside the class is more complex

The solution is:

  • require code only for the complex cases
  • ensure setting and getting from outside the class looks the same for simple and complex cases, and is most readable.

Consider this python class:

class Person:
    def __init__(self, name): = name
        self.otherName = ""
        self.fullName = name

>>> tom = Person("Tom")  #instance object
>>> tom.fullName = "Tom Jones" # set property using object
>>> tom.fullName  # get property
'Tom Jones'

Getting and setting is as simple as possible when using the class, but what if we do wish to ‘complicate’ the fullName property, changing the value from being simply its own data to being the result of name together with otherName?

class Person:

    def __init__(self, name): = name
        self.otherName = ""

    @property
    def fullName(self):
        return " ".join([, self.otherName])

    @fullName.setter
    def fullName(self, value):
        if " " in value:
  , self.otherName = value.split(" ", 1)
            else:
       = value
            self.otherName = ""
>>> bob = Person("Bob")
>>> bob.otherName = "Jones"
>>> bob.fullName
'Bob Jones'
>>> bob.fullName = "Bobby Smith"
>>> bob.fullName
'Bobby Smith'
>>> bob.otherName
''
Now we have the new implementation, and all code written before the change will still work.

class Person(var name: String) {
// instance variable declared in constructor

    var otherName = "" // an instance variable not from a parameter
    var fullName: String
        get() = listOf(name, otherName).joinToString(" ")
        set(value) {
            val (first, second) =
                    if (' ' in value) value.split(" ", limit = 2)
                    else listOf(value, "")
            name = first
            otherName = second
        }
}
The kotlin code is changed less by adding getters and setters: simply follow the var (or val) property declaration with the get and/or set methods.


What is not covered?
Super, which I feel needs no explanation, and delegated properties and more complex cases, which do need more. I will add a separate page on these, but for now see this page, and delegated properties are described here.

Extension functions will also be covered separately.