Scheduling: Schedule, design, code?

Or perhaps: schedule, design, build?  Either way, while that sequence may sound like waterfall, even agile is really a repetition of this same sequence over and over.  This page discusses why these three steps are problematic for software, why they must still be followed despite the problems, and how to schedule projects using a sequence that can initially appear to be broken.

Build Vs Engineer: Which Best Describes Software Development?

To be able to schedule the completion of a task, it is useful to consider the nature of the task and the nature of the steps within a task.

Design: Engineering Design vs Artistic Design.

The verb ‘design’ can be interpreted in subtly different ways.  The design of the Rubik’s Cube by Erno Rubik is an example of engineering design, not of artistic design, yet someone can make a new Rubik’s Cube with a new artistic design. When an engineering design is considered novel, the designer can apply for a patent, while with artistic design the designer’s work is protected by copyright.  The nature of the intellectual property created by the two types of design is sufficiently different that there are different ways to protect that intellectual property.

The engineering process as described here gives some insight.  Note the iteration, with no way of being certain how many iterations are required to produce a result.  The implication is that it is impossible to be certain how long the process will take.  In most cases with artistic design, by contrast, a single iteration is sufficient, making the process easier to predict.  Depending on the task, software design can be artistic design, engineering design, or a mixture of both. In reality, software design tasks are almost always engineering design.

Generally, software is written because software to do the task does not already exist. Unlike a chair, which you might build because you need another chair, software is generally not written because you need an exact copy of software that already exists.  This means there should always be engineering design required, which in turn suggests the time the process will take is highly variable and hard to predict.

Engineering Design: A tautology?

Engineering is generally described by the dictionary as both design and building. The dictionary definition suggests a waterfall sequence, while the flow from an engineering perspective clearly conveys a more agile approach.  Regardless of the sequence, there is both design and build; however, for engineering the substance is in the design.  Consider the phrases ‘engineer a bridge’ and ‘build a bridge’ as we use them today. To engineer a bridge is to design the bridge or some detail of the bridge, while to build a bridge implies no design work is required.  As we use ‘engineer’ in this sense today, an engineering project is considered complete when the design is complete.  Building that is necessary to test the design, or to be ready to test the design, can be part of engineering. Once a design is complete, in today’s world we normally say ‘construction’ or simply ‘building’, and keep ‘engineering’ for when there is a new design.

Construction vs Engineering.

While both ‘construction projects’ and ‘engineering projects’ can involve building something and will need a design, the construction project is primarily about the building, and the engineering project is all about the design.  A construction company will often outsource the design to an engineering company when custom design is required, and then commence construction with all design work complete.

Consider building Swedish-designed flat pack furniture.  All that is required of the consumer is building, or construction.  Engineering is what took place at head office to produce and test the original design. But even within that original engineering project there are simple construction components, because the design of each such component already exists, and is already tested.

Construction is where the overall design already exists and is already considered tested. Engineering is where the overall design is not yet tested, although there will normally be components of the overall design that are tested.

Software Development: Construction or Engineering?

The waterfall development process works on the principle that the software development can work like construction, and all design is already complete and effectively tested.

In contrast, agile is based on considering software as an engineering project, and that even components within that engineering project are themselves engineering projects.

Waterfall Advantage: Scheduling

As in a construction industry project, there is still the initial engineering/design phase, which can be very difficult to schedule. In the construction industry, the main costs and the major time for the project all take place once the design is agreed, so the variable timing of the shorter, lower-cost engineering/design phase is not that significant to the overall project.

Construction itself follows known steps, so it can (at least in theory) be accurately scheduled and costed.  In theory, applying this model to software could provide not only accurate overall scheduling but, with a set of discrete steps, accurate tracking of project progress.

Waterfall Advantages: Skill Diversity and Low Cost

Again, consider the construction industry.  The design phase requires architects and engineers who are highly educated and costly; however, most of the actual work can be done by less expensive labourers.  The construction phase still needs project managers and foremen, but few in proportion to the labourers.

Applying this same model to software results in software architects and business analysts for the design phase, and while the actual development will require project managers and team leaders, these will be few in relation to the more junior programmers, and those programmers can possibly be outsourced or even offshored to lower costs.

The advantages and disadvantages of Agile

Agile has one simple advantage: the engineering metaphor it implements actually does apply to software.  This gives the advantage of actually confronting reality. The waterfall approach has its attractions, but as a model for software it is fundamentally flawed, since design is required during the ‘construction’ phase.  The result is that its supposed advantages are not realistic.

The disadvantage of facing reality is that 1) it becomes clear that it is impossible to guarantee a schedule for a design that is not yet complete, and the design will not be complete until the software is complete, and 2) the skill diversity approach is broken: with design needed at all stages, the low-cost ‘labourers’ will still need to make design decisions, and the impact of those decisions is significant.  So there is no exact scheduling of construction, and the separation of roles into analysts, programmers and project managers is flawed; at the very least each team, if not every team member, needs all of these skills.

Summary: There are question marks over whether software can really be reduced to the waterfall model, and that would be the only way to enable reliable time estimates.  The implication is that in-project discoveries must impact schedules and cannot be avoided.  Something has to give.

Rethink: Scheduling engineering vs scheduling construction.

The Problem.

The mindset of construction is that what is to be done is known at the project outset, and that good planning will then result in an accurate timeline for the project.

However, software cannot generally be properly reduced to the construction model, leaving an engineering approach where it is never known exactly what is to be done until the goal is achieved, at which point the project is basically complete.

The Proven Solution.

Consider car manufacturers.  They construct cars. In advance of manufacture they know very accurately what the specification of each car to be built will be, and how long each car will take to build.

However, now consider the engineering aspect of a car manufacturer: designing new cars to then be manufactured, covering design, prototyping and tooling.  As an engineering task, on the basis of the theory described here, the time to design a new car to a given set of specifications can be estimated (an educated guess), but there is insufficient information for an accurate figure.  The solution: something has to give.  Either the exact specification or the exact amount of time must be adjustable.  Given many manufacturers desire a new model each year, the time is not adjustable, so the specification becomes flexible.

A fixed amount of resources is allocated to engineer improvements to the current model.  There is a list of desired improvements and ideas for improvements, and these can be prioritised.  Usually there will be more features and improvements identified as desirable than can be ready in time for the next model.  A list of features and improvements which it is thought can be ready for the next model is made, and work starts on the list.  The limitation is that the exact set of features that will be ready for the next model is not known at the start of the fixed-length project; only those that can be completed in time will be in the next model.  If a feature or improvement is not ready in time, it will have to wait, as the release date for the next model cannot be pushed back.

Applying the solution to software.

Agile allows taking this industrial engineering approach and applying it to software.  Projects like Ubuntu Linux, and now even Windows 10, have new software releases at fixed intervals.  The product versions are even based on dates, and those dates are declared at the project outset.  There have been two Ubuntu versions every year since October 2004 (4.10 Warty Warthog): one version in April, with a version number of the year followed by ‘.04’ (April is the 4th month, so 5.04 was in April of 2005), and another in October with ‘.10’ for the tenth month. How do they keep such reliable schedules? Features that do not make the deadline are pushed back to the next version, that’s how.

Scheduling and Scrum.

The scrum process is like a series of mini-releases, with the completion of each sprint resulting in a new set of stories, tasks and bug fixes that are working, tested and integrated into the system. Scrum planning can take the same type of approach, with issues that cannot be completed in time pushed back to a future sprint.

The danger is that, if not managed correctly, an individual issue could absorb the team for the entire time allocated to the sprint.  If that issue still cannot be completed, then there is a sprint with nothing completed.  The solution is to budget time for each issue.  When that budgeted time has elapsed, the issue should be reviewed.

The choices for the review are:

  1. divide the issue into parts, and push what cannot be done in this sprint back to the backlog (which could even be the entire issue)
  2. push other issues to the backlog, to free time in this sprint for another allocation of time to this issue
  3. both of the above: push part of the issue to the backlog, but still allow a new block of time for the now simplified issue, pushing other issues to the backlog to free up that time

Conclusion: Schedule, design, code.

Sprints should be set with fixed end dates, or at least end dates with only a small window of variation.  As the end of the sprint approaches, new tasks are pushed back rather than started when there is insufficient time, and only tasks near completion may move the sprint close date within the predetermined window.

Each task can be scheduled before the task is started, and the schedule should allow a high probability that the task will be completed, but this cannot be guaranteed. At the end of the scheduled time the task should be reviewed: either this task is pushed back, another task is pushed back, or the sprint will be late.

So it turns out that the sequence: Schedule, Design, Code, can work.

Background & History of OO Languages

Background

Python, Kotlin or Java, which best supports Object Oriented Programming (OOP)?

History.

Simula 67 (in 1967) was probably the first Object Oriented Language.  It was a great language, but wow was it slow!  And slow even on the massive mainframe computers it lived on.

The origin of C++ was to add the power of Object Oriented Programming to the very efficient C language, while still delivering real performance. The first reference book on C++ was released in 1985 (tired of waiting for everyone to have access to the internet?), at a time when micro-computers were popular, but far slower than even old mainframe computers.  The result was a language with great control over how objects were created, but one necessarily and severely compromised by the features needed to deliver performance.

Java began life as Oak in 1991, but was only used by the original team until 1995, and then had a very short gestation, reaching 1.0 in 1996.  Java delivered greater ease of Object Oriented Programming than C++, with good performance even with the added layer of a virtual machine.  While the language is hardly truly Object Oriented throughout, it struck a winning formula with its balance of Object Orientation and performance.

Python also had its origins in 1991, but did not really get public exposure until Python 2.0 in the year 2000, and did not get full object orientation until ‘new style classes’ were introduced at the end of 2001. Computing power had moved a long way between 1996 and 2001 (consider the 150MHz 1995 Pentium Pro vs the 1.3GHz 2000 Pentium 4), and Python was targeting ease of programming over ultimate performance anyway.  As you can guess, as a result Python was able to be Object Oriented throughout, instead of just having an outer veneer of being Object Oriented.  But yes, still slower than Java.

Of course for Kotlin, designed in 2011 with version 1.0 released in 2016, having Object Oriented structure throughout, and adding features for functional programming as well, was not even really a challenge. But slower than Java? Sometimes, but a 2016-technology compiler that can run on a 2016 computer with gigabytes of memory available (Windows 3.1, the common version at the time of Java’s release, required 1MB of RAM!) can optimise and produce almost the same code.  Oh yes, the compiler is slower than a Java compiler can be, but is that a problem?

OOP Myths.

But Java has created some OOP myths, with almost a generation of programmers having learned OOP on Java.

“This program cannot be True OOP, where are the getters and setters!”

That encapsulation requires getters and setters is actually true.  That you have to write your own getters and setters just to access and store data is not true.  Both Python and Kotlin automatically provide getters and setters, without the programmer even noticing, unless the program requires non-standard behaviour from them. Java requires that the programmer write their own getters and setters in order to allow breaking the rules of OO for performance reasons.
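
As a minimal sketch of the point in Python (the class names here are purely illustrative): a plain attribute can later be swapped for a property without changing any calling code, so explicit getters and setters only appear when non-standard behaviour is actually required.

class Account:
    def __init__(self, balance):
        self.balance = balance          # plain attribute: no boilerplate needed


class CheckedAccount:
    def __init__(self, balance):
        self._balance = balance

    @property
    def balance(self):                  # custom getter, same access syntax as an attribute
        return self._balance

    @balance.setter
    def balance(self, value):           # custom setter with non-standard behaviour
        if value < 0:
            raise ValueError("balance cannot be negative")
        self._balance = value


a = Account(10)
a.balance = 20                          # direct attribute access

c = CheckedAccount(10)
c.balance = 20                          # identical syntax, but now runs the setter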

“True OOP requires all functions and data to be inside a class so everything is an object!”

True OOP does have everything as an object: functions, all data types, everything.  For performance reasons Java broke this rule.  Understandable, but breaking the rules of OOP does not make the language more OO.  Functions should be ‘first class’ functions, and in Java they were not, for performance reasons.  Putting the non-OOP functions inside an object at least gave them an object wrapper, and the same applies to Java primitive data types.  Modern OO languages do not need to wrap functions or data in a class because functions and data are always already objects.
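
A quick sketch in Python illustrating the point; every name here is arbitrary (expected output in comments):

def greet(name):
    return f"hello {name}"

# functions are first-class objects: they have attributes and can be passed around
print(type(greet))               # <class 'function'>
print(greet.__name__)            # greet

def apply(f, arg):               # a function passed as an ordinary argument
    return f(arg)

print(apply(greet, "world"))     # hello world

# 'primitive' data types are full objects too, with methods of their own
print(isinstance(1, object))     # True
print((255).to_bytes(2, "big"))  # b'\x00\xff'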

Python dataclasses: A revolution

Python data classes are a new feature in Python 3.7, which is currently in Beta 4 and scheduled for final release in June 2018.  However, a simple pip install brings a backport to Python 3.6.  The name dataclass sounds like this feature is specifically for classes with data and no methods, but in reality, just as with the Kotlin dataclass, the Python dataclass is for all classes and more.

I suggest that the introduction of the dataclass will transform the Python language, and in fact signal a more significant change than the move from Python 2 to Python 3.

The Problems Addressed By Dataclasses

There are two negatives with Python 3.6 classes.

‘hidden’ Class Instance Variables

The style rules for Python suggest that all instance variables should be initialised (assigned some value) in the class __init__() method.  This at least allows scanning __init__() to reverse engineer the list of class instance variables. Surely an explicit declaration of the instance variables is preferable?

A tool such as PyCharm scans the class __init__() method to find all assignments, and then any reference to an object variable that was not found in the __init__() method is flagged as an error.

However, having to discover the instance variables of a class by scanning for assignments in the __init__() method is a poor substitute for reading an explicit declaration.  The body of the __init__() method essentially becomes part of the class statement.

The dataclass provides for much cleaner class declarations.

Class Overhead for Simple Classes

Creating basic classes in Python is too tedious. One solution is for programmers to use dictionaries as classes – but this is bad programming practice. Other solutions include namedtuples, the Struct class from the ObjDict package, or the attrs package. With this many alternative solutions, it is clear that people are looking for an answer.

The dataclass provides a cleaner, and arguably more powerful syntax than any of those alternatives, and provides the stated Python goal of one single clear solution.

Dataclass Syntax Example

Below is an arbitrary class to implement an XY coordinate and provide addition and a __repr__ to allow simple printing.

class XYPoint:

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __add__(self, other):
        new = XYPoint(self.x, self.y)
        new.x += other.x
        new.y += other.y
        return new

    def __repr__(self):
        return f"XYPoint(x={self.x},y={self.y})"

Now the same functionality using a data class:

from dataclasses import dataclass

@dataclass
class XYPoint:
    x: float
    y: float

    def __add__(self, other):
        new = XYPoint(self.x, self.y)
        new.x += other.x
        new.y += other.y
        return new

The dataclass is automatically provided with an __init__() method and a __repr__() method.

The class declaration now has the instance variables declared at the top of the class more explicitly.

The types of ‘x’ and ‘y’ are declared as float above, although to exactly match the previous example they should be of type Any; float is more precise, and more clearly illustrates that an annotation is usually a type.

The dataclass merely requires the class variables to be annotated using variable annotation.  ‘Any‘ provides a type completely open to any type, as is traditional with Python.  In fact, the type is currently used just for documentation and is not type checked, so you could state ‘str‘ and supply an ‘int‘, and no warning or error would be raised.

As you can see from the example, dataclass does not mean the class implemented is a data-only class, but rather that the class contains some data, as almost all classes do. There are code savings for simple classes, mostly around __init__, __repr__ and other simple operations on the data, but the cleaner declaration syntax could be considered the main benefit, and is useful for any class.
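
As a quick demonstration, using the simplified XYPoint dataclass above (expected output in comments):

p = XYPoint(1.0, 2.0)            # generated __init__ saves the parameters
print(p)                         # generated __repr__: XYPoint(x=1.0, y=2.0)
print(p == XYPoint(1.0, 2.0))    # generated __eq__: True
print(p + XYPoint(2.0, 3.0))     # the hand-written __add__: XYPoint(x=3.0, y=5.0)

q = XYPoint("not", "a float")    # runs without error: annotations are not enforced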

When to Use Dataclasses

The Candidates

The most significant candidates are any Python class and any dictionary used in place of a class.

Other examples are namedtuples and Struct classes from the ObjDict package.

Code using the attrs package can migrate to the more straightforward dataclass which has improved syntax at the expense of losing attrs compatibility with older Python versions.

Performance Considerations

There is no significant change to performance from using a dataclass in place of a regular Python class. A namedtuple could be slightly faster for some uses of immutable, ‘method free’ objects, but for all practical purposes a dataclass introduces no performance overhead, and although there may be a reduction in code, this is insignificant. This video shows the results of actual performance tests.
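
For anyone wanting to check this on their own workload, a minimal timeit sketch along these lines will do; the classes are illustrative and absolute numbers will vary by machine:

import timeit
from dataclasses import dataclass

class PlainPoint:
    def __init__(self, x, y):
        self.x = x
        self.y = y

@dataclass
class DataPoint:
    x: int
    y: int

# time instantiation plus one attribute access for each style of class
for cls in (PlainPoint, DataPoint):
    seconds = timeit.timeit(lambda: cls(1, 2).x, number=1_000_000)
    print(cls.__name__, round(seconds, 3))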

Compatibility Limitations?

The primary compatibility constraint is that dataclasses require Python 3.6 or higher. With Python 3.6 having been released in 2016, most potential deployments are well supported, leaving the main restriction that dataclasses are not available in Python 2.

The only other compatibility limitation applies to classes with existing type annotated class variables.

Any class which can limit support to Python 3.6+, and does not have type annotated class variables, can add the dataclass decorator without compatibility problems.

Just adding the dataclass decorator does not break anything, but without then also adding data fields, it does not bring significant new functionality either. This compatibility does mean that data fields can be added incrementally as desired, with no extra step needed to ensure compatibility.

New capabilities deliver nothing without code to make use of those capabilities. Unless the class is part of a library merely getting a spec upgrade, conversion to dataclasses makes most sense either when refactoring for readability or when code makes use of one or more functions made available by converting to a dataclass.

The ‘free’ Functionality

In addition to the clean syntax, the features provided automatically to dataclasses are:

  • Class methods generated automatically if not already defined
    • __init__ method code to save parameters
    • __repr__ to allow quick display of class data, e.g. for informative debugging
    • __eq__ and other comparison methods
    • __hash__ allowing a class to function as a dictionary key
  • Helper functions (see the PEP for details; a short demonstration follows this list)
    • fields() returns a tuple of the fields of the dataclass
    • asdict() returns a dictionary of the class data fields
    • astuple() returns a tuple of the dataclass fields
    • make_dataclass() as a factory method
    • replace() to generate a modified clone of a dataclass
    • is_dataclass
  • New Standardized Metadata
    • more information in a standard form for new methods and classes
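
A brief sketch of the helper functions, applied to a fresh copy of the earlier XYPoint example (expected output in comments):

from dataclasses import dataclass, fields, asdict, astuple, replace, is_dataclass

@dataclass
class XYPoint:
    x: float
    y: float

p = XYPoint(1.0, 2.0)
print([f.name for f in fields(p)])   # ['x', 'y']
print(asdict(p))                     # {'x': 1.0, 'y': 2.0}
print(astuple(p))                    # (1.0, 2.0)
print(replace(p, y=5.0))             # XYPoint(x=1.0, y=5.0)
print(is_dataclass(p))               # True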

Dataclass Full Syntax & Implementation

How Dataclasses Work

dataclass is based on the dataclass decorator. This decorator inspects the class, generates relevant metadata, then adds the required methods to the class.

The first step is to scan the class __annotations__ data. The __annotations__ data has an entry for each class level variable provided with an annotation.  Since variable annotations only appeared in Python 3.6, and class level variables are not common, there is no significant amount of legacy code with annotated class level variables.

These annotated class level variables are then checked for actual values of type Field (as created by the field() function). Such values can contain additional data for building the metadata, which is stored in __dataclass_fields__ and __dataclass_params__. Once these two metadata dictionaries are built, the standard methods are added if they are not already present in the class. Note that while an existing __init__() method blocks the very desirable boilerplate-removing automatic __init__() method, simply renaming __init__ to __post_init__ allows retaining any code desired in __init__ while removing the distracting boilerplate.

This process means that any class level variables that are not annotated are ignored by the dataclass decorator, and not impacted by the move to a dataclass.
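
A small sketch of that behaviour: the un-annotated class level variable is invisible to the decorator, while the annotated variable becomes a field (the class here is illustrative only):

from dataclasses import dataclass, fields

@dataclass
class Counter:
    shared = 0        # class level variable, no annotation: ignored by the decorator
    count: int = 1    # annotated: becomes a dataclass field

print([f.name for f in fields(Counter)])   # ['count'] - 'shared' is not a field
print(Counter.shared)                      # 0 - still a normal class variable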

Converting a Class

Consider the previous example, which was very simple. Real classes have default __init__ parameters, instance variables that are not passed to __init__, and code that will not be replaced by the automatic __init__. Here is a slightly more complicated, contrived example covering those complications with a straightforward use case.

This example adds a class level variable, last_serial_no, just to have an example of a working class level variable; it provides a counter of instances of the class.

Also added is serial_no which holds a serial number for each instance of the class.  Although it makes more sense to always increment the serial number by 1, an optional __init__ parameter allows incrementing by another value, showing how to deal with __init__ parameters which cannot be processed by the default __init__ method.

class XYPoint:

    last_serial_no = 0

    def __init__(self, x, y=0, skip=1):
        self.x = x
        self.y = y
        self.serial_no = self.__class__.last_serial_no + skip
        self.__class__.last_serial_no = self.serial_no

    def __add__(self, other):
        new = XYPoint(self.x, self.y)
        new.x += other.x
        new.y += other.y
        return new

    def __repr__(self):
        return f"XYPoint(x={self.x},y={self.y})"

Now the same functionality using a dataclass.

from dataclasses import dataclass, field, InitVar

@dataclass
class XYPoint:
    last_serial_no = 0
    x: float
    y: float = 0
    skip: InitVar[int] = 1
    serial_no: int = field(init=False)

    def __post_init__(self, skip):
        self.serial_no = self.last_serial_no + skip
        self.__class__.last_serial_no = self.serial_no

    def __add__(self, other):
        new = XYPoint(self.x, self.y)
        new.x += other.x
        new.y += other.y
        return new

The class level variable without an annotation needs no change. The __init__ parameter that is not also an instance variable gets the InitVar type wrapper; this ensures it is passed through to __post_init__(), which provides all the __init__ logic that is not automatic.

The serial number is an instance variable, or field, that is not set by the generated __init__; to change the default settings for a field, assign a field() value to it (which can still include a default value as a parameter to field()).
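
A quick check, assuming the dataclass version just shown, that the converted class behaves like the original (expected output in comments):

p1 = XYPoint(1.0)            # generated __init__ runs, then __post_init__
print(p1.serial_no)          # 1
p2 = XYPoint(2.0, skip=5)    # the InitVar is passed through to __post_init__
print(p2.serial_no)          # 6
print(p1)                    # XYPoint(x=1.0, y=0, serial_no=1) - generated __repr__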

I think this example covers every realistic use requirement to convert any existing class.

Types and Python

Dataclasses are based on the usage of annotations. As noted in the annotations section, there is no requirement that annotations be types. The reason for providing annotations was primarily driven by the need to allow for third-party type hinting.

Dataclasses do give the first use of annotations (and by implication, potentially types) in the Python standard libraries.

Annotating with None or docstrings is possible. There are many in the Python community adamant that types will never be required, nor become the convention. I do see optional types slowly creeping in though.

Issues and Considerations

It is possible there are some issues with existing classes which use both class level variables and instance variables, but none have been found so far, which leaves this section as mostly ‘to be added’ (check back).

Conclusion

There is a strong case that all classes, as well as namedtuples, other data not currently implemented as a class, and some other constructs better implemented as classes, should move to dataclasses. For small classes, and all classes start as small classes, there is the advantage of saving some boilerplate code. Reducing boilerplate makes code easier to maintain, and ultimately more readable.

Ultimately, the main benefit is that any class written as a dataclass is more readable and maintainable than it would be without dataclasses. Converting existing classes is as simple as adding the decorator, annotating the fields, and renaming __init__ to __post_init__ where needed.

Python Class vs Instance Variables

Python class variables offer a key tool for building libraries with DSL type syntax, as well as data shared between all instances of the class. This page explains how Python class variables work and gives some examples of usage.

  • Class & Instance Variables
    • Instances vs the class object
    • Instance Variables
    • Class Variables
  • Using Class and Instance Variables

Class & Instance Variables

Instances vs the class object

In object oriented programming there is the concept of a class, and the definition of that class is the ‘blueprint’ for objects of that class.  The class definition is used to create objects which are instances of the class. In Python, for each class there is an additional object for the class itself, and that object is an instance of the class ‘type’. So if there is a class Foo with two instance objects a and b, created by a = Foo() and b = Foo(), then a and b are objects of class Foo, while the code declaring the class Foo does the equivalent of Foo = type('Foo', (), {}) for an empty class. This Foo object can be manipulated at run time: the object can be inspected to discover things about the class, and the object can even be changed to alter the class itself at run time.

Instance Variables

Consider the following code, as run in IDLE (Python 3.6):


class Foo:
    def __init__(self):
        self.var = 1

>>> a = Foo()
>>> type(a)
<class '__main__.Foo'>
>>> type(Foo)
<class 'type'>
>>> 'var' in a.__dict__
True
>>> 'var' in Foo.__dict__
False
>>> a.var
1
>>> Foo.var
Traceback (most recent call last):
  File "<pyshell#...>", line 1, in <module>
AttributeError: type object 'Foo' has no attribute 'var'

Note the results of type(a) compared to type(Foo). 'var' appears in a.__dict__, but not in Foo.__dict__. Further, a.var gives a value of 1, while Foo.var raises an error.

This is all quite straightforward, and is as would be expected.

Class Variables

Now consider this code, which has a class variable, as run in IDLE (Python 3.6):


class Foo:
    var = 1

>>> a = Foo()
>>> 'var' in a.__dict__
False
>>> 'var' in Foo.__dict__
True

>>> Foo.var
1
>>> a.var
1
>>> a.var = 2
>>> a.var
2
>>> Foo.var
1
>>> a.__class__.var
1

All as would be expected: the __dict__ results are reversed; this time the class Foo initially has var, and the instance a does not.

But even though a does not have a var attribute, a.var returns 1, because when there is no instance variable var, Python will return a class level variable of the same name if one is present.

However, assignment does not fall back to class level variables, so setting a.var = 2 actually creates an instance variable var, and reviewing the __dict__ data now reveals that the class object and the instance each have var. Once an instance variable is present, the class level variable is hidden from access through a.var, which will now access the instance variable. In this way, code can add an instance variable replacing the value of the class variable, which effectively provides a default value until the instance value is set.

Simple Usage Example

Consider the following Python class:


class XYPoint:

    scale = 2.0
    serial_no = 0

    def __init__(self, x, y):
        self.serial_no = self.serial_no + 1
        self.__class__.serial_no = self.serial_no
        self.x = x
        self.y = y

    def scaled_vector(self):
        vector = (self.x**2 + self.y**2)**.5
        return vector * self.scale

    def set_scale(self, scale):
        self.__class__.scale = scale

    def __repr__(self):
        return f"XYPoint#{self.serial_no}(x={self.x},y={self.y})"

The class encloses an (x,y) coordinate, and has a method scaled_vector() which calculates a vector length (using Pythagoras’ theorem) from that coordinate, and then scales the vector by the class variable scale.

If the class level variable scale is changed, all XYPoint objects will automatically return vectors using the new scale.

Being a class variable, scale can be read as self.scale, but must be set, as in the set_scale() method, by self.__class__.scale = scale.

The other use case illustrated is the instance counter serial_no, which provides each instance of XYPoint a unique, incrementing serial_no.
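
A short demonstration of both use cases, assuming the XYPoint class above has just been defined (expected output in comments):

a = XYPoint(3, 4)
b = XYPoint(6, 8)
print(a.serial_no, b.serial_no)   # 1 2 - each instance gets the next serial number
print(a.scaled_vector())          # 10.0 - vector 5.0 times the class scale of 2.0

b.set_scale(3.0)                  # set via one instance, stored on the class
print(a.scaled_vector())          # 15.0 - every instance sees the new scale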

Python Annotations

Python syntax allows for two flavours of annotations: variable annotations (introduced in Python 3.6) and function annotations (introduced in Python 3.0).

Introduction.

Both flavours of annotation work in the same manner.

They build a dictionary called __annotations__ which stores the list of annotations for a function, a module or a class.

It is common practice to annotate with types, such as int or str, but the Python language implementation allows any valid expression. Using types can make Python look like other languages which have similar syntax and require types, and this is one of the motivations for annotations in Python. Third-party tools may report expressions which are not types as errors, but Python itself currently allows any expression.

The Python dataclass now makes use of annotations.  Outside of this use, if you are not using an external type validator like mypy, there seems little incentive to bother with type annotations, but if you are going to document what type a variable should be, then annotation is the optimum solution.

The following code illustrates variable annotation at class level, module level, and local level:


>>> class Foo:

    class_var1: "Annotation"
    class_var2: "Another" + " annotation" = 3
    class_int: int

    def func(self):
        local1 : int
        local2 : undeclared_var
        self.variable : undeclared * 2 = 7
>>> module_var: Foo = Foo()

>>> module_var2: 3*2 = 7
>>> module_var3: "another" + " " + "one"
>>> Foo.__annotations__
{'class_var1': 'Annotation', 'class_var2': 'Another annotation',
'class_int': <class 'int'>}
>>> __annotations__
{'module_var': <class '__main__.Foo'>, 'module_var2': 6, 'module_var3': 'another one'}
>>> f = Foo()
>>> f.func()
>>> f.variable
7

Class Variables: The code annotates three identifiers at Foo scope, and these identifiers and their annotations all then appear in Foo.__annotations__. Note that only class_var2 is actually a variable and will appear in a dir() of Foo.  class_var1 and class_int appear in __annotations__ but are not actually created as variables.

Module Variables: Three module_var variables are annotated at the module level, and all appear in __annotations__. Again, module_var3 does not appear in globals, as annotation itself does not actually create the variable; it solely creates the entry in __annotations__ (module_var and module_var2 are assigned values, so are actual variables).

Local & Instance Variables: func within the Foo class illustrates two local annotations, one of which uses an undeclared_var. This use of an undeclared identifier would generate an error with either class or module variables, because in those cases the expression is evaluated for the relevant __annotations__ dictionary. The expressions for local and instance variable annotations are not evaluated. At this stage, I have not found where, or how, that annotation data is stored.

The PEP for variable annotations is available here. Note the stated goal is to enable third party type checking utilities, even though the implementation does not restrict annotations to types. The non-goals are also very interesting.

While the practice of using annotations only with valid types might be best practice, it is worth understanding that the compiler does not require this.

Function annotations: Introduced in Python 3.0 (2008)

Here is an example of function annotation:


>>> def func(p1: int, p2: "this is also an int" + " but ...") -> float:
	return p1 + p2

>>> func.__annotations__
{'p1': <class 'int'>, 'p2': 'this is also an int but ...', 'return': <class 'float'>}

The expression following the ‘:’ (colon) character (or the ‘->’ symbol for the return annotation) is evaluated, and the result stored in the __annotations__ dictionary.

The PEP is available here, but it is the fundamentals section that is the most highly recommended reading.