presentation

---

# Managing complexity and modular code development

## [Radovan Bast](http://bast.fr)

### [NeIC](https://neic.nordforsk.org)/ [UiT The Arctic University of Norway](https://uit.no)

Text is free to share and remix under [CC-BY-SA-4.0](https://creativecommons.org/licenses/by-sa/4.0/).

Code examples: [MIT license](http://opensource.org/licenses/mit-license.html)

Credits: [Jonas Juselius](https://github.com/juselius),
         [Roberto Di Remigio](http://totaltrash.xyz),
         [Ole Martin Bjørndalen](https://github.com/olemb)

---

## Complexity

---

## The front panel: many dependencies

---

## The back side: global variables

---

## Simple vs. easy

---

## [The tar pit](http://shaffner.us/cs/papers/tarpit.pdf)

- Over time software tends to become harder and harder to reason about
- The code base easily becomes untidy ("I'll fix it later")
- Small changes become harder to implement
- Hacks and workarounds trump design
- Bugs start appearing in unexpected places
- More time is spent debugging than developing
- Complexity strangles development because it does not scale well

(Slide adapted from [Complexity in software development by Jonas Juselius](https://github.com/scisoft/complexity))

---

## Modular design is good

### Examples

- Lego
- Car manufacturing
- Design of your phone or laptop
- Modular composition when you order a laptop
- Success of USB
- Study programs

### Advantages

- Separation of concerns
- Composability
- Leveraging functionality

---

## Prefer loose coupling and high cohesion

- Strong coupling

![](https://cdn.jsdelivr.net/gh/bast/talk-complexity@970a58a661a4211ca0aabd460a57779cd9e03ccc/img/strong-coupling.svg)

- Loose coupling
    - Easier to reassemble
    - Easier to understand

![](https://cdn.jsdelivr.net/gh/bast/talk-complexity@970a58a661a4211ca0aabd460a57779cd9e03ccc/img/loose-coupling.svg)

---

## Prefer loose coupling and high cohesion

- Low cohesion: difficult to maintain, test, reuse, or even understand
    - Non-cohesive code introduces unnecessary dependencies
    - Swiss army knife modules

![](https://cdn.jsdelivr.net/gh/bast/talk-complexity@970a58a661a4211ca0aabd460a57779cd9e03ccc/img/low-cohesion.svg)

- High cohesion: robust, reliable, reusable, understandable
    - **Do one thing only and do it well**
    - API of cohesive code changes less over time
    - Power of the Unix command line is a set of highly cohesive tools
    - Microservices

![](https://cdn.jsdelivr.net/gh/bast/talk-complexity@970a58a661a4211ca0aabd460a57779cd9e03ccc/img/high-cohesion.svg)

---

## Purity

- Pure functions have no notion of state: They take input values and return
  values
- Given the same input, a pure function *always* returns the same value
- Function calls can be optimized away
- Pure function == data
- Purity is key to equational reasoning

(Slide taken from [Complexity in software development by Jonas Juselius](https://github.com/scisoft/complexity))

---

## Example: pure vs. stateful

### a) pure

```python
# function which computes the body mass index
def get_bmi(mass_kg, height_m):
    return mass_kg/(height_m**2)

# compute the body mass index
bmi = get_bmi(mass_kg=90.0, height_m=1.91))
```

### b) stateful

```python
mass_kg = 90.0
height_m = 1.91
bmi = 0.0

# function which computes the body mass index
def get_bmi():
    global bmi
    bmi = mass_kg/(height_m**2)

# compute the body mass index
get_bmi()
```

---

## Enemy of the state

.left-column[
<img src="https://cdn.jsdelivr.net/gh/bast/talk-complexity@970a58a661a4211ca0aabd460a57779cd9e03ccc/img/mad.jpg" style="width: 150px;"/>
]
.right-column[
### Strive for pure functions, fear the state

- Pure functions do not have side effects
- Side effects lead to bugs and increase complexity
- Pure functions are easy to
    - Test
    - Understand
    - Reuse
    - Parallelize
    - Simplify
    - Refactor
    - Optimize

### But we need to deal with state somewhere

- Examples later in this talk
]

---

## Equational reasoning

- We start with a function:
  $$ f(x) $$
- We wish to evaluate this:
  $$ y = f(a) + f(b) \times [f(c) - f(c)] $$
- We can simplify:
  $$ y = f(a) + f(b) \times 0 $$
  $$ y = f(a) $$
- Another example:
  $$ z = f(a) + f(b) + f(c) + f(d) $$
- We know we can rearrange (important for concurrency):
  $$ z = f(b) + f(d) + f(c) + f(a) $$

---

## Concurrency

- Concurrency in imperative code is very hard
- You are totally lost in the dark without a good thread checker
- In a pure, immutable world concurrency is nearly trivial!
- Prefer immutable data to mutable data

(Slide taken from [Complexity in software development by Jonas Juselius](https://github.com/scisoft/complexity))

---

## Composition

- Build complex behavior from simple components
- We can reason about the components and the composite
- Composition is key to managing complexity
- Modularity does not imply simplicity, but is enabled by it

(Slide taken from [Complexity in software development by Jonas Juselius](https://github.com/scisoft/complexity))

---

## One way to look at your code

![](https://cdn.jsdelivr.net/gh/bast/talk-complexity@970a58a661a4211ca0aabd460a57779cd9e03ccc/img/main-inside.svg)

- The main function calls other functions

---

## Another way to look at your code

![](https://cdn.jsdelivr.net/gh/bast/talk-complexity@970a58a661a4211ca0aabd460a57779cd9e03ccc/img/main-outside.svg)

- The main function ("program" in Fortran) is on the outside shell

---

## Recommendations

- Keep I/O on the outside and connected (examples later)
- Always read/write on the outside and pass data
- Do not read/write deep down inside the code
- Keep the inside of your code pure/stateless
- Move all the state to the outside of your code
- Keep the stateful outside shell thin
- Unit test the inside
- Regression test the shell

![](https://cdn.jsdelivr.net/gh/bast/talk-complexity@970a58a661a4211ca0aabd460a57779cd9e03ccc/img/good-vs-bad.svg)

---

## Recommendations

---

## Divide and conquer

- Split the code up
- Construct your program from parts:
  - functions
  - modules
  - packages (Python) or libraries (C or or C++ or Fortran)

## Functions, functions, functions

- Build your code from functions
- Break your code down to more functions
  - if you have too many levels of indentation
  - if a function gets too long
  - if a function does more than one thing
  - if you find it hard to name a function
- A function that performs a single operation is simpler to
  understand, test, and reuse
- A function that does not fit on one screen is too long

---

## C interface for libraries

- English is to humans what C is to programs
- C is the common language
- Basically any language can talk to a C interface
- **Create a C interface for your code**
- Better than Fortran interface (the latter imposes compilation
  order and introduces compiler dependence)

### Core language

- Core language can be any language that can export a C API
- For single-core high-performance use C or C++ or Fortran
- With an eccentric language you risk to reduce the number of contributors

### Communicate through memory or through files?

- Through memory is more general than through files

---

## Import and export

- Import only the functionality that you need
- Import only where you need it
- Export as little functionality as possible

## Naming things

- Use meaningful names
- Indent
- Use standard naming conventions
- Python: use [PEP8](https://www.python.org/dev/peps/pep-0008/)
- Write comments in English

```
Makecodee
asytoread
```

---

## Encapsulation

- Hide internals by language or by convention (header file in C/C++,
  public/private in Fortran, underscores in Python)
- "Python has no locked doors; it's a consenting adults language.
  If you open the door you're responsible for what you see." [R. Hettinger]
- Expose the "what", hide the "how"

## Documentation

- Separate the "what it can do" from "how is it implemented"
- Document your API
- Version your API ([semantic](http://semver.org) or [sentimental](http://sentimentalversioning.org)
  or [romantic](https://github.com/jashkenas/backbone/issues/2888#issuecomment-29076249) versioning)

---

## Modules testable on its own

- Sharpens interfaces
- Once you start testing your library you really see the coupling and cohesion
- Increases development speed

## Libraries built on its own

- Prerequisite for testable on its own

## Libraries have own development history

- Decouple the development history
- Each unit should have its own Git history/repository

---

## Recommendations for C++

(thanks to [Roberto Di Remigio](http://totaltrash.xyz) for this list)

- Read [CppCoreGuidelines](http://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines).
- Use C++11 and show the finger to obsolete compilers.
- Use CMake >= 3.2 as build system to enforce strict standard compliance of
  your code. Bonus: you can use C++14 whenever possible and have safe fallbacks.
- Consider very carefully whether you want to depend on the Boost library.
- Do not use C constructs in C++ code.
- Know the STL, 'cause the STL is your friend and will always be.
- Use algorithms from the <algorithm> header whenever possible.
- Use functional constructs from the <functional> header whenever possible.
- Use tuples for multiple return values whenever it makes sense (avoid doing
  pointer and/or reference tricks to get more stuff out of a function because it
  will mess up your head).
- Use Doxygen. Even just Doxygen-style comments are fine.

---

## Simplicity and clarity before elegance before efficiency

### Avoid premature optimization

- Do not optimize
- If you have to optimize, optimize later
- If you have to optimize, measure, do not guess

### Simple is better than complex

- If you cannot understand or explain a function on a cold gray Monday morning before coffee, it is too complex. (Quote adapted from [Pieter Hintjens, Social Architecture, 2009](https://www.gitbook.com/book/hintjens/social-architecture/details))

---

## More recommendations

- Introduce code reading sessions in your group
- Read sources written by others
- Apply [YAGNI](https://en.wikipedia.org/wiki/You_aren't_gonna_need_it) principle
- Do not postpone to run time what can be done well at compile time

---

## Quiz

### Decide which alternative you like better and discuss why

---

## Main function vs. global scope in Python

### a) main function

```python
def do_something(input):
    # ...
    return something

def main():
    result = do_something(2.0)
    print(result)

if __name__ == '__main__':
    main()
```

### b) global scope

```python
def do_something(input):
    # ...
    return something

result = do_something(2.0)
print(result)
```

---

## Pure attribute in Fortran

### a) default

```fortran
function my_function(a, b)
    ! ... function that happens to have no side effects
end function

subroutine my_subroutine(a, b, c)
    ! ... subroutine that happens to have no side effects
end subroutine
```

### b) with "pure" attribute

```fortran
pure function my_function(a, b)
    ! ... function body unchanged
end function

pure subroutine my_subroutine(a, b, c)
    ! ... subroutine body unchanged
end subroutine
```

---

## Argument vs. global input (example: R)

### a) "offset" passed as argument

```r
factor_and_offset1 <- function (x, factor, offset) {
    x*factor + offset
}

a <- factor_and_offset1(2.0, 3.0, 4.0)
```

### b) "offset" defined in outer scope

```r
offset <- 4.0

factor_and_offset2 <- function (x, factor) {
    x*factor + offset
}

a <- factor_and_offset2(2.0, 3.0)
```

---

## Explicit interface vs. invisible dependencies (C++)

(example taken from [CppCoreGuidelines](http://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines#a-nameri-explicitai1-make-interfaces-explicit))

### a)

```cpp
int rnd(double d)
{
    return (rnd_up) ? ceil(d) : d;
}
```

### b)

```cpp
int rnd(double d, bool rnd_up)
{
    return (rnd_up) ? ceil(d) : d;
}
```

---

## Constructor vs. explicit vs. intrinsic in Fortran

### a) array constructor

```fortran
result = (/(vector(i)*factor + offset, i = 1, size(vector))/)
```

### b) explicit

```fortran
do i = 1, size(vector)
    result(i) = vector(i)*factor + offset
end do
```

### c) intrinsic array operation

```fortran
result = vector*factor + offset
```

---

## Implicit vs. implicit none in Fortran

### a) implicit

```fortran
dmass = 2.0d0
acceleration = 3.0d0

force = dmass*acceleration
```

### b) implicit none

```fortran
implicit none

real(8) :: mass
real(8) :: acceleration
real(8) :: force

mass = 2.0d0
acceleration = 3.0d0

force = mass*acceleration
```

---

## Variable reuse

### a) reuse memory

```fortran
integer, parameter :: length = 1000000
real(8) :: distances(length)

call get_distances(distances)
call do_something(distances)

! now we do not need distances anymore, we can safely reuse the array
call get_angles(distances)
call do_something_else(distances)
```

### b) allocate twice the memory

```fortran
integer, parameter :: length = 1000000
real(8) :: distances(length)
real(8) :: angles(length)

call get_distances(distances)
call do_something(distances)

call get_angles(angles)
call do_something_else(angles)
```

---

## Ternary operator

### a) ternary

```python
i = 1 if is_odd else 2  # python
```

```fortran
i = merge(1, 2, is_odd)  ! fortran
```

```cpp
int i = (is_odd) ? 1 : 2;  // C++
```

### b) explicit

```python
# python
if is_odd:
    i = 1
else:
    i = 2
```

Similar for Fortran and C++.

---

## Implicit vs. named arguments

### a) implicit

```python
bmi = get_bmi(90.0, 1.91)
```

### b) named

```python
bmi = get_bmi(mass_kg=90.0, height_m=1.91)
```

---

## File I/O (example: Python)

### a) pass file name

```python
# parse_input1 function opens and reads the file
result = parse_input1(file_name)
```

### b) pass file handle

```python
with open(file_name, 'r') as f:
    # parse_input2 reads the file
    result = parse_input2(f)
```

### c) pass data

```python
with open(file_name, 'r') as f:
    input_lines = f.readlines()
    # parse_input3 does not know anything about the file
    result = parse_input3(input_lines)
```

---

## Private vs. public (Fortran)

### a) default is private

```fortran
module mymodule
    implicit none
    public myroutine  ! added these
    private           ! two lines
contains
    subroutine myroutine()
    ...
    end subroutine
end module
```

### b) default is public

```fortran
module mymodule
    implicit none
contains
    subroutine myroutine()
    ...
    end subroutine
end module
```

---

## Generic vs. function import

### a) generic

```python
import somelib  # python
```

```fortran
use somelib  ! fortran
```

### b) function import

```python
from somelib import this, that
```

```fortran
use somelib, only: this, that
```

---

## Module-level vs. function-level import (Python or Fortran)

### a) module-level

```python
from somelib import somefunction

def myfunction(a, b, c):
    # use somefunction
    return something
```

### b) function-level

```python
def myfunction(a, b, c):
    from somelib import somefunction
    # use somefunction
    return something
```

---

## Intent vs. unspecified

### a) intent specified

```fortran
subroutine do_something(a, b, c, d)
    real(8), intent(in)    :: a
    logical, intent(in)    :: b
    real(8), intent(inout) :: c
    real(8), intent(out)   :: d

! ...
end subroutine
```

### b) intent unspecified

```fortran
subroutine do_something(a, b, c, d)
    real(8) :: a
    logical :: b
    real(8) :: c
    real(8) :: d

! ...
end subroutine
```

---

## Loop vs. map/filter

### a) loop

```python
numbers = [1, 2, 3, 4, 5]

squares = []
for number in numbers:
    squares.append(number**2)

odds = []
for number in numbers:
    if number%2 == 1:
        odds.append(number)
```

### b) map and filter

```python
numbers = [1, 2, 3, 4, 5]

squares = map(lambda x: x**2, numbers)
odds = filter(lambda x: x%2 == 1, numbers)
```

---

## Conditionals vs. higher-order function

### a) conditionals

```python
def apply(x, option_a=False, option_b=False):
    parameter = 137
    if option_a:
        return complicated_function_a(x, parameter)
    elif option_b:
        return complicated_function_b(x, parameter)
    else
        # FIXME we should stop here
        pass

apply(2.0, option_a=True)
```

### b) higher-order function

```python
def apply(x, f):
    parameter = 137
    return f(x, parameter)

apply(2.0, complicated_function_a)
```

---

### a) class

```python
class Pet:
    def __init__(self, name):
        self.name = name
        self.hunger = 0
    def go_for_a_walk(self):
        self.hunger += 1

my_cat = Pet('Tom')
my_cat.go_for_a_walk()
print(my_cat.hunger)
```

### vs. b) named tuple (in Python)

```python
from collections import namedtuple

def go_for_a_walk(pet):
    new_hunger = pet.hunger + 1
    return pet._replace(hunger=new_hunger)

Pet = namedtuple('Pet', ['name', 'hunger'])

my_cat = Pet(name='Tom', hunger=0)
my_cat = go_for_a_walk(my_cat)
print(my_cat.hunger)
```

---

## Conclusions

- Modular and well structured code is easy to test
- Entangled code is difficult to test
- Introduce testing early - it will **automatically guide you towards a modular and well structured code**
- Compose your code out of **pure functions**
- Prefer immutable data structures, do not overuse classes