name: inverse layout: true class: center, middle, inverse --- # Library design the hard way ## Radovan Bast Licensed under [CC BY 4.0](https://creativecommons.org/licenses/by/4.0/). Code examples: [OSI](http://opensource.org)-approved [MIT license](http://opensource.org/licenses/mit-license.html). Credits: [Jonas Juselius](https://github.com/juselius), [Roberto Di Remigio](http://totaltrash.xyz) --- layout: false ## Modular design is good ### Examples - Lego - Car manufacturing - Design of your phone or laptop - Modular composition when you order a laptop - Success of USB - Study programs ### Advantages - Separation of concerns - Composability - Leveraging functionality --- ## Library design in a modular world ### This talk is not so much about why, it is about how
- Copyright Joe Paradiso (MIT) --- ## Coupling and cohesion - Strong coupling ![](https://cdn.jsdelivr.net/gh/bast/talk-library-design@fb5e51e48385337c9c3aa7f7550e621ecd1ac32a/img/strong-coupling.svg) - Loose coupling - Easier to reassemble - Easier to understand ![](https://cdn.jsdelivr.net/gh/bast/talk-library-design@fb5e51e48385337c9c3aa7f7550e621ecd1ac32a/img/loose-coupling.svg) --- ## Coupling and cohesion - Low cohesion: difficult to maintain, test, reuse, or even understand - Non-cohesive code has unnecessary dependencies - Swiss army knife modules ![](https://cdn.jsdelivr.net/gh/bast/talk-library-design@fb5e51e48385337c9c3aa7f7550e621ecd1ac32a/img/low-cohesion.svg) - High cohesion: associated with robustness, reliability, reusability, and understandability - Do one thing only and do it well - API of cohesive code changes less over time - Power of the Unix command line is a set of highly cohesive tools - Microservices ![](https://cdn.jsdelivr.net/gh/bast/talk-library-design@fb5e51e48385337c9c3aa7f7550e621ecd1ac32a/img/high-cohesion.svg) --- ## Coupling and cohesion - Minimize dependencies - Prefer loose coupling and high cohesion
--- ## Must haves 1/2 ### Encapsulation - Hide internals by language or by convention (header file in C/C++, public/private in Fortran, underscores in Python) - "Python has no locked doors; it's a consenting adults language. If you open the door you're responsible for what you see." [R. Hettinger] - Interface exposed in a separate file - Expose the "what", hide the "how" ### Documentation - Separate the "what it can do" from "how is it implemented" - Documented API - Versioned API ([semantic](http://semver.org) or [sentimental](http://sentimentalversioning.org) or [romantic](https://github.com/jashkenas/backbone/issues/2888#issuecomment-29076249) versioning) --- ## Must haves 2/2 ### Testable on its own - Sharpens interfaces - Once you start testing your library you really see the coupling and cohesion ### Built on its own - Prerequisite for testable on its own ### Own development history - Decouple the development history - Each unit should have its own Git history/repository --- ## C interface - English is to humans what C is to programs - C is the common language - Basically any language can talk to a C interface - **Do create a C interface for your code** - Better than Fortran interface (the latter imposes compilation order and introduces compiler dependence) ### Core language - Core language can be any language that can export a C API - For single-core high-performance use C or C++ or Fortran - With an eccentric language you risk to reduce the number of contributors ### Communicate through memory or through files? - Through memory is more general than through files --- ## Stateful vs. stateless code ### State - Imagine your program stops now and you need to checkpoint everything for a future restart - This information is the state of the program ### Pure/immutable/stateless - Mathematical functions - Functions which do not mutate the state - Referentially transparent ### Impure/mutable/stateful - Code which mutates the global state - Referentially opaque --- ## Enemy of the state .left-column[
] .right-column[ ### Strive for pure functions, fear the state - Pure functions do not have side effects - Side effects lead to bugs and increase complexity - Pure functions are easy to - Test - Understand - Reuse - Parallelize - Simplify - Refactor - Optimize ### But we need to deal with state somewhere ] --- ## One way to look at your code ![](https://cdn.jsdelivr.net/gh/bast/talk-library-design@fb5e51e48385337c9c3aa7f7550e621ecd1ac32a/img/main-inside.svg) - The main function calls other functions --- ## Where to deal with the state? Typical questions: - Where to do file I/O? - In the main function? - Should we pass the file name and open the files deep down in functions? - Should we open the file and pass the file handle? - Should we read the data and pass the data? - Where to keep "global" data? - In the main function? - In a Fortran common block? - In a Fortran module? - In functions with save attributes? --- ## Another way to look at your code ![](https://cdn.jsdelivr.net/gh/bast/talk-library-design@fb5e51e48385337c9c3aa7f7550e621ecd1ac32a/img/main-outside.svg) - The main function ("program" in Fortran) is on the outside shell --- ## Recommendations - Keep I/O on the outside and connected - Always read/write on the outside and pass data - Do not read/write deep down inside the code - Keep the inside of your code pure/stateless - Move all the state "up" to the caller - Keep the stateful outside shell thin - Unit test the inside - Regression test the shell ![](https://cdn.jsdelivr.net/gh/bast/talk-library-design@fb5e51e48385337c9c3aa7f7550e621ecd1ac32a/img/good-vs-bad.svg) --- ## Namespaces "Will somebody clean up the butts?" [citation needed] - This has a different meaning in the namespace "cigarette factory" and in the namespace "nursery" - "Namespaces are one honking great idea -- let's do more of those!" [Zen of Python] --- ## Namespace protection in Python - Bad ```python from mymodule import * ``` - Better ```python from mymodule import function1, function2 ``` - Also good ```python import mymodule ``` - Better for people writing code (avoids name collision) - Better for people reading code (easier to see where functions come from) - Consider importing modules at function level and not at module level --- ## Namespace protection in Fortran - Bad ```fortran use mymodule ``` - Better ```fortran use mymodule, only: function1, function2 ``` - Same reasoning as in Python - Consider importing modules at function level and not at module level --- ## API ### We will discuss three API examples - Stateless API - API with state: new/compute/delete - API with context --- ## Stateless API ### Example: BLAS ```fortran f = ddot(1000, vector_a, 1, vector_b, 1) ``` - From the client perspective there is no state - Sometimes libraries with "stateless" API have internal state (caching, memoization) ### Advantage - We only need to consider this one call to understand and predict what will happen ### Disadvantages - The library may need to recompute expensive intermediates at every call - Typically many arguments --- ## API with state - We split API into 3 phases/epochs - new/init - compute - delete/finalize ### Example - MPI ### Advantage - We can avoid possibly costly initialization at each compute call ### Disadvantages - Order matters - We have to remember to clean up memory - Only one context at a time --- ## API with state ### Another example: we program our own bank ```python bank_new() bank_deposit(100.0) bank_deposit(100.0) bank_withdraw(50.0) my_balance = bank_get_balance() bank_free() ``` - Problem: We have to close the account before opening a new one --- ## API with state ### It would be great to have more contexts ```python account1 = bank_new() bank_deposit(account1, 100.0) bank_deposit(account1, 100.0) account2 = bank_new() bank_deposit(account2, 200.0) bank_deposit(account2, 200.0) bank_withdraw(account1, 50.0) balance1 = bank_get_balance(account1) balance2 = bank_get_balance(account2) bank_free(account1) bank_free(account2) ``` --- ## API with context - Allows multiple contexts open at the same time ### Example: FFTW ```c fftw_complex in[N], out[N]; fftw_plan p; ... p = fftw_create_plan(N, FFTW_FORWARD, FFTW_ESTIMATE); ... fftw_one(p, in, out); ... fftw_destroy_plan(p); ``` --- ## Context-aware API in different languages - Shows how to implement and use context-aware APIs in C++, Fortran, and Python with a C interface: https://github.com/bast/context-api-example - Inspired by Armin Ronacher's ["Beautiful Native Libraries"](http://lucumr.pocoo.org/2013/8/18/beautiful-native-libraries/) ``` . |-- api | |-- example.h (C interface) | `-- example.py (Python interface) |-- src | |-- bank.cpp (C++ library) | |-- bank.f90 (Fortran library) | `-- bank.h (C++ library) `-- test |-- test.cpp (C++ client) |-- test.f90 (Fortran client) `-- test.py (Python client; automatically tested) ``` --- ## There is no one-size-fits-all model - Stateless works wonders when there is no need for a global initialization - Exclusive state should work when creating more than one context does not make sense - Context-aware works when multiple instances might be needed --- ## High-level and low-level API ### Scared of asking the client to pass 20 parameters every time? - Offer a high-level API in addition to a low-level API - Experts can access low-level API directly - High-level convenience API is sufficient most of the time ```python def deposit(account, amount): deposit_explicit(account=account, amount=amount, currency='EUR', date=today(), message='standard deposit', ...) def deposit_explicit(account, amount, currency, date, message, ...): ... ``` --- ## Complexity/viscosity - Simplicity is hard - Writing modular code is hard - As we break up the code into libraries the surface area increases - Investment with late but huge return
--- ## Test your C/C++/Fortran code with Python - Forces you to create a clean interface (good) - Nice byproduct: you have a Python interface (good) - Encourages dynamic library (good) - You can write and prototype tests without recompiling/relinking the library (good) - Allows you to use the wonderfully lightweight [pytest](http://pytest.org) (no more excuses for the Fortran crowd) - Example: https://github.com/bast/context-api-example