name: inverse layout: true class: middle --- background-image: url(https://cdn.jsdelivr.net/gh/coderefinery/talk-intro@2220bb7ce537c6d29bba3ce87e81c5e6c9a78fde/img/background.png) --- layout: false class: split-50-50
### Nordic e-Infrastructure Collaboration - Facilitates the development and operation of high-quality e-Infrastructure solutions in areas of joint Nordic interest - Distributed organisation consisting of technical experts from academic high-performance computing centres - Across the Nordic countries (Denmark, Finland, Iceland, Norway, Sweden) - Ca. 100 persons contracted by NeIC --- ## Software is transforming research .column[
] .column[ - Quality of scientific software is **critical to modern research** - Reproducibility of most computations is questionable - Scientists often **lack the necessary training** in practices to enable them to collaboratively write high-quality scientific software ] --- ## Many researchers struggle with code complexity
(c) Joe Paradiso --- ## Problem: Red Queen's Race
```
"A slow sort of country!" said the Queen.
"Now, here, you see, it takes all the running you can do, to keep in the same place.
If you want to get somewhere else, you must run at least twice as fast as that!"

- Lewis Carroll, Through the Looking Glass
```

---

## FAQ: How do we differ?

### [Software Carpentry](https://software-carpentry.org)

- Teaching basic lab skills for research computing.

### [The Software Sustainability Institute](https://www.software.ac.uk)

- Cultivate better, more sustainable, research software to enable world-class research (better software, better research).

### [CodeRefinery](https://coderefinery.org)

- Training in skills and tools for students/researchers who write code.
- Managing complexity and collaborative modular development.
- Provide infrastructure services. (commons conservancy?)

---

.left-column[
] .right-column[ ## CodeRefinery launched September 2016 - Nordic e-Infrastructure Collaboration project - Funded for two years - We are a team of enthusiasts located in DK, FI, NO, SE ### Team - Bjørn Lindi - Erik Edelmann - Jyry Suvilehto - Lukasz Bartosz Berger - Nikolai Denissov - Radovan Bast - Risto Laurikainen - Sabry Razick - Sri Harsha Vathsavayi - Thor Wikfeldt ### Alumni - Pinja Koskinen ] --- ## [Ten simple rules for making research software more robust](https://doi.org/10.1371/journal.pcbi.1005412) **M. Taschuk, G. Wilson** (2017). PLoS Comput Biol 13(4): e1005412. - Use **version control** - **Document** your code and usage - Make common operations easy to control - **Version** your releases - **Reuse** software (within reason) - Rely on **build tools** and package managers for installation - Do not require root or other special privileges to install or run - Eliminate hard-coded paths - Include a small **test set** that can be run to ensure the software is actually working - Produce identical results when given identical inputs --- template: inverse ## Reproducible research --- template: inverse ## Why should **only** the publications benefit humanity as a whole? ## If your results are not reproducible, do they really benefit humanity? --- class: split-60-40 ## To reproduce results you need .column[
- By NagayaS, own work, CC BY-SA 4.0
] .column[ - the right version of the code - the right version of the environment - the right version of data ] --- template: inverse ## What we preach --- ## Code complexity/viscosity: simple vs. easy
--- ## Version control: record snapshots as you develop
--- template: inverse ## "I don't need version control because ..." - ... it is just me. - ... we are only two people. - ... I carefully test my code. - ... we do not distribute the code. - ... we are a research group and not a software company. - ... I do not have time to learn it. It's publish or perish. - ... I am interested in science and not in software engineering. --- ## Motivation for version control ### Relevant also in a single-person universe - Undo functionality - Working on several features in parallel - Reproducibility - Bug exposure can be traced back: **code history is extremely valuable** - Attribution can be determined after the fact: **important for legal reasons** ### Working with others - People working in parallel on the same project - Simplify integration of external contributions --- ## Version control: make it possible to collaborate
--- ## Branching model
---

## Automated testing

```python
def get_bmi(mass_kg, height_m):
    """
    Calculates the body mass index.
    """
    return mass_kg/(height_m**2)


def test_get_bmi():
    bmi = get_bmi(mass_kg=90.0, height_m=1.91)
    expected_result = 24.670376
    assert abs(bmi - expected_result) < 1.0e-6
```

### Motivation

- More robust code
- Simplify collaboration
- Documentation which is up to date by definition
- Make it easier to contribute code
- Guides towards modular code structure

---
class: split-60-40

.column[
]
.column[
### Suiting up to modify untested code
]

---

### Good code (pure: no side effects)

```python
# function which computes the body mass index
def get_bmi(mass_kg, height_m):
    return mass_kg/(height_m**2)

# compute the body mass index
bmi = get_bmi(mass_kg=90.0, height_m=1.91)
```

### Less good code (impure: side effects)

```python
mass_kg = 90.0
height_m = 1.91
bmi = 0.0

# function which computes the body mass index
def get_bmi():
    global bmi
    bmi = mass_kg/(height_m**2)

# compute the body mass index
get_bmi()
```

---

## Enemy of the state

.left-column[
] .right-column[ ### Strive for pure functions, fear the state - Pure functions do not have side effects - Side effects lead to bugs and increase complexity - Pure functions are easier to - Test - Understand - Reuse - Parallelize - Simplify - Refactor - Optimize ] --- ## Equational reasoning - We start with a function: $$ f(x) $$ - We wish to evaluate this: $$ y = f(a) + f(b) \times [f(c) - f(c)] $$ - We can simplify: $$ y = f(a) + f(b) \times 0 $$ $$ y = f(a) $$ - Another example: $$ z = f(a) + f(b) + f(c) + f(d) $$ - We know we can rearrange (important for concurrency): $$ z = f(b) + f(d) + f(c) + f(a) $$ --- ## Concurrency - Concurrency in imperative code is very hard - You are totally lost in the dark without a good thread checker - In a pure, immutable world concurrency is nearly trivial! - Prefer immutable data to mutable data
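
A minimal sketch of why pure functions make concurrency easier, reusing the pure `get_bmi` from the earlier slide. The sample data `people` and the helper `bmi_all` are assumptions made for illustration and are not part of the original material. Because `get_bmi` touches no shared state, it can be evaluated in a thread pool without locks and gives the same values as a sequential loop.

```python
from concurrent.futures import ThreadPoolExecutor


def get_bmi(mass_kg, height_m):
    """Pure function: the result depends only on the arguments."""
    return mass_kg / (height_m ** 2)


# hypothetical sample data: (mass in kg, height in m)
people = ((90.0, 1.91), (60.0, 1.65), (75.0, 1.80))


def bmi_all(data, max_workers=4):
    """Evaluate get_bmi concurrently; no locks needed, nothing is shared."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in input order, whatever the completion order
        return list(pool.map(lambda p: get_bmi(*p), data))


sequential_result = [get_bmi(m, h) for m, h in people]
concurrent_result = bmi_all(people)

# same inputs, same outputs, regardless of how the work was scheduled
assert concurrent_result == sequential_result
```

The impure variant with `global bmi` could not be used this way without extra synchronization, since every call would write to the same variable.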
(Slide taken from [Complexity in software development by Jonas Juselius](https://github.com/scisoft/complexity)) --- ## Modular code development ### Modular design is good - examples: - Lego - Car manufacturing - Design of your phone or laptop - Modular composition when you order a laptop - Success of USB - Erasmus study program ### Advantages - Separation of concerns - Composability - Leveraging functionality --- ## Composition - Build complex behavior from simple components - We can reason about the components and the composite - Composition is key to managing complexity - Modularity does not imply simplicity, but is enabled by it
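
As a small illustration of composition (a sketch, not taken from the original slides), the helper `compose` and the three text-processing components below are hypothetical names chosen for this example. Each component is simple enough to reason about on its own, and the composite is just the components applied in order.

```python
from functools import reduce


def compose(*functions):
    """Return a function that applies the given functions left to right."""
    return lambda x: reduce(lambda acc, f: f(acc), functions, x)


# small, pure components (hypothetical examples)
def strip_whitespace(text):
    return text.strip()


def to_lower(text):
    return text.lower()


def split_words(text):
    return text.split()


# composite behavior built from the simple components above
normalize = compose(strip_whitespace, to_lower, split_words)

assert normalize("  Hello World ") == ["hello", "world"]
```

Each component can be tested separately, and a new step can be added to `normalize` without changing the existing ones.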
(Slide taken from [Complexity in software development by Jonas Juselius](https://github.com/scisoft/complexity)) --- ## Documentation - Close to the code (minimize barrier to contribute) - **Versions** - **Branches** - Lightweight markup - Readable on any device - Division into tutorials and API reference - Tutorials contain good defaults - Ready examples that one can copy-paste to get quickly started ### Current gold standard - Hosting: [GitHub](https://github.com) or [GitLab](https://gitlab.com) or [Bitbucket](https://bitbucket.org) - Markup: [RST](http://docutils.sourceforge.net/rst.html) or [Markdown](http://daringfireball.net/projects/markdown/) - Rendering: [Sphinx](http://www.sphinx-doc.org) or [GitBook](https://www.gitbook.com) - Deployed to: [Read the Docs](https://readthedocs.org), [GitHub Pages](https://pages.github.com/) --- ## Building portable and modular code with CMake - Separation of source and build path - Portability - Language support - Supports modular code development - Provides tools - Popular - General --- class: split-50-50 .column[ ## E-mail workflow
] .column[ ## Version control
] --- class: split-50-50 .column[ ## Centralized workflow
] .column[ ## Code review workflow
] --- ## .blue[2017: Are you using peer review in publishing?] ### *Of course! What else?* --- ## .blue[2017: Are you using peer review in publishing?] ### *Of course! What else?* ## .blue[2017: Are you using code review in code development?] ### *I don't know what it is.* ### *I don't know how to do it.* ### *I don't have time to do it.* --- ## .blue[2017: Are you using peer review in publishing?] ### *Of course! What else?* ## .blue[2017: Are you using code review in code development?] ### *I don't know what it is.* ### *I don't know how to do it.* ### *I don't have time to do it.* ## .blue[202X: Are you using code review in code development?] ### *Of course! What else?* --- ## Use code review ### Peer review process in publishing - Papers are reviewed before they are published - Maintain standards of quality - Improve performance - Provide credibility ### Code review - Code is reviewed before it is integrated - Improve quality - Learning - Knowledge transfer - [GitHub](https://github.com)/[GitLab](https://gitlab.com)/[Bitbucket](https://bitbucket.org) offer a web solution for code review *"We don't need code review because we are just two."* --- ## Suggestion: Code reading sessions - Read and discuss code written in your group - Read code written by others - Read code in the standard library ### "Whenever you write, strive for originality, but if you have to steal, steal from the best." (Woody Allen in Anything Else, 2003) --- ## Suggestions and requests for future topics - Contribution guides - How to open-source a project - Software licenses - Reproducible science --- template: inverse ## Infrastructure plans --- ## Git repository hosting for Nordic research software - Repository hosting - Collaboration - Code review - Issue tracking - Documentation
- [http://coderefinery.org/repository/](http://coderefinery.org/repository)
- [https://source.coderefinery.org](https://source.coderefinery.org)

---

## Why GitLab?

- Repositories typically need to be private at least until research has been published
- Research institute bureaucracy makes it difficult to pay for small, per-user (pay-per-seat) services

---

## Plan: Continuous integration service

- [Travis CI](https://travis-ci.org), [GitLab CI](https://about.gitlab.com/gitlab-ci/), [Jenkins](https://jenkins.io), [Drone](https://github.com/drone/drone), [AppVeyor](https://www.appveyor.com), ...
- Test every changeset
- We plan to deploy a service which will make it easier for researchers to test their code
---

## CI Challenges:

- More complex CI pipelines typically require trust
- Even a minimal test data set may be considered sensitive data
- Per-project GitLab CI runners are the solution to this
- How do we support researchers creating and maintaining their own CI runners?

## How much DevOps do you need for science?

---

## Idea: System for publishing/distributing environments (VM Images, Containers)

- Many technical platforms exist

## Challenges

- Where does the impulse to publish environments come from?
- What incentivizes a researcher to care about reproducibility of results?

---

## A message from our Sponsors: REMS

- Resource Entitlement Management System
- [www.csc.fi/rems](http://www.csc.fi/rems)
- Models an application process to access a **resource**
- Federated Identity
- Configurable application workflow
- Does not limit how the resource is delivered
- Version 1 is LGPL, Version 2 is in the process of being opened under a permissive license
--- class: split-60-40 ## Coming to a city near you .column[
### Challenge: disciplines are at different levels
]
.column[
- [Aarhus, Oct 24-26](http://coderefinery.org/workshops/2017-10-24-aarhus/)
- Linköping, Nov 7-9
- Espoo, Dec 12-14
- Trondheim, Feb 2018
- Turku, Mar 2018
- Odense, Apr 2018
- Uppsala, May 2018
- Oslo, Jun 2018
- Reykjavik, Aug 2018

### Seminars and meetups

- [Umeå, Oct 16](http://coderefinery.org/events/2017-10-16-umea/) Git-themed one-day workshop

#### [Find them here](http://coderefinery.org/workshops/)

#### [Sign up to be notified](http://tinyurl.com/CodeRefineryNotify)
]

---
class: split-50-50

.column[
## Get in touch!

## [coderefinery.org](http://coderefinery.org)

## [@coderefine](https://twitter.com/coderefine)

## [github.com/coderefinery](https://github.com/coderefinery)

## [coderefinery@googlegroups.com](https://groups.google.com/group/coderefinery)
]
.column[
] --- template: inverse ## In case we have time: one more thing ## [Notebooks.csc.fi](https://notebooks.csc.fi)