Tool topics


Below is a list of possible topic areas. For each of them, we will discuss more detailed ideas and planning in lecture two. If you have your own ideas that are not covered by the list below, please discuss them with the module coordinator. You can look at the current stage of planning on the Trello board and even contribute (after joining the board using your university email).

Note: it may not be to your advantage to be an "expert" in the topic you choose. As you are to present and teach others this topic, you must know what others will find difficult to understand. On the other hand, having a broader understanding of the topic may help you to provide motivation and context, and choose educational examples that also have practical value.

1   Version control / Continuous Integration

We have already used the basics of version control, using Git, for some course material. However, much more can be done than the basics we have covered so far. Starting from e.g. Pro Git or the detailed help pages on GitHub or Bitbucket, investigate, evaluate and present the uses of the more complex aspects of version control systems, focusing on Git.

Testing code is an important aspect of good software engineering practice, especially in large scientific projects. Unit testing (e.g. using py.test) is one step in this process, but ensuring that the tests are kept up to date, and are actually run, is more complex. One approach is continuous integration, where the tests are run automatically (either at fixed time intervals, or whenever the code is changed).
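As a rough illustration of the unit-testing step mentioned above, the following minimal sketch shows what a test file might look like. The function `mean` and both test names are hypothetical examples, not part of the course material. py.test collects functions whose names start with `test_` and reports any failing `assert`; a continuous integration service would simply run the same tests automatically.

```python
# Hypothetical example: a small function and its unit tests in one file.
def mean(values):
    """Return the arithmetic mean of a non-empty sequence of numbers."""
    if not values:
        raise ValueError("mean() requires at least one value")
    return sum(values) / len(values)


def test_mean_of_integers():
    # A plain assert is enough; py.test reports the failing expression.
    assert mean([1, 2, 3, 4]) == 2.5


def test_mean_rejects_empty_input():
    # Check that invalid input raises the documented exception.
    try:
        mean([])
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError for empty input")
```

Saved in a file whose name starts with `test_`, these tests would be picked up by running py.test in the same directory.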

Possible topics could be:

  1. Demonstrate how continuous integration can be set up for GitLab.
  2. Introduce the Buildbot project, which allows continuous integration to be run locally.

2   Programming languages

We have focused on Python and C so far, given their ubiquity and broad range of libraries. However, other languages may have major advantages. Haskell is the epitome of functional programming, a rather different approach to coding. You could introduce it, commenting on its role in scientific computing where appropriate.

The usability of a particular programming language quite often depends on the availability of a library that makes your life easier and keeps you productive. There are a large number of useful libraries out there. You could present one and showcase its abilities and limitations.

3   Text editors / IDEs

When writing code, you produce text. The tool you use to produce the code has a major impact on your productivity as well as the quality and readability of what you produce.

There are a large number of text editors around, each with its own fan base. Some are simple with limited (but usually well designed) capabilities. Others are the Swiss Army knife of the ASCII ninja.

An Integrated Development Environment (IDE) is one of the most effective ways of constructing code. Somewhat well known IDEs with varying ranges of functionality include Eclipse, Visual Studio, Sublime Text, Emacs, Vim and Spyder.

You should investigate and introduce PyCharm, a sophisticated IDE for Python, and evaluate the advantages and disadvantages of IDEs that you can identify with your current knowledge of related tools.

4   Visualization tools

Understanding the results of simulations can be complex, and a good visualization can be incredibly useful. So far we have focused on simple packages such as matplotlib. There are many other options available.

5   Containers / Cloud

Getting complex software up and running, especially when it has many dependencies, can be lengthy and problematic. One solution is a virtual machine: a sandboxed environment that is set up for tackling a particular problem, and that can be downloaded and used without having to worry about the hardware it runs on.

An alternative approach that has recently become very popular is that of 'containers', most widely used through the Docker software.

Demonstrate how Docker containers can be created and used.

Let's assume you have developed containerized code that is ready to be used. Now you want to deploy your workflows in an HPC environment where Docker is not available. Singularity has been developed for this purpose.

6   Jupyter tools

  1. The IPython notebook (now Jupyter Notebook) is used very widely. There are a number of additional tools and commands that are not as widely known. Introduce some of them, for example:

    • jupyter nbconvert: converting notebooks into HTML, LaTeX, PDF and custom formats
    • nbdime: diffing and merging notebooks (can be linked to Git)
    • nbval: doctests for notebooks

  2. Introduce JupyterHub

    Description, use cases and installation.

  3. Introduce JupyterLab - the next generation Notebook

  4. Introduce Hydrogen
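The notebook tools listed in this section all work on the same on-disk format: an .ipynb file is a JSON document containing a list of cells. The sketch below builds a tiny notebook by hand (the cell contents are made up for illustration) using only the standard library, and extracts the code cells. It is meant to give a feel for why dedicated tools for converting, diffing and testing notebooks are useful, not to replace them.

```python
import json

# A hand-written, minimal notebook in nbformat version 4 (illustrative content).
notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Demo"]},
        {
            "cell_type": "code",
            "execution_count": 1,
            "metadata": {},
            "outputs": [],
            "source": ["x = 1 + 1\n", "print(x)"],
        },
    ],
}


def code_sources(nb):
    """Return the source text of every code cell, in notebook order."""
    return [
        "".join(cell["source"])  # source is stored as a list of lines
        for cell in nb["cells"]
        if cell["cell_type"] == "code"
    ]


# Round-trip through JSON text, as happens when an .ipynb file is saved and loaded.
text = json.dumps(notebook, indent=1)
loaded = json.loads(text)
print(code_sources(loaded))
```

Because outputs and execution counts are stored in the same JSON file as the code, a plain line-based diff of two notebook versions tends to be noisy, which is the problem nbdime addresses.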

7   Big Data and Machine Learning

Big Data and Machine Learning approaches to modelling complex systems have gained traction in recent years, with major software and hardware efforts to make the approach computationally efficient. There are a number of machine learning software stacks (incl. scikit-learn and H2O) as well as Big Data platforms such as Apache Hadoop.

8   Other ideas

8.1   Python package management

Deliver one or both of the following topics in one session:

  1. Installation of Python packages using the Python Package Index (PyPI) and pip

    setup.py, requirements files, how to search for, install, upgrade and uninstall packages, how to install from GitHub or a local repository, and how to install in editable mode (-e). Write a setup.py file for a project of our choice and install it locally.

    How, in principle, to add a package to the Python Package Index (maybe no student exercise for this, to avoid creating too many fake packages; they are hard to remove once created).

  2. conda (Anaconda's package manager).

    • How to search, install, remove and upgrade packages.
    • conda environments: how to create, use and remove them. What is their purpose?
    • interaction of conda and pip - best practice
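When demonstrating installs, upgrades and environments as outlined above, it can help to check from within Python which version of a package the current environment actually provides. The following is a minimal sketch using the standard library module `importlib.metadata` (Python 3.8 or later); the helper name `installed_version` and the package names in the loop are only examples.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


# Example package names; the results depend on the active environment.
for name in ["pip", "numpy", "surely-not-installed-example"]:
    print(name, installed_version(name))
```

Running the same snippet inside two different environments is one simple way to show that each environment has its own set of installed packages.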

8.2   Reproducible computing

The Recomputation Manifesto is one aspect of an ongoing worry in scientific computing: how do we check the results of science involving computation? In principle this should be straightforward; in reality, many papers become impossible to reproduce only shortly after release. There are many tools and social efforts attempting to mitigate this, which should be investigated and evaluated. Starting points might be Active Papers, Jupyter notebooks, or containers.

8.3   Productivity and project management tools

Dealing with note taking, large numbers of competing tasks, sets of deadlines, and long-term planning is challenging. A number of tools and methods have become available. Some key ideas will be covered by the lecturers, for example the Getting Things Done (GTD) strategy. There are many software solutions available to support an organised professional lifestyle. A particularly powerful tool, written by an academic for academic workflows, is Emacs' Org mode. You could introduce it and highlight how its features support productivity strategies such as GTD.