{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "Common tasks\n", "============\n", "\n", "Here we provide a selection of small example programs addressing some common tasks and just providing some more Python code that can be read if seeking inspiration how to address a given problem.\n", "\n", "Many ways to compute a series\n", "-----------------------------\n", "\n", "As an example, we compute the sum of odd numbers in different ways." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "this result is 12, expected to be 12\n" ] } ], "source": [ "def compute_sum1(n):\n", " \"\"\"computes and returns the sum of 2,4,6, ..., m\n", " where m is the largest even number smaller than n.\n", "\n", " For example, with n = 7, we compute 0+2+4+6 = 12.\n", "\n", " This implementation uses a variable 'mysum' that is\n", " increased in every iteration of the for-loop.\"\"\"\n", "\n", " mysum = 0\n", " for i in range(0, n, 2):\n", " mysum = mysum + i\n", " return mysum\n", "\n", "\n", "def compute_sum2(n):\n", " \"\"\"computes and returns ...\n", "\n", " This implementation uses a while-loop:\n", " \"\"\"\n", "\n", " counter = 0\n", " mysum = 0\n", " while counter < n:\n", " mysum = mysum + counter\n", " counter = counter + 2\n", "\n", " return mysum\n", "\n", "\n", "def compute_sum3(n, startfrom=0):\n", " \"\"\"computes and returns ...\n", "\n", " This is a recursive implementation:\"\"\"\n", "\n", " if n <= startfrom:\n", " return 0\n", " else:\n", " return startfrom + compute_sum3(n, startfrom + 2)\n", "\n", "\n", "def compute_sum4a(n):\n", " \"\"\"A functional approach ... this seems to be\n", " the shortest and most concise code.\n", " \"\"\"\n", " return sum(range(0, n, 2))\n", "\n", "from functools import reduce\n", "def compute_sum4b(n):\n", " \"\"\"A functional approach ... not making use of 'sum' which\n", " happens to exist and is of course convenient here.\n", " \"\"\"\n", " return reduce(lambda a, b: a + b, range(0, n, 2))\n", "\n", "\n", "def compute_sum4c(n):\n", " \"\"\"A functional approach ... a bit faster than compute_sum4b\n", " as we avoid using lambda.\n", " \"\"\"\n", " import operator\n", " return reduce(operator.__add__, range(0, n, 2))\n", "\n", "\n", "def compute_sum4d(n):\n", " \"\"\"Using list comprehension.\"\"\"\n", " return sum([k for k in range(0, n, 2)])\n", "\n", "\n", "def compute_sum4e(n):\n", " \"\"\"Using another variation of list comprehension.\"\"\"\n", " return sum([k for k in range(0, n) if k % 2 == 0])\n", "\n", "\n", "def compute_sum5(n):\n", " \"\"\"Using numerical python (numpy). This is very fast\n", " (but would only pay off if n >> 10).\"\"\"\n", " import numpy\n", " return numpy.sum(2 * numpy.arange(0, (n + 1) // 2))\n", "\n", "\n", "def test_consistency():\n", " \"\"\"Check that all compute_sum?? functions in this file produce\n", " the same answer for all n>=2 and =2. Raise AssertionError if outputs disagree.\"\"\"\n", " funcs = [compute_sum1, compute_sum2, compute_sum3,\n", " compute_sum4a, compute_sum4b, compute_sum4c,\n", " compute_sum4d, compute_sum4e, compute_sum5]\n", " ans1 = compute_sum1(n)\n", " for f in funcs[1:]:\n", " assert ans1 == f(n), \"%s(n)=%d not the same as %s(n)=%d \" \\\n", " % (funcs[0], funcs[0](n), f, f(n))\n", "\n", " #main testing loop in test_consistency function\n", " for n in range(2, 1000):\n", " check_one_n(n)\n", "\n", "if __name__ == \"__main__\": \n", " m = 7\n", " correct_result = 12\n", " thisresult = compute_sum1(m)\n", " print(\"this result is {}, expected to be {}\".format(\n", " thisresult, correct_result))\n", " # compare with correct result\n", " assert thisresult == correct_result\n", " # also check all other methods\n", " assert compute_sum2(m) == correct_result\n", " assert compute_sum3(m) == correct_result\n", " assert compute_sum4a(m) == correct_result\n", " assert compute_sum4b(m) == correct_result\n", " assert compute_sum4c(m) == correct_result\n", " assert compute_sum4d(m) == correct_result\n", " assert compute_sum4e(m) == correct_result\n", " assert compute_sum5(m) == correct_result\n", "\n", " # a more systematic check for many values\n", " test_consistency()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "All the different implementations shown above compute the same result. There are a number of things to be learned from this:\n", "\n", "- There are a large (probably an infinite) number of solutions for one given problem. (This means that writing programs is a task that requires creativity!)\n", "\n", "- These may achieve the same ’result’ (in this case computation of a number).\n", "\n", "- Different solutions may have different characteristics. They might:\n", "\n", " - be faster or slower\n", "\n", " - use less or more memory\n", "\n", " - are easier or more difficult to understand (when reading the source code)\n", "\n", " - can be considered more or less elegant." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Sorting\n", "-------\n", "\n", "Suppose we need to sort a list of 2-tuples of user-ids and names, i.e." ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": true }, "outputs": [], "source": [ "mylist = [(\"fangohr\", \"Hans Fangohr\",),\n", " (\"admin\", \"The Administrator\"),\n", " (\"guest\", \"The Guest\")]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "which we want to sort in increasing order of user-ids. If there are two or more identical user-ids, they should be ordered by the order of the names associated with these user-ids. This behaviour is just the default behaviour of sort (which goes back to how to sequences are compared)." ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[('admin', 'The Administrator'), ('fangohr', 'Hans Fangohr'), ('guest', 'The Guest')]\n" ] } ], "source": [ "stuff = mylist # collect your data\n", "stuff.sort() # sort the data in place\n", "print(stuff) # inspect the sorted data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Sequences are compared by initially comparing the first elements only. If they differ, then a decision is reached on the basis of those elements only. If the elements are equal, only then are the next elements in the sequence compared ... and so on, until a difference is found, or we run out of elements. For example:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "(2,0) > (1,0)" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "(2,1) > (1,3)" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "False" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "(2,1) > (2,1)" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "(2,2) > (2,1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It is also possible to do:" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "collapsed": true }, "outputs": [], "source": [ "stuff = sorted(stuff)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Where the list is not particularly large, it is generally advisable to use the `sorted` function (which *returns a sorted copy* of the list) over the `sort` method of a list (which changes the list into sorted order of elements, and returns None).\n", "\n", "However, what if the data we have is stored such that in each tuple in the list, the name comes first, followed by the id, i.e.:" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "collapsed": true }, "outputs": [], "source": [ "mylist2 = [(\"Hans Fangohr\", \"fangohr\"),\n", " (\"The Administrator\", \"admin\"),\n", " (\"The Guest\", \"guest\")]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We want to sort with the id as the primary key. The first approach to do this is to change the order of `mylist2` to that of `mylist`, and use `sort` as shown above.\n", "\n", "The second, neater approach relies on being able to decypher the cryptic help for the sorted function. `list.sort()` has the same options, but its help is less helpful." ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Help on built-in function sorted in module builtins:\n", "\n", "sorted(...)\n", " sorted(iterable, key=None, reverse=False) --> new sorted list\n", "\n" ] } ], "source": [ "help(sorted)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You should notice that `sorted` and `list.sort` have two keyword parameters. The first of these is called key. You can use this to supply a *key function*, which will be used to transform the items for sort to compare.\n", "\n", "Let’s illustrate this in the context of our exercise, by assuming that we have stored a list of pairs like this\n", "\n", " pair = name, id" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "(i.e. as in `mylist2`) and that we want to sort according to id and ignore name. We can achieve this by writing a function that retrieves only the second element of the pair it receives:" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "collapsed": true }, "outputs": [], "source": [ "def my_key(pair):\n", " return pair[1]" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "collapsed": false }, "outputs": [], "source": [ "mylist2.sort(key=my_key)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This also works with an anonymous function:" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "collapsed": true }, "outputs": [], "source": [ "mylist2.sort(key=lambda p: p[1]) " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Efficiency\n", "\n", "The `key` function will be called exactly once for every element in the list. This is much more efficient than calling a function on every *comparison* (which was how you customised sorting in older versions of Python). But if you have a large list to sort, the overhead of calling a Python function (which is relatively large compared to the C function overhead) might be noticeable.\n", "\n", "If efficiency is really important (and you have proven that a significant proportion of time is spent in these functions) then you have the option of re-coding them in C (or another low-level language)." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.4.4" } }, "nbformat": 4, "nbformat_minor": 1 }