{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "Numerical Computation\n", "=====================\n", "\n", "Numbers and numbers\n", "-------------------\n", "\n", "We have already seen ([Numbers](03-data-types-structures.ipynb#Numbers)) that Python knows different *types* of numbers:\n", "\n", "- `float`ing point numbers such as 3.14\n", "\n", "- `int`egers such as 42\n", "\n", "- `complex` numbers such as 3.14 + 1*j*\n", "\n", "### Limitations of number types\n", "\n", "#### Limitations of `int`s\n", "\n", "Mathematics provides the infinite set of natural numbers ℕ = {1, 2, 3, …}. Because the computer has *finite* size, it is impossible to represent all of these numbers in the computer. Instead, only a small subset of numbers is represented.\n", "\n", "The `int`-type can (usually[3]) represent numbers between -2147483648 and +2147483647 and corresponds to 4 bytes (that’s 4\\*8 bit, and 232 = 4294967296 which is the range from -2147483648 and +2147483647).\n", "\n", "You can imagine that the hardware uses a table like this to encode integers using bits (suppose for simplicity we use only 8 bits for this):\n", "\n", "| natural number| bit-representation|\n", "|---------------:|-------------------:|\n", "| 0| 00000000|\n", "| 1| 00000001|\n", "| 2| 00000010|\n", "| 3| 00000011|\n", "| 4| 00000100|\n", "| 5| 00000101|\n", "| ⋮| ⋮|\n", "| 254| 11111110|\n", "| 255| 11111111|\n", "\n", "Using 8 bit we can represent 256 natural numbers (for example from 0 to 255) because we have 28 = 256 different ways of combining eight 0s and 1s.\n", "\n", "We could also use a slightly different table to describe 256 integer numbers ranging, for example, from -127 to +128.\n", "\n", "This is *in principle* how integers are represented in the computer. Depending on the number of bytes used, only integer numbers between a minimum and a maximum value can be represented. On today’s hardware, it is common to use 4 or 8 bytes to represent one integer, which leads exactly to the minimum and maximum values of -2147483648 and +2147483647 as shown above for 4 bytes, and +9223372036854775807 as the maximum integer for 8 bytes (that’s ≈9.2 ⋅ 1018).\n", "\n", "#### Limitations of `float`s\n", "\n", "The floating point numbers in a computer are not the same as the mathematical floating point numbers. (This is exactly the same as the (mathematical) integer numbers not being the same as the integer numbers in a computer: only a *subset* of the infinite set of integer numbers can be represented by the `int` data type as shown in [Numbers and numbers](#Numbers-and-numbers)). So how are floating point numbers represented in the computer?\n", "\n", "- Any real number *x* can be written as\n", " *x* = *a* ⋅ 10*b*\n", " where *a* is the mantissa and *b* the exponent.\n", "\n", "- Examples:\n", "\n", "| x | a | b |\n", "|-----------------------------------|---------|----|\n", "| 123.45 = 1.23456 ⋅ 102 | 1.23456 | 2 |\n", "| 1000000 = 1.0 ⋅ 106 | 1.00000 | 6 |\n", "| 0.0000024 = 2.4 ⋅ 10-6 | 2.40000 | -6 |" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- Therefore, we can use 2 integers to encode one floating point number!\n", "\n", " *x* = *a* ⋅ 10*b*" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- Following (roughly) the IEEE-754 standard, one uses 8 bytes for one float *x*: these 64 bits are split as\n", "\n", " - 10 bit for the exponent *b* and\n", "\n", " - 54 bit for the mantissa *a*.\n", "\n", "This results in\n", "\n", "- largest possible float ≈10308 (quality measure for *b*)\n", "\n", "- smallest possible (positive) float ≈10−308 (quality measure for *b*)\n", "\n", "- distance between 1.0 and next larger number ≈10−16 (quality measure for *a*)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that this is *in principle* how floating point numbers are stored (it is actually a bit more complicated).\n", "\n", "#### Limitations of `complex` numbers\n", "\n", "The `complex` number type has essentially the same limitations as the `float` data type (see [limitations of floats](#Limitations-of-floats)) because a complex number consists of two `floats`: one represents the real part, the other one the imaginary part.\n", "\n", "#### …are these number types of practical value?\n", "\n", "In practice, we do not usually find numbers in our daily life that exceed 10300 (this is a number with 300 zeros!), and therefore the floating point numbers cover the range of numbers we usually need.\n", "\n", "However, be warned that in scientific computation small and large numbers are used which may (often in intermediate results) exceed the range of floating point numbers.\n", "\n", "- Imagine for example, that we have to take the fourth power of the constant ℏ = 1.0545716 ⋅ 10−34*k**g**m*2/*s*:\n", "\n", "- ℏ4 = 1.2368136958909421 ⋅ 10−136*k**g*4*m*8/*s*4 which is “halfway” to our representable smallest positive float of the order of 10−308.\n", "\n", "If there is any danger that we might exceed the range of the floating point numbers, we have to *rescale* our equations so that (ideally) all numbers are of order unity. Rescaling our equations so that all relevant numbers are approximately 1 is also useful in debugging our code: if numbers much greater or smaller than 1 appear, this may be an indication of an error.\n", "\n", "### Using floating point numbers (carelessly)\n", "\n", "We know already that we need to take care that our floating point values do not exceed the range of floating point numbers that can be represented in the computer.\n", "\n", "There is another complication due to the way floating point numbers have to be represented internally: not all floating point numbers can be represented exactly in the computer. The number 1.0 can be represented exactly but the numbers 0.1, 0.2 and 0.3 cannot:" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "'1.00000000000000000000'" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "'%.20f' % 1.0" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "'0.10000000000000000555'" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "'%.20f' % 0.1" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "'0.20000000000000001110'" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "'%.20f' % 0.2" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "'0.29999999999999998890'" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "'%.20f' % 0.3" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Instead, the floating point number “nearest” to the real number is chosen.\n", "\n", "This can cause problems. Suppose we need a loop where x takes values 0.1, 0.2, 0.3, …, 0.9, 1.0. We might be tempted to write something like this:\n", "\n", "```python\n", "x = 0.0\n", "while not x == 1.0:\n", " x = x + 0.1\n", " print ( \" x =%19.17f\" % ( x ))\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "However, this loop will never terminate. Here are the first 11 lines of output of the program:\n", "\n", " x=0.10000000000000001\n", " x=0.20000000000000001\n", " x=0.30000000000000004\n", " x=0.40000000000000002\n", " x= 0.5\n", " x=0.59999999999999998\n", " x=0.69999999999999996\n", " x=0.79999999999999993\n", " x=0.89999999999999991\n", " x=0.99999999999999989\n", " x=1.09999999999999987" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Because the variable `x` never takes exactly the value 1.0, the while loop will continue forever.\n", "\n", "Thus: *Never compare two floating point numbers for equality.*" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Using floating point numbers carefully 1\n", "\n", "There are a number of alternative ways to solve this problem. For example, we can compare the distance between two floating point numbers:" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " x =0.10000000000000001\n", " x =0.20000000000000001\n", " x =0.30000000000000004\n", " x =0.40000000000000002\n", " x =0.50000000000000000\n", " x =0.59999999999999998\n", " x =0.69999999999999996\n", " x =0.79999999999999993\n", " x =0.89999999999999991\n", " x =0.99999999999999989\n" ] } ], "source": [ "x = 0.0\n", "while abs(x - 1.0) > 1e-8:\n", " x = x + 0.1\n", " print ( \" x =%19.17f\" % ( x ))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Using floating point numbers carefully 2\n", "\n", "Alternatively, we can (for this example) iterate over a sequence of integers and compute the floating point number from the integer:" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " x =0.10000000000000001\n", " x =0.20000000000000001\n", " x =0.30000000000000004\n", " x =0.40000000000000002\n", " x =0.50000000000000000\n", " x =0.60000000000000009\n", " x =0.70000000000000007\n", " x =0.80000000000000004\n", " x =0.90000000000000002\n", " x =1.00000000000000000\n" ] } ], "source": [ "for i in range (1 , 11):\n", " x = i * 0.1\n", " print(\" x =%19.17f\" % ( x ))" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "collapsed": true }, "outputs": [], "source": [ "x=0.10000000000000001\n", "x=0.20000000000000001\n", "x=0.30000000000000004\n", "x=0.40000000000000002\n", "x= 0.5\n", "x=0.60000000000000009\n", "x=0.70000000000000007\n", "x=0.80000000000000004\n", "x=0.90000000000000002\n", "x= 1" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If we compare this with the output from the program in [Using floating point numbers (carelessly)](#Using-floating-point-numbers-(carelessly)), we can see that the floating point numbers differ. This means that – in a numerical calculation – it is not true that 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 = 1.0." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Symbolic calculation\n", "\n", "Using the sympy package we have arbitrary precision. Using `sympy.Rational`, we can define the fraction 1/10 exactly symbolically. Adding this 10 times will lead exactly to the value 1, as demonstrated by this script" ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Current x=1/10 = 0.1 \n", " Reached x=1/10 \n", "Current x= 1/5 = 0.2 \n", " Reached x=1/5 \n", "Current x=3/10 = 0.3 \n", " Reached x=3/10 \n", "Current x= 2/5 = 0.4 \n", " Reached x=2/5 \n", "Current x= 1/2 = 0.5 \n", " Reached x=1/2 \n", "Current x= 3/5 = 0.6 \n", " Reached x=3/5 \n", "Current x=7/10 = 0.7 \n", " Reached x=7/10 \n", "Current x= 4/5 = 0.8 \n", " Reached x=4/5 \n", "Current x=9/10 = 0.9 \n", " Reached x=9/10 \n", "Current x= 1 = 1.0 \n", " Reached x=1 \n" ] } ], "source": [ "from sympy import Rational\n", "dx = Rational (1 ,10)\n", "x = 0\n", "while x != 1.0:\n", " x = x + dx\n", " print(\"Current x=%4s = %3.1f \" % (x , x . evalf ()))\n", " print(\" Reached x=%s \" % x)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "However, this symbolic calculation is much slower as it is done through software rather than the CPU-based floating point operations. The next program approximates the relative performances:" ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "loop using float dx:\n", " deviation is -1.88483681995422e-08\n", "float loop n=100000 takes 0.00778 seconds\n", "loop using sympy symbolic dx:\n", " deviation is 0\n", "sympy loop n = 100000 takes 1.61389 seconds\n", "Symbolic loop is a factor 207.4 slower.\n" ] } ], "source": [ "from sympy import Rational\n", "dx_symbolic = Rational (1 ,10)\n", "dx = 0.1\n", "\n", "def loop_sympy (n):\n", " x = 0\n", " for i in range(n):\n", " x = x + dx_symbolic\n", " return x\n", "\n", "def loop_float(n):\n", " x =0\n", " for i in range(n):\n", " x = x + dx\n", " return x\n", "\n", "def time_this (f, n):\n", " import time\n", " starttime = time.time()\n", " result = f(n)\n", " stoptime = time.time()\n", " print(\" deviation is %16.15g\" % ( n * dx_symbolic - result ))\n", " return stoptime - starttime\n", "\n", "n = 100000\n", "print(\"loop using float dx:\")\n", "time_float = time_this(loop_float, n)\n", "print(\"float loop n=%d takes %6.5f seconds\" % (n, time_float))\n", "print(\"loop using sympy symbolic dx:\")\n", "time_sympy = time_this (loop_sympy, n)\n", "print(\"sympy loop n =% d takes %6.5f seconds\" % (n , time_sympy ))\n", "print(\"Symbolic loop is a factor %.1f slower.\" % ( time_sympy / time_float ))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is of course an artificial example: we have added the symbolic code to demonstrate that these round off errors originate from the approximative representation of floating point numbers in the hardware (and thus programming languages). We can, in principle, avoid these complications by computing using symbolic expressions, but this is in practice too slow.[4]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Summary\n", "\n", "In summary, we have learned that\n", "\n", "- floating point numbers and integers used in numeric computation are generally quite different from “mathematical numbers” (symbolic calculations are exact and use the “mathematical numbers”):\n", "\n", " - there is a maximum number and a minimum number that can be represented (for both integers and floating point numbers)\n", "\n", " - within this range, not every floating point number can be represented in the computer.\n", "\n", "- We deal with this limitation by:\n", "\n", " - never comparing two floating point numbers for equality (instead we compute the absolute value of the difference)\n", "\n", " - use of algorithms that are *stable* (this means that small deviations from correct numbers can be corrected by the algorithm. We have not yet shown any such examples this document.)\n", "\n", "- Note that there is a lot more to be said about numerical and algorithmic tricks and methods to make numeric computation as accurate as possible but this is outside the scope of this section." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Exercise: infinite or finite loop\n", "\n", "1. What does the following piece of code compute? Will the loop ever finish? Why?\n", "\n", "```python\n", "eps = 1.0\n", "while 1.0 + eps > 1.0:\n", " eps = eps / 2.0\n", "print(eps)\n", "```" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.4.4" } }, "nbformat": 4, "nbformat_minor": 1 }