Data Types and Data Structures

What type is it?

Python knows different data types. To find the type of a variable, use the type() function:

In [1]:
a = 45
type(a)
Out[1]:
int
In [2]:
b = 'This is a string'
type(b)
Out[2]:
str
In [3]:
c = 2 + 1j
type(c)
Out[3]:
complex
In [4]:
d = [1, 3, 56]
type(d)
Out[4]:
list

Numbers

The built-in numerical types are integers, floating point numbers and complex floating point numbers (complex numbers).

Integers

We have already seen the use of integer numbers in Chapter 2. Be aware of the pitfalls of integer division.
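
The integer division pitfall can be illustrated with a short sketch. It assumes Python 3, where / always returns a float and // rounds towards minus infinity; the variable names are chosen for illustration only:

```python
# In Python 3, "/" is true division and "//" is floor division.
true_quotient = 7 / 2          # 3.5 (always a float)
floor_quotient = 7 // 2        # 3   (rounded down)
negative_floor = -7 // 2       # -4  (rounded towards minus infinity)
negative_trunc = int(-7 / 2)   # -3  (int() truncates towards zero)

print(true_quotient, floor_quotient, negative_floor, negative_trunc)
```

Note that floor division and truncation give different results for negative numbers.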

If we need to convert a string containing an integer number to an integer, we can use the int() function:

In [5]:
a = '34'       # a is a string containing the characters 3 and 4
x = int(a)     # x is an integer number

The function int() will also convert floating point numbers to integers:

In [6]:
int(7.0)
Out[6]:
7
In [7]:
int(7.9)
Out[7]:
7

Note that int() will truncate any non-integer part of a floating point number. To round a floating point number to the nearest integer, use the round() function:

In [8]:
round(7.9)
Out[8]:
8
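
One detail worth knowing: in Python 3, round() rounds values that lie exactly halfway between two integers to the nearest even integer. The following sketch (with illustrative variable names) compares round() and int() for a few values:

```python
values = [7.9, 7.5, 2.5, -2.5, 0.5]

# round() picks the nearest integer; exact halves go to the even neighbour
rounded = [round(v) for v in values]      # [8, 8, 2, -2, 0]

# int() simply cuts off the fractional part (truncates towards zero)
truncated = [int(v) for v in values]      # [7, 7, 2, -2, 0]

print(rounded)
print(truncated)
```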

Integer limits

Integers in Python 3 are unlimited; Python will automatically assign more memory as needed as the numbers get bigger. This means we can calculate very large numbers with no special steps.

In [9]:
35**42
Out[9]:
70934557307860443711736098025989133248003781773149967193603515625

In many other programming languages, such as C and FORTRAN, integers are a fixed size—most frequently 4 bytes, which allows $2^{32}$ different values—but different types are available with different sizes. For numbers that fit into these limits, calculations can be faster, but you may need to check that the numbers don't go beyond the limits. Calculating a number beyond the limits is called integer overflow, and may produce bizarre results.

Even in Python, we need to be aware of this when we use numpy (see Chapter 14). Numpy uses integers with a fixed size, because it stores many of them together and needs to calculate with them efficiently. Numpy data types include a range of integer types named for their size, so e.g. int16 is a 16-bit integer, with $2^{16}$ possible values.

Integer types can also be signed or unsigned. Signed integers allow positive or negative values, unsigned integers only allow positive ones. For instance:

  • uint16 (unsigned) ranges from 0 to $2^{16}-1$
  • int16 (signed) ranges from $-2^{15}$ to $2^{15}-1$
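
The ranges above can be computed for any bit width. The sketch below uses only plain Python integers (not numpy itself); the helper names int_range and wrap_signed are hypothetical, and wrap_signed merely mimics the wrap-around that fixed-size two's-complement integers typically show on overflow:

```python
def int_range(bits, signed):
    """Return (smallest, largest) value of an integer type with `bits` bits."""
    if signed:
        return -2 ** (bits - 1), 2 ** (bits - 1) - 1
    return 0, 2 ** bits - 1


def wrap_signed(value, bits):
    """Simulate two's-complement wrap-around for a signed `bits`-bit integer."""
    modulus = 2 ** bits
    value %= modulus                      # map into 0 .. 2**bits - 1
    if value >= 2 ** (bits - 1):          # upper half represents negatives
        value -= modulus
    return value


print(int_range(16, signed=False))        # (0, 65535)
print(int_range(16, signed=True))         # (-32768, 32767)
print(wrap_signed(32767 + 1, 16))         # -32768: one past the maximum
```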

Floating Point numbers

A string containing a floating point number can be converted into a floating point number using the float() command:

In [10]:
a = '35.342'
b = float(a)
b
Out[10]:
35.342
In [11]:
type(b)
Out[11]:
float

Complex numbers

Python (like Fortran and Matlab) has built-in complex numbers. Here are some examples of how to use them:

In [12]:
x = 1 + 3j
x
Out[12]:
(1+3j)
In [13]:
abs(x)               # computes the absolute value
Out[13]:
3.1622776601683795
In [14]:
x.imag
Out[14]:
3.0
In [15]:
x.real
Out[15]:
1.0
In [16]:
x * x
Out[16]:
(-8+6j)
In [17]:
x * x.conjugate()
Out[17]:
(10+0j)
In [18]:
3 * x
Out[18]:
(3+9j)

Note that if you want to perform more complicated operations (such as taking the square root) you have to use the cmath module (Complex MATHematics):

In [19]:
import cmath
cmath.sqrt(x)
Out[19]:
(1.442615274452683+1.0397782600555705j)
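
To see why cmath is needed, here is a small sketch comparing the math and cmath modules for the square root of a negative number (the variable names are illustrative):

```python
import cmath
import math

# math.sqrt() only works with real numbers and rejects negative input
try:
    math.sqrt(-1)
    real_sqrt_failed = False
except ValueError:
    real_sqrt_failed = True

# cmath.sqrt() returns a complex result instead
root = cmath.sqrt(-1)

print(real_sqrt_failed, root)
```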

Functions applicable to all types of numbers

The abs() function returns the absolute value of a number (also called modulus):

In [20]:
a = -45.463
abs(a)
Out[20]:
45.463

Note that abs() also works for complex numbers (see above).

Sequences

Strings, lists and tuples are sequences. They can be indexed and sliced in the same way.

Tuples and strings are “immutable” (which basically means we can’t change individual elements within the tuple, and we cannot change individual characters within a string) whereas lists are “mutable” (i.e. we can change elements in a list).

Sequences share the following operations:

  • `a[i]` returns the *i*-th element of `a`
  • `a[i:j]` returns elements *i* up to *j* − 1
  • `len(a)` returns the number of elements in the sequence
  • `min(a)` returns the smallest value in the sequence
  • `max(a)` returns the largest value in the sequence
  • `x in a` returns `True` if `x` is an element in `a`
  • `a + b` concatenates `a` and `b`
  • `n * a` creates `n` copies of sequence `a`
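
As a quick illustration, the following sketch applies these operations to a string, a list and a tuple (the sample values are arbitrary):

```python
samples = ["hello", [3, 1, 2], (3, 1, 2)]

for a in samples:
    # indexing, slicing, len(), min() and max() work on every sequence type
    print(type(a).__name__, a[0], a[0:2], len(a), min(a), max(a))

has_one = 1 in [3, 1, 2]     # membership test -> True
joined = (1, 2) + (3,)       # concatenation   -> (1, 2, 3)
repeated = 2 * "ab"          # repetition      -> 'abab'
```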

Sequence type 1: String

A string is an (immutable) sequence of characters. A string can be defined using single quotes:

In [21]:
a = 'Hello World'

double quotes:

In [22]:
a = "Hello World"

or triple quotes of either kind

In [23]:
a = """Hello World"""
a = '''Hello World'''

The type of a string is str and the empty string is given by "":

In [24]:
a = "Hello World"
type(a)
Out[24]:
str
In [25]:
b = ""
type(b)
Out[25]:
str
In [26]:
type("Hello World")
Out[26]:
str
In [27]:
type("")
Out[27]:
str

The number of characters in a string (that is, its length) can be obtained using the len() function:

In [28]:
a = "Hello Moon"
len(a)
Out[28]:
10
In [29]:
a = 'test'
len(a)
Out[29]:
4
In [30]:
len('another test')
Out[30]:
12

You can combine (“concatenate”) two strings using the + operator:

In [31]:
'Hello ' + 'World'
Out[31]:
'Hello World'

Strings have a number of useful methods, including for example upper() which returns the string in upper case:

In [32]:
a = "This is a test sentence."
a.upper()
Out[32]:
'THIS IS A TEST SENTENCE.'

A list of available string methods can be found in the Python reference documentation. If a Python prompt is available, you can use the dir() and help() functions to retrieve this information: dir(str) lists the available methods, and help() can be used to learn about each method.
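
A minimal sketch of this kind of exploration is shown below. Filtering out names that start with an underscore is a choice made here to hide the special methods; it is not required:

```python
# dir(str) lists all attribute names of the string type
public_methods = [name for name in dir(str) if not name.startswith("_")]
print(len(public_methods), "public string methods, for example:")
print(public_methods[:5])

# The documentation string is the text that help(str.split) displays;
# at an interactive prompt, help(str.split) shows it in a pager.
print(str.split.__doc__)
```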

A particularly useful method is split() which converts a string into a list of strings:

In [33]:
a = "This is a test sentence."
a.split()
Out[33]:
['This', 'is', 'a', 'test', 'sentence.']

The split() method will separate the string where it finds white space. White space means any character that is printed as white space, such as one space or several spaces or a tab.
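
The difference between splitting on white space and splitting on one explicit space character is easy to overlook. A short sketch with an arbitrary test string:

```python
text = "one  two\tthree\nfour"    # two spaces, a tab and a newline

# Without an argument, any run of white space acts as one separator
by_whitespace = text.split()      # ['one', 'two', 'three', 'four']

# With " " as separator, every single space splits, and tabs or
# newlines are left untouched
by_space = text.split(" ")        # ['one', '', 'two\tthree\nfour']

print(by_whitespace)
print(by_space)
```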

By passing a separator character to the split() method, a string can be split into different parts. Suppose, for example, we would like to obtain a list of complete sentences:

In [34]:
a = "The dog is hungry. The cat is bored. The snake is awake."
a.split(".")
Out[34]:
['The dog is hungry', ' The cat is bored', ' The snake is awake', '']

The opposite string method to split is join which can be used as follows:

In [35]:
a = "The dog is hungry. The cat is bored. The snake is awake."
s = a.split('.')
s
Out[35]:
['The dog is hungry', ' The cat is bored', ' The snake is awake', '']
In [36]:
".".join(s)
Out[36]:
'The dog is hungry. The cat is bored. The snake is awake.'
In [37]:
" STOP".join(s)
Out[37]:
'The dog is hungry STOP The cat is bored STOP The snake is awake STOP'

Sequence type 2: List

A list is a sequence of objects. The objects can be of any type, for example integers:

In [38]:
a = [34, 12, 54]

or strings:

In [39]:
a = ['dog', 'cat', 'mouse']

An empty list is represented by []:

In [40]:
a = []

The type is list:

In [41]:
type(a)
Out[41]:
list
In [42]:
type([])
Out[42]:
list

As with strings, the number of elements in a list can be obtained using the len() function:

In [43]:
a = ['dog', 'cat', 'mouse']
len(a)
Out[43]:
3

It is also possible to mix different types in the same list:

In [44]:
a = [123, 'duck', -42, 17, 0, 'elephant']

In Python a list is an object. It is therefore possible for a list to contain other lists (because a list keeps a sequence of objects):

In [45]:
a = [1, 4, 56, [5, 3, 1], 300, 400]

You can combine (“concatenate”) two lists using the + operator:

In [46]:
[3, 4, 5] + [34, 35, 100]
Out[46]:
[3, 4, 5, 34, 35, 100]

Or you can add one object to the end of a list using the append() method:

In [47]:
a = [34, 56, 23]
a.append(42)
a
Out[47]:
[34, 56, 23, 42]

You can delete an object from a list by calling the remove() method and passing the object to delete. For example:

In [48]:
a = [34, 56, 23, 42]
a.remove(56)
a
Out[48]:
[34, 23, 42]
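
Two details of remove() are worth a short sketch: it deletes only the first matching element, and it raises a ValueError if the object is not in the list. The values below are arbitrary:

```python
a = [34, 56, 23, 56]
a.remove(56)                 # removes only the first 56
print(a)                     # [34, 23, 56]

try:
    a.remove(99)             # 99 is not in the list
    missing_raised = False
except ValueError:
    missing_raised = True

print(missing_raised)        # True
```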

The range() command

A special type of list is frequently required (often together with for-loops) and therefore a command exists to generate that list: the range(n) command generates integers starting from 0 and going up to but not including n. Here are a few examples:

In [49]:
list(range(3))
Out[49]:
[0, 1, 2]
In [50]:
list(range(10))
Out[50]:
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

This command is often used with for loops. For example, to print the numbers $0^2, 1^2, 2^2, 3^2, \ldots, 10^2$, the following program can be used:

In [51]:
for i in range(11):
    print(i ** 2)
0
1
4
9
16
25
36
49
64
81
100

The range command takes an optional parameter for the beginning of the integer sequence (start) and another optional parameter for the step size. This is often written as range([start],stop,[step]) where the arguments in square brackets (i.e. start and step) are optional. Here are some examples:

In [52]:
list(range(3, 10))            # start=3
Out[52]:
[3, 4, 5, 6, 7, 8, 9]
In [53]:
list(range(3, 10, 2))         # start=3, step=2
Out[53]:
[3, 5, 7, 9]
In [54]:
list(range(10, 0, -1))        # start=10,step=-1
Out[54]:
[10, 9, 8, 7, 6, 5, 4, 3, 2, 1]

Why are we calling list(range())?

In Python 3, range() generates the numbers on demand. When you use range() in a for loop, this is more efficient, because it doesn't take up memory with a list of numbers. Passing it to list() forces it to generate all of its numbers, so we can see what it does.

To get the same efficient behaviour in Python 2, use xrange() instead of range().
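
The on-demand behaviour of range() in Python 3 can be made visible by comparing memory use. This is only a sketch: the exact numbers reported by sys.getsizeof() depend on the Python implementation, and the variable names are illustrative:

```python
import sys

lazy = range(10 ** 6)            # no numbers are stored yet
materialised = list(lazy)        # all one million numbers are stored

print(sys.getsizeof(lazy), "bytes for the range object")
print(sys.getsizeof(materialised), "bytes for the list")

# A range object still supports len(), indexing, slicing and "in"
length = len(lazy)
eleventh = lazy[10]
contains_last = 999999 in lazy
every_quarter = list(lazy[::250000])
```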

Sequence type 3: Tuples

A tuple is an (immutable) sequence of objects. Tuples are very similar in behaviour to lists with the exception that they cannot be modified (i.e. are immutable).

As with lists, the objects in a tuple can be of any type:

In [55]:
a = (12, 13, 'dog')
a
Out[55]:
(12, 13, 'dog')
In [56]:
a[0]
Out[56]:
12

The parentheses are not necessary to define a tuple: just a sequence of objects separated by commas is sufficient to define a tuple:

In [57]:
a = 100, 200, 'duck'
a
Out[57]:
(100, 200, 'duck')

although it is good practice to include the parentheses where it helps to show that a tuple is being defined.

Tuples can also be used to make two assignments at the same time:

In [58]:
x, y = 10, 20
x
Out[58]:
10
In [59]:
y
Out[59]:
20

This can be used to swap two objects within one line. For example:

In [60]:
x = 1
y = 2
x, y = y, x
x
Out[60]:
2
In [61]:
y
Out[61]:
1

The empty tuple is given by ():

In [62]:
t = ()
len(t)
Out[62]:
0
In [63]:
type(t)
Out[63]:
tuple

The notation for a tuple containing one value may seem a bit odd at first:

In [64]:
t = (42,)
type(t)
Out[64]:
tuple
In [65]:
len(t)
Out[65]:
1

The extra comma is required to distinguish (42,) from (42), where in the latter case the parentheses would be read as defining operator precedence: (42) simplifies to 42 which is just a number:

In [66]:
t = (42)
type(t)
Out[66]:
int

This example shows the immutability of a tuple:

In [67]:
a = (12, 13, 'dog')
a[0]
Out[67]:
12
In [68]:
a[0] = 1
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-68-71b8ae05f2fb> in <module>()
----> 1 a[0] = 1

TypeError: 'tuple' object does not support item assignment

The immutability is the main difference between a tuple and a list (the latter being mutable). We should use tuples when we don’t want the content to change.

Note that Python functions that return more than one value return these in a tuple (which makes sense because you don’t want these values to be changed).
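
Here is a small sketch of a function returning two values. The function min_and_max is a hypothetical example, not part of Python; the built-in divmod() behaves in the same way:

```python
def min_and_max(values):
    """Return the smallest and the largest element of a sequence."""
    return min(values), max(values)     # the comma creates a tuple


result = min_and_max([3, -1, 7])
print(type(result), result)

# The returned tuple can be unpacked into separate names straight away
lowest, highest = min_and_max([3, -1, 7])

# Built-in example: divmod() returns quotient and remainder as a tuple
quotient, remainder = divmod(17, 5)
```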

Indexing sequences

Individual objects in lists can be accessed by using the index of the object and square brackets ([ and ]):

In [69]:
a = ['dog', 'cat', 'mouse']
a[0]
Out[69]:
'dog'
In [70]:
a[1]
Out[70]:
'cat'
In [71]:
a[2]
Out[71]:
'mouse'

Note that Python (like C but unlike Fortran and unlike Matlab) starts counting indices from zero!

Python provides a handy shortcut to retrieve the last element in a list: for this one uses the index “-1” where the minus indicates that it is one element from the back of the list. Similarly, the index “-2” will return the 2nd last element:

In [72]:
a = ['dog', 'cat', 'mouse']
a[-1]
Out[72]:
'mouse'
In [73]:
a[-2]
Out[73]:
'cat'

If you prefer, you can think of the index a[-1] to be a shorthand notation for a[len(a) - 1].

Remember that strings (like lists) are also a sequence type and can be indexed in the same way:

In [74]:
a = "Hello World!" 
a[0]
Out[74]:
'H'
In [75]:
a[1]
Out[75]:
'e'
In [76]:
a[10]
Out[76]:
'd'
In [77]:
a[-1]
Out[77]:
'!'
In [78]:
a[-2]
Out[78]:
'd'

Slicing sequences

Slicing of sequences can be used to retrieve more than one element. For example:

In [79]:
a = "Hello World!"
a[0:3]
Out[79]:
'Hel'

By writing a[0:3] we request the first 3 elements starting from element 0. Similarly:

In [80]:
a[1:4]
Out[80]:
'ell'
In [81]:
a[0:2]
Out[81]:
'He'
In [82]:
a[0:6]
Out[82]:
'Hello '

We can use negative indices to refer to the end of the sequence:

In [83]:
a[0:-1]
Out[83]:
'Hello World'

It is also possible to leave out the start or the end index and this will return all elements up to the beginning or the end of the sequence. Here are some examples to make this clearer:

In [84]:
a = "Hello World!"
a[:5]
Out[84]:
'Hello'
In [85]:
a[5:]
Out[85]:
' World!'
In [86]:
a[-2:]
Out[86]:
'd!'
In [87]:
a[:]
Out[87]:
'Hello World!'

Note that a[:] will generate a copy of a. Some people find the use of indices in slicing counterintuitive. If you feel uncomfortable with slicing, have a look at this quotation from the Python tutorial (section 3.1.2):

The best way to remember how slices work is to think of the indices as pointing between characters, with the left edge of the first character numbered 0. Then the right edge of the last character of a string of 5 characters has index 5, for example:

 +---+---+---+---+---+ 
 | H | e | l | l | o |
 +---+---+---+---+---+ 
 0   1   2   3   4   5   <-- use for SLICING
-5  -4  -3  -2  -1       <-- use for SLICING 
                                 from the end

The first row of numbers gives the position of the slicing indices 0...5 in the string; the second row gives the corresponding negative indices. The slice from i to j consists of all characters between the edges labelled i and j, respectively.

So the important statement is that for slicing we should think of indices pointing between characters.

For indexing it is better to think of the indices referring to characters. Here is a little graph summarising these rules:

   0   1   2   3   4    <-- use for INDEXING 
  -5  -4  -3  -2  -1    <-- use for INDEXING 
 +---+---+---+---+---+          from the end
 | H | e | l | l | o |
 +---+---+---+---+---+ 
 0   1   2   3   4   5  <-- use for SLICING
-5  -4  -3  -2  -1      <-- use for SLICING 
                         from the end

If you are not sure what the right index is, it is always a good technique to play around with a small example at the Python prompt to test things before or while you write your program.
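
In that spirit, here is a small experiment that checks a few slicing rules on the string from the diagram (the variable names are illustrative):

```python
a = "Hello"

# a[i:j] contains j - i characters (for 0 <= i <= j <= len(a))
middle = a[1:4]                       # 'ell'

# Cutting at any edge and gluing the two parts together restores a
rejoined = [a[:i] + a[i:] for i in range(len(a) + 1)]

from_end = a[-3:-1]                   # 'll'
tolerant = a[2:100]                   # 'llo': slices ignore out-of-range edges

# Indexing, in contrast, does not tolerate an out-of-range index
try:
    a[100]
    index_error = False
except IndexError:
    index_error = True
```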

Dictionaries

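Dictionaries are also called “associative arrays” and “hash tables”. A dictionary is a collection of key-value pairs. Since Python 3.7, dictionaries keep their keys in insertion order; in older versions the order was arbitrary, which is why some of the outputs in this section do not list the keys in the order in which they were added.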

An empty dictionary can be created using curly braces:

In [88]:
d = {}

Key-value pairs can be added like this:

In [89]:
d['today'] = '22 deg C'    # 'today' is the key
In [90]:
d['yesterday'] = '19 deg C'

d.keys() returns all keys (in Python 3 as a view object rather than a list):

In [91]:
d.keys()
Out[91]:
dict_keys(['yesterday', 'today'])

We can retrieve values by using the key as the index:

In [92]:
d['today']
Out[92]:
'22 deg C'

Other ways of populating a dictionary if the data is known at creation time are:

In [93]:
d2 = {2:4, 3:9, 4:16, 5:25}
d2
Out[93]:
{2: 4, 3: 9, 4: 16, 5: 25}
In [94]:
d3 = dict(a=1, b=2, c=3)
d3
Out[94]:
{'a': 1, 'b': 2, 'c': 3}

Called without arguments, the function dict() creates an empty dictionary.

Other useful dictionary methods include values(), items() and get(). You can use in to check for the presence of keys.

In [95]:
d.values()
Out[95]:
dict_values(['19 deg C', '22 deg C'])
In [96]:
d.items()
Out[96]:
dict_items([('yesterday', '19 deg C'), ('today', '22 deg C')])
In [97]:
d.get('today','unknown')
Out[97]:
'22 deg C'
In [98]:
d.get('tomorrow','unknown')
Out[98]:
'unknown'
In [99]:
'today' in d
Out[99]:
True
In [100]:
'tomorrow' in d
Out[100]:
False

The method get(key,default) will provide the value for a given key if that key exists, otherwise it will return the default object.
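
A common use of get() with a default value is counting. The following sketch counts how often each word occurs in an arbitrary sentence:

```python
sentence = "the cat and the dog and the bird"

counts = {}
for word in sentence.split():
    # get() returns 0 the first time a word is seen, so no key check is needed
    counts[word] = counts.get(word, 0) + 1

print(counts)
```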

Here is a more complex example:

In [101]:
order = {}        # create empty dictionary

#add orders as they come in
order['Peter'] = 'Pint of bitter'
order['Paul'] = 'Half pint of Hoegarden'
order['Mary'] = 'Gin Tonic'

#deliver order at bar
for person in order.keys():
    print(person, "requests", order[person])
Peter requests Pint of bitter
Paul requests Half pint of Hoegarden
Mary requests Gin Tonic

Some more technicalities:

  • The key can be any immutable (more precisely, hashable) Python object. This includes:

    • numbers

    • strings

    • tuples.

  • dictionaries are very fast in retrieving values (when given the key)
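
The restriction on keys can be tried out directly. In this sketch (with made-up keys and values), numbers, strings and tuples are accepted as keys, while a list is rejected:

```python
grid = {}
grid[(0, 0)] = "origin"          # a tuple as key
grid[3.5] = "a float key"        # a number as key
grid["name"] = "a string key"    # a string as key

# A list is mutable and therefore cannot be used as a key
try:
    grid[[1, 2]] = "this fails"
    list_key_rejected = False
except TypeError:
    list_key_rejected = True

print(grid[(0, 0)], list_key_rejected)
```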

Another example to demonstrate an advantage of using dictionaries over pairs of lists:

In [102]:
dic = {}                        #create empty dictionary

dic["Hans"]   = "room 1033"     #fill dictionary
dic["Andy C"] = "room 1031"     #"Andy C" is key
dic["Ken"]    = "room 1027"     #"room 1027" is value

for key in dic.keys():
    print(key, "works in", dic[key])
Andy C works in room 1031
Hans works in room 1033
Ken works in room 1027

Without dictionary:

In [103]:
people = ["Hans","Andy C","Ken"]
rooms  = ["room 1033","room 1031","room 1027"]

#possible inconsistency here since we have two lists
if len(people) != len(rooms):
    raise RuntimeError("people and rooms differ in length")

for i in range(len(rooms)):
    print(people[i], "works in", rooms[i])
Hans works in room 1033
Andy C works in room 1031
Ken works in room 1027

Passing arguments to functions

This section contains some more advanced ideas and makes use of concepts that are only later introduced in this text. The section may be more easily accessible at a later stage.

When objects are passed to a function, Python always passes (the value of) the reference to the object to the function. Effectively this is calling a function by reference, although one could refer to it as calling by value (of the reference).

We review argument passing by value and reference before discussing the situation in Python in more detail.

Call by value

One might expect that if we pass an object by value to a function, modifications of that value inside the function will not affect the object (because we don’t pass the object itself, but only its value, which is a copy). Here is an example of this behaviour (in C):

#include <stdio.h>

void pass_by_value(int m) {
  printf("in pass_by_value: received m=%d\n",m);
  m=42;
  printf("in pass_by_value: changed to m=%d\n",m);
}

int main(void) {
  int global_m = 1;
  printf("global_m=%d\n",global_m);
  pass_by_value(global_m);
  printf("global_m=%d\n",global_m);
  return 0;
}

together with the corresponding output:

global_m=1
in pass_by_value: received m=1
in pass_by_value: changed to m=42
global_m=1


The value 1 of the global variable global_m is not modified when the function pass_by_value changes its input argument to 42.

Call by reference

Calling a function by reference, on the other hand, means that the object given to a function is a reference to the object. This means that the function will see the same object as in the calling code (because they are referencing the same object: we can think of the reference as a pointer to the place in memory where the object is located). Any changes acting on the object inside the function will then be visible in the object at the calling level (because the function does actually operate on the same object, not a copy of it).

Here is one example showing this using pointers in C:

#include <stdio.h>

void pass_by_reference(int *m) {
  printf("in pass_by_reference: received m=%d\n",*m);
  *m=42;
  printf("in pass_by_reference: changed to m=%d\n",*m);
}

int main(void) {
  int global_m = 1;
  printf("global_m=%d\n",global_m);
  pass_by_reference(&global_m);
  printf("global_m=%d\n",global_m);
  return 0;
}

together with the corresponding output:

global_m=1
in pass_by_reference: received m=1
in pass_by_reference: changed to m=42
global_m=42

C++ provides the ability to pass arguments as references by adding an ampersand in front of the argument name in the function definition:

#include <stdio.h>

void pass_by_reference(int &m) {
  printf("in pass_by_reference: received m=%d\n",m);
  m=42;
  printf("in pass_by_reference: changed to m=%d\n",m);
}

int main(void) {
  int global_m = 1;
  printf("global_m=%d\n",global_m);
  pass_by_reference(global_m);
  printf("global_m=%d\n",global_m);
  return 0;
}

together with the corresponding output:

global_m=1
in pass_by_reference: received m=1
in pass_by_reference: changed to m=42
global_m=42

Argument passing in Python

In Python, objects are passed as the value of a reference (think pointer) to the object. Depending on the way the reference is used in the function and depending on the type of object it references, this can result in pass-by-reference behaviour (where any changes to the object received as a function argument are immediately reflected at the calling level).

Here are three examples to discuss this. We start by passing a list to a function which iterates through all elements in the sequence and doubles the value of each element:

In [104]:
def double_the_values(l):
    print("in double_the_values: l = %s" % l)
    for i in range(len(l)):
        l[i] = l[i] * 2
    print("in double_the_values: changed l to l = %s" % l)

l_global = [0, 1, 2, 3, 10]
print("In main: s=%s" % l_global)
double_the_values(l_global)
print("In main: s=%s" % l_global)
In main: s=[0, 1, 2, 3, 10]
in double_the_values: l = [0, 1, 2, 3, 10]
in double_the_values: changed l to l = [0, 2, 4, 6, 20]
In main: s=[0, 2, 4, 6, 20]

The variable l is a reference to the list object. The line l[i] = l[i] * 2 first evaluates the right-hand side and reads the element with index i, then multiplies this by two. A reference to this new object is then stored in the list object l at the position with index i. We have thus modified the list object that is referenced through l.

The reference to the list object never changes: the line l[i] = l[i] * 2 changes the elements l[i] of the list l but never changes the reference l for the list. Thus both the function and the calling level are operating on the same object through the references l and l_global, respectively.

In contrast, here is an example where we do not modify the elements of the object within the function, but instead bind the name l to a new object:

In [105]:
def double_the_list(l):
    print("in double_the_list: l = %s" % l)
    l = l + l
    print("in double_the_list: changed l to l = %s" % l)

l_global = "Hello"
print("In main: l=%s" % l_global)
double_the_list(l_global)
print("In main: l=%s" % l_global)
In main: l=Hello
in double_the_list: l = Hello
in double_the_list: changed l to l = HelloHello
In main: l=Hello

What happens here is that during the evaluation of l = l + l a new object is created that holds l + l, and that we then bind the name l to it. In the process, we lose the reference to the object that was given to the function (here the string "Hello"), and thus we do not change that object. The same would happen with a list, because l + l always creates a new object.
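
A short sketch with a list makes the distinction between rebinding a name and modifying an object explicit. The function names rebind and extend_in_place are illustrative only. Note that for lists the augmented assignment += modifies the existing object, whereas seq = seq + seq creates a new one:

```python
def rebind(seq):
    seq = seq + seq      # creates a new list and rebinds the local name only


def extend_in_place(seq):
    seq += seq           # for lists, += extends the existing object


numbers = [1, 2]

rebind(numbers)
after_rebind = list(numbers)        # snapshot: still [1, 2]

extend_in_place(numbers)
after_in_place = list(numbers)      # snapshot: now [1, 2, 1, 2]

print(after_rebind, after_in_place)
```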

Finally, let’s look at an example in which the argument is an integer:

In [106]:
def double_the_value(l):
    print("in double_the_value: l = %s" % l)
    l = 2 * l
    print("in double_the_value: changed l to l = %s" % l)

l_global = 42
print("In main: s=%s" % l_global)
double_the_value(l_global)
print("In main: s=%s" % l_global)
In main: s=42
in double_the_value: l = 42
in double_the_value: changed l to l = 84
In main: s=42

In this example, we also double the value (from 42 to 84) within the function. However, when we bind the object 84 to the Python name l (that is the line l = 2 * l) we have created a new object (84), and we bind the new object to l. In the process, we lose the reference to the object 42 within the function. This does not affect the object 42 itself, nor the reference l_global to it.

In summary, Python’s behaviour of passing arguments to a function may appear to vary (if we view it from the pass by value versus pass by reference point of view). However, it is always call by value, where the value is a reference to the object in question, and the behaviour can be explained through the same reasoning in every case.

Performance considerations

Call-by-value function calls require copying of the value before it is passed to the function. From a performance point of view (both execution time and memory requirements), this can be an expensive process if the value is large. (Imagine the value is a numpy.array object which could be several megabytes or gigabytes in size.)

One generally prefers call by reference for large data objects as in this case only a pointer to the data objects is passed, independent of the actual size of the object, and thus this is generally faster than call-by-value.

Python’s approach of (effectively) calling by reference is thus efficient. However, we need to be careful that our functions do not modify the data they have been given where this is undesired.

Inadvertent modification of data

Generally, a function should not modify the data given as input to it.

For example, the following code demonstrates the attempt to determine the maximum value of a list, and – inadvertently – modifies the list in the process:

In [107]:
def mymax(s):  # demonstrating side effect
    if len(s) == 0:
        raise ValueError('mymax() arg is an empty sequence')
    elif len(s) == 1:
        return s[0]
    else:
        for i in range(1, len(s)):
            if s[i] < s[i - 1]:
                s[i] = s[i - 1]
        return s[len(s) - 1]

s = [-45, 3, 6, 2, -1]
print("in main before calling mymax(s): s=%s" % s)
print("mymax(s)=%s" % mymax(s))
print("in main after calling mymax(s): s=%s" % s)
in main before calling mymax(s): s=[-45, 3, 6, 2, -1]
mymax(s)=6
in main after calling mymax(s): s=[-45, 3, 6, 6, 6]

The user of the mymax() function would not expect that the input argument is modified when the function executes. We should generally avoid this. There are several ways to find better solutions to the given problem:

  • In this particular case, we could use the Python in-built function max() to obtain the maximum value of a sequence.

  • If we felt we need to stick to storing temporary values inside the list [this is actually not necessary], we could create a copy of the incoming list s first, and then proceed with the algorithm (see below on Copying objects).

  • Use another algorithm which uses an extra temporary variable rather than abusing the list for this.

  • We could pass a tuple (instead of a list) to the function: a tuple is immutable and can thus never be modified (this would result in an exception being raised when the function tries to write to elements in the tuple).
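
The option of using an extra temporary variable could look like the following sketch. The function name mymax_no_side_effect and the variable largest are chosen here for illustration; this is one possible implementation, not necessarily the one originally intended:

```python
def mymax_no_side_effect(s):
    """Return the largest element of s without modifying s."""
    if len(s) == 0:
        raise ValueError('mymax_no_side_effect() arg is an empty sequence')
    largest = s[0]              # temporary variable holds the current maximum
    for item in s[1:]:
        if item > largest:
            largest = item
    return largest


s = [-45, 3, 6, 2, -1]
result = mymax_no_side_effect(s)
print("maximum=%s, s afterwards=%s" % (result, s))
```

Because the function only reads from s, it works equally well when a tuple is passed.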

Copying objects

Python provides the id() function which returns an integer number that is unique for each object during its lifetime. (In the current CPython implementation, this is the memory address.) We can use this to identify whether two names refer to the same object.

To copy a sequence object (including lists), we can slice it, i.e. if a is a list, then a[:] will return a copy of a. Here is a demonstration:

In [108]:
a = list(range(10))
a
Out[108]:
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
In [109]:
b = a
b[0] = 42
a              # changing b changes a
Out[109]:
[42, 1, 2, 3, 4, 5, 6, 7, 8, 9]
In [110]:
id(a)
Out[110]:
4401052104
In [111]:
id(b)
Out[111]:
4401052104
In [112]:
c = a[:] 
id(c)          # c is a different object
Out[112]:
4401074568
In [113]:
c[0] = 100       
a              # changing c does not affect a
Out[113]:
[42, 1, 2, 3, 4, 5, 6, 7, 8, 9]

Python’s standard library provides the copy module, which contains functions for creating copies of objects. We could have used import copy; c = copy.deepcopy(a) instead of c = a[:].
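
For a flat list of numbers, a[:] and copy.deepcopy(a) give the same result. The difference only shows for nested objects, because a slice copies just the outer list (a so-called shallow copy). A minimal sketch with arbitrary values:

```python
import copy

nested = [[1, 2], [3, 4]]

shallow = nested[:]               # new outer list, inner lists are shared
deep = copy.deepcopy(nested)      # inner lists are copied as well

nested[0][0] = 99                 # modify an inner list of the original

print(shallow[0][0])              # 99: the shallow copy sees the change
print(deep[0][0])                 # 1:  the deep copy is independent
```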

Equality and Identity/Sameness

A related question concerns the equality of objects.

Equality

The operators <, >, ==, >=, <=, and != compare the values of two objects. The objects need not have the same type. For example:

In [114]:
a = 1.0; b = 1
type(a)
Out[114]:
float
In [115]:
type(b)
Out[115]:
int
In [116]:
a == b
Out[116]:
True

So the == operator checks whether the values of two objects are equal.

Identity / Sameness

To check whether two objects a and b are the same (i.e. a and b are references to the same place in memory), we can use the is operator (continued from example above):

In [117]:
a is b
Out[117]:
False

Of course they are different here, as they are not of the same type.

We can also ask the id function which, according to the documentation string in Python 2.7, “Returns the identity of an object. This is guaranteed to be unique among simultaneously existing objects. (Hint: it’s the object’s memory address.)”:

In [118]:
id(a)
Out[118]:
4400776752
In [119]:
id(b)
Out[119]:
4297331648

which shows that a and b are stored in different places in memory.

Example: Equality and identity

We close with an example involving lists:

In [120]:
x = [0, 1, 2]
y = x
x == y
Out[120]:
True
In [121]:
x is y
Out[121]:
True
In [122]:
id(x)
Out[122]:
4400763208
In [123]:
id(y)
Out[123]:
4400763208

Here, x and y are references to the same piece of memory, they are thus identical and the is operator confirms this. The important point to remember is that line 2 (y=x) creates a new reference y to the same list object that x is a reference for.

Accordingly, we can change elements of x, and y will change simultaneously as both x and y refer to the same object:

In [124]:
x
Out[124]:
[0, 1, 2]
In [125]:
y
Out[125]:
[0, 1, 2]
In [126]:
x is y
Out[126]:
True
In [127]:
x[0] = 100
y
Out[127]:
[100, 1, 2]
In [128]:
x
Out[128]:
[100, 1, 2]

In contrast, if we use z=x[:] (instead of z=x) to create a new name z, then the slicing operation x[:] will actually create a copy of the list x, and the new reference z will point to the copy. The value of x and z is equal, but x and z are not the same object (they are not identical):

In [129]:
x
Out[129]:
[100, 1, 2]
In [130]:
z = x[:]            # create copy of x before assigning to z
z == x              # same value
Out[130]:
True
In [131]:
z is x              # are not the same object
Out[131]:
False
In [132]:
id(z)               # confirm by looking at ids
Out[132]:
4400927624
In [133]:
id(x)
Out[133]:
4400763208
In [134]:
x
Out[134]:
[100, 1, 2]
In [135]:
z
Out[135]:
[100, 1, 2]

Consequently, we can change x without changing z, for example (continued):

In [136]:
x[0] = 42
x
Out[136]:
[42, 1, 2]
In [137]:
z
Out[137]:
[100, 1, 2]