Prof. Andrea Gallegati
Strings are not like integers, floats, and booleans.
A string is a sequence: an ordered collection of other values.
We can access the characters one at a time with the bracket operator:
fruit = 'banana'
letter = fruit[1]
The expression in brackets is called an index: it indicates which character in the sequence you want (hence the name).
But we might not get what we expect!
letter
'a'
The first letter of 'banana' is b, not a.
... the index is an offset from the beginning of the string, and the offset of the first letter is zero!
letter = fruit[0]
letter
'b'
b is the 0th letter (“zero-eth”) of 'banana'
a is the 1th letter (“one-eth”) of 'banana'
n is the 2th letter (“two-eth”) of 'banana'
We can use an expression with variables and operators for the index.
i = 1
fruit[i]
'a'
fruit[i+1]
'n'
But it has to be an integer, otherwise ...
letter = fruit[1.5]
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-8-2bd018f0559d> in <module>()
----> 1 letter = fruit[1.5]

TypeError: string indices must be integers
len is a built-in function that returns the number of characters in a string.
fruit = 'banana'
len(fruit)
6
To get the last letter, we might be tempted by:
length = len(fruit)
last = fruit[length]
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-10-15d81308bbc3> in <module>()
      1 length = len(fruit)
----> 2 last = fruit[length]

IndexError: string index out of range
The reason for the IndexError is that there is no letter in 'banana' with index 6.
If we start counting at zero, the six letters are numbered 0 to 5.
To get the last character, we have to subtract 1 from length:
last = fruit[length-1]
last
'a'
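As a quick check of the numbering above, here is a minimal sketch. The helper name safe_char is hypothetical (it is not part of the lecture); it shows one possible way to guard against an out-of-range index, assuming we prefer None to an exception.

```python
fruit = 'banana'

# Valid indices run from 0 to len(fruit) - 1.
valid_indices = list(range(len(fruit)))
print(valid_indices)                      # [0, 1, 2, 3, 4, 5]

def safe_char(text, index):
    """Return text[index], or None if index is out of range (hypothetical helper)."""
    try:
        return text[index]
    except IndexError:
        return None

print(safe_char(fruit, len(fruit) - 1))   # a (the last character)
print(safe_char(fruit, len(fruit)))       # None (index 6 does not exist)
```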
We can use negative indices, which count backward from the end of the string:
fruit[-1] yields the last letter
fruit[-2] yields the second to last
Some algorithms (e.g. in cryptography) involve processing a string one character at a time.
This pattern of processing is called a traversal.
One way to write a traversal is with a while loop:
index = 0
while index < len(fruit):
    letter = fruit[index]
    print(letter)
    index = index + 1
b
a
n
a
n
a
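The loop condition is index < len(fruit); thus, when index == len(fruit) the condition is false and the body of the loop does not run.
The last character accessed is the one with index == len(fruit) - 1, which is the last character in the string.
Another way to write a traversal is with a for loop: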
for letter in fruit:
    print(letter)
b
a
n
a
n
a
Each iteration, the next character is assigned to the variable letter, until no characters are left.
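When both the index and the character are needed, the two traversal styles can be combined. The sketch below uses the built-in enumerate function, which the lecture does not cover; it is shown only as one possible alternative to keeping a manual counter.

```python
fruit = 'banana'

# enumerate yields (index, character) pairs, so no manual counter is needed.
pairs = []
for index, letter in enumerate(fruit):
    pairs.append((index, letter))
    print(index, letter)
```

The first line printed is `0 b` and the last is `5 a`, matching the indices used by the while loop version.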
We can use concatenation (string addition) and a for loop to generate an abecedarian series:
prefixes = 'JKLMNOPQ'
suffix = 'ack'
for letter in prefixes:
    print(letter + suffix)
Jack
Kack
Lack
Mack
Nack
Oack
Pack
Qack
where the names of the ducklings are in alphabetical order.
A slice is a segment of a string. Selecting a slice is similar to selecting a character:
s = 'Monty Python'
s[0:5]
'Monty'
s[6:12]
'Python'
The operator [n:m] returns the part of the string from index n up to, but not including, index m.
This behavior is counterintuitive.
It might help to imagine indices in between one character and the other.
(Draw a banana as if it is a python string, for example an array of characters)
Omitting the first index (before the colon :), the slice starts at the beginning of the string:
fruit = 'banana'
fruit[:3]
'ban'
Omitting the second index (after the colon :), the slice goes to the end of the string:
fruit[3:]
'ana'
If the first index is greater than or equal to the second, the result is an empty string:
fruit[3:3]
''
An empty string contains no characters (length = 0), but other than that, it is the same as any other string!
Omitting both indices (on either side of the colon :) results in the whole string:
fruit[:]
'banana'
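The slice rules above can be checked with a few lines of code. This is only an illustrative sketch; the variable name middle is an assumption introduced for the example.

```python
fruit = 'banana'

# Splitting at any position n and gluing the halves back gives the original.
for n in range(len(fruit) + 1):
    assert fruit[:n] + fruit[n:] == fruit

# The length of fruit[n:m] is m - n when 0 <= n <= m <= len(fruit).
middle = fruit[1:4]
print(middle, len(middle))    # ana 3

# Unlike a single index, a slice that runs past the end does not raise an error.
print(fruit[3:100])           # ana
```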
We might be tempted to use the [] operator on the left side of an assignment, to change a character in a string:
greeting = 'Hello, world!'
greeting[0] = 'J'
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-21-230884f26c44> in <module>()
      1 greeting = 'Hello, world!'
----> 2 greeting[0] = 'J'

TypeError: 'str' object does not support item assignment
In this message, the “object” is the string and the “item” is the character we tried to assign.
For now, an object is the same thing as a value, but we will refine that definition later!
The reason for this error is that strings are immutable: we can’t change an existing string.
We can create a new string that is a variation on the original, with no effect on the original one:
greeting = 'Hello, world!'
new_greeting = 'J' + greeting[1:]
new_greeting
'Jello, world!'
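The same idea can be wrapped in a small function. The name replace_at is hypothetical, and the sketch assumes 0 <= index < len(text); it builds a new string from two slices and leaves the original untouched.

```python
def replace_at(text, index, new_char):
    """Return a copy of text with the character at index replaced.

    Hypothetical helper; assumes 0 <= index < len(text).
    """
    return text[:index] + new_char + text[index + 1:]

greeting = 'Hello, world!'
new_greeting = replace_at(greeting, 0, 'J')
print(new_greeting)   # Jello, world!
print(greeting)       # Hello, world! (the original is unchanged)
```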
def find(word, letter):
    index = 0
    while index < len(word):
        if word[index] == letter:
            return index
        index = index + 1
    return -1
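In some sense, the find function above is the inverse of the [] operator.
It takes a character and finds the index where that character appears.
Otherwise, it returns -1.
If word[index] == letter, the function leaves the loop and returns immediately; if the character does not appear in the string, the loop exits normally and the function returns -1.
This is a search pattern: traversing a sequence and returning as soon as an occurrence is found.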
word = 'banana'
count = 0
for letter in word:
    if letter == 'a':
        count = count + 1
print(count)
3
This is a counter pattern:
it counts the occurrences of a in a given string.
count is initialized to 0 and incremented each time an occurrence is found.
When the loop exits, count contains the total number of a.
We can encapsulate this in a generalized count function, using a version of the find function above that takes the starting index as a third parameter.
def find(word, letter, index):
    # Search for letter in word, starting at position index.
    while index < len(word):
        if word[index] == letter:
            return index
        index += 1
    return -1

def count(word, target_letter):
    count = 0
    start = 0
    while True:
        start = find(word, target_letter, start)
        if start == -1:
            break
        count += 1
        start += 1
    return count

word = 'banana'
target_letter = 'a'
count(word, target_letter)
3
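Python strings also provide a built-in count method. The sketch below compares it with the counter pattern written as a function; the name count_letter is an assumption introduced here to avoid clashing with the count function above.

```python
def count_letter(word, target_letter):
    """Counter pattern written as a function (illustrative name)."""
    total = 0
    for letter in word:
        if letter == target_letter:
            total += 1
    return total

word = 'banana'
print(count_letter(word, 'a'))   # 3
print(word.count('a'))           # 3 (built-in string method)
print(word.count('na'))          # 2 (the method also counts substrings)
```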
A method is similar to a function: it takes arguments and returns a value.
... but the syntax is different:
the upper method (which returns a copy of the string in all uppercase letters) uses the method syntax word.upper() rather than the function syntax upper(word).
word = 'banana'
word.upper()
'BANANA'
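With this dot notation we specify the name of the method, upper, and the name of the string to apply it to, word. The empty parentheses indicate that this method takes no arguments.
Actually, there is already a built-in string method, find, similar to our function: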
word = 'banana'
word.find('a')
1
Here we are invoking find on word, passing the letter to match as an argument.
It is more general than our function; we can find substrings:
word.find('na')
2
... pass a second (optional) argument, the index where the search starts (0 by default):
word.find('na', 3)
4
... or even pass a third (optional) argument, the index where the search stops:
name = 'bob'
name.find('b', 1, 2)
-1
find is consistent with the slice operator: the search above runs from index 1 up to, but not including, index 2, and thus fails.
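The optional start argument makes it easy to collect every occurrence of a substring. The following is a minimal sketch; find_all is a hypothetical helper, not a built-in method, and it relies only on str.find with its start argument.

```python
def find_all(text, sub):
    """Return the start index of every (possibly overlapping) occurrence of sub."""
    positions = []
    start = text.find(sub)
    while start != -1:
        positions.append(start)
        # Resume the search one position after the previous match.
        start = text.find(sub, start + 1)
    return positions

print(find_all('banana', 'na'))    # [2, 4]
print(find_all('banana', 'ana'))   # [1, 3]
print(find_all('banana', 'x'))     # []
```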
in is a boolean operator that takes two strings and returns True if the first appears as a substring in the second:
'a' in 'banana'
True
'seed' in 'banana'
False
This function prints all the letters of the first word that also appear in the second one:
def in_both(word1, word2):
    for letter in word1:
        if letter in word2:
            print(letter)
With well-chosen variable names, it reads like English:
“for (each) letter in (the first) word, if (the) letter (appears) in (the second) word, print (the) letter.”
in_both('apples', 'oranges')
a
e
s
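A variant that returns the common letters instead of printing them is often easier to reuse and to test. This is only a sketch; the name common_letters is an assumption, and the result is built with string concatenation as in the abecedarian example.

```python
def common_letters(word1, word2):
    """Return the letters of word1 that also appear in word2, in order."""
    result = ''
    for letter in word1:
        if letter in word2:
            result = result + letter
    return result

print(common_letters('apples', 'oranges'))   # aes
```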
Relational operators work on strings too. To see whether two strings are equal:
if word == 'banana':
    print('All right, bananas.')
All right, bananas.
def order(word):
    if word < 'banana':
        print('Your word, ' + word + ', comes before banana.')
    elif word > 'banana':
        print('Your word, ' + word + ', comes after banana.')
    else:
        print('All right, bananas.')

order('apples')
order('oranges')
Your word, apples, comes before banana.
Your word, oranges, comes after banana.
... but pay attention! For Python, all uppercase letters come before all lowercase ones.
order('Pineapple')
Your word, Pineapple, comes before banana.
Convert strings to a standard format (e.g. all lowercase) before comparison.
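One possible way to apply this advice is to convert both words to lowercase before comparing them. The function name order_ignoring_case and its return values are assumptions made for this sketch.

```python
def order_ignoring_case(word, reference='banana'):
    """Compare two words after converting both to lowercase (hypothetical helper)."""
    w = word.lower()
    r = reference.lower()
    if w < r:
        return 'before'
    elif w > r:
        return 'after'
    return 'same'

print('Pineapple' < 'banana')             # True ('P' sorts before 'b')
print(order_ignoring_case('Pineapple'))   # after
print(order_ignoring_case('Banana'))      # same
```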
"Keep that in mind in case you have to defend yourself against a man armed with a Pineapple."
ChatGPT explanation:
"This joke is likely referencing the classic British comedy sketch show Monty Python's Flying Circus. In one of their sketches, a group of police officers are engaged in a training exercise where they are asked to defend themselves against a man armed with a banana. The absurdity of the situation, as well as the deadpan delivery of the actors, makes for a hilarious scene."
When using indices to traverse the values in a sequence, it is tricky to get the beginning and end of the traversal right!
This function compares two words and is supposed to return True if one of the words is the reverse of the other:
def is_reverse(word1, word2):
    if len(word1) != len(word2):
        return False

    i = 0
    j = len(word2)

    while j > 0:
        if word1[i] != word2[j]:
            return False
        i = i+1
        j = j-1

    return True
... but it contains two errors.
The first if statement checks whether the words are the same length. If not, we can return False immediately.
This is an example of the guardian pattern in “Checking Types”.
i and j are indices: i traverses word1 forward while j traverses word2 backward. If two letters don’t match, we can return False immediately.
Testing this function with “pots” and “stop”, we expect the return value True, but:
is_reverse('pots', 'stop')
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-3-675f5fef8649> in <module>()
----> 1 is_reverse('pots', 'stop')

<ipython-input-1-81f6c8cd6a39> in is_reverse(word1, word2)
      7
      8     while j > 0:
----> 9         if word1[i] != word2[j]:
     10             return False
     11         i = i+1

IndexError: string index out of range
Let's print the values of the indices immediately before the line where the error occurs.
is_reverse('pots', 'stop')
0 4
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-9-675f5fef8649> in <module>()
----> 1 is_reverse('pots', 'stop')

<ipython-input-8-090d5148bbb0> in is_reverse(word1, word2)
      8     while j > 0:
      9         print(i, j)
---> 10         if word1[i] != word2[j]:
     11             return False
     12         i = i+1

IndexError: string index out of range
The initial value of j is 4, which is out of range for a four-letter word such as 'stop': the index of its last character is 3, so the initial value of j should be len(word2) - 1.
def is_reverse(word1, word2):
    if len(word1) != len(word2):
        return False

    i = 0
    j = len(word2) - 1

    while j > 0:
        print(i, j)
        if word1[i] != word2[j]:
            return False
        i = i+1
        j = j-1

    return True
is_reverse('pots', 'stop')
0 3
1 2
2 1
True
Right answer, but the loop only ran three times (suspicious).
Let's draw a state diagram:
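... the second error was in the while condition: j > 0 is False as soon as j == 0, so the loop stopped before comparing the last character of word1 with the first character of word2. The condition should be j >= 0: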
def is_reverse(word1, word2):
    if len(word1) != len(word2):
        return False

    i = 0
    j = len(word2) - 1

    while j >= 0:
        print(i, j)
        if word1[i] != word2[j]:
            return False
        i = i+1
        j = j-1

    return True
is_reverse('pots', 'stop')
0 3
1 2
2 1
3 0
True
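Once the index-based version works, it can be cross-checked against a much shorter one. The sketch below assumes the extended slice syntax [::-1] (a step of -1, not covered in this section), which returns the string reversed; the names is_reverse_loop and is_reverse_slice are introduced only for this comparison.

```python
def is_reverse_loop(word1, word2):
    """Index-based version with both fixes applied and the debug print removed."""
    if len(word1) != len(word2):
        return False
    i = 0
    j = len(word2) - 1
    while j >= 0:
        if word1[i] != word2[j]:
            return False
        i += 1
        j -= 1
    return True

def is_reverse_slice(word1, word2):
    """Same check using a slice with step -1, which reverses a string."""
    return word1 == word2[::-1]

# Both versions should agree on every pair of words.
for a, b in [('pots', 'stop'), ('pots', 'spot'), ('', ''), ('ab', 'abc')]:
    assert is_reverse_loop(a, b) == is_reverse_slice(a, b)

print(is_reverse_slice('pots', 'stop'))   # True
```

Comparing two independent implementations on a handful of inputs, including the empty string and words of different lengths, is a cheap way to catch off-by-one errors like the two found above.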