CIS 1051 - Temple Rome Spring 2023

Intro to Problem Solving and Programming in Python


Strings

Prof. Andrea Gallegati

(tuj81353@temple.edu)

A String is a Sequence

Strings are not like integers, floats, and booleans.

Strings are sequences: an ordered collection of other values.

We can access the characters one at a time with the bracket operator:

In [2]:
fruit = 'banana'
letter = fruit[1]

The expression in brackets is called an index: it indicates which character in the sequence you want (hence the name).

But we might not get what we expect!

In [3]:
letter
Out[3]:
'a'
  • For most people, the first letter of 'banana' is b, not a.
  • For computer scientists, the index is an offset from the beginning of the string.

... and the offset of the first letter is zero!

In [4]:
letter = fruit[0]
letter
Out[4]:
'b'
  • b is the 0th letter (“zero-eth”) of 'banana'
  • a is the 1th letter (“one-eth”) of 'banana'
  • n is the 2th letter (“two-eth”) of 'banana'

We can use an expression with variables and operators for the index.

In [6]:
i = 1
fruit[i]
Out[6]:
'a'
In [7]:
fruit[i+1]
Out[7]:
'n'

But it has to be an integer, otherwise ...

In [8]:
letter = fruit[1.5]
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-8-2bd018f0559d> in <module>()
----> 1 letter = fruit[1.5]

TypeError: string indices must be integers
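A common way to hit this error is an index computed with division, because / always produces a float in Python 3. A minimal sketch, assuming we want the middle character of fruit (the variable name middle is just for illustration): floor division // or int() keeps the index an integer.

```python
fruit = 'banana'

# len(fruit) / 2 is 3.0, a float, so fruit[len(fruit) / 2] would raise TypeError.
# Floor division // returns an int when both operands are ints.
middle = fruit[len(fruit) // 2]
print(middle)            # a

# int() converts a float to an int by truncating toward zero.
print(fruit[int(1.5)])   # a
```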

len

len is a built-in function that returns the number of characters in a string.

In [9]:
fruit = 'banana'
len(fruit)
Out[9]:
6

To get the last letter, we might be tempted by:

In [10]:
length = len(fruit)
last = fruit[length]
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-10-15d81308bbc3> in <module>()
      1 length = len(fruit)
----> 2 last = fruit[length]

IndexError: string index out of range

But there is no letter in 'banana' with index 6.

If we start counting at zero, the six letters are numbered 0 to 5.

To get the last character, we have to subtract 1 from length:

In [11]:
last = fruit[length-1]
last
Out[11]:
'a'

We can use negative indices, which count backward from the end of the string:

  • fruit[-1] yields the last letter
  • fruit[-2] yields the second to last
  • and so on...
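A short sketch of negative indices on the same string: the index -k refers to the same character as len(fruit) - k, so -len(fruit) is the first letter and anything more negative is out of range.

```python
fruit = 'banana'

print(fruit[-1])   # a  (last letter)
print(fruit[-2])   # n  (second to last)
print(fruit[-6])   # b  (-len(fruit) is the first letter)
# fruit[-7] would raise IndexError, just like fruit[6]

# A negative index -k refers to the same character as len(fruit) - k
assert fruit[-1] == fruit[len(fruit) - 1]
assert fruit[-2] == fruit[len(fruit) - 2]
```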

Traversal with a for Loop

Some algorithms (in cryptography, for example) involve processing a string one character at a time:

  • they start at the beginning
  • select each character in turn
  • do something to it
  • continue until the end

This pattern of processing is called a traversal.

One way to write a traversal is with a while loop:

In [12]:
index = 0
while index < len(fruit):
    letter = fruit[index]
    print(letter)
    index = index + 1
b
a
n
a
n
a
  • The loop condition is index < len(fruit), so when index == len(fruit) the condition is false and the body of the loop does not run.
  • The last character accessed is the one with index len(fruit) - 1, which is the last character in the string.
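The same while pattern also works in the other direction. A minimal sketch, using a hypothetical function named backward that starts at the last valid index and stops after index 0, building the reversed string by concatenation:

```python
def backward(s):
    """Return the characters of s in reverse order, using a while-loop traversal."""
    result = ''
    index = len(s) - 1      # start at the last valid index
    while index >= 0:       # >= so that index 0 (the first character) is included
        result = result + s[index]
        index = index - 1
    return result

print(backward('banana'))   # ananab
```

For an empty string, index starts at -1, the condition is false right away, and the function returns ''.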

Another way to write a traversal is with a for loop:

In [13]:
for letter in fruit:
    print(letter)
b
a
n
a
n
a

On each iteration, the next character in the string is assigned to the variable letter; the loop continues until no characters are left.

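We can use concatenation (string addition) and a for loop to generate an abecedarian series, that is, a series in alphabetical order, such as the duckling names from Robert McCloskey's book Make Way for Ducklings: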

In [14]:
prefixes = 'JKLMNOPQ'
suffix = 'ack'

for letter in prefixes:
    print(letter + suffix)
Jack
Kack
Lack
Mack
Nack
Oack
Pack
Qack

The names come out in alphabetical order.
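In McCloskey's book, two of the ducklings are actually named Ouack and Quack, so the loop above misspells them. One possible fix, sketched here, is to insert a 'u' after O and Q; collecting the names in a list called names is only for illustration.

```python
prefixes = 'JKLMNOPQ'
suffix = 'ack'
names = []

for letter in prefixes:
    if letter == 'O' or letter == 'Q':
        name = letter + 'u' + suffix   # Ouack and Quack need an extra 'u'
    else:
        name = letter + suffix
    names.append(name)
    print(name)
```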

String Slices

A segment of a string is called a slice. Selecting a slice is similar to selecting a character:

In [15]:
s = 'Monty Python'
s[0:5]
Out[15]:
'Monty'
In [16]:
s[6:12]
Out[16]:
'Python'

The operator [n:m] returns the part of the string:

  • from the “n-eth” character, included
  • to the “m-eth” character, excluded

This behavior is counterintuitive.

It might help to imagine the indices pointing between the characters rather than at them.

(Figure: the string 'banana' drawn as a row of characters, with the indices 0 to 6 placed between them.)
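Thinking of indices as positions between characters explains two handy facts, checked in the sketch below: a slice s[n:m] with 0 <= n <= m <= len(s) has m - n characters, and splitting a string at any index i and joining the two pieces gives back the original string.

```python
fruit = 'banana'

# Imagine the indices sitting between the characters:
#   | b | a | n | a | n | a |
#   0   1   2   3   4   5   6
print(fruit[1:4])               # ana

# A slice [n:m] with 0 <= n <= m <= len(s) has m - n characters
print(len(fruit[1:4]))          # 3

# Splitting at any index i and concatenating gives back the whole string
for i in range(len(fruit) + 1):
    assert fruit[:i] + fruit[i:] == fruit
```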

If we omit the first index (before the colon), the slice starts at the beginning of the string:

In [17]:
fruit = 'banana'
fruit[:3]
Out[17]:
'ban'

If we omit the second index (after the colon), the slice goes to the end of the string:

In [18]:
fruit[3:]
Out[18]:
'ana'

If the first index is greater than or equal to the second, the result is an empty string:

In [19]:
fruit[3:3]
Out[19]:
''

An empty string contains no characters and has length 0, but other than that, it is the same as any other string!
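A small sketch of edge cases: a slice whose first index is past the second is simply empty, and, unlike indexing, slicing does not raise IndexError when an index is out of range, because the indices are clipped to the ends of the string.

```python
fruit = 'banana'

print(repr(fruit[4:2]))    # ''  (first index greater than the second)
print(len(fruit[3:3]))     # 0

# Out-of-range slice indices are clipped instead of raising an error
print(fruit[3:100])        # ana
print(repr(fruit[10:]))    # ''
# By contrast, fruit[100] raises IndexError: string index out of range
```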

Omitting both indices (leaving just the colon) results in the whole string:

In [20]:
fruit[:]
Out[20]:
'banana'

Strings Are Immutable

We might be tempted to use the [] operator on the left side of an assignment to change a character in a string:

In [21]:
greeting = 'Hello, world!'
greeting[0] = 'J'
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-21-230884f26c44> in <module>()
      1 greeting = 'Hello, world!'
----> 2 greeting[0] = 'J'

TypeError: 'str' object does not support item assignment
  • The object here is the string.
  • The item is the character we tried to assign.

For now, an object is the same thing as a value, but we will refine that definition later!

The reason for this error is that strings are immutable: we can’t change an existing string.

The best we can do is create a new string that is a variation of the original; this has no effect on the original string:

In [22]:
greeting = 'Hello, world!'
new_greeting = 'J' + greeting[1:]
new_greeting
Out[22]:
'Jello, world!'
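A sketch confirming that the original string is untouched, plus a hypothetical helper named replace_char that generalizes the slice-and-concatenate trick to any position (it assumes a non-negative index within the string):

```python
greeting = 'Hello, world!'
new_greeting = 'J' + greeting[1:]

print(greeting)       # Hello, world!  (the original is unchanged)
print(new_greeting)   # Jello, world!

def replace_char(s, index, new_char):
    """Return a copy of s with the character at index replaced by new_char.

    Hypothetical helper for illustration; index must be in range(len(s)).
    """
    return s[:index] + new_char + s[index + 1:]

print(replace_char(greeting, 7, 'W'))   # Hello, World!
```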

Searching

In [23]:
def find(word, letter):
    index = 0
    while index < len(word):
        if word[index] == letter:
            return index
        index = index + 1
    return -1

In a sense, find is the inverse of the [] operator: instead of taking an index and extracting the corresponding character, it takes a character and finds the index where that character first appears.

If the character is not found, the function returns -1.

  • If word[index] == letter, the return statement inside the loop ends the function immediately, breaking out of the loop.
  • If the character never appears in the string, the loop exits normally and the function returns -1.

This pattern of computation (traversing a sequence and returning as soon as we find what we are looking for) is called a search.
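The same search can be written with a for loop. This sketch uses the built-in enumerate, which these notes have not introduced: it yields each index together with the character at that index. The name find_with_for is hypothetical.

```python
def find_with_for(word, letter):
    """Return the index of the first occurrence of letter in word, or -1."""
    for index, current in enumerate(word):
        if current == letter:
            return index    # stop as soon as we find a match
    return -1               # the loop finished without a match

print(find_with_for('banana', 'n'))   # 2
print(find_with_for('banana', 'z'))   # -1
```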

Looping and Counting

In [26]:
word = 'banana'
count = 0
for letter in word:
    if letter == 'a':
        count = count + 1
print(count)
3

This is a counter pattern:

  • it counts the number of times the letter a appears in a given string;
  • the variable count is initialized to 0 and incremented each time an a is found;
  • when the loop exits, count contains the total number of a's.

We can encapsulate this in a generalized count function that works for any letter. It uses a version of find that takes a third parameter: the index where the search should start.

In [47]:
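def find(word, letter, index):
    # Return the first index >= index where letter appears in word, or -1
    while index < len(word):
        if word[index] == letter:
            return index
        index += 1
    return -1

def count(word, target_letter):
    # Count matches by repeatedly searching from just past the previous match
    total = 0
    start = 0
    while True:
        start = find(word, target_letter, start)
        if start == -1:        # no more matches
            break
        total += 1
        start += 1             # resume the search after this match
    return total

word = 'banana'
target_letter = 'a'
count(word, target_letter)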
Out[47]:
3
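For comparison, Python strings already have a count method (called with the method syntax explained in the next section). It counts non-overlapping occurrences of a substring, so for single letters it should agree with the function above. A quick sketch:

```python
word = 'banana'
print(word.count('a'))      # 3
print(word.count('na'))     # 2  (substrings work too)
print(word.count('z'))      # 0
print('aaaa'.count('aa'))   # 2  (occurrences do not overlap)
```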

String Methods

A method is similar to a function: it takes arguments and returns a value.

... but the syntax is different.

For example, the method upper takes a string and returns a new string with all uppercase letters. Instead of the function syntax upper(word), it uses the method syntax word.upper():

In [49]:
word = 'banana'
word.upper()
Out[49]:
'BANANA'

This form of dot notation specifies:

  • the name of the method we are invoking (upper)
  • the name of the string the method is invoked on (word)

The empty parentheses indicate that this method takes no arguments.
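To see how the two syntaxes relate, note that a method can also be called through its type, passing the string explicitly as the first argument. A small sketch (str.upper(word) is valid Python, but word.upper() is the idiomatic form):

```python
word = 'banana'

print(word.upper())        # BANANA  (method syntax)
print(str.upper(word))     # BANANA  (same method, called through the str type)

# upper returns a new string; word itself is unchanged (strings are immutable)
print(word)                # banana
```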

As it turns out, there is already a built-in string method named find that is very similar to our function:

In [50]:
word = 'banana'
word.find('a')
Out[50]:
1

Here we invoke find on word and pass the letter we are looking for as an argument.

The find method is more general than our function: it can find substrings, not just characters:

In [51]:
word.find('na')
Out[51]:
2

By default, find starts at the beginning of the string, but it can take a second optional argument, the index where it should start:

In [52]:
word.find('na', 3)
Out[52]:
4

It can also take a third optional argument, the index where it should stop:

In [53]:
name = 'bob'
name.find('b', 1, 2)
Out[53]:
-1

This search fails because b does not appear in the index range from 1 up to, but not including, 2. Excluding the second index makes find consistent with the slice operator.
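One detail worth checking: even with start and end arguments, find reports positions in the whole string, whereas searching a slice reports positions relative to the slice. A short sketch:

```python
word = 'banana'

# With a start argument, find still reports positions in the whole string
print(word.find('na', 3))      # 4
# Searching a slice instead gives positions relative to the slice
print(word[3:].find('na'))     # 1

name = 'bob'
print(name.find('b', 1, 2))    # -1: only name[1:2], which is 'o', is searched
print(name.find('b', 1))       # 2: without an end, the search runs to the end
```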

The in String Operator

The word in is a boolean operator that takes two strings and returns True if the first appears as a substring in the second:

In [54]:
'a' in 'banana'
Out[54]:
True
In [55]:
'seed' in 'banana'
Out[55]:
False
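The in operator answers the same question as checking whether find returns -1. A sketch with a hypothetical helper named contains, written only to show the correspondence:

```python
def contains(big, small):
    """Hypothetical helper: True if small appears in big, using find."""
    return big.find(small) != -1

print(contains('banana', 'a'))      # True
print(contains('banana', 'seed'))   # False

# Same results as the in operator
assert contains('banana', 'nan') == ('nan' in 'banana')
```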

For example, the following function prints all the letters from word1 that also appear in word2:

In [56]:
def in_both(word1, word2):
    for letter in word1:
        if letter in word2:
            print(letter)

With well-chosen variable names, it reads like English:

“for (each) letter in (the first) word, if (the) letter (appears) in (the second) word, print (the) letter.”

In [57]:
in_both('apples', 'oranges')
a
e
s

String Comparison

The relational operators work on strings too:

  • to see if two strings are equal
In [58]:
if word == 'banana':
    print('All right, bananas.')
All right, bananas.
  • to put words in alphabetical order
In [61]:
def order(word):
    if word < 'banana':
        print('Your word, ' + word + ', comes before banana.')
    elif word > 'banana':
        print('Your word, ' + word + ', comes after banana.')
    else:
        print('All right, bananas.')
        
order('apples')
order('oranges')
Your word, apples, comes before banana.
Your word, oranges, comes after banana.

... but pay attention!

Python does not handle uppercase and lowercase letters the way people do: all the uppercase letters come before all the lowercase letters.

In [62]:
order('Pineapple')
Your word, Pineapple, comes before banana.

A common way to address this problem is to convert strings to a standard format, such as all lowercase, before performing the comparison.

"Keep that in mind in case you have to defend yourself against a man armed with a Pineapple."

This joke references the Monty Python's Flying Circus sketch “Self-Defence Against Fresh Fruit”, in which an instructor teaches his class how to defend themselves against an attacker armed with fresh fruit, such as a banana.
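Python compares strings one character at a time using each character's numeric code, which the built-in ord returns; all uppercase ASCII letters (codes 65 to 90) come before all lowercase ones (97 to 122). A minimal sketch of the lowercase fix suggested above; the function name order_ignore_case is hypothetical, and unlike order it also returns the message it prints.

```python
print(ord('P'), ord('b'))   # 80 98, so 'Pineapple' < 'banana'

def order_ignore_case(word):
    """Like order, but compares a lowercase copy of word with 'banana'."""
    key = word.lower()
    if key < 'banana':
        message = 'Your word, ' + word + ', comes before banana.'
    elif key > 'banana':
        message = 'Your word, ' + word + ', comes after banana.'
    else:
        message = 'All right, bananas.'
    print(message)
    return message

order_ignore_case('Pineapple')   # Your word, Pineapple, comes after banana.
order_ignore_case('Apples')      # Your word, Apples, comes before banana.
```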

Debugging

When we use indices to traverse the values in a sequence, it is tricky to get the beginning and end of the traversal right!

Here is a function that is supposed to compare two words and return True if one of the words is the reverse of the other:

In [ ]:
def is_reverse(word1, word2):
    if len(word1) != len(word2):
        return False
    
    i = 0
    j = len(word2)

    while j > 0:
        if word1[i] != word2[j]:
            return False
        i = i+1
        j = j-1

    return True

... but it contains two errors.

  • The first if statement checks whether the words are the same length. If not, we can return False immediately.

This is an example of the guardian pattern in “Checking Types”.

  • i and j are indices: i traverses word1 forward while j traverses word2 backward. If we find two letters that don’t match, we can return False immediately.

If we test this function with the words “pots” and “stop”, we expect the return value True, but we get an IndexError:

In [3]:
is_reverse('pots', 'stop')
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-3-675f5fef8649> in <module>()
----> 1 is_reverse('pots', 'stop')

<ipython-input-1-81f6c8cd6a39> in is_reverse(word1, word2)
      7 
      8     while j > 0:
----> 9         if word1[i] != word2[j]:
     10             return False
     11         i = i+1

IndexError: string index out of range

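To debug this kind of error, a good first step is to print the values of the indices immediately before the line where the error appears. We add print(i, j) at the start of the loop body and run the test again: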

In [9]:
is_reverse('pots', 'stop')
0 4
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-9-675f5fef8649> in <module>()
----> 1 is_reverse('pots', 'stop')

<ipython-input-8-090d5148bbb0> in is_reverse(word1, word2)
      8     while j > 0:
      9         print(i, j)
---> 10         if word1[i] != word2[j]:
     11             return False
     12         i = i+1

IndexError: string index out of range

On the first iteration, j is 4, which is out of range for 'stop': its last index is 3, that is, len(word2) - 1. So we initialize j to len(word2) - 1:

In [10]:
def is_reverse(word1, word2):
    if len(word1) != len(word2):
        return False
    
    i = 0
    j = len(word2) - 1

    while j > 0:
        print(i, j)
        if word1[i] != word2[j]:
            return False
        i = i+1
        j = j-1

    return True
In [11]:
is_reverse('pots', 'stop')
0 3
1 2
2 1
Out[11]:
True

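This time we get the right answer, but the loop only ran three times, which is suspicious: the words have four letters, so we expect four comparisons.

Tracing the values of word1, word2, i and j on each iteration (for example, with a state diagram of is_reverse) shows what is happening.

... the second error is in the while condition: j > 0 becomes False as soon as j == 0, so the loop exits before comparing word1[3] with word2[0]. The condition should be j >= 0: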

In [13]:
def is_reverse(word1, word2):
    if len(word1) != len(word2):
        return False
    
    i = 0
    j = len(word2) - 1

    while j >= 0:
        print(i, j)
        if word1[i] != word2[j]:
            return False
        i = i+1
        j = j-1

    return True
In [14]:
is_reverse('pots', 'stop')
0 3
1 2
2 1
3 0
Out[14]:
True
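Once both bugs are fixed, the debugging print can be removed. A sketch of the cleaned-up function, followed by an alternative check that uses a slice with a step of -1 (word2[::-1] reverses a string; slice steps are not covered in these notes):

```python
def is_reverse(word1, word2):
    """Return True if word2 is word1 spelled backward."""
    if len(word1) != len(word2):   # guardian: different lengths can't match
        return False

    i = 0                  # walks word1 forward
    j = len(word2) - 1     # walks word2 backward, starting at its last index

    while j >= 0:          # >= so that word2[0] is compared too
        if word1[i] != word2[j]:
            return False
        i = i + 1
        j = j - 1

    return True

print(is_reverse('pots', 'stop'))   # True
print(is_reverse('pots', 'spot'))   # False

# Alternative: a slice with step -1 reverses the string
print('stop'[::-1] == 'pots')       # True
```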