Prof. Andrea Gallegati
Programs are transient when they keep no data between runs.
If we run the program again, it starts with a clean slate.
On the contrary, programs are persistent when they keep at least some of their data in permanent storage.
If they shut down and restart, they pick up where they left off.
The simplest way for programs to maintain their data is by reading/writing text files.
An alternative is to store the state of the program in a database.
A text file is a sequence of characters stored on a permanent medium.
To write a file, open it with mode 'w' as the second parameter:
fout = open('output.txt', 'w')
Be careful!
If the file already exists, opening it in write mode clears out the old data and starts fresh.
If the file doesn’t exist, a new one is created.
open returns a file object with methods for working with the file.
The write method puts data into the file:
line1 = "This here's the wattle,\n"
fout.write(line1)
24
write returns the number of characters written.
The file object keeps track of where it is, so calling write again adds the new data at the end of the file.
line2 = "the emblem of our land...\n"
fout.write(line2)
26
When done, close the file.
fout.close()
Otherwise, it gets closed when the program ends.
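The open, write and close steps can also be written with a with statement, which closes the file automatically even if an error occurs. The following is a minimal sketch: the temporary directory and the names path, n1, n2 and text are assumptions made for illustration, not part of the lecture.

```python
import os
import tempfile

# Work in a throwaway directory so no existing file is overwritten.
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, 'output.txt')

# The with statement closes the file when the block ends.
with open(path, 'w') as fout:
    n1 = fout.write("This here's the wattle,\n")
    n2 = fout.write("the emblem of our land...\n")

# Read the file back to check what was written.
with open(path) as fin:
    text = fin.read()

print(n1, n2)
print(text)
```

With this form there is no need to remember the call to close.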
The argument of write has to be a string.
To put other values in a file, we have to convert them to strings (with str).
x = 52
fout.write(str(x))
2
An alternative is to use the format operator, %.
When applied to integers, % is the modulus operator, but when the first operand is a string, % is the format operator.
The format string (first operand) contains format sequences that specify how to format the second operand:
camels = 42
'%d' % camels
'42'
The format sequence '%d' formats the second operand as a decimal integer.
The result is always a string. Thus, '42' is not to be confused with the integer value 42.
Format sequences can appear anywhere in the string, to embed values in a sentence:
'I have spotted %d camels.' % camels
'I have spotted 42 camels.'
With more than one format sequence, the second argument has to be a tuple to match each one (in order).
'In %d years I have spotted %g %s.' % (3, 0.1, 'camels')
'In 3 years I have spotted 0.1 camels.'
Here we use:
'%d' to format an integer
'%g' to format a floating-point number
'%s' to format a string
The tuple has to match the number and types of format sequences in the string.
'%d %d %d' % (1, 2)
---------------------------------------------------------------------------
TypeError                Traceback (most recent call last)
<ipython-input-21-76a835ca51c7> in <module>()
----> 1 '%d %d %d' % (1, 2)

TypeError: not enough arguments for format string
here there aren’t enough elements
'%d' % 'dollars'
---------------------------------------------------------------------------
TypeError                Traceback (most recent call last)
<ipython-input-22-192ac7d0084e> in <module>()
----> 1 '%d' % 'dollars'

TypeError: %d format: a number is required, not str
while here the element is the wrong type.
A more powerful alternative is the string format method:
"The sum of 1 + 2 is {0}".format(1+2)
'The sum of 1 + 2 is 3'
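The format method performs string formatting operations on the string it is called on.
The string can contain literal text and replacement fields delimited by braces {}.
Each {} contains either the numeric index of a positional argument or the name of a keyword argument.
It returns a copy of the string, replacing each {} with the string value of the corresponding argument.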
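As an illustration of positional and keyword replacement fields, here is a short sketch. The variable names s1, s2 and s3 are hypothetical, and the last line uses a format specification after the colon, a standard feature of str.format that the notes above do not cover.

```python
# Positional fields are numbered from 0.
s1 = "{0} + {1} = {2}".format(1, 2, 1 + 2)

# Keyword fields are looked up by name.
s2 = "I have spotted {count} {animal}.".format(count=42, animal='camels')

# A format spec after the colon controls the presentation.
s3 = "{0:.2f}".format(3.14159)

print(s1)
print(s2)
print(s3)
```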
Files are organized into directories (aka “folders”).
Every running program has a “current directory”: the default directory for most of its operations.
When opening a file for reading, Python looks for it in the current directory.
The os module (for “operating system”) provides functions for working with files and directories.
os.getcwd (cwd stands for “current working directory”) returns the name of the current directory:
import os
cwd = os.getcwd()
cwd
'/data/CIS1051-python/lectures/notebooks'
A string like this, identifying files/directories, is a path.
Simple filenames are relative paths: they are resolved relative to the current directory.
If the path begins with / it is an absolute path (not depending on the current directory).
To find the absolute path of a file, we can use os.path.abspath:
os.path.abspath('README.md')
'/data/CIS1051-python/lectures/notebooks/README.md'
os.path provides other functions for working with filenames/paths.
os.path.exists('README.md')
True
this checks whether a file or a directory exists.
os.path.isdir('README.md')
False
If it exists, this checks whether it’s a directory
os.path.isdir('/data/CIS1051-python/lectures/notebooks')
True
or a file
os.path.isfile('README.md')
True
os.listdir("./doc")
['document.txt', 'funny_document.txt', 'words.txt']
this returns a list of files/directories in the given directory.
def walk(dirname):
    for name in os.listdir(dirname):
        path = os.path.join(dirname, name)
        if os.path.isfile(path):
            print(path)
        else:
            walk(path)
walk(cwd + "/../../lab-sessions/snake/challenge/lab_5")
/data/CIS1051-python/lectures/notebooks/../../lab-sessions/snake/challenge/lab_5/level_00/fruit.py
/data/CIS1051-python/lectures/notebooks/../../lab-sessions/snake/challenge/lab_5/level_00/game.py
/data/CIS1051-python/lectures/notebooks/../../lab-sessions/snake/challenge/lab_5/level_00/main.py
/data/CIS1051-python/lectures/notebooks/../../lab-sessions/snake/challenge/lab_5/level_00/snake.py
/data/CIS1051-python/lectures/notebooks/../../lab-sessions/snake/challenge/lab_5/level_00/wall.py
/data/CIS1051-python/lectures/notebooks/../../lab-sessions/snake/challenge/lab_5/level_01/fruit.py
/data/CIS1051-python/lectures/notebooks/../../lab-sessions/snake/challenge/lab_5/level_01/game.py
/data/CIS1051-python/lectures/notebooks/../../lab-sessions/snake/challenge/lab_5/level_01/main.py
/data/CIS1051-python/lectures/notebooks/../../lab-sessions/snake/challenge/lab_5/level_01/snake.py
/data/CIS1051-python/lectures/notebooks/../../lab-sessions/snake/challenge/lab_5/level_01/wall.py
This example “walks” through a directory, printing the names of its files and calling itself recursively on each of its subdirectories.
os.path.join takes a directory and a filename and joins them into a complete path.
The os module already provides a similar (but more versatile) function, walk.
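A minimal sketch of how os.walk could replace the recursive function. The helper list_files and the small temporary directory tree are hypothetical, built only so that the example is self-contained.

```python
import os
import tempfile

# Build a small hypothetical directory tree to walk through.
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, 'level_00'))
for name in ('a.py', os.path.join('level_00', 'b.py')):
    with open(os.path.join(root, name), 'w') as f:
        f.write('')

def list_files(dirname):
    """Return the paths of all files below dirname, using os.walk."""
    paths = []
    # os.walk yields one (dirpath, dirnames, filenames) triple per directory.
    for dirpath, dirnames, filenames in os.walk(dirname):
        for filename in filenames:
            paths.append(os.path.join(dirpath, filename))
    return sorted(paths)

found = list_files(root)
for path in found:
    print(path)
```

os.walk visits the subdirectories for us, so no explicit recursion is needed.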
When trying to read/write files a lot of things can go wrong.
fin = open('bad_file')
FileNotFoundError: [Errno 2] No such file or directory: 'bad_file'
opening a file that doesn’t exist, we get a FileNotFoundError
fout = open('/etc/passwd', 'w')
PermissionError: [Errno 13] Permission denied: '/etc/passwd'
or without the necessary permissions to access it:
fin = open('/home')
IsADirectoryError: [Errno 21] Is a directory: '/home'
or opening a directory for reading it!
To avoid all these errors, it would take a lot of time and code to check every possibility:
[Errno 21] suggests there are at least 21 things that can go wrong!
It is better to go ahead and try, dealing with problems as they happen.
This is exactly what the try statement does.
The syntax is similar to an if...else statement.
try:
    fin = open('bad_file')
except:
    print('Something went wrong.')
Something went wrong.
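Python starts by executing the try clause: if all goes well, it skips the except clause and proceeds; if an exception occurs, it jumps out of the try clause and runs the except clause.
Handling an exception with a try statement is called catching an exception.
Here, the except clause is not that helpful (just a print).
In general, catching an exception gives us a chance to fix the problem, try again, or at least end the program gracefully.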
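A more helpful except clause usually names the exception it expects. The sketch below is one possible way to do that: the function read_text and the temporary directory are hypothetical, introduced only for illustration.

```python
import os
import tempfile

def read_text(path):
    """Return the contents of path, or None if it cannot be read."""
    try:
        with open(path) as fin:
            return fin.read()
    except FileNotFoundError:
        print('No such file:', path)
    except IsADirectoryError:
        print('This is a directory, not a file:', path)
    except OSError as err:
        # Any other operating-system error (e.g. missing permissions).
        print('Could not read', path, '-', err)
    return None

tmpdir = tempfile.mkdtemp()
missing = read_text(os.path.join(tmpdir, 'bad_file'))  # file does not exist
folder = read_text(tmpdir)                             # a directory, not a file
```

Catching specific exceptions lets each problem get its own message, while unrelated errors are not silently swallowed.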
A database (aka DB) is a file organized for storing data.
Many are organized like a dictionary: they map from keys to values.
However, a DB persists after the program ends (on permanent storage).
The dbm module provides an interface for creating and updating database files.
Let's create a DB containing captions for image files.
Opening a DB is similar to opening other files:
import dbm
db = dbm.open('captions', 'c')
The mode 'c' creates the database if it doesn’t exist yet.
The result can be used (for most operations) like a dictionary.
db['cleese.png'] = 'Photo of John Cleese.'
When we create a new item, dbm updates the database file.
db['cleese.png']
b'Photo of John Cleese.'
When we access an item, dbm reads the file.
The result is a bytes object (which is why it begins with b), similar to a string in many ways.
db['cleese.png'] = 'Photo of John Cleese doing a silly walk.'
db['cleese.png']
b'Photo of John Cleese doing a silly walk.'
When we make another assignment to an existing key, dbm replaces the old value.
Some dictionary methods don’t work with database objects, but iteration works:
for key in db:
    print(key, db[key])
b'cleese.png' b'Photo of John Cleese doing a silly walk.'
Close the DB when done (similarly to files).
db.close()
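To see the persistence at work, the sketch below opens the same database twice and decodes the stored bytes back into a string. It assumes a temporary directory for the database files and uses the with statement, which dbm objects support in Python 3.4 and later. The names db_path and caption are hypothetical.

```python
import dbm
import os
import tempfile

# Keep the database in a throwaway directory (an assumption of this sketch).
db_path = os.path.join(tempfile.mkdtemp(), 'captions')

# First "run": create the database and store one caption.
with dbm.open(db_path, 'c') as db:
    db['cleese.png'] = 'Photo of John Cleese.'

# Second "run": reopen the file read-only; the data is still there.
with dbm.open(db_path, 'r') as db:
    caption = db['cleese.png'].decode('utf-8')  # bytes -> str

print(caption)
```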
A limitation of dbm is that keys and values have to be strings or bytes.
With any other type, we get an error.
The pickle module helps, translating almost any type of object into a string (suitable for storage in a DB). Then it translates these strings back into objects.
pickle.dumps (short for “dump string”) serializes Python objects into a binary string representation.
pickle.loads (short for “load string”) deserializes the binary string representation back into the original Python object.
import pickle
t = [1, 2, 3]
pickle.dumps(t)
b'\x80\x03]q\x00(K\x01K\x02K\x03e.'
The format isn’t obvious to human readers: it is meant to be easy for pickle to interpret.
t1 = [1, 2, 3]
s = pickle.dumps(t1)
t2 = pickle.loads(s)
t2
[1, 2, 3]
This new object has the same value as the old, but it is not (in general) the same object!
In other words, pickling and then unpickling has the same effect as copying the object.
Thanks to pickle we can store non-string objects in a dbm database. This combination has already been encapsulated in the shelve module!
A Shelf is a persistent, dictionary-like object.
Contrary to dbm databases, here the values (not the keys!) can be arbitrary Python objects, while the keys are ordinary strings.
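A minimal sketch of a shelf storing a list under a string key. The file name anagrams, the temporary directory and the sample words are assumptions made for illustration.

```python
import os
import shelve
import tempfile

shelf_path = os.path.join(tempfile.mkdtemp(), 'anagrams')  # hypothetical name

# Values can be arbitrary picklable objects, such as lists.
with shelve.open(shelf_path) as shelf:
    shelf['opst'] = ['opts', 'post', 'pots', 'spot', 'stop', 'tops']

# Reopen the shelf: the list comes back as a list, not as bytes.
with shelve.open(shelf_path) as shelf:
    words = shelf['opst']

print(words)
```

The pickling and unpickling happen behind the scenes, so the code reads like ordinary dictionary access.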
In general, serialization is the process of converting an object into a stream of bytes, and deserialization is the reverse process of rebuilding the object from that stream.
The pickle format is not the only one: other formats, such as json and yaml, provide alternative methods.
However, pickle is sometimes preferred for its ability to preserve the state of complex objects and data structures.
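For comparison, here is the same round trip with the standard json module, whose dumps and loads functions mirror the pickle ones. The sample dictionary is made up for this sketch.

```python
import json

t1 = {'name': 'cleese.png', 'tags': ['silly', 'walk'], 'rating': 5}

s = json.dumps(t1)    # serialize to a human-readable string
t2 = json.loads(s)    # deserialize back into Python objects

print(s)
print(t2 == t1, t2 is t1)
```

Unlike the pickle output, the serialized form is readable text, but json only handles basic types such as dictionaries, lists, strings and numbers.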
Most operating systems provide a command-line interface (aka a shell), with commands to navigate the file system and launch applications.
Any of these commands can be run from Python using a pipe object (representing a running program).
cmd = 'ls -l ../../lab-sessions/snake/challenge/lab_5/level_01'
fp = os.popen(cmd)
This executes the Unix command ls -l to display the contents of the given directory.
res = fp.read()
print(res)
total 20
-rwxrwxrwx 1 0 root 330 Apr 4 06:03 fruit.py
-rwxrwxrwx 1 0 root 3440 Apr 4 06:03 game.py
-rwxrwxrwx 1 0 root 7105 Apr 4 06:03 main.py
-rwxrwxrwx 1 0 root 958 Apr 4 06:03 snake.py
-rwxrwxrwx 1 0 root 830 Apr 4 06:03 wall.py
We can get the output of the ls process:
one line at a time, with the readline method
all at once, with the read method
Close the pipe like a file, when done:
stat = fp.close()
print(stat)
None
Its return value is the final status of the ls process.
None means no errors.
Note, popen was deprecated in Python 2 and in Python 3 it is implemented on top of the subprocess module, which is the recommended way to run other programs.
import subprocess
cmd = ['ls', '-l', '../../lab-sessions/snake/challenge/lab_5/level_01']
# create a subprocess - capture output and error using a pipe
p = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
out, err = p.communicate() # read these streams
print(out.decode('utf-8'))
total 20
-rwxrwxrwx 1 0 root 330 Apr 4 06:03 fruit.py
-rwxrwxrwx 1 0 root 3440 Apr 4 06:03 game.py
-rwxrwxrwx 1 0 root 7105 Apr 4 06:03 main.py
-rwxrwxrwx 1 0 root 958 Apr 4 06:03 snake.py
-rwxrwxrwx 1 0 root 830 Apr 4 06:03 wall.py
For simple cases, the subprocess module is more complicated than necessary.
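The same module also offers subprocess.run, a convenience function that may be enough when we only need the output and the exit status. This is a minimal sketch, assuming Python 3.7 or later (for the capture_output and text parameters). It runs a child Python interpreter rather than ls, a choice made only so that the example does not depend on Unix commands.

```python
import subprocess
import sys

# Run a child Python interpreter instead of ls, so the sketch does not
# depend on Unix commands being available (an assumption of this example).
cmd = [sys.executable, '-c', 'print("hello from the child process")']

# capture_output collects stdout/stderr; text=True decodes them to str.
result = subprocess.run(cmd, capture_output=True, text=True)

print(result.returncode)   # 0 means the command succeeded
print(result.stdout)
```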
In Unix systems the md5sum command computes a “checksum” based on file contents.
filename = 'output.txt'
cmd = 'md5sum ' + filename
fp = os.popen(cmd)
res = fp.read()
print(res)
stat = fp.close()
d41d8cd98f00b204e9800998ecf8427e output.txt
It is very unlikely that different contents yield the same checksum, so this is an efficient way to check whether two files have the same contents.
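A checksum can also be computed without an external command, using the standard hashlib module. In the sketch below, the helper md5_of_file and the empty temporary file are hypothetical, introduced for illustration.

```python
import hashlib
import os
import tempfile

def md5_of_file(path):
    """Return the MD5 checksum of a file as a hexadecimal string."""
    md5 = hashlib.md5()
    with open(path, 'rb') as fin:
        # Read in chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: fin.read(8192), b''):
            md5.update(chunk)
    return md5.hexdigest()

# An empty file, created in a temporary directory for this sketch.
empty_path = os.path.join(tempfile.mkdtemp(), 'empty.txt')
open(empty_path, 'w').close()

checksum = md5_of_file(empty_path)
print(checksum)
```

Two files can then be compared by checking whether md5_of_file returns the same string for both.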