##########################
ZODB Persistent Components
##########################

Most Zope components live in the Zope Object DataBase (ZODB).
Components that are stored in ZODB are said to be *persistent*.
Creating persistent components is, for the most part, a trivial
exercise, but ZODB does impose a few rules that persistent components
must obey in order to work properly.  This chapter describes the
persistence model and the interfaces that persistent objects can use
to live inside the ZODB.

Persistent Objects
==================

Persistent objects are Python objects that live for a long time. Most
objects are created when a program is run and die when the program
finishes. Persistent objects are not destroyed when the program ends;
they are saved in a database.

A great benefit of persistent objects is their transparency.  As a
developer, you do not need to think about loading and unloading the
state of the object from memory.  Zope's persistence machinery handles
all of that for you.

This is also a great benefit for application designers; you do not
need to create your own kind of "data format" that gets saved to a
file and reloaded again when your program stops and starts.  Zope's
persistence machinery works with *any* kind of Python object (within
the bounds of a few simple rules), and as your types of objects grow,
your database simply grows transparently with them.

Persistence Example
-------------------

Here is a simple example of using ZODB outside of Zope.  If all you
plan on doing is using persistent objects with Zope, you can skip
this section if you wish.

The first thing you need to do to start working with ZODB is to
create a "root object".  This process involves first opening a
"storage", which is the actual backend storage location for your
data.

ZODB supports many pluggable storage back-ends, but for the purposes
of this chapter we're going to show you how to use the 'FileStorage'
back-end storage, which stores your object data in a file.  Other
storages keep objects in relational databases or Berkeley databases,
and there is also a client-server storage that stores objects on a
remote storage server.


To set up a ZODB, you must first install it.  ZODB comes with Zope,
so the easiest way to install ZODB is to install Zope and use the
ZODB that comes with your Zope installation.  For those of you who
don't want all of Zope, but just ZODB, see the instructions for
downloading ZODB from the `ZODB web page
<http://wiki.zope.org/ZODB>`_.

After installing ZODB, you can start to experiment with it right from
the Python command line interpreter.  If you've installed Zope,
before running this set of commands, shut down your Zope server, and
"cd" to the "lib/python" directory of your Zope instance.  If you're
using a "standalone" version of ZODB, you likely don't need to do
this, and you'll be able to use ZODB by importing it from a standard
Python package directory.  In either case, try the following set of
commands::

  chrism@saints:/opt/zope/lib/python$ python
  Python 2.1.1 (#1, Aug  8 2001, 21:17:50) 
  [GCC 2.95.2 20000220 (Debian GNU/Linux)] on linux2
  Type "copyright", "credits" or "license" for more information. 
  >>> from ZODB import FileStorage, DB
  >>> storage = FileStorage.FileStorage('mydatabase.fs')
  >>> db = DB( storage )
  >>> connection = db.open()
  >>> root = connection.root()

Here, you create a storage and use the 'mydatabase.fs' file to store
the object information.  Then, you create a database that uses that
storage.


Next, the database needs to be "opened" by calling the 'open()'
method.  This will return a connection object to the database.  The
connection object then gives you access to the 'root' of the database
with the 'root()' method.

The 'root' object is the dictionary that holds all of your persistent
objects.  For example, you can store a simple list of strings in the
root object::

      root['employees'] = ['Bob', 'Mary', 'Jo']

Now, you have changed the persistent database by adding a new object,
but this change is so far only temporary.  In order to make the
change permanent, you must commit the current transaction::

      get_transaction().commit()

Transactions are ways to make a lot of changes in one atomic
operation.  In a later article, we'll show you how this is a very
powerful feature.  For now, you can think of committing transactions
as "checkpoints" where you save the changes you've made to your
objects so far.  Later on, we'll show you how to abort those changes,
and how to undo them after they are committed.

If you had used a relational database, you would have had to issue a
SQL query to save even a simple Python list like the above example.
You would have also needed some code to convert a SQL query result
back into the list when you wanted to use it again.  You don't have
to do any of this work when using ZODB.  Using ZODB is almost
completely transparent; in fact, ZODB-based programs often look
suspiciously simple!

Working with simple Python types is useful, but the real power of
ZODB comes out when you store your own kinds of objects in the
database.  For example, consider a class that represents an employee::

  from Persistence import Persistent

  class Employee(Persistent):

      def setName(self, name):
          self.name = name


Calling 'setName' will set a name for the employee.  Now, you can put
Employee objects in your database::

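  employees = []
  for name in ['Bob', 'Mary', 'Joe']:
      employee = Employee()
      employee.setName(name)
      employees.append(employee)

  # Assigning to a key of root marks root as changed.  Appending to
  # the plain list already stored there would go unnoticed.
  root['employees'] = employees

  get_transaction().commit()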

Don't forget to call 'commit()', so that the changes you have made so
far are committed to the database, and a new transaction is begun.

Persistent Rules
================

There are a few rules that must be followed when your objects are
persistent.

- Your objects, and their attributes, must be "pickleable".

- Your object cannot have any attributes that begin with '_p_'.

- Attributes of your object that begin with '_v_' are "volatile" and
  are not saved to the database (see next section).

- You must explicitly signal any changes made to mutable attributes
  (such as instances, lists, and dictionaries) or use persistent
  versions of mutable objects, like 'ZODB.PersistentMapping' (see
  below for more information on 'PersistentMapping').

In this section, we'll look at each of these special rules one by
one.

The first rule says that your objects must be pickleable.  This
means that they can be serialized into a data format with the
"pickle" module.  Most Python data types (numbers, lists,
dictionaries) can be pickled.  Code objects (methods, functions,
classes) and file objects (files, sockets) *cannot* be pickled.
Instances can be persistent objects if:

- They subclass 'Persistence.Persistent'

- All of their attributes are pickleable
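
As a rough way to check the first rule, you can try pickling a value
yourself.  The sketch below uses only the standard "pickle" module.
The helper name 'is_pickleable' is our own invention for illustration
and is not part of ZODB::

  import pickle
  import threading

  def is_pickleable(obj):
      """Return True if pickle can serialize obj, False otherwise."""
      try:
          pickle.dumps(obj)
      except (pickle.PicklingError, TypeError, AttributeError):
          return False
      return True

  # Plain data structures are fine.
  plain_data = {'employees': ['Bob', 'Mary', 'Jo'], 'count': 3}

  # A lock is tied to the running process, and a lambda has no
  # importable name for pickle to record, so neither can be pickled.
  lock = threading.Lock()
  anonymous = lambda: None

Here 'is_pickleable(plain_data)' is true, while the lock and the
lambda are both rejected.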

The second rule is that none of your object's attributes can begin
with '_p_'.  For example, '_p_b_and_j' would be an illegal object
attribute.  This is because the persistence machinery reserves all of
these names for its own purposes.

The third rule is that all object attributes that begin with '_v_'
are "volatile" and are not saved to the database.  This means that as
long as the persistent object is in Zope's memory cache, volatile
attributes can be used.  When the object is deactivated (removed from
memory), volatile attributes are thrown away.

Volatile attributes are useful for data that is good to cache for a
while but can be thrown away and easily recreated.  File connections,
cached calculations, and rendered templates are all good candidates
for volatile attributes.  You must exercise care when using volatile
attributes.  Since you have little control over when your objects are
moved in and out of memory, you never know when your volatile
attributes may disappear.
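
Because a volatile attribute can vanish at any time, code that reads
one should always be prepared to rebuild it.  The following sketch
shows that defensive pattern.  The 'Report' class and its '_v_total'
attribute are hypothetical, and the class is a plain Python class so
that the sketch runs without ZODB; in real code it would subclass
'Persistent'::

  class Report(object):
      """Plain-Python stand-in for a persistent class (illustration only)."""

      def __init__(self, values):
          self.values = tuple(values)

      def total(self):
          # Never assume the cached value is still there.
          cached = getattr(self, '_v_total', None)
          if cached is None:
              cached = sum(self.values)
              self._v_total = cached    # cache it for later calls
          return cached

Deleting '_v_total' by hand imitates what happens when the object is
deactivated; the next call to 'total()' simply recomputes the value.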

The fourth rule is that you must signal changes to mutable types.
This is because a persistent object can't detect when one of its
mutable attributes changes, and therefore doesn't know whether or not
it needs to be saved.

For example, say you had a list of names as an attribute of your
object called 'departments' that you changed in a method called
'addDepartment'::

  class DepartmentManager(Persistent):

      def __init__(self):
          self.departments = []

      def addDepartment(self, department):
          self.departments.append(department)

When you call the 'addDepartment' method you change a mutable type,
'departments', but your persistent object will not save that change.

There are two solutions to this problem.  First, you can assign a
special flag, '_p_changed'::

  def addDepartment(self, department):
      self.departments.append(department)
      self._p_changed = 1

Remember, '_p_' attributes do something special to the persistence
machinery and are reserved names. Assigning 1 to '_p_changed' tells
the persistence machinery that you changed the object, and that it
should be saved.

Another technique is to use the mutable attribute as though it were
immutable. In other words, after you make changes to a mutable
object, reassign it::

  def addDepartment(self, department):
      departments = self.departments
      departments.append(department)
      self.departments = departments

Here, the 'self.departments' attribute was re-assigned at the end of
the function to the "working copy" object 'departments'.  This
technique is cleaner because it doesn't have any explicit
'_p_changed' settings in it.  However, it relies on the assignment
implicitly triggering the persistence machinery.  If that is not
obvious to everyone who will read the code, use the explicit syntax
instead.
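
The difference between the two cases comes down to whether an
attribute assignment happens on the persistent object itself.  The toy
model below makes that visible.  'ChangeTracker' and its 'changed'
flag are hypothetical stand-ins written for this illustration; this is
not how ZODB is implemented, only a sketch of the idea that assignment
can be noticed while in-place mutation cannot::

  class ChangeTracker(object):
      """Toy base class that notices attribute assignment."""

      def __init__(self):
          object.__setattr__(self, 'changed', False)

      def __setattr__(self, name, value):
          object.__setattr__(self, name, value)
          object.__setattr__(self, 'changed', True)


  class DepartmentManager(ChangeTracker):

      def __init__(self):
          ChangeTracker.__init__(self)
          self.departments = []
          # Reset the flag so that only later changes are counted.
          object.__setattr__(self, 'changed', False)

      def add_in_place(self, department):
          # Mutates the list; __setattr__ is never called.
          self.departments.append(department)

      def add_by_reassignment(self, department):
          departments = self.departments
          departments.append(department)
          # The assignment calls __setattr__, so the change is noticed.
          self.departments = departments

After 'add_in_place' the 'changed' flag is still false, even though
the list has grown; after 'add_by_reassignment' it is true.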

A third option is to use persistence-aware mutable attributes such as
'PersistentMapping' and 'IOBTree'. 'PersistentMapping' is a mapping
class that notifies ZODB when you change the mapping. You can use
instances of 'PersistentMapping' in place of standard Python
dictionaries and not worry about signaling change by reassigning the
attribute or using '_p_changed'. Zope's BTree classes are also
persistence-aware mutable containers. This solution can be cleaner
than using mutable objects immutably or signaling change manually,
assuming that there is a persistence-aware class available that meets
your needs.

Transactions and Persistent Objects
===================================

When changes are saved to ZODB, they are saved in a *transaction*.
This means that either all changes are saved, or none are saved.  The
reason for this is data consistency.  Imagine the following scenario:

1. A user makes a credit card purchase at the sandwich.com website.

2. The bank debits their account.

3. An electronic payment is made to sandwich.com.

Now imagine that an error happens during the last step of this
process, sending the payment to sandwich.com.  Without transactions,
this means that the account was debited, but the payment never went
to sandwich.com!  Obviously this is a bad situation.  A better
solution is to make all changes in a transaction:

1. A user makes a credit card purchase at the sandwich.com website.

2. The transaction begins.

3. The bank debits their account.

4. An electronic payment is made to sandwich.com.

5. The transaction commits.

Now, if an error is raised anywhere between steps 2 and 5, *all*
changes made are thrown away.  If the payment fails to go to
sandwich.com, the account won't be debited, and if debiting the
account raises an error, the payment won't be made to sandwich.com.
Either way, your data is always consistent.
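
The all-or-nothing behavior can be imitated in a few lines of plain
Python.  The sketch below is only an illustration of the idea and says
nothing about how ZODB implements transactions; the function
'run_in_transaction', the step functions, and the amounts are all
hypothetical::

  import copy

  class SandwichError(Exception):
      pass

  def run_in_transaction(accounts, steps):
      """Apply every step to a copy of accounts, or apply none of them."""
      working = copy.deepcopy(accounts)
      try:
          for step in steps:
              step(working)
      except SandwichError:
          return accounts, False      # abort: the original is untouched
      return working, True            # commit: all changes together

  def debit(accounts):
      accounts['customer'] -= 5

  def pay(accounts):
      accounts['sandwich.com'] += 5

  def failing_pay(accounts):
      raise SandwichError('payment network is down')

Running the steps 'debit' and 'pay' returns the updated balances,
while running 'debit' and 'failing_pay' returns the original balances
unchanged, so the debit never becomes visible on its own.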

When using your persistent objects with Zope, Zope will automatically
*begin* a transaction when a web request is made, and *commit* the
transaction when the request is finished.  If an error occurs at any
time during that request, then the transaction is *aborted*, meaning
all the changes made are thrown away.

If you want to *intentionally* abort a transaction in the middle of a
request, then just raise an error at any time.  For example, this
snippet of Python will raise an error and cause the transaction to
abort::

  raise SandwichError('Not enough peanut butter.')

A more likely scenario is that your code will raise an exception when
a problem arises. The great thing about transactions is that you
don't have to include cleanup code to catch exceptions and undo
everything you've done up to that point. Since the transaction is
aborted, the changes made in the transaction will not be saved.

Because Zope does transaction management for you, most of the time
you do not need to explicitly begin, commit or abort your own
transactions.  For more information on doing transaction management
manually, see the links at the end of this chapter that lead to more
detailed tutorials of doing your own ZODB programming.


Subtransactions
---------------

Zope waits until the transaction is committed to save all the changes
to your objects.  Until then, the changes are held in memory.  If you
try to change more objects than fit in your computer's memory, your
computer will begin to swap and thrash, and may even run out of
memory completely.  The easiest solution to this problem is to not
change huge quantities of data in one transaction.

If you need to spread a transaction over lots of data, however, you
can use subtransactions.  Subtransactions allow you to manage Zope's
memory usage yourself, so as to avoid swapping during large
transactions.

Subtransactions allow you to make huge transactions. Rather than
being limited by available memory, you are limited by available disk
space.  Each subtransaction commit writes the current changes out to
disk and frees memory to make room for more changes.

To commit a subtransaction, you first need to get hold of a
transaction object.  Zope adds a function to your global namespace,
'get_transaction', that returns the current transaction object.  Call
'commit(1)' on that transaction::

  get_transaction().commit(1)

You must balance speed, memory, and temporary storage concerns when
deciding how frequently to commit subtransactions. The more
subtransactions, the less memory used, the slower the operation, and
the more temporary space used. Here's an example of how you might
use subtransactions in your Zope code::

  tasks_per_subtransaction = 10
  i = 0
  for task in tasks:
      process(task)
      i = i + 1
      if i % tasks_per_subtransaction == 0:
          get_transaction().commit(1)

This example shows how to commit a subtransaction at regular
intervals while processing a number of tasks.

Threads and Conflict Errors
---------------------------

Zope is a multi-threaded server.  This means that many different
clients may be executing your Python code in different threads.  For
most cases, this is not an issue and you don't need to worry about
it, but there are a few cases you should look out for.

The first case involves threads making lots of changes to objects and
writing to the database.  The way ZODB and threading work is that
each thread that uses the database gets its own *connection* to the
database.  Each connection gets its own *copy* of your object.  All
of the threads can read and change any of the objects.  ZODB keeps
all of these objects synchronized between the threads. The upshot is
that you don't have to do any locking or thread synchronization
yourself. Your code can act as though it is single threaded.

However, synchronization problems can occur when objects are changed
by two different threads at the same time.

Imagine that thread 1 gets its own copy of object A, as does thread
2.  If thread 1 changes its copy of A, then thread 2 will not see
those changes until thread 1 commits them.  In cases where lots of
objects are changing, this can cause threads 1 and 2 to try to commit
changes to object A at the same time.

When this happens, ZODB lets one transaction do the commit (it
"wins") and raises a 'ConflictError' in the other thread (which
"loses"). The loser can elect to try again, but this may raise yet
another 'ConflictError' if many threads are trying to change object
A. Zope does all of its own transaction management and will retry a
losing transaction three times before giving up and raising the
'ConflictError' all the way up to the user.
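
The retry behavior described above can be pictured as a small loop.
This is a sketch only: the 'ConflictError' class here is a local
stand-in so that the code runs without ZODB, 'run_with_retries' is a
hypothetical helper, and Zope's own retry logic is not shown::

  class ConflictError(Exception):
      """Stand-in for ZODB's ConflictError, so the sketch runs anywhere."""

  def run_with_retries(attempt, retries=3):
      """Call attempt(); on ConflictError try again, up to `retries` times."""
      for remaining in range(retries, -1, -1):
          try:
              return attempt()
          except ConflictError:
              if remaining == 0:
                  raise   # out of retries: let the error reach the caller

With the default of three retries, 'attempt' is called at most four
times: once initially and then once per retry.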


Resolving Conflicts
-------------------

If a conflict happens, you have two choices. The first choice is that
you live with the error and you try again. Statistically, conflicts
are going to happen, but only in situations where objects are
"hot-spots".  Most problems like this can be "designed away"; if you
can redesign your application so that the changes get spread around
to many different objects then you can usually get rid of the hot
spot.


Your second choice is to try and *resolve* the conflict. In many
situations, this can be done. For example, consider the following
persistent object::

  class Counter(Persistent):

      # Class attribute that serves as the default for every instance.
      count = 0

      def hit(self):
          self.count = self.count + 1

This is a simple counter.  If you hit this counter with a lot of
requests though, it will cause conflict errors as different threads
try to change the count attribute simultaneously.

But resolving the conflict between conflicting threads in this case
is easy.  Both threads want to increment the self.count attribute by
a value, so the resolution is to increment the attribute by the sum
of the two values and make both commits happy; no 'ConflictError' is
raised.


To resolve a conflict, a class should define a '_p_resolveConflict'
method. This method takes three arguments:

'oldState' -- The state of the object that the changes made by the
current transaction were based on. The method is permitted to modify
this value.

'savedState' -- The state of the object that is currently stored in
the database. This state was written after 'oldState' and reflects
changes made by a transaction that committed before the current
transaction. The method is permitted to modify this value.

'newState' -- The state after changes made by the current
transaction.  The method is *not* permitted to modify this
value.

The method should compute a new state by merging the changes
reflected in 'savedState' and 'newState', relative to 'oldState', and
return the resulting state of the object.

Here is an example of a '_p_resolveConflict' in the 'Counter' class::

  class Counter(Persistent):

      # Class attribute that serves as the default for every instance.
      count = 0

      def hit(self):
          self.count = self.count + 1

      def _p_resolveConflict(self, oldState, savedState, newState):

          # Figure out how each state is different:
          savedDiff = savedState['count'] - oldState['count']
          newDiff = newState['count'] - oldState['count']

          # Apply both sets of changes to old state:
          oldState['count'] = oldState['count'] + savedDiff + newDiff

          return oldState

In the above example, '_p_resolveConflict' resolves the difference
between the two conflicting transactions.
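
To make the arithmetic concrete, here is the same merge written as a
standalone function and applied to sample states.  The function name
'resolve_counter' and the numbers are made up for illustration, and
plain dictionaries stand in for the object states::

  def resolve_counter(oldState, savedState, newState):
      """Merge two concurrent increments of 'count'."""
      savedDiff = savedState['count'] - oldState['count']
      newDiff = newState['count'] - oldState['count']
      resolved = dict(oldState)
      resolved['count'] = oldState['count'] + savedDiff + newDiff
      return resolved

  old = {'count': 5}      # state both transactions started from
  saved = {'count': 6}    # the winning transaction recorded one hit
  new = {'count': 7}      # the current transaction recorded two hits
  resolved = resolve_counter(old, saved, new)   # {'count': 8}

The result counts all three hits: 5 + 1 + 2 = 8.  Neither
transaction's increments are lost.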

Threadsafety of Non-Persistent Objects
======================================

ZODB takes care of threadsafety for persistent objects. However, you
must handle threadsafety yourself for non-persistent objects which are
shared between threads.

Mutable Default Arguments
-------------------------

One tricky kind of non-persistent, shared object is a mutable default
argument to a function or method.  Default arguments are created once,
when the function is defined, and are then reused on every call rather
than being recreated each time.  But if these cached default
arguments are mutable, one thread may change (mutate) the object while
another thread is using it, and that can be bad.  So, code like::

        def foo(bar=[]):
            bar.append('something')


could get in trouble if two threads execute this code because lists
are mutable.  There are two solutions to this problem:

- Don't use mutable default arguments. (Good)

- If you use them, you cannot change them.  If you want to change
  them, you will need to implement your own locking. (Bad)

We recommend the first solution because mutable default arguments are
confusing and generally a bad idea in the first place.
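
A common way to follow the first recommendation is to use 'None' as
the default and create the list inside the function.  The two
functions below contrast the shared and the safe versions; the names
'foo_shared' and 'foo_safe' are ours, chosen for illustration::

  def foo_shared(bar=[]):
      # The same list object is reused by every call that omits bar.
      bar.append('something')
      return bar

  def foo_safe(bar=None):
      # A fresh list is created for every call that omits bar.
      if bar is None:
          bar = []
      bar.append('something')
      return bar

Each call to 'foo_shared()' returns a list that is one item longer
than the last, because all calls share one list.  Each call to
'foo_safe()' returns a new one-item list.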

Shared Module Data
------------------

Objects stored in modules but not in the ZODB are not persistent and
not thread-safe. In general it's not a good idea to store data (as
opposed to functions and class definitions) in modules when using
ZODB.


If you decide to use module data which can change, you'll need to
protect it with a lock to ensure that only one thread at a time can
make changes.


For example::

  from threading import Lock
  queue=[]
  l=Lock()

  def put(obj):
      l.acquire()
      try:
          queue.append(obj)
      finally:
          l.release()

  def get():
      l.acquire()
      try:
          return queue.pop()
      finally:
          l.release()
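
To see the lock in use, the sketch below runs several threads against
a queue guarded the same way.  It is self-contained, so it repeats a
small version of 'put'.  The 'with' statement used here is simply a
shorter spelling of the acquire/try/finally/release pattern, and the
'worker' function and the thread counts are arbitrary choices for
illustration::

  import threading

  queue = []
  lock = threading.Lock()

  def put(obj):
      with lock:              # acquired here, released on exit
          queue.append(obj)

  def worker(worker_id, count):
      for n in range(count):
          put((worker_id, n))

  threads = [threading.Thread(target=worker, args=(i, 100))
             for i in range(4)]
  for thread in threads:
      thread.start()
  for thread in threads:
      thread.join()

Once all four threads have finished, the queue holds every one of the
400 items that were put into it.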

Note that in most cases where you are tempted to use shared module
data, you can likely achieve the same result with a single persistent
object. For example, the above queue could be replaced with a single
instance of this class::

  class Queue(Persistent):

      def __init__(self):
          self.list=[]

      def put(self, obj):
          self.list=self.list + [obj]

      def get(self):
          obj=self.list[-1]
          self.list=self.list[0:-1]
          return obj

Notice how this class uses the mutable object 'self.list'
immutably. If this class used 'self.list.pop' and 'self.list.append',
then the persistence machinery would not notice that 'self.list' had
changed.

Shared External Resources
=========================

A final category of data for which you'll need to handle
thread-safety is external resources such as files in the filesystem,
and other processes. In practice, these concerns rarely come up.

Other ZODB Resources
====================

This chapter has only covered the most important features of ZODB
from a Zope developer's perspective. Check out some of these sources
for more in-depth information:

- Andrew Kuchling's `ZODB pages <http://www.kuchling.com/zodb/>`_
  include lots of information, including a programmer's guide and
  links to ZODB mailing lists.

- `ZODB Wiki <http://wiki.zope.org/ZODB>`_ has information about
  current ZODB projects.

- `ZODB UML
  Model <http://www.zope.org/Documentation/Developer/Models/ZODB>`_ has
  the nitty gritty details on ZODB.

- Paper `Introduction to the Zope Object Database
  <http://www.python.org/workshops/2000-01/proceedings/papers/fulton/zodb3.html>`_
  by Jim Fulton, presented at the 8th Python Conference.

Summary
=======

The ZODB is a complex and powerful system. However, using persistent
objects is almost completely painless. Seldom do you need to concern
yourself with thread safety, transactions, conflicts, memory
management, or database replication. ZODB takes care of these things
for you. By following a few simple rules you can create persistent
objects that just work.