Data Structures

Norman data structures are built on four objects: Database, Table, Field and Join. In outline, a Database is a collection of Table subclasses. Each Table subclass represents a tabular data structure in which each column is defined by a Field and each row is an instance of the subclass. A Join is similar to a Field, but behaves as a collection of related records:

from norman import Table, Field, Join

class Branch(Table):

    # Each branch knows its parent branch
    parent = Field()

    # Children are determined on the fly by searching for matching parents.
    children = Join(parent)

AutoTable is a special type of Table which creates fields dynamically. It is used in conjunction with AutoDatabase, and is particularly useful when de-serialising from a source without knowing the details of the data it contains.

Database

class norman.Database

Database instances act as containers of Table objects, which are identified by name. Database supports the following operations.

Operation     Description
db[name]      Return a Table by name.
name in db    Return True if a Table named name is in the database.
table in db   Return True if a Table object is in the database.
iter(db)      Return an iterator over Table objects in the database.

Databases are mainly provided for convenience, as a way to group related tables. Tables may belong to multiple databases, or to no database at all.

add(table)

Add a Table class to the database.

This is the same as including the database argument in the class definition. The table is returned so this can be used as a class decorator.

>>> db = Database()
>>> @db.add
... class MyTable(Table):
...     name = Field()

tablenames()

Return a list of the names of all tables managed by the database.

reset()

Delete all records from all tables in the database.

delete(record)

Delete a record from the database. This is a convenience function which calls record.__class__.delete(record), but also checks that the record actually belongs to the database. If it does not, a NormanWarning is issued, but the record is still deleted.
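The container operations listed for Database, together with add working as a class decorator, can be sketched with ordinary Python special methods. The class name MiniDatabase and its internals are hypothetical; this is a simplified model of the same protocol, using plain classes in place of Table subclasses, and not Norman's implementation.

```python
class MiniDatabase:
    """Hypothetical stand-in for norman.Database (illustration only)."""

    def __init__(self):
        self._tables = {}  # maps table name -> table class

    def add(self, table):
        # Register the class and return it unchanged, so add() works as a decorator.
        self._tables[table.__name__] = table
        return table

    def __getitem__(self, name):
        # db[name]: look a table up by name.
        return self._tables[name]

    def __contains__(self, item):
        # Support both "name in db" and "table in db".
        if isinstance(item, str):
            return item in self._tables
        return item in self._tables.values()

    def __iter__(self):
        # iter(db): iterate over the table classes themselves.
        return iter(self._tables.values())

    def tablenames(self):
        return list(self._tables)


db = MiniDatabase()


@db.add
class Person:
    pass
```

Because add returns its argument, the decorated name Person still refers to the class after registration.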

class norman.AutoDatabase

A subclass of Database which automatically creates AutoTable subclasses when a table is looked up by name. For example:

>>> adb = AutoDatabase()
>>> newtable = adb['NewTable']
>>> issubclass(newtable, AutoTable)
True

Apart from this, it behaves exactly the same as Database.
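The create-on-lookup behaviour resembles a dict subclass with a __missing__ hook. The sketch below is a hypothetical model (AutoRegistry and AutoBase are illustrative names, with AutoBase standing in for AutoTable) that builds each new class with type(); it shows the idea only and is not how Norman implements AutoDatabase.

```python
class AutoBase:
    """Hypothetical stand-in for norman.AutoTable."""


class AutoRegistry(dict):
    """Maps table names to classes, creating missing ones on first lookup."""

    def __missing__(self, name):
        # Called by dict.__getitem__ only when the key is absent.
        table = type(name, (AutoBase,), {})
        self[name] = table  # remember it so later lookups return the same class
        return table


adb = AutoRegistry()
newtable = adb['NewTable']
```

Note that a membership test such as 'Other' in adb does not trigger __missing__, so only explicit lookups create tables.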

Tables

Tables are implemented as a class, with records as instances of the class. Accordingly, there are many class-level operations which are only applicable to a Table, and others which only apply to records. The class methods shown in Table are not visible to instances.

class norman.Table(**kwargs)

Records are created by instantiating a Table subclass. Tables are defined by subclassing Table and adding fields to it. For example:

>>> class MyTable(Table):
...     field1 = Field()
...     field2 = Field()

Field names should not start with _, as these names are generally reserved for internal use. Fields and Joins may also be added to a Table after the Table is created, but cannot be shared between tables. If a Field which already belongs to a table is assigned to another table, a copy of it is created. The same cannot be done with a Join, since the behaviour of this would be unclear.

Records are created by simply instantiating the table, optionally with field values as keyword arguments:

>>> record = MyTable(field1='value', field2='other value')

Tables act as collections of records, and support the following operations:

Operation   Description
len(t)      Return the number of records in t.
iter(t)     Return an iterator over all records in t.
r in t      Return True if the record r is an instance of (i.e. contained by) table t. This should always return True unless the record has been deleted from the table, which usually means that it is a dangling reference which should be deleted.

In a boolean context, a table evaluates to True if it contains any records.

The following class attributes and methods are supported by Table objects, but not by instances.

_store

A Store instance used as a storage backend. This may be overridden when the class is created to use a custom Store object. Usually there is no need to use this.

hooks

A dict containing lists of callables to be run when an event occurs.

Two events are supported: validation on setting a field value, and deletion, identified by the keys 'validate' and 'delete' respectively. When a triggering event occurs, each hook in the list is called in order, with the affected record as its single argument, until an exception occurs. If the exception is an AssertionError it is converted to a ValueError. If no exception occurs, the event is considered to have passed; otherwise it fails and the record rolls back to its previous state.

These hooks are called before Table.validate and Table.validate_delete, and behave in the same way. They may be set at any time, but do not affect records already created until the record is next validated.
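The hook semantics described above (call each hook in order with the record, stop at the first exception, turn AssertionError into ValueError, roll back on failure) can be modelled in a few lines. HookedRecord and positive_age are hypothetical names, only the 'validate' event is modelled, and this is a sketch rather than Norman's code.

```python
class HookedRecord:
    """Hypothetical model of per-table 'validate' hooks (not Norman's code)."""

    hooks = {'validate': [], 'delete': []}

    def __init__(self, **values):
        self.values = dict(values)

    def set(self, name, value):
        previous = dict(self.values)  # snapshot for rollback
        self.values[name] = value
        try:
            for hook in self.hooks['validate']:
                hook(self)  # each hook receives the affected record
        except Exception as exc:
            self.values = previous  # roll back to the state before the change
            if isinstance(exc, AssertionError):
                raise ValueError(str(exc)) from exc
            raise


def positive_age(record):
    # A hook signals invalid data simply by failing an assertion.
    assert record.values.get('age', 0) >= 0, 'age must be non-negative'


HookedRecord.hooks['validate'].append(positive_age)
```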

delete([records=None])

Delete all instances in records. If records is omitted, all records in the table are deleted.

fields()

Return an iterator over the field names in the table.

class norman.AutoTable(**kwargs)

This is a special type of Table which automatically creates a new field whenever a value is assigned to an attribute which does not yet exist, provided the attribute name does not start with '_'. AutoTable should be subclassed in exactly the same way as Table. Attempting to instantiate AutoTable directly raises a TypeError.

>>> class MyTable(AutoTable): pass
>>> record = MyTable(a=1)
>>> record.a
1
>>> isinstance(MyTable.a, Field)
True
>>> record.b = 2
>>> isinstance(MyTable.b, Field)
True

However:

>>> record._c = 3
>>> MyTable._c
Traceback (most recent call last):
    ...
AttributeError: '_c'

As with other Table classes, it is also possible to manually add fields or joins:

>>> MyTable.d = Field()
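The auto-field behaviour can be approximated by overriding __setattr__, as in the sketch below. SimpleField, AutoRecord and MyRecord are hypothetical names, with SimpleField standing in for Field; this illustrates the mechanism and is not AutoTable's actual implementation.

```python
class SimpleField:
    """Hypothetical marker object standing in for norman.Field."""


class AutoRecord:
    """Sketch of AutoTable-style behaviour (illustration only)."""

    def __init__(self, **values):
        # Mirror the rule that the base class itself must be subclassed.
        if type(self) is AutoRecord:
            raise TypeError('AutoRecord must be subclassed')
        for name, value in values.items():
            setattr(self, name, value)

    def __setattr__(self, name, value):
        cls = type(self)
        # Public names without a class-level definition get a new "field".
        if not name.startswith('_') and not hasattr(cls, name):
            setattr(cls, name, SimpleField())
        object.__setattr__(self, name, value)


class MyRecord(AutoRecord):
    pass
```

The value itself is stored on the instance, which shadows the non-descriptor SimpleField on the class, so record.a still returns the assigned value.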

Records

Table instances, or records, are created by specifying field values as keyword arguments. Missing fields will use the default value (see Field). In addition to the defined fields, records have the following properties and methods.

Table._uid

This contains an id which is unique in the session.

Its primary use is as an identity key during serialisation. Valid values are any integer except 0, or a valid UUID. The default value is calculated using uuid.uuid4 when it is first accessed. The value need not be unique outside the session unless the serialiser requires it.
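A lazily generated identifier of this kind can be sketched with a caching property. LazyUid is a hypothetical class, and the setter's check simply mirrors the rule stated above (non-zero integer or UUID); Norman's own handling of _uid may differ.

```python
import uuid


class LazyUid:
    """Hypothetical record-like class with a lazily generated _uid."""

    def __init__(self):
        self._uid_value = None  # nothing is generated until first access

    @property
    def _uid(self):
        if self._uid_value is None:
            self._uid_value = uuid.uuid4()  # generated once, then reused
        return self._uid_value

    @_uid.setter
    def _uid(self, value):
        # Accept any non-zero integer or a UUID, as described above.
        if (isinstance(value, int) and value != 0) or isinstance(value, uuid.UUID):
            self._uid_value = value
        else:
            raise ValueError('_uid must be a non-zero integer or a UUID')
```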

Table.validate()

Raise an exception if the record contains invalid data.

This is usually re-implemented in subclasses, and checks that all data in the record is valid. If not, an exception should be raised. Internal validation (e.g. uniqueness checks) occurs before this method is called, and a failure will result in a ValidationError being raised. For convenience, any AssertionError raised here is considered to indicate invalid data, and is re-raised as a ValidationError. This allows all validation errors (both from this method and from internal checks) to be caught in a single except clause.

Values may also be changed in this method. The default implementation does nothing.

Table.validate_delete()

Raise an exception if the record cannot be deleted.

This is called just before a record is deleted and is usually re-implemented to check for other referring instances. This method can also be used to propagate deletions, and can safely modify this or other tables.

Exceptions are handled in the same way as for validate.

Notes on Validation and Deletion

Data is validated whenever a record is added or removed, and there is the opportunity to influence this process through validation hooks. When a new record is created, three sets of validation criteria must pass for the record to actually be created. The first step is to run the validators specified in Field.validators. These can change or verify the value in each field independently of context. The second check is applied whenever there are unique fields, and confirms that the combination of values in the unique fields is actually unique. The final stage is to run the validation hooks in Table.hooks, followed by Table.validate. These act on the entire record, and may be used to make changes across multiple fields. If an exception is raised at any stage, the record is not created.

The following example illustrates how the validation occurs. When a new record is created, the value is first converted to a string by the field validator, then checked for uniqueness, and finally the validate method creates the extra parts value.

>>> class TextTable(Table):
...     'A Table of text values.'
...
...     # A text value stored in the table
...     value = Field(unique=True, validators=[str])
...     # A pre-populated, calculated value.
...     parts = Field()
...
...     def validate(self):
...         self.parts = self.value.split()
...
>>> r = TextTable(value='a string')
>>> r.value
'a string'
>>> r.parts
['a', 'string']
>>> r = TextTable(value=3)
>>> r.value
'3'
>>> r = TextTable(value='3')
Traceback (most recent call last):
    ...
norman._except.ValidationError: Not unique: TextTable(parts=['3'], value='3')
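The three validation stages can be modelled outside Norman with plain dictionaries, which makes the ordering explicit. create_record and split_parts are hypothetical helpers, and ValueError stands in for ValidationError (which inherits from ValueError); the sketch reproduces the TextTable example above rather than Norman's internals.

```python
def create_record(existing, validators, unique, validate, **values):
    """Hypothetical three-stage validation model (not Norman's code).

    existing:   list of dicts already in the "table"
    validators: maps field name -> list of single-argument callables
    unique:     names of fields that must be unique together
    validate:   record-level callable that may modify the dict in place
    """
    record = dict(values)
    # Stage 1: per-field validators, applied in order.
    for name, funcs in validators.items():
        for func in funcs:
            record[name] = func(record[name])
    # Stage 2: the tuple of unique field values must not already exist.
    key = tuple(record[name] for name in unique)
    if unique and any(tuple(r[name] for name in unique) == key for r in existing):
        raise ValueError('Not unique: %r' % (record,))
    # Stage 3: record-level validation across multiple fields.
    validate(record)
    existing.append(record)  # only stored if every stage passed
    return record


def split_parts(record):
    record['parts'] = record['value'].split()


table = []
rules = {'value': [str]}
create_record(table, rules, ['value'], split_parts, value='a string')
create_record(table, rules, ['value'], split_parts, value=3)
```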

When deleting a record, Table.validate_delete is first called. This should be used to ensure that any dependent records are dealt with. For example, the following code ensures that all children are deleted when a parent is deleted.

>>> class Child(Table):
...     parent = Field()
...
>>> class Parent(Table):
...     children = Join(Child.parent)
...
...     def validate_delete(self):
...         for child in self.children:
...             Child.delete(child)
...
>>> parent = Parent()
>>> child = Child(parent=parent)
>>> Parent.delete(parent)
>>> len(Child)
0

Fields

Fields are defined inside a Table definition as class attributes, and are used as record properties for instances of a Table. If the value of a field has not been set, then the special object NotSet is used to indicate this.

norman.NotSet

A sentinel object indicating that the field value has not yet been set. This evaluates to False in conditional statements.

class norman.Field(unique=False, default=NotSet, readonly=False, validators=None, key=None)

A Field is used in tables to define attributes.

>>> class MyTable(Table):
...     name = Field()

Fields may be created with a combination of properties as keyword arguments, including default, key, readonly, unique and validators.

Fields can be used with comparison operators to return a Query object containing matching records. For example:

>>> class MyTable(Table):
...     oid = Field(unique=True)
...     value = Field()
>>> t0 = MyTable(oid=0, value=1)
>>> t1 = MyTable(oid=1, value=2)
>>> t2 = MyTable(oid=2, value=1)
>>> MyTable.value == 1
Query(MyTable(oid=0, value=1), MyTable(oid=2, value=1))

The following comparisons are supported for a Field object, provided the data stored supports them: ==, <, >, <=, >=, !=. The & operator is used to test for containment, e.g. Table.field & mylist returns all records where the value of field is in mylist.
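The pattern of comparison operators returning matching records, rather than booleans, relies on Python's rich comparison methods. The sketch below is a hypothetical Column class that returns plain lists of dict rows instead of Query objects and scans every row instead of using an index; it illustrates the operator overloading only.

```python
import operator


class Column:
    """Hypothetical model of query-producing comparisons on a Field."""

    def __init__(self, name, records):
        self.name = name
        self.records = records  # list of dicts acting as table rows

    def _select(self, op, other):
        # Return rows whose value in this column satisfies op(value, other).
        return [r for r in self.records if op(r[self.name], other)]

    def __eq__(self, other):
        return self._select(operator.eq, other)

    def __ne__(self, other):
        return self._select(operator.ne, other)

    def __lt__(self, other):
        return self._select(operator.lt, other)

    def __le__(self, other):
        return self._select(operator.le, other)

    def __gt__(self, other):
        return self._select(operator.gt, other)

    def __ge__(self, other):
        return self._select(operator.ge, other)

    def __and__(self, collection):
        # "field & mylist": rows whose value is contained in the collection.
        return [r for r in self.records if r[self.name] in collection]


rows = [{'oid': 0, 'value': 1}, {'oid': 1, 'value': 2}, {'oid': 2, 'value': 1}]
value = Column('value', rows)
```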

See also

validate
    For some pre-built validators.
Queries
    For more information on queries in Norman.

default

The value to use when nothing has been set (default: NotSet).

key

A key function used for indexing, similar to that used by sorted. All values returned by this function should be sortable in the same list. For example, if the field is known to contain a mixture of strings and integers, str would be a valid function, but lambda x: x would not, since a list of strings and integers cannot be sorted. key should raise TypeError for any value it cannot handle. Such values are indexed separately, so that equality lookups are still optimised, but comparisons are not supported for them. As an illustrative example, consider the following case, which orders values by length:

>>> class T(Table):
...     value = Field(key=len)
...
>>> t1 = T(value='abc')
>>> t2 = T(value='defg')
>>> t3 = T(value=42)
>>> (T.value > 'xxx').one()  # Find values longer than 3 characters
T(value='defg')
>>> (T.value == 42).one()  # Find the numerical value 42
T(value=42)
>>> (T.value > 42).one()  # len(42) raises TypeError
Traceback (most recent call last):
    ...
TypeError

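The default implementation orders data by type first, then by value, for the following types: numbers.Real, str and bytes. This might lead to unexpected results: because all numbers sort before all strings, 42 is treated as less than 'text', so a query such as MyTable.value < 'text' will also match a record whose value is 42.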
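To see the type-first ordering concretely, the sketch below copies the default key logic given in the Index section (renamed default_key here) and sorts a mixed list with it. The sample values are arbitrary.

```python
import numbers


def default_key(value):
    # Same logic as the default Field.key shown in the Index section.
    if isinstance(value, numbers.Real):
        return '0Real', value
    elif isinstance(value, str):
        return '1str', value
    elif isinstance(value, bytes):
        return '2bytes', value
    else:
        raise TypeError


values = [b'raw', 'text', 3.5, 42, 'abc']
ordered = sorted(values, key=default_key)  # numbers, then str, then bytes
```

Every number therefore sorts before every string, which is why 42 compares as less than 'text' under this key, while values of other types (such as None) are rejected with TypeError.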

NotSet values are handled slightly differently, and are never passed through this function. Comparison queries on NotSet will always fail.

name

This is the assigned name of the field and is set when it is added to the Table. This attribute is read-only.

owner

This is the owning Table of the field and is set when it is added to the Table. This attribute is read-only.

readonly

If True, prohibits setting the field value, unless its current value is NotSet (default: False). This can be used with default to simulate a constant. It can be toggled to effectively lock and unlock the field.
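The readonly rule (a value may only be written while the field is still unset, and the flag can be toggled) maps naturally onto a Python data descriptor. ReadonlyField, Config and NOT_SET are hypothetical names, with NOT_SET standing in for NotSet; this is a sketch of the behaviour described, not Field's real code.

```python
NOT_SET = object()  # hypothetical stand-in for norman.NotSet


class ReadonlyField:
    """Hypothetical data descriptor modelling Field.readonly (illustration only)."""

    def __init__(self, default=NOT_SET, readonly=False):
        self.default = default
        self.readonly = readonly

    def __set_name__(self, owner, name):
        self.name = name

    def __get__(self, instance, owner):
        if instance is None:
            return self  # class access returns the field itself
        return instance.__dict__.get(self.name, self.default)

    def __set__(self, instance, value):
        # A read-only field may still receive its first value.
        if self.readonly and self.__get__(instance, type(instance)) is not NOT_SET:
            raise AttributeError('%s is read-only' % self.name)
        instance.__dict__[self.name] = value


class Config:
    version = ReadonlyField(default=1, readonly=True)  # behaves like a constant
    owner = ReadonlyField(readonly=True)               # can be set once
```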

unique

True if records should be unique on this field (default: False). If more than one field in the table has this set, they are evaluated together as a tuple. If this is set after the field is created, all existing records in the table are evaluated and a ValidationError is raised if there are duplicates.

validators

A list of functions which are used as validators for the field. Each function should accept and return a single value (i.e. the value to be set), and should raise an exception if the value is invalid. The validators are called sequentially in the order specified, i.e. newvalue = validator3(validator2(validator1(oldvalue))).
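The chaining rule newvalue = validator3(validator2(validator1(oldvalue))) is a left fold over the validator list. The sketch below expresses it with functools.reduce; apply_validators and not_empty are hypothetical helpers, not part of Norman.

```python
from functools import reduce


def apply_validators(validators, value):
    """Apply each validator to the output of the previous one, in order."""
    return reduce(lambda current, validator: validator(current), validators, value)


def not_empty(text):
    # Raise on invalid input; otherwise return the value unchanged.
    if not text:
        raise ValueError('empty value')
    return text


validators = [str, str.strip, not_empty]
```

Order matters: str runs first so that str.strip always receives a string.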

Joins

A Join dynamically creates Queries for a specific record. This is best explained through an example:

>>> class Child(Table):
...     parent = Field()
...
>>> class Parent(Table):
...     children = Join(Child.parent)
...
>>> p = Parent()
>>> c1 = Child(parent=p)
>>> c2 = Child(parent=p)
>>> set(p.children) == {c1, c2}
True

In this example, Parent.children returns a Query for all Child records where child.parent == parent_instance for a specific parent_instance. Joins have a query attribute which is a Query factory function, returning a Query for a given instance of the owning table.
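The per-record query factory idea can be modelled with a descriptor whose __get__ runs the factory for the instance it is accessed through. SimpleJoin, child_table, Child and Parent below are hypothetical; the sketch returns lists instead of Query objects and scans every target record, so it illustrates only the one-to-many case.

```python
class SimpleJoin:
    """Hypothetical descriptor modelling a one-to-many Join (illustration only)."""

    def __init__(self, target_records, field_name):
        self.target_records = target_records  # stands in for the target table
        self.field_name = field_name          # attribute that points at the owner

    def query(self, owner_record):
        # Query factory: target records whose field equals the owning record.
        return [r for r in self.target_records
                if getattr(r, self.field_name) == owner_record]

    def __get__(self, instance, owner):
        if instance is None:
            return self  # accessed on the class: return the join itself
        return self.query(instance)


child_table = []  # hypothetical list standing in for the Child table


class Child:
    def __init__(self, parent):
        self.parent = parent
        child_table.append(self)


class Parent:
    children = SimpleJoin(child_table, 'parent')


p = Parent()
c1 = Child(p)
c2 = Child(p)
stranger = Child(Parent())
```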

class norman.Join(*args, **kwargs)

Joins can be created in several ways:

Join(query=queryfactory)
    Explicitly set the query factory. queryfactory is a callable which accepts a single argument (i.e. the owning record) and returns a Query.
Join(table.field)
    This is the most common form, since most joins simply involve looking up a field value in another table. This is equivalent to specifying the following query factory:

    def queryfactory(record):
        return table.field == record
Join(db, 'table.field')
    This has the same effect as the previous example, but is used when the foreign field has not yet been created. In this case, the query factory first locates 'table.field' in the Database db.
Join(other.join[, jointable])
    It is possible to set the target of a join to another join, creating a many-to-many relationship. When used in this way, a join table is automatically created, and can be accessed from Join.jointable. If the optional keyword parameter jointable is used, it is the name of the new join table.

Joins have the following attributes, all of which are read-only.

jointable

The join table in a many-to-many join.

This is None if the join is not a many-to-many join, and is read-only. If a join table does not yet exist then it is created, but not added to any database. If the two joins which define it have conflicting information, a ConsistencyError is raised.

name

This is the assigned name of the join and is set when it is added to the Table.

owner

This is the owning Table of the join and is set when it is added to the Table.

query

A function which accepts an instance of owner and returns a Query.

target

The target of the join, or None if the target cannot be found. This attribute is read only.

Exceptions and Warnings

Exceptions

class norman.NormanError

Base class for all Norman exceptions.

class norman.ConsistencyError

Raised on a fatal inconsistency in the data structure.

class norman.ValidationError

Raised when an operation results in table validation failing.

For now this inherits from NormanError, ValueError and TypeError to keep it backward compatible. This will change in version 0.7.0.

Warnings

class norman.NormanWarning

Base class for all Norman warnings.

Currently all warnings use this class. In the future, this behaviour will change, and subclasses will be used.

Advanced API

Two structures, Store and Index, manage the data internally. These are documented for completeness, but should seldom need to be used directly.

class norman.Store

Stores are designed to hide the implementation details and expose a consistent API, so that they can be switched out without any other changes to the table.

Tables are exposed as an array of cells, where each cell is identified by Table and Field instances. Cells are unordered, although implementations may order them internally.

The Store is tolerant of missing values. get will return defaults if the record requested does not exist. set will add a new record if the record does not exist.

add_field(field)

Called whenever a new field is added to the table.

add_record(record)

Called whenever a new record is created.

clear()

Delete all records in the store.

get(record, field)

Return the value in a cell specified by record and field. This should respect any field defaults. If this is called with a record that has not been added, it will be added.

has_record(record)

Return True if the record has an entry in the data store.

iter_field(field)

Iterate over pairs of (record, value) for the specified field. This should respect any field defaults. If this is called with a field that has not been added, the behaviour is unspecified.

iter_records()

Return an iterator over all records in the data store.

iter_unset(field)

Iterate over records which do not have a value set on field, that is, those for which store.get(record, field) will return field.default. This is used for managing indexes.

record_count()

Return the number of records in the table.

remove_field(field)

Remove a field.

remove_record(record)

Remove a record.

set(record, field, value)

Set the data in a record.

setdefault(field, value)

Called when the default value of a field is changed.
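To make the Store contract concrete, here is a minimal dictionary-backed sketch of the methods listed above. DictStore is a hypothetical class, not Norman's built-in store; it assumes records are hashable and that each field is an object with a default attribute, and it demonstrates the tolerance for missing records described earlier.

```python
class DictStore:
    """Hypothetical dict-based store following the Store method list above.

    Fields are any objects with a 'default' attribute; records are any
    hashable objects. This is an illustration, not Norman's own Store.
    """

    def __init__(self):
        self.data = {}  # record -> {field: value}

    def add_record(self, record):
        self.data.setdefault(record, {})

    def has_record(self, record):
        return record in self.data

    def remove_record(self, record):
        del self.data[record]

    def record_count(self):
        return len(self.data)

    def iter_records(self):
        return iter(self.data)

    def clear(self):
        self.data.clear()

    def add_field(self, field):
        pass  # values are stored per record, so nothing needs allocating

    def remove_field(self, field):
        for cells in self.data.values():
            cells.pop(field, None)

    def setdefault(self, field, value):
        # Unset cells are resolved through field.default at read time, so
        # this sketch assumes the field has already stored the new default.
        pass

    def get(self, record, field):
        # Tolerant of missing records: unknown records are added on demand.
        return self.data.setdefault(record, {}).get(field, field.default)

    def set(self, record, field, value):
        self.data.setdefault(record, {})[field] = value

    def iter_field(self, field):
        for record, cells in self.data.items():
            yield record, cells.get(field, field.default)

    def iter_unset(self, field):
        for record, cells in self.data.items():
            if field not in cells:
                yield record
```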

class norman.Index(field)

An index stores records as sorted lists of (keyvalue, record) pairs, where keyvalue is a key based on the data cell value, determined by the return value of Field.key, which should always return the same, sortable type. If a return value cannot be sorted, then it is stored separately by its hash, and comparisons (except for equality checks) cannot be used with it. If it is not hashable, then it is stored by id, so equality checks will actually return identity matches. Note that NotSet is handled separately, and is never evaluated with Field.key. The default Field.key returns a tuple of (type tag, value) for recognised types, and raises TypeError otherwise. The implementation is:

import numbers

def key(value):
    if isinstance(value, numbers.Real):
        return '0Real', value
    elif isinstance(value, str):
        return '1str', value
    elif isinstance(value, bytes):
        return '2bytes', value
    else:
        raise TypeError

The following example shows how a custom key can be used:

>>> import re
>>> from norman import Table, Field
>>> class MyTable(Table):
...    numbers = Field(key=lambda x: re.findall(r'\d+', x))
...
>>> r1 = MyTable(numbers='number 1, numbers 2 and 3')
>>> r2 = MyTable(numbers='45 and 46')
>>> r3 = MyTable(numbers='a, b, c = 5, 6, 7')
>>> r4 = MyTable(numbers='no numbers here')
>>> set(MyTable.numbers > 'number 3') == set((r2, r3))
True
>>> set(MyTable.numbers < '1 or 2') == set((r4,))
True

clear()

Delete all items from the index.

insert(value, record)

Insert a new item. If equal keys are found, add to the right.

remove(value, record)

Remove first occurrence of (value, record).
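The Index behaviour described above (sorted key/record storage, insertion to the right of equal keys, removal of the first matching occurrence, and separate storage for values the key function rejects) can be modelled with the standard bisect module. SimpleIndex is hypothetical: it keeps keys and records in two parallel lists so that records never need to be compared with each other, assumes rejected values are hashable (the store-by-id fallback is omitted), and is not Norman's implementation.

```python
from bisect import bisect_left, bisect_right


class SimpleIndex:
    """Hypothetical sorted index keyed by a key function (illustration only)."""

    def __init__(self, key):
        self.key = key
        self.keys = []     # sorted key values
        self.records = []  # records aligned position-by-position with self.keys
        self.other = {}    # value -> records, for values the key rejects

    def insert(self, value, record):
        try:
            k = self.key(value)
        except TypeError:
            self.other.setdefault(value, []).append(record)
            return
        i = bisect_right(self.keys, k)  # equal keys: insert to the right
        self.keys.insert(i, k)
        self.records.insert(i, record)

    def remove(self, value, record):
        try:
            k = self.key(value)
        except TypeError:
            self.other[value].remove(record)
            return
        lo, hi = bisect_left(self.keys, k), bisect_right(self.keys, k)
        for i in range(lo, hi):
            if self.records[i] == record:  # first matching occurrence only
                del self.keys[i]
                del self.records[i]
                return
        raise ValueError('record not in index')

    def equal(self, value):
        try:
            k = self.key(value)
        except TypeError:
            return list(self.other.get(value, []))
        return self.records[bisect_left(self.keys, k):bisect_right(self.keys, k)]

    def greater(self, value):
        # Raises TypeError for rejected values, since comparisons are unsupported.
        return self.records[bisect_right(self.keys, self.key(value)):]

    def clear(self):
        self.keys.clear()
        self.records.clear()
        self.other.clear()
```

Used with key=len, this reproduces the earlier Field.key example: values of length 3 and 4 are sortable, while 42 only supports equality lookups.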