Data Structures
Norman data structures are built on four objects: Database, Table, Field and Join. In overview, a Database is a collection of Table subclasses. Table subclasses represent a tabular data structure where each column is defined by a Field and each row is an instance of the subclass. A Join is similar to a Field, but behaves as a collection of related records:
class Branch(Table):
    # Each branch knows its parent branch
    parent = Field(index=True)
    # Children are determined on the fly by searching for matching parents.
    children = Join(parent)
AutoTable is a special type of Table which creates fields dynamically. It is used in conjunction with AutoDatabase, and is particularly useful when de-serialising from a source without knowing details of the data in the source.
Database
class norman.Database
Database instances act as containers of Table objects, which are identified by name. Database supports the following operations.

Operation      Description
db[name]       Return a Table by name.
name in db     Return True if a Table named name is in the database.
table in db    Return True if a Table object is in the database.
iter(db)       Return an iterator over Table objects in the database.

Databases are mainly provided for convenience, as a way to group related tables. Tables may belong to multiple databases, or no database at all.
add(table)
Add a Table class to the database.

This is the same as including the database argument in the class definition. The table is returned so this can be used as a class decorator.

>>> db = Database()
>>> @db.add
... class MyTable(Table):
...     name = Field()
tablenames()
Return a list of the names of all tables managed by the database.
reset()
Delete all records from all tables in the database.
delete(record)
Delete a record from the database. This is a convenience function which simply calls record.__class__.delete(record), but also checks that the record actually belongs to the database. If not, a NormanWarning is raised, and the record is still deleted.
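The container behaviour described for Database can be pictured with a small standalone sketch. This is not Norman's implementation: the class name TableRegistry and its internals are assumptions made for illustration, and plain classes stand in for Table subclasses.

```python
class TableRegistry:
    """Hypothetical stand-in for a Database-like container (not Norman's code)."""

    def __init__(self):
        self._tables = {}  # table name -> table class

    def add(self, table):
        # Register the class under its name and return it unchanged,
        # so that ``add`` can be used as a class decorator.
        self._tables[table.__name__] = table
        return table

    def __getitem__(self, name):
        return self._tables[name]

    def __contains__(self, item):
        # Accept either a table name or a table class.
        if isinstance(item, str):
            return item in self._tables
        return item in self._tables.values()

    def __iter__(self):
        return iter(self._tables.values())

    def tablenames(self):
        return list(self._tables)


db = TableRegistry()


@db.add
class MyTable:
    pass
```

Because add returns the class it was given, the decorated name MyTable still refers to the original class after registration.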
class norman.AutoDatabase
A subclass of Database which automatically creates AutoTable subclasses when a table is looked up by name. For example:

>>> adb = AutoDatabase()
>>> newtable = adb['NewTable']
>>> issubclass(newtable, AutoTable)
True

Apart from this, it behaves exactly the same as Database.
Tables
Tables are implemented as a class, with records as instances of the class. Accordingly, there are many class-level operations which are only applicable to a Table, and others which only apply to records. The class methods shown in Table are not visible to instances.
class norman.Table(**kwargs)
Records are created by instantiating a Table subclass. Tables are defined by subclassing Table and adding fields to it. For example:

>>> class MyTable(Table):
...     field1 = Field()
...     field2 = Field()

Field names should not start with _, as these names are generally reserved for internal use. Fields and Joins may also be added to a Table after the Table is created, but cannot be shared between tables. If a Field which already belongs to a table is assigned to another table, a copy of it is created. The same cannot be done with a Join, since the behaviour of this would be unclear.

Records are created by simply instantiating the table, optionally with field values as keyword arguments:

>>> record = MyTable(field1='value', field2='other value')
The following class methods are supported by Table objects, but not by instances. Tables also act as a collection of records, and support the following sequence operations:

Operation    Description
len(t)       Return the number of records in t.
iter(t)      Return an iterator over all records in t.
r in t       Return True if the record r is an instance of (i.e. contained by) table t. This should always return True unless the record has been deleted from the table, which usually means that it is a dangling reference which should be deleted.

Boolean operations on tables evaluate to True if the table contains any records.
_store
A Store instance used as a storage backend. This may be overridden when the class is created to use a custom Store object. Usually there is no need to use this.
hooks
A dict containing lists of callables to be run when an event occurs.

Two events are supported: validation on setting a field value, and deletion, identified by the keys 'validate' and 'delete' respectively. When a triggering event occurs, each hook in the list is called in order, with the affected table instance as a single argument, until an exception occurs. If the exception is an AssertionError it is converted to a ValueError. If no exception occurs, the event is considered to have passed; otherwise it fails and the table record rolls back to its previous state.

These hooks are called before Table.validate and Table.validate_delete, and behave in the same way. They may be set at any time, but do not affect records already created until the record is next validated.
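The hook-calling behaviour described above can be sketched without Norman. The function run_hooks, the hook require_name and the use of plain dicts as records are all assumptions for illustration; the rollback step is not modelled.

```python
def run_hooks(hooks, record):
    """Call each hook with the record, stopping at the first exception."""
    for hook in hooks:
        try:
            hook(record)
        except AssertionError as err:
            # Assumption: mirror the documented AssertionError -> ValueError
            # conversion; any other exception propagates unchanged.
            raise ValueError(*err.args) from err


def require_name(record):
    # A hypothetical validation hook written with a plain assert.
    assert record.get('name'), 'name is required'


# Same shape as the documented dict: one list of callables per event.
hooks = {'validate': [require_name], 'delete': []}
```

Calling run_hooks(hooks['validate'], {'name': 'x'}) returns quietly, while an empty dict causes a ValueError carrying the assertion message.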
delete([records=None])
Delete all instances in records. If records is omitted then all records in the table are deleted.
fields()
Return an iterator over field names in the table.
class norman.AutoTable(**kwargs)
This is a special type of Table which automatically creates a new field whenever a value is assigned to an attribute which does not yet exist. This only occurs for attributes which do not start with '_'. This should be subclassed in exactly the same way as Table. Attempting to instantiate AutoTable directly will result in a TypeError being raised.

>>> class MyTable(AutoTable):
...     pass
...
>>> record = MyTable(a=1)
>>> record.a
1
>>> isinstance(MyTable.a, Field)
True
>>> record.b = 2
>>> isinstance(MyTable.b, Field)
True

However:

>>> record._c = 3
>>> MyTable._c
Traceback (most recent call last):
...
AttributeError: '_c'

As with other Table classes, it is also possible to manually add fields or joins:

>>> MyTable.d = Field()
Records
Table instances, or records, are created by specifying field values as keyword arguments. Missing fields will use the default value (see Field). In addition to the defined fields, records have the following properties and methods.
Table._uid
This contains an id which is unique in the session.

Its primary use is as an identity key during serialisation. Valid values are any integer except 0, or a valid uuid. The default value is calculated using uuid.uuid4 upon its first call. It is not necessary that the value be unique outside the session, unless required by the serialiser.
Table.validate()
Raise an exception if the record contains invalid data.

This is usually re-implemented in subclasses, and checks that all data in the record is valid. If not, an exception should be raised. Internal validation (e.g. uniqueness checks) occurs before this method is called, and a failure will result in a ValidationError being raised. For convenience, any AssertionError which is raised here is considered to indicate invalid data, and is re-raised as a ValidationError. This allows all validation errors (both from this function and from internal checks) to be captured in a single except statement.

Values may also be changed in the method. The default implementation does nothing.
Table.validate_delete()
Raise an exception if the record cannot be deleted.

This is called just before a record is deleted and is usually re-implemented to check for other referring instances. This method can also be used to propagate deletions and can safely modify this or other tables.

Exceptions are handled in the same way as for validate.
Notes on Validation and Deletion
Data is validated whenever a record is added or removed, and there is the opportunity to influence this process through validation hooks. When a new record is created, there are three sets of validation criteria which must pass in order for the record to actually be created. The first step is to run the validators specified in Field.validators. These can change or verify the value in each field independently of context. The second validation check is applied whenever there are unique fields, and confirms that the combination of values in unique fields is actually unique. The final stage is to run all the validation hooks in Table.hooks. These affect the entire record, and may be used to perform changes across multiple fields. If an exception is raised at any stage, the record will not be created.
The following example illustrates how the validation occurs. When a new record is created, the value is first converted to a string by the field validator, then checked for uniqueness, and finally the validate method calculates the extra parts value.
>>> class TextTable(Table):
...     'A Table of text values.'
...
...     # A text value stored in the table
...     value = Field(unique=True, validators=[str])
...     # A pre-populated, calculated value.
...     parts = Field()
...
...     def validate(self):
...         self.parts = self.value.split()
...
>>> r = TextTable(value='a string')
>>> r.value
'a string'
>>> r.parts
['a', 'string']
>>> r = TextTable(value=3)
>>> r.value
'3'
>>> r = TextTable(value='3')
Traceback (most recent call last):
...
norman._except.ValidationError: Not unique: TextTable(parts=['3'], value='3')
When deleting a record, Table.validate_delete is first called. This should be used to ensure that any dependent records are dealt with. For example, the following code ensures that all children are deleted when a parent is deleted.
>>> class Child(Table):
...     parent = Field()
...
>>> class Parent(Table):
...     children = Join(Child.parent)
...
...     def validate_delete(self):
...         for child in self.children:
...             Child.delete(child)
...
>>> parent = Parent()
>>> child = Child(parent=parent)
>>> Parent.delete(parent)
>>> len(Child)
0
Fields
Fields are defined inside a Table definition as class attributes, and are used as record properties for instances of a Table. If the value of a field has not been set, then the special object NotSet is used to indicate this.
norman.NotSet
A sentinel object indicating that the field value has not yet been set. This evaluates to False in conditional statements.
class norman.Field(unique=False, default=NotSet, readonly=False, validators=None, key=None)
A Field is used in tables to define attributes.

>>> class MyTable(Table):
...     name = Field()

Fields may be created with a combination of properties as keyword arguments, including default, key, readonly, unique and validators.

Fields can be used with comparison operators to return a Query object containing matching records. For example:

>>> class MyTable(Table):
...     oid = Field(unique=True)
...     value = Field()
...
>>> t0 = MyTable(oid=0, value=1)
>>> t1 = MyTable(oid=1, value=2)
>>> t2 = MyTable(oid=2, value=1)
>>> MyTable.value == 1
Query(MyTable(oid=0, value=1), MyTable(oid=2, value=1))

The following comparisons are supported for a Field object, provided the data stored supports them: ==, <, >, <=, >=, !=. The & operator is used to test for containment, e.g. Table.field & mylist returns all records where the value of field is in mylist.
key
A key function used for indexing, similar to that used by sorted. All values returned by this function should be sortable in the same list. For example, if the field is known to contain a mixture of strings and integers, str would be a valid function, but lambda x: x would not, since a list of strings and integers cannot be sorted. key should raise TypeError for any value it cannot handle. These will be indexed separately, so that equality lookups are still optimised, but comparisons will not be supported. As an illustrative example, consider the following case which orders values by length:

>>> class T(Table):
...     value = Field(key=len)
...
>>> t1 = T(value='abc')
>>> t2 = T(value='defg')
>>> t3 = T(value=42)
>>> (T.value > 'xxx').one()  # Find values longer than 3 characters
T(value='defg')
>>> (T.value == 42).one()  # Find the numerical value 42
T(value=42)
>>> (T.value > 42).one()  # len(42) raises TypeError
Traceback (most recent call last):
...
TypeError

The default implementation orders data by type first, then value, for the following types: numbers.Real, str, bytes. This might lead to unexpected results, since 42 < 'text' will evaluate True. NotSet values are handled slightly differently, and are never passed through this function. Comparison queries on NotSet will always fail.
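The effect of ordering by type first and then by value can be seen with a standalone key function and the built-in sorted. The function name default_key is an assumption; the type tags follow the default implementation quoted in the Index documentation.

```python
import numbers


def default_key(value):
    """Order by type first (numbers, then str, then bytes), then by value."""
    if isinstance(value, numbers.Real):
        return ('0Real', value)
    if isinstance(value, str):
        return ('1str', value)
    if isinstance(value, bytes):
        return ('2bytes', value)
    # Anything else cannot be ordered by this key.
    raise TypeError('unsupported type: {}'.format(type(value).__name__))


mixed = ['text', 42, b'raw', 3.5]
ordered = sorted(mixed, key=default_key)
print(ordered)  # [3.5, 42, 'text', b'raw']
```

Every number sorts before every string here, which is why a comparison such as 42 < 'text' succeeds under this key even though Python itself refuses to compare the two values directly.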
name
This is the assigned name of the field and is set when it is added to the Table. This attribute is read-only.
owner
This is the owning Table of the field and is set when it is added to the Table. This attribute is read-only.
readonly
If True, prohibits setting the variable, unless its value is NotSet (default: False). This can be used with default to simulate a constant. This can be toggled to effectively lock and unlock the field.
unique
True if records should be unique on this field (default: False). If more than one field in the table has this set then they are evaluated together as a tuple. If this is set after the field is created, all existing records in the table are evaluated and a ValidationError is raised if there are duplicates.
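Evaluating several unique fields together as a tuple means that two records only clash when every unique field matches. A minimal sketch of that check, using plain dicts as records; the function find_duplicates and the sample data are hypothetical and not part of Norman.

```python
def find_duplicates(records, unique_fields):
    """Return records whose combined unique-field values were already seen."""
    seen = set()
    duplicates = []
    for record in records:
        # The uniqueness key is the tuple of all unique field values.
        key = tuple(record[name] for name in unique_fields)
        if key in seen:
            duplicates.append(record)
        else:
            seen.add(key)
    return duplicates


people = [
    {'first': 'Ada', 'last': 'Lovelace'},
    {'first': 'Ada', 'last': 'Byron'},      # same first name only: allowed
    {'first': 'Ada', 'last': 'Lovelace'},   # same tuple: duplicate
]
clashes = find_duplicates(people, ['first', 'last'])
```

With both fields marked unique only the third record clashes; checking 'first' alone would also flag the second.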
validators
A list of functions which are used as validators for the field. Each function should accept and return a single value (i.e. the value to be set), and should raise an exception if the value is invalid. The validators are called sequentially in the order specified, i.e. newvalue = validator3(validator2(validator1(oldvalue))).
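The sequential application of validators is a left-to-right function composition, which can be sketched with functools.reduce. The helper apply_validators and the validator non_empty are hypothetical names used only for this illustration.

```python
from functools import reduce


def apply_validators(validators, value):
    """Feed the value through each validator in the order given."""
    return reduce(lambda current, validator: validator(current),
                  validators, value)


def non_empty(value):
    # A validator either returns the (possibly changed) value or raises.
    if not value:
        raise ValueError('value must not be empty')
    return value


# Convert to text, trim whitespace, then reject empty results.
validators = [str, str.strip, non_empty]
cleaned = apply_validators(validators, '  some text ')
```

Order matters: placing non_empty before str.strip would let a whitespace-only string through, because it is only empty after stripping.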
Joins
A Join dynamically creates Queries for a specific record. This is best explained through an example:
>>> class Child(Table):
...     parent = Field()
...
>>> class Parent(Table):
...     children = Join(Child.parent)
...
>>> p = Parent()
>>> c1 = Child(parent=p)
>>> c2 = Child(parent=p)
>>> set(p.children) == {c1, c2}
True
In this example, Parent.children returns a Query for all Child records where child.parent == parent_instance for a specific parent_instance. Joins have a query attribute which is a Query factory function, returning a Query for a given instance of the owning table.
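The idea of a query factory evaluated per record can be illustrated with a plain Python descriptor. This is a conceptual sketch only: SimpleJoin, the instances list and the list-returning factory are assumptions, and real Norman joins return Query objects rather than lists.

```python
class SimpleJoin:
    """Hypothetical descriptor that evaluates a query factory per record."""

    def __init__(self, queryfactory):
        # The factory takes the owning record and returns matching records.
        self.query = queryfactory

    def __get__(self, instance, owner):
        if instance is None:
            return self  # accessed on the class: expose the join itself
        return self.query(instance)


class Child:
    instances = []  # stands in for the table's record storage

    def __init__(self, parent):
        self.parent = parent
        Child.instances.append(self)


class Parent:
    # Look up children on the fly by searching for matching parents.
    children = SimpleJoin(
        lambda parent: [c for c in Child.instances if c.parent is parent])


p = Parent()
q = Parent()
c1 = Child(parent=p)
c2 = Child(parent=p)
c3 = Child(parent=q)
```

Nothing is stored on the parent: each access to p.children re-runs the factory, so children created later appear automatically.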
class norman.Join(*args, **kwargs)
Joins can be created in several ways:

Join(query=queryfactory)
    Explicitly set the query factory. queryfactory is a callable which accepts a single argument (i.e. the owning record) and returns a Query.
Join(table.field)
    This is the most common form, since most joins simply involve looking up a field value in another table. This is equivalent to specifying the following query factory:

    def queryfactory(value):
        return table.field == value

Join(db, 'table.field')
    This has the same effect as the previous example, but is used when the foreign field has not yet been created. In this case, the query factory first locates 'table.field' in the Database db.
Join(other.join[, jointable])
    It is possible to set the target of a join to another join, creating a many-to-many relationship. When used in this way, a join table is automatically created, and can be accessed from Join.jointable. If the optional keyword parameter jointable is used, it is the name of the new join table.
Joins have the following attributes, all of which are read-only.
jointable
The join table in a many-to-many join.

This is None if the join is not a many-to-many join. If a jointable does not yet exist then it is created, but not added to any database. If the two joins which define it have conflicting information, a ConsistencyError is raised.
Exceptions and Warnings
Exceptions
class norman.NormanError
Base class for all Norman exceptions.
class norman.ConsistencyError
Raised on a fatal inconsistency in the data structure.
class norman.ValidationError
Raised when an operation results in table validation failing.

For now this inherits from NormanError, ValueError and TypeError to keep it backward compatible. This will change in version 0.7.0.
Advanced API
Two structures, Store and Index, manage the data internally. These are documented for completeness, but should seldom need to be used directly.
class norman.Store
Stores are designed to hide the implementation details and expose a consistent API, so that they can be switched out without any other changes to the table.

Tables are exposed as an array of cells, where each cell is identified by Table and Field instances. Cells are unordered, although implementations may order them internally.

The Store is tolerant of missing values. get will return defaults if the record requested does not exist. set will add a new record if the record does not exist.
add_field(field)
Called whenever a new field is added to the table.
add_record(record)
Called whenever a new record is created.
clear()
Delete all records in the store.
get(record, field)
Return the value in a cell specified by record and field. This should respect any field defaults. If this is called with a record that has not been added, it will be added.
has_record(record)
Return True if the record has an entry in the data store.
iter_field(field)
Iterate over pairs of (record, value) for the specified field. This should respect any field defaults. If this is called with a field that has not been added, the behaviour is unspecified.
iter_records()
Return an iterator over all records in the data store.
iter_unset(field)
Iterate over records which do not have a value set on field, that is, those for which store.get(record, field) will return field.default. This is used for managing indexes.
record_count()
Return the number of records in the table.
remove_field(field)
Remove a field.
remove_record(record)
Remove a record.
set(record, field, value)
Set the data in a record.
setdefault(field, value)
Called when the default value of a field is changed.
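To make the tolerance for missing values concrete, here is a small dict-based sketch covering a subset of the methods listed above. It is not Norman's Store: the class DictStore is hypothetical, fields and records are represented by plain strings, and add_field takes an extra default argument because there is no Field object here to carry the default.

```python
class DictStore:
    """Hypothetical in-memory store keyed by record, then by field."""

    def __init__(self):
        self._defaults = {}  # field -> default value
        self._data = {}      # record -> {field: value}

    def add_field(self, field, default=None):
        # Assumption: the default is passed in directly for this sketch.
        self._defaults[field] = default

    def has_record(self, record):
        return record in self._data

    def get(self, record, field):
        # Unknown records are added on access; unset cells use the default.
        cells = self._data.setdefault(record, {})
        return cells.get(field, self._defaults[field])

    def set(self, record, field, value):
        # Setting a cell on an unknown record adds the record.
        self._data.setdefault(record, {})[field] = value

    def iter_unset(self, field):
        return (r for r, cells in self._data.items() if field not in cells)

    def record_count(self):
        return len(self._data)


store = DictStore()
store.add_field('name', default='unknown')
first = store.get('r1', 'name')   # record 'r1' is created on demand
store.set('r2', 'name', 'Ada')
```

Storing only the cells that were explicitly set keeps iter_unset cheap and lets a changed default take effect without touching existing records.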
class norman.Index(field)
An index stores records as sorted lists of (keyvalue, record) pairs, where keyvalue is a key based on the data cell value, determined by the return value of Field.key, which should always return the same, sortable type. If a return value cannot be sorted, then it is stored separately by its hash, and comparisons (except for equality checks) cannot be used with it. If it is not hashable, then it is stored by id, so equality checks will actually return identity matches. Note that NotSet is handled separately, and is never evaluated with Field.key. The default Field.key returns a tuple of (type, keyvalue) for recognised types. The implementation is:

def key(value):
    if isinstance(value, numbers.Real):
        return '0Real', value
    elif isinstance(value, str):
        return '1str', value
    elif isinstance(value, bytes):
        return '2bytes', value
    else:
        raise TypeError

The following example shows how this can be used:

>>> import re
>>> from norman import Table, Field
>>> class MyTable(Table):
...     numbers = Field(key=lambda x: re.findall(r'\d+', x))
...
>>> r1 = MyTable(numbers='number 1, numbers 2 and 3')
>>> r2 = MyTable(numbers='45 and 46')
>>> r3 = MyTable(numbers='a, b, c = 5, 6, 7')
>>> r4 = MyTable(numbers='no numbers here')
>>> set(MyTable.numbers > 'number 3') == set((r2, r3))
True
>>> set(MyTable.numbers < '1 or 2') == set((r4,))
True
clear()
Delete all items from the index.
insert(value, record)
Insert a new item. If equal keys are found, add to the right.
remove(value, record)
Remove first occurrence of (value, record).
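A sorted index with "insert to the right of equal keys" semantics can be sketched with the standard bisect module. This is an illustration under assumptions, not Norman's Index: the class SortedIndex, its two parallel lists and the query helpers greater_than and equal_to are hypothetical, and unsortable keys are not handled.

```python
import bisect


class SortedIndex:
    """Hypothetical sorted index built on parallel key and record lists."""

    def __init__(self, key=lambda value: value):
        self._key = key
        self._keys = []     # key values, kept sorted
        self._records = []  # records, parallel to self._keys

    def clear(self):
        del self._keys[:]
        del self._records[:]

    def insert(self, value, record):
        k = self._key(value)
        pos = bisect.bisect_right(self._keys, k)  # equal keys go to the right
        self._keys.insert(pos, k)
        self._records.insert(pos, record)

    def remove(self, value, record):
        # Scan the run of equal keys and drop the first matching record.
        k = self._key(value)
        pos = bisect.bisect_left(self._keys, k)
        while pos < len(self._keys) and self._keys[pos] == k:
            if self._records[pos] == record:
                del self._keys[pos]
                del self._records[pos]
                return
            pos += 1
        raise ValueError('item not in index')

    def equal_to(self, value):
        k = self._key(value)
        lo = bisect.bisect_left(self._keys, k)
        hi = bisect.bisect_right(self._keys, k)
        return self._records[lo:hi]

    def greater_than(self, value):
        pos = bisect.bisect_right(self._keys, self._key(value))
        return self._records[pos:]


index = SortedIndex(key=len)
index.insert('abc', 'r1')
index.insert('defg', 'r2')
index.insert('xyz', 'r3')
```

Keeping the keys sorted lets both equality and range lookups use binary search, at the cost of a linear-time list insertion for each new record.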