Project author: flyconnectome

Project description:
Documentation for the hierarchical neuron format
Repository: git://github.com/flyconnectome/hnf.git
Created: 2021-02-15T13:35:41Z
Project page: https://github.com/flyconnectome/hnf

License: GNU General Public License v3.0



Hierarchical Neuron Format

The Hierarchical Neuron Format (HNF) is a schema for storing neuron morphologies
and meta data in HDF5 files.

We provide read/write implementations for R and Python:

Preamble

Several file formats can already store neuron morphology. To name but a few:

  • SWC is a simple text format
    for skeletons
  • NeuroML is an XML-based format
    primarily used for modelling, but it can also store compartment models
    (i.e. skeletons) of neurons and meta data
  • NWB (Neurodata Without Borders) is an HDF5-based format
    focused on physiological data
  • NRRD files can be used to
    store dotprops

Why then start a new format?

Because none of the existing formats tick all the boxes! We need a file format
that can hold:

  1. thousands of neurons
  2. multiple representations (mesh, skeleton, dotprops) of a given neuron
  3. annotations (e.g. synapses) associated with each neuron
  4. meta data such as names, soma positions, etc.

Enter HDF5: basically a filesystem-in-a-file. The important thing is that we
don’t have to worry about how data is en-/decoded because other libraries
(like h5py for Python or hdf5r for R) take care of that. All we have to
do is come up with a schema.

Schema

HDF5 has “groups” (similar to folders), “datasets” (arrays of data) and
“attributes” (small pieces of meta data attached to a group or dataset). The
basic idea for our schema is this:

  • the root contains info about the format as attributes
  • each group in root represents a neuron and the group’s name is the neuron’s ID
  • a neuron group holds meta data as attributes, and the neuron’s
    representations (mesh, skeleton and/or dotprops) and annotations in
    separate sub-groups

To illustrate the basic principle:

    .
    ├── attrs: format-related meta data
    ├── group: neuron1
    |   ├── attrs: neuron-related meta data
    |   ├── group: skeleton
    |   |   ├── attrs: skeleton-related meta data
    |   |   └── datasets: node table, etc
    |   ├── group: dotprops
    |   |   ├── attrs: dotprops-related meta data
    |   |   └── datasets: points, tangents, alpha, etc
    |   ├── group: mesh
    |   |   ├── attrs: mesh-related meta data
    |   |   └── datasets: vertices, faces, etc
    |   └── group: annotations
    |       └── group: e.g. connectors
    |           ├── attrs: connector-related meta data
    |           └── datasets: connector data
    ├── group: neuron2
    |   ├── ...
    ...
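
To make the layout concrete, here is a minimal pure-Python sketch that walks such a hierarchy and reports, for each neuron group in the root, which representations and annotation groups it holds. It uses a hypothetical `Group` stand-in (a dict of children plus an `attrs` dict) so it runs without any HDF5 library. The duck-typed access is meant to mirror h5py, whose groups are mapping-like and carry an `.attrs` mapping, but treat that correspondence as an assumption rather than something tested here.

```python
from collections.abc import Mapping

# Representation sub-groups defined by the schema.
REPRESENTATIONS = ("skeleton", "dotprops", "mesh")


class Group(dict):
    """Hypothetical stand-in for an HDF5 group: child objects plus an ``attrs`` dict."""

    def __init__(self, children=None, attrs=None):
        super().__init__(children or {})
        self.attrs = dict(attrs or {})


def describe(root):
    """Map each neuron ID (a group in root) to its representations and annotations."""
    summary = {}
    for neuron_id, neuron in root.items():
        if not isinstance(neuron, Mapping):  # only groups can be neurons
            continue
        annotations = neuron.get("annotations", {})
        summary[neuron_id] = {
            "representations": [r for r in REPRESENTATIONS if r in neuron],
            "annotations": sorted(annotations.keys()),
        }
    return summary


root = Group(
    {
        "neuron1": Group({"skeleton": Group(), "mesh": Group(),
                          "annotations": Group({"connectors": Group()})}),
        "neuron2": Group({"dotprops": Group()}),
    },
    attrs={"format_spec": "hnf_v1"},
)
summary = describe(root)
```

Because the neuron ID is simply the group name, listing the neurons in a file amounts to listing the groups in its root.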

Root attributes

The root meta data must contain two attributes:

  • format_spec specifies the format and its version
  • format_url points to a library implementing the format or to its
    specification

    .
    ├── attr['format_spec']: str = 'hnf_v1'
    ├── attr['format_url']: str = 'https://github.com/schlegelp/navis'
    ...
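
A reader will typically check these two attributes before parsing anything else. The hypothetical `check_root_attrs` below sketches that check on a plain dict of root attributes (with h5py this would be the file's `attrs`). It assumes `format_spec` always follows the `hnf_<version>` pattern of the `'hnf_v1'` example, which is our assumption rather than a stated rule. Depending on how strings were written, h5py may hand them back as `bytes`, so the sketch decodes those.

```python
REQUIRED_ROOT_ATTRS = ("format_spec", "format_url")


def check_root_attrs(attrs):
    """Validate the root attributes and return the version part of format_spec.

    Assumes format_spec follows the 'hnf_<version>' pattern seen in 'hnf_v1'.
    """
    missing = [name for name in REQUIRED_ROOT_ATTRS if name not in attrs]
    if missing:
        raise ValueError(f"missing root attribute(s): {', '.join(missing)}")
    spec = attrs["format_spec"]
    if isinstance(spec, bytes):  # some HDF5 readers hand back byte strings
        spec = spec.decode("utf-8")
    prefix = "hnf_"
    if not spec.startswith(prefix):
        raise ValueError(f"not an HNF file: format_spec={spec!r}")
    return spec[len(prefix):]


version = check_root_attrs(
    {"format_spec": "hnf_v1", "format_url": "https://github.com/schlegelp/navis"}
)
```

The returned version string could then be used to pick a matching reader, in line with the plan to keep readers for past versions.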

Neuron base groups

Each neuron group contains properties that apply to all of the neuron’s potential
representations - for example a neuron_name. Note that if an attribute is
defined at the neuron level and again at a deeper level (i.e. the skeleton,
mesh or dotprops), the deeper, more specific definition takes precedence for
that representation.

    .
    └── group['123456']  # note that numeric IDs will be "stringified"
        ├── attr["neuron_name"]: str = "some name"
        ...
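
The precedence rule amounts to a dictionary merge in which the representation's own attributes overwrite the inherited neuron-level ones. A minimal sketch with hypothetical helper names:

```python
def group_name(neuron_id):
    """HDF5 group names are strings, so numeric neuron IDs are stringified."""
    return str(neuron_id)


def resolve_attrs(neuron_attrs, representation_attrs):
    """Effective attributes for one representation.

    Neuron-level attributes are inherited; anything defined again on the
    representation (skeleton, mesh or dotprops) overrides them.
    """
    merged = dict(neuron_attrs)
    merged.update(representation_attrs)
    return merged


neuron_attrs = {"neuron_name": "some name", "units_nm": (4, 4, 40)}
skeleton_attrs = {"units_nm": 8, "soma": 1}
effective = resolve_attrs(neuron_attrs, skeleton_attrs)
```

Here the skeleton keeps the neuron's `neuron_name` but uses its own `units_nm` of 8 instead of the neuron-level (4, 4, 40).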

Skeletons

Attributes:

  • units_nm (float | int | tuple, optional): specifies how many nanometers
    one coordinate unit corresponds to - can be a tuple of (x, y, z) if
    units are non-isotropic
  • soma (int, optional): the node ID of the soma
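
As a sketch of how `units_nm` might be applied, assuming it gives the size of one coordinate unit in nanometers (so a node at (1, 2, 3) with `units_nm = (4, 4, 40)` sits at (4, 8, 120) nm), the hypothetical `to_nanometers` below accepts either a single number or an (x, y, z) tuple:

```python
import numbers


def to_nanometers(coords, units_nm=1):
    """Scale (x, y, z) coordinates to nanometers.

    ``units_nm`` is either one number (isotropic units) or an (x, y, z) tuple
    (non-isotropic units), e.g. (4, 4, 40).
    """
    if isinstance(units_nm, numbers.Real):
        scale = (units_nm,) * 3
    else:
        scale = tuple(units_nm)
        if len(scale) != 3:
            raise ValueError("a units_nm tuple needs exactly three values (x, y, z)")
    return [tuple(c * s for c, s in zip(xyz, scale)) for xyz in coords]


nodes_nm = to_nanometers([(1, 2, 3), (10, 10, 1)], units_nm=(4, 4, 40))
```

The same helper would apply to mesh vertices and dotprops points, which use the same attribute.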

Datasets:

  • node_id (int): IDs for the nodes
  • parent_id (int): for each node, the ID of its parent; nodes without
    parents (i.e. roots) have a parent_id of -1
  • x, y, z (float | int): node coordinates
  • radius (float | int, optional): radius for each node

    └── group['123456']
        ├── attr['neuron_name'] = "example neuron with a skeleton"
        ├── attr['units_nm'] = (4, 4, 40)
        └── grp['skeleton']
            ├── attr['soma']: 1
            ├── ds['node_id']: (N, ) array
            ├── ds['parent_id']: (N, ) array
            ├── ds['x']: (N, ) array
            ├── ds['y']: (N, ) array
            ├── ds['z']: (N, ) array
            └── ds['radius']: (N, ) array, optional
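
Because the node table is stored as parallel columns, a reader can sanity-check it before building a tree. The hypothetical `skeleton_roots` below is a minimal sketch of such checks: unique node IDs, every `parent_id` either -1 or an existing node, and an optional `soma` that is a real node. It does not look for cycles.

```python
def skeleton_roots(node_id, parent_id, soma=None):
    """Check an HNF node table and return the IDs of its root nodes.

    Checks: equal lengths, unique node IDs, every parent_id is either -1 (root)
    or an existing node ID, and the optional soma is a node ID. It does not
    check for cycles.
    """
    if len(node_id) != len(parent_id):
        raise ValueError("node_id and parent_id must have the same length")
    known = set(node_id)
    if len(known) != len(node_id):
        raise ValueError("node IDs must be unique")
    roots = []
    for node, parent in zip(node_id, parent_id):
        if parent == -1:
            roots.append(node)
        elif parent not in known:
            raise ValueError(f"node {node} has unknown parent {parent}")
    if soma is not None and soma not in known:
        raise ValueError(f"soma {soma} is not a node ID")
    return roots


# Node 1 is the root; node 2 branches into nodes 3 and 4.
roots = skeleton_roots([1, 2, 3, 4], [-1, 1, 2, 2], soma=1)
```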

Meshes

Meshes are principally represented as vertices + triangular faces (navis
uses trimesh under the hood).

Attributes:

  • units_nm (float | int | tuple, optional): specifies how many nanometers
    one coordinate unit corresponds to - can be a tuple of (x, y, z) if
    units are non-isotropic
  • soma (tuple, optional): tuple of (x, y, z) coordinates of the soma

Datasets:

  • vertices (int | float): (N, 3) array of vertex positions
  • faces (int): (M, 3) array of vertex indices forming the faces (indices start
    at 0)
  • skeleton_map (int, optional): (N, ) array mapping each vertex to a
    node ID in the skeleton

    └── group['4353421']
        ├── attr['neuron_name'] = "example neuron with a mesh"
        ├── attr['units_nm'] = (4, 4, 40)
        └── grp['mesh']
            ├── attr['soma']: (1242, 6533, 400)
            ├── ds['vertices']: (N, 3) array
            ├── ds['faces']: (M, 3) array
            └── ds['skeleton_map']: (N, ) array, optional
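
Since `faces` indexes into `vertices` starting at 0, a common mistake is to write 1-based indices. The hypothetical `check_mesh` below is a minimal sketch that catches this and also checks that `skeleton_map` has one entry per vertex and, if node IDs are supplied, only refers to existing skeleton nodes:

```python
def check_mesh(vertices, faces, skeleton_map=None, node_ids=None):
    """Sanity-check HNF mesh datasets; raises ValueError on the first problem."""
    n_vertices = len(vertices)
    if any(len(v) != 3 for v in vertices):
        raise ValueError("vertices must be an (N, 3) array")
    for i, face in enumerate(faces):
        if len(face) != 3:
            raise ValueError(f"face {i} is not a triangle")
        if any(not 0 <= idx < n_vertices for idx in face):  # 0-based indices
            raise ValueError(f"face {i} references a missing vertex: {tuple(face)}")
    if skeleton_map is not None:
        if len(skeleton_map) != n_vertices:
            raise ValueError("skeleton_map needs one entry per vertex")
        if node_ids is not None:
            unknown = set(skeleton_map) - set(node_ids)
            if unknown:
                raise ValueError(f"skeleton_map uses unknown node IDs: {sorted(unknown)}")


# A single triangle mapped onto a two-node skeleton.
verts = [(0, 0, 0), (1, 0, 0), (0, 1, 0)]
check_mesh(verts, [(0, 1, 2)], skeleton_map=[1, 1, 2], node_ids=[1, 2])
```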

Dotprops

Attributes:

  • k (int): number of k-nearest neighbours used to calculate the tangent
    vectors from the point cloud
  • units_nm (float | int | tuple, optional): specifies how many nanometers
    one coordinate unit corresponds to - can be a tuple of (x, y, z) if
    units are non-isotropic
  • soma (tuple, optional): tuple of (x, y, z) coordinates of the soma

Datasets:

  • points (int | float): (N, 3) array of x/y/z positions
  • vect (int | float, optional): (N, 3) array of tangent vectors -
    generated if not provided
  • alpha (int | float, optional): (N, ) array with one alpha value per
    point - generated if not provided

    └── group['65432']
        ├── attr['neuron_name'] = "example neuron with dotprops"
        └── grp['dotprops']
            ├── attr['k'] = 5
            ├── attr['units_nm'] = (4, 4, 40)
            ├── attr['soma']: (1242, 6533, 400)
            ├── ds['points']: (N, 3) array
            ├── ds['vect']: (N, 3) array
            └── ds['alpha']: (N, ) array
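
When `vect` and `alpha` are missing, they have to be generated from `points` and `k`. Below is a brute-force, pure-Python sketch of one common recipe: for each point, take its k nearest neighbours (here including the point itself), use the principal axis of their covariance matrix (eigenvector of the largest eigenvalue) as the tangent, and set alpha = (λ1 - λ2) / (λ1 + λ2 + λ3) from the eigenvalues in descending order. We believe this matches what nat and navis compute, but treat that as an assumption. All function names are hypothetical, and a real implementation would use numpy and a KD-tree instead of this O(N²) neighbour search.

```python
import math


def _nearest(points, i, k):
    """Indices of the k points closest to points[i], including i itself (brute force)."""
    return sorted(range(len(points)), key=lambda j: math.dist(points[i], points[j]))[:k]


def _covariance(pts):
    """3x3 covariance matrix of a list of (x, y, z) points."""
    n = len(pts)
    mean = [sum(p[d] for p in pts) / n for d in range(3)]
    centred = [[p[d] - mean[d] for d in range(3)] for p in pts]
    return [[sum(c[a] * c[b] for c in centred) / n for b in range(3)] for a in range(3)]


def _eigenvalues(m):
    """Eigenvalues of a symmetric 3x3 matrix in descending order (closed form)."""
    p1 = m[0][1] ** 2 + m[0][2] ** 2 + m[1][2] ** 2
    if p1 == 0:  # already diagonal
        return sorted((m[0][0], m[1][1], m[2][2]), reverse=True)
    q = (m[0][0] + m[1][1] + m[2][2]) / 3
    p = math.sqrt((sum((m[i][i] - q) ** 2 for i in range(3)) + 2 * p1) / 6)
    b = [[(m[i][j] - (q if i == j else 0.0)) / p for j in range(3)] for i in range(3)]
    det_b = (b[0][0] * (b[1][1] * b[2][2] - b[1][2] * b[2][1])
             - b[0][1] * (b[1][0] * b[2][2] - b[1][2] * b[2][0])
             + b[0][2] * (b[1][0] * b[2][1] - b[1][1] * b[2][0]))
    phi = math.acos(max(-1.0, min(1.0, det_b / 2))) / 3
    e1 = q + 2 * p * math.cos(phi)
    e3 = q + 2 * p * math.cos(phi + 2 * math.pi / 3)
    return [e1, 3 * q - e1 - e3, e3]


def _eigenvector(m, lam):
    """Unit eigenvector for eigenvalue lam, or None if lam is (numerically) repeated."""
    rows = [[m[i][j] - (lam if i == j else 0.0) for j in range(3)] for i in range(3)]
    best, best_norm = None, 0.0
    for a, b in ((0, 1), (0, 2), (1, 2)):  # cross products of row pairs
        u, v = rows[a], rows[b]
        c = (u[1] * v[2] - u[2] * v[1], u[2] * v[0] - u[0] * v[2], u[0] * v[1] - u[1] * v[0])
        norm = math.sqrt(sum(x * x for x in c))
        if norm > best_norm:
            best, best_norm = c, norm
    scale = max(abs(x) for row in m for x in row)
    if best is None or best_norm <= 1e-12 * scale ** 2:
        return None
    return tuple(x / best_norm for x in best)


def tangents_and_alpha(points, k=5):
    """Tangent vector and alpha for every point, from its k nearest neighbours."""
    vect, alpha = [], []
    for i in range(len(points)):
        cov = _covariance([points[j] for j in _nearest(points, i, k)])
        e1, e2, e3 = _eigenvalues(cov)
        v = _eigenvector(cov, e1)
        vect.append(v if v is not None else (0.0, 0.0, 0.0))  # undefined tangent
        total = e1 + e2 + e3
        alpha.append((e1 - e2) / total if total > 0 else 0.0)
    return vect, alpha


line = [(float(t), 0.0, 0.0) for t in range(6)]
vect, alpha = tangents_and_alpha(line, k=5)
```

For points on a straight line, every tangent is parallel to the line and alpha is 1. For a locally isotropic cloud, alpha drops towards 0.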

Annotations

Annotations are meant to be flexible and are principally parsed into
pandas DataFrames. Because they don’t follow a common format, it is
good practice to add some (optional) meta data pointing to the columns
that hold data relevant for, e.g., plotting:

Attributes:

  • point_col (str | list thereof): pointer to the column(s) containing
    x/y/z positions
  • type_col (str): pointer to a column specifying types
  • skeleton_map (str): pointer to a column associating the row with
    a node ID in the skeleton

Let’s illustrate this with a mock synapse table:

    └── group['32434566']
        ├── attr['neuron_name'] = "example neuron with synapse annotations"
        ├── attr['units_nm'] = 1
        └── grp['annotations']
            └── grp['synapses']
                ├── attr['point_col']: ['x', 'y', 'z']
                ├── attr['type_col']: 'prepost'
                ├── attr['skeleton_map']: 'node_id'
                ├── ds['x']: (N, ) array
                ├── ds['y']: (N, ) array
                ├── ds['z']: (N, ) array
                ├── ds['prepost']: (N, ) array of [0, 1, 2, 3, 4]
                └── ds['node_id']: (N, ) array
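
The pointer attributes let a reader pull out positions, types and skeleton mappings without knowing the table's layout in advance. The hypothetical `parse_annotations` below is a stdlib-only sketch that uses the attribute names listed under Attributes. It assumes each dataset has already been read into a Python sequence and that attribute values are plain Python strings. If pandas is available, `pandas.DataFrame(columns)` builds the equivalent table from the same dict of columns.

```python
def parse_annotations(columns, attrs):
    """Turn an annotation group's column datasets into rows plus convenience views.

    ``columns`` maps dataset names to equal-length sequences; ``attrs`` may hold
    the optional pointers point_col, type_col and skeleton_map.
    """
    lengths = {len(col) for col in columns.values()}
    if len(lengths) > 1:
        raise ValueError(f"columns differ in length: {sorted(lengths)}")
    n_rows = lengths.pop() if lengths else 0
    rows = [{name: col[i] for name, col in columns.items()} for i in range(n_rows)]

    parsed = {"rows": rows}
    point_col = attrs.get("point_col")
    if isinstance(point_col, str):  # one column holding (x, y, z) per row
        parsed["points"] = [tuple(row[point_col]) for row in rows]
    elif point_col is not None:  # separate x, y and z columns
        parsed["points"] = [tuple(row[c] for c in point_col) for row in rows]
    if "type_col" in attrs:
        parsed["types"] = [row[attrs["type_col"]] for row in rows]
    if "skeleton_map" in attrs:
        parsed["nodes"] = [row[attrs["skeleton_map"]] for row in rows]
    return parsed


synapses = parse_annotations(
    {"x": [10, 20], "y": [30, 40], "z": [50, 60], "prepost": [0, 1], "node_id": [1, 2]},
    {"point_col": ["x", "y", "z"], "type_col": "prepost", "skeleton_map": "node_id"},
)
```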

“Hidden” attributes & datasets

It can be useful to have attributes and datasets that contain information that’s
only pertinent for the reader/writer but does not directly relate to the neuron.

For this, we prefix the attribute or dataset name with a dot (.):

    └── group['4353421']
        ├── attr['neuron_name'] = "example neuron with a mesh"
        ├── attr['units_nm'] = (4, 4, 40)
        ├── attr['.hidden_attribute'] = "typically ignored when reading"
        └── grp['mesh']
            ├── attr['soma']: (1242, 6533, 400)
            ...

We use hidden attributes, for example, to store a serialized version of a neuron
instead of (or in addition to) the raw data, which speeds up reading.
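
Because hidden entries are marked only by the leading dot, a reader can separate them with a simple name check and prefer a serialized copy when one exists. This document doesn't say how navis serializes neurons, so the sketch below uses `pickle` and a hypothetical `.serialized` key purely for illustration:

```python
import pickle


def split_hidden(mapping):
    """Separate regular entries from 'hidden' ones (names starting with '.')."""
    public = {k: v for k, v in mapping.items() if not k.startswith(".")}
    hidden = {k: v for k, v in mapping.items() if k.startswith(".")}
    return public, hidden


def load_neuron(entries, parse_raw, key=".serialized"):
    """Use a hidden serialized copy if there is one, else parse the raw data.

    ``key`` is a hypothetical name; only unpickle files you trust.
    """
    public, hidden = split_hidden(entries)
    if key in hidden:
        return pickle.loads(hidden[key])  # fast path
    return parse_raw(public)  # slow path: hidden entries are ignored


def from_raw(attrs):
    return {"name": attrs["neuron_name"]}


fast = load_neuron({"neuron_name": "n1", ".serialized": pickle.dumps({"name": "n1"})}, from_raw)
slow = load_neuron({"neuron_name": "n1", ".hidden_attribute": "ignored"}, from_raw)
```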

A final remark

The above schema describes a “minimal” layout, i.e. the least data we expect.
However, implementations such as the navis reader/writer are flexible: you can
add more attributes or datasets, and navis will by default try to read them and
attach them to the neuron.

Is this stable?

Ish? The format is versioned and I will maintain readers/writers for
past versions in navis. In other good news: the HDF5 backend is
stable - so even if navis acts up when parsing your file, you can
always read it manually using h5py.

Changelog

The current version of the format is 1.0.

Changes:

  • 2021/02/01: Version 1.0