Project author: mims-harvard

Project description:
scikit-fusion: Data fusion via collective latent factor models
Language: Python
Project URL: git://github.com/mims-harvard/scikit-fusion.git
Created: 2015-04-27T15:18:49Z
Project community: https://github.com/mims-harvard/scikit-fusion

License: Other


scikit-fusion

BSD license

scikit-fusion is a Python module for data fusion and learning over heterogeneous datasets. At its core are recent collective latent factor models and large-scale joint matrix factorization algorithms.

[News:] Fast CPU and GPU-accelerated implementations of some of our methods.

[News:] Scikit-fusion, collective latent factor models, matrix factorization for data fusion and learning over hetnets.

[News:] fastGNMF, fast implementation of graph-regularized non-negative matrix factorization using Facebook FAISS.



Dependencies

scikit-fusion is tested to work under Python 3.

The required dependencies to build the software are NumPy >= 1.7, SciPy >= 0.12,
PyGraphviz >= 1.3 (needed only for drawing data fusion graphs) and Joblib >= 0.8.4.
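
Before installing, it can help to check which of these packages are already present. The following is a minimal sketch using only the standard library (Python 3.8+). The helper names `parse_version`, `meets` and `check_dependencies` are made up for illustration and are not part of scikit-fusion, and the lowercase distribution names are assumed to match what is published on PyPI.

```python
import re
from importlib import metadata

# Minimum versions copied from the dependency list above.
# The keys are assumed to be the PyPI distribution names.
REQUIRED = {
    "numpy": "1.7",
    "scipy": "0.12",
    "joblib": "0.8.4",
    "pygraphviz": "1.3",  # optional: only needed for drawing fusion graphs
}


def parse_version(text):
    """Turn '1.26.4' or '0.8.4rc1' into a tuple of its leading integers."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets(installed, minimum):
    """Return True if version string `installed` is at least `minimum`."""
    have, want = parse_version(installed), parse_version(minimum)
    width = max(len(have), len(want))
    # Pad with zeros so that '1.7' and '1.7.0' compare as equal.
    have += (0,) * (width - len(have))
    want += (0,) * (width - len(want))
    return have >= want


def check_dependencies(required=REQUIRED):
    """Map each package name to 'ok', 'too old (<version>)' or 'missing'."""
    report = {}
    for name, minimum in required.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        report[name] = "ok" if meets(installed, minimum) else f"too old ({installed})"
    return report


if __name__ == "__main__":
    for name, status in check_dependencies().items():
        print(f"{name}: {status}")
```

The version parser is deliberately simple: it compares only the leading numeric components and ignores pre-release suffixes, which is enough for the coarse minimums listed here.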

Install

This package uses distutils, which is the default way of installing
Python modules. To install in your home directory, use:

    python setup.py install --user

To install for all users on Unix/Linux:

    python setup.py build
    sudo python setup.py install

For development mode use:

    python setup.py develop

Use

Let’s generate three random data matrices, each relating a pair of three different object types:

    >>> import numpy as np
    >>> R12 = np.random.rand(50, 100)
    >>> R13 = np.random.rand(50, 40)
    >>> R23 = np.random.rand(100, 40)

Next, we define our data fusion graph:

    >>> from skfusion import fusion
    >>> t1 = fusion.ObjectType('Type 1', 10)
    >>> t2 = fusion.ObjectType('Type 2', 20)
    >>> t3 = fusion.ObjectType('Type 3', 30)
    >>> relations = [fusion.Relation(R12, t1, t2),
    ...              fusion.Relation(R13, t1, t3),
    ...              fusion.Relation(R23, t2, t3)]
    >>> fusion_graph = fusion.FusionGraph()
    >>> fusion_graph.add_relations_from(relations)
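
Each object type appears in more than one relation, so the matrix dimensions have to agree: Type 1 indexes the 50 rows of both R12 and R13, Type 2 the 100 columns of R12 and the 100 rows of R23, and so on. The sketch below makes that bookkeeping explicit. It is plain Python and does not use skfusion; `infer_object_counts` is a hypothetical helper, and whether the library performs an equivalent check itself is not stated here.

```python
# Hypothetical helper, not part of skfusion. Each relation is described only
# by (row type, column type, matrix shape), so no third-party import is needed.

def infer_object_counts(relations):
    """Return {object type: number of objects}; raise ValueError on a mismatch."""
    counts = {}
    for row_type, col_type, (n_rows, n_cols) in relations:
        for obj_type, size in ((row_type, n_rows), (col_type, n_cols)):
            known = counts.setdefault(obj_type, size)
            if known != size:
                raise ValueError(
                    f"{obj_type} has {known} objects in one relation "
                    f"but {size} in another"
                )
    return counts


# Shapes of R12, R13 and R23 from the example above.
relations = [
    ("Type 1", "Type 2", (50, 100)),
    ("Type 1", "Type 3", (50, 40)),
    ("Type 2", "Type 3", (100, 40)),
]
print(infer_object_counts(relations))
# {'Type 1': 50, 'Type 2': 100, 'Type 3': 40}
```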

and then collectively infer the latent data model:

    >>> fuser = fusion.Dfmf()
    >>> fuser.fuse(fusion_graph)
    >>> print(fuser.factor(t1).shape)
    (50, 10)
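
The printed shape (50, 10) pairs the 50 objects of Type 1 with the value 10 passed to its `ObjectType`, which suggests that this second argument acts as the factorization rank. Collective matrix tri-factorization models typically approximate each relation as R_ij ≈ G_i S_ij G_j^T, where G_i is the latent factor of object type i and S_ij is a small "backbone" matrix. Assuming `Dfmf` follows this form, the shapes work out as in the sketch below. It only does shape arithmetic in plain Python; the helper names are illustrative, not skfusion API.

```python
def matmul_shape(a, b):
    """Shape of the product of two matrices with shapes `a` and `b`."""
    (m, inner_a), (inner_b, n) = a, b
    if inner_a != inner_b:
        raise ValueError(f"cannot multiply {a} by {b}")
    return (m, n)


def transpose_shape(shape):
    """Shape of the transpose of a matrix with the given shape."""
    return (shape[1], shape[0])


# Object counts implied by R12, R13, R23, and the ranks given to ObjectType.
counts = {"Type 1": 50, "Type 2": 100, "Type 3": 40}
ranks = {"Type 1": 10, "Type 2": 20, "Type 3": 30}

# One latent factor G_i per object type: (number of objects, rank).
factor_shapes = {t: (counts[t], ranks[t]) for t in counts}


def reconstruction_shape(row_type, col_type):
    """Shape of G_i @ S_ij @ G_j.T under the assumed tri-factorization."""
    g_i = factor_shapes[row_type]
    g_j = factor_shapes[col_type]
    s_ij = (ranks[row_type], ranks[col_type])  # assumed backbone shape
    return matmul_shape(matmul_shape(g_i, s_ij), transpose_shape(g_j))


print(factor_shapes["Type 1"])                   # (50, 10)
print(reconstruction_shape("Type 1", "Type 2"))  # (50, 100)
```

Under this reading, the reconstruction of R12 has the same 50 x 100 shape as the original matrix, while the model stores only the much smaller factors.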

Afterwards new data might arrive:

    >>> new_R12 = np.random.rand(10, 100)
    >>> new_R13 = np.random.rand(10, 40)

for which we define the fusion graph:

    >>> new_relations = [fusion.Relation(new_R12, t1, t2),
    ...                  fusion.Relation(new_R13, t1, t3)]
    >>> new_graph = fusion.FusionGraph(new_relations)

and transform new objects to the latent space induced by the fuser:

    >>> transformer = fusion.DfmfTransform()
    >>> transformer.transform(t1, new_graph, fuser)
    >>> print(transformer.factor(t1).shape)
    (10, 10)

scikit-fusion contains several applications of data fusion:

    >>> from skfusion import datasets
    >>> dicty = datasets.load_dicty()
    >>> print(dicty)
    FusionGraph(Object types: 3, Relations: 3)
    >>> print(dicty.object_types)
    {ObjectType(GO term), ObjectType(Experimental condition), ObjectType(Gene)}
    >>> print(dicty.relations)
    {Relation(ObjectType(Gene), ObjectType(GO term)),
     Relation(ObjectType(Gene), ObjectType(Gene)),
     Relation(ObjectType(Gene), ObjectType(Experimental condition))}

Tutorials

  • Large-scale data fusion by collective matrix factorization, Basel Computational Biology Conference, [BC]^2 [Slides] [Handouts]
  • Data fusion of everything, 37th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, EMBC [Slides] [Handouts]