Project author: mikulatomas

Project description:
🧠 Formal Concept Analysis with focus on Cognitive Psychology
Programming language: Python
Repository: git://github.com/mikulatomas/fcapy.git
Created: 2019-07-02T10:57:10Z
Project community: https://github.com/mikulatomas/fcapy

License: MIT License


FCApy


A library to work with formal (and pattern) contexts, concepts, lattices

Created under the guidance of S.O.Kuznetsov and A.A.Neznanov of HSE Moscow.

Install

FCApy can be installed from PyPI:

  pip install fcapy

The library has no strict dependencies. However, it is better to install it with all the optional packages:

  pip install fcapy[all]

Current state

The library provides an implementation of the Formal Context idea from FCA. An example of this is given here.

The library consists of 4 main subpackages (an import sketch follows the list):

  • context
  • lattice
  • mvcontext
  • ml
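
A short sketch of how these subpackages are imported, collected from the examples later in this README (the read_csv import path is an assumption; the exact module layout should be checked against the docs):

  from fcapy.context import read_csv                              # context: binary object/attribute tables (import path assumed)
  from fcapy.lattice import ConceptLattice                        # lattice: partially ordered set of formal concepts
  from fcapy.mvcontext import MVContext, PS                       # mvcontext: many-valued contexts and pattern structures
  from fcapy.ml.decision_lattice import DecisionLatticeRegressor  # ml: supervised models built on lattices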

Context

An implementation of Formal Context from FCA theory.

A Formal Context K = (G, M, I) is a triple of a set of objects G, a set of attributes M, and a binary relation I between them. A natural way to represent a Formal Context is a binary table.

Formal Context provides two main functions:

  • extension(attributes) - returns the maximal set of objects that share all the given attributes
  • intention(objects) - returns the maximal set of attributes shared by all the given objects

These functions are also known as “prime (′) operations” or “arrow operations”.
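
A minimal pure-Python sketch of the two operations on a toy binary context built from a few animals of the ‘animal_movement’ example below (an illustration of the definitions, not the fcapy API):

  # Toy context: each object is mapped to the set of attributes it has.
  context = {
      'dove':  {'fly'},
      'duck':  {'fly', 'swim'},
      'goose': {'fly', 'swim'},
      'owl':   {'fly', 'hunt'},
  }

  def extension(attributes):
      # maximal set of objects that have every given attribute
      return {g for g, attrs in context.items() if set(attributes) <= attrs}

  def intention(objects):
      # maximal set of attributes shared by every given object
      return set.intersection(*(context[g] for g in objects))

  print(extension(['fly', 'swim']))    # {'duck', 'goose'}
  print(intention(['dove', 'goose']))  # {'fly'}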

For example, the ‘animal_movement’ context shows the connection between animals (objects) and actions (attributes):

  !wget https://raw.githubusercontent.com/EgorDudyrev/FCApy/main/data/animal_movement.csv
  from fcapy.context import read_csv  # import path assumed; check the fcapy docs
  ctx = read_csv('animal_movement.csv')
  print(ctx[:5])
  > FormalContext (5 objects, 4 attributes, 7 connections)
  >      |fly|hunt|run|swim|
  > dove |  X|    |   |    |
  > hen  |   |    |   |    |
  > duck |  X|    |   |   X|
  > goose|  X|    |   |   X|
  > owl  |  X|   X|   |    |
  print(ctx.extension(['fly', 'swim']))
  > ['duck', 'goose']
  print(ctx.intention(['dove', 'goose']))
  > ['fly']

Thus we can state that all the animals which can both ‘fly’ and ‘swim’ are ‘duck’ and ‘goose’.
The only action both ‘dove’ and ‘goose’ can perform is ‘fly’.
At least this is formally true in the ‘animal_movement’ context.

A detailed example is given in this notebook.

Lattice

An implementation of the Concept Lattice object from FCA theory, that is, a partially ordered set of Formal Concepts.

A Formal Concept is a pair (A, B) of objects A and attributes B such that A contains all the objects which share the attributes B, and B contains all the attributes which are shared by the objects A.

In other words:

  • A = extension(B)
  • B = intention(A)

A concept (A1, B1) is bigger (more general) than a concept (A2, B2) if it describes a bigger set of objects, i.e. A2 is a subset of A1 or, equivalently, B1 is a subset of B2.
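
A minimal pure-Python sketch of this order check on two concepts from the ‘animal_movement’ example above (plain sets, not the fcapy API):

  # A concept is an (extent, intent) pair; illustration only, not fcapy code.
  c_fly      = ({'dove', 'duck', 'goose', 'owl'}, {'fly'})   # everything that flies
  c_fly_swim = ({'duck', 'goose'}, {'fly', 'swim'})          # everything that flies and swims

  def is_more_general(c1, c2):
      # (A1, B1) >= (A2, B2)  iff  A2 is a subset of A1  iff  B1 is a subset of B2
      (a1, b1), (a2, b2) = c1, c2
      assert (a2 <= a1) == (b1 <= b2)  # for formal concepts the two conditions coincide
      return a2 <= a1

  print(is_more_general(c_fly, c_fly_swim))  # True: 'fly' is more general than 'fly & swim'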

Applied to ‘animal_movement’ context we get this ConceptLattice:

  from fcapy.lattice import ConceptLattice
  ltc = ConceptLattice.from_context(ctx)
  print(len(ltc.concepts))
  > 8
  import matplotlib.pyplot as plt
  from fcapy.visualizer import Visualizer
  plt.figure(figsize=(10, 5))
  vsl = Visualizer(ltc)
  vsl.draw_networkx(max_new_extent_count=5)
  plt.xlim(-1, 1.5)
  plt.show()



In this Concept Lattice, concept #3 contains all the objects which can ‘fly’: these are ‘dove’ plus the objects of the more specific concept #6, ‘goose’ and ‘duck’.

Concept #4 represents all the animals which can ‘run’ (according to the more general concept #2) and ‘hunt’ (according to the more general concept #1).
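
The same extents and intents can also be read programmatically from the lattice built above. A minimal sketch, assuming each element of ltc.concepts exposes extent and intent attributes (the exact property names should be checked against the FormalConcept documentation):

  # ltc is the ConceptLattice built from the 'animal_movement' context above.
  # The extent/intent attribute names are an assumption; verify against the fcapy docs.
  for i, concept in enumerate(ltc.concepts):
      print(i, list(concept.extent), list(concept.intent))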

MVContext

An implementation of Many Valued Context from FCA theory.

MVContext is a generalization of Formal Context. It allows FCA to work with any kind of object description defined by Pattern Structures.

A Pattern Structure D is a set of descriptions on which the extension and intention operations can be run.

At this moment, only numerical features are supported.
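
For numerical features, the common description of a set of objects is the tightest closed interval covering their values; this is what the intention output in the example below shows. A minimal pure-Python sketch of the idea behind PS.IntervalPS (an illustration, not fcapy code):

  # The interval description of several numeric values is the closed interval
  # [min, max] containing all of them; fcapy's PS.IntervalPS builds on this idea.
  def interval_description(values):
      return (min(values), max(values))

  # e.g. the 'MedInc' values of the first two California houses, rounded to 3 digits
  print(interval_description([8.325, 8.301]))  # (8.301, 8.325)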

  # load the California housing data from scikit-learn
  from sklearn.datasets import fetch_california_housing
  california_data = fetch_california_housing(as_frame=True)
  df = california_data['data'].round(3)

  from fcapy.mvcontext import MVContext, PS
  # define a specific type of PatternStructure for each column of the dataframe
  pattern_types = {f: PS.IntervalPS for f in df.columns}
  # create an MVContext
  mvctx = MVContext(df.values, pattern_types=pattern_types, attribute_names=df.columns)
  print(mvctx)
  > ManyValuedContext (20640 objects, 8 attributes)

  # get the common description of the first 2 houses
  print(mvctx.intention(['0', '1']))
  > {'MedInc': (8.301, 8.325), 'HouseAge': (21.0, 41.0), 'AveRooms': (6.238, 6.984),
  >  'AveBedrms': (0.972, 1.024), 'Population': (322.0, 2401.0), 'AveOccup': (2.11, 2.556),
  >  'Latitude': (37.86, 37.88), 'Longitude': (-122.23, -122.22)}

  # get the number of houses whose age lies in the closed interval [10, 21]
  print(len(mvctx.extension({'HouseAge': (10, 21)})))
  > 5434
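
As a sanity check, the interval query above can be reproduced with a plain pandas filter, assuming the extension of {'HouseAge': (10, 21)} selects every object whose HouseAge lies in the closed interval [10, 21]:

  # df is the California housing dataframe loaded above; illustration only.
  mask = (df['HouseAge'] >= 10) & (df['HouseAge'] <= 21)
  print(int(mask.sum()))  # number of houses with HouseAge in [10, 21]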

ML

A number of algorithms to use FCA in a supervised ML scenario.

  # load the California housing data from scikit-learn
  from sklearn.datasets import fetch_california_housing
  california_data = fetch_california_housing(as_frame=True)
  df = california_data['data']
  y = california_data['target']

  from fcapy.mvcontext import MVContext, PS
  # define a specific type of PatternStructure for each column of the dataframe
  pattern_types = {f: PS.IntervalPS for f in df.columns}
  # create an MVContext
  mvctx = MVContext(
      df.values, target=y.values,
      pattern_types=pattern_types, attribute_names=df.columns
  )
  print(mvctx)
  > ManyValuedContext (20640 objects, 8 attributes)

  # split into train and test sets
  mvctx_train, mvctx_test = mvctx[:16000], mvctx[16000:]

  # initialize a DecisionLattice model (which uses a RandomForest in the construction process)
  from fcapy.ml.decision_lattice import DecisionLatticeRegressor
  rf_params = {'n_estimators': 5, 'max_depth': 10}
  dlr = DecisionLatticeRegressor(algo='RandomForest', algo_params={'rf_params': rf_params})

  # fit the model
  %time dlr.fit(mvctx_train, use_tqdm=True)
  > CPU times: user 43.1 s, sys: 67.8 ms, total: 43.1 s
  > Wall time: 43.1 s

  # predict the values
  preds_train_dlr = dlr.predict(mvctx_train)
  preds_test_dlr = dlr.predict(mvctx_test)
  # sometimes a test object cannot be described by any concept of the ConceptLattice;
  # in this case the model predicts None, which we replace with the mean target value over the train context
  preds_test_dlr = [p if p is not None else mvctx_train.target.mean() for p in preds_test_dlr]

  # calculate the MSE
  from sklearn.metrics import mean_squared_error
  mean_squared_error(mvctx_train.target, preds_train_dlr), mean_squared_error(mvctx_test.target, preds_test_dlr)
  > (0.15651125729264054, 0.5543609802892809)

  # fit a Random Forest model for comparison
  from sklearn.ensemble import RandomForestRegressor
  rf = RandomForestRegressor(**rf_params)
  %time rf.fit(df[:16000], y[:16000])
  > CPU times: user 240 ms, sys: 0 ns, total: 240 ms
  > Wall time: 238 ms
  preds_train_rf = rf.predict(df[:16000])
  preds_test_rf = rf.predict(df[16000:])
  mean_squared_error(mvctx_train.target, preds_train_rf), mean_squared_error(mvctx_test.target, preds_test_rf)
  > (0.16501598118202618, 0.48447718343174856)

DecisionLattice is slower and gives less accurate test predictions than a Random Forest. For now…

Plans

  • Refactor the library to make it easier to use
  • Optimize the library to make it run faster (e.g. add parallelization)