Project author: plaitpy

Project description:
plait.py - a fake data modeler
Language: Python
Repository: git://github.com/plaitpy/plaitpy.git
Created: 2017-12-22T01:41:45Z
Project community: https://github.com/plaitpy/plaitpy

License: MIT License


plait.py

plait.py is a program for generating fake data from composable yaml templates.

The idea behind plait.py is that it should be easy to model fake data that
has an interesting shape. Currently, many fake data generators model their
data as a collection of IID (independent and identically distributed)
variables; with plait.py we can stitch together those variables into a more
coherent model.
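
As a rough illustration of the difference, the sketch below uses only Python's standard library, not plait.py itself. The job list, function names and age threshold are made up for the example: the first function draws every field independently, while the second makes `job` depend on the `age` drawn before it.

```python
import random

JOBS = ["engineer", "teacher", "nurse"]  # hypothetical values for illustration


def iid_person(rng):
    """Every field is drawn independently of the others."""
    return {
        "age": max(0, int(rng.gauss(25, 5))),
        "job": rng.choice(JOBS),  # a child can end up with a job
    }


def stitched_person(rng, working_age=18):
    """The job field depends on the age field drawn before it."""
    age = max(0, int(rng.gauss(25, 5)))
    person = {"age": age}
    if age > working_age:
        person["job"] = rng.choice(JOBS)
    return person


rng = random.Random(0)
people = [stitched_person(rng) for _ in range(1000)]
```

In the second version a record only carries a `job` when the age allows it, which is the kind of relationship between fields that a template can express declaratively.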

some example uses for plait.py are:

  • generating mock application data in test environments
  • validating the usefulness of statistical techniques
  • creating synthetic datasets for performance tuning databases

features

  • declarative syntax
  • use basic faker.rb fields with #{} interpolators
  • sample and join data from CSV files
  • lambda expressions, switch and mixture fields
  • nested and composable templates
  • static variables and hidden fields

an example template

    # a person generator
    define:
      min_age: 10
      minor_age: 13
      working_age: 18

    fields:
      age:
        random: gauss(25, 5)
        # minimum age is $min_age
        finalize: max($min_age, value)

      gender:
        mixture:
          - value: M
          - value: F

      name: "#{name.name}"

      job:
        value: "#{job.title}"
        onlyif: this.age > $working_age

      address:
        template: address/usa.yaml

      phone: # add a phone if the person is older than the minor age
        template: device/phone.yaml
        onlyif: this.age > ${minor_age}

      # we model our height as a gaussian that varies based on
      # age and gender
      height:
        lambda: this._base_height * this._age_factor

      _base_height:
        switch:
          - onlyif: this.gender == "F"
            random: gauss(60, 5)
          - onlyif: this.gender == "M"
            random: gauss(70, 5)

      _age_factor:
        switch:
          - onlyif: this.age < 15
            lambda: 1 - (20 - (this.age + 5)) / 20
          - default:
            value: 1
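
To make the dependencies between fields concrete, here is a hand-written Python sketch of roughly what the `age`, `gender`, `height`, `_base_height` and `_age_factor` fields describe. It is one reading of the template, not plait.py's implementation: the function name `gen_person` is hypothetical, treating `mixture` as an equal-weight choice is an assumption, and so is treating the underscore-prefixed fields as the hidden fields from the feature list.

```python
import random


def gen_person(rng, min_age=10):
    # age: gauss(25, 5), then clamped to the minimum age (the finalize step)
    age = max(min_age, rng.gauss(25, 5))

    # gender: assumed here to be an equal-weight pick between the two values
    gender = rng.choice(["M", "F"])

    # _base_height: switch on gender
    if gender == "F":
        base_height = rng.gauss(60, 5)
    else:
        base_height = rng.gauss(70, 5)

    # _age_factor: scales height down for people younger than 15
    if age < 15:
        age_factor = 1 - (20 - (age + 5)) / 20
    else:
        age_factor = 1

    # helper fields (assumed hidden) are left out of the returned record
    return {"age": age, "gender": gender, "height": base_height * age_factor}


person = gen_person(random.Random(1))
```

With these numbers the age factor is 0.75 at age 10 and reaches 1 at age 15, so the height ramps up smoothly to its adult value.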

how it's different

some specific examples of what plait.py can do:

  • generate proportional populations using census data and CSVs
  • create realistic zipcodes by state, city or region (also using CSVs)
  • create a taxi trip dataset with a cost model based on geodistance
  • add seasonal patterns (daily, weekly, etc) to data
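
As one possible shape for the geodistance cost model mentioned above, the sketch below computes a great-circle distance with the haversine formula and derives a fare from it. It is plain Python rather than a plait.py template, and the function names and fare constants (`base_fare`, `per_km`) are invented for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def trip_cost(pickup, dropoff, base_fare=2.5, per_km=1.5):
    """Hypothetical fare: a flat fee plus a per-kilometre rate."""
    return base_fare + per_km * haversine_km(*pickup, *dropoff)
```

A template could express the same idea with a lambda field that depends on the pickup and dropoff fields of the same record.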

usage

installation

    # install with python
    pip install plaitpy
    # or with pypy
    pypy-pip install plaitpy

cloning the repo for development

    git clone https://github.com/plaitpy/plaitpy
    cd plaitpy
    # get the fakerb repo
    git submodule init
    git submodule update

generating records from command line

specify a template as a yaml file, then generate records from that yaml file.

    # a simple example (if cloning plait.py repo)
    python main.py templates/timestamp/uniform.yaml
    # if plait.py is installed via pip
    plait.py templates/timestamp/uniform.yaml

generating records from API

    import plaitpy

    # load a template, then generate a single record and a batch of ten
    t = plaitpy.Template("templates/timestamp/uniform.yaml")
    print(t.gen_record())
    print(t.gen_records(10))
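
Generated records usually need to be stored somewhere. The following sketch writes records as JSON lines using only the standard library. The helper `write_jsonl` is hypothetical, and it assumes that `gen_records` returns a list of dict-like records; stand-in records are used so the example runs without plait.py installed.

```python
import io
import json


def write_jsonl(records, fp):
    """Write an iterable of dict-like records to fp, one JSON object per line."""
    count = 0
    for record in records:
        # default=str is a fallback for values json cannot serialize natively
        fp.write(json.dumps(record, default=str) + "\n")
        count += 1
    return count


# Stand-in records; with plait.py installed this could instead be
#   t = plaitpy.Template("templates/timestamp/uniform.yaml")
#   records = t.gen_records(10)
# assuming gen_records returns a list of dicts.
records = [{"id": 1}, {"id": 2}]
buf = io.StringIO()
written = write_jsonl(records, buf)
```

Passing an open file object instead of the `io.StringIO` buffer writes the same lines to disk.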

looking up faker fields

plait.py also simplifies looking up faker fields:

    # list faker namespaces
    plait.py --list
    # lookup faker namespaces
    plait.py --lookup name
    # lookup faker keys
    # (-ll is short for --lookup)
    plait.py --ll name.suffix

documentation

yaml file commands

  • see docs/FORMAT.md

datasets

  • see docs/EXAMPLES.md
  • also see templates/ dir

troubleshooting

  • see docs/TROUBLESHOOTING.md

Dependent Markov Processes

To simulate data that comes from many Markov processes (a Markov ecosystem),
see the plaitpy-ipc repository.

future direction

If you have ideas for features to add, open an issue. Feedback is appreciated!

License

MIT