Project author: evilmartians

Project description:
A gem for creating partial anonymized dumps of your database using your app model relations.
Language: Ruby
Project URL: git://github.com/evilmartians/evil-seed.git
Created: 2017-04-14T07:20:07Z
Project community: https://github.com/evilmartians/evil-seed

License: MIT License


EvilSeed

EvilSeed is a tool for creating partial, anonymized dumps of your database based on your app models.


Sponsored by Evil Martians

Motivation

Using production-like data in your staging environment could be very useful, especially for debugging intricate production bugs.

The easiest way to achieve this is to use production database backups. But that’s not an option for rather large applications for two reasons:

  • a production dump can be extremely large, so it cannot be dumped and restored in a reasonable time

  • sensitive data has to be protected (anonymization).

EvilSeed aims to solve these problems.

Installation

Add this line to your application’s Gemfile:

    gem 'evil-seed', require: false

And then execute:

    $ bundle

Or install it yourself as:

    $ gem install evil-seed

Usage

Configuration

    require 'evil_seed'

    EvilSeed.configure do |config|
      # First, you should specify +root models+ and their +constraints+ to limit the number of dumped records:
      # This is like Forum.where(featured: true).all
      config.root('Forum', featured: true) do |root|
        # You can limit the number of records to be dumped
        root.limit(100)
        # Specify order for records to be selected for dump
        root.order(created_at: :desc)

        # It's possible to remove some associations from dumping with a pattern of the association path to exclude
        #
        # Association path is a dot-delimited string of association chain starting from model itself:
        # example: "forum.users.questions"
        root.exclude(/\btracking_pixels\b/, 'forum.popular_questions', /\Aforum\.parent\b/)

        # Include back only certain association chains
        root.include(parent: {questions: %i[answers votes]})
        # which is the same as
        root.include(/\Aforum(\.parent(\.questions(\.(answers|votes))?)?)?\z/)

        # You can also specify custom scoping for associations
        root.include(questions: { answers: :reactions }) do
          order(created_at: :desc) # Any ActiveRecord query method is allowed
        end

        # It's possible to limit the number of has_many and has_one records included in the dump for every association
        # Note that belongs_to records for all not excluded associations are always dumped to keep referential integrity.
        root.limit_associations_size(100)

        # Or for certain association only
        root.limit_associations_size(5, 'forum.questions')
        root.limit_associations_size(15, 'forum.questions.answers')
        # or
        root.limit_associations_size(5, :questions)
        root.limit_associations_size(15, questions: :answers)

        # Limit the depth of associations to be dumped from the root level
        # All traverses through has_many, belongs_to, etc are counted
        # So forum.subforums.subforums.questions.answers will be 5 levels deep
        root.limit_deep(10)
      end

      # Everything you can pass to +where+ method will work as constraints:
      config.root('User', 'created_at > ?', Time.current.beginning_of_day - 1.day)

      # For some system-wide models you may omit constraints to dump all records
      config.root("Role") do |root|
        # Exclude everything
        root.exclude(/.*/)
      end

      # Transformations allow you to change dumped data, e.g. to hide sensitive information
      config.customize("User") do |u|
        # Reset password for all users to the same for ease of debugging on developer's machine
        u["encrypted_password"] = encrypt("qwerty")
        # Reset or mutate other attributes at your convenience
        u["metadata"].merge!("foo" => "bar")
        u["created_at"] = Time.current
        # Please note that here you have only a hash of record attributes, not the record itself!
      end

      # Anonymization is a handy DSL for transformations allowing you to transform model attributes in declarative fashion
      # Please note that model setters will NOT be called: results of the blocks will be assigned to the attributes directly
      config.anonymize("User") do
        name  { Faker::Name.name }
        email { Faker::Internet.email }
        login { |login| "#{login}-test" }
      end

      # You can ignore columns for any model. This is especially useful when working
      # with encrypted columns.
      #
      # This will remove the columns even if the model is not a root node and is
      # dumped via an association.
      config.ignore_columns("Profile", :name)

      # Disable foreign key nullification for records that are not included in the dump
      # By default, EvilSeed will nullify foreign keys for records that are not included in the dump
      config.dont_nullify = true

      # Unscope relations to include soft-deleted records etc
      # This is useful when you want to include all records, including those that are hidden by default
      # By default, EvilSeed will abide by the default scope of models
      config.unscoped = true

      # Verbose mode will print out the progress of the dump to the console along with writing the file
      # By default, verbose mode is off
      config.verbose = true
      config.verbose_sql = true
    end
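
The note that customizations receive only a hash of attributes, and that model setters are not called, can be illustrated with a small stand-alone sketch. The `AttributeRules` class below is hypothetical and is not part of EvilSeed; it only mimics the shape of the `anonymize` DSL. Each rule is a block stored under an attribute name, and its result is written straight into a copy of the hash. Fixed strings are used in place of Faker so the sketch has no dependencies.

```ruby
# Hypothetical stand-in for a declarative anonymizer; NOT EvilSeed's implementation.
class AttributeRules
  def initialize(&definition)
    @rules = {}
    instance_eval(&definition)
  end

  # Apply every rule to a copy of the attributes hash.
  # Each block receives the current value and returns the replacement.
  def call(attributes)
    @rules.each_with_object(attributes.dup) do |(column, rule), result|
      result[column] = rule.call(result[column]) if result.key?(column)
    end
  end

  private

  # `email { ... }` inside the definition block registers a rule for "email".
  def method_missing(column, *args, &rule)
    return super unless rule && args.empty?

    @rules[column.to_s] = rule
  end

  def respond_to_missing?(column, include_private = false)
    @rules.key?(column.to_s) || super
  end
end

rules = AttributeRules.new do
  email { 'user@example.com' }
  login { |login| "#{login}-test" }
end

# A plain hash of attributes, as a customization block would see it.
row = { 'id' => 1, 'email' => 'real.person@corp.example', 'login' => 'alice' }
anonymized = rules.call(row)
puts anonymized.inspect
```

Because the rules operate on plain hashes, anything that normally happens in model setters or callbacks (type casting, encryption, normalization) would have to be done inside the block itself.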

Creating dump

Just call the #dump method and pass a path where you want your SQL dump file to appear!

    require 'evil_seed'
    EvilSeed.dump('path/to/new_dump.sql')

Caveats, tips, and tricks

  1. Specify roots for dictionaries and system-wide models like Role at the top, without constraints and with all associations excluded.

  2. Use exclude aggressively. You will be amazed at how densely connected your app's model graph is. Combined with the fact that this gem traverses associations depth-first, this sometimes leads to unwanted results: some records will get into the dump even if you don't want them.

  3. Look at the resulting dump: it contains some useful debug comments.
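
The second tip can be made concrete with a toy model of depth-first traversal. The sketch below does not use EvilSeed; the association graph, the `association_paths` helper, and the `max_levels` parameter are all hypothetical, and real traversal also has to deal with records and cycles, which are ignored here. It only shows how quickly paths multiply in a connected graph and how exclude patterns prune whole branches.

```ruby
# Hypothetical association graph: model => { association name => target model }.
ASSOCIATIONS = {
  'Forum' => { 'questions' => 'Question', 'users' => 'User' },
  'Question' => { 'answers' => 'Answer', 'author' => 'User' },
  'Answer' => { 'author' => 'User' },
  'User' => { 'forums' => 'Forum', 'tracking_pixels' => 'TrackingPixel' },
  'TrackingPixel' => {}
}.freeze

# Collect association paths depth-first, skipping excluded paths and stopping
# once a path is max_levels segments long (the root itself counts as level 1).
def association_paths(model, path, exclude: [], max_levels: 3, found: [])
  found << path
  return found if path.split('.').size >= max_levels

  ASSOCIATIONS.fetch(model, {}).each do |name, target|
    child_path = "#{path}.#{name}"
    next if exclude.any? { |pattern| pattern.match?(child_path) }

    association_paths(target, child_path, exclude: exclude, max_levels: max_levels, found: found)
  end
  found
end

everything = association_paths('Forum', 'forum')
pruned = association_paths('Forum', 'forum', exclude: [/\btracking_pixels\b/, /\.forums\b/])

puts "Without excludes: #{everything.join(', ')}"
puts "With excludes:    #{pruned.join(', ')}"
```

Even in this tiny graph, users reached through a forum lead back to other forums and to tracking pixels, which is the kind of branch an exclude pattern is meant to cut.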

Database compatibility

This gem has been tested against:

  • PostgreSQL: any version that works with ActiveRecord should work
  • MySQL: any version that works with ActiveRecord should work
  • SQLite: 3.7.11 or newer is required (with support for inserting multiple rows at a time)

Restoring dump

The resulting dump is a plain SQL file: you can restore it using any SQL client like psql, mysql, sqlite3, etc.

If you need to do it from Ruby, you can use the following code:

    ActiveRecord::Base.connection.execute(File.read('path/to/new_dump.sql'))

Restoration tips and tricks

  1. Reset primary key sequences after restoration, so default seeds can be generated afterwards and your app will work as expected:

        ActiveRecord::Base.connection.tables.each do |table|
          ActiveRecord::Base.connection.reset_pk_sequence!(table)
        end

  2. To restore dumps with circular dependencies between records in PostgreSQL you can make all foreign keys deferrable beforehand (by default they are not) and restore the dump in a transaction with all foreign keys deferred.

    Code to defer, restore, undefer:

    ```ruby
    connection = ActiveRecord::Base.connection

    # Convert all foreign keys to deferrable to handle circular dependencies
    connection.transaction do
      connection.tables.each do |table|
        connection.foreign_keys(table).each do |fk|
          connection.execute <<~SQL.squish
            ALTER TABLE #{connection.quote_table_name(table)}
            ALTER CONSTRAINT #{connection.quote_table_name(fk.options[:name])}
            DEFERRABLE INITIALLY IMMEDIATE
          SQL
        end
      end
    end

    # Load the dump (filepath is the path to the SQL dump file)
    connection.transaction do
      connection.execute("SET CONSTRAINTS ALL DEFERRED")
      connection.execute(File.read(filepath))
    end

    # Convert all foreign keys back to not deferrable
    # See https://begriffs.com/posts/2017-08-27-deferrable-sql-constraints.html#reasons-not-to-defer
    connection.transaction do
      connection.tables.each do |table|
        connection.foreign_keys(table).each do |fk|
          connection.execute <<~SQL.squish
            ALTER TABLE #{connection.quote_table_name(table)}
            ALTER CONSTRAINT #{connection.quote_table_name(fk.options[:name])}
            NOT DEFERRABLE
          SQL
        end
      end
    end
    ```
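
For the first tip above, note that `reset_pk_sequence!` comes from the PostgreSQL adapter, so a script shared between databases may want to check for it first. The helper below is a sketch under that assumption; `reset_pk_sequences` and `FakeConnection` are hypothetical names, and the fake connection exists only so the example runs without a database. In an application you would pass `ActiveRecord::Base.connection` instead.

```ruby
# Reset primary key sequences only when the adapter supports it.
# `connection` is expected to behave like ActiveRecord::Base.connection.
def reset_pk_sequences(connection)
  return [] unless connection.respond_to?(:reset_pk_sequence!)

  connection.tables.each { |table| connection.reset_pk_sequence!(table) }
end

# Stand-in connection so the sketch runs without a database (hypothetical).
class FakeConnection
  attr_reader :tables, :reset_tables

  def initialize(tables)
    @tables = tables
    @reset_tables = []
  end

  def reset_pk_sequence!(table)
    @reset_tables << table
  end
end

fake = FakeConnection.new(%w[forums users])
reset_pk_sequences(fake)
puts "Sequences reset for: #{fake.reset_tables.join(', ')}"
```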

FIXME (help wanted)

  1. has_and_belongs_to_many associations are traversed in a way that is not intuitive for the end user:

    The association path for User.has_and_belongs_to_many :roles is user.users_roles.role, but it should be user.roles.

  2. Test coverage is poor

  3. Some internal refactoring is required
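
Until the first item is addressed, patterns for has_and_belongs_to_many associations presumably have to be written against the join-table path. The check below uses only plain Ruby regular expressions and the path quoted in that item; it does not call EvilSeed, and both patterns are illustrative.

```ruby
# Path reported above for User.has_and_belongs_to_many :roles
actual_path = 'user.users_roles.role'

# What one might write first, based on the association name.
intuitive_pattern = /\Auser\.roles\b/
# A pattern that accounts for the join-table hop.
join_aware_pattern = /\Auser\.users_roles(\.role)?\b/

puts "intuitive pattern matches:  #{intuitive_pattern.match?(actual_path)}"
puts "join-aware pattern matches: #{join_aware_pattern.match?(actual_path)}"
```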

Standalone usage

If you want to use it as a standalone application, you can place everything in a single file like this:

    #!/usr/bin/env ruby

    require 'bundler/inline'

    gemfile do
      source 'https://rubygems.org'
      gem 'activerecord'
      gem 'evil-seed'
      gem 'mysql2'
    end

    # Describe your database layout with ActiveRecord models.
    # See http://guides.rubyonrails.org/active_record_basics.html
    class Category < ActiveRecord::Base
      has_many :translations, class_name: "Category::Translation"
    end

    class Category::Translation < ActiveRecord::Base
      belongs_to :category, inverse_of: :translations
    end

    # Configure evil-seed itself
    EvilSeed.configure do |config|
      config.root("Category", "id < ?", 1000)
    end

    # Connect to your database.
    # See http://guides.rubyonrails.org/configuring.html#configuring-a-database
    ActiveRecord::Base.establish_connection(ENV.fetch("DATABASE_URL"))

    # Create dump in dump.sql file in the same directory as this script
    EvilSeed.dump(File.join(__dir__, "dump.sql").to_s)

And launch it like so:

    DATABASE_URL=mysql2://user:pass@host/db ruby path/to/your/script.rb

Development

After checking out the repo, run bin/setup to install dependencies. Then, run rake test to run the tests. You can also run bin/console for an interactive prompt that will allow you to experiment.

To install this gem onto your local machine, run bundle exec rake install. To release a new version, update the version number in version.rb, and then run bundle exec rake release, which will create a git tag for the version, push git commits and tags, and push the .gem file to rubygems.org.

Contributing

Bug reports and pull requests are welcome on GitHub at https://github.com/palkan/evil-seed.

License

The gem is available as open source under the terms of the MIT License.