项目作者: IstatCooperation

项目描述 :
CSPro2sql offers a set of functionalities to store CSPro data in a relational database
高级语言: Java
项目地址: git://github.com/IstatCooperation/CSPro2sql.git
创建时间: 2019-05-23T10:10:22Z
项目社区:https://github.com/IstatCooperation/CSPro2sql

开源协议:European Union Public License 1.1

下载


" class="reference-link">CSPro2Sql New release

CsPro2Sql is a Java application to migrate questionnaires from CsPro 7.0 to a MySQL database.

The MySQL database will contain the microdata ie. a column per each variable (Item) defined in the CsPro-Dictionary.

Requirements

Environment:

  • Java 1.7+
  • MySQL 5.7+

Libraries:

Installation

CsPro2Sql is simple to install: all you need is to download and unzip the CsPro2Sql.zip. Depending on your system execute from the command line CsPro2Sql.bat or CsPro2Sql.sh.

" class="reference-link">Database configuration New release

This new release of the software solves loader engine performance issues. To get the best performances from your InnoDB you should set the following properties in your my.ini (or my.cnf):

  1. innodb_flush_log_at_trx_commit=2
  2. innodb_log_buffer_size=128M
  3. innodb_buffer_pool_size=512M
  4. innodb_log_file_size=256M

The first setting will strongly impact the performance on the loaded engine!

In the db folder there are two example files for Mysql 5.7 and Mysql 8.0

Usage

CsPro2Sql is composed of several engines (run CsPro2Sql to get usage info):

  1. CsPro2Sql -e generate -p SURVEY_NAME [-hh HOUSEHOLD_QUEST] [-l LISTING_QUEST] [-ea EA_QUEST]
  2. CsPro2Sql -e scan -p PROPERTIES_FILE
  3. CsPro2Sql -e schema -p PROPERTIES_FILE [-fk] [-o OUTPUT_FILE]
  4. CsPro2Sql -e loader -p PROPERTIES_FILE [-a] [-cc] [-co] [-f|-r] [-o OUTPUT_FILE]
  5. CsPro2Sql -e monitor -p PROPERTIES_FILE [-o OUTPUT_FILE]
  6. CsPro2Sql -e update -p PROPERTIES_FILE
  7. CsPro2Sql -e status -p PROPERTIES_FILE
  8. CsPro2Sql -e territory -p PROPERTIES_FILE
  9. CsPro2Sql -e LU -p PROPERTIES_FILE
  10. CsPro2Sql -e connection -p PROPERTIES_FILE

Engines description:

  • generate: generates a cspro2sql project (files and folders needed to execute cspro2sql engines)
  • scan: check input data, metadata, territory structure and database connections
  • schema: to create the microdata MySQL script
  • loader: to transfer data from the CsPro 7.0 database to the microdata MySQL database
  • monitor: to create the dashboard MySQL script
  • update: to update the dashboard report data
  • status: to check the loader engine status
  • territory: generates the territory table and uploads data from territory.csv file
  • LU: load & update (invoked the loader & update engines)
  • connection: tests source/destination database connection

Parameters:

  1. -a,--all transfer all the questionnaires
  2. -cc,--check-constraints perform constraints check
  3. -co,--check-only perform only constraints check (no data transfer)
  4. -e,--engine <arg> select engine: [loader|schema|monitor|update|status]
  5. -ea,--enum area <arg> name of enumeration area dictionary file
  6. -f,--force skip check of loader multiple running instances
  7. -fk,--foreign-keys create foreign keys to value sets
  8. -h,--help display this help
  9. -hh,--household <arg> name of household dictionary file (dafault value is 'household')
  10. -l,--listing <arg> name of listing dictionary file
  11. -o,--output <arg> name of the output file
  12. -p,--properties <arg> properties file
  13. -r,--recovery recover a broken session of the loader
  14. -v,--version print the version of the programm

Execution steps

new engine Engine generate

Suppose that you are using CSPro to manage data collection process in your pilot survey and that you have two dictionaries (household.dcf, listing.dcf). In order to setup a cspro2sql project to manage your pilot survey data, execute the following command:

  1. > cspro2sql -e generate -s pilot -hh household -l listing

Cspro2sql will generate a set of files and folders to support the configuration activities.
The output of the command will be:

  1. Starting generation of project pilot
  2. Created folder pilot
  3. Created folder pilot/dictionary
  4. Created folder pilot/territory
  5. Created file pilot/pilot.properties
  6. Created file pilot/README.txt
  7. Created file pilot/dictionary/household_template.dcf
  8. Created file pilot/dictionary/listing_template.dcf
  9. Created file pilot/dictionary/README.txt
  10. Created file pilot/territory/territory_template.csv
  11. Created file pilot/territory/README.txt
  12. Project pilot successfully created.
  13. Now you are ready to start processing your data!
  14. Please open the file pilot/README.txt

The README.txt file in the root folder of the project, provides a step by step guide.

The files Household_template.dcf and Listing_template.dcf, in the dictionary folder, provide examples on cspro2sql metadata (a detailed description in provided in section Metadata).

The file territory_template.dcf, in the territory folder, provides examples on the territory data (a detailed description in provided in section Territory).

new engine Engine scan

At the end of the [PRELIMINARY STEPS] described in the README.txt, execute the scan engine:

  1. > cspro2sql -e scan -p test/test.properties

If you have set everything according to the step-by-step guide, you should get the following output:

  1. Starting property file scan...
  2. [DICTIONARIES]
  3. - File popstandashboard/dictionary/household.dcf OK
  4. - File popstandashboard/dictionary/listing.dcf OK
  5. - File popstandashboard/dictionary/carto.dcf OK
  6. [METADATA]
  7. Tag #household OK (HOUSEHOLD)
  8. Tag #listing OK (LISTING)
  9. Tag #expected OK (CARTOGRAPHY)
  10. Tag #individual OK (HOUSEHOLD)
  11. Tag #age OK (HOUSEHOLD)
  12. Tag #sex OK (HOUSEHOLD)
  13. Tag #religion MISSING
  14. Tag #expectedQuestionnaires OK (GEOCODES_DICT)
  15. Tag #lat OK (LISTING)
  16. Tag #lon OK (LISTING)
  17. Tag #territory OK (HOUSEHOLD, LISTING, CARTOGRAPHY)
  18. [TERRITORY]
  19. - File popstandashboard/territory/territory.csv: OK
  20. Parsing territory structure...
  21. #Dictionary
  22. PROVINCE[Province] -> DISTRICT[District] -> EA[EA]
  23. #Territory file
  24. Province -> Province_NAME -> District -> District_NAME -> EA -> EA_NAME
  25. Territory file matches metadata. It is possible to generate the territory table!
  26. [Database]
  27. Connecting to jdbc:mysql://localhost:3307/csweb
  28. Connection successful!
  29. Connecting to jdbc:mysql://localhost:3307/dashboard
  30. Connection successful!
  31. ...scanning completed!

Engine schema & loader

Now you are ready to generate the microdata database and store CSPro data.

  1. > cspro2sql -e schema -p Household.properties o microdata.sql
  2. > mysql -u dstUsername -p < microdata.sql
  3. > cspro2sql -e loader -p Household.properties cc

Engine territory

Generate and populate the territory table.

  1. > cspro2sql -e territory -p test/test.properties

Engine monitor & update

Generate report tables and calculate reports.

  1. > cspro2sql -e monitor -p test/test.properties -o test/dashboard_report.sql
  2. > mysql -u dstUsername -p < test/dashboard_report.sql
  3. > cspro2sql -e update -p test/test.properties

To monitor the loader activity run:

  1. > CsPro2Sql -e status -p Household.properties

Metadata

In order to generate dashboard reports it is necessary to add metadata to CSPro dictionaries. Metadata are classified in:

  • dictionary: these metadata are used to mark dictionaries (household, listing, ea) and to mark individual record
  • variable: these metadata are used to mark variables (i.e. sex, age, latitude, longitude, etc.)
  • territory: these metadata are used to mark the territory structure

The list of metadata is provided below:

Dictionary metadata

  • household: use this tag to mark the household dictionary [MANDATORY]
  • individual: use this tag to mark the individual table [MANDATORY]
  • listing: use this tag to mark the listing dictionary
  • expected: use this tag to mark the EA code dictionary

Variable metadata

  • age: use this tag to mark the age variable. It is also necessary to specify the range of variable [MANDATORY]
  • sex: use this tag to mark the sex variable. It is also necessary to mark in the valueset the Male/Female values [MANDATORY]
  • religion: use this tag to mark the religion variable
  • expectedQuestionnaires: use this tag to mark the expected households from cartograhpy
  • lat: use this tag to mark the latitude of the household
  • lon: use this tag to mark the longitude of the household

Territory metadata

The territory metadata allow to specify the territorial hierarchy. Suppose that your hierarchy is the following:

Region -> Province -> Commune -> EA

Further let us suppose that in your Household dictionary the variables related to your territory structure are:

  1. ID101 -> Region
  2. ID102 -> Province
  3. ID103 -> Commune
  4. ID104 -> EA

In order to bind variables and territorial hiedarchy it is necessary to add the following notes (check the Household_template.dcf file):

  1. [Item]
  2. Label=101 Region
  3. Name=ID101
  4. Note=#territory[Region]
  5. [Item]
  6. Label=102 Province
  7. Name=ID102
  8. Note=#territory[Province, ID101]
  9. [Item]
  10. Label=103 Commune
  11. Name=ID103
  12. Note=#territory[Commune, ID102]
  13. [Item]
  14. Label=104 EA
  15. Name=ID104
  16. Note=#territory[EA, ID103]

The territory.csv file should have the following columns:

  1. Region; Region_NAME; Province; Province_NAME; Commune; Commune_NAME; EA; EA_NAME

Configuration

In order to run CsPro2Sql engines it is necessary to configure a properties file. Such file must contain the following properties:

  1. #[CSPro] List of CSPro dictionaries (household, freshlist, EA)
  2. dictionary=test/dictionary/household.dcf
  3. #[Dashboard] Table prefixes in Dashboard database (household, freshlist, EA)
  4. dictionary.prefix=h
  5. #[Territory] File containing territory data (codes, values)
  6. territory=test/territory/territory.csv
  7. #[CSPro] Specify CSWEB database connection parameters
  8. db.source.server=localhost
  9. db.source.port=3307
  10. db.source.schema=bdcivcensus2019
  11. db.source.username=root
  12. db.source.password=root
  13. #[Dashboard] Specify Dashboard database connection parameters
  14. db.dest.server=localhost
  15. db.dest.port=3307
  16. db.dest.schema=test2
  17. db.dest.username=root
  18. db.dest.password=root

Optional properties are:

  • multiple.response: list of items to be considered as a multiple answer (comma separated)
  • ignore.items: list of items to be ignored (comma separated)

Note: the source CsPro 7.0 database and the microdata MySQL could be the same

Example of properties file (eg. household.properties):

  1. #[CSPro] List of CSPro dictionaries (household, freshlist, EA)
  2. dictionary=survey/dictionary/household.dcf, survey/dictionary/listing.dcf
  3. #[Dashboard] Table prefixes in Dashboard database (household, freshlist, EA)
  4. dictionary.prefix=h,l
  5. #[Territory] File containing territory data (codes, values)
  6. territory=test/territory/territory.csv
  7. #[CSPro] Specify CSWEB database connection parameters
  8. db.source.server=localhost
  9. db.source.port=3307
  10. db.source.schema=csweb
  11. db.source.username=root
  12. db.source.password=root
  13. #[Dashboard] Specify Dashboard database connection parameters
  14. db.dest.server=localhost
  15. db.dest.port=3307
  16. db.dest.schema=dashboard
  17. db.dest.username=root
  18. db.dest.password=root

Warnings

  • The CsPro tag [Relation] is ignored
  • A ValueSet with more than 1000 elements is ignored (the threshold will be parameterized in future realesed)

Acknowledgement

The team responsible of Census and Survey Processing System (CSPro).

The first release of cspro2sql has been developed in the framework of the Capacity building project in Ethiopia (fourth Population and Housing Census), funded by AICS

License

CSPro2Sql is EUPL-licensed