Project author: danielgribel

Project description:
Exploratory data analysis of the Cadastro Unico dataset (2018): pre-processing, basic statistics, data visualization, and variable correlation.
Primary language: Jupyter Notebook
Project URL: git://github.com/danielgribel/CadastroUnico.git
Created: 2021-05-06T20:52:38Z
Project community: https://github.com/danielgribel/CadastroUnico



CadastroUnico: Exploratory data analysis

In this notebook, we explore the 2018 Cadastro Unico dataset. The Cadastro Unico of the Federal Government’s Social Programs records information about low-income families, including details on their income, homes, and other socioeconomic indicators. The 2018 dataset consists of de-identified samples, i.e., the records are published in a form that protects personal information.

The Cadastro Unico datasets contain 30 variables related to families and 34 related to individuals. Data from 2012 to 2018 are available on the Ministry of Citizenship webpage:

https://aplicacoes.mds.gov.br/sagi/portal/index.php?grupo=212

Scope and objective. This notebook analyzes only the family-level data (more than 4 million entries); we plan to examine it together with the individuals’ dataset in the future. The main goal is to provide an overview of the dataset, highlighting the most prominent findings through data visualization and basic statistics. The notebook can therefore serve as a first step toward a more detailed analysis.
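
As a rough illustration of the "basic statistics" step, the sketch below streams a delimited file once and collects the row count, the number of empty cells per column, and the mean of each numeric column. It is not the notebook's actual code. The function name `summarize_csv`, the `;` delimiter, and the column names in the sample (`family_id`, `income`, `members`) are all assumptions made for the example and are not taken from the Cadastro Unico files.

```python
import csv
import io
from collections import defaultdict


def summarize_csv(lines, delimiter=";"):
    """Stream delimited text once and collect simple per-column statistics.

    `lines` is any iterable of text lines (an open file, a StringIO, ...).
    The delimiter is an assumption; adjust it to the real file.
    """
    reader = csv.DictReader(lines, delimiter=delimiter)
    n_rows = 0
    missing = defaultdict(int)    # column -> number of empty cells
    totals = defaultdict(float)   # column -> sum of numeric cells
    numeric = defaultdict(int)    # column -> number of numeric cells

    for row in reader:
        n_rows += 1
        for column, value in row.items():
            if column is None:
                continue  # extra fields beyond the header are ignored
            value = (value or "").strip()
            if not value:
                missing[column] += 1
                continue
            try:
                totals[column] += float(value)
                numeric[column] += 1
            except ValueError:
                pass  # non-numeric (e.g. categorical) cell

    means = {column: totals[column] / numeric[column] for column in numeric}
    return {"rows": n_rows, "missing": dict(missing), "means": means}


# Tiny in-memory sample with made-up column names (hypothetical, for illustration).
sample = io.StringIO(
    "family_id;income;members\n"
    "1;100.0;3\n"
    "2;;4\n"
    "3;250.0;\n"
)
summary = summarize_csv(sample)
print(summary["rows"], summary["missing"], summary["means"])
```

Because the file is read row by row, the same function can be pointed at an open file handle for a table with several million entries without loading it fully into memory.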