Data Analysis Project in R Using Linear Models, Regression, and Data Visualization
Problem Statement : A nationwide survey of hospital costs conducted by the US Agency for Healthcare
consists of hospital records of inpatient samples. The given data is restricted to
the state of Wisconsin and relates to patients in the age group 0-17 years. The
agency wants to analyze the data to study healthcare costs and their
utilization.
About the Dataset used :
A detailed description of the given dataset:
AGE : Age of the patient discharged
FEMALE : Binary variable that indicates if the patient is female
LOS : Length of stay, in days
RACE : Race of the patient (specified numerically)
TOTCHG : Hospital discharge costs
APRDRG : All Patient Refined Diagnosis Related Groups
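
As a rough illustration of how a table with these six columns might be prepared in R, the sketch below builds a tiny data frame by hand. The rows are invented purely to mimic the column layout and are not taken from the survey; the name `hosp` and the choice to drop incomplete rows and treat FEMALE and RACE as factors are assumptions, not the project's confirmed preprocessing.

```r
# Hypothetical toy rows that only mimic the column layout described above;
# every value here is invented for illustration.
hosp <- data.frame(
  AGE    = c(17, 17, 0, 0, 1, 10),
  FEMALE = c(1, 0, 1, 0, 1, 0),
  LOS    = c(2, 2, 7, 1, 3, 4),
  RACE   = c(1, 1, 1, NA, 2, 1),
  TOTCHG = c(2660, 1689, 20060, 736, 1194, 3305),
  APRDRG = c(560, 753, 930, 758, 754, 347)
)

# Inspect column types and count missing values per column.
str(hosp)
missing_per_column <- colSums(is.na(hosp))

# Assumption: rows with any missing value are simply dropped.
hosp_complete <- na.omit(hosp)

# Assumption: FEMALE and RACE are codes, so they are stored as factors
# rather than as numbers.
hosp_complete$FEMALE <- factor(hosp_complete$FEMALE)
hosp_complete$RACE   <- factor(hosp_complete$RACE)
```

Storing the coded columns as factors keeps later models from treating a race code of 2 as "twice" a race code of 1.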
Objective / Goal Statement Achieved :
1. To record the patient statistics, the agency wants to find the age category
   of patients that visits the hospital most often and has the highest
   expenditure.
2. To gauge the severity of diagnoses and treatments and to identify the
   expensive treatments, the agency wants to find the diagnosis-related group
   that has the most hospitalizations and the highest expenditure.
3. To make sure that there is no malpractice, the agency needs to analyze whether
   the race of the patient is related to the hospitalization costs.
4. To allocate resources properly, the agency has to analyze how hospital costs
   vary with age and gender.
5. Since the length of stay is the crucial factor for inpatients, the agency wants
   to find out whether the length of stay can be predicted from age, gender, and race.
6. To perform a complete analysis, the agency wants to find the variable that
   has the greatest effect on hospital costs.
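
One way the six goals above might be approached with base R is sketched below. The data frame is synthetic (random values with a fixed seed, with charges simulated to depend mostly on length of stay), so the numbers it produces say nothing about the real survey. The use of `table`, `aggregate`, `aov`, and `lm`, and all object names, are assumptions chosen for illustration rather than the project's confirmed method.

```r
# Synthetic stand-in for the survey data: invented values, fixed seed.
set.seed(42)
n <- 60
hosp <- data.frame(
  AGE    = sample(0:17, n, replace = TRUE),
  FEMALE = factor(sample(0:1, n, replace = TRUE)),
  LOS    = sample(1:10, n, replace = TRUE),
  RACE   = factor(sample(1:3, n, replace = TRUE)),
  APRDRG = sample(c(640, 753, 754), n, replace = TRUE)
)
# Simulated charges, driven mainly by LOS so the models have some signal.
hosp$TOTCHG <- 500 + 700 * hosp$LOS + 50 * hosp$AGE + rnorm(n, sd = 200)

# Age category with the most visits and with the highest total charges.
visits_by_age  <- table(hosp$AGE)
charges_by_age <- aggregate(TOTCHG ~ AGE, data = hosp, FUN = sum)
busiest_age    <- names(which.max(visits_by_age))
costliest_age  <- charges_by_age$AGE[which.max(charges_by_age$TOTCHG)]

# Diagnosis-related group with the most visits and the highest total charges.
visits_by_drg  <- table(hosp$APRDRG)
charges_by_drg <- aggregate(TOTCHG ~ APRDRG, data = hosp, FUN = sum)
busiest_drg    <- names(which.max(visits_by_drg))
costliest_drg  <- charges_by_drg$APRDRG[which.max(charges_by_drg$TOTCHG)]

# Is race related to charges? One-way ANOVA; a large p-value gives no
# evidence of a difference in mean charges between race groups.
race_aov <- aov(TOTCHG ~ RACE, data = hosp)
race_p   <- summary(race_aov)[[1]][["Pr(>F)"]][1]

# How do charges vary with age and gender? Linear model on both.
cost_age_gender <- lm(TOTCHG ~ AGE + FEMALE, data = hosp)

# Can length of stay be predicted from age, gender, and race?
los_model <- lm(LOS ~ AGE + FEMALE + RACE, data = hosp)
los_r2    <- summary(los_model)$r.squared

# Which variable matters most for charges? Fit all predictors and pick the
# non-intercept term with the smallest p-value.
full_model <- lm(TOTCHG ~ AGE + FEMALE + LOS + RACE + factor(APRDRG),
                 data = hosp)
coef_table <- coef(summary(full_model))
p_values   <- coef_table[-1, "Pr(>|t|)"]
strongest  <- names(p_values)[which.min(p_values)]
```

Ranking terms by p-value is only one rough notion of "mainly affects"; comparing standardized coefficients or the drop in R-squared when a term is removed would be reasonable alternatives.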