Title: | Assesses the Quality of Estimates Made by Complex Sample Designs |
---|---|
Description: | Assesses the quality of estimates made by complex sample designs, following the methodology developed by the National Institute of Statistics of Chile (2020, <https://www.ine.cl/docs/default-source/institucionalidad/buenas-pr%C3%A1cticas/clasificaciones-y-estandares/est%C3%A1ndar-evaluaci%C3%B3n-de-calidad-de-estimaciones-publicaci%C3%B3n-27022020.pdf>) and by the Economic Commission for Latin America and the Caribbean (2024, <https://repositorio.cepal.org/server/api/core/bitstreams/f04569e6-4f38-42e7-a32b-e0b298e0ab9c/content>). |
Authors: | Klaus Lehmann [aut, cre], Ricardo Pizarro [aut], Ignacio Agloni [ctb], Andrea López [ctb], Javiera Preuss [ctb] |
Maintainer: | Klaus Lehmann <[email protected]> |
License: | GPL-3 |
Version: | 0.6.0 |
Built: | 2024-11-16 03:50:11 UTC |
Source: | https://github.com/cran/calidad |
assess
evaluates the quality of estimates (means, proportions, sizes and totals) following the
methodology selected with the scheme argument. The default INE Chile methodology considers sample size,
degrees of freedom and coefficient of variation.
assess( table, publish = FALSE, scheme = c("chile", "eclac_2020", "eclac_2023"), domain_info = FALSE, ... )
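Argument | Description |
---|---|
table | dataframe created by create_mean, create_prop, create_size or create_total |
publish | |
scheme | evaluation methodology: "chile", "eclac_2020" or "eclac_2023" |
domain_info | Logical. If |
... | additional parameters for the evaluation, grouped into: 1. General parameters; 2. chile parameters; 3. CEPAL 2020 parameters; 4. CEPAL 2023 parameters |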
dataframe
with all the columns included in the input table, plus a new column
containing a label indicating the evaluation of each estimate: reliable, bit reliable, or unreliable.
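library(calidad)
# Complex design for the EPF person-level data
dc <- survey::svydesign(ids = ~varunit, strata = ~varstrat, data = epf_personas, weights = ~fe)
# Estimate mean household expenditure by zona and sexo, then evaluate it
assess(create_mean("gastot_hd", domains = "zona+sexo", design = dc))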
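The decision rule behind the default "chile" scheme can be sketched in base R as follows. This is a minimal illustration, not the package's implementation: classify_chile() is a hypothetical helper, and its thresholds (at least 60 observations, at least 9 degrees of freedom, CV cut-offs of 0.15 and 0.30) are assumptions based on the commonly cited INE Chile values, to be checked against the 2020 standard. It also leaves out the separate standard-error rule used for proportions.

```r
# Hypothetical helper (not part of calidad) sketching the "chile" decision rule.
# Assumed thresholds: n >= 60 and df >= 9 are required; then CV <= 0.15 is
# "reliable", CV <= 0.30 is "bit reliable" and anything else is "unreliable".
classify_chile <- function(n, df, cv,
                           min_n = 60, min_df = 9,
                           cv_reliable = 0.15, cv_limit = 0.30) {
  label <- rep("unreliable", length(cv))
  enough_data <- n >= min_n & df >= min_df
  # Assign the weaker label first, then upgrade cells that pass the stricter cut-off
  label[enough_data & cv <= cv_limit] <- "bit reliable"
  label[enough_data & cv <= cv_reliable] <- "reliable"
  label
}

# Made-up estimates for four domains
estimates <- data.frame(
  domain = c("a", "b", "c", "d"),
  n = c(500, 500, 500, 40),
  df = c(30, 30, 30, 30),
  cv = c(0.10, 0.20, 0.40, 0.05)
)
estimates$label <- classify_chile(estimates$n, estimates$df, estimates$cv)
print(estimates)
```

Domain "d" has a very low CV but is still labelled unreliable, because the sample-size requirement is checked before the CV.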
CASEN data for the year 2020. Contains only a few variables.
casen
dataframe with 185,437 rows and 6 columns
household id
1 = man; 2 = woman
age
Economic activity status
Household Income
poverty status: 1 = extreme poverty, 2 = non-extreme poverty, 3 = non-poverty
regional sample weights
strata
PSU
http://observatorio.ministeriodesarrollosocial.gob.cl/encuesta-casen-en-pandemia-2020
data(casen)
Creates an HTML table with the results of the evaluation.
create_html(table)
Argument | Description |
---|---|
table | dataframe returned by assess |
HTML table
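library(calidad)
library(survey)
library(dplyr)
# Keep one row per household (folio)
hogar <- epf_personas %>% group_by(folio) %>% slice(1)
dc <- survey::svydesign(ids = ~varunit, strata = ~varstrat, data = hogar, weights = ~fe)
# Evaluate the proportion estimate for ocupado by zona and sexo
table <- assess(create_prop("ocupado", domains = "zona+sexo", design = dc))
# Render the evaluation as an HTML table
create_html(table)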
create_mean
generates a dataframe
with the following elements: mean,
degrees of freedom, sample size, and coefficient of variation. The estimates can be
grouped by several domains.
create_mean( var, domains = NULL, subpop = NULL, design, ci = FALSE, ess = FALSE, ajuste_ene = FALSE, standard_eval = FALSE, rm.na = FALSE, deff = FALSE, rel_error = FALSE, unweighted = FALSE, eclac_input = FALSE )
Argument | Description |
---|---|
var | numeric variable within the |
domains | domains to be estimated separated by the + character. |
subpop | integer dummy variable to filter the dataframe. |
design | complex design created by |
ci | |
ess | |
ajuste_ene | |
standard_eval | |
rm.na | |
deff | |
rel_error | |
unweighted | |
eclac_input | |
dataframe
that contains the inputs and all domains to be evaluated.
library(calidad)
dc <- survey::svydesign(ids = ~varunit, strata = ~varstrat, data = epf_personas, weights = ~fe)
# Mean household expenditure by zona and sexo
create_mean("gastot_hd", "zona+sexo", design = dc)
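The point estimates in the create_mean output are weighted means within each domain cell. A minimal base-R sketch on made-up toy data (the toy data frame is hypothetical and only reuses the epf_personas column names) shows that computation. It does not produce the design-based standard errors, degrees of freedom or CVs, which need the complex design, for example through survey::svyby(~gastot_hd, ~zona + sexo, dc, survey::svymean).

```r
# Made-up toy data reusing the epf_personas column names
toy <- data.frame(
  zona = c(1, 1, 1, 2, 2, 2),
  sexo = c(1, 2, 2, 1, 1, 2),
  gastot_hd = c(100, 200, 300, 400, 500, 600),
  fe = c(1, 2, 1, 3, 1, 2)
)

# One data frame per zona x sexo cell (a name such as "1.2" means zona = 1, sexo = 2)
cells <- split(toy, list(toy$zona, toy$sexo), drop = TRUE)

# Weighted mean in each cell: sum(fe * gastot_hd) / sum(fe)
domain_means <- vapply(cells, function(d) weighted.mean(d$gastot_hd, d$fe), numeric(1))
print(domain_means)
```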
create_prop
generates a dataframe
with the following elements: proportion (or ratio, when a denominator is supplied),
degrees of freedom, sample size, standard error, and coefficient of variation. The estimates can be
grouped by several domains.
create_prop( var, denominator = NULL, domains = NULL, subpop = NULL, design, ci = FALSE, deff = FALSE, ess = FALSE, ajuste_ene = FALSE, rel_error = FALSE, log_cv = FALSE, unweighted = FALSE, standard_eval = FALSE, eclac_input = FALSE )
Argument | Description |
---|---|
var | numeric variable within the |
denominator | numeric variable within the |
domains | domains to be estimated separated by the + character. |
subpop | integer dummy variable to filter the dataframe. |
design | complex design created by |
ci | |
deff | |
ess | |
ajuste_ene | |
rel_error | |
log_cv | |
unweighted | |
standard_eval | |
eclac_input | |
dataframe
that contains the inputs and all domains to be evaluated.
library(calidad)
library(survey)
library(dplyr)
# Numerator: expenditure only for households in zona 1
epf <- mutate(epf_personas, gasto_zona1 = if_else(zona == 1, gastot_hd, 0))
dc <- svydesign(ids = ~varunit, strata = ~varstrat, data = epf, weights = ~fe)
old_options <- options()
# Treat strata with a single PSU as certainty units
options(survey.lonely.psu = "certainty")
create_prop(var = "gasto_zona1", denominator = "gastot_hd", design = dc)
# ENUSC: keep the Kish-selected respondents (Kish == 1)
enusc <- filter(enusc, Kish == 1)
dc <- svydesign(ids = ~Conglomerado, strata = ~VarStrat, data = enusc, weights = ~Fact_Pers)
options(survey.lonely.psu = "certainty")
create_prop(var = "VP_DC", denominator = "hom_insg_taxi", design = dc)
# Restore the previous options
options(old_options)
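With a denominator, create_prop estimates a ratio: the weighted total of var divided by the weighted total of denominator. The sketch below reproduces only the point estimate of the first example, on hypothetical toy data. survey::svyratio(~gasto_zona1, ~gastot_hd, dc) is the survey function that also returns a standard error; whether create_prop calls it internally is not stated here.

```r
# Made-up toy data reusing the epf_personas column names
toy <- data.frame(
  zona = c(1, 1, 2, 2),
  gastot_hd = c(100, 300, 200, 400),
  fe = c(2, 1, 1, 3)
)
# Numerator, built as in the example: expenditure only for zona 1
toy$gasto_zona1 <- ifelse(toy$zona == 1, toy$gastot_hd, 0)

# Ratio point estimate: weighted total of the numerator / weighted total of the denominator
ratio_hat <- sum(toy$fe * toy$gasto_zona1) / sum(toy$fe * toy$gastot_hd)
print(ratio_hat)
```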
Internal function to calculate proportion estimates.
create_prop_internal( var, domains = NULL, subpop = NULL, disenio, ci = FALSE, deff = FALSE, ess = FALSE, ajuste_ene = FALSE, rel_error = FALSE, log_cv = FALSE, unweighted = FALSE, standard_eval = TRUE, rm.na = FALSE, env = parent.frame() )
Argument | Description |
---|---|
var | integer dummy variable within the |
domains | domains to be estimated separated by the + character. |
subpop | integer dummy variable to filter the dataframe |
disenio | complex design created by |
ci | |
deff | |
ess | |
ajuste_ene | |
rel_error | |
log_cv | |
unweighted | |
standard_eval | |
rm.na | |
env | parent environment to get some variables |
dataframe
that contains the inputs and all domains to be evaluated
Internal function to calculate ratio estimates.
create_ratio_internal( var, denominator, domains = NULL, subpop = NULL, disenio, ci = FALSE, deff = FALSE, ess = FALSE, ajuste_ene = FALSE, unweighted = FALSE, rel_error = FALSE, rm.na = FALSE )
Argument | Description |
---|---|
var | numeric variable within the |
denominator | numeric variable within the |
domains | domains to be estimated separated by the + character. |
subpop | integer dummy variable to filter the dataframe |
disenio | complex design created by |
ci | |
deff | |
ess | |
ajuste_ene | |
unweighted | |
rel_error | |
rm.na | |
dataframe
that contains the inputs and all domains to be evaluated
create_size
generates a dataframe
with the following elements: sum,
degrees of freedom, sample size, and coefficient of variation. The estimates can be
grouped by several domains.
create_size( var, domains = NULL, subpop = NULL, design, ci = FALSE, ess = FALSE, ajuste_ene = FALSE, standard_eval = FALSE, rm.na = FALSE, deff = FALSE, rel_error = FALSE, unweighted = FALSE, df_type = c("chile", "eclac"), eclac_input = FALSE )
Argument | Description |
---|---|
var | numeric variable within the |
domains | domains to be estimated separated by the + character. |
subpop | integer dummy variable to filter the dataframe. |
design | complex design created by |
ci | |
ess | |
ajuste_ene | |
standard_eval | |
rm.na | |
deff | |
rel_error | |
unweighted | |
df_type | |
eclac_input | |
dataframe
that contains the inputs and all domains to be evaluated.
library(calidad)
dc <- survey::svydesign(ids = ~varunit, strata = ~varstrat, data = epf_personas, weights = ~fe)
# Estimated number of employed people by zona and sexo
create_size("ocupado", "zona+sexo", design = dc)
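For a 0/1 variable such as ocupado, the size estimate is the sum of the sample weights of the units where the dummy equals 1. The base-R sketch below illustrates that point estimate on hypothetical toy data (the weights are made up); the variance, degrees of freedom and CV reported by create_size still require the complex design.

```r
# Made-up toy data: ocupado is a 0/1 dummy, fe the sample weight
toy <- data.frame(
  zona = c(1, 1, 1, 2, 2),
  ocupado = c(1, 0, 1, 1, 0),
  fe = c(10, 20, 15, 30, 25)
)

# Estimated population size: sum of the weights of the units with ocupado == 1
size_total <- sum(toy$fe[toy$ocupado == 1])

# The same quantity by domain; multiplying by the dummy drops units with ocupado == 0
size_by_zona <- tapply(toy$fe * toy$ocupado, toy$zona, sum)
print(size_by_zona)
```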
create_total
generates a dataframe
with the following elements: sum,
degrees of freedom, sample size, and coefficient of variation. The estimates can be
grouped by several domains.
create_total( var, domains = NULL, subpop = NULL, design, ci = FALSE, ess = FALSE, ajuste_ene = FALSE, standard_eval = FALSE, rm.na = FALSE, deff = FALSE, rel_error = FALSE, unweighted = FALSE, eclac_input = FALSE )
Argument | Description |
---|---|
var | numeric variable within the |
domains | domains to be estimated separated by the + character. |
subpop | integer dummy variable to filter the dataframe. |
design | complex design created by |
ci | |
ess | |
ajuste_ene | |
standard_eval | |
rm.na | |
deff | |
rel_error | |
unweighted | |
eclac_input | |
dataframe
that contains the inputs and all domains to be evaluated.
library(calidad)
dc <- survey::svydesign(ids = ~varunit, strata = ~varstrat, data = epf_personas, weights = ~fe)
# Total household expenditure by zona and sexo, restricted to employed people
create_total("gastot_hd", "zona+sexo", subpop = "ocupado", design = dc)
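The example restricts the total to the subpopulation subpop = "ocupado". The point estimate amounts to a weighted total in which units outside the subpopulation contribute zero, as in the base-R sketch below on hypothetical toy data. For standard errors, the usual survey practice is to subset the design (for example subset(dc, ocupado == 1)) rather than the data, so that all PSUs and strata are kept in the variance estimate; whether create_total does exactly this is not stated here.

```r
# Made-up toy data reusing the epf_personas column names
toy <- data.frame(
  zona = c(1, 1, 2, 2),
  ocupado = c(1, 0, 1, 1),
  gastot_hd = c(100, 200, 300, 400),
  fe = c(2, 3, 1, 4)
)

# Weighted total of gastot_hd within the subpopulation ocupado == 1, by zona.
# Units outside the subpopulation contribute zero because of the dummy.
totals <- tapply(toy$fe * toy$gastot_hd * toy$ocupado, toy$zona, sum)
print(totals)
```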
This function activates the appropriate indicators based on the selected eclac standard and whether proportion indicators are needed.
eclac_standard(eclac, env = parent.frame(), proportion = FALSE)
Argument | Description |
---|---|
eclac | A logical value indicating the eclac standard. |
env | The environment from which to retrieve the existing indicator values. Defaults to the parent frame. |
proportion | A logical value indicating whether proportion indicators should be turned on. Defaults to FALSE. |
A list of logical values indicating which indicators are turned on.
Reduced version of the ENE database. Contains some sociodemographic variables and the information needed to work with the complex sample design.
ene
dataframe with 87,842 rows and 7 columns
1 = man; 2 = woman
region
Economic activity status
sample weights
PSU
strata
Whether the person belongs to the labour force: 1 = yes; 0 = no
1 = employed; 0 = non-employed
1 = non-employed; 0 = employed
https://www.ine.cl/estadisticas/sociales/mercado-laboral/ocupacion-y-desocupacion
data(ene)
ENUSC data for the year 2019. Contains only a few variables.
enusc
dataframe with 24,465 rows and 22 columns
1 = man; 2 = woman
16 regions
person sample weights
household sample weights
PSU
strata
Individual victimization. Use with the Fact_Pers weights.
Household victimization. Use with the Fact_Hog weights.
age
Perception of increased crime in the country. Use with the Fact_Pers weights.
Cause of increased crime in the neighborhood. Use with the Fact_Pers weights.
Female perception of insecurity inside taxis. Derived from variables P9 and rph_sexo. Use with the Fact_Pers weights.
Male perception of insecurity inside taxis. Derived from variables P9 and rph_sexo. Use with the Fact_Pers weights.
Female perception of insecurity inside buses. Derived from variables P9 and rph_sexo. Use with the Fact_Pers weights.
Male perception of insecurity inside buses. Derived from variables P9 and rph_sexo. Use with the Fact_Pers weights.
Female perception of insecurity inside malls. Derived from variables P9 and rph_sexo. Use with the Fact_Pers weights.
Male perception of insecurity inside malls. Derived from variables P9 and rph_sexo. Use with the Fact_Pers weights.
Female perception of insecurity on public transport. Derived from variables P9 and rph_sexo. Use with the Fact_Pers weights.
Male perception of insecurity on public transport. Derived from variables P9 and rph_sexo. Use with the Fact_Pers weights.
Female perception of insecurity in the neighborhood. Derived from variables P9 and rph_sexo. Use with the Fact_Pers weights.
Male perception of insecurity in the neighborhood. Derived from variables P9 and rph_sexo. Use with the Fact_Pers weights.
data(enusc)
Reduced version of the VIII EPF database. Contains some sociodemographic variables and the information needed to work with the complex sample design.
epf_personas
dataframe with 48,308 observations and 8 variables
1 = male; 2 = female
1 = metropolitan area; 2 = rest of the regional capitals
marital status
sample weights
PSU
strata
household expenditure
1 = employed; 0 = non-employed
https://www.ine.cl/estadisticas/sociales/ingresos-y-gastos/encuesta-de-presupuestos-familiares
data(epf_personas)
Receives a table created with survey and returns the coefficient of variation for each cell.
get_cv(table, design, domains, type_est = "all", env = parent.frame())
Argument | Description |
---|---|
table | |
design | design |
domains | |
type_est | type of estimation: all or size. |
env | parent environment |
dataframe
with results including the CV
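The coefficient of variation of a cell is its standard error divided by its estimate. A minimal base-R sketch on a made-up table follows; with real survey output, the estimates and standard errors of a survey::svyby() result can be extracted with coef() and survey::SE(). Whether get_cv works exactly this way internally is not stated here.

```r
# Made-up table with an estimate and its standard error per domain
tab <- data.frame(zona = c(1, 2), mean = c(250, 500), se = c(25, 100))

# Coefficient of variation: standard error relative to the estimate
tab$cv <- tab$se / tab$mean
print(tab)
```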
Receives data and domains. Returns a data frame with the PSU, strata and degrees of freedom (df) for each cell.
get_df(data, domains, df_type = "eclac")
Argument | Description |
---|---|
data | |
domains | |
df_type | |
dataframe
with results including degrees of freedom
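A common design-based rule, assumed here for the "chile" df_type, sets the degrees of freedom of a domain to the number of distinct PSUs minus the number of distinct strata observed in it. The base-R sketch below applies that rule to hypothetical toy data; df_by_domain() is an illustrative helper, not part of calidad, and the "eclac" variant is not covered.

```r
# Hypothetical helper: degrees of freedom per domain computed as
# (number of distinct PSUs) - (number of distinct strata).
df_by_domain <- function(data, domain, psu, strata) {
  sapply(split(data, data[[domain]]), function(d) {
    # Count (stratum, PSU) pairs so PSU codes reused in different strata are not merged
    n_psu <- nrow(unique(d[c(strata, psu)]))
    n_strata <- length(unique(d[[strata]]))
    n_psu - n_strata
  })
}

# Made-up design variables using the epf_personas column names
toy <- data.frame(
  zona = c(1, 1, 1, 1, 2, 2, 2),
  varstrat = c(1, 1, 2, 2, 3, 3, 3),
  varunit = c(1, 2, 3, 3, 4, 5, 6)
)
df_zona <- df_by_domain(toy, "zona", "varunit", "varstrat")
print(df_zona)
```

In zona 1 there are three PSUs in two strata (df = 1); in zona 2 there are three PSUs in one stratum (df = 2).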
Generates a table with estimates for a given aggregation
get_survey_table( var, domains, complex_design, estimation = "mean", env = parent.frame(), fun, denom = NULL, type_est = "all" )
Argument | Description |
---|---|
var | |
domains | |
complex_design | design from |
estimation | |
env | parent environment |
fun | function required for the estimation |
denom | denominator; used only for ratio estimation |
type_est | type of estimation: all or size |
dataframe
containing main results from survey
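The domains argument of the create_* functions is a string such as "zona+sexo". One plausible way to turn it into the one-sided formula that survey::svyby() expects in its by argument is shown below in base R. This is an illustration, not necessarily how get_survey_table builds it internally.

```r
# Domain string in the format used by the create_* functions
domains <- "zona+sexo"

# One-sided formula, the form survey::svyby() expects in its `by` argument
by_formula <- stats::as.formula(paste0("~", domains))
print(by_formula)

# Names of the grouping variables
domain_vars <- all.vars(by_formula)
print(domain_vars)
```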
quadratic
returns the output of a function defined by INE Chile, evaluated at the value of
a proportion estimated from a sample. The output acts as a threshold for the standard error: if the
standard error is higher than the output of the function, it is interpreted as a signal that the
estimate is not reliable.
quadratic(p)
Argument | Description |
---|---|
p | numeric vector of estimated proportions |
numeric vector
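The standard-error check for proportions can be sketched as follows. Both the threshold formula (p^(2/3) / 9 for p up to 0.5, mirrored as (1 - p)^(2/3) / 9 above 0.5) and the rule that an estimate passes when its standard error does not exceed the threshold are assumptions to verify against the INE 2020 standard; quadratic_sketch() is a hypothetical stand-in, not the package's quadratic().

```r
# Hypothetical re-implementation for illustration; the formula is an assumption:
# threshold = p^(2/3) / 9 for p <= 0.5 and (1 - p)^(2/3) / 9 otherwise.
quadratic_sketch <- function(p) {
  ifelse(p <= 0.5, p^(2 / 3) / 9, (1 - p)^(2 / 3) / 9)
}

p <- c(0.125, 0.875, 0.5)     # estimated proportions (made up)
se <- c(0.020, 0.030, 0.050)  # their standard errors (made up)

threshold <- quadratic_sketch(p)
# The standard error passes the check when it does not exceed the threshold
se_ok <- se <= threshold
print(data.frame(p, se, threshold, se_ok))
```

The threshold is symmetric around 0.5, so 0.125 and 0.875 get the same limit (about 0.028); only the second estimate fails because its standard error is larger.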
Receives the survey table in its raw state and sorts it.
standardize_columns(data, var, denom)
Argument | Description |
---|---|
data | |
var | |
denom | denominator |
dataframe
with standardized data
Renames design variables so that they can be used later.
standardize_design_variables(design)
Argument | Description |
---|---|
design | |
survey design