| Title: | Assesses the Quality of Estimates Made by Complex Sample Designs |
|---|---|
| Description: | Assesses the quality of estimates made by complex sample designs, following the methodology developed by the National Institute of Statistics of Chile (Household Survey Standard 2020, <https://www.ine.cl/docs/default-source/institucionalidad/buenas-pr%C3%A1cticas/clasificaciones-y-estandares/est%C3%A1ndar-evaluaci%C3%B3n-de-calidad-de-estimaciones-publicaci%C3%B3n-27022020.pdf>; Economics Survey Standard 2024, <https://www.ine.gob.cl/docs/default-source/buenas-practicas/directrices-metodologicas/estandares/documentos/est%C3%A1ndar-evaluaci%C3%B3n-de-calidad-de-estimaciones-econ%C3%B3micas.pdf?sfvrsn=201fbeb9_2>) and by the Economic Commission for Latin America and the Caribbean (2020, <https://repositorio.cepal.org/bitstream/handle/11362/45681/1/S2000293_es.pdf>; 2024, <https://repositorio.cepal.org/server/api/core/bitstreams/f04569e6-4f38-42e7-a32b-e0b298e0ab9c/content>). |
| Authors: | Klaus Lehmann [aut, cre], Ricardo Pizarro [aut], Ignacio Agloni [ctb], Andrea López [ctb], Javiera Preuss [ctb] |
| Maintainer: | Klaus Lehmann <[email protected]> |
| License: | GPL-3 |
| Version: | 0.8.2 |
| Built: | 2026-05-13 07:58:04 UTC |
| Source: | https://github.com/cran/calidad |
assess evaluates the quality of mean estimations using the
methodology created by INE Chile, which considers sample size, degrees of freedom, and
coefficient of variation.
assess(table, publish = FALSE,
  scheme = c("chile", "eclac_2020", "eclac_2023", "chile_economics"),
  domain_info = FALSE, low_df_justified = FALSE, table_n_obj = NULL,
  ratio_between_0_1 = TRUE, ...)
table |
|
publish |
|
scheme |
|
domain_info |
Logical. If |
low_df_justified |
Logical. If |
table_n_obj |
Default |
ratio_between_0_1 |
|
... |
additional parameters for the evaluation. The complete list of parameters is: 1. General Parameters
2. chile Parameters
3. CEPAL 2020 Parameters
4. CEPAL 2023 Parameters
5. Chile Economic Survey Standard Parameters
|
dataframe with all the columns included in the input table, plus a new column
containing a label indicating the evaluation of each estimation: reliable, bit reliable, or unreliable.
dc <- survey::svydesign(ids = ~varunit, strata = ~varstrat, data = epf_personas, weights = ~fe)
assess(create_mean("gastot_hd", domains = "zona+sexo", design = dc))
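The description above names three inputs to the evaluation: sample size, degrees of freedom, and coefficient of variation. As a rough illustration of how such a rule can combine them, here is a minimal base R sketch. The helper `classify_estimates` and the thresholds (60 observations, 9 degrees of freedom, CV cut-offs of 0.15 and 0.30) are assumptions made for this example, not the package's code or guaranteed defaults; the standards linked in the Description are the authoritative source.

```r
# Hypothetical helper, not part of calidad. All thresholds are assumptions
# chosen for illustration; check the published standard before relying on them.
classify_estimates <- function(tab, min_n = 60, min_df = 9,
                               cv_low = 0.15, cv_high = 0.30) {
  # An estimate needs enough observations and enough degrees of freedom
  enough_sample <- tab$n >= min_n & tab$df >= min_df
  tab$label <- ifelse(
    !enough_sample, "unreliable",
    ifelse(tab$cv <= cv_low, "reliable",
           ifelse(tab$cv <= cv_high, "bit reliable", "unreliable"))
  )
  tab
}

# Toy inputs: one row per estimated cell (values invented)
toy <- data.frame(
  domain = c("a", "b", "c", "d"),
  n      = c(500, 200, 40, 300),
  df     = c(40, 15, 12, 25),
  cv     = c(0.05, 0.22, 0.08, 0.45)
)

result <- classify_estimates(toy)
print(result)
```

Row "c" has a small CV but too few observations, so it is still flagged, which is the point of checking sample size and degrees of freedom before looking at the CV.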
CASEN data for the year 2020. Contains only a few variables.
casen
dataframe with 185,437 rows and 6 columns
household id
1 = man; 2 = woman
age
Economic activity status
Household Income
poverty status: 1 = extreme poverty, 2 = non-extreme poverty, 3 = non-poverty
regional sample weights
strata
PSU
http://observatorio.ministeriodesarrollosocial.gob.cl/encuesta-casen-en-pandemia-2020
data(casen)
Create html table with the results of the evaluation
create_html(table)
table |
|
html table
library(survey)
library(dplyr)
hogar <- epf_personas %>% group_by(folio) %>% slice(1)
dc <- survey::svydesign(ids = ~varunit, strata = ~varstrat, data = hogar, weights = ~fe)
table <- assess(create_prop("ocupado", domains = "zona+sexo", design = dc))
create_mean generates a dataframe with the following elements: mean,
degrees of freedom, sample size, and coefficient of variation. The function allows
grouping in several domains.
create_mean(var, domains = NULL, subpop = NULL, design, ci = FALSE,
  ess = FALSE, ajuste_ene = FALSE, standard_eval = FALSE, rm.na = FALSE,
  deff = FALSE, rel_error = FALSE, unweighted = FALSE, eclac_input = FALSE)
var |
numeric variable within the |
domains |
domains to be estimated separated by the + character. |
subpop |
integer dummy variable to filter the dataframe. |
design |
complex design created by |
ci |
|
ess |
|
ajuste_ene |
|
standard_eval |
|
rm.na |
|
deff |
|
rel_error |
|
unweighted |
|
eclac_input |
|
dataframe that contains the inputs and all domains to be evaluated.
dc <- survey::svydesign(ids = ~varunit, strata = ~varstrat, data = epf_personas, weights = ~fe)
create_mean("gastot_hd", "zona+sexo", design = dc)
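To make the `domains` convention concrete, the following base R sketch splits a "+"-separated domain string and computes a weighted mean for each resulting cell. It only illustrates the point estimate. The toy data are invented (the column names mirror `epf_personas`), and nothing here is claimed to reproduce how the package parses domains or estimates variances.

```r
# Toy data; values are invented for illustration
toy <- data.frame(
  zona      = c(1, 1, 2, 2, 1, 2),
  sexo      = c(1, 2, 1, 2, 1, 2),
  fe        = c(10, 20, 30, 10, 10, 30),      # sample weights
  gastot_hd = c(100, 200, 300, 400, 300, 200) # variable of interest
)

# Split the domain string on the literal "+" character
domains  <- "zona+sexo"
dom_vars <- strsplit(domains, "+", fixed = TRUE)[[1]]

# One cell per combination of the domain variables (labels look like "1.2")
cell <- interaction(toy[dom_vars], drop = TRUE)

# Weighted mean of the variable inside each cell
weighted_means <- sapply(
  split(toy, cell),
  function(d) weighted.mean(d$gastot_hd, d$fe)
)
print(weighted_means)
```

The cell labels join the domain values in the order given, so "2.1" here means zona 2 and sexo 1.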
create_prop generates a dataframe with the following elements: sum,
degrees of freedom, sample size, standard error, and coefficient of variation. The function allows
grouping in several domains.
create_prop(var, denominator = NULL, domains = NULL, subpop = NULL, design,
  ci = FALSE, deff = FALSE, ess = FALSE, ajuste_ene = FALSE,
  rel_error = FALSE, log_cv = FALSE, unweighted = FALSE,
  standard_eval = FALSE, eclac_input = FALSE, ci_logit = FALSE,
  scheme = c("eclac_2020", "eclac_2023"))
var |
numeric variable within the |
denominator |
numeric variable within the |
domains |
domains to be estimated separated by the + character. |
subpop |
integer dummy variable to filter the dataframe. |
design |
complex design created by |
ci |
|
deff |
|
ess |
|
ajuste_ene |
|
rel_error |
|
log_cv |
|
unweighted |
|
standard_eval |
|
eclac_input |
|
ci_logit |
|
scheme |
|
dataframe that contains the inputs and all domains to be evaluated.
library(survey)
library(dplyr)
epf <- mutate(epf_personas, gasto_zona1 = if_else(zona == 1, gastot_hd, 0))
dc <- svydesign(ids = ~varunit, strata = ~varstrat, data = epf, weights = ~fe)
old_options <- options()
options(survey.lonely.psu = "certainty")
create_prop(var = "gasto_zona1", denominator = "gastot_hd", design = dc)
enusc <- filter(enusc, Kish == 1)
dc <- svydesign(ids = ~Conglomerado, strata = ~VarStrat, data = enusc, weights = ~Fact_Pers)
options(survey.lonely.psu = "certainty")
create_prop(var = "VP_DC", denominator = "hom_insg_taxi", design = dc)
options(old_options)
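When a `denominator` is supplied, the first example reads as a share: spending in zone 1 over total spending. A minimal base R sketch of that point estimate, assuming it is a ratio of weighted totals (an assumption about the estimator, with invented toy values), might look like this:

```r
# Toy data; values are invented for illustration
toy <- data.frame(
  zona      = c(1, 1, 2, 2),
  fe        = c(10, 20, 30, 40),      # sample weights
  gastot_hd = c(100, 200, 300, 400)   # household expenditure
)

# Numerator variable: expenditure counted only for zone 1, zero elsewhere
toy$gasto_zona1 <- ifelse(toy$zona == 1, toy$gastot_hd, 0)

# Ratio of weighted totals (assumed form of the estimator)
ratio <- sum(toy$fe * toy$gasto_zona1) / sum(toy$fe * toy$gastot_hd)
print(ratio)
```

Because the numerator is a restricted copy of the denominator, the result stays between 0 and 1. A ratio of two unrelated variables need not, which is presumably what the `ratio_between_0_1` argument of `assess` refers to.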
internal function to calculate proportion estimations
create_prop_internal(var, domains = NULL, subpop = NULL, disenio, ci = FALSE,
  deff = FALSE, ess = FALSE, ajuste_ene = FALSE, rel_error = FALSE,
  log_cv = FALSE, unweighted = FALSE, standard_eval = TRUE, rm.na = FALSE,
  env = parent.frame(), ci_logit = FALSE)
var |
integer dummy variable within the |
domains |
domains to be estimated separated by the + character. |
subpop |
integer dummy variable to filter the dataframe |
disenio |
complex design created by |
ci |
|
deff |
|
ess |
|
ajuste_ene |
|
rel_error |
|
log_cv |
|
unweighted |
|
standard_eval |
|
rm.na |
|
env |
parent environment to get some variables |
ci_logit |
|
dataframe that contains the inputs and all domains to be evaluated
internal function to calculate ratio estimations
create_ratio_internal(var, denominator, domains = NULL, subpop = NULL, disenio,
  ci = FALSE, deff = FALSE, ess = FALSE, ajuste_ene = FALSE,
  unweighted = FALSE, rel_error = FALSE, log_cv = FALSE, rm.na = FALSE)
var |
numeric variable within the |
denominator |
numeric variable within the |
domains |
domains to be estimated separated by the + character. |
subpop |
integer dummy variable to filter the dataframe |
disenio |
complex design created by |
ci |
|
deff |
|
ess |
|
ajuste_ene |
|
unweighted |
|
rel_error |
|
log_cv |
|
rm.na |
|
dataframe that contains the inputs and all domains to be evaluated
create_size generates a dataframe with the following elements: sum,
degrees of freedom, sample size, and coefficient of variation. The function allows
grouping in several domains.
create_size(var, domains = NULL, subpop = NULL, design, ci = FALSE,
  ess = FALSE, ajuste_ene = FALSE, standard_eval = FALSE, rm.na = FALSE,
  deff = FALSE, rel_error = FALSE, unweighted = FALSE,
  df_type = c("chile", "eclac"), eclac_input = FALSE)
var |
numeric variable within the |
domains |
domains to be estimated separated by the + character. |
subpop |
integer dummy variable to filter the dataframe. |
design |
complex design created by |
ci |
|
ess |
|
ajuste_ene |
|
standard_eval |
|
rm.na |
|
deff |
|
rel_error |
|
unweighted |
|
df_type |
|
eclac_input |
|
dataframe that contains the inputs and all domains to be evaluated.
dc <- survey::svydesign(ids = ~varunit, strata = ~varstrat, data = epf_personas, weights = ~fe)
create_size("ocupado", "zona+sexo", design = dc)
create_total generates a dataframe with the following elements: sum,
degrees of freedom, sample size, and coefficient of variation. The function allows
grouping in several domains.
create_total(var, domains = NULL, subpop = NULL, design, ci = FALSE,
  ess = FALSE, ajuste_ene = FALSE, standard_eval = FALSE, rm.na = FALSE,
  deff = FALSE, rel_error = FALSE, unweighted = FALSE, eclac_input = FALSE)
var |
numeric variable within the |
domains |
domains to be estimated separated by the + character. |
subpop |
integer dummy variable to filter the dataframe. |
design |
complex design created by |
ci |
|
ess |
|
ajuste_ene |
|
standard_eval |
|
rm.na |
|
deff |
|
rel_error |
|
unweighted |
|
eclac_input |
|
dataframe that contains the inputs and all domains to be evaluated.
dc <- survey::svydesign(ids = ~varunit, strata = ~varstrat, data = epf_personas, weights = ~fe)
create_total("gastot_hd", "zona+sexo", subpop = "ocupado", design = dc)
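The `subpop` argument is documented as an integer dummy variable. The base R sketch below shows what that requirement looks like in practice and how a 0/1 dummy restricts a weighted total. The toy data are invented, and the arithmetic covers only the point estimate, not the variance estimation done on the survey design.

```r
# Toy data; values are invented for illustration
toy <- data.frame(
  ocupado   = c(1L, 0L, 1L, 0L),      # integer dummy: 1 = employed
  fe        = c(10, 20, 30, 40),      # sample weights
  gastot_hd = c(100, 200, 300, 400)   # household expenditure
)

# Check that the filter really is an integer dummy coded 0/1
stopifnot(is.integer(toy$ocupado), all(toy$ocupado %in% c(0L, 1L)))

# Weighted total restricted to the subpopulation: rows with ocupado == 0
# contribute nothing because they are multiplied by zero
total_subpop <- sum(toy$fe * toy$gastot_hd * toy$ocupado)
print(total_subpop)
```

A variable stored as double or as a factor can be converted with `as.integer()` before it is passed as `subpop`, provided its values are already 0 and 1.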
This function activates the appropriate indicators based on the selected eclac standard and whether proportion indicators are needed.
eclac_standard(eclac, env = parent.frame(), proportion = FALSE)
eclac |
A logical value indicating the eclac standard. |
env |
The environment from which to retrieve the existing indicator values. Defaults to the parent frame. |
proportion |
A logical value indicating whether proportion indicators should be turned on. Defaults to FALSE. |
A list of logical values indicating which indicators are turned on.
ELE data for the year 2022. Contains only a few variables.
ELE7
dataframe with 6,592 rows and 13 columns
Company ID
Economic activity
Company size by sales
Inclusion range
Cross-sectional weights
Longitudinal weights
Panel sample
Strata
Finite population correction
Value added 2022, difference between production value and intermediate consumption
VA_2022f is an adjusted version of VA_2022, where negative values are replaced with 0, while non-negative values remain unchanged.
Total personnel employed and hired by the company on a monthly basis
Total gross remuneration of personnel hired by the company
data(ELE7)
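The adjustment described for VA_2022f (negative values replaced with 0, non-negative values unchanged) corresponds to an element-wise maximum with zero. A short base R sketch on an invented vector follows. Whether the dataset column was built with exactly this call is an assumption.

```r
# Invented values standing in for VA_2022
va_2022 <- c(-150, 0, 320.5, -0.1, 1000)

# Replace negative values with 0 and keep the rest unchanged
va_2022f <- pmax(va_2022, 0)
print(va_2022f)
```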
Target cross-sectional sample sizes for the ELE data for the year 2022.
ELE7_n_obj
dataframe with 59 rows and 4 columns
Company size by sales
Economic activity
Economic activity ID
Target sample size
data(ELE7_n_obj)
Reduced version of the ENE database. Contains some sociodemographic variables and the necessary information to work with a complex design.
ene
dataframe with 87,842 rows and 7 columns
1 = man; 2 = woman
region
Economic activity status
sample weights
PSU
strata
Indicates whether the person belongs to the labour force: 1 = yes; 0 = no
1 = employed; 0 = non-employed
1 = non-employed; 0 = employed
https://www.ine.cl/estadisticas/sociales/mercado-laboral/ocupacion-y-desocupacion
data(ene)
ENUSC data for the year 2019. Contains only a few variables.
enusc
dataframe with 24,465 rows and 22 columns
1 = man; 2 = woman
16 regions
person sample weights
household sample weights
PSU
strata
Individual victimization. It works combined with Fact_Pers
Household victimization. It works combined with Fact_Hog
age
Perception of increased crime in the country. It works combined with Fact_Pers
Cause of increased crime in the neighborhood. It works combined with Fact_Pers
Female perception of insecurity inside taxis. Variable elaborated with variables P9 and rph_sexo. It works combined with Fact_Pers
Male perception of insecurity inside taxis. Variable elaborated with variables P9 and rph_sexo. It works combined with Fact_Pers
Female perception of insecurity inside buses. Variable elaborated with variables P9 and rph_sexo. It works combined with Fact_Pers
Male perception of insecurity inside buses. Variable elaborated with variables P9 and rph_sexo. It works combined with Fact_Pers
Female perception of insecurity inside malls. Variable elaborated with variables P9 and rph_sexo. It works combined with Fact_Pers
Male perception of insecurity inside malls. Variable elaborated with variables P9 and rph_sexo. It works combined with Fact_Pers
Female perception of insecurity on public transport. Variable elaborated with variables P9 and rph_sexo. It works combined with Fact_Pers
Male perception of insecurity on public transport. Variable elaborated with variables P9 and rph_sexo. It works combined with Fact_Pers
Female perception of insecurity in the neighborhood. Variable elaborated with variables P9 and rph_sexo. It works combined with Fact_Pers
Male perception of insecurity in the neighborhood. Variable elaborated with variables P9 and rph_sexo. It works combined with Fact_Pers
data(enusc)
ENUSC data for the year 2023. Contains only a few variables.
enusc_2023
dataframe with 49,813 rows and 15 columns
16 regions
Code of region, province and commune
Person sample weights at region level
Person sample weights at commune level
Household sample weights at region level
Household sample weights at commune level
Strata
PSU
Households victimized by violent crimes. It works combined with Fact_Hog_*
Household victimization. It works combined with Fact_Hog_*
People victimized by violent crimes. It works combined with Fact_Pers_*
Individual victimization. It works combined with Fact_Pers_*
Perception of increased crime in the country. It works combined with Fact_Pers_*
1 = man; 2 = woman
Age
data(enusc_2023)
Reduced version of the VIII EPF database. Contains some sociodemographic variables and the necessary information to work with complex design.
epf_personas
dataframe with 48,308 observations and 8 variables
1 = male; 2 = female
1 = metropolitan area; 2 = rest of the regional capitals
marital status
sample weights
PSU
strata
household expenditure
1 = employed; 0 = non-employed
https://www.ine.cl/estadisticas/sociales/ingresos-y-gastos/encuesta-de-presupuestos-familiares
data(epf_personas)
Receives a table created with survey and returns the coefficient of variation for each cell
get_cv(table, design, domains, type_est = "all", env = parent.frame())
table |
|
design |
design |
domains |
|
type_est |
type of estimation: all or size. |
env |
parent environment |
dataframe with results, including the CV
Receives data and domains. Returns a data frame with the PSU count (if available), strata count, and degrees of freedom (df) for each cell
get_df(data, domains, df_type = "eclac")
data |
|
domains |
|
df_type |
|
dataframe with results including degrees of freedom
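As an illustration of the per-cell output described above, the sketch below counts distinct PSUs and strata inside each domain cell and takes their difference as the degrees of freedom. "PSUs minus strata" is the usual design-based rule. Whether each `df_type` option applies exactly this rule is an assumption, and the toy data and the helper `count_unique` are invented for the example.

```r
# Toy data: one row per respondent (values invented)
toy <- data.frame(
  zona   = c(1, 1, 1, 1, 2, 2, 2, 2, 2, 2),
  strata = c(1, 1, 2, 2, 3, 3, 3, 4, 4, 4),
  psu    = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 9)
)

# Hypothetical helper: number of distinct values in a vector
count_unique <- function(x) length(unique(x))

# Distinct PSUs and strata observed inside each domain cell
df_by_cell <- data.frame(
  zona   = sort(unique(toy$zona)),
  psu    = as.vector(tapply(toy$psu, toy$zona, count_unique)),
  strata = as.vector(tapply(toy$strata, toy$zona, count_unique))
)

# Assumed rule: degrees of freedom = number of PSUs - number of strata
df_by_cell$df <- df_by_cell$psu - df_by_cell$strata
print(df_by_cell)
```

Counting within the cell rather than over the whole sample matters: a small domain may touch only a few PSUs even when the full design has many.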
Generates a table with estimates for a given aggregation
get_survey_table(var, domains, complex_design, estimation = "mean",
  env = parent.frame(), fun, denom = NULL, type_est = "all")
var |
|
domains |
|
complex_design |
design from |
estimation |
|
env |
parent environment |
fun |
function required regarding the estimation |
denom |
denominator. This parameter works for the ratio estimation |
type_est |
type of estimation: all or size |
dataframe containing main results from survey
quadratic returns the output of a particular function created by INE Chile, which
is evaluated at the value of the estimated proportion from a sample. If the standard
error is higher than the output of the function, it is interpreted as a signal that the
estimation is not reliable.
quadratic(p)
p |
numeric vector with the values of the estimations for proportions |
numeric vector
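For readers who want to see the shape of such a threshold, here is a minimal sketch. The formula used, p^(2/3)/9 for p below 0.5 and (1 - p)^(2/3)/9 otherwise, is an assumption based on the INE Chile standard for proportions and is not copied from the package source. The function name `quadratic_sketch` is hypothetical, and the comparison assumes an estimate is flagged when its standard error exceeds the threshold.

```r
# Hypothetical re-implementation for illustration; the formula is an assumption
quadratic_sketch <- function(p) {
  ifelse(p < 0.5, p^(2 / 3) / 9, (1 - p)^(2 / 3) / 9)
}

# Invented proportions and standard errors
p  <- c(0.02, 0.30, 0.50, 0.97)
se <- c(0.004, 0.060, 0.030, 0.020)

threshold <- quadratic_sketch(p)

# TRUE when the standard error does not exceed the threshold
se_ok <- se <= threshold
print(data.frame(p, se, threshold, se_ok))
```

The threshold is symmetric around 0.5 and shrinks towards 0 and 1, so proportions near the boundaries are held to a tighter standard error.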
Receives the raw survey table and sorts it
standardize_columns(data, var, denom)
data |
|
var |
|
denom |
denominator |
dataframe with standardized data
Renames the design variables so they can be used later
standardize_design_variables(design)
design |
|
design survey