This function converts your region names to region codes accepted by
serotrackr. It is typically used when st_validate() reports an error for
its adm1 or adm2 argument. In that case, use this function to convert your
region names to codes so that they pass validation by st_validate().
st_locate() uses its first three (or four) arguments to automatically
match your region names with serotrackr's predefined region codes, which
are stored in serotrackr::regions_df. Matching is exact but case-insensitive,
and it ignores accents on letters. The function then reports the region
names that could not be matched. You can map those unmatched names to codes
manually through st_locate()'s ... (ellipsis) argument.
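The matching behaviour described above can be approximated in base R. This is a minimal sketch, not serotrackr's actual implementation: normalize_name() is a hypothetical helper, the accent table covers only a few common lowercase letters, and the region names are illustrative rather than taken from serotrackr::regions_df. It assumes a UTF-8 session.

```r
# Hypothetical helper: lower-case a name and strip a few common accents.
# Assumes a UTF-8 session; the accent table is deliberately incomplete.
normalize_name <- function(x) {
  x <- tolower(x)
  chartr("áàâäãéèêëíìîïóòôöõúùûüçñ", "aaaaaeeeeiiiiooooouuuucn", x)
}

# Illustrative reference names (the real table is serotrackr::regions_df).
known <- c("Québec", "Ontario", "Alberta")
# Names as they might appear in user data, including one misspelling.
user <- c("quebec", "ONTARIO", "Albetra")

# Exact match on the normalized form; NA marks a name with no match.
idx <- match(normalize_name(user), normalize_name(known))
matched <- known[idx]
unmatched <- user[is.na(idx)]

print(matched)
print(unmatched)
```

Only the misspelled name ends up in unmatched, which is the kind of name you would then map by hand through the ... argument.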
Usage
st_locate(
  data,
  adm0,
  adm1,
  adm2 = NULL,
  ...,
  into = c("adm1", "adm2"),
  n_unmatched_printed = 20
)
Arguments
- data
A data frame.
- adm0
A string representing one country (adm0) code. Use serotrackr::regions$adm0$YourCountry to select it.
- adm1, adm2
A string or an unquoted name of a character column that contains the state/province (adm1) or district/municipality (adm2) names or codes. If your study was conducted in only one adm1 or adm2 region, use serotrackr::regions to select it.
- ...
A sequence of two-sided assignments of the form "region_name" = region_code. region_name is your unmatched region name. region_code must be in the form of regions$adm1$YourCountry$YourState for unmatched adm1 regions and regions$adm2$YourCountry$YourState$YourDistrict for unmatched adm2 regions. st_locate() uses dplyr::case_when() internally for this part.
- into
A character vector of length 1 or 2, specifying the name(s) of the new column(s) added to your data for adm1 (and adm2) region codes. If its length is one, it names the adm1 column. If its length is two, the values name the adm1 and adm2 columns, respectively.
- n_unmatched_printed
A single number giving the maximum number of unique unmatched region names printed to the console. This limit prevents unmatched region names from flooding your console. The default value is 20. If your data contain more than 20 unmatched region names, increase this number to see the rest.
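The idea behind the ... mappings can be sketched as a lookup that fills in codes for names that automatic matching left unmatched. The sketch below is only an illustration in base R: apply_overrides() is a hypothetical helper, and the codes such as "CODE_TORONTO" are placeholders, not real serotrackr region codes. According to the description above, st_locate() itself relies on dplyr::case_when() for this step.

```r
# Hypothetical helper: fill codes that are still NA using name = code pairs
# supplied through `...`, similar in spirit to st_locate()'s ellipsis.
apply_overrides <- function(region_names, codes, ...) {
  overrides <- c(...)  # named character vector: name -> code
  hit <- is.na(codes) & region_names %in% names(overrides)
  codes[hit] <- unname(overrides[region_names[hit]])
  codes
}

city <- c("Toronoto", "London", "Calagry")
# Made-up result of automatic matching: two names were not recognised.
codes <- c(NA, "CODE_LONDON", NA)

fixed <- apply_overrides(
  city, codes,
  "Toronoto" = "CODE_TORONTO",  # placeholder code
  "Calagry"  = "CODE_CALGARY"   # placeholder code
)
print(fixed)
```

Codes that were already matched are left untouched, so overrides only affect names that would otherwise appear in the unmatched report.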
Value
A data.frame that consists of the input data.frame plus one (or two)
new columns containing adm1 (and adm2) region codes. You can then pass the
new column(s) to the adm1 and/or adm2 arguments of st_validate().
Examples
st_locate(
  data = sample_raw_data,
  adm0 = regions$adm0$Canada,
  adm1 = state,
  adm2 = city,
  "Toronoto" = regions$adm2$Canada$Ontario$Toronto,
  "Calagry" = regions$adm2$Canada$Alberta$Calgary
)
#> ✔ adm1 region names/codes were successfully matched. `adm1` column was added.
#> ✔ adm2 region names/codes were successfully matched. `adm2` column was added.
#> # A tibble: 100 × 15
#> dataset_id id age_group age sex country state city start_date
#> <int> <int> <chr> <dbl> <chr> <chr> <chr> <chr> <chr>
#> 1 1 1 65+ 999 m Canada ontario Toronoto 2023/Jan/01
#> 2 1 2 0-17 8 f Canada ontario London 2023/Jan/01
#> 3 1 3 65+ 67 m Canada Alberta Calagry 2023/Jan/01
#> 4 1 4 NA -999 f Canada ontario toronto 2023/Jan/01
#> 5 1 5 0-17 9 m Canada ontario Toronoto 2023/Jan/01
#> 6 1 6 18-64 43 m Canada Alberta Edmonton 2023/Jan/01
#> 7 1 7 18-64 18 f Canada Alberta Edmonton 2023/Jan/01
#> 8 1 8 65+ 83 f Canada Alberta Calgary 2023/Jan/01
#> 9 1 9 65+ 77 f Canada Alberta Calgary 2023/Jan/01
#> 10 1 10 65+ 77 m Canada Alberta Calagry 2023/Jan/01
#> # ℹ 90 more rows
#> # ℹ 6 more variables: end_date <chr>, test_id <chr>, result <dbl>,
#> # result_cat <chr>, adm1 <chr>, adm2 <chr>