This page contains documentation relevant for those wishing to contribute to the package and specific instructions for how to add support for a new geocoding service.
The two core functions to focus on in the package are geo() and reverse_geo(). These functions have very similar layouts, but geo() is for forward geocoding while reverse_geo() is for reverse geocoding. The geocode() and reverse_geocode() functions only extract input data from a dataframe and pass it to the geo() and reverse_geo() functions respectively for geocoding.
Both the geo() and reverse_geo() functions take inputs (either addresses or coordinates) and call other functions as needed to deduplicate the inputs, pause to comply with API usage rate policies, and execute queries. Key parameters and settings for geocoding are stored for easy access and display in built-in datasets.
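As a quick orientation, the reference datasets named on this page can be printed directly once the package is installed. This is only a sketch: it assumes the datasets are exported under exactly the names used on this page, and it does nothing if tidygeocoder is not available.

```r
# Names of the reference datasets mentioned on this page (assumed to be
# exported by tidygeocoder under these names).
reference_names <- c(
  "min_time_reference",
  "api_key_reference",
  "batch_limit_reference",
  "api_info_reference"
)

# Only run when the package is installed, so the snippet is safe to source.
if (requireNamespace("tidygeocoder", quietly = TRUE)) {
  for (name in reference_names) {
    cat("\n", name, ":\n", sep = "")
    # getExportedValue() is the function behind `tidygeocoder::<name>`
    print(utils::head(getExportedValue("tidygeocoder", name), 3))
  }
}
```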
Consider this query:
library(dplyr)
library(tidygeocoder)
df <- tibble(
  id = c(1, 2, 1),
  locations = c('tokyo', 'madrid', 'tokyo')
)
df %>%
  geocode(address = locations, method = 'osm', full_results = TRUE, verbose = TRUE)
#>
#> Number of Unique Addresses: 2
#> Passing 2 addresses to the Nominatim single address geocoder
#>
#> Number of Unique Addresses: 1
#> Querying API URL: https://nominatim.openstreetmap.org/search
#> Passing the following parameters to the API:
#> q : "tokyo"
#> limit : "1"
#> format : "json"
#> HTTP Status Code: 200
#> Query completed in: 0.5 seconds
#> Total query time (including sleep): 1 seconds
#>
#>
#> Number of Unique Addresses: 1
#> Querying API URL: https://nominatim.openstreetmap.org/search
#> Passing the following parameters to the API:
#> q : "madrid"
#> limit : "1"
#> format : "json"
#> HTTP Status Code: 200
#> Query completed in: 0.2 seconds
#> Total query time (including sleep): 1 seconds
#>
#> Query completed in: 2 seconds
#> # A tibble: 3 × 14
#> id locations lat long place_id licence osm_t…¹ osm_id bound…² displ…³
#> <dbl> <chr> <dbl> <dbl> <int> <chr> <chr> <int> <list> <chr>
#> 1 1 tokyo 35.7 140. 298316729 Data © … relati… 1.54e6 <chr> 東京都…
#> 2 2 madrid 40.4 -3.70 298773150 Data © … relati… 5.33e6 <chr> Madrid…
#> 3 1 tokyo 35.7 140. 298316729 Data © … relati… 1.54e6 <chr> 東京都…
#> # … with 4 more variables: class <chr>, type <chr>, importance <dbl>,
#> # icon <chr>, and abbreviated variable names ¹osm_type, ²boundingbox,
#> # ³display_name
Here is what is going on behind the scenes:
The geocode() function extracts the address data from the input dataframe and passes it to the geo() function.
The geo() function looks for unique inputs and prepares them for geocoding. In this case, there is one duplicate input so we only have two unique inputs.
The geo() function must figure out whether to use single address geocoding (1 address per query) or batch geocoding (multiple addresses per query). In this case the specified Nominatim (“osm”) geocoding service does not have a batch geocoding function so single address geocoding is used.
The geo() function is called once for each input to geocode all addresses (twice in this case) and the results are combined. If batch geocoding were used, the appropriate batch geocoding function would be called based on the geocoding service specified.
The rate of querying is limited according to the min_time_reference dataset. This behavior can be modified with the min_time argument.
Results for duplicate inputs are repeated so that the output lines up with the input, unless unique_only = TRUE.
The results are returned from geo() to the geocode() function. The geocode() function then combines the returned data with the original dataset.
Refer to the notes below on adding a geocoding service for more specific documentation on the code structure.
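The flow described above (deduplicate, query each unique input once, wait between queries, then line results back up with the original input) can be sketched in base R. This is not the package's implementation: fake_geocode_one() and geocode_sketch() are hypothetical names, the lookup table stands in for a real API call, and the coordinates are rounded illustrative values.

```r
# Hypothetical stand-in for a single-address geocoder. A real one would
# send an HTTP request; this one reads from a small lookup table.
fake_geocode_one <- function(address) {
  lookup <- list(
    tokyo  = c(lat = 35.7, long = 139.7),
    madrid = c(lat = 40.4, long = -3.7)
  )
  coords <- lookup[[address]]
  if (is.null(coords)) coords <- c(lat = NA_real_, long = NA_real_)
  data.frame(
    address = address,
    lat = coords[["lat"]],
    long = coords[["long"]],
    stringsAsFactors = FALSE
  )
}

geocode_sketch <- function(addresses, min_time = 0) {
  # Deduplicate so each distinct input is only queried once
  unique_addresses <- unique(addresses)

  results <- lapply(unique_addresses, function(address) {
    start <- Sys.time()
    result <- fake_geocode_one(address)
    # Sleep for whatever remains of the minimum time per query
    elapsed <- as.numeric(difftime(Sys.time(), start, units = "secs"))
    if (elapsed < min_time) Sys.sleep(min_time - elapsed)
    result
  })
  unique_results <- do.call(rbind, results)

  # Re-expand to the original input order, duplicates included
  out <- unique_results[match(addresses, unique_results$address), , drop = FALSE]
  rownames(out) <- NULL
  out
}

out <- geocode_sketch(c("tokyo", "madrid", "tokyo"))
print(out)
```

Passing a positive min_time (in seconds) makes each query take at least that long, which mirrors the role the min_time argument plays in the description above.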
This section documents how to add support for a new geocoding service to the package. Required changes are organized by file. If anything isn’t clear, feel free to file an issue.
Base all changes on the main branch.
Add the new service's API URL and update the get_api_url() function accordingly. If arguments need to be added to the get_api_url() function, make sure to adjust the calls to this function in the geo() and reverse_geo() functions accordingly.
Add the service's parameters to the api_parameter_reference dataset. Each service is referred to by a short name in the method column (which is how the service is specified in the geo() and geocode() functions). The generic_name column has the universal parameter name that is used across geocoding services (ie. “address”, “limit”, etc.) while the api_name column stores the parameter names that are specific to the geocoding service.
Parameters that are specific to the service (ie. those without a generic_name) do not need to be included unless they are required. Parameters can always be passed to services directly with the custom_query argument in geo() or reverse_geo().
Add an entry to min_time_reference with the minimum time each query should take (in seconds) according to the geocoding service’s free tier usage restrictions.
Add an entry to api_key_reference if the service requires an API key.
If the service supports batch geocoding, add its batch size limit to batch_limit_reference.
Add an entry to api_info_reference with links to the service’s website, documentation, and usage policy.
If applicable, add a batch geocoding function to the batch_func_map named list.
Update the get_coord_parameters() function based on how the service passes latitude and longitude coordinates for reverse geocoding.
If applicable, add a reverse batch geocoding function to the reverse_batch_func_map named list.
Update the extract_results() function which is used for parsing single addresses (ie. not batch geocoding). You can see examples of how I’ve tested out parsing the results of geocoding services here.
Update the extract_reverse_results() function for reverse geocoding.
Update the extract_errors_from_results() function to extract error messages for invalid queries.
Add tests for the new service that do not send queries (ie. use no_query = TRUE in the geo() and geocode() functions).
Other files don’t necessarily need to be updated. However, you might need to make changes to them if the service you are implementing requires some non-standard workarounds.
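To make the pieces above more concrete, here is a minimal base R sketch of three of them: a standalone URL function with a dispatcher, rows that map generic parameter names to service-specific ones, and a results extractor. Everything specific in it is an assumption for illustration only: the "myservice" method, its URL, its parameter names, the response shape, and the function names ending in _sketch. The package's real functions may have different signatures.

```r
# Hypothetical standalone URL function for a made-up service
get_myservice_url <- function() "https://api.example.com/v1/search"

# Hypothetical dispatcher that picks the URL function by method name
get_api_url_sketch <- function(method) {
  switch(method,
    myservice = get_myservice_url(),
    stop("Unknown method: ", method)
  )
}

# Rows mapping generic parameter names to the service's own names
parameter_rows <- data.frame(
  method       = "myservice",
  generic_name = c("address", "limit"),
  api_name     = c("query", "max_results"),
  stringsAsFactors = FALSE
)

# Translate a named list of generic parameters into API parameters
build_query_sketch <- function(method, generic_params, reference = parameter_rows) {
  rows <- reference[reference$method == method, ]
  idx <- match(names(generic_params), rows$generic_name)
  if (anyNA(idx)) stop("Unsupported parameter for method: ", method)
  stats::setNames(generic_params, rows$api_name[idx])
}

# Pull latitude and longitude out of an already-parsed response.
# The nested `location` structure is an assumed response format.
extract_results_sketch <- function(response) {
  data.frame(lat = response$location$lat, long = response$location$lng)
}

url <- get_api_url_sketch("myservice")
query <- build_query_sketch("myservice", list(address = "tokyo", limit = 1))
coords <- extract_results_sketch(list(location = list(lat = 35.7, lng = 139.7)))
```

Keeping the name mapping in a table rather than in code means a new service mostly adds rows, while the translation logic stays the same for every method.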
Run devtools::check() to make sure the package still passes all tests and checks, but note that these tests are designed to work offline so they do not make queries to geocoding services.
Tests that query geocoding services are not run automatically (ie. by devtools::test()) because they require API keys which would not exist on all systems and are dependent on the geocoding services being online at the time of the test.
To release a new version of tidygeocoder:
Update the package website with pkgdown::build_site() (make sure to do this again if you need to make any more updates that show up on the site).
Check URLs with urlchecker::url_check().
Check spelling with devtools::spell_check().
Last, run devtools::release() to release the new version once everything looks good.