This page contains documentation relevant for those wishing to contribute to the package and specific instructions for how to add support for a new geocoding service.
The two core functions to focus on in the package are geo() and reverse_geo(). These functions have very similar layouts, but geo() is for forward geocoding while reverse_geo() is for reverse geocoding. The geocode() and reverse_geocode() functions only extract input data from a dataframe and pass it to the geo() and reverse_geo() functions respectively for geocoding.
Both the geo() and reverse_geo() functions take inputs (either addresses or coordinates) and call other functions as needed to deduplicate the inputs, pause to comply with API usage rate policies, and execute queries. Key parameters and settings for geocoding are stored for easy access and display in built-in datasets.
Consider this query:
    library(dplyr)
    library(tidygeocoder)

    df <- tibble(
      id = c(1, 2, 1),
      locations = c('tokyo', 'madrid', 'tokyo')
    )

    df %>%
      geocode(address = locations, method = 'osm', full_results = TRUE, verbose = TRUE)
    #>
    #> Number of Unique Addresses: 2
    #> Passing 2 addresses to the Nominatim single address geocoder
    #>
    #> Number of Unique Addresses: 1
    #> Querying API URL: https://nominatim.openstreetmap.org/search
    #> Passing the following parameters to the API:
    #> q : "tokyo"
    #> limit : "1"
    #> format : "json"
    #> HTTP Status Code: 200
    #> Query completed in: 0.3 seconds
    #> Total query time (including sleep): 1 seconds
    #>
    #>
    #> Number of Unique Addresses: 1
    #> Querying API URL: https://nominatim.openstreetmap.org/search
    #> Passing the following parameters to the API:
    #> q : "madrid"
    #> limit : "1"
    #> format : "json"
    #> HTTP Status Code: 200
    #> Query completed in: 0.2 seconds
    #> Total query time (including sleep): 1 seconds
    #>
    #> Query completed in: 2 seconds
    #> # A tibble: 3 × 14
    #>      id locations   lat   long  place_id licence     osm_type osm_id boundingbox
    #>   <dbl> <chr>     <dbl>  <dbl>     <int> <chr>       <chr>     <int> <list>
    #> 1     1 tokyo      35.7 140.   282632558 Data © Ope… relation 1.54e6 <chr >
    #> 2     2 madrid     40.4  -3.70 282999935 Data © Ope… relation 5.33e6 <chr >
    #> 3     1 tokyo      35.7 140.   282632558 Data © Ope… relation 1.54e6 <chr >
    #> # … with 5 more variables: display_name <chr>, class <chr>, type <chr>,
    #> #   importance <dbl>, icon <chr>
Here is what is going on behind the scenes:
1. The geocode() function extracts the address data from the input dataframe and passes it to the geo() function.
2. The geo() function looks for unique inputs and prepares them for geocoding. In this case, there is one duplicate input so we only have two unique inputs.
3. The geo() function must figure out whether to use single address geocoding (1 address per query) or batch geocoding (multiple addresses per query). In this case the specified Nominatim (“osm”) geocoding service does not have a batch geocoding function, so single address geocoding is used.
4. The geo() function is called once for each input to geocode all addresses (twice in this case) and the results are combined. If batch geocoding were used, the appropriate batch geocoding function would be called based on the geocoding service specified.
5. Each query is followed by a pause as needed so that it takes at least the minimum time listed for the service in the min_time_reference dataset (1 second per query in the output above). This behavior can be modified with the min_time argument.
6. The geo() function returns results for every input, including duplicates, unless unique_only = TRUE.
7. The geocode() function then combines the returned data with the original dataset.
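The steps above can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's implementation: fake_single_query(), query_with_min_time(), and geocode_sketch() are hypothetical names, and the lookup table stands in for a real geocoding service so that no network access is needed.

```r
# Hypothetical stand-in for a single-address query: it returns fixed,
# illustrative coordinates from a lookup table instead of calling a service.
fake_single_query <- function(address) {
  known <- list(
    tokyo  = c(lat = 35.7, long = 139.7),
    madrid = c(lat = 40.4, long = -3.7)
  )
  coords <- known[[address]]
  if (is.null(coords)) coords <- c(lat = NA_real_, long = NA_real_)
  data.frame(
    address = address,
    lat = coords[["lat"]],
    long = coords[["long"]],
    stringsAsFactors = FALSE
  )
}

# Run one query, then sleep so the whole call takes at least `min_time` seconds.
query_with_min_time <- function(query_fn, input, min_time) {
  start <- Sys.time()
  result <- query_fn(input)
  elapsed <- as.numeric(difftime(Sys.time(), start, units = "secs"))
  if (elapsed < min_time) Sys.sleep(min_time - elapsed)
  result
}

geocode_sketch <- function(addresses, min_time = 0, unique_only = FALSE) {
  # Deduplicate so that each distinct input is queried only once
  unique_addresses <- unique(addresses)

  # One query per unique input, combined into a single data frame
  results <- do.call(rbind, lapply(
    unique_addresses,
    function(a) query_with_min_time(fake_single_query, a, min_time)
  ))
  if (unique_only) return(results)

  # Expand back to one row per original input, preserving the input order
  expanded <- results[match(addresses, results$address), ]
  rownames(expanded) <- NULL
  expanded
}

out <- geocode_sketch(c("tokyo", "madrid", "tokyo"), min_time = 0.1)
print(out)
```

Only two queries run for the three inputs, and match() maps the unique results back onto the original rows.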
Refer to the notes below on adding a geocoding service for more specific documentation on the code structure.
This section documents how to add support for a new geocoding service to the package. Required changes are organized by file. If anything isn’t clear, feel free to file an issue.
Base all changes on the main branch.
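The method column holds the short name of the service (which is how the service is specified in the geo() and geocode() functions, for example method = 'osm'). The generic_name column has the universal parameter name that is used across geocoding services (e.g. “address”, “limit”) while the api_name column stores the parameter names that are specific to the geocoding service.
Parameters that have no equivalent in other services (i.e. no generic_name) do not need to be included unless the parameters are required. Parameters can always be passed to services directly with the custom_query argument.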
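To illustrate how a generic_name to api_name lookup might translate a query, here is a small base R sketch. The table and the build_query() helper are hypothetical and are not the package's actual dataset or code. The "osm" rows reuse the q and limit names shown in the query log earlier on this page, the "example" service and its parameter names are invented, and direct_params stands in for parameters passed to the service directly.

```r
# Hypothetical parameter reference table (illustrative rows only)
param_reference <- data.frame(
  method       = c("osm", "osm", "example", "example"),
  generic_name = c("address", "limit", "address", "limit"),
  api_name     = c("q", "limit", "text", "max_results"),
  stringsAsFactors = FALSE
)

# Translate generic parameter names into the names one service expects
build_query <- function(method, generic_params, direct_params = list()) {
  ref <- param_reference[param_reference$method == method, ]
  idx <- match(names(generic_params), ref$generic_name)

  # Drop generic parameters that the service has no equivalent for
  supported <- !is.na(idx)
  query <- generic_params[supported]
  names(query) <- ref$api_name[idx[supported]]

  # Service-specific parameters are appended under their own names
  c(query, direct_params)
}

osm_query <- build_query(
  "osm",
  list(address = "tokyo", limit = 1),
  direct_params = list(polygon_geojson = 1)
)
str(osm_query)
```

Keeping the mapping in a table means that supporting another service is mostly a matter of adding rows rather than writing new branching logic.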
Add a row to min_time_reference with the minimum time each query should take (in seconds) according to the geocoding service’s free tier usage restrictions.
Add a row to api_key_reference if the service requires an API key.
Add a row to api_info_reference with links to the service’s website, documentation, and usage policy.
Update the get_coord_parameters() function based on how the service passes latitude and longitude coordinates for reverse geocoding.
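Services differ in how they expect coordinates: some take latitude and longitude as separate parameters, others as a single combined string. The sketch below shows the idea in base R. coord_params_sketch() is a hypothetical helper, not the signature of get_coord_parameters(), and the combined-style parameter name location is an assumption made for illustration.

```r
# Build reverse geocoding query parameters in one of two common styles
coord_params_sketch <- function(lat, long, style = c("separate", "combined")) {
  style <- match.arg(style)
  switch(style,
    # Separate parameters, e.g. lat=35.7&lon=139.7
    separate = list(lat = lat, lon = long),
    # One comma-delimited parameter, e.g. location=35.7,139.7
    combined = list(location = paste0(lat, ",", long))
  )
}

str(coord_params_sketch(35.7, 139.7))
str(coord_params_sketch(35.7, 139.7, style = "combined"))
```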
Update the extract_results() function, which is used for parsing single addresses (i.e. not batch geocoding). You can see examples of how I’ve tested out parsing the results of geocoding services here.
Update the extract_reverse_results() function for reverse geocoding.
Update the extract_errors_from_results() function to extract error messages for invalid queries.
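As a rough sketch of what result and error extraction can look like, the base R example below works on lists shaped like parsed JSON. The response shapes, the coordinate values, and the function names ending in _sketch are assumptions for illustration; each real service has its own response format, so the package functions need service-specific handling.

```r
# Assumed shape of a successful response: a list of matches with string coordinates
ok_response <- list(
  list(lat = "35.6768601", lon = "139.7638947", display_name = "Tokyo, Japan")
)

# Assumed shape of a failed response: a named list carrying an error message
error_response <- list(error = "Invalid API key")

# Pull latitude and longitude out of the first match, or return NA if none
extract_results_sketch <- function(response) {
  no_match <- data.frame(lat = NA_real_, long = NA_real_)
  if (length(response) == 0 || !is.list(response[[1]])) return(no_match)
  first <- response[[1]]
  if (is.null(first$lat) || is.null(first$lon)) return(no_match)
  data.frame(lat = as.numeric(first$lat), long = as.numeric(first$lon))
}

# Return the error message if the response carries one, otherwise NA
extract_errors_sketch <- function(response) {
  if (!is.null(response$error)) response$error else NA_character_
}

print(extract_results_sketch(ok_response))
print(extract_errors_sketch(error_response))
```

Returning NA coordinates rather than raising an error keeps one failed lookup from interrupting a longer run of queries.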
no_query = TRUE in the
These files don’t necessarily need to be updated. However, you might need to make changes to these files if the service you are implementing requires some non-standard workarounds.
Run devtools::check() to make sure the package still passes all tests and checks, but note that these tests are designed to work offline so they do not make queries to geocoding services.
Tests that make live queries are not included in the internal package tests (devtools::test()) because they require API keys, which would not exist on all systems, and are dependent on the geocoding services being online at the time of the test.
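One way to keep key-dependent tests from failing on systems without credentials is to guard them on an environment variable. The base R sketch below shows the pattern. run_if_key_present() and the variable name EXAMPLE_GEOCODER_API_KEY are hypothetical and are not part of the package.

```r
# Run `test_fn` only when the named environment variable holds an API key
run_if_key_present <- function(key_env_var, test_fn) {
  if (!nzchar(Sys.getenv(key_env_var))) {
    message("Skipping online test: ", key_env_var, " is not set")
    return(invisible(NA))
  }
  test_fn()
}

result <- run_if_key_present("EXAMPLE_GEOCODER_API_KEY", function() {
  # A real online test would send a query here and check the response
  TRUE
})
```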
To release a new version of tidygeocoder:
Run urlchecker::url_check() to check all package URLs.
Run devtools::release() to release the new version.