A new duck in the pond

R
spatial
duckspatial
duckh3
Presenting duckh3 v0.1.0
Author

Adrián Cidre

Published

April 25, 2026

1 Introduction

There’s a new duck in the R pond! The first one I raised was {duckspatial}, and the newest to join him goes by {duckh3}.

What does this one bring? {duckh3} wraps the H3 community extension for DuckDB and follows the same design principles as {duckspatial} — but before diving in, let’s talk about what H3 actually is.

H3 is a hierarchical hexagonal grid system developed by Uber and released as open source in 2018. It divides the entire surface of the Earth into hexagonal cells at multiple resolutions: from coarse (resolution 0, ~4.25 million km² per cell) to fine (resolution 15, ~0.9 m² per cell). Each cell is identified by a unique 64-bit integer index, which makes spatial operations extremely fast and storage-efficient.

Figure 1: Screenshot of the H3 grid system (source: h3geo).

Why hexagons? Unlike squares or triangles, hexagons have a unique geometric property: every neighbor is equidistant from the center, which eliminates the diagonal-vs-cardinal distance bias that plagues square grids. This makes them ideal for spatial aggregation, movement modeling, and proximity analysis. The H3 system is particularly useful when you need to:

  • Aggregate point data (e.g., GPS pings, sensor readings) into uniform spatial units

  • Join datasets from different sources without complex polygon overlaps

  • Analyze spatial patterns at multiple scales by moving between resolutions

{duckh3} provides fast, memory-efficient functions for analysing and manipulating large spatial and non-spatial datasets using the H3 hierarchical indexing system in R. It bridges DuckDB’s H3 extension with R’s data and spatial ecosystems — in particular {duckspatial}, {dplyr}, and {sf} — so you can leverage DuckDB’s analytical power without leaving your familiar R workflow. You can find the package’s repository here.

Let’s load some packages to introduce {duckh3}.

Load packages
library(arrow)       # parquet format
library(dplyr)       # data wrangling
library(duckdb)      # interface with duckdb
library(duckh3)      # duckdb h3 extension
library(duckspatial) # duckdb spatial extension
library(mapgl)       # interactive maps
library(sf)          # vector data

2 Naming conventions

All functions share the ddbh3_*() prefix (DuckDB H3) and are structured around the expected input data and what it will be converted to:

  • ddbh3_lonlat_to_*() — from longitude/latitude coordinates to H3 representations
  • ddbh3_points_to_*() — from spatial point geometries to H3 representations
  • ddbh3_h3_to_*() — convert H3 cells to other representations
  • ddbh3_vertex_to_*() — convert H3 vertices to other representations

With the following available transformations:

Function family Output
*_to_h3() H3 index as string or UBIGINT
*_to_spatial() H3 cell as spatial hexagon polygon
*_to_lon() Longitude of H3 cell centroid
*_to_lat() Latitude of H3 cell centroid

There is also a set of functions to retrieve or check properties of the data:

  • ddbh3_get_*() — retrieve H3 cell properties (resolution, parent, children, vertices…)
  • ddbh3_is_*() — check properties of H3 indexes (valid, pentagon, Class III…)

The function names might be slightly verbose, but we traded brevity to make them intuitive and descriptive.
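A quick sketch of these conventions in action, using functions shown elsewhere in this post (ddbh3_default_conn() sets up the default connection; it is covered in the next section):

```r
library(duckh3)

## Set up the package's default in-memory DuckDB connection (see next section)
ddbh3_default_conn()

## *_to_h3(): from longitude/latitude coordinates to H3 indexes
pts <- data.frame(lon = c(20.2, -8.18), lat = c(39.7, 40.9))
pts_h3 <- ddbh3_lonlat_to_h3(pts, resolution = 4)

## ddbh3_get_*(): retrieve a property of H3 cells, here the resolution
ddbh3_get_resolution(c("841ed2bffffffff", "8439227ffffffff"))
#> [1] 4 4
```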

3 First steps

There are several ways to work with {duckh3}, but the two main options are:

  • Interacting with a DuckDB connection: we won’t explain this here, but if you want more details you can explore this post about {duckspatial}, as the framework is the same. There’s a convenient function, ddbh3_create_conn(), that creates a connection with all the setup.

  • Working entirely in R: the functions of this package work with lazy tables (i.e. tables that live in a DuckDB database and are not materialized into R until explicitly collected). This makes data processing more efficient, as the data doesn’t need to be in R’s memory until all the processing within DuckDB is done. We will focus on this workflow.
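For completeness, here is a minimal sketch of the first, connection-based option. Note that h3_latlng_to_cell_string() is a SQL function of DuckDB’s H3 extension, not a {duckh3} function — check the extension’s documentation for the exact name:

```r
library(DBI)
library(duckh3)

## Create a DuckDB connection with the spatial and h3 extensions set up
con <- ddbh3_create_conn()

## Talk to the extension directly in SQL: index one coordinate at resolution 4
dbGetQuery(con, "SELECT h3_latlng_to_cell_string(40.9, -8.18, 4) AS h3string")

dbDisconnect(con, shutdown = TRUE)
```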

So after we have loaded the package into the R session, we need to set up the environment with the ddbh3_default_conn() function. This will:

  • Create a default in-memory DuckDB connection that will be used internally by the package.

  • Install and load the spatial extension

  • Install and load the h3 extension

  • Optionally set the maximum number of threads and the amount of RAM {duckh3} is allowed to use. By default, DuckDB will use all the available cores and 80% of the RAM.

ddbh3_default_conn()

For the moment this will be a mandatory step at the beginning of the script before using any {duckh3} function.
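If you prefer to set these limits by hand on your own connection, the underlying DuckDB settings are threads and memory_limit — plain DuckDB SQL, independent of {duckh3}:

```r
library(DBI)
library(duckh3)

## A connection with the spatial and h3 extensions already set up
con <- ddbh3_create_conn()

## Cap DuckDB's parallelism and memory usage for this connection
dbExecute(con, "SET threads = 4")
dbExecute(con, "SET memory_limit = '8GB'")
```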

4 Example data

For the examples below, I will use the Burnt Area database from EFFIS, which contains data about the wildfires that happened in Europe from 2016 until the current date (April 2026). I have prepared and simplified the data beforehand for the examples. The data contains five columns:

  • year: the year of the wildfire

  • country: the 2-letter ISO code of the country

  • area_ha: the amount of hectares the wildfire burned

  • lon: the geographic longitude of the centroid of the final perimeter

  • lat: the geographic latitude of the centroid of the final perimeter

It’s not perfect, but it will do for the examples. Let’s start by opening the data and exploring it:

## Read data
wildfires_tbl <- read_parquet("wildfires.parquet")
wildfires_tbl
# A tibble: 96,421 × 5
    year country area_ha   lon   lat
   <dbl> <chr>     <dbl> <dbl> <dbl>
 1  2016 AL           67 20.2   39.7
 2  2016 PT        26593 -8.18  40.9
 3  2016 PT           81 -8.06  41.4
 4  2016 TR           72 42.5   37.5
 5  2016 IT          267  9.08  44.4
 6  2016 PT          348 -8.69  39.8
 7  2016 PT          432 -7.74  40.3
 8  2016 PT           71 -8.41  41.3
 9  2016 MA           49 -5.74  35.1
10  2016 PT           99 -7.21  40.7
# ℹ 96,411 more rows

We have a total of 96,421 wildfires during the 10-year period. Let’s assign an H3 index to each pair of coordinates. For the sake of the exercise, let’s use an H3 resolution of 4, whose cells cover about 1,770 km² each.

## Add H3 as a new column
wildfires_h3_tbl <- ddbh3_lonlat_to_h3(wildfires_tbl, resolution = 4)
wildfires_h3_tbl
# Source:   table<temp_view_b0b62205_8b66_44e8_900b_ebe3eba801fe> [?? x 6]
# Database: DuckDB 1.5.2 [Cidre@Windows 10 x64:R 4.6.0/:memory:]
    year country area_ha   lon   lat h3string       
   <dbl> <chr>     <dbl> <dbl> <dbl> <chr>          
 1  2016 AL           67 20.2   39.7 841ed2bffffffff
 2  2016 PT        26593 -8.18  40.9 8439227ffffffff
 3  2016 PT           81 -8.06  41.4 8439223ffffffff
 4  2016 TR           72 42.5   37.5 842c335ffffffff
 5  2016 IT          267  9.08  44.4 841f9b1ffffffff
 6  2016 PT          348 -8.69  39.8 8439317ffffffff
 7  2016 PT          432 -7.74  40.3 8439041ffffffff
 8  2016 PT           71 -8.41  41.3 8439221ffffffff
 9  2016 MA           49 -5.74  35.1 8439a9bffffffff
10  2016 PT           99 -7.21  40.7 843904bffffffff
# ℹ more rows

Note a few things:

  • The table is lazy: it was inserted into the default DuckDB connection, and it lives there.

  • A new column with the default name h3string was added.

  • If we work with the default names that the package uses, we save some typing. For example, the default names for longitude and latitude are lon and lat, so we don’t need to specify those arguments. If our columns had different names, we would need to specify them.

  • The object is not spatial. We just added the h3string indexes; we didn’t convert the data to spatial. The functions ending in _to_spatial() do that.
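For example, if the coordinate columns were called x and y, the call would look something like this (the lon and lat argument names are assumed from the defaults described above; check the function’s documentation for the exact names):

```r
library(duckh3)

## Hypothetical table whose coordinates are not named lon/lat
pts <- data.frame(x = c(-8.18, 20.2), y = c(40.9, 39.7))

## Point the function at the right columns explicitly
## (argument names assumed, matching the defaults mentioned in the text)
pts_h3 <- ddbh3_lonlat_to_h3(pts, resolution = 4, lon = "x", lat = "y")
```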

5 Analysis with duckh3

One typical analysis with the H3 grid system is aggregating values by hexagon. Let’s calculate the total burned area during the 10-year period in the hexagons we defined:

## Calculate total area by H3
wildfires_agg_tbl <- wildfires_h3_tbl |> 
  summarise(
    area_ha = sum(area_ha, na.rm = TRUE),
    .by = h3string
  )
wildfires_agg_tbl
# Source:   SQL [?? x 2]
# Database: DuckDB 1.5.2 [Cidre@Windows 10 x64:R 4.6.0/:memory:]
   h3string        area_ha
   <chr>             <dbl>
 1 8438283ffffffff   16066
 2 842c361ffffffff    8326
 3 842d1b5ffffffff    2219
 4 8438043ffffffff     627
 5 8439229ffffffff    8097
 6 841ea97ffffffff   17160
 7 8439053ffffffff    2369
 8 843f629ffffffff    1141
 9 843f261ffffffff   12961
10 842d139ffffffff    1902
# ℹ more rows
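Because the result is still a lazy table, we can keep chaining {dplyr} verbs and let DuckDB do the work. For instance, to inspect the most fire-affected hexagons before materializing anything:

```r
library(dplyr)

## Translated to SQL (ORDER BY ... LIMIT 5) and executed inside DuckDB
wildfires_agg_tbl |>
  arrange(desc(area_ha)) |>
  head(5)
```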

Great! Now that we have done our analysis, let’s create a map. We need to follow these steps:

  • Convert the h3string to spatial polygons (hexagons)

  • Materialize the data into R

  • Create a map

Again, using the default name of h3string makes the conversion smooth:

wildfires_agg_ddbs <- ddbh3_h3_to_spatial(wildfires_agg_tbl)
wildfires_agg_ddbs
# A duckspatial lazy spatial table
# ● CRS: EPSG:4326 
# ● Geometry column: geometry 
# ● Geometry type: POLYGON 
# ● Bounding box: xmin: -28.605 ymin: 27.596 xmax: 44.967 ymax: 69.845 
# Data backed by DuckDB (dbplyr lazy evaluation)
# Use ddbs_collect() or st_as_sf() to materialize to sf
#
# Source:   table<temp_view_329e4a57_3f4e_4c95_a159_557c40841053> [?? x 3]
# Database: DuckDB 1.5.2 [Cidre@Windows 10 x64:R 4.6.0/:memory:]
   h3string        area_ha geometry                                             
   <chr>             <dbl> <wk_wkb>                                             
 1 841ead5ffffffff   14520 <POLYGON ((16.04612 44.94818, 15.94019 44.72155, 16.…
 2 841e969ffffffff   22141 <POLYGON ((8.350532 40.35212, 8.273517 40.1151, 8.49…
 3 843901bffffffff    7642 <POLYGON ((-4.954513 40.40729, -5.238131 40.31514, -…
 4 841ef61ffffffff   86908 <POLYGON ((18.09441 43.38587, 17.98439 43.15731, 18.…
 5 841e5a9ffffffff  193203 <POLYGON ((29.20613 45.33227, 29.0642 45.12055, 29.2…
 6 842d183ffffffff     894 <POLYGON ((38.10032 40.30493, 37.95077 40.09769, 38.…
 7 842c04dffffffff     179 <POLYGON ((42.27047 39.96157, 42.11652 39.76221, 42.…
 8 84185b7ffffffff   28196 <POLYGON ((-6.725514 43.55092, -7.023474 43.45819, -…
 9 841ef21ffffffff   33446 <POLYGON ((18.87152 42.44076, 18.76048 42.21092, 18.…
10 843f60dffffffff    4073 <POLYGON ((27.23108 38.44191, 27.10414 38.21375, 27.…
# ℹ more rows

This step brings the data from DuckDB into R. We use the duckspatial::ddbs_collect() function to collect the data as an sf object:

wildfires_agg_sf <- ddbs_collect(wildfires_agg_ddbs)
wildfires_agg_sf
Simple feature collection with 3099 features and 2 fields
Geometry type: POLYGON
Dimension:     XY
Bounding box:  xmin: -28.60491 ymin: 27.59649 xmax: 44.96658 ymax: 69.84475
Geodetic CRS:  WGS 84
# A tibble: 3,099 × 3
   h3string        area_ha                                              geometry
 * <chr>             <dbl>                                         <POLYGON [°]>
 1 841ead5ffffffff   14520 ((16.04612 44.94818, 15.94019 44.72155, 16.17057 44.…
 2 841e969ffffffff   22141 ((8.350532 40.35212, 8.273517 40.1151, 8.498685 39.9…
 3 843901bffffffff    7642 ((-4.954513 40.40729, -5.238131 40.31514, -5.273724 …
 4 841ef61ffffffff   86908 ((18.09441 43.38587, 17.98439 43.15731, 18.20742 42.…
 5 841e5a9ffffffff  193203 ((29.20613 45.33227, 29.0642 45.12055, 29.26706 44.9…
 6 842d183ffffffff     894 ((38.10032 40.30493, 37.95077 40.09769, 38.11466 39.…
 7 842c04dffffffff     179 ((42.27047 39.96157, 42.11652 39.76221, 42.26505 39.…
 8 84185b7ffffffff   28196 ((-6.725514 43.55092, -7.023474 43.45819, -7.054985 …
 9 841ef21ffffffff   33446 ((18.87152 42.44076, 18.76048 42.21092, 18.97979 42.…
10 843f60dffffffff    4073 ((27.23108 38.44191, 27.10414 38.21375, 27.29672 38.…
# ℹ 3,089 more rows

Finally, we can create a beautiful map using mapgl:

## Define legend breaks and palette
area_brks <- c(0, 1000, 5000, 20000, 50000, 100000, 200000)
pal <- c("#FFEDA0", "#FED976", "#FC4E2A", "#E31A1C", "#BD0026", "#800026", "#3D0010")

## Create a popup
wildfires_agg_sf <- wildfires_agg_sf |> 
  mutate(popup = glue::glue("<strong>Total Area: </strong>{area_ha} ha"))

## Draw the map
maplibre(style = carto_style("dark-matter")) |>
  add_fill_layer(
    id = "wildfires",
    source = wildfires_agg_sf,
    fill_color = interpolate(
      column = "area_ha",
      values = area_brks,
      stops = pal,
      na_color = "lightgrey"
    ),
    fill_opacity = 0.8,
    popup = "popup"
  ) |>
  add_legend(
    "Burned Area (ha) (2016-2026)",
    values = area_brks,
    colors = pal,
    width  = "270px",
    style = legend_style(
      background_color = "grey90",
      title_color = "black",
      text_color = "black"
    )
  ) |> 
  fit_bounds(wildfires_agg_sf) |> 
  add_fullscreen_control()

That’s amazing!! We’ve only seen a couple of the package’s functions, but you get the pattern. You’ll find many more examples in the function documentation, and hopefully I will soon add some vignettes to the package.

6 Low-level processing

The package also offers low-level processing. For example, let’s obtain the resolution of the following two H3 strings:

ddbh3_get_resolution(c("8439205fffffff", "841ed6bffffffff"))
[1] 8 4
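The ddbh3_is_*() family works the same way on plain vectors. A sketch, assuming the validity checker is called ddbh3_is_valid() (the exact name may differ; see the package documentation):

```r
library(duckh3)

## Hypothetical function name, assumed from the ddbh3_is_*() family
ddbh3_is_valid(c("841ed6bffffffff", "not-an-h3-index"))
```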

7 Other H3 packages

Of course, {duckh3} is not the only choice out there to work with the H3 grid system in R. Other popular options are:

  • {h3}: it has very basic integration with {sf}. It’s simple and lightweight, but it has several limitations when working with spatial data. It’s not currently on CRAN.

  • {h3jsr}: one of the most popular options nowadays, with the highest number of CRAN downloads. It wraps H3 via JavaScript and is well integrated with the current R ecosystem. It’s great for smaller datasets, but weak when working with millions of observations.

  • {h3r}: implements the H3 grid system using C through the R package {h3lib}. It provides a low-level API without integration with {sf}.

  • {h3o}: a new R package written in Rust. It’s very fast, strongly integrated with the current geospatial ecosystem in R and the {tidyverse}, and very R-user friendly.

  • {duckh3}: it uses DuckDB to speed up workflows and is strongly integrated with the current geospatial ecosystem in R and the {tidyverse}. It provides processing at both the table level and the vector level, and is complemented by {duckspatial} for spatial data workflows.

8 Supporting duckh3

You can support this project by:

  • Trying the package, and starring the project on GitHub if you find it useful.

  • Buying me a coffee.

9 Acknowledgments

The development of this package wouldn’t be possible without the incredible work of many people.

First and foremost, my deepest thanks to the DuckDB team for building such a powerful and elegant analytical engine. Their commitment to performance, correctness, and developer experience has created a foundation that makes tools like {duckspatial} and {duckh3} possible. This includes the developers and maintainers of the duckdb R package, which makes it possible to work with this awesome tool from R and to build packages such as {duckh3}.

Special recognition goes to the developers of the DuckDB H3 Extension, the engine that powers everything in {duckh3}. Their invaluable work on spatial operations, format support, and performance optimization is what makes this package capable of handling large-scale spatial analysis efficiently.

Most importantly, this package represents a true collaborative effort. Rafael Pereira and Egor Kotov have been fundamental in shaping the design of {duckspatial}, and consequently, the design of {duckh3}.

Finally, thank you to the broader R spatial community for building the ecosystem that {duckspatial} and {duckh3} integrate with, and to everyone who has tested early versions, reported issues, or provided feedback.

10 Session Information

Analyses were conducted using the R Statistical language (version 4.6.0; R Core Team, 2026) on Windows 11 x64 (build 26200), using the packages duckh3 (version 0.1.0; Cidre González A et al., 2026), duckspatial (version 1.0.0.9000; Cidre González A et al., 2026), duckdb (version 1.5.2; Mühleisen H, Raasveldt M, 2026), sf (version 1.1.0; Pebesma E, Bivand R, 2023), DBI (version 1.3.0; R Special Interest Group on Databases, R-SIG-DB), arrow (version 23.0.1.2; Richardson N et al., 2026), mapgl (version 0.4.6; Walker K, 2026) and dplyr (version 1.2.1; Wickham H et al., 2026).

11 References