duckspatial v1.0.0
1 Introduction
A couple of months ago, I presented {duckspatial} v0.9.0 (Cidre González, Pereira, and Kotov 2026) as a major step towards v1.0.0. Today, I am excited to say that v1.0.0 is finally here!
{duckspatial} is an R package that brings the power of DuckDB’s spatial extension to R. It provides fast and memory-efficient functions to analyze and manipulate large spatial vector datasets while maintaining full compatibility with R’s spatial ecosystem, especially the {sf} package (Pebesma and Bivand 2023).
The highlights of this version are:
- Implementation of a native S3 class named duckspatial_df.
- All functions work seamlessly with sf and duckspatial_df objects.
- Two operating modes: duckspatial and sf.
- The geospatial operations are kept within DuckDB, and no data is materialized into R until explicitly collected. This leverages the DuckDB engine and avoids materializing the data unnecessarily.
- About 20 new geospatial operations (e.g. ddbs_perimeter(), ddbs_as_points()).
2 duckspatial_df class
The duckspatial_df class is an S3 class that is just a pointer to data stored in a temporary table in a DuckDB connection. If no connection is provided, the package internally uses a default connection, which is also used by the functions that do not have a conn argument defined. There are several ways to create a duckspatial_df:
- ddbs_open_dataset(): opens a local file with a valid extension as a duckspatial_df that is stored as a temporary view.
- as_duckspatial_df(): transforms other classes into a duckspatial_df. For example, we can transform an sf object, or a table that is stored in a DuckDB connection.
- ddbs_as_points(): converts a data frame with X and Y coordinates into duckspatial_df points.
Here we are reading the file. Note that “reading” actually means registering the data as a temporary view in a default DuckDB connection and opening the data lazily. You can see that the print method is very informative and inspired by the {sf} package. It shows the coordinate reference system in the AUTH:CODE format, the name of the geometry column, the different geometry types found in the geometry column, and the bounding box.
## Load the packages used in this post
library(duckspatial)
library(sf)
## Path to a local file to be reused later
countries_path <- system.file("spatial/countries.geojson", package = "duckspatial")
## Read as duckspatial_df
countries_ddbs <- ddbs_open_dataset(countries_path)
countries_ddbs
# A duckspatial lazy spatial table
# ● CRS: EPSG:4326
# ● Geometry column: geom
# ● Geometry type: POLYGON
# ● Bounding box: xmin: -178.91 ymin: -89.9 xmax: 180 ymax: 83.652
# Data backed by DuckDB (dbplyr lazy evaluation)
# Use ddbs_collect() or st_as_sf() to materialize to sf
#
# Source: table<temp_view_c33c0c39_89ab_479f_b291_278f270067df> [?? x 8]
# Database: DuckDB 1.5.1 [Cidre@Windows 10 x64:R 4.5.2/:memory:]
OGC_FID CNTR_ID NAME_ENGL ISO3_CODE CNTR_NAME FID date geom
<dbl> <chr> <chr> <chr> <chr> <chr> <date> <wk_>
1 0 AR Argentina ARG Argentina AR 2021-01-01 <POL…
2 1 AS American Samoa ASM American… AS 2021-01-01 <POL…
3 2 AT Austria AUT Österrei… AT 2021-01-01 <POL…
4 3 AQ Antarctica ATA Antarcti… AQ 2021-01-01 <POL…
5 4 AD Andorra AND Andorra AD 2021-01-01 <POL…
6 5 AE United Arab Emira… ARE ????????… AE 2021-01-01 <POL…
7 6 AF Afghanistan AFG ????????… AF 2021-01-01 <POL…
8 7 AG Antigua and Barbu… ATG Antigua … AG 2021-01-01 <POL…
9 8 AI Anguilla AIA Anguilla AI 2021-01-01 <POL…
10 9 AL Albania ALB Shqipëria AL 2021-01-01 <POL…
# ℹ more rows
To convert an sf object, we simply use as_duckspatial_df():
## Read as sf
countries_sf <- read_sf(countries_path)
## Convert to duckspatial_df
countries_ddbs <- as_duckspatial_df(countries_sf)
class(countries_ddbs)
[1] "duckspatial_df" "tbl_duckdb_connection" "tbl_dbi"
[4] "tbl_sql" "tbl_lazy" "tbl"
However, if we already have an sf object and just want to apply a geospatial operation with {duckspatial}, we can pass it directly to the function, and the output will be a duckspatial_df by default.
## Apply geospatial operation to sf
centroids_ddbs <- ddbs_centroid(countries_sf)
class(centroids_ddbs)
[1] "duckspatial_df" "tbl_duckdb_connection" "tbl_dbi"
[4] "tbl_sql" "tbl_lazy" "tbl"
We might have an existing DuckDB table with spatial data in it that we want to work with in R. Let’s create a connection to simulate this scenario:
## Create a duckdb in-memory connection with spatial extension
conn <- ddbs_create_conn()
## Write data to the database
ddbs_write_table(conn, countries_path, "countries")
Now, if we want to create a duckspatial_df from that countries table, we can just use as_duckspatial_df() again:
## Open lazily as duckspatial_df
countries_db_ddbs <- as_duckspatial_df("countries", conn)
countries_db_ddbs
# A duckspatial lazy spatial table
# ● CRS: EPSG:4326
# ● Geometry column: geom
# ● Geometry type: POLYGON
# ● Bounding box: xmin: -178.91 ymin: -89.9 xmax: 180 ymax: 83.652
# Data backed by DuckDB (dbplyr lazy evaluation)
# Use ddbs_collect() or st_as_sf() to materialize to sf
#
# Source: table<countries> [?? x 8]
# Database: DuckDB 1.5.1 [Cidre@Windows 10 x64:R 4.5.2/:memory:]
OGC_FID CNTR_ID NAME_ENGL ISO3_CODE CNTR_NAME FID date geom
<dbl> <chr> <chr> <chr> <chr> <chr> <date> <wk_>
1 0 AR Argentina ARG Argentina AR 2021-01-01 <POL…
2 1 AS American Samoa ASM American… AS 2021-01-01 <POL…
3 2 AT Austria AUT Österrei… AT 2021-01-01 <POL…
4 3 AQ Antarctica ATA Antarcti… AQ 2021-01-01 <POL…
5 4 AD Andorra AND Andorra AD 2021-01-01 <POL…
6 5 AE United Arab Emira… ARE ????????… AE 2021-01-01 <POL…
7 6 AF Afghanistan AFG ????????… AF 2021-01-01 <POL…
8 7 AG Antigua and Barbu… ATG Antigua … AG 2021-01-01 <POL…
9 8 AI Anguilla AIA Anguilla AI 2021-01-01 <POL…
10 9 AL Albania ALB Shqipëria AL 2021-01-01 <POL…
# ℹ more rows
The function ddbs_as_points() works similarly to st_as_sf() when creating points from a data frame, which is its main use case. Let’s see it with an example:
## Create sample data
cities_df <- data.frame(
  city = c("Buenos Aires", "Córdoba", "Rosario"),
  lon = c(-58.3816, -64.1811, -60.6393),
  lat = c(-34.6037, -31.4201, -32.9468),
  population = c(3075000, 1391000, 1193605)
)
## Convert to duckspatial_df
cities_ddbs <- ddbs_as_points(cities_df)
## If the columns were named differently, we would need to explicitly specify them
## and the same with the CRS
cities_ddbs <- ddbs_as_points(
  cities_df,
  coords = c("lon", "lat"),
  crs = "EPSG:4326"
)
cities_ddbs
# A duckspatial lazy spatial table
# ● CRS: EPSG:4326
# ● Geometry column: geometry
# ● Geometry type: POINT
# ● Bounding box: xmin: -64.181 ymin: -34.604 xmax: -58.382 ymax: -31.42
# Data backed by DuckDB (dbplyr lazy evaluation)
# Use ddbs_collect() or st_as_sf() to materialize to sf
#
# Source: table<temp_view_50c17e0f_601b_4c09_a7c5_8c656b23f816> [?? x 5]
# Database: DuckDB 1.5.1 [Cidre@Windows 10 x64:R 4.5.2/:memory:]
city lon lat population geometry
<chr> <dbl> <dbl> <dbl> <wk_wkb>
1 Buenos Aires -58.4 -34.6 3075000 <POINT (-58.3816 -34.6037)>
2 Córdoba -64.2 -31.4 1391000 <POINT (-64.1811 -31.4201)>
3 Rosario -60.6 -32.9 1193605 <POINT (-60.6393 -32.9468)>
3 Operating modes
This version brings two operating modes for the geospatial operations:
- mode = 'duckspatial': the output will typically be a duckspatial_df, so the workflow is kept within DuckDB.
- mode = 'sf': the output will be the same as, or similar to, what the sf package would return for the function. For example, ddbs_centroid() would return an sf object, while ddbs_area() would return a units vector.
We can check the current mode with ddbs_sitrep().
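A minimal sketch of that check, assuming ddbs_sitrep() can be called without arguments and prints its report to the console (the exact signature and output are not shown in this post):

```r
library(duckspatial)

## Report the current duckspatial setup, including the active mode
## (assumption: no arguments are required)
ddbs_sitrep()
```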
Let’s picture it with a couple of examples.
The default mode is duckspatial, so we don’t need to set the mode argument:
centroids_ddbs <- ddbs_centroid(countries_ddbs)
class(centroids_ddbs)
[1] "duckspatial_df" "tbl_duckdb_connection" "tbl_dbi"
[4] "tbl_sql" "tbl_lazy" "tbl"
If we use mode sf, the output will be an sf object:
centroids_sf <- ddbs_centroid(countries_ddbs, mode = "sf")
class(centroids_sf)
[1] "sf" "data.frame"
The difference is clearer with ddbs_area(). In duckspatial mode, the function adds a new area column to the data:
area_ddbs <- ddbs_area(countries_ddbs)
area_ddbs
# A duckspatial lazy spatial table
# ● CRS: EPSG:4326
# ● Geometry column: geometry
# ● Geometry type: POLYGON
# ● Bounding box: xmin: -178.91 ymin: -89.9 xmax: 180 ymax: 83.652
# Data backed by DuckDB (dbplyr lazy evaluation)
# Use ddbs_collect() or st_as_sf() to materialize to sf
#
# Source: table<temp_view_797d224c_0b1f_4c1f_89ff_1ab356847e6d> [?? x 8]
# Database: DuckDB 1.5.1 [Cidre@Windows 10 x64:R 4.5.2/:memory:]
CNTR_ID NAME_ENGL ISO3_CODE CNTR_NAME FID date area geometry
<chr> <chr> <chr> <chr> <chr> <date> <dbl> <wk_wkb>
1 AR Argentina ARG Argentina AR 2021-01-01 2.76e12 <POLYGO…
2 AS American Samoa ASM American… AS 2021-01-01 1.24e 8 <POLYGO…
3 AT Austria AUT Österrei… AT 2021-01-01 8.39e10 <POLYGO…
4 AQ Antarctica ATA Antarcti… AQ 2021-01-01 1.24e13 <POLYGO…
5 AD Andorra AND Andorra AD 2021-01-01 3.83e 8 <POLYGO…
6 AE United Arab Em… ARE ????????… AE 2021-01-01 7.08e10 <POLYGO…
7 AF Afghanistan AFG ????????… AF 2021-01-01 6.42e11 <POLYGO…
8 AG Antigua and Ba… ATG Antigua … AG 2021-01-01 2.59e 8 <POLYGO…
9 AI Anguilla AIA Anguilla AI 2021-01-01 9.45e 7 <POLYGO…
10 AL Albania ALB Shqipëria AL 2021-01-01 2.87e10 <POLYGO…
# ℹ more rows
In sf mode, the output is a units vector, which is the behaviour we would expect from the sf package:
area_sf <- ddbs_area(countries_ddbs, mode = "sf")
area_sf[1:5]
Units: [m^2]
[1] 2.759463e+12 1.244150e+08 8.385156e+10 1.238303e+13 3.834823e+08
We can also change the default mode to sf as follows:
ddbs_options(mode = "sf")
Now, if we apply a geospatial operation without setting the mode argument, the output will match what the sf package would return:
area_sf <- ddbs_area(countries_ddbs)
area_sf[1:5]
Units: [m^2]
[1] 2.759463e+12 1.244150e+08 8.385156e+10 1.238303e+13 3.834823e+08
Let’s switch back to the duckspatial mode:
ddbs_options(mode = "duckspatial")
4 Data materialization
If we operate in duckspatial mode, no data is loaded into R’s memory, and everything is kept within the DuckDB engine. Once we have finished all our operations in DuckDB and want to materialize the data in R, we can do so with ddbs_collect(), collect() (which mirrors ddbs_collect()), or st_as_sf(). ddbs_collect() has an as argument that lets us materialize to four different output types.
By default, the output is an sf object:
ddbs_collect(centroids_ddbs)
Simple feature collection with 257 features and 6 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: -178.1035 ymin: -77.08226 xmax: 177.9747 ymax: 78.50949
Geodetic CRS: WGS 84
# A tibble: 257 × 7
CNTR_ID NAME_ENGL ISO3_CODE CNTR_NAME FID date
* <chr> <chr> <chr> <chr> <chr> <date>
1 AR Argentina ARG Argentina AR 2021-01-01
2 AS American Samoa ASM American Samoa-S?moa… AS 2021-01-01
3 AT Austria AUT Österreich AT 2021-01-01
4 AQ Antarctica ATA Antarctica AQ 2021-01-01
5 AD Andorra AND Andorra AD 2021-01-01
6 AE United Arab Emirates ARE ???????? ??????? ???… AE 2021-01-01
7 AF Afghanistan AFG ?????????-????????? AF 2021-01-01
8 AG Antigua and Barbuda ATG Antigua and Barbuda AG 2021-01-01
9 AI Anguilla AIA Anguilla AI 2021-01-01
10 AL Albania ALB Shqipëria AL 2021-01-01
# ℹ 247 more rows
# ℹ 1 more variable: geometry <POINT [°]>
With as = "tibble", the spatial component is removed:
ddbs_collect(centroids_ddbs, as = "tibble")
# A tibble: 257 × 6
CNTR_ID NAME_ENGL ISO3_CODE CNTR_NAME FID date
<chr> <chr> <chr> <chr> <chr> <date>
1 AR Argentina ARG Argentina AR 2021-01-01
2 AS American Samoa ASM American Samoa-S?moa… AS 2021-01-01
3 AT Austria AUT Österreich AT 2021-01-01
4 AQ Antarctica ATA Antarctica AQ 2021-01-01
5 AD Andorra AND Andorra AD 2021-01-01
6 AE United Arab Emirates ARE ???????? ??????? ???… AE 2021-01-01
7 AF Afghanistan AFG ?????????-????????? AF 2021-01-01
8 AG Antigua and Barbuda ATG Antigua and Barbuda AG 2021-01-01
9 AI Anguilla AIA Anguilla AI 2021-01-01
10 AL Albania ALB Shqipëria AL 2021-01-01
# ℹ 247 more rows
With as = "raw", the geometry column is kept as a list of raw WKB vectors:
ddbs_collect(centroids_ddbs, as = "raw")
# A tibble: 257 × 7
CNTR_ID NAME_ENGL ISO3_CODE CNTR_NAME FID date geometry
<chr> <chr> <chr> <chr> <chr> <date> <list>
1 AR Argentina ARG Argentina AR 2021-01-01 <raw>
2 AS American Samoa ASM American Sa… AS 2021-01-01 <raw>
3 AT Austria AUT Österreich AT 2021-01-01 <raw>
4 AQ Antarctica ATA Antarctica AQ 2021-01-01 <raw>
5 AD Andorra AND Andorra AD 2021-01-01 <raw>
6 AE United Arab Emirates ARE ???????? ??… AE 2021-01-01 <raw>
7 AF Afghanistan AFG ?????????-?… AF 2021-01-01 <raw>
8 AG Antigua and Barbuda ATG Antigua and… AG 2021-01-01 <raw>
9 AI Anguilla AIA Anguilla AI 2021-01-01 <raw>
10 AL Albania ALB Shqipëria AL 2021-01-01 <raw>
# ℹ 247 more rows
With as = "geoarrow", the geometry is converted to a geoarrow vector:
ddbs_collect(centroids_ddbs, as = "geoarrow")
# A tibble: 257 × 7
CNTR_ID NAME_ENGL ISO3_CODE CNTR_NAME FID date geometry
<chr> <chr> <chr> <chr> <chr> <date> <grrw_v>
1 AR Argentina ARG Argentina AR 2021-01-01 <POINT …
2 AS American Samoa ASM American Sa… AS 2021-01-01 <POINT …
3 AT Austria AUT Österreich AT 2021-01-01 <POINT …
4 AQ Antarctica ATA Antarctica AQ 2021-01-01 <POINT …
5 AD Andorra AND Andorra AD 2021-01-01 <POINT …
6 AE United Arab Emirates ARE ???????? ??… AE 2021-01-01 <POINT …
7 AF Afghanistan AFG ?????????-?… AF 2021-01-01 <POINT …
8 AG Antigua and Barbuda ATG Antigua and… AG 2021-01-01 <POINT …
9 AI Anguilla AIA Anguilla AI 2021-01-01 <POINT …
10 AL Albania ALB Shqipëria AL 2021-01-01 <POINT …
# ℹ 247 more rows
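The other two routes mentioned at the start of this section, st_as_sf() and collect(), can be sketched in the same way. This is a minimal, self-contained example that rebuilds the centroids from the sample file. It assumes that collect() is dispatched through dplyr's generic and that both calls accept the lazy table as their only argument.

```r
library(duckspatial)
library(sf)

## Same sample file used earlier in this post
countries_path <- system.file("spatial/countries.geojson", package = "duckspatial")

## Lazy centroids, still living in DuckDB
centroids_ddbs <- ddbs_centroid(ddbs_open_dataset(countries_path))

## Materialize through sf's generic
centroids_sf <- st_as_sf(centroids_ddbs)

## Materialize through dplyr's generic (assumed to mirror ddbs_collect())
centroids_collected <- dplyr::collect(centroids_ddbs)
```

Both objects live in R's memory after these calls, so it is usually worth collecting only once, at the end of the workflow.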
5 Performance Improvements
The output of a ddbs_*() function is kept within DuckDB, so chaining further geospatial operations won’t materialize the data until it is explicitly collected.
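As a rough illustration of such a chain, the sketch below combines functions already used in this post. It assumes the default duckspatial mode is active and that each function accepts the previous step's duckspatial_df as its first argument, as in the earlier examples.

```r
library(duckspatial)

## Same sample file used earlier in this post
countries_path <- system.file("spatial/countries.geojson", package = "duckspatial")

## Each step returns a lazy duckspatial_df, so nothing is loaded into R yet
result_ddbs <- ddbs_open_dataset(countries_path) |>
  ddbs_area() |>     # adds an area column
  ddbs_centroid()    # computes the centroid of each geometry

## Only this call brings the data into R (as an sf object by default)
result_sf <- ddbs_collect(result_ddbs)
```

The intermediate result never leaves DuckDB, so the only transfer into R happens at the final ddbs_collect() call.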
6 crs_column and crs removed
When duckspatial was created, there wasn’t a way of storing CRS metadata of spatial data in DuckDB. Therefore, we overcame this issue by internally creating a column called crs_duckspatial that stored the CRS of the data. However, this was neither efficient nor interoperable with other tools. This changed with DuckDB version 1.5.0. Now, the CRS is stored as metadata with the geometry column. Let’s explore the structure of the “countries” table we created earlier:
countries_desc <- DBI::dbGetQuery(conn, "DESCRIBE countries;")
countries_desc
column_name column_type null key default extra
1 OGC_FID BIGINT YES <NA> <NA> <NA>
2 CNTR_ID VARCHAR YES <NA> <NA> <NA>
3 NAME_ENGL VARCHAR YES <NA> <NA> <NA>
4 ISO3_CODE VARCHAR YES <NA> <NA> <NA>
5 CNTR_NAME VARCHAR YES <NA> <NA> <NA>
6 FID VARCHAR YES <NA> <NA> <NA>
7 date DATE YES <NA> <NA> <NA>
8 geom GEOMETRY('EPSG:4326') YES <NA> <NA> <NA>
As you can see, the CRS is stored with the column type. This allowed us to remove the crs_column and crs arguments from every geospatial operation, and manage the CRS naturally.
7 New Features
This release brings about 20 new geospatial functions, a further set of functions to work with duckspatial_df, as well as dplyr methods for this class. You can explore all the new functions in the package’s news.
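To give an idea of how the dplyr methods might be used, here is a minimal sketch. It assumes that filter() is among the supported verbs and that it preserves the duckspatial_df class, which this post does not state explicitly.

```r
library(duckspatial)
library(dplyr)

## Same sample file used earlier in this post
countries_path <- system.file("spatial/countries.geojson", package = "duckspatial")

## Filter lazily inside DuckDB, then materialize only the matching rows
## (assumption: filter() works on duckspatial_df and keeps its class)
argentina_sf <- ddbs_open_dataset(countries_path) |>
  filter(ISO3_CODE == "ARG") |>
  ddbs_collect()
```

Filtering before collecting keeps the amount of data transferred into R as small as possible.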
8 Acknowledgments
This release wouldn’t be possible without the incredible work of many people.
First and foremost, my deepest thanks to the DuckDB team for building such a powerful and elegant analytical engine. Their commitment to performance, correctness, and developer experience has created a foundation that makes tools like {duckspatial} possible. This includes the developers and maintainers of the duckdb R package, which makes it possible to use this awesome tool from R and to build packages such as {duckspatial}.
Special recognition goes to the developers of the DuckDB Spatial Extension, the engine that powers everything in {duckspatial}. Their invaluable work on spatial operations, format support, and performance optimization is what makes this package capable of handling large-scale spatial analysis efficiently.
Most importantly, this release represents a true collaborative effort. Rafael Pereira and Egor Kotov have been fundamental in shaping duckspatial’s new design and implementation. Their ideas, code contributions, and thoughtful feedback have shaped this package far beyond what I could have achieved alone. This is a shared accomplishment.
Finally, thank you to the broader R spatial community for building the ecosystem that {duckspatial} integrates with, and to everyone who has tested early versions, reported issues, or provided feedback.