duckspatial v1.0.0

R
spatial
duckspatial
New duckspatial API, and presentation of duckspatial_df
Author

Adrián Cidre

Published

March 30, 2026

1 Introduction

A couple of months ago, I presented {duckspatial} v0.9.0 (Cidre González, Pereira, and Kotov 2026) as a major step towards v1.0.0. Today, I am excited to say that v1.0.0 is finally here!

{duckspatial} is an R package that brings the power of DuckDB’s spatial extension to R. It provides fast and memory-efficient functions to analyze and manipulate large spatial vector datasets while maintaining full compatibility with R’s spatial ecosystem, especially the {sf} package (Pebesma and Bivand 2023).

The highlights of this version are:

  • Implementation of a native S3 class named duckspatial_df.

  • All functions work seamlessly with sf and duckspatial_df objects.

  • Two operating modes: duckspatial and sf.

  • The geospatial operations are kept within DuckDB, and no data is materialized into R until explicitly collected. This makes full use of the DuckDB engine and avoids materializing data unnecessarily.

  • About 20 new geospatial operations (e.g. ddbs_perimeter(), ddbs_as_points()).

2 duckspatial_df class

The duckspatial_df class is an S3 class that is just a pointer to data stored in a temporary table in a DuckDB connection. If no connection is provided, the package internally uses a default connection, which is also the one used by functions that do not have a conn argument. There are several ways to create a duckspatial_df:

  • ddbs_open_dataset(): opens a local file with a valid extension as a duckspatial_df backed by a temporary view.

  • as_duckspatial_df(): transforms other classes into a duckspatial_df. For example, we can transform an sf object, or a table that is stored in a DuckDB connection.

  • ddbs_as_points(): converts a data frame with X and Y coordinates into duckspatial_df points.

Here we read a file. Note that “reading” actually means registering the data as a temporary view in the default DuckDB connection and opening it lazily. The print method is very informative and is inspired by the {sf} package. It shows the coordinate reference system in AUTH:CODE format, the name of the geometry column, the geometry types found in that column, and the bounding box.

## Load the packages used in this post
library(duckspatial)
library(sf)

## Path to a local file to be reused later
countries_path <- system.file("spatial/countries.geojson", package = "duckspatial")

## Read as duckspatial_df
countries_ddbs <- ddbs_open_dataset(countries_path)

countries_ddbs
# A duckspatial lazy spatial table
# ● CRS: EPSG:4326 
# ● Geometry column: geom 
# ● Geometry type: POLYGON 
# ● Bounding box: xmin: -178.91 ymin: -89.9 xmax: 180 ymax: 83.652 
# Data backed by DuckDB (dbplyr lazy evaluation)
# Use ddbs_collect() or st_as_sf() to materialize to sf
#
# Source:   table<temp_view_c33c0c39_89ab_479f_b291_278f270067df> [?? x 8]
# Database: DuckDB 1.5.1 [Cidre@Windows 10 x64:R 4.5.2/:memory:]
   OGC_FID CNTR_ID NAME_ENGL          ISO3_CODE CNTR_NAME FID   date       geom 
     <dbl> <chr>   <chr>              <chr>     <chr>     <chr> <date>     <wk_>
 1       0 AR      Argentina          ARG       Argentina AR    2021-01-01 <POL…
 2       1 AS      American Samoa     ASM       American… AS    2021-01-01 <POL…
 3       2 AT      Austria            AUT       Österrei… AT    2021-01-01 <POL…
 4       3 AQ      Antarctica         ATA       Antarcti… AQ    2021-01-01 <POL…
 5       4 AD      Andorra            AND       Andorra   AD    2021-01-01 <POL…
 6       5 AE      United Arab Emira… ARE       ????????… AE    2021-01-01 <POL…
 7       6 AF      Afghanistan        AFG       ????????… AF    2021-01-01 <POL…
 8       7 AG      Antigua and Barbu… ATG       Antigua … AG    2021-01-01 <POL…
 9       8 AI      Anguilla           AIA       Anguilla  AI    2021-01-01 <POL…
10       9 AL      Albania            ALB       Shqipëria AL    2021-01-01 <POL…
# ℹ more rows

To convert an sf object, we simply use as_duckspatial_df():

## Read as sf
countries_sf <- read_sf(countries_path)

## Convert to duckspatial_df
countries_ddbs <- as_duckspatial_df(countries_sf)

class(countries_ddbs)
[1] "duckspatial_df"        "tbl_duckdb_connection" "tbl_dbi"              
[4] "tbl_sql"               "tbl_lazy"              "tbl"                  

However, if we already have an sf object and only want to apply a geospatial operation with {duckspatial}, we can pass it directly to the function, and the output will be a duckspatial_df by default.

## Apply geospatial operation to sf
centroids_ddbs <- ddbs_centroid(countries_sf)

class(centroids_ddbs)
[1] "duckspatial_df"        "tbl_duckdb_connection" "tbl_dbi"              
[4] "tbl_sql"               "tbl_lazy"              "tbl"                  

We might also have an existing DuckDB table with spatial data that we want to work with in R. Let’s create a connection to simulate this scenario:

## Create a duckdb in-memory connection with spatial extension
conn <- ddbs_create_conn()

## Write data to the database
ddbs_write_table(conn, countries_path, "countries")

Now, to create a duckspatial_df from that countries table, we can use as_duckspatial_df() again:

## Open lazily as duckspatial_df
countries_db_ddbs <- as_duckspatial_df("countries", conn)

countries_db_ddbs
# A duckspatial lazy spatial table
# ● CRS: EPSG:4326 
# ● Geometry column: geom 
# ● Geometry type: POLYGON 
# ● Bounding box: xmin: -178.91 ymin: -89.9 xmax: 180 ymax: 83.652 
# Data backed by DuckDB (dbplyr lazy evaluation)
# Use ddbs_collect() or st_as_sf() to materialize to sf
#
# Source:   table<countries> [?? x 8]
# Database: DuckDB 1.5.1 [Cidre@Windows 10 x64:R 4.5.2/:memory:]
   OGC_FID CNTR_ID NAME_ENGL          ISO3_CODE CNTR_NAME FID   date       geom 
     <dbl> <chr>   <chr>              <chr>     <chr>     <chr> <date>     <wk_>
 1       0 AR      Argentina          ARG       Argentina AR    2021-01-01 <POL…
 2       1 AS      American Samoa     ASM       American… AS    2021-01-01 <POL…
 3       2 AT      Austria            AUT       Österrei… AT    2021-01-01 <POL…
 4       3 AQ      Antarctica         ATA       Antarcti… AQ    2021-01-01 <POL…
 5       4 AD      Andorra            AND       Andorra   AD    2021-01-01 <POL…
 6       5 AE      United Arab Emira… ARE       ????????… AE    2021-01-01 <POL…
 7       6 AF      Afghanistan        AFG       ????????… AF    2021-01-01 <POL…
 8       7 AG      Antigua and Barbu… ATG       Antigua … AG    2021-01-01 <POL…
 9       8 AI      Anguilla           AIA       Anguilla  AI    2021-01-01 <POL…
10       9 AL      Albania            ALB       Shqipëria AL    2021-01-01 <POL…
# ℹ more rows

The function ddbs_as_points() works similarly to st_as_sf() when creating points from a data frame, which is its main use case. Let’s see it with an example:

## Create sample data
cities_df <- data.frame(
  city = c("Buenos Aires", "Córdoba", "Rosario"),
  lon = c(-58.3816, -64.1811, -60.6393),
  lat = c(-34.6037, -31.4201, -32.9468),
  population = c(3075000, 1391000, 1193605)
)

## Convert to duckspatial_df
cities_ddbs <- ddbs_as_points(cities_df)

## If the columns were named differently, we would need to specify them explicitly,
## and the same applies to the CRS
cities_ddbs <- ddbs_as_points(
  cities_df, 
  coords = c("lon", "lat"),
  crs = "EPSG:4326"
)

cities_ddbs
# A duckspatial lazy spatial table
# ● CRS: EPSG:4326 
# ● Geometry column: geometry 
# ● Geometry type: POINT 
# ● Bounding box: xmin: -64.181 ymin: -34.604 xmax: -58.382 ymax: -31.42 
# Data backed by DuckDB (dbplyr lazy evaluation)
# Use ddbs_collect() or st_as_sf() to materialize to sf
#
# Source:   table<temp_view_50c17e0f_601b_4c09_a7c5_8c656b23f816> [?? x 5]
# Database: DuckDB 1.5.1 [Cidre@Windows 10 x64:R 4.5.2/:memory:]
  city           lon   lat population geometry                   
  <chr>        <dbl> <dbl>      <dbl> <wk_wkb>                   
1 Buenos Aires -58.4 -34.6    3075000 <POINT (-58.3816 -34.6037)>
2 Córdoba      -64.2 -31.4    1391000 <POINT (-64.1811 -31.4201)>
3 Rosario      -60.6 -32.9    1193605 <POINT (-60.6393 -32.9468)>

3 Operating modes

This version brings two operating modes for the geospatial operations:

  • mode = 'duckspatial': the output is typically a duckspatial_df, so the workflow stays within DuckDB.

  • mode = 'sf': the output is the same as, or similar to, what the sf package would return for the equivalent function. For example, ddbs_centroid() returns an sf object, while ddbs_area() returns a units vector.

We can check the current mode with ddbs_sitrep().
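
As a minimal sketch, the call looks like this. It assumes ddbs_sitrep() can be called without arguments, as the text suggests; the exact contents of the report are not reproduced here.

```r
library(duckspatial)

## Report the current duckspatial setup, including the active mode
## (assumption: no arguments are required)
ddbs_sitrep()
```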

Let’s illustrate the two modes with a couple of examples.

The default mode is duckspatial, so we don’t need to set the mode argument:

centroids_ddbs <- ddbs_centroid(countries_ddbs)
class(centroids_ddbs)
[1] "duckspatial_df"        "tbl_duckdb_connection" "tbl_dbi"              
[4] "tbl_sql"               "tbl_lazy"              "tbl"                  

If we use mode = "sf", the output is an sf object:

centroids_sf <- ddbs_centroid(countries_ddbs, mode = "sf")
class(centroids_sf)
[1] "sf"         "data.frame"

In duckspatial mode, ddbs_area() adds a new area column to the data:

area_ddbs <- ddbs_area(countries_ddbs)
area_ddbs
# A duckspatial lazy spatial table
# ● CRS: EPSG:4326 
# ● Geometry column: geometry 
# ● Geometry type: POLYGON 
# ● Bounding box: xmin: -178.91 ymin: -89.9 xmax: 180 ymax: 83.652 
# Data backed by DuckDB (dbplyr lazy evaluation)
# Use ddbs_collect() or st_as_sf() to materialize to sf
#
# Source:   table<temp_view_797d224c_0b1f_4c1f_89ff_1ab356847e6d> [?? x 8]
# Database: DuckDB 1.5.1 [Cidre@Windows 10 x64:R 4.5.2/:memory:]
   CNTR_ID NAME_ENGL       ISO3_CODE CNTR_NAME FID   date          area geometry
   <chr>   <chr>           <chr>     <chr>     <chr> <date>       <dbl> <wk_wkb>
 1 AR      Argentina       ARG       Argentina AR    2021-01-01 2.76e12 <POLYGO…
 2 AS      American Samoa  ASM       American… AS    2021-01-01 1.24e 8 <POLYGO…
 3 AT      Austria         AUT       Österrei… AT    2021-01-01 8.39e10 <POLYGO…
 4 AQ      Antarctica      ATA       Antarcti… AQ    2021-01-01 1.24e13 <POLYGO…
 5 AD      Andorra         AND       Andorra   AD    2021-01-01 3.83e 8 <POLYGO…
 6 AE      United Arab Em… ARE       ????????… AE    2021-01-01 7.08e10 <POLYGO…
 7 AF      Afghanistan     AFG       ????????… AF    2021-01-01 6.42e11 <POLYGO…
 8 AG      Antigua and Ba… ATG       Antigua … AG    2021-01-01 2.59e 8 <POLYGO…
 9 AI      Anguilla        AIA       Anguilla  AI    2021-01-01 9.45e 7 <POLYGO…
10 AL      Albania         ALB       Shqipëria AL    2021-01-01 2.87e10 <POLYGO…
# ℹ more rows

In sf mode, the output is a units vector, which is what the sf package itself would return:

area_sf <- ddbs_area(countries_ddbs, mode = "sf")
area_sf[1:5]
Units: [m^2]
[1] 2.759463e+12 1.244150e+08 8.385156e+10 1.238303e+13 3.834823e+08

We can also change the default mode to sf as follows:

ddbs_options(mode = "sf")

Now, if we apply a geospatial operation without the mode argument, the output matches what the sf package would return:

area_sf <- ddbs_area(countries_ddbs)
area_sf[1:5]
Units: [m^2]
[1] 2.759463e+12 1.244150e+08 8.385156e+10 1.238303e+13 3.834823e+08

Let’s switch back to the duckspatial mode:

ddbs_options(mode = "duckspatial")

4 Data materialization

In duckspatial mode, no data is loaded into R’s memory, and everything stays within the DuckDB engine. Once we have finished our operations in DuckDB and want to materialize the data in R, we can use ddbs_collect(), collect() (which mirrors ddbs_collect()), or st_as_sf(). ddbs_collect() has an as argument that lets us materialize to four different output types:

By default, the output is an sf object.

ddbs_collect(centroids_ddbs)
Simple feature collection with 257 features and 6 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: -178.1035 ymin: -77.08226 xmax: 177.9747 ymax: 78.50949
Geodetic CRS:  WGS 84
# A tibble: 257 × 7
   CNTR_ID NAME_ENGL            ISO3_CODE CNTR_NAME             FID   date      
 * <chr>   <chr>                <chr>     <chr>                 <chr> <date>    
 1 AR      Argentina            ARG       Argentina             AR    2021-01-01
 2 AS      American Samoa       ASM       American Samoa-S?moa… AS    2021-01-01
 3 AT      Austria              AUT       Österreich            AT    2021-01-01
 4 AQ      Antarctica           ATA       Antarctica            AQ    2021-01-01
 5 AD      Andorra              AND       Andorra               AD    2021-01-01
 6 AE      United Arab Emirates ARE       ???????? ??????? ???… AE    2021-01-01
 7 AF      Afghanistan          AFG       ?????????-?????????   AF    2021-01-01
 8 AG      Antigua and Barbuda  ATG       Antigua and Barbuda   AG    2021-01-01
 9 AI      Anguilla             AIA       Anguilla              AI    2021-01-01
10 AL      Albania              ALB       Shqipëria             AL    2021-01-01
# ℹ 247 more rows
# ℹ 1 more variable: geometry <POINT [°]>

With as = "tibble", the output drops the spatial component.

ddbs_collect(centroids_ddbs, as = "tibble")
# A tibble: 257 × 6
   CNTR_ID NAME_ENGL            ISO3_CODE CNTR_NAME             FID   date      
   <chr>   <chr>                <chr>     <chr>                 <chr> <date>    
 1 AR      Argentina            ARG       Argentina             AR    2021-01-01
 2 AS      American Samoa       ASM       American Samoa-S?moa… AS    2021-01-01
 3 AT      Austria              AUT       Österreich            AT    2021-01-01
 4 AQ      Antarctica           ATA       Antarctica            AQ    2021-01-01
 5 AD      Andorra              AND       Andorra               AD    2021-01-01
 6 AE      United Arab Emirates ARE       ???????? ??????? ???… AE    2021-01-01
 7 AF      Afghanistan          AFG       ?????????-?????????   AF    2021-01-01
 8 AG      Antigua and Barbuda  ATG       Antigua and Barbuda   AG    2021-01-01
 9 AI      Anguilla             AIA       Anguilla              AI    2021-01-01
10 AL      Albania              ALB       Shqipëria             AL    2021-01-01
# ℹ 247 more rows

With as = "raw", the geometry column is kept as a list of raw WKB vectors.

ddbs_collect(centroids_ddbs, as = "raw")
# A tibble: 257 × 7
   CNTR_ID NAME_ENGL            ISO3_CODE CNTR_NAME    FID   date       geometry
   <chr>   <chr>                <chr>     <chr>        <chr> <date>     <list>  
 1 AR      Argentina            ARG       Argentina    AR    2021-01-01 <raw>   
 2 AS      American Samoa       ASM       American Sa… AS    2021-01-01 <raw>   
 3 AT      Austria              AUT       Österreich   AT    2021-01-01 <raw>   
 4 AQ      Antarctica           ATA       Antarctica   AQ    2021-01-01 <raw>   
 5 AD      Andorra              AND       Andorra      AD    2021-01-01 <raw>   
 6 AE      United Arab Emirates ARE       ???????? ??… AE    2021-01-01 <raw>   
 7 AF      Afghanistan          AFG       ?????????-?… AF    2021-01-01 <raw>   
 8 AG      Antigua and Barbuda  ATG       Antigua and… AG    2021-01-01 <raw>   
 9 AI      Anguilla             AIA       Anguilla     AI    2021-01-01 <raw>   
10 AL      Albania              ALB       Shqipëria    AL    2021-01-01 <raw>   
# ℹ 247 more rows

With as = "geoarrow", the geometry is converted to a geoarrow vector.

ddbs_collect(centroids_ddbs, as = "geoarrow")
# A tibble: 257 × 7
   CNTR_ID NAME_ENGL            ISO3_CODE CNTR_NAME    FID   date       geometry
   <chr>   <chr>                <chr>     <chr>        <chr> <date>     <grrw_v>
 1 AR      Argentina            ARG       Argentina    AR    2021-01-01 <POINT …
 2 AS      American Samoa       ASM       American Sa… AS    2021-01-01 <POINT …
 3 AT      Austria              AUT       Österreich   AT    2021-01-01 <POINT …
 4 AQ      Antarctica           ATA       Antarctica   AQ    2021-01-01 <POINT …
 5 AD      Andorra              AND       Andorra      AD    2021-01-01 <POINT …
 6 AE      United Arab Emirates ARE       ???????? ??… AE    2021-01-01 <POINT …
 7 AF      Afghanistan          AFG       ?????????-?… AF    2021-01-01 <POINT …
 8 AG      Antigua and Barbuda  ATG       Antigua and… AG    2021-01-01 <POINT …
 9 AI      Anguilla             AIA       Anguilla     AI    2021-01-01 <POINT …
10 AL      Albania              ALB       Shqipëria    AL    2021-01-01 <POINT …
# ℹ 247 more rows
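
The examples in this section all use ddbs_collect(). The other two routes mentioned at the start of the section, collect() and st_as_sf(), can be sketched as follows. This assumes that {dplyr} and {sf} are attached so that their generics are available, and it rebuilds centroids_ddbs so the snippet runs on its own.

```r
library(duckspatial)
library(dplyr)  # provides the collect() generic
library(sf)     # provides the st_as_sf() generic

## Rebuild the lazy centroids table from the example file
countries_path <- system.file("spatial/countries.geojson", package = "duckspatial")
centroids_ddbs <- ddbs_centroid(ddbs_open_dataset(countries_path))

## collect() mirrors ddbs_collect()
centroids_collected <- collect(centroids_ddbs)

## st_as_sf() materializes the lazy table to an sf object
centroids_sf <- st_as_sf(centroids_ddbs)
class(centroids_sf)
```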

5 Performance Improvements

In duckspatial mode, the output of a ddbs_*() function is kept within DuckDB, so chaining further geospatial operations does not materialize the data until it is explicitly collected.
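
A small sketch of such a chain, using only functions shown earlier in this post. It assumes the default duckspatial mode is active and that the output of ddbs_area() can be passed straight into ddbs_centroid(); the object names are illustrative.

```r
library(duckspatial)

## Open the example file lazily
countries_path <- system.file("spatial/countries.geojson", package = "duckspatial")
countries_ddbs <- ddbs_open_dataset(countries_path)

## Two chained operations: both results stay in DuckDB as duckspatial_df
area_centroids_ddbs <- countries_ddbs |>
  ddbs_area() |>
  ddbs_centroid()

class(area_centroids_ddbs)

## Only this call brings the result into R's memory
area_centroids_sf <- ddbs_collect(area_centroids_ddbs)
```

Keeping the intermediate result as a duckspatial_df means that only the final table has to be transferred to R.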

6 crs_column and crs removed

When duckspatial was created, there was no way of storing the CRS metadata of spatial data in DuckDB. We worked around this by internally creating a column called crs_duckspatial that stored the CRS of the data. However, this was neither efficient nor interoperable with other tools. This changed with DuckDB version 1.5.0: the CRS is now stored as metadata with the geometry column. Let’s explore the structure of the “countries” table we created earlier:

countries_desc <- DBI::dbGetQuery(conn, "DESCRIBE countries;")
countries_desc
  column_name           column_type null  key default extra
1     OGC_FID                BIGINT  YES <NA>    <NA>  <NA>
2     CNTR_ID               VARCHAR  YES <NA>    <NA>  <NA>
3   NAME_ENGL               VARCHAR  YES <NA>    <NA>  <NA>
4   ISO3_CODE               VARCHAR  YES <NA>    <NA>  <NA>
5   CNTR_NAME               VARCHAR  YES <NA>    <NA>  <NA>
6         FID               VARCHAR  YES <NA>    <NA>  <NA>
7        date                  DATE  YES <NA>    <NA>  <NA>
8        geom GEOMETRY('EPSG:4326')  YES <NA>    <NA>  <NA>

As you can see, the CRS is stored with the column type. This allowed us to remove the crs_column and crs arguments from every geospatial operation and manage the CRS naturally.
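
To illustrate, the CRS can be read back from the column type with plain string handling. This is only a sketch: it assumes DuckDB 1.5.0 or later, that the geometry column is called geom as in the table above, and that the type is printed as GEOMETRY('AUTH:CODE'). The regular expression is an illustration, not part of the duckspatial API.

```r
library(duckspatial)
library(DBI)

## Recreate the "countries" table in a fresh in-memory connection
conn <- ddbs_create_conn()
countries_path <- system.file("spatial/countries.geojson", package = "duckspatial")
ddbs_write_table(conn, countries_path, "countries")

## Look up the declared type of the geometry column
countries_desc <- dbGetQuery(conn, "DESCRIBE countries;")
geom_type <- countries_desc$column_type[countries_desc$column_name == "geom"]

## Extract the CRS code from a type such as GEOMETRY('EPSG:4326')
crs_code <- sub("^GEOMETRY\\('(.*)'\\)$", "\\1", geom_type)
crs_code
```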

7 New Features

This release brings about 20 new geospatial functions, another set of functions to work with duckspatial_df, as well as dplyr methods for this class. You can explore all the new functions in the news of the package.

8 Acknowledgments

This release wouldn’t be possible without the incredible work of many people.

First and foremost, my deepest thanks to the DuckDB team for building such a powerful and elegant analytical engine. Their commitment to performance, correctness, and developer experience has created a foundation that makes tools like {duckspatial} possible. This includes the developers and maintainers of the duckdb R package, which makes it possible to use this awesome tool from R and to create packages such as {duckspatial}.

Special recognition goes to the developers of the DuckDB Spatial Extension, the engine that powers everything in {duckspatial}. Their invaluable work on spatial operations, format support, and performance optimization is what makes this package capable of handling large-scale spatial analysis efficiently.

Most importantly, this release represents a true collaborative effort. Rafael Pereira and Egor Kotov have been fundamental in shaping duckspatial’s new design and implementation. Their ideas, code contributions, and thoughtful feedback have shaped this package far beyond what I could have achieved alone. This is a shared accomplishment.

Finally, thank you to the broader R spatial community for building the ecosystem that {duckspatial} integrates with, and to everyone who has tested early versions, reported issues, or provided feedback.

9 References

Cidre González, Adrián, Rafael H. M. Pereira, and Egor Kotov. 2026. “Duckspatial: R Interface to ‘DuckDB’ Database with Spatial Extension.” https://github.com/Cidree/duckspatial.
Pebesma, Edzer, and Roger Bivand. 2023. Spatial Data Science: With Applications in R. https://doi.org/10.1201/9780429459016.