[BLOG] Blog post about writing your own SQL dialect / extending SQL with DataFusion #16756

@alamb

Is your feature request related to a problem or challenge?

#14836

@pepijnve asked in discord:

The project I'm working on uses a parquet-files-on-s3 style approach for its storage. In order to tell DataFusion where all those files are located I'm implementing a catalog that's similar in spirit to duck lake. What I bumped into now is that at the moment there's no support for attaching databases using ATTACH .... CREATE EXTERNAL TABLE... is the closest thing. Has the idea of supporting this been explored already? I couldn't find anything in the issue tracker. Seems like it might be useful for Iceberg, DeltaLake, DuckLake, etc.

I found the duckdb attach and sqlite attach variants in the sqlparser in the meantime. What I had in mind was something a bit more flexible/general purpose. Something closer to CREATE EXTERNAL CATALOG ... STORED AS ... LOCATION ... OPTIONS ... with an extension mechanism.

It turns out there is an example of how to extend the SQL parser (and it is the mechanism that datafusion-cli uses):

https://github.com/apache/datafusion/blob/main/datafusion-examples/examples/sql_dialect.rs

However, it would be nice to make this example more visible and easier to find and understand.

Describe the solution you'd like

Write a blog post with a title something like "Implementing your own SQL dialect and SQL statements with DataFusion".

Describe alternatives you've considered

I suggest following the "low key technical evangelism" recipe: teach the readers something about how databases work in general (using DataFusion as an example), and then show how to use DataFusion to do something cool.

An outline might be

Introduce the idea at a high level:

  1. You want to add some feature to SQL (maybe use the motivating example from @pepijnve above)
  2. Background: How SQL is normally planned (sql text --> sqlparser AST --> LogicalPlan)
  3. Explain how to insert your own parser / point to the examples
  4. Show relevant excerpts from the example
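
To make step 3 concrete, here is a minimal sketch of the "wrap the parser" idea in plain Rust, using only the standard library. It does not use the DataFusion or sqlparser APIs; `ExtStatement` and `parse_ext` are hypothetical names invented for illustration, and the statement shape is an assumption loosely based on the CREATE EXTERNAL CATALOG ... STORED AS ... LOCATION ... idea quoted above (OPTIONS is left out). The point is only the control flow: recognize the custom statement first, and hand everything else to the standard parser unchanged.

```rust
/// Hypothetical AST for this sketch: either the custom statement,
/// or raw SQL that would be handed to the stock SQL parser.
#[derive(Debug, PartialEq)]
pub enum ExtStatement {
    /// CREATE EXTERNAL CATALOG <name> STORED AS <format> LOCATION '<location>'
    CreateExternalCatalog {
        name: String,
        format: String,
        location: String,
    },
    /// Anything else: left untouched for the standard parser.
    Standard(String),
}

/// Try the custom syntax first and fall back to "standard" SQL otherwise.
/// Tokenizing on whitespace is a simplification: a quoted location that
/// contains spaces is not supported by this sketch.
pub fn parse_ext(sql: &str) -> Result<ExtStatement, String> {
    let tokens: Vec<&str> = sql
        .trim()
        .trim_end_matches(';')
        .split_whitespace()
        .collect();

    // Only statements that start with CREATE EXTERNAL CATALOG are ours.
    let is_custom = tokens.len() >= 3
        && tokens[0].eq_ignore_ascii_case("CREATE")
        && tokens[1].eq_ignore_ascii_case("EXTERNAL")
        && tokens[2].eq_ignore_ascii_case("CATALOG");
    if !is_custom {
        return Ok(ExtStatement::Standard(sql.to_string()));
    }

    // Expected tokens: CREATE EXTERNAL CATALOG name STORED AS format LOCATION 'loc'
    if tokens.len() != 9
        || !tokens[4].eq_ignore_ascii_case("STORED")
        || !tokens[5].eq_ignore_ascii_case("AS")
        || !tokens[7].eq_ignore_ascii_case("LOCATION")
    {
        return Err(format!("malformed CREATE EXTERNAL CATALOG: {}", sql));
    }

    Ok(ExtStatement::CreateExternalCatalog {
        name: tokens[3].to_string(),
        format: tokens[6].to_string(),
        // Strip the surrounding single quotes from the location literal.
        location: tokens[8].trim_matches('\'').to_string(),
    })
}
```

In a real integration the `Standard` branch would presumably go through the usual sql text --> sqlparser AST --> LogicalPlan path, while the custom variant would be turned into whatever catalog registration the application needs. The linked sql_dialect.rs example is the place to look for how this is actually wired up in DataFusion.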

Additional context

No response

Metadata

Labels

enhancement (New feature or request)
