Description
Is your feature request related to a problem or challenge?
The project I'm working on uses a parquet-files-on-S3 style approach for its storage. To tell DataFusion where all those files are located, I'm implementing a catalog that's similar in spirit to DuckLake. What I bumped into is that there is currently no support for attaching databases using ATTACH ...; CREATE EXTERNAL TABLE ... is the closest thing. Has the idea of supporting this been explored already? I couldn't find anything in the issue tracker. It seems like it might be useful for Iceberg, DeltaLake, DuckLake, etc.
I found the DuckDB attach and SQLite attach variants in the sqlparser in the meantime. What I had in mind was something a bit more flexible/general purpose, closer to CREATE EXTERNAL CATALOG ... STORED AS ... LOCATION ... OPTIONS ... with an extension mechanism.
It turns out there is an example of how to extend the SQL parser (and it is the mechanism that datafusion-cli uses):
https://github.com/apache/datafusion/blob/main/datafusion-examples/examples/sql_dialect.rs
However, it would be nice to make this example more visible and easier to find and understand.
Describe the solution you'd like
Write a blog post with a title something like "Implementing your own SQL dialect and SQL statements with DataFusion"
Describe alternatives you've considered
I suggest following the "low key technical evangelism" recipe: teach the readers something about how databases work in general (using DataFusion as an example) and then show how to use DataFusion to do something cool.
An outline might be:
Introduce the idea at a high level:
- You want to add some feature to SQL (maybe use the motivating example from @pepijnve above)
- Background: How SQL is normally planned (sql text --> sqlparser AST --> LogicalPlan)
- Explain how to insert your own parser / point to the examples
- Show relevant excerpts from the example
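To make the "insert your own parser" step of the outline concrete, here is a rough, dependency-free sketch of the wrap-and-delegate idea: look at the leading keywords, handle the custom statement yourself, and hand everything else to the stock parser. It deliberately uses no DataFusion or sqlparser APIs. The `Statement` enum, `parse_statement`, and the exact CREATE EXTERNAL CATALOG grammar (without OPTIONS) are all hypothetical names and assumptions made for illustration, not what the linked example or DataFusion actually defines.

```rust
/// Statements the extended dialect understands (hypothetical type for this sketch).
#[derive(Debug, PartialEq)]
enum Statement {
    /// The custom statement proposed above; OPTIONS is left out for brevity.
    CreateExternalCatalog {
        name: String,
        stored_as: String,
        location: String,
    },
    /// Anything else: kept as raw SQL for the stock parser to handle.
    Standard(String),
}

/// True if the statement starts with CREATE EXTERNAL CATALOG (case-insensitive).
fn is_create_external_catalog(sql: &str) -> bool {
    let mut words = sql.split_whitespace();
    ["CREATE", "EXTERNAL", "CATALOG"]
        .iter()
        .all(|kw| words.next().map_or(false, |w| w.eq_ignore_ascii_case(kw)))
}

/// Split on whitespace, keeping single-quoted strings as one token (quotes removed).
fn tokenize(sql: &str) -> Result<Vec<String>, String> {
    let mut tokens = Vec::new();
    let mut chars = sql.trim().trim_end_matches(';').chars().peekable();
    while let Some(&c) = chars.peek() {
        if c.is_whitespace() {
            chars.next();
        } else if c == '\'' {
            chars.next(); // consume the opening quote
            let mut literal = String::new();
            loop {
                match chars.next() {
                    Some('\'') => break,
                    Some(ch) => literal.push(ch),
                    None => return Err("unterminated string literal".to_string()),
                }
            }
            tokens.push(literal);
        } else {
            let mut word = String::new();
            while let Some(&ch) = chars.peek() {
                if ch.is_whitespace() || ch == '\'' {
                    break;
                }
                word.push(ch);
                chars.next();
            }
            tokens.push(word);
        }
    }
    Ok(tokens)
}

/// Parse the custom statement ourselves; delegate everything else.
fn parse_statement(sql: &str) -> Result<Statement, String> {
    if !is_create_external_catalog(sql) {
        // In a real integration this is where the wrapped parser would be called.
        return Ok(Statement::Standard(sql.to_string()));
    }
    // Expected shape: CREATE EXTERNAL CATALOG <name> STORED AS <format> LOCATION <location>
    let tokens = tokenize(sql)?;
    let expect = |idx: usize, kw: &str| -> Result<(), String> {
        match tokens.get(idx) {
            Some(t) if t.eq_ignore_ascii_case(kw) => Ok(()),
            other => Err(format!("expected {} at token {}, found {:?}", kw, idx, other)),
        }
    };
    let value = |idx: usize, what: &str| -> Result<String, String> {
        tokens.get(idx).cloned().ok_or_else(|| format!("missing {}", what))
    };
    let name = value(3, "catalog name")?;
    expect(4, "STORED")?;
    expect(5, "AS")?;
    let stored_as = value(6, "format")?;
    expect(7, "LOCATION")?;
    let location = value(8, "location")?;
    if tokens.len() > 9 {
        return Err(format!("unexpected trailing token {:?}", tokens[9]));
    }
    Ok(Statement::CreateExternalCatalog { name, stored_as, location })
}
```

With this sketch, a statement such as CREATE EXTERNAL CATALOG lake STORED AS ducklake LOCATION 's3://bucket/meta' (a made-up location) parses into the custom variant, while SELECT 1 comes back as `Statement::Standard` untouched. The blog post would replace the toy tokenizer and the `Standard` fallback with the real parser wrapping shown in the linked example.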
Additional context
No response