@rolandwalker
Contributor

Description

Make extract-tables work for multi-statement inputs, using an unholy combination of sqlparse and sqlglot.

sqlparse is more lenient: it can split the input into separate statements even though it doesn't understand \T. sqlglot is stricter and chokes when it sees \T, but it is better at the simple task of extracting table names from a valid statement, and it provides a much better interface.

We could also use sqlglot.tokenize() and split on the semicolon token, but that would be more verbose.

Closes #1122, in which the table name was always reported as DUAL when an input containing multiple statements began with \T sql-insert.

Checklist

  • I've added this contribution to the changelog.md.
  • I've added my name to the AUTHORS file (or it's already there).

@rolandwalker rolandwalker requested a review from amjith April 30, 2025 17:40
@rolandwalker rolandwalker self-assigned this Apr 30, 2025
@rolandwalker
Contributor Author

Appears to close #1066 as well.

@rolandwalker rolandwalker force-pushed the RW/extract-tables-from-multiple-statements branch from 129a8b2 to 1acbfdb April 30, 2025 18:25
Member

@amjith amjith left a comment

Wow. That is an interesting approach.

@rolandwalker
Contributor Author

I think a tree-sitter grammar could do even better at covering all the cases with one tool. In my experience, tree-sitter is especially good at best-effort parsing of erroneous or partial input, which makes sense, since it was developed for syntax highlighting in an editor.

@rolandwalker rolandwalker merged commit 6c650e7 into main May 1, 2025
6 checks passed
@rolandwalker rolandwalker deleted the RW/extract-tables-from-multiple-statements branch May 1, 2025 12:44
@rolandwalker rolandwalker mentioned this pull request May 3, 2025

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

why table name will be DUAL using sql-insert tableformat

3 participants