
Commit edad903

HaoXuAI and hao-xu5 committed
feat: Support table format: Iceberg, Delta, and Hudi (#5650)
* add support for table format such as Iceberg, Delta, Hudi etc. Signed-off-by: HaoXuAI <[email protected]>
* linting Signed-off-by: HaoXuAI <[email protected]>
* linting Signed-off-by: HaoXuAI <[email protected]>
* add tests Signed-off-by: HaoXuAI <[email protected]>
* fix tests Signed-off-by: HaoXuAI <[email protected]>
* fix tests Signed-off-by: HaoXuAI <[email protected]>
* linting Signed-off-by: HaoXuAI <[email protected]>
* add tableformat proto Signed-off-by: hao-xu5 <[email protected]>
* update Signed-off-by: hao-xu5 <[email protected]>
* update doc Signed-off-by: hao-xu5 <[email protected]>
* fix linting Signed-off-by: hao-xu5 <[email protected]>
* fix test Signed-off-by: hao-xu5 <[email protected]>

---------

Signed-off-by: HaoXuAI <[email protected]>
Signed-off-by: hao-xu5 <[email protected]>
Co-authored-by: hao-xu5 <[email protected]>
1 parent ce4490c commit edad903

File tree: 14 files changed, +1960 -57 lines


docs/SUMMARY.md

Lines changed: 1 addition & 0 deletions
@@ -82,6 +82,7 @@
* [Type System](reference/type-system.md)
* [Data sources](reference/data-sources/README.md)
* [Overview](reference/data-sources/overview.md)
* [Table formats](reference/data-sources/table-formats.md)
* [File](reference/data-sources/file.md)
* [Snowflake](reference/data-sources/snowflake.md)
* [BigQuery](reference/data-sources/bigquery.md)

docs/reference/data-sources/spark.md

Lines changed: 73 additions & 0 deletions
@@ -4,13 +4,17 @@

Spark data sources are tables or files that can be loaded from some Spark store (e.g. Hive or in-memory). They can also be specified by a SQL query.

**New in Feast:** SparkSource now supports advanced table formats, including **Apache Iceberg**, **Delta Lake**, and **Apache Hudi**, enabling ACID transactions, time travel, and schema evolution. See the [Table Formats guide](table-formats.md) for detailed documentation.

## Disclaimer

The Spark data source does not achieve full test coverage.
Please do not assume complete stability.

## Examples

### Basic Examples

Using a table reference from SparkSession (for example, either in-memory or a Hive Metastore):

```python
@@ -51,8 +55,77 @@ my_spark_source = SparkSource(
)
```

### Table Format Examples

SparkSource supports advanced table formats for modern data lakehouse architectures. For detailed documentation, configuration options, and best practices, see the **[Table Formats guide](table-formats.md)**.

#### Apache Iceberg

```python
from feast.infra.offline_stores.contrib.spark_offline_store.spark_source import SparkSource
from feast.table_format import IcebergFormat

iceberg_format = IcebergFormat(
    catalog="my_catalog",
    namespace="my_database"
)

my_spark_source = SparkSource(
    name="user_features",
    path="my_catalog.my_database.user_table",
    table_format=iceberg_format,
    timestamp_field="event_timestamp"
)
```

#### Delta Lake

```python
from feast.infra.offline_stores.contrib.spark_offline_store.spark_source import SparkSource
from feast.table_format import DeltaFormat

delta_format = DeltaFormat()

my_spark_source = SparkSource(
    name="transaction_features",
    path="s3://my-bucket/delta-tables/transactions",
    table_format=delta_format,
    timestamp_field="transaction_timestamp"
)
```

#### Apache Hudi

```python
from feast.infra.offline_stores.contrib.spark_offline_store.spark_source import SparkSource
from feast.table_format import HudiFormat

hudi_format = HudiFormat(
    table_type="COPY_ON_WRITE",
    record_key="user_id",
    precombine_field="updated_at"
)

my_spark_source = SparkSource(
    name="user_profiles",
    path="s3://my-bucket/hudi-tables/user_profiles",
    table_format=hudi_format,
    timestamp_field="event_timestamp"
)
```

For advanced configuration including time travel, incremental queries, and performance tuning, see the **[Table Formats guide](table-formats.md)**.
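To make the time-travel idea concrete, the sketch below is a minimal, hedged illustration rather than documented API: it assumes that `SparkSource` can also be defined by a SQL `query` (as noted at the top of this page), that `table_format` may be combined with `query`, and that the Iceberg table accepts Spark's `TIMESTAMP AS OF` time-travel clause; the table name, columns, and timestamp are placeholders.

```python
from feast.infra.offline_stores.contrib.spark_offline_store.spark_source import SparkSource
from feast.table_format import IcebergFormat  # same import as the examples above

# Hypothetical sketch: read an Iceberg table as of a fixed point in time by
# pushing the time-travel clause into a SQL query. Whether table_format is
# required alongside `query` is an assumption, not confirmed by this page.
iceberg_format = IcebergFormat(
    catalog="my_catalog",
    namespace="my_database"
)

historical_source = SparkSource(
    name="user_features_asof",
    query="""
        SELECT user_id, age, income, event_timestamp
        FROM my_catalog.my_database.user_table
        TIMESTAMP AS OF '2024-01-01 00:00:00'
    """,
    table_format=iceberg_format,
    timestamp_field="event_timestamp",
)
```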

## Configuration Options

The full set of configuration options is available [here](https://rtd.feast.dev/en/master/#feast.infra.offline_stores.contrib.spark_offline_store.spark_source.SparkSource).

### Table Format Options

- **IcebergFormat**: See [Table Formats - Iceberg](table-formats.md#apache-iceberg)
- **DeltaFormat**: See [Table Formats - Delta Lake](table-formats.md#delta-lake)
- **HudiFormat**: See [Table Formats - Hudi](table-formats.md#apache-hudi)

## Supported Types

Spark data sources support all eight primitive types and their corresponding array types.
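To show how a table-format-backed source and a typed schema fit together end to end, here is a minimal sketch; the entity, field names, dtypes, and TTL are hypothetical, and it assumes Feast's standard `Entity`/`FeatureView`/`Field` API together with the `IcebergFormat` example above.

```python
from datetime import timedelta

from feast import Entity, FeatureView, Field
from feast.types import Float32, Int64
from feast.infra.offline_stores.contrib.spark_offline_store.spark_source import SparkSource
from feast.table_format import IcebergFormat

# Reuses the Iceberg-backed source pattern from the examples above.
my_spark_source = SparkSource(
    name="user_features",
    path="my_catalog.my_database.user_table",
    table_format=IcebergFormat(catalog="my_catalog", namespace="my_database"),
    timestamp_field="event_timestamp",
)

# Hypothetical entity and typed schema; names and dtypes are illustrative only.
user = Entity(name="user", join_keys=["user_id"])

user_features_view = FeatureView(
    name="user_features_view",
    entities=[user],
    ttl=timedelta(days=1),
    schema=[
        Field(name="age", dtype=Int64),
        Field(name="income", dtype=Float32),
    ],
    source=my_spark_source,
)
```

The `schema` fields are where the eight primitive types and their array counterparts come into play.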
