Commit 47752e0
ARROW-11268: [Rust][DataFusion] MemTable::load output partition support
I think the feature to be able to repartition an in memory table is useful, as the repartitioning only needs to be applied once, and repartition itself is cheap (at the same node). Doing this when loading data is very useful for in-memory analytics as we can benefit from mutliple cores after loading the data.
The speed up from repartitioning is very big (mainly on aggregates), on my (8-core machine): ~5-7x on query 1 and 12 versus a single partition, and a smaller (~30%) difference for query 5 when using 16 partition. q1/q12 also have very high cpu utilization.
@jorgecarleitao maybe this is of interest to you, as you mentioned you are looking into multi-threading. I think this would be a "high level" way to get more parallelism, also in the logical plan. I think in some optimizer rules and/or dynamically we can do repartitions, similar to what's described here https://issues.apache.org/jira/browse/ARROW-9464
Benchmarks after repartitioning (16 partitions):
PR (16 partitions)
```
Query 12 iteration 0 took 33.9 ms
Query 12 iteration 1 took 34.3 ms
Query 12 iteration 2 took 36.9 ms
Query 12 iteration 3 took 33.6 ms
Query 12 iteration 4 took 35.1 ms
Query 12 iteration 5 took 38.8 ms
Query 12 iteration 6 took 35.8 ms
Query 12 iteration 7 took 34.4 ms
Query 12 iteration 8 took 34.2 ms
Query 12 iteration 9 took 35.3 ms
Query 12 avg time: 35.24 ms
```
Master (1 partition):
```
Query 12 iteration 0 took 245.6 ms
Query 12 iteration 1 took 246.4 ms
Query 12 iteration 2 took 246.1 ms
Query 12 iteration 3 took 247.9 ms
Query 12 iteration 4 took 246.5 ms
Query 12 iteration 5 took 248.2 ms
Query 12 iteration 6 took 247.8 ms
Query 12 iteration 7 took 246.4 ms
Query 12 iteration 8 took 246.6 ms
Query 12 iteration 9 took 246.5 ms
Query 12 avg time: 246.79 ms
```
PR (16 partitions):
```
Query 1 iteration 0 took 138.6 ms
Query 1 iteration 1 took 142.2 ms
Query 1 iteration 2 took 125.8 ms
Query 1 iteration 3 took 102.4 ms
Query 1 iteration 4 took 105.9 ms
Query 1 iteration 5 took 107.0 ms
Query 1 iteration 6 took 109.3 ms
Query 1 iteration 7 took 109.9 ms
Query 1 iteration 8 took 108.8 ms
Query 1 iteration 9 took 112.0 ms
Query 1 avg time: 116.19 ms
```
Master (1 partition):
```
Query 1 iteration 0 took 640.6 ms
Query 1 iteration 1 took 640.0 ms
Query 1 iteration 2 took 632.9 ms
Query 1 iteration 3 took 634.6 ms
Query 1 iteration 4 took 630.7 ms
Query 1 iteration 5 took 630.7 ms
Query 1 iteration 6 took 631.9 ms
Query 1 iteration 7 took 635.5 ms
Query 1 iteration 8 took 639.0 ms
Query 1 iteration 9 took 638.3 ms
Query 1 avg time: 635.43 ms
```
PR (16 partitions)
```
Query 5 iteration 0 took 465.8 ms
Query 5 iteration 1 took 428.0 ms
Query 5 iteration 2 took 435.0 ms
Query 5 iteration 3 took 407.3 ms
Query 5 iteration 4 took 435.7 ms
Query 5 iteration 5 took 437.4 ms
Query 5 iteration 6 took 411.2 ms
Query 5 iteration 7 took 432.0 ms
Query 5 iteration 8 took 436.8 ms
Query 5 iteration 9 took 435.6 ms
Query 5 avg time: 432.47 ms
```
Master (1 partition)
```
Query 5 iteration 0 took 660.6 ms
Query 5 iteration 1 took 634.4 ms
Query 5 iteration 2 took 626.4 ms
Query 5 iteration 3 took 628.0 ms
Query 5 iteration 4 took 635.3 ms
Query 5 iteration 5 took 631.1 ms
Query 5 iteration 6 took 631.3 ms
Query 5 iteration 7 took 639.4 ms
Query 5 iteration 8 took 634.3 ms
Query 5 iteration 9 took 639.0 ms
Query 5 avg time: 635.97 ms
```
Closes #9214 from Dandandan/mem_table_repartition
Lead-authored-by: Heres, Daniel <[email protected]>
Co-authored-by: Daniël Heres <[email protected]>
Signed-off-by: Jorge C. Leitao <[email protected]>1 parent 2de39d1 commit 47752e0
File tree
3 files changed
+49
-5
lines changed- rust
- benchmarks/src/bin
- datafusion
- benches
- src/datasource
3 files changed
+49
-5
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
70 | 70 | | |
71 | 71 | | |
72 | 72 | | |
| 73 | + | |
| 74 | + | |
| 75 | + | |
| 76 | + | |
73 | 77 | | |
74 | 78 | | |
75 | 79 | | |
| |||
138 | 142 | | |
139 | 143 | | |
140 | 144 | | |
141 | | - | |
142 | | - | |
| 145 | + | |
| 146 | + | |
| 147 | + | |
| 148 | + | |
| 149 | + | |
| 150 | + | |
143 | 151 | | |
144 | 152 | | |
145 | 153 | | |
| |||
1593 | 1601 | | |
1594 | 1602 | | |
1595 | 1603 | | |
| 1604 | + | |
1596 | 1605 | | |
1597 | 1606 | | |
1598 | 1607 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
70 | 70 | | |
71 | 71 | | |
72 | 72 | | |
| 73 | + | |
| 74 | + | |
| 75 | + | |
73 | 76 | | |
74 | | - | |
| 77 | + | |
| 78 | + | |
| 79 | + | |
75 | 80 | | |
76 | 81 | | |
77 | 82 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
19 | 19 | | |
20 | 20 | | |
21 | 21 | | |
| 22 | + | |
22 | 23 | | |
23 | 24 | | |
24 | 25 | | |
25 | 26 | | |
26 | 27 | | |
27 | 28 | | |
28 | 29 | | |
29 | | - | |
30 | 30 | | |
31 | 31 | | |
32 | 32 | | |
33 | 33 | | |
34 | 34 | | |
35 | 35 | | |
| 36 | + | |
| 37 | + | |
| 38 | + | |
| 39 | + | |
36 | 40 | | |
37 | 41 | | |
38 | 42 | | |
| |||
102 | 106 | | |
103 | 107 | | |
104 | 108 | | |
105 | | - | |
| 109 | + | |
| 110 | + | |
| 111 | + | |
| 112 | + | |
| 113 | + | |
106 | 114 | | |
107 | 115 | | |
108 | 116 | | |
| |||
126 | 134 | | |
127 | 135 | | |
128 | 136 | | |
| 137 | + | |
| 138 | + | |
| 139 | + | |
| 140 | + | |
| 141 | + | |
| 142 | + | |
| 143 | + | |
| 144 | + | |
| 145 | + | |
| 146 | + | |
| 147 | + | |
| 148 | + | |
| 149 | + | |
| 150 | + | |
| 151 | + | |
| 152 | + | |
| 153 | + | |
| 154 | + | |
| 155 | + | |
| 156 | + | |
| 157 | + | |
| 158 | + | |
129 | 159 | | |
130 | 160 | | |
131 | 161 | | |
| |||
0 commit comments