Hello.
I also have a question about the implementation of ParseFileParallel.
In the multi-threaded configuration, you call ProcessBlocksImpl with a block_begin and block_end assigned to each thread.
My concern is how your code handles the case where the buffer ends with an incomplete line at the end of a thread's blocks.
For example, let's assume thread 2 has block_begin 4 and block_end 8 in ProcessBlocksImpl. Here are some hypothetical OBJ lines for this example:
# BLOCK 4 Start
v 0.0 0.0 0.0
...
# BLOCK 4 End
# BLOCK 5 Start
v 0.0 0.0 0.0
...
# BLOCK 5 End
# BLOCK 6 Start
v 0.0 0.0 0.0
...
# BLOCK 6 End
# BLOCK 7 Start
v 0.0 0.0 0.0
v 0.0 0.0 0.0
...
v 0.0 0.0
# BLOCK 7 End
# BLOCK 8 Start
0.0
v 0.0 0.0 0.0
...
# BLOCK 8 END
In this case, when processing BLOCK 7, the parser encounters an incomplete line, v 0.0 0.0, which is missing one element of the vertex. I think your code does not handle this in the multi-threaded case. In the single-threaded case, your code handles it by copying the rest of the line into back_buffer using the remainder variable, with stop_parsing_after_eol set to false.
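To make the single-threaded behavior I am describing concrete, here is a minimal sketch of carrying an unfinished line from one block into the next. It is not rapidobj's code: SplitLinesBlockwise is a hypothetical helper, the whole input is assumed to be in memory, and a std::string named carry stands in for the remainder/back_buffer mechanism.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical helper (not part of rapidobj): walks `data` in fixed-size
// blocks and returns complete lines, carrying any unfinished tail of a block
// over into the next block.
std::vector<std::string> SplitLinesBlockwise(std::string_view data, std::size_t block_size) {
    std::vector<std::string> lines;
    if (block_size == 0) return lines;  // avoid an endless loop on a zero block size

    std::string carry;  // unfinished line left over from the previous block

    for (std::size_t offset = 0; offset < data.size(); offset += block_size) {
        // Prepend the leftover tail to the current block before splitting.
        std::string text = carry;
        text.append(data.substr(offset, block_size));

        std::size_t start = 0;
        for (;;) {
            std::size_t eol = text.find('\n', start);
            if (eol == std::string::npos) break;  // no complete line left in this block
            lines.emplace_back(text, start, eol - start);
            start = eol + 1;
        }

        // Whatever follows the last newline is incomplete; keep it for the next block.
        carry = text.substr(start);
    }

    // A final line without a trailing newline is still a line.
    if (!carry.empty()) lines.push_back(carry);
    return lines;
}
```

With a block size that cuts a vertex line in half, the two halves are joined again before the line is emitted, which is the behavior I expect at every block boundary.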
I guess the problem is caused by stop_parsing_after_eol being set to true in the multi-threaded case.
for (size_t i = 0; i != tasks.size(); ++i) {
    bool is_last = i + 1 == tasks.size();
    auto begin = tasks[i];
    auto end = is_last ? num_blocks : (tasks[i + 1] + 1);
    bool stop_parsing_after_eol = !is_last;
    auto chunk = &(*chunks)[i];

    threads.emplace_back(ProcessBlocks, source, i, begin, end, stop_parsing_after_eol, chunk, context);
    threads.back().detach();
}
In the code above, you set stop_parsing_after_eol to true for all threads except the last one. As a result,
for (size_t i = block_begin; i != block_end; ++i) {
    auto remainder = size_t{};

    bool last_block = (i + 1 == block_end) || reached_eof;

    if (!last_block) {
        file_offset = (i + 1) * kBlockSize;

        if (auto ec = reader->ReadBlock(file_offset, kBlockSize, back_buffer + kMaxLineLength)) {
            chunk->error = Error{ ec };
            return;
        }

    } else if (stop_parsing_after_eol) {
        if (auto ptr = static_cast<const char*>(memchr(text.data(), '\n', kMaxLineLength))) {
            auto pos = static_cast<size_t>(ptr - text.data());
            line = text.substr(0, pos);
            if (EndsWith(line, '\r')) {
                line.remove_suffix(1);
            }
            ++chunk->text.line_count;
            if (auto rc = ProcessLine(line, chunk, context); rc != rapidobj_errc::Success) {
                chunk->error = Error{ make_error_code(rc), std::string(line), chunk->text.line_count };
            }
        } else {
            ++chunk->text.line_count;
            auto ec = make_error_code(rapidobj_errc::LineTooLongError);
            chunk->error = Error{ ec, std::string(text, 0, kMaxLineLength), chunk->text.line_count };
        }
        return;
    }
When i becomes block_end - 1 (the last i), the else if (stop_parsing_after_eol) branch processes at most one line and then exits ProcessBlocksImpl without handling the rest of the text data. Even if stop_parsing_after_eol were set to false for the other threads, more code would be needed to handle the last line of BLOCK 7, which is missing an element. I think you have to read the next block (BLOCK 8 in my example) and then process one line to get the missing element.
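To illustrate the kind of boundary handling I have in mind, here is a minimal sketch. It is not taken from rapidobj: LinesForChunk is a hypothetical function, the whole file is assumed to be in memory, and byte ranges stand in for block ranges. Each worker owns every line whose first byte falls inside its range, so it reads past its own end to finish its last line, and it skips a leading fragment that belongs to the previous worker.

```cpp
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical sketch (not rapidobj's API): returns the lines owned by the
// byte range [begin, end) of `data`. A line is owned by the range that
// contains its first byte.
std::vector<std::string_view> LinesForChunk(std::string_view data, std::size_t begin, std::size_t end) {
    std::vector<std::string_view> lines;
    if (end > data.size()) end = data.size();
    if (begin >= end) return lines;

    std::size_t start = begin;

    // If the range starts in the middle of a line, that line belongs to the
    // previous range, so skip ahead to the first line that starts here.
    if (begin > 0 && data[begin - 1] != '\n') {
        std::size_t eol = data.find('\n', begin);
        if (eol == std::string_view::npos) return lines;  // no line starts in this range
        start = eol + 1;
    }

    // Emit every line that starts before `end`. The search for '\n' is allowed
    // to run past `end`, which completes a line cut off by the boundary.
    while (start < end) {
        std::size_t eol = data.find('\n', start);
        if (eol == std::string_view::npos) eol = data.size();  // last line without newline
        lines.push_back(data.substr(start, eol - start));
        start = eol + 1;
    }
    return lines;
}
```

In this sketch, a range that ends inside v 0.0 0.0 0.0 still emits the full line, and the next range starts after that line's newline, so each line is parsed exactly once no matter where the boundaries fall.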
I may be misreading your code, even though I have been looking through it for two days, but this is how it appears to work to me.
If you have any thoughts on this, please let me know.
The two code excerpts above are from rapidobj/include/rapidobj/rapidobj.hpp in 744374a: lines 7124 to 7133 (the loop that starts the threads) and lines 6932 to 6962 (the loop over blocks).