
Bad performance when reading a large array of strings #24

@benhamad
keys(LazyJSON.value(json_file))

The call above is asymptotically problematic when json_file contains a large array of strings: to enumerate the keys lazily, the parser presumably has to scan past every string in the array before it can find the next key. I ran this code:

using JSON, LazyJSON

for i in 1:6
    items = i * 10_000_000
    json_file = open("/tmp/json", "w")
    write(json_file, JSON.json(Dict("a" => "a", "b" => repeat(["test"], items))))
    close(json_file)
    json_file = open("/tmp/json")
    t = @elapsed collect(keys(LazyJSON.value(json_file)))
    println("$items $t")
end

As you can see from the results, the scaling is far from linear (the first column is the number of items in the array, the second is the elapsed time in seconds):

10000000 75.786150509
20000000 317.985342906
30000000 724.489721802
40000000 1305.421886045
50000000 2040.987945434
60000000 2977.542937743

Compared to JSON.parse, which gives:

10000000 8.384795834
20000000 18.123253007
30000000 27.854969659
40000000 38.360378806
50000000 51.391322248
60000000 73.577127605
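To make the scaling concrete, here is a small check (a sketch using the rounded timings reported above, not part of the original report): if parsing were linear in the number of items, the time per item would stay roughly constant, but for LazyJSON it grows roughly in proportion to the item count, i.e. total time is roughly quadratic.

items = [1, 2, 3, 4, 5, 6] .* 10_000_000
lazy  = [75.8, 318.0, 724.5, 1305.4, 2041.0, 2977.5]   # LazyJSON timings (s), rounded
base  = [8.4, 18.1, 27.9, 38.4, 51.4, 73.6]            # JSON.parse timings (s), rounded

for (n, tl, tb) in zip(items, lazy, base)
    println("$n items: LazyJSON $(round(1e6 * tl / n, digits = 2)) µs/item, ",
            "JSON.parse $(round(1e6 * tb / n, digits = 2)) µs/item")
end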

We did some profiling, and it seems that most of the time is spent in scan_string:

LazyJSON.jl/src/LazyJSON.jl, lines 478 to 496 at 53c63f0:

function scan_string(s, i)
    i, c = next_ic(s, i)
    has_escape = false
    while c != '"'
        if isnull(c) || c == IOStrings.ASCII_ETB
            throw(JSON.ParseError(s, i, c, "input incomplete"))
        end
        escape = c == '\\'
        i, c = next_ic(s, i)
        if escape && !(isnull(c) || c == IOStrings.ASCII_ETB)
            has_escape = true
            i, c = next_ic(s, i)
        end
    end
    return i, has_escape
end

(Profiling screenshot: Screen Shot 2021-02-23 at 23 09 31)
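One way to narrow this down (a hedged sketch, not from the issue; it assumes the /tmp/json file produced by the script above and that LazyJSON.value accepts both an IO and a String) is to compare lazy key enumeration from the open file against the same file read into a String first. If the String path scales linearly while the IO path does not, the superlinear cost is more likely in the incremental buffering that scan_string iterates over than in the per-character scanning itself.

using LazyJSON

# Sketch: compare lazy key enumeration from an IO stream vs. from an
# in-memory String. Assumes /tmp/json was produced by the script above.
path = "/tmp/json"

t_io  = @elapsed collect(keys(LazyJSON.value(open(path))))
t_str = @elapsed collect(keys(LazyJSON.value(read(path, String))))

println("from IO:     $t_io s")
println("from String: $t_str s")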
