Skip to content

Conversation

@neagle
Copy link
Contributor

@neagle neagle commented May 22, 2015

This was all caused by an emoticon: :/

SGF parsing can currently be broken by an escaped backslash preceding an ending square bracket: ']'

This updates the regular expressions used to parse sequences and properties to account for the possibility that a preceding backslash has been, itself, escaped.

- SGF parsing can currently be broken by an escaped backslash preceding
  an ending square bracket: '\\]'
@neagle neagle changed the title Fix SGF parsing regular expressions, fixes #36 Fix SGF parsing regular expressions, fix #36 May 22, 2015
neagle added 2 commits May 22, 2015 14:44
- SGF parsing can currently be broken by an escaped backslash preceding
  an ending square bracket: '\\]'
@waltheri
Copy link
Owner

Thanks for solving this issue (I wasn't at home). I will check it and merge soon.

@neagle
Copy link
Contributor Author

neagle commented May 26, 2015

No problem! It's a pretty understandable oversight since it's such an edge case to have an intended (and escaped) backslash right before a closing bracket, but it managed to break the game to an impressive degree. 😄

@xcombelle
Copy link

I have an improvement on @neagle code. It is still to be tested but here it is:

If I understand it is done to detect the between square bracket elements
shouldn't be

  \[([^\\\]]|\\.)*\]

expanded with explanation pcre with /x modifier

  \[ #an opening braquet
   ( # either
      [^\\\]] # not ( an antislash or a closing braquet )
    |  # or
    \\. # an antislash followed by any character
   )* #repeated at least 0 times
   \] # followed by a close bracket

I tested both regexp with regex101 online tool ( https://regex101.com/r/bF0jQ5/4 for compact and js one and https://regex101.com/r/bF0jQ5/2 for pcre and explained one ) It looks that both work

By the way looks like on some cases, your version eat too much
for example
with

 C[\\];C[a]

your version eat both comment and that between: https://regex101.com/r/bF0jQ5/5

while mine take just the first https://regex101.com/r/fB2hF2/1

(originally in in neagle/gokibitz#92 (comment) which is said by @neagle as a clarity improvement and fix bug improvement)

@waltheri
Copy link
Owner

@xcombelle code looks pretty and works. However I would suggest a little improvement. According to SGF specification you can escape new line characters. Therefore I would use this reg exp:

/\[([^\\\]]|\\(.|\n|\r))*\]/g

and then I would remove these line breaks. Please check it and I will update master branch.

@xcombelle
Copy link

You are full right I actually missed that . don't match carriage return in javascript (good catch)

@yewang
Copy link

yewang commented Apr 21, 2016

Does this change work for all of these test cases?

(;C[\];B[aa];W[bb])
Two nodes: comment ];B[aa in root node, white move in second node

(;C[\\];B[aa];W[bb])
Three nodes: comment \ in root node, black move in second node, white move in third node

(;C[\\\];B[aa];W[bb])
Two nodes: comment \];B[aa in root node, white move in second node

(;C[\\\\];B[aa];W[bb])
Three nodes: comment \\ in root node, black move in second node, white move in third node

@xcombelle
Copy link

@yewang yes it should works for these uses cases you can test it on https://regex101.com/r/bF0jQ5/6 (I tested the four)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants