Updated unicode post with naughty strings test

Marcus Noble 2020-05-28 11:50:41 +00:00
parent 9066d93dc8
commit 5ae4a3d034


@@ -5,6 +5,8 @@ date: 2020-05-27
tags: golang
summary: "With Go being a relatively modern programming language, first released in 2009, it is unsurprising that it has great support for Unicode strings. What is surprising is just how far this support goes."
---
_Updated 2020-05-28: Added big list of naughty strings test_
With Go being a relatively modern programming language, first released in 2009, it is unsurprising that it has great support for Unicode strings. What is surprising is just how far this support goes.
@@ -126,3 +128,61 @@ print(𝕧𝕒𝕣𝕚𝕒𝕓𝕝𝕖)
## In the wild
I've tried to find other examples of these non-Latin Unicode characters being used in real code but have so far come up empty, other than Gomega. I had assumed there'd be examples of code written in Russian or Chinese that made use of this, but I can't seem to find any. Perhaps mixing native-language variables and functions with the English built-in library functions isn't such a desirable outcome.
## Update
After I posted this, it was suggested that I try the [big list of naughty strings](https://github.com/minimaxir/big-list-of-naughty-strings) to see how many of them Go can handle. This list is a collection of strings that often cause problems for programs in one way or another.
I put together a [fairly simple test case](https://share.cluster.fun/golangnaughtystringstest.js) that used each string as a variable name and then checked whether the code would build. So that as many strings from the list as possible could be attempted, I removed all spaces from the strings.
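As a rough approximation of that check without running a full build, the standard library's `go/token` package can report whether a string is a Go identifier. This is only a sketch and not the linked test case: `isValidName` is a hypothetical helper, the sample strings are illustrative, and `token.IsIdentifier` tests the identifier rules only, so its answers may differ from a real build in edge cases (the blank identifier `_`, for example).

```go
package main

import (
	"fmt"
	"go/token"
	"strings"
)

// isValidName is a hypothetical helper. It strips spaces, as the test
// described above did, and reports whether what remains is a Go identifier.
// Keywords such as "func" are not identifiers, so they are rejected.
func isValidName(s string) bool {
	return token.IsIdentifier(strings.ReplaceAll(s, " ", ""))
}

func main() {
	// Illustrative samples only, not the full naughty strings list.
	samples := []string{"ﷺ", "田中さんにあげて下さい", "nil", "func", "1;DROP TABLE users"}
	for _, s := range samples {
		fmt.Printf("%q valid: %v\n", s, isValidName(s))
	}
}
```

Checking identifiers this way is much faster than compiling one program per string, but a build remains the authoritative test.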
The results were a bit surprising...
> 72 of the 506 strings are valid variable names in Go
(Note: this number may be higher than it should be because spaces were removed from the strings.)
Of those 72 valid strings, some are what we'd expect, similar to those covered above:
* `ﷺ`
* `𝕿𝖍𝖊𝖖𝖚𝖎𝖈𝖐𝖇𝖗𝖔𝖜𝖓𝖋𝖔𝖝𝖏𝖚𝖒𝖕𝖘𝖔𝖛𝖊𝖗𝖙𝖍𝖊𝖑𝖆𝖟𝖞𝖉𝖔𝖌`
* `田中さんにあげて下さい`
But there are a few that are really surprising:
* `nil`
* `true`
* `false`
So, it turns out this is a perfectly valid Go program:
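```go
package main

import "fmt"

func main() {
	nil := "Not a value" // shadows the predeclared nil
	false := 55          // shadows the predeclared false
	if !true() {         // calls the package-level function true below
		fmt.Println(false)
	}
	fmt.Println(nil)
}

// true shadows the predeclared constant true at package level.
func true() bool {
	return false // here false is still the predeclared boolean
}
```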
When run, this outputs:
```
55
Not a value
```
Please, please, never do this in your code.