punycode Release v0.15.0
Source: Dev.to
Background
I wanted to clean up my various GitHub repositories, but an old issue and an old branch caught my attention.
The issue (#34) concerned problems with punycode and certain Unicode characters, specifically emojis containing a Zero Width Joiner (ZWJ).
Issue Description
Problem:
Punycodes/Emojis containing a Zero Width Joiner are not handled correctly.
xn--8k8hlfr9n should be 🧑🏾‍🎨 [with ZWJ], not 🧑🏾🎨 [without ZWJ].
The punycode for 🧑🏾‍🎨 [with ZWJ] shows xn--1ug6825plhas9r instead of xn--8k8hlfr9n.
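The ZWJ (U+200D) is used to create compound emoji by joining multiple characters. In this case, it joins the base character 🧑🏾 (a person with the medium-dark skin tone modifier) and 🎨 (artist palette) into the single compound emoji 🧑🏾‍🎨. The ZWJ itself is invisible, so the sequences with and without it can be hard to tell apart in text.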
Fix Implemented
I found that the existing punycode implementation did not account for the ZWJ during encoding and decoding, and I modified both functions to handle it correctly.
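The changed functions are not shown here, so the following is a minimal, standalone sketch of a Punycode encoder written from RFC 3492 for illustration; it is not taken from the repository. The names `adapt` and `encode` are my own, and the sketch skips overflow checks as well as the IDNA mapping and validation steps. It shows that, at the Punycode level, U+200D is encoded like any other non-ASCII code point, which is why the two emoji variants produce different labels.

```go
package main

import (
	"fmt"
	"strings"
)

// Punycode parameters from RFC 3492, section 5.
const base, tMin, tMax, skew, damp, initialBias, initialN = 36, 1, 26, 38, 700, 72, 128

// digits maps digit values 0-25 to 'a'-'z' and 26-35 to '0'-'9'.
const digits = "abcdefghijklmnopqrstuvwxyz0123456789"

// adapt is the bias adaptation function from RFC 3492, section 6.1.
func adapt(delta, numPoints int, firstTime bool) int {
	if firstTime {
		delta /= damp
	} else {
		delta /= 2
	}
	delta += delta / numPoints
	k := 0
	for delta > ((base-tMin)*tMax)/2 {
		delta /= base - tMin
		k += base
	}
	return k + (base-tMin+1)*delta/(delta+skew)
}

// encode returns the Punycode form of label, without the "xn--" prefix.
// Every non-ASCII code point, including U+200D, goes through the same steps.
// Sketch only: no overflow checks and no IDNA mapping or validation.
func encode(label string) string {
	input := []rune(label)
	var out strings.Builder
	for _, r := range input {
		if r < 0x80 {
			out.WriteRune(r) // basic (ASCII) code points are copied as-is
		}
	}
	b := out.Len()
	h := b
	if b > 0 {
		out.WriteByte('-')
	}
	n, delta, bias := initialN, 0, initialBias
	for h < len(input) {
		// m is the smallest code point >= n that has not been encoded yet.
		m := int(^uint(0) >> 1)
		for _, r := range input {
			if int(r) >= n && int(r) < m {
				m = int(r)
			}
		}
		delta += (m - n) * (h + 1)
		n = m
		for _, r := range input {
			if int(r) < n {
				delta++
			} else if int(r) == n {
				// Write delta as a variable-length integer in base 36.
				q := delta
				for k := base; ; k += base {
					t := k - bias
					if t < tMin {
						t = tMin
					} else if t > tMax {
						t = tMax
					}
					if q < t {
						break
					}
					out.WriteByte(digits[t+(q-t)%(base-t)])
					q = (q - t) / (base - t)
				}
				out.WriteByte(digits[q])
				bias = adapt(delta, h+1, h == b)
				delta = 0
				h++
			}
		}
		delta++
		n++
	}
	return out.String()
}

func main() {
	fmt.Println("xn--" + encode("\U0001F9D1\U0001F3FE\U0001F3A8"))       // without ZWJ
	fmt.Println("xn--" + encode("\U0001F9D1\U0001F3FE\u200D\U0001F3A8")) // with ZWJ
}
```

Run as-is, this prints xn--8k8hlfr9n for the sequence without the ZWJ and xn--1ug6825plhas9r for the sequence with it.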
Testing
After the changes, I tested the updated implementation with various inputs, including those containing the ZWJ. The tests confirmed that the issue was resolved and the correct punycode representation was generated.
Updated CLI Output
With the latest update, the CLI now produces the expected results.
./punycode xn--8k8hlfr9n
🧑🏾🎨
./punycode xn--1ug6825plhas9r
🧑🏾‍🎨
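The second result contains a ZWJ (U+200D) between 🧑🏾 and 🎨, while the first does not; depending on font and terminal support, the two can look the same. I verified both decodings against several online implementations, and they all agree.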
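A matching decoder makes the same point from the other direction. This is again a standalone sketch following RFC 3492, not the project's code; `decode` and `digitValue` are hypothetical names, and the sketch does only minimal input validation. It decodes both labels from the CLI example and prints their code points, so the otherwise invisible ZWJ shows up.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Punycode parameters from RFC 3492, section 5.
const base, tMin, tMax, skew, damp, initialBias, initialN = 36, 1, 26, 38, 700, 72, 128

// adapt is the bias adaptation function from RFC 3492, section 6.1.
func adapt(delta, numPoints int, firstTime bool) int {
	if firstTime {
		delta /= damp
	} else {
		delta /= 2
	}
	delta += delta / numPoints
	k := 0
	for delta > ((base-tMin)*tMax)/2 {
		delta /= base - tMin
		k += base
	}
	return k + (base-tMin+1)*delta/(delta+skew)
}

// digitValue returns the value of a Punycode digit (case-insensitive), or -1.
func digitValue(c byte) int {
	if 'A' <= c && c <= 'Z' {
		c += 'a' - 'A'
	}
	return strings.IndexByte("abcdefghijklmnopqrstuvwxyz0123456789", c)
}

// decode turns a Punycode label (with or without "xn--") back into Unicode.
// Sketch only: no overflow checks and only minimal input validation.
func decode(label string) (string, error) {
	if len(label) >= 4 && strings.EqualFold(label[:4], "xn--") {
		label = label[4:]
	}
	var output []rune
	// Everything before the last '-' consists of basic code points, copied as-is.
	if d := strings.LastIndexByte(label, '-'); d >= 0 {
		output = []rune(label[:d])
		label = label[d+1:]
	}
	n, i, bias := initialN, 0, initialBias
	for pos := 0; pos < len(label); {
		// Read one variable-length integer; it advances i by a delta.
		oldI, w := i, 1
		for k := base; ; k += base {
			if pos >= len(label) {
				return "", errors.New("punycode: truncated input")
			}
			digit := digitValue(label[pos])
			if digit < 0 {
				return "", fmt.Errorf("punycode: invalid digit %q", label[pos])
			}
			pos++
			i += digit * w
			t := k - bias
			if t < tMin {
				t = tMin
			} else if t > tMax {
				t = tMax
			}
			if digit < t {
				break
			}
			w *= base - t
		}
		bias = adapt(i-oldI, len(output)+1, oldI == 0)
		n += i / (len(output) + 1)
		i %= len(output) + 1
		// Insert code point n at index i, shifting later runes right.
		output = append(output, 0)
		copy(output[i+1:], output[i:])
		output[i] = rune(n)
		i++
	}
	return string(output), nil
}

func main() {
	for _, label := range []string{"xn--8k8hlfr9n", "xn--1ug6825plhas9r"} {
		s, err := decode(label)
		if err != nil {
			fmt.Println(err)
			continue
		}
		// %U on a []rune prints every code point as U+XXXX.
		fmt.Printf("%s -> %s %U\n", label, s, []rune(s))
	}
}
```

The output lists three code points for xn--8k8hlfr9n and four, including U+200D, for xn--1ug6825plhas9r.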
Discussion
The original issue states:
xn--8k8hlfr9n should be 🧑🏾‍🎨, not 🧑🏾🎨 - punycode 🧑🏾‍🎨 shows xn--1ug6825plhas9r instead of xn--8k8hlfr9n
My client decodes xn--8k8hlfr9n to 🧑🏾🎨 (without the ZWJ), while the reporter expects 🧑🏾‍🎨 (with the ZWJ). Since the online tools agree with my client's output, I closed the issue as won't fix, noting that the reporter may have the expectation reversed, and invited further clarification if needed.
Project Context
I created this utility while learning Go. Since I have a decade of experience in DNS and domain name management, it felt natural to build a tool in an area I know well.
Key topics covered in this repository:
- Building CLI tools in Go
- Creating distributable executables
- Reading data from the command line and STDIN
- Testing CLI tools and the main function
The repository may never be perfect, but it has served its purpose. An alternative approach would be to limit handling to the DNS‑relevant subset of Unicode, which is itself a deep rabbit hole.
Contributions
Suggestions, ideas, or improvements are welcome. Please open an issue or a pull request on the repository.
Release
I released v0.16.0 to ensure compatibility with the latest Go version and modules.