So it looks like the next unexpected challenge to everyday , after the matter of SSL, is the matter of Unicode.

@roadriverrail omg yeah I will totally geek with you about unicode sometime if you want. I actually really like it even for all its warts, but I can imagine implementing it in restrictive hardware environments would be Not Trivial.

But, like, the predominance of older character sets pre-Unicode wasn't *just* because Unicode hadn't been invented yet: They were implemented with tighter hardware in mind. So another option for you might be to just skip back to those.

@epilanthanomai I think the problem here is that the "everyday" part becomes challenging in that "everyday" means being connected to a modern Internet. For example, you can see in this attached screenshot of my homepage that Jekyll is rendering my homepage using various UTF-8 characters, even just things like quote marks and ellipses, AFAIK. So, even once I get a good ANSI terminal, I'm just an email or webpage away from mild unintelligibility.
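To make the "mild unintelligibility" concrete, here is a small Python sketch (not from the thread) of what a single ellipsis looks like as UTF-8 bytes, and what a display that assumes Latin-1 would make of them. The variable names are illustrative only.

```python
# U+2026 HORIZONTAL ELLIPSIS, one of the characters mentioned in the thread.
ellipsis = "\u2026"

# UTF-8 encodes this one character as three bytes.
wire_bytes = ellipsis.encode("utf-8")

# A display that assumes Latin-1 treats every byte as a separate character,
# so one ellipsis turns into three unrelated glyphs (or control codes).
misread = wire_bytes.decode("latin-1")

print(wire_bytes)       # the raw bytes
print(len(misread))     # how many characters the Latin-1 reader sees
print(ascii(misread))   # escaped form, safe to print on any terminal
```

A pure 7-bit ASCII terminal fares no better, since all three bytes have the high bit set.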

@roadriverrail Honestly for tools like a modern terminal-based browser or mua, I give em at least 50/50 odds of pulling your charset out of your env and converting stuff to handle it for you, even if you set your local env to use latin-1.

@epilanthanomai Hm...yeah, good point about the reported terminal type. Some of the better tools out there might get it right. And as much as I enjoy seeing my website on Spectrum Internet Suite browser, it's too slow to be usable.

@roadriverrail Out of curiosity I just did:

LC_ALL=C LANG=C links

and visited your site. links converted the quotes to ascii and the ellipses to three dots. With my default en_US.UTF-8 locale it renders the unicode characters.

@roadriverrail To be fair, I'm on Ubuntu rn, and I have zero idea if they had to go through any hassle to get it to do that for me ;-)

@roadriverrail Set LC_ALL and LANG environment variables to C and run the links browser.

LC_* environment variables are used for localization (in locale-aware programs), including charset selection. I don't know the precise definition of the "C" locale, but generally it's a single-byte charset (prob ascii or latin-1).
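As a rough illustration of how a locale-aware program picks which variable to honor, here is a Python sketch of the POSIX precedence order: LC_ALL first, then the specific LC_* category, then LANG. The function name `effective_locale` is hypothetical, and falling back to "C" when nothing is set is an assumption (POSIX leaves that default to the implementation, though "C" is the usual choice).

```python
def effective_locale(env, category="LC_CTYPE"):
    """Return the locale name that applies to `category` for a given environment.

    `env` is any mapping of environment variable names to values,
    for example os.environ or a plain dict.
    """
    # POSIX order: LC_ALL overrides everything, then the category's own
    # variable, then LANG. Unset or empty variables are skipped.
    for name in ("LC_ALL", category, "LANG"):
        value = env.get(name)
        if value:
            return value
    # Assumed default when nothing is set.
    return "C"


print(effective_locale({"LC_ALL": "C", "LANG": "C"}))
print(effective_locale({"LANG": "en_US.UTF-8"}))
```

LC_CTYPE is the category that governs character classification and encoding, which is why it is the default here.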

Locale-aware software will convert output to the user's locale. So basically I asked links to convert its output to ascii or latin-1.

@roadriverrail I forget whether that particular charset transformation (unicode to either ascii or latin-1) is part of Unicode's huge collection of standard normalization tables or if it's external to unicode. Either way the localization library (probably part of gnu libc tbh) apparently knows how to translate unicode ellipses and curly quotes to appropriate ascii representations.
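On the question of whether this is a Unicode normalization or something external, a quick Python experiment suggests it is a bit of both. The sketch below is not how links or glibc actually do it. It only shows that Unicode's compatibility decomposition (NFKD) already turns an ellipsis into three dots, while curly quotes have no such decomposition and need a separate table. `PUNCT_FALLBACKS` and `to_ascii` are hypothetical names, and the table is a tiny hand-picked sample.

```python
import unicodedata

# Hypothetical hand-written fallbacks for characters that NFKD leaves alone.
PUNCT_FALLBACKS = {
    "\u2018": "'",   # left single quotation mark
    "\u2019": "'",   # right single quotation mark
    "\u201c": '"',   # left double quotation mark
    "\u201d": '"',   # right double quotation mark
    "\u2013": "-",   # en dash
}


def to_ascii(text, unknown="?"):
    """Best-effort conversion of `text` to 7-bit ASCII."""
    out = []
    for ch in text:
        if ord(ch) < 128:
            out.append(ch)                    # already ASCII
        elif ch in PUNCT_FALLBACKS:
            out.append(PUNCT_FALLBACKS[ch])   # table lookup
        else:
            # Compatibility decomposition, keeping only the ASCII pieces
            # (this drops combining accents and similar marks).
            decomposed = unicodedata.normalize("NFKD", ch)
            kept = "".join(c for c in decomposed if ord(c) < 128)
            out.append(kept or unknown)
    return "".join(out)


print(to_ascii("\u201cHi\u201d\u2026"))
```

Anything with no ASCII equivalent at all, such as CJK text, comes out as the `unknown` placeholder in this sketch.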

Setting LANG was probably overkill on my part: LC_ALL already overrides it for every locale category.

@epilanthanomai Got it. I'd actually not heard of links, either, having used only lynx for the last 20+ years.

Granted, I made the files for those posts in vim, so I suspect things like the collapsing of three dots into an ellipsis character were jekyll's doing, and that kind of overly-clever stuff annoys me. :goose_honk:

@roadriverrail Ah, yep, your server is definitely serving it up as a utf-8 encoding of unicode ellipsis and curly quotes etc, so the conversion that way is almost certainly jekyll's doing. I imagine there's probably a way to disable it in jekyll if you want.

@epilanthanomai Thanks for the confirmation (web technology is not my first language, nor my second, nor my third). I might chase it down in jekyll, but I'm just kinda...annoyed...that faithful representation of the source characters isn't its default, yanno?

@epilanthanomai BTW, can confirm this makes links work with an ancient Apple IIGS terminal emulator. Now if I could only figure out why that particular emulator software has no P key in MAME (but not other emulators), I'd be golden.

Signs & Codes

Signs & Codes is a private Mastodon instance built around a community of friends.