rotor
Jun 11, 2001

classic case of pineapple on pizzadog derangement syndrome

Defghanistan posted:

Sup nerds, heard you guys like computers

you heard wrong.


MononcQc
May 29, 2007

computers are awesome as long as they work. When there's a problem it's always due to the shittiest reason and it's frustrating as hell. When there's a serious problem it's often due to a deeply rooted conceptual problem and it's depressing as hell.

tef
May 30, 2004

-> some l-system crap ->

MononcQc posted:

computers are awesome as long as they work. When there's a problem it's always due to the shittiest reason and it's frustrating as hell. When there's a serious problem it's often due to a deeply rooted conceptual problem and it's depressing as hell.

and we're mostly dealing with legacy constraints made in good faith and short sight

MononcQc
May 29, 2007

tef posted:

and we're mostly dealing with legacy constraints made in good faith and short sight

this and everything having to do with l10n and i18n.

tef
May 30, 2004

-> some l-system crap ->
i've been looking at imap and it is a total clusterfuck. old protocols never seem to handle unicode well.

http://tools.ietf.org/html/rfc3501#section-5.1.3

we'll embed unicode using utf-7 (putting base64 in it yay). except with '&' instead of '+' as the shift character and ',' instead of '/' in the base64 alphabet. hooray!
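A rough Python sketch of the mailbox-name encoding RFC 3501 section 5.1.3 describes. It's written from the RFC text rather than taken from any IMAP library, it only encodes (no decoding or validation), and `encode_mailbox_name` is a made-up name. Printable ASCII passes through, `&` becomes `&-`, and every other run is converted to UTF-16BE, base64-encoded without padding, given `,` in place of `/`, and wrapped in `&` and `-`:

```python
import base64


def _modified_base64(run: str) -> str:
    # UTF-16BE first (astral characters become surrogate pairs), then base64
    # with the '=' padding stripped and ',' substituted for '/'.
    raw = run.encode("utf-16-be")
    return base64.b64encode(raw).decode("ascii").rstrip("=").replace("/", ",")


def encode_mailbox_name(name: str) -> str:
    out: list[str] = []
    pending: list[str] = []  # current run of characters that need base64

    def flush() -> None:
        if pending:
            out.append("&" + _modified_base64("".join(pending)) + "-")
            pending.clear()

    for ch in name:
        if ch == "&":
            flush()
            out.append("&-")  # a literal '&' is escaped as "&-"
        elif 0x20 <= ord(ch) <= 0x7E:
            flush()
            out.append(ch)  # printable US-ASCII represents itself
        else:
            pending.append(ch)
    flush()
    return "".join(out)


print(encode_mailbox_name("~peter/mail/台北/日本語"))  # ~peter/mail/&U,BTFw-/&ZeVnLIqe-
```

The sample name and its encoding are the worked example from the RFC itself.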

tef
May 30, 2004

-> some l-system crap ->

MononcQc posted:

this and everything having to do with l10n and i18n.


our written language is a legacy system from when we scratched things with implements.

MononcQc
May 29, 2007

"You know UCS-4 would be very nice to use with no surrogate pairs ever, it just would take a bit more storage for text, which is dwarfed by whatever loving JPEG you'll attach to your content. Instead let's make sure us English speakers and some of the other latin-1 retards we share bits of culture with get to keep our 8 bit character representation and make the final encoding have a variable width with surrogate pairs that make it impossible to know where you are in the whole thing so we can then support other people's languages. Once everyone understands this, we'll introduce them to the idea that code points are not a decent unit anyway and we need to go further with combining accents and grapheme clusters and poo poo."

"yeah let's do that, but in UTF-7 and base64, too!"

MononcQc
May 29, 2007

challenge for today: find a Unicode sequence which is larger to represent as an encoded UTF-8 string than its visual representation, either as a JPG, PNG-8 or GIF image. It is likely possible but I've not had the energy to do it.

tef
May 30, 2004

-> some l-system crap ->

MononcQc posted:

"You know UCS-4 would be very nice to use with no surrogate pairs ever

http://www.unicode.org/history/unicode88.pdf

quote:

Are 16 bits, providing at most 65,536 distinct codes, sufficient to encode all characters of all the world's scripts? … The answer to this is Yes.

In other words, given that the limitation to 65,536 character codes genuinely does satisfy all the world's modern communication needs with a safety factor of about four.


quote:

, it just would take a bit more storage for text, which is dwarfed by whatever loving JPEG you'll attach to your content.


it isn't about the space saving property, it was the compatibility with ascii/byte based systems. the reason utf-8 is so popular is that it is one of the easiest ways to retrofit your system.
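A small Python illustration of the compatibility point (the sample strings are arbitrary): ASCII text comes out byte-for-byte identical in UTF-8, and every byte of a multibyte sequence has its high bit set, so byte-oriented code that splits on `/` or scans for `\n` never cuts a character in half.

```python
ascii_text = "GET /index.html\n"
# Pure ASCII text is byte-for-byte identical in UTF-8.
print(ascii_text.encode("utf-8") == ascii_text.encode("ascii"))  # True

mixed = "/naïve/日本/😀"
encoded = mixed.encode("utf-8")
# Bytes belonging to non-ASCII characters are all >= 0x80, so a byte-level
# split on b"/" still lands on character boundaries.
parts = encoded.split(b"/")
print([p.decode("utf-8") for p in parts])  # ['', 'naïve', '日本', '😀']
```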


quote:

Instead let's make sure us English speakers and some of the other latin-1 retards we share bits of culture with get to keep our 8 bit character representation and make the final encoding have a variable width with surrogate pairs that make it impossible to know where you are in the whole thing so we can then support other people's languages.

to be fair to utf-8 you do have a synchronisable bytestream so you know if you're in a multibyte bit or not. unlike the other proposals around at the time.
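A minimal sketch of what "synchronisable" buys you, assuming valid UTF-8 input: continuation bytes always look like `0b10xxxxxx`, so from any byte offset you can step backwards to the start of the current character without decoding from the beginning. `char_start` is a hypothetical helper, not a library function.

```python
def char_start(buf: bytes, i: int) -> int:
    """Back up from byte offset i to the first byte of the UTF-8 character
    containing it. Assumes buf is valid UTF-8."""
    while i > 0 and (buf[i] & 0xC0) == 0x80:  # 0b10xxxxxx = continuation byte
        i -= 1
    return i


data = "aé€😀".encode("utf-8")  # 1 + 2 + 3 + 4 bytes
start = char_start(data, 8)       # offset 8 is somewhere inside the emoji
print(start, data[start:].decode("utf-8"))  # 6 😀
```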

quote:

Once everyone understands this, we'll introduce them to the idea that code points are not a decent unit anyway and we need to go further with combining accents and grapheme clusters and poo poo."

to be fair, people have really weird languages and scripts. (english being no exception, what with having a script bearing not much relation to the spoken language).

quote:

"yeah let's do that, but in UTF-7 and base64, too!"

tef
May 30, 2004

-> some l-system crap ->

MononcQc posted:

challenge for today: find a Unicode sequence which is larger to represent as an encoded UTF-8 string than its visual representation, either as a JPG, PNG-8 or GIF image. It is likely possible but I've not had the energy to do it.

probably via abuse of combining characters.

tef
May 30, 2004

-> some l-system crap ->
genuine unicode support means giving up the whole petulant notion of a text string being an array indexed by character. them's the breaks.

let's not get on to sorting :(
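To make the indexing point concrete in Python: `len` and slicing work on code points, so a decomposed "café" is five long and naive reversal strands the accent. `rough_clusters` is a hypothetical helper that only glues combining marks onto the previous character; it stands in for real grapheme-cluster segmentation (UAX #29), which typically comes from a third-party library.

```python
import unicodedata

s = "cafe\u0301"  # "café" spelled with a combining acute accent
print(len(s))     # 5 code points for 4 visible characters
print(s[::-1])    # naive reversal leaves the accent dangling at the front


def rough_clusters(text: str) -> list[str]:
    # Crude approximation: attach each combining mark to the preceding
    # character. Real grapheme clusters also cover ZWJ emoji sequences,
    # Hangul jamo, regional-indicator flags, and more.
    clusters: list[str] = []
    for ch in text:
        if clusters and unicodedata.combining(ch):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


print("".join(reversed(rough_clusters(s))))  # éfac, with the accent kept on its e
```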

Meiwaku
Jan 10, 2011

Fun for the whole family!

MononcQc posted:

computers are awesome as long as they work. When there's a problem it's always due to the shittiest reason and it's frustrating as hell. When there's a serious problem it's often due to a PEBKAC

MononcQc
May 29, 2007

Some characters still require more than one code unit to be represented in UTF-16. Supplementary characters (everything above U+FFFF, including many emoji) need a surrogate pair, i.e. two 16-bit code units, in UTF-16, and can't be represented in UCS-2 at all.
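For reference, the surrogate-pair arithmetic sketched in Python: subtract 0x10000, then split the remaining 20 bits across two 16-bit code units, one in U+D800 to U+DBFF and one in U+DC00 to U+DFFF. `utf16_surrogates` is an illustrative name; the codec normally does this for you.

```python
def utf16_surrogates(cp: int) -> tuple[int, int]:
    """Split a supplementary code point (U+10000..U+10FFFF) into a UTF-16
    high/low surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("not a supplementary code point")
    v = cp - 0x10000              # 20 bits remain
    high = 0xD800 + (v >> 10)     # top 10 bits
    low = 0xDC00 + (v & 0x3FF)    # bottom 10 bits
    return high, low


hi, lo = utf16_surrogates(ord("😀"))  # U+1F600
print(f"{hi:04X} {lo:04X}")           # D83D DE00
```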


tef posted:

it isn't about the space saving property, it was the compatibility with ascii/byte based systems. the reason utf-8 is so popular is that it is one of the easiest ways to retrofit your system.

in ways that end up breaking all the time, I guess. You're right about the compatibility aspect, but overall it still feels like utf-8 is popular because of its Western-centric approach. As far as I remember (and you may correct me on this), a lot of the lower Unicode code points share values with the latin-1 character set (and UTF-8 only breaks a few of them), compared to, say, whatever is used for CJK characters in other standards.

It could just be an artifact of whatever languages the people writing the spec spoke at the time, though. I'm not exactly aware of the history behind it.

tef posted:

to be fair to utf-8 you do have a synchronisable bytestream so you know if you're in a multibyte bit or not. that's why it happened.


to be fair, people have really weird languages and scripts. (english being no exception, what with having a script bearing not much relation to the spoken language).

not gonna disagree with any of that.

MononcQc
May 29, 2007

tef posted:

genuine unicode support means giving up the whole petulant notion of a text string being an array indexed by character. them's the breaks.

let's not get on to sorting :(

Other poo poo that sucks: capitalization, title-casing stuff, comparison, string length.

A non-breaking space should be equivalent to a normal space. É and E should be seen as identical in some French texts, but é and e rarely should (artifacts of typewriters, yay!), not to mention words containing characters like 'œ', which are often treated as equivalent to 'oe' but not exactly, so good luck deciding what string length or reversal even mean now.
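One common (and admittedly lossy) way to get that kind of loose comparison in Python: NFKD-normalise, drop combining marks, then case-fold. `loose_key` is a made-up helper name. Note that it equates é and e too, which the post points out is usually not what you want, and it leaves œ alone, since Unicode gives œ no decomposition to "oe".

```python
import unicodedata


def loose_key(text: str) -> str:
    # NFKD maps compatibility characters (e.g. U+00A0 no-break space -> ' ')
    # and splits accented letters into base letter + combining mark.
    decomposed = unicodedata.normalize("NFKD", text)
    # Dropping every combining mark makes É, é and e all compare equal,
    # which is often too aggressive for French.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


print(loose_key("Élève\u00a0A") == loose_key("eleve A"))  # True
print(loose_key("œuvre") == loose_key("oeuvre"))          # False: œ has no decomposition
```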

MononcQc
May 29, 2007

I heard you liked Time zones and calendars!

Offsets are not always straight on the hour, sadly. Some of them are non-standard and use a half-hour offset. Newfoundland in Canada does this (UTC-3:30).

Then again, Nepal is UTC/GMT +5:45, so half-hours aren't enough either (the Chatham Islands, NZ, also use a :45 offset, +12:45). We need more precision.

So let's just add minutes and poo poo and overflow to hours, right?
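Storing offsets as a duration rather than a whole number of hours makes the minutes free; in Python, `datetime.timezone` accepts any `timedelta`. This sketch uses fixed offsets only, so it knows nothing about DST (the Chatham Islands move to +13:45 in the southern summer), and `NEPAL` and `CHATHAM_STANDARD` are just labels for this example:

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets only: these ignore DST entirely.
NEPAL = timezone(timedelta(hours=5, minutes=45))
CHATHAM_STANDARD = timezone(timedelta(hours=12, minutes=45))

noon_utc = datetime(2012, 12, 6, 12, 0, tzinfo=timezone.utc)
print(noon_utc.astimezone(NEPAL))             # 2012-12-06 17:45:00+05:45
print(noon_utc.astimezone(CHATHAM_STANDARD))  # 2012-12-07 00:45:00+12:45, already tomorrow
```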

Iran changes its timezone offset (DST) based on the Persian (Solar Hijri) calendar, not the Gregorian one. So we have to be careful.

Also Kiribati spans several timezones, which puts parts of the country on different days at the same time.

Then we have more fun calendar intricacies... Samoa had a 367-day year in 1892 after changing timezones, getting two Fourths of July in the same year. Back in 2011 they switched back across the date line, skipping an entire Friday (December 30) by jumping forward a day.

Sweden used its own calendar for 12 years (starting 1700), but things got a bit out of hand:

quote:

In November 1699, Sweden decided that, rather than adopting the Gregorian calendar outright, it would gradually approach it over a 40-year period. The plan was to skip all leap days in the period 1700 to 1740. Every fourth year, the gap between the Swedish calendar and the Gregorian would reduce by one day, until they finally lined up in 1740. In the meantime, this calendar would not only not be in line with either of the major alternative calendars, but also the differences between them would change every four years.

In accordance with the plan, February 29 was omitted in 1700, but due to the Great Northern War no further reductions were made in the following years.

In January 1711, King Charles XII declared that Sweden would abandon the calendar, which was not in use by any other nation and had not achieved its objective, in favour of a return to the older Julian calendar. An extra day was added to February in the leap year of 1712, thus giving it a unique 30-day length (February 30).

In 1753, one year later than England and its colonies, Sweden introduced the Gregorian calendar, whereby the leap of 11 days was accomplished in one step, with February 17 being followed by March 1.

Feb 30 is now a valid date, but only in Sweden in 1712. In 1753, Feb 18 to 28 do not exist in Sweden.
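Python's `datetime.date`, for one, uses the proleptic Gregorian calendar (the Gregorian rules extended backwards forever, with no per-country history), so a quick check shows neither Swedish oddity survives:

```python
from datetime import date

try:
    date(1712, 2, 30)  # Sweden's February 30
except ValueError as exc:
    print("rejected:", exc)

# And the days Sweden skipped in 1753 are perfectly valid here.
print(date(1753, 2, 20))  # 1753-02-20
```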

We also have to care for leap seconds, different calendars, administrative changes, etc.

:sigh:

Shaggar
Apr 26, 2006
timezones and calendars are some bullshit.

Janitor Prime
Jan 22, 2004

PC LOAD LETTER

What da fuck does that mean

Fun Shoe

MononcQc posted:

I heard you liked Time zones and calendars!

Offsets are not always straight on the hour, sadly. Some of them are non-standard and use a half-hour offset. Newfoundland in Canada does this (UTC-3:30).

Then again, Nepal is UTC/GMT +5:45, so half-hours aren't enough either (the Chatham Islands, NZ, also use a :45 offset, +12:45). We need more precision.

So let's just add minutes and poo poo and overflow to hours, right?

Iran changes its timezone offset (DST) based on the Persian (Solar Hijri) calendar, not the Gregorian one. So we have to be careful.

Also Kiribati has different timezones that make parts of the country be on different days at the same time.

Then we have more fun calendar intricacies... Samoa had a 367-day year in 1892 after changing timezones, getting two Fourths of July in the same year. Back in 2011 they switched back across the date line, skipping an entire Friday (December 30) by jumping forward a day.

Sweden used its own calendar for 12 years (starting 1700), but things got a bit out of hand:


Feb 30 is now a valid date, but only in Sweden in 1712. In 1753, Feb 18 to 28 do not exist in Sweden.

We also have to care for leap seconds, different calendars, administrative changes, etc.

:sigh:

Who gives a poo poo about old dates? Store them as simple strings and don't expect to do any manipulation with dates older than the epoch.

tef
May 30, 2004

-> some l-system crap ->

MononcQc posted:

in ways that end up breaking all the time, I guess. You're right about the compatibility aspect, but overall it still feels utf-8 itself is being popular due to its Western-centric approach.

that's ascii compatibility for you. and there is a lot of it around.

quote:

As far as I remember (and you may correct me on this), a lot of the lower Unicode code points share values similar to those of the latin-1 character set (and UTF-8 breaks a few of them only), compared to say, whatever is used to CJK characters in other standards.

utf-8 is multibyte for anything outside of ascii.

the second unicode block is latin-1 http://en.wikipedia.org/wiki/C1_Controls_and_Latin-1_Supplement
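A quick Python check of that: the first 256 code points have the same numeric values as Latin-1 bytes, but UTF-8 only stores the ASCII half as single bytes; everything from U+0080 up (é included) takes two.

```python
for ch in "Aé":
    latin1 = ch.encode("latin-1")
    utf8 = ch.encode("utf-8")
    print(f"U+{ord(ch):04X} latin-1={latin1.hex()} utf-8={utf8.hex()}")
# U+0041 latin-1=41 utf-8=41
# U+00E9 latin-1=e9 utf-8=c3a9
```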


quote:

It could just be an artifact of whatever people describing the spec would speak at the time though. I'm not exactly aware of the history behind it.

http://www.cl.cam.ac.uk/~mgk25/ucs/utf-8-history.txt

MononcQc
May 29, 2007

Hard NOP Life posted:

Who gives a poo poo about old dates, store them as a simple Strings and don't expect to do any manipulation with dates older than the epoch.

Well, you still had this year's leap day, last year's Samoa jumping a day, Iran doing DST based on the Solar Hijri calendar, etc.

This year Libya got their clock back one hour, next year they start respecting DST. Russia is stuck in permanent summer time. Jordan also cancelled DST switching this year. Mexico is discussing adding a timezone to their area.

The thing is, converting to some standard time like UTC is still ultimately very lovely and prone to error. Sure once you're in epoch manipulating stuff isn't that hard, but going from one to the other is fairly bad.

Some events are intimately tied to relative, non-global time. Google Calendar had an interesting problem with that a few years ago (not sure if they changed it). Every event saved in a calendar kept the timezone offset that was in effect when the event was created. If you created a meeting before a DST change for a date after it, then once daylight saving time kicked in, the meeting time would be off by an hour in the final result, though still right according to UTC. For an international meeting that made sense, but for local meetups among people within a city, the behavior was wrong.
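A sketch of that failure mode, using a made-up zone so it runs without a timezone database. `toy_offset` is hypothetical and is not any real DST rule, and this reconstructs the behaviour as described in the post, not Google's code. Pinning the offset at creation time puts a 09:00 summer meeting at 10:00; storing the wall time plus the zone and resolving the offset for the meeting's own date keeps it at 09:00.

```python
from datetime import date, datetime, timedelta, timezone


def toy_offset(day: date) -> timezone:
    # HYPOTHETICAL zone: UTC-5 in "winter", UTC-4 from April through October.
    # A real implementation would look the offset up in the tz database.
    return timezone(timedelta(hours=-4 if 4 <= day.month <= 10 else -5))


created_on = date(2012, 1, 10)           # meeting booked in winter...
wall_time = datetime(2012, 6, 12, 9, 0)  # ...for 09:00 local in summer

# Buggy: pin the offset that applied on the day the event was created.
buggy_utc = wall_time.replace(tzinfo=toy_offset(created_on)).astimezone(timezone.utc)
shown = buggy_utc.astimezone(toy_offset(wall_time.date()))
print(shown.strftime("%H:%M"))  # 10:00, an hour late

# Better: keep wall time + zone, resolve the offset for the meeting's date.
good_utc = wall_time.replace(tzinfo=toy_offset(wall_time.date())).astimezone(timezone.utc)
print(good_utc.astimezone(toy_offset(wall_time.date())).strftime("%H:%M"))  # 09:00
```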

Time is just the best way to get screwed.

tef
May 30, 2004

-> some l-system crap ->

MononcQc posted:

Other poo poo that sucks: capitalization, title-casing stuff, comparison, string length.

let's not even get into right to left/left to right stuff.

oh and you should be case-folding too.
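A few Python examples of why lowercasing isn't case-folding, and why title-casing is its own mess:

```python
word = "Straße"
print(word.upper())                             # STRASSE: one character became two
print(word.lower() == "strasse")                # False: lower() keeps the ß
print(word.casefold() == "STRASSE".casefold())  # True: casefold() maps ß to "ss"
print("they're".title())                        # They'Re: title() treats the apostrophe as a word break
```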


quote:

A non-breakable space should be equivalent to a normal space,

similarly, soft (shy) hyphens, U+00AD, should be hidden

quote:

É and E should be seen as identical in some French texts, but rarely should é and e be seen that way (artifacts of typewriters, yay!), not speaking of words containing characters like 'œ' which are often seen as equivalent to 'oe' but not exactly, so whatever string length or reversal means now.

ffuck

tef
May 30, 2004

-> some l-system crap ->

MononcQc posted:

We also have to care for leap seconds, different calendars, administrative changes, etc.

:sigh:

timezones used to be much more localized and imprecise. then again we didn't have to synchronize with people on the other side of the planet.

gently caress utc, i want to use gps time.
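For what it's worth, the GPS/UTC difference is just the count of leap seconds since the GPS epoch (1980-01-06), because GPS time never inserts them. A Python sketch with a hardcoded table, which would need updating whenever a new leap second is announced; `gps_minus_utc` and `utc_to_gps_seconds` are made-up names:

```python
from bisect import bisect_right
from datetime import datetime

GPS_EPOCH = datetime(1980, 1, 6)

# UTC dates on which a leap second took effect since the GPS epoch.
LEAP_SECOND_DATES = [
    datetime(1981, 7, 1), datetime(1982, 7, 1), datetime(1983, 7, 1),
    datetime(1985, 7, 1), datetime(1988, 1, 1), datetime(1990, 1, 1),
    datetime(1991, 1, 1), datetime(1992, 7, 1), datetime(1993, 7, 1),
    datetime(1994, 7, 1), datetime(1996, 1, 1), datetime(1997, 7, 1),
    datetime(1999, 1, 1), datetime(2006, 1, 1), datetime(2009, 1, 1),
    datetime(2012, 7, 1), datetime(2015, 7, 1), datetime(2017, 1, 1),
]


def gps_minus_utc(utc: datetime) -> int:
    # One second of drift per leap second that has taken effect by `utc`.
    return bisect_right(LEAP_SECOND_DATES, utc)


def utc_to_gps_seconds(utc: datetime) -> float:
    # datetime arithmetic ignores leap seconds, so add them back explicitly.
    return (utc - GPS_EPOCH).total_seconds() + gps_minus_utc(utc)


print(gps_minus_utc(datetime(2012, 12, 6)))  # 16
```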

Tiny Bug Child
Sep 11, 2004

Avoid Symmetry, Allow Complexity, Introduce Terror
you rarely have to care about nepal or iran or kiribati at all let alone one day in the 1700s in sweden. why worry about any of this until the customer complains, then it's extra $$$ to fix their weird nonstandard problems

MononcQc
May 29, 2007

I hope I never see the day space travel makes relativistic effects on time even more obvious than it is with GPS clocks.

That would be so much bullshit to deal with.


tef
May 30, 2004

-> some l-system crap ->

MononcQc posted:

I hope I never see the day space travel makes relativistic effects on time even more obvious than it is with GPS clocks.

That would be so much bullshit to deal with.

on the plus side I wouldn't have to worry about leapseconds.

today is September 7037, 1993

rotor
Jun 11, 2001

classic case of pineapple on pizzadog derangement syndrome

MononcQc posted:

I hope I never see the day space travel makes relativistic effects on time even more obvious than it is with GPS clocks.

That would be so much bullshit to deal with.

oh my god I never considered this

Nomnom Cookie
Aug 30, 2009



Science fiction writer consensus: it is some bullshit for real

Zombywuf
Mar 29, 2008

MononcQc posted:

challenge for today: find a Unicode sequence which is larger to represent as an encoded UTF-8 string than its visual representation, either as a JPG, PNG-8 or GIF image. It is likely possible but I've not had the energy to do it.

*inserts several megabytes of zero width spaces*
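The arithmetic behind that answer, in Python: U+200B ZERO WIDTH SPACE is three bytes in UTF-8 and draws nothing, and tef's combining-character idea works the same way, one visible glyph with two more bytes per stacked accent:

```python
zwsp = "\u200b"  # ZERO WIDTH SPACE: renders as nothing at all
payload = zwsp * 1_000_000
print(len(payload.encode("utf-8")))  # 3000000 bytes of invisible text

stacked = "a" + "\u0301" * 1000      # one glyph buried under 1000 acute accents
print(len(stacked.encode("utf-8")))  # 2001
```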

Zombywuf
Mar 29, 2008

MononcQc posted:

Some events are intimately tied to relative, non-global time. Google Calendar had interesting ones for that a few years ago (not sure if they changed it). Any event saved in a Calendar had the timezone of the time the meeting is set (when it is created) for the event. If you set a meeting time before a DST change, but happening after, once daylight saving time got in effect, the meeting time would be off an hour in the final result, but still right according to UTC. If the meeting is something international, then it made sense, but for local meetups for people within a city, the behavior was off.

I used to think about making better calendaring software. Then I thought about problems like wanting to schedule meetings on the second tuesday of each month while crossing timezones.

I don't think about calendaring software any more.
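The easy half of that problem, sketched in Python for a single timezone. `nth_weekday` is a hypothetical helper built on `date.weekday()` and the `calendar` weekday constants. It does nothing about the hard half, which is that 09:00 on the second Tuesday in one zone can be Monday or Wednesday somewhere else.

```python
import calendar
from datetime import date


def nth_weekday(year: int, month: int, weekday: int, n: int) -> date:
    """Date of the n-th given weekday (calendar.MONDAY..SUNDAY) in a month."""
    first = date(year, month, 1)
    offset = (weekday - first.weekday()) % 7  # days until the first match
    # replace() raises ValueError if the month has no such day (e.g. a 5th Tuesday).
    return first.replace(day=1 + offset + 7 * (n - 1))


print(nth_weekday(2012, 12, calendar.TUESDAY, 2))  # 2012-12-11
```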

Cocoa Crispies
Jul 20, 2001

Vehicular Manslaughter!

Pillbug

Zombywuf posted:

I used to think about making better calendaring software. Then I thought about problems like wanting to schedule meetings on the second tuesday of each month while crossing timezones.

I don't think about calendaring software any more.

Arnold Schwarzenegger had the right idea to just not schedule any meetings.

Sapozhnik
Jan 2, 2005

Nap Ghost
I'm waiting for the glorious day when computing is neither a gold-rush fad nor a dismal cesspit of cost-cutting

basically that's going to come around when we finally line up and execute all of the MBAs so, not going to happen

0xB16B00B5
Aug 24, 2006

by Y Kant Ozma Post

Cocoa Crispies posted:

Arnold Schwarzenegger had the right idea to just not schedule any meetings.

lol all he'll say is ILL BE BACK

Zombywuf
Mar 29, 2008

Mr Dog posted:

I'm waiting for the glorious day when computing is neither a gold-rush fad nor a dismal cesspit of cost-cutting

basically that's going to come around when we finally line up and execute all of the MBAs so, not going to happen

Sometimes I wish I worked in a field where people die if I make a mistake.

0xB16B00B5
Aug 24, 2006

by Y Kant Ozma Post
who's the drone yosposter

people die when he does his job correctly!

qntm
Jun 17, 2009
zoneinfo is the answer to all of your timezone questions

dunno about calendars though
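A sketch of what that buys you, using Python's `zoneinfo` module (3.9+), which reads the same tz database. It assumes the tz data is installed (system zoneinfo files or the `tzdata` package), and `samoa_local_dates` is a made-up name. Walking through 30 December 2011 UTC hour by hour, Samoa's local clock never shows that date:

```python
from datetime import date, datetime, timedelta, timezone
from zoneinfo import ZoneInfo, ZoneInfoNotFoundError


def samoa_local_dates() -> set[date]:
    apia = ZoneInfo("Pacific/Apia")  # needs system tz data or the tzdata package
    start = datetime(2011, 12, 30, tzinfo=timezone.utc)
    # Step through 30 December 2011 (UTC) hour by hour; collect Samoa's local date.
    return {(start + timedelta(hours=h)).astimezone(apia).date() for h in range(24)}


try:
    print(sorted(samoa_local_dates()))  # 2011-12-29 and 2011-12-31 only, no 30th
except ZoneInfoNotFoundError:
    print("no tz database installed")
```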

Blotto Skorzany
Nov 7, 2008

He's a PSoC, loose and runnin'
came the whisper from each lip
And he's here to do some business with
the bad ADC on his chip
bad ADC on his chiiiiip

Nomnom Cookie posted:

Science fiction writer consensus: it is some bullshit for real

poul anderson talked about this about once per book in the flandry series. actually maybe the whole polesotechnic league series

Stringent
Dec 22, 2004


image text goes here

Mr Dog posted:

I'm waiting for the glorious day when computing is neither a gold-rush fad nor a dismal cesspit of cost-cutting

basically that's going to come around when we finally line up and execute all of the MBAs so, not going to happen

the day when computers program themselves and all that's required is to explain precisely just what the gently caress is it you want them to do.

gonadic io
Feb 16, 2011

>>=

Stringent posted:

the day when computers program themselves and all that's required is to explain precisely just what the gently caress is it you want them to do.

perhaps we could develop some kind of specialised language in which to convey our ideas to computers

Cocoa Crispies
Jul 20, 2001

Vehicular Manslaughter!

Pillbug

Stringent posted:

the day when computers program themselves and all that's required is to explain precisely just what the gently caress is it you want them to do.

PrBacterio
Jul 19, 2000

Stringent posted:

the day when computers program themselves and all that's required is to explain precisely just what the gently caress is it you want them to do.

That's not actually going to help much with anything; like 99% of the problems with the software in existence right now come from the fact that nobody can figure out what the gently caress it is precisely that they want the software to do in the first place.


Stringent
Dec 22, 2004


image text goes here

PrBacterio posted:

That's not actually going to help much with anything; like 99% of the problems with the software in existence right now come from the fact that nobody can figure out what the gently caress it is precisely that they want the software to do in the first place.

:thejoke:
