comedyblissoption
Mar 15, 2006

How important is it for a web application to properly support (arbitrary querying, storing, fetching) Unicode outside of the Basic Multilingual Plane (requiring more than 2 bytes to encode) for the purposes of modern international markets?

Looking at Wikipedia, the only non-historical languages with characters outside of the BMP are Chinese, Korean, and Japanese. Do these markets expect or want these characters to work properly? Wikipedia describes these character sets as historical/rare characters.

Lysidas
Jul 26, 2002

John Diefenbaker is a madman who thinks he's John Diefenbaker.
Pillbug
I'm curious about what's prompting this question. My immediate thought on reading your post was "why would you not support all Unicode code points"? Your description of non-BMP characters requiring "more than 2 bytes to encode" kind of implies that you're on a platform in which strings are stored internally as UTF-16, since many BMP characters require 3 bytes in UTF-8.

Your output to the browser should of course be UTF-8, and if you're using a language in which strings are internally represented as UTF-16, I'd be very surprised if there are still problems transcoding this correctly on output. I remember reading at some point that Java and/or C# would do exactly the wrong thing with non-BMP characters when encoding as UTF-8: representing each surrogate code unit as the associated three-byte UTF-8 sequence. I haven't been able to find any mention of this with some quick searching, and I would expect that kind of bug to be fixed quickly once found.
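To put rough numbers on the byte counts mentioned above, here is a minimal sketch in Python (not Java or C#). The sample characters and the helper names `encoded_sizes` and `surrogates_as_utf8` are picked purely for illustration; the second helper deliberately reproduces the wrong behaviour described above, assuming it means encoding each surrogate as its own three-byte sequence.

```python
def encoded_sizes(ch):
    """Return (UTF-8 byte count, UTF-16 byte count) for a one-code-point string."""
    return len(ch.encode("utf-8")), len(ch.encode("utf-16-le"))


def surrogates_as_utf8(ch):
    """Encode each UTF-16 surrogate of a non-BMP code point as 3 separate bytes.

    This is the *incorrect* output described above, not valid UTF-8.
    """
    offset = ord(ch) - 0x10000
    high = 0xD800 + (offset >> 10)    # high (lead) surrogate
    low = 0xDC00 + (offset & 0x3FF)   # low (trail) surrogate
    return b"".join(
        chr(unit).encode("utf-8", "surrogatepass") for unit in (high, low)
    )


bmp_char = "\u4e2d"          # U+4E2D, inside the BMP
non_bmp_char = "\U00020000"  # U+20000, outside the BMP

print(encoded_sizes(bmp_char))       # BMP: 3 bytes in UTF-8, 2 in UTF-16
print(encoded_sizes(non_bmp_char))   # non-BMP: 4 bytes in both
print(len(surrogates_as_utf8(non_bmp_char)))  # 6 bytes: the broken form
```

So "more than 2 bytes" only separates BMP from non-BMP if you are counting UTF-16 bytes; in UTF-8 the BMP character here already takes three.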

Lysidas fucked around with this message at 02:23 on Feb 19, 2014

Jabor
Jul 16, 2010

#1 Loser at SpaceChem
If you have to answer that question, you're probably doing Unicode wrong. Use an existing library, and then doing it properly is no more difficult than half-assing it.

comedyblissoption
Mar 15, 2006

C# makes it very easy to do the wrong thing if you ever enumerate over a string as a sequence of char. Edit: specifically, it will enumerate over a non-BMP character as a sequence of 2-byte UTF-16 code units rather than as one character. This is of course usually the wrong way to handle UTF-16 unless you like manually handling surrogate pairs instead of using a wrapper library.

SQL Server is incapable of certain types of queries and sorting on text outside of the BMP.

I haven't verified whether the C# standard libraries typically handle characters outside the BMP properly.

I was mainly wondering whether it is worth the effort to enforce standards like never treating a string as an enumerable of char.
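A rough way to see the problem without C# at hand is the Python sketch below. It splits a string into 16-bit UTF-16 code units, which is approximately what enumerating a C# string as char yields. The helper names `utf16_code_units` and `is_surrogate` and the sample string are made up for illustration.

```python
import struct


def utf16_code_units(text):
    """Split text into 16-bit UTF-16 code units (roughly what C# char enumeration sees)."""
    data = text.encode("utf-16-le")
    return [unit for (unit,) in struct.iter_unpack("<H", data)]


def is_surrogate(unit):
    """True if the code unit is half of a surrogate pair, not a character on its own."""
    return 0xD800 <= unit <= 0xDFFF


text = "a\U0001F600"  # 'a' followed by U+1F600, a non-BMP code point
units = utf16_code_units(text)

print(len(text))                         # number of code points
print([hex(unit) for unit in units])     # one more code unit than code points
print([is_surrogate(unit) for unit in units])
```

Any per-char logic (reversing, truncating, counting "characters") that ignores the surrogate flags can cut the pair in half.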

comedyblissoption fucked around with this message at 03:25 on Feb 19, 2014

Lysidas
Jul 26, 2002

John Diefenbaker is a madman who thinks he's John Diefenbaker.
Pillbug
Wow, that's hideous. I wonder how big a deal it would be in practice -- how often do you actually iterate over the characters in a string in a web app? I've been involved in a few decent-sized ones and I can't remember seeing or doing this.

Sab669
Sep 24, 2009

I've been helping my roommate with his Intro to Comp Sci homework and he just had this really obnoxious assignment.

Write a method that takes a String and returns an array that counts the number of times each letter appears in the string.
So that "abc" would return an int array with [1][1][1][0][0][0]... for each letter of the alphabet
Or "aabcc" would return [2][1][2][0]... and so on
Count any non-alpha characters in a single index

This was the best I came up with:
http://pastebin.com/ryRVxifB

Admittedly, took longer than I should've to do it and I think I sort of "cheated" because he said he didn't remember learning about casting values or break/continue. Was just wondering if you guys could think of anything cleaner?

edit: Whoops, I guess that should be (int)'z'+1, or <= (int)'z'

Sab669 fucked around with this message at 03:40 on Feb 19, 2014

comedyblissoption
Mar 15, 2006

The cleanest/most readable way would be storing the values in a dictionary of character values to integer and then converting that into an array of counts. Of course, using a dictionary would probably be considered 'cheating.'

A clean 'non-cheating' way would just be breaking up the problem into functions. Have a function that computes the index of the array corresponding to the character.

The array should probably have 26 elements (unless you treat casing as different). I guess the 27th is for anything that falls outside.

You should only have to have 1 loop through the string where you simply increment the corresponding index by 1. 2 loops is unnecessary.
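Both suggestions can be sketched roughly as follows, in Python rather than the language of the assignment. The names `bucket_index`, `letter_counts` and `letter_counts_via_dict` are made up for this example, and it assumes casing is ignored and a 27th slot catches everything that is not a-z.

```python
from collections import Counter
import string

OTHER = 26  # index of the catch-all slot for non-letters


def bucket_index(ch):
    """Map 'a'..'z' to 0..25 and everything else to the catch-all slot."""
    if "a" <= ch <= "z":
        return ord(ch) - ord("a")
    return OTHER


def letter_counts(text):
    """Single loop over the string, incrementing one slot per character."""
    counts = [0] * 27
    for ch in text.lower():
        counts[bucket_index(ch)] += 1
    return counts


def letter_counts_via_dict(text):
    """Dictionary variant: tally first, then flatten into the array."""
    lowered = text.lower()
    tally = Counter(lowered)
    per_letter = [tally[ch] for ch in string.ascii_lowercase]
    return per_letter + [len(lowered) - sum(per_letter)]


print(letter_counts("aabcc")[:4])
print(letter_counts("Hello, World!")[OTHER])
```

The first version keeps the index arithmetic in one small function; the second trades that for a dictionary lookup per letter.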

comedyblissoption fucked around with this message at 03:45 on Feb 19, 2014

Moon Wizard
Dec 29, 2011

Quick and dirty:
code:
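public static int[] countAll(string input)
{
    // Fold to lower case so 'A' and 'a' share a slot.
    input = input.ToLower();
    // Slots 0-25 hold 'a'-'z'; slot 26 collects everything else.
    int[] letters = new int[27];
    foreach (char c in input)
    {
        if (c >= 'a' && c <= 'z')
            letters[c - 'a']++;
        else
            letters[26]++;
    }
    return letters;
}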
edit: can't read

Moon Wizard fucked around with this message at 04:08 on Feb 19, 2014

EAT THE EGGS RICOLA
May 29, 2008

DholmbladRU posted:

Don't know if this is the proper place, however I am sure some of you all freelance so I thought I'd give it a try. I just started doing freelancing and it is a learning process. If I am performing a development job based on a SOW that does not outline intellectual property rights, who owns the intellectual property?

The company that is paying you for the work almost 100% for sure owns the IP unless your contract says otherwise.

aerique
Jul 16, 2008

rrrrrrrrrrrt posted:

Anyone else spend way too much time waffling on languages and frameworks to use for their side projects? It's a recurring problem for me. Any strategies for coping with it?

Get really good at a single high-level language which has at least some library support for 'modern' developments (JSON, Ajax). This way you can ignore the plethora of new frameworks, ideas and languages that crop up over the years and just focus on your projects.

Not saying that there won't be good stuff among those new things, but you can always pick it up once it has proven itself.

Rahtas
Oct 22, 2010

RABBIT TROOP FOREVER!

nielsm posted:

Your file initially contains:
1.jpg 2.jpg 3.jpg

In first iteration intRep is 1 and intNext becomes 2. You then replace all occurrences (a single one) of "1.jpg" with "2.jpg". Your file now contains:
2.jpg 2.jpg 3.jpg

In the next iteration, intRep becomes 2, and intNext becomes 3. You then replace all (two) occurrences of "2.jpg" with "3.jpg". Your file now contains:
3.jpg 3.jpg 3.jpg

One possible workaround is to do the replacement backwards, start with the largest number and work downwards.

Oh, that makes sense. Fixed it. Thanks so much! :)
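For anyone hitting the same bug, the cascade is easy to reproduce outside VBScript. This is a small Python sketch, assuming the goal is to bump every file number up by one; the function names are made up for the example, and the original script is not reproduced here.

```python
def bump_ascending(text, highest):
    """Replace N.jpg with (N+1).jpg starting from 1: each pass re-matches earlier output."""
    for n in range(1, highest + 1):
        text = text.replace(f"{n}.jpg", f"{n + 1}.jpg")
    return text


def bump_descending(text, highest):
    """Same replacement from the largest number down, so no result gets hit twice."""
    for n in range(highest, 0, -1):
        # Caveat: plain substring replace would also touch "11.jpg" when n is 1;
        # this sketch only uses single-digit names.
        text = text.replace(f"{n}.jpg", f"{n + 1}.jpg")
    return text


contents = "1.jpg 2.jpg 3.jpg"
print(bump_ascending(contents, 3))   # every name collapses to the same value
print(bump_descending(contents, 3))  # each name is bumped exactly once
```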

Rahtas
Oct 22, 2010

RABBIT TROOP FOREVER!
Okay! I've got this almost done, just -one- more (for the moment) interminable problem.

I have a program that will take the text from pictures, put it onto the clipboard and I can paste it where I please. I've written some vbs stuff that will run the program, get the text from a picture and paste it into a notepad, then move on to my next file and repeat.
http://pastebin.com/0dk0qKWp

The problem I'm encountering is that the picture to text isn't perfect and will spit out stuff like this:
‘ Zolié Exclusive .
u
‘f,’ All animal creatures gain the charge +1 trait.
'r .
F "W/hen roused to action, (In: animals
become a single overwhelming force.”
ih .‘ I

or

‘ \‘ . 7‘ 4-1lJ=Burn
‘ 6 ® 11+ =2Bum Dem”

“’.»‘\ clussir nf fin‘ flu‘/I.\1 Effccriu‘ mugc. Imnf—I1mi/lg
Ilmmlgr, ¢'\Icn.'im'i Ivuru. A well Iinzul firulvu/I um In‘
the dtffrrcr/u" lvetwcm vitmn‘ and dcfcat. “
~ .\I.1,~'u'rv a/'[~'ir¢‘: A Primcr a/'Sp:*/Is

Far from perfect, but I'll take what I can get. Anyhow, the problem arises when the text-grabbing software sometimes returns some Unicode characters. Apparently that messes up my filetextobj.writeline (line 29), even though I've formatted the notepad file as Unicode, and it crashes my program with error 800A0005: Invalid procedure call or argument. I'm 90% sure it's the Unicode because I've seen similar issues with it from other people online, and when I use the program to paste normal human words, it works fine.
How can I get my script to not crash and to paste the messy unicode into the text file? Any ideas?

leftist heap
Feb 28, 2013

Fun Shoe

aerique posted:

Get really good at a single high-level language which has at least some library support for 'modern' developments (JSON, Ajax). This way you can ignore the plethora of new frameworks, ideas and languages that crop up over the years and just focus on your projects.

Not saying that there won't be good stuff among those new things, but you can always pick it up once it has proven itself.

Well I already have almost 10 years of experience in Java, which I think counts as a high level language. Generally looking for something else for my side projects. That's where the waffling begins. Scala? Clojure? Python? CL?

Hornet
Nov 13, 2007
Gamer
Hello! I have a question regarding setting up custom context menu options, and more specifically running them as administrator in Windows 7.

I have a batch file that I run to turn Windows Aero on and off, but to do so, it has to be run as administrator. I created a desktop context menu option in the registry editor that will run the batch file, but it will not work without running as administrator.

Is there a way to make the batch file run as administrator through the context menu?

I hope that question makes sense (and I hope this is the right place to ask it).

MrMoo
Sep 14, 2000

Shortcuts can be defined to run as Administrator, batch files can be written to ask for elevation.

onemanlan
Oct 4, 2006
I'm having trouble recalling which command in Linux (Fedora, if it matters) selects a column of data. I have a compressed FASTA-formatted file, and I need the 6th column of data counted for unique lines.

Accessing the file's location and trimming it otherwise along with unique line counting is fine, but I cannot recall how to select a column of data.

Mario
Oct 29, 2006
It's-a-me!

onemanlan posted:

I'm having trouble recalling which command in Linux (Fedora, if it matters) selects a column of data. I have a compressed FASTA-formatted file, and I need the 6th column of data counted for unique lines.

Accessing the file's location and trimming it otherwise along with unique line counting is fine, but I cannot recall how to select a column of data.

Cut should do the trick:

Tab delimiters:
code:
cut -f 6 filename | uniq | wc -l
Space delimiters:
code:
cut -f 6 -d " " filename | uniq | wc -l

onemanlan
Oct 4, 2006
Thank you very much!

Hornet
Nov 13, 2007
Gamer

MrMoo posted:

Shortcuts can be defined to run as Administrator, batch files can be written to ask for elevation.

You know, I searched around for a while for how to do this, but I never thought of searching "ask for elevation". After I did that, I figured it out.

Thanks.

Lysidas
Jul 26, 2002

John Diefenbaker is a madman who thinks he's John Diefenbaker.
Pillbug
:raise: FASTA files don't have columns.

Pollyanna
Mar 5, 2005

Milk's on them.


Maybe he's thinking of FASTQ files, which have headers with tab-delimited (or space-delimited, I forget) information for each sequence. Or possibly a BAM file, or something.

I still don't like using the Unix terminal to grab information from those kindsa files, it always seems like less of a pain to just use Python or something.

aerique
Jul 16, 2008

rrrrrrrrrrrt posted:

Well I already have almost 10 years of experience in Java, which I think counts as a high level language. Generally looking for something else for my side projects. That's where the waffling begins. Scala? Clojure? Python? CL?

Right, I didn't get from your previous post that you're actually looking for something new to work in. I thought you got lost in the frameworks and languages forest and never got to work on a project.

There isn't really a good answer I think, just opinions.

CL is the answer.

onemanlan
Oct 4, 2006
Ah yeah FASTA files are part of another problem. Currently working with a tab delimited file for this human proteome search problem.

Mario posted:

Cut should do the trick:

Tab delimiters:
code:
cut -f 6 filename | uniq | wc -l
Space delimiters:
code:
cut -f 6 -d " " filename | uniq | wc -l

Ok, here is the problem:

I need to download the human proteome and search it for unique protein sequences with a single line of bash, using as many commands and/or pipes as need be. I have the file downloaded as a .tsv.gz file. Here is what I have:


code:
zcat 9606.tsv.gz | tail -n+4 | cut -f 6 | uniq -cu | wc -l >humanprotein.sh
nano humanprotein.sh
[within humanprotein.sh] echo "#" [exit .sh file]
./humanprotein.sh
#
Alternatively I think I can just use
code:
nano humanprotein.sh
	zcat 9606.tsv.gz | tail -n+4 | cut -f 6 | uniq -cu | wc -l  [save & exit]
./humanprotein.sh
#
Could be hellishly redundant, but I feel like I'm missing something big as it's spitting out the # of 70,000 or so, where there are far fewer within the human genome. Somehow I'm overlooking something. Am I mistaking the .sh file for a line of bash? Is the command line that I fed into 'humanprotein.sh' considered bash? A bit lost, but having fun.

Can't I just make the 'humanprotein.sh' file, then add the lines I listed above into the .sh file and have it execute and read out a number instead?

onemanlan fucked around with this message at 14:54 on Feb 20, 2014

Kilson
Jan 16, 2003

I EAT LITTLE CHILDREN FOR BREAKFAST !!11!!1!!!!111!

onemanlan posted:

Alternatively I think I can just use
code:
nano humanprotein.sh
	zcat 9606.tsv.gz | tail -n+4 | cut -f 6 | uniq -cu | wc -l  [save & exit]
./humanprotein.sh
#
Could be hellishly redundant, but I feel like I'm missing something big as it's spitting out the # of 70,000 or so, where there are far fewer within the human genome. Somehow I'm overlooking something. Am I mistaking the .sh file for a line of bash? Is the command line that I fed into 'humanprotein.sh' considered bash? A bit lost, but having fun.

Can't I just make the 'humanprotein.sh' file, then add the lines I listed above into the .sh file and have it execute and read out a number instead?

If the file is a .gz file then it's probably compressed and you need to decompress it before you run your script.

Edit: never mind, missed that you used zcat

Computer viking
May 30, 2011
Now with less breakage.

First, to fix your bug: put a sort in there before the uniq. Uniq is a fairly stupid tool, and only works with sorted input.
code:
genlab4: /tmp > zcat 9606.tsv.gz | tail -n+4 | cut -f6 | uniq | wc -l
75131
genlab4: /tmp > zcat 9606.tsv.gz | tail -n+4 | cut -f6 | sort | uniq | wc -l
5523
genlab4: /tmp > zcat 9606.tsv.gz | tail -n+4 | cut -f6 | sort | uniq -u | wc -l
899
BTW, you are counting "IDs that only occur once", not "number of different IDs". (The -u flag to uniq means "drop things that occur more than once"). That's fine if that was what you meant to do. :)
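If it helps to see why the sort matters, here is a toy model in Python of what uniq does. The functions `uniq` and `uniq_u` and the sample IDs are invented for illustration (this is not the real tool): it only ever compares a line with its immediate neighbours.

```python
from itertools import groupby


def uniq(lines):
    """Collapse runs of adjacent equal lines, like plain `uniq`."""
    return [key for key, _run in groupby(lines)]


def uniq_u(lines):
    """Keep only lines whose run has length 1, like `uniq -u`."""
    return [key for key, run in groupby(lines) if len(list(run)) == 1]


ids = ["P1", "P2", "P1", "P1", "P3", "P2"]

print(len(uniq(ids)))            # unsorted: repeats that are not adjacent survive
print(len(uniq(sorted(ids))))    # sorted: number of different IDs
print(len(uniq_u(sorted(ids))))  # sorted, -u: IDs that occur exactly once
```

The three prints mirror the three pipelines above: too many, the number of different IDs, and the number of IDs seen only once.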

The entire line from zcat to wc is valid sh (and bash, and probably most shell) code, so you can either run it on the command line, or put it in a .sh file and execute that.

If you put it in a .sh file, you could easily make it take the filename as an argument instead of hard-coding it. Say you have a file test.sh that contains "echo $3 $2 $1", and run it as "./test.sh one two three": It will print "three two one". Given that you should be able to write something that would let you do "./humangenome.sh 9606.tsv.gz" .

Oh, and a personal thing: Using + with tail doesn't work everywhere; it fails on my FreeBSD machines (since -n+ is a GNU extension and not implemented in BSD tail). A more robust solution is to use grep -v '^#', which drops all lines that start with a #. (For grep, -v means "everything except", and ^ is interpreted as meaning "the beginning of the line" ... so it reads as "everything except lines that start with #"). If you specifically need to delete the first 4 lines, you could also use "sed 1,4d", which I think should work everywhere. (Note that tail -n+4 starts printing at line 4, so it only drops the first 3 lines; the sed equivalent of that is "sed 1,3d".)
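Since Python came up earlier as the less painful option, here is a hedged sketch of the same count as a script that takes the filename as an argument. The names `count_column` and `summarize` are made up, it assumes a tab-separated, gzip-compressed text file with three header lines (matching tail -n+4), and unlike cut it simply skips lines that have fewer than six fields.

```python
import gzip
import sys
from collections import Counter


def count_column(lines, column=6, skip_lines=3):
    """Tally the values in a 1-based tab-separated column, skipping header lines."""
    tally = Counter()
    for line_number, line in enumerate(lines, start=1):
        if line_number <= skip_lines:
            continue  # same effect as `tail -n+4`: output starts at line 4
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= column:
            tally[fields[column - 1]] += 1
    return tally


def summarize(path):
    """Return (number of different values, number of values seen exactly once)."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        tally = count_column(handle)
    singletons = sum(1 for count in tally.values() if count == 1)
    return len(tally), singletons


if __name__ == "__main__" and len(sys.argv) > 1:
    distinct, singletons = summarize(sys.argv[1])
    print(f"{distinct} different values, {singletons} occurring exactly once")
```

Saved as, say, humanprotein.py (a hypothetical name), it would be run as "python humanprotein.py 9606.tsv.gz". No sort is needed because the tally is keyed by value rather than by adjacency.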

Computer viking fucked around with this message at 16:20 on Feb 20, 2014

omeg
Sep 3, 2012

I have a 32bpp bitmap of given size and a sorted list of memory pages (of constant size) that changed within. I need to convert that page list into a minimal set of non-overlapping rectangles that cover all changed pixels. Any fast algorithm that can do that?

E: I guess just a simple for loop will do :v:
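One reading of "a simple for loop", sketched in Python under loud assumptions: pages are 4096 bytes, the bitmap starts at byte 0 of page 0, rows are tightly packed with no padding, and each rectangle is a full-width band of rows. The function name `pages_to_rects` is made up, and full-width bands cover every changed pixel without overlapping but are not guaranteed to be minimal in area.

```python
def pages_to_rects(pages, width, height, page_size=4096, bytes_per_pixel=4):
    """Turn a sorted list of dirty page indices into (x, y, w, h) row bands."""
    stride = width * bytes_per_pixel  # bytes per bitmap row (assumes no padding)
    rects = []
    for page in pages:
        first_row = (page * page_size) // stride
        if first_row >= height:
            break  # this page and all later ones lie past the end of the bitmap
        last_row = min(((page + 1) * page_size - 1) // stride, height - 1)
        if rects and first_row <= rects[-1][1] + rects[-1][3]:
            # Touches or overlaps the previous band: grow it instead of adding one.
            x, y, w, h = rects[-1]
            rects[-1] = (x, y, w, max(h, last_row - y + 1))
        else:
            rects.append((0, first_row, width, last_row - first_row + 1))
    return rects


# 1024 pixels * 4 bytes = 4096 bytes per row, so each page is exactly one row here.
print(pages_to_rects([0, 1, 5], width=1024, height=8))
```

Because the page list is sorted, a single pass with a "merge into the previous band" check is enough; narrower rectangles would need per-row byte offsets as well.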

omeg fucked around with this message at 18:33 on Feb 20, 2014

JetsGuy
Sep 17, 2003

science + hockey
=
LASER SKATES
I've been banging my head against what I'm sure is a stupidly easy problem. I'm sorry if this is a bad place for this, I couldn't find a MATLAB specific thread.

I'm writing a GUI in MATLAB where a user can select a file from a listbox in matlab and then there are several fields and groups of checkboxes that display to the user what data is in that file and other information.

I know I can make it so that as soon as the user edits any of these fields, I can use a get(hObject, 'Value') command in the callback to pull in the new value. This is an ok solution, but not the one I want.

I'd like for the GUI to understand when any of the fields or check boxes within the panel have been changed off their initial values and prevent the user from selecting a new file from the listbox until they confirm that the changes are what they want. I read about uiwait() but that seems like it'd be *too* restrictive. That is, if a user edits one field, I want them to be able to edit them all before committing the changes.

I guess part of the reason I've had trouble finding something is I don't really know how to articulate "partial wait" or some such in my searches.

Thanks a bunch y'all!

SurgicalOntologist
Jun 17, 2004

JetsGuy posted:

I'm writing a GUI in MATLAB

I'm sorry to hear.

JetsGuy posted:

I'd like for the GUI to understand when any of the fields or check boxes within the panel have been changed off their initial values and prevent the user from selecting a new file from the listbox until they confirm that the changes are what they want. I read about uiwait() but that seems like it'd be *too* restrictive. That is, if a user edits one field, I want them to be able to edit them all before committing the changes.

Is this what you want? Do this in your callback that runs when a selection is made:

code:
set(h, 'Enable', 'off')
where h is the handle of the listbox. Then in the callback that runs after changes are committed:

code:
set(h, 'Enable', 'on')
Note that how you get a handle h into a different callback depends on how you organize your code, whether you're using GUIDE, etc.

SurgicalOntologist fucked around with this message at 01:51 on Feb 21, 2014

JetsGuy
Sep 17, 2003

science + hockey
=
LASER SKATES

SurgicalOntologist posted:

I'm sorry to hear.

Yah. I know. Work. :smith:

SurgicalOntologist posted:

Is this what you want? Do this in your callback that runs when a selection is made:

code:
set(h, 'Enable', 'off')
where h is the handle of the listbox. Then in the callback that runs after changes are committed:

code:
set(h, 'Enable', 'on')
Note that how you get a handle h into a different callback depends on how you organize your code, whether you're using GUIDE, etc.

This looks like it'll work! Thanks! I'm new to MATLAB so sorry if this was an easy one (yeah I'm using GUIDE).

SurgicalOntologist
Jun 17, 2004

No problem, and no it's not easy. I figured it all out by trial and error, over a couple years of Stockholm Syndrome making Matlab GUIs.

Once you have access to all your handles, everything sort of falls into place. The doc page that lists the get/set attributes of each object is your best friend.

Pollyanna
Mar 5, 2005

Milk's on them.


JetsGuy posted:

I'm writing a GUI in MATLAB

Just letting you know that I had to scale down my portion of my senior year project precisely because of how loving horrible it is to make a GUI in MATLAB. I feel your pain. :(

fart simpson
Jul 2, 2005

DEATH TO AMERICA
:xickos:

I took a MATLAB class in college. Why would you ever need to do a GUI in it? That sounds awful.

Modern Pragmatist
Aug 20, 2008

MeramJert posted:

I took a MATLAB class in college. Why would you ever need to do a GUI in it? That sounds awful.

In the scientific community a lot of people only ever learned Matlab (albeit poorly) and therefore don't know the difference between that and a real programming language. As a result, they use it for everything. :ughh:

I have seen an encouraging trend lately of Python being taught over Matlab in introductory science / engineering courses. (Not like Python has a good GUI library either...)

ufarn
May 30, 2009
I am stumped by a GET URL scheme that keeps being incredibly convoluted.

I am doing it in Django, and it currently looks like this:

code:
/post/3/spell-checkers/manage/16/
In other words

code:
{{ POST_OBJECT }}/{{ POST_ID }}/{{ USER_PERMISSION }}/manage/{{ USER_ID }}/
code:
{{ POST_OBJECT }}
    /{{ POST_ID }}
        /{{ USER_PERMISSION }}
            /{{ VERB }}
                /{{ USER_ID }}
The POST request allows - or disallows - people to edit a post by appointing them spell-checkers of that post.

I abandoned this project a while ago, probably because I had something as abstruse as this at the time. Given how long it's been since I was in the ditches with the project, my prospects of figuring it out probably haven't improved since. The idea was to set up user permissions/roles for post objects.

This looks absolutely insane, and there is no way I am going to do it like this, but what would be a good way to do this?

ufarn fucked around with this message at 15:29 on Feb 21, 2014

JetsGuy
Sep 17, 2003

science + hockey
=
LASER SKATES

SurgicalOntologist posted:

No problem, and no it's not easy. I figured it all out by trial and error, over a couple years of Stockholm Syndrome making Matlab GUIs.

Once you have access to all your handles, everything sort of falls into place. The doc page that lists the get/set attributes of each object is your best friend.

Yeah, I've found that MATLAB isn't too too bad once you get the feel for the handles.

Although after dealing with the horrors that are IDL GUIs maybe I'm just biased.

Pollyanna posted:

Just letting you know that I had to scale down my portion of my senior year project precisely because of how loving horrible it is to make a GUI in MATLAB. I feel your pain. :(

It's bad, but remarkably better than when I had to do a python GUI a few years ago and it was on and for Apple machines only. Apparently, all the good libraries don't have full functionality on the Apple side at all for some dumb reason.

MeramJert posted:

I took a MATLAB class in college. Why would you ever need to do a GUI in it? That sounds awful.

You do what the job and your coworkers want my friend. All the machines in question have MATLAB. Not all the machines have python. I don't know any cpp, so MATLAB it is! :/

Modern Pragmatist posted:

In the scientific community a lot of people only ever learned Matlab (albeit poorly) and therefore don't know the difference between that and a real programming language. As a result, they use it for everything. :ughh:

I have seen an encouraging trend lately of Python being taught over Matlab in introductory science / engineering courses. (Not like Python has a good GUI library either...)

I did python largely all through grad school and as a postdoc. At best, I'd say I have a "scientific" level of coding ability in it. I also had to write some things in IDL (shudder) and edit some code in FORTRAN90 ( :suicide: ).

MATLAB I have to learn for the job I am at. Not my favorite language, but as you say, it's not like Python has any better in the way of GUI libraries.

Don't even get me started on how terrible IDL is overall.

JetsGuy fucked around with this message at 16:21 on Feb 21, 2014

JawnV6
Jul 4, 2004

So hot ...
Matlab has hooks for C#, right? I'm writing a lot of GUIs for nontechnical folks and C# has been a godsend.

Easy charting and i/o, nice hooks into the various DAQ and prototyping modules around.

SurgicalOntologist
Jun 17, 2004

I think if you're looking into extending Matlab, Java is the way to go: since the whole Matlab application is built in Java, you can just straight up make Java calls.

More here:
http://undocumentedmatlab.com/

JetsGuy
Sep 17, 2003

science + hockey
=
LASER SKATES
by the way, the Enable/Disable worked exactly as I wanted. Thanks again, SurgicalOntologist.

onemanlan
Oct 4, 2006

Computer viking posted:

BTW, you are counting "IDs that only occur once", not "number of different IDs". (The -u flag to uniq means "drop things that occur more than once"). That's fine if that was what you meant to do. :)

The entire line from zcat to wc is valid sh (and bash, and probably most shell) code, so you can either run it on the command line, or put it in a .sh file and execute that.

If you put it in a .sh file, you could easily make it take the filename as an argument instead of hard-coding it. Say you have a file test.sh that contains "echo $3 $2 $1", and run it as "./test.sh one two three": It will print "three two one". Given that you should be able to write something that would let you do "./humangenome.sh 9606.tsv.gz" .

Oh, and a personal thing: Using + with tail doesn't work everywhere; it fails on my FreeBSD machines (since -n+ is a GNU extension and not implemented in BSD tail). A more robust solution is to use grep -v '^#', which drops all lines that start with a #. (For grep, -v means "everything except", and ^ is interpreted as meaning "the beginning of the line" ... so it reads as "everything except lines that start with #"). If you specifically need to delete the first 4 lines, you could also use "sed 1,4d", which I think should work everywhere. (Note that tail -n+4 starts printing at line 4, so it only drops the first 3 lines; the sed equivalent of that is "sed 1,3d".)

You're a boss. I was able to make the file you suggested and was able to learn a bit more about it. Pretty neat, but drat this mess could get complex. Going to be back here for a few more related questions to Linux bash commands soon.

onemanlan fucked around with this message at 22:59 on Feb 21, 2014
