 
  • Locked thread
qntm
Jun 17, 2009

Bloody posted:

idk why you'd ever use git from a command line

every Git GUI I've ever used eventually gives up and tells you to run certain commands manually, or just gives up entirely


pokeyman
Nov 26, 2006

That elephant ate my entire platoon.

HoboMan posted:

git Suits My Needs and is super good as long as you remember what the command is. i have to google how to remove a tag every time

i once found a tag called "rm". still makes me chuckle sadly

jony neuemonic
Nov 13, 2009

Power Ambient posted:

for me its because every gui has loving sucked also i have a sweet mechanical kb so typing is very good

same and same.

somehow i've become the resident git expert at work. i think it's because i know how to rebase.

Bloody
Mar 3, 2013

qntm posted:

every Git GUI I've ever used eventually gives up and tells you to run certain commands manually, or just gives up entirely

stop breaking poo poo so badly then i guess

Luigi Thirty
Apr 30, 2006

Emergency confection port.

I use sourcetree and it is needs-suiting

DONT THREAD ON ME
Oct 1, 2002

by Nyc_Tattoo
Floss Finder

GameCube posted:

lol this might be it. God dammit

lol plz keep us updated

i assumed that your thing was different because my thing happened in ruby which is prone to errors like that

DONT THREAD ON ME
Oct 1, 2002

by Nyc_Tattoo
Floss Finder
also the go 'debugger' repl is so lol

Sapozhnik
Jan 2, 2005

Nap Ghost

GameCube posted:

lol this might be it. God dammit

lol this is shameful

wouldn't it be great if http actually already had a dedicated status code for a uri that's too long? no, surely the protocol's designers would never think of doing that.

HoboMan
Nov 4, 2010

wait, this is a thing? how long is "too long"? this might gently caress me down the road

netcat
Apr 29, 2008

jony neuemonic posted:

same and same.

somehow i've become the resident git expert at work. i think it's because i know how to rebase.

lol same. I also know about the awesome power of "git reflog" so I can magically restore everyone's broken branches when they inevitably gently caress up a rebase.

abraham linksys
Sep 6, 2010

:darksouls:
afaik there's no hard-defined url length limit in clients or servers, or all those sites that work by base64ing user-generated content in a query parameter wouldn't work? it's just something you have to configure on your server end (nginx: http://nginx.org/en/docs/http/ngx_http_core_module.html#large_client_header_buffers, gunicorn: http://docs.gunicorn.org/en/latest/settings.html?highlight=limit_request_line#limit-request-line)

MononcQc
May 29, 2007

Mr Dog posted:

lol this is shameful

wouldn't it be great if http actually already had a dedicated status code for a uri that's too long? no, surely the protocol's designers would never think of doing that.

status 414 only helps if the server complains; it doesn't help if (as probably happened here) the client silently truncates the URL before sending it, in which case the server correctly returns a 404 even though it could have supported the longer URL.

brap
Aug 23, 2004

Grimey Drawer
sourcetree is good for git

if im doing anything besides git add, git commit, git push, i usually do it in sourcetree

Zemyla posted:

Why is using snapshots instead of changesets a good idea?

it's a lot faster to do blames and poo poo

brap fucked around with this message at 17:11 on Jul 13, 2016

vodkat
Jun 30, 2012



cannot legally be sold as vodka
Hey I'm trying to work with matching names between two quite large databases and I was wondering if anyone here had some tips.

Firstly, are there any packages for python that will make this sort of thing more painless? for removing all of the edge case prefixes and postfixes that people love to enter for no reason. And secondly, what's the best way to handle slight differences in names between the databases, for example inconsistent use of middle names, last names/first names etc? I've seen some stack exchange answers suggesting fuzzy matching them but I'm not sure what the best way to implement this is.

It seems like this would be the sort of thing that people must run into all the time, but as a p. lovely programmer I'm not really sure what I should be doing.

Bloody
Mar 3, 2013

levenshtein distance can be a decent metric for fuzzy string matching
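For reference, here's a minimal pure-Python sketch of the metric Bloody means: the number of single-character insertions, deletions, and substitutions needed to turn one string into the other. This is illustrative, not any particular library's implementation.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b via the classic two-row DP."""
    # prev[j] holds the distance from the current prefix of a to b[:j]
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute ca -> cb
        prev = curr
    return prev[-1]
```

So `levenshtein("kitten", "sitting")` is 3, and `"Mr. Bob Smith"` vs `"Bob Smith"` is 4 (the four prefix characters), which matters for the threshold discussion later in the thread.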

Shaman Linavi
Apr 3, 2012

i used fuzzywuzzy for one little project and it seemed to work ok

Deep Dish Fuckfest
Sep 6, 2006

Advanced
Computer Touching


Toilet Rascal

vodkat posted:

Hey I'm trying to work with matching names between two quite large databases and I was wondering if anyone here had some tips.

Firstly, are there any packages for python that will make this sort of thing more painless? for removing all of the edge case prefixes and postfixes that people love to enter for no reason. And secondly, what's the best way to handle slight differences in names between the databases, for example inconsistent use of middle names, last names/first names etc? I've seen some stack exchange answers suggesting fuzzy matching them but I'm not sure what the best way to implement this is.

It seems like this would be the sort of thing that people must run into all the time, but as a p. lovely programmer I'm not really sure what I should be doing.

there really isn't a one size fits all solution

cleaning and homogenizing data from different sources is always a huge pain. the "best" solution usually depends on what kind of errors are ok for whatever you're doing. in some cases false matches have to be avoided at all costs, so you just keep perfect matches and discard everything else. in other cases you know that one db has a tendency to have some prefixes or suffixes on names, so you just build up a list of the most common ones, do a first filter pass to remove those, and then do a perfect match between the result and the other db. or if false matches are ok with you, then yeah, doing some fuzzy matching between the dbs and not giving much of a gently caress about it beyond that can work

there's also the issue of what "quite large database" means, because dealing with a few gigabytes versus something in the hundreds of terabytes range requires different approaches. if you're dealing with names though i'm guessing it's probably the former
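The strip-prefixes-then-exact-match pass described above can be sketched like this. The prefix list, field layout, and function names are invented for illustration; a real pass would build the prefix list from what's actually in the data.

```python
# Hypothetical prefix list -- build yours from the most common junk in your db.
PREFIXES = ("mr. ", "mrs. ", "ms. ", "dr. ", "prof. ")

def normalize(name: str) -> str:
    """Lowercase, collapse whitespace, and strip one known prefix."""
    n = " ".join(name.lower().split())
    for p in PREFIXES:
        if n.startswith(p):
            n = n[len(p):]
            break
    return n

def exact_matches(db_a, db_b):
    """Map each name in db_a to a db_b name whose normalized form matches."""
    index = {normalize(n): n for n in db_b}
    return {a: index[normalize(a)] for a in db_a if normalize(a) in index}
```

Anything this pass doesn't catch is what you'd then feed to the fuzzy matcher, so the fuzzy step only has to deal with genuine spelling differences.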

fritz
Jul 26, 2003

qntm posted:


I like that creating a branch is git checkout -b not e.g. git create branch

git branch branchname
?

fritz
Jul 26, 2003

my stepdads beer posted:

i have to google how to unstage every time, or view staged diffs

"git status" tells you how to unstage:

code:
Your branch is up-to-date with '...'.
Changes to be committed:
  (use "git reset HEAD <file>..." to unstage)
....
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working directory)
....
Untracked files:
  (use "git add <file>..." to include in what will be committed)

qntm
Jun 17, 2009

fritz posted:

git branch branchname
?

okay so now the question is why there are two commands which do the same thing and every tutorial recommends the stupidly-named one

vodkat
Jun 30, 2012



cannot legally be sold as vodka

YeOldeButchere posted:

there really isn't a one size fits all solution

cleaning and homogenizing data from different sources is always a huge pain. the "best" solution usually depends on what kind of errors are ok for whatever you're doing. in some cases false matches have to be avoided at all costs, so you just keep perfect matches and discard everything else. in other cases you know that one db has a tendency to have some prefixes or suffixes on names, so you just build up a list of the most common ones, do a first filter pass to remove those, and then do a perfect match between the result and the other db. or if false matches are ok with you, then yeah, doing some fuzzy matching between the dbs and not giving much of a gently caress about it beyond that can work

there's also the issue of what "quite large database" means, because dealing with a few gigabytes versus something in the hundreds of terabytes range requires different approaches. if you're dealing with names though i'm guessing it's probably the former

The database is just short of a gig which I guess is pretty small fry for most of the people here but as an academic and very lovely programmer it's starting to test my knowledge and abilities quite a bit.

Having looked at fuzzywuzzy it seems like that might be what I need to use but how do you define when a match is good enough? is it a matter of simply plugging in a number and seeing what the result is or is there a better way than trial and error testing?

NihilCredo
Jun 6, 2011

iram omni possibili modo preme:
plus una illa te diffamabit, quam multæ virtutes commendabunt

why is HEAD always written in all caps, and please tell me it's case sensitive because that would be the most unixy thing ever

Wheany
Mar 17, 2006

Spinyahahahahahahahahahahahaha!

Doctor Rope

qntm posted:

okay so now the question is why there are two commands which do the same thing and every tutorial recommends the stupidly-named one

because gently caress you, that's why

NihilCredo posted:

why is HEAD always written in all caps, and please tell me it's case sensitive because that would be the most unixy thing ever

because gently caress you, that's why

Shaman Linavi
Apr 3, 2012

vodkat posted:

Having looked at fuzzywuzzy it seems like that might be what I need use but how do you define when a match is good enough? is it a matter of simply plugging in number and seeing what the result is or is there a better way than trial and error testing?

that is exactly how i used it to check if user input is in a list.
if the input isnt in the list i have fuzzywuzzy check the input against the list and rip out strings over a certain value.
i just fudged around with the value until common spelling errors were giving back what i thought they should from the list.
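Shaman Linavi's pattern, sketched with the stdlib's difflib as a stand-in for fuzzywuzzy (same idea: score candidates against the list and keep anything over a fudged cutoff). The function name and cutoff value are made up for illustration.

```python
import difflib

def closest(user_input: str, known: list[str], cutoff: float = 0.8):
    """Return the best known match for user_input, or None below the cutoff."""
    if user_input in known:  # exact hit, no fuzzy work needed
        return user_input
    hits = difflib.get_close_matches(user_input, known, n=1, cutoff=cutoff)
    return hits[0] if hits else None
```

With fuzzywuzzy itself the equivalent would be something like `process.extractOne(user_input, known, score_cutoff=...)`, which similarly returns nothing when no candidate clears the cutoff; either way the cutoff gets tuned by fudging, exactly as described above.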

VikingofRock
Aug 24, 2008




qntm posted:

okay so now the question is why there are two commands which do the same thing and every tutorial recommends the stupidly-named one

Those commands don't actually do the same thing. git branch foo creates a branch called foo but doesn't switch to it, whereas git checkout -b foo creates a branch called foo and switches to that branch.

qntm
Jun 17, 2009

VikingofRock posted:

Those commands don't actually do the same thing. git branch foo creates a branch called foo but doesn't switch to it, whereas git checkout -b foo creates a branch called foo and switches to that branch.

then there should be a variant of git branch which also switches to the newly-created branch, putting it on git checkout makes no sense

VikingofRock
Aug 24, 2008




Presumably the tutorials use git checkout -b foo because it's one less command you have to type, and git made the combo command "git checkout -b" instead of "git branch -c" because they thought the checkout was the more important half of the operation. Or maybe they just want checkout to do literally everything.

disclaimer: git branch -c or the like might be a thing, but I'm on my phone so I can't check

The MUMPSorceress
Jan 6, 2012


^SHTPSTS

Gary’s Answer
all of this git talk is really confusing to me because i think we have our own tooling on top of whatever svn already does. when i make a "branch", i get a branch on the server and then all of that is checked out into a folder named after the branch on my computer and that's where i do my work. then i commit that to the branch on the server. when it's ready to go to trunk, our internal tool merges my branch into a local copy of trunk on my computer. then i commit that to trunk on the server.

how does that translate to gitspeak?

HoboMan
Nov 4, 2010

my git workflow
code:
# git status
# git commit -a -m "poo poo is probably less broken now"
# git status
# git tag v69(r219)
# git log
# git push --tags

Deep Dish Fuckfest
Sep 6, 2006

Advanced
Computer Touching


Toilet Rascal

vodkat posted:

The database is just short of a gig which I guess is pretty small fry for most of the people here but as an academic and very lovely programmer it's starting to test my knowledge and abilities quite a bit.

Having looked at fuzzywuzzy it seems like that might be what I need to use but how do you define when a match is good enough? is it a matter of simply plugging in a number and seeing what the result is or is there a better way than trial and error testing?

since it fits in memory then you can do more or less whatever you want with it, so that's good

if you do go with fuzzy matching stuff, then yeah, there's no way around the fact that you'll have to define some arbitrary threshold as to what constitutes a match and what doesn't. most of the time it gets chosen through a very technical empirical process called "loving around with it until it looks good enough". i mean, doing that usually means writing a bit of code to figure out basic stats like how many matches a given threshold gives you or how many rows in one db match to more than one row in the other (which will skyrocket if your threshold is too lenient), but that sort of stuff is exactly why cleaning up data always sucks

you should try to do as much processing to make things homogeneous before you try the fuzzy matching, though. the stuff like removing common prefixes or suffixes should be easy enough to do if you have a lot of that, and it will help with the fuzzy matching afterwards. for example if you're using edit distance (the levenshtein distance bloody mentioned earlier), then "Mr. Bob Smith" with "Mr. " removed would match right away with "Bob Smith" instead of requiring a threshold of 4 to account for the deletion of the prefix, which would also make it match with "Mr. Bob Smithwick" (4 character insertions) or "Mr. John Smith" (3 character replacements and 1 insertion) neither of which are what you want
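The "write a bit of code to figure out basic stats" step above can be sketched like this: for each candidate threshold, count how many rows in one db get a match and how many get an ambiguous (more than one) match. This uses the stdlib's difflib ratio (a 0..1 similarity) rather than raw edit distance, and all the names are invented.

```python
import difflib

def similarity(a: str, b: str) -> float:
    """0..1 similarity score; higher means more alike."""
    return difflib.SequenceMatcher(None, a, b).ratio()

def threshold_stats(db_a, db_b, thresholds):
    """For each threshold, return (rows matched, rows matched ambiguously)."""
    stats = {}
    for t in thresholds:
        matched = ambiguous = 0
        for a in db_a:
            hits = [b for b in db_b if similarity(a, b) >= t]
            matched += bool(hits)
            ambiguous += len(hits) > 1
        stats[t] = (matched, ambiguous)
    return stats
```

Watching the ambiguous count climb as you loosen the threshold is exactly the "skyrockets if too lenient" signal described above.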

Sapozhnik
Jan 2, 2005

Nap Ghost

Progressive JPEG posted:

lol if every commit isn't just swearing with increasing intensity

Bloody
Mar 3, 2013

YeOldeButchere posted:

since it fits in memory then you can do more or less whatever you want with it, so that's good

if you do go with fuzzy matching stuff, then yeah, there's no way around the fact that you'll have to define some arbitrary threshold as to what constitutes a match and what doesn't. most of the time it gets chosen through a very technical empirical process called "loving around with it until it looks good enough". i mean, doing that usually means writing a bit of code to figure out basic stats like how many matches a given threshold gives you or how many rows in one db match to more than one row in the other (which will skyrocket if your threshold is too lenient), but that sort of stuff is exactly why cleaning up data always sucks

you should try to do as much processing to make things homogeneous before you try the fuzzy matching, though. the stuff like removing common prefixes or suffixes should be easy enough to do if you have a lot of that, and it will help with the fuzzy matching afterwards. for example if you're using edit distance (the levenshtein distance bloody mentioned earlier), then "Mr. Bob Smith" with "Mr. " removed would match right away with "Bob Smith" instead of requiring a threshold of 4 to account for the deletion of the prefix, which would also make it match with "Mr. Bob Smithwick" (4 character insertions) or "Mr. John Smith" (3 character replacements and 1 insertion) neither of which are what you want

set a threshold by scoring a ton of poo poo against random strings, calculate the standard deviation, and multiply by three
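Bloody's heuristic, sketched: score real names against random strings to estimate the similarity metric's "noise floor", then put the threshold three standard deviations above its mean. Everything here (the scorer, lengths, counts) is illustrative, not a recipe.

```python
import difflib
import random
import statistics
import string

def similarity(a: str, b: str) -> float:
    """0..1 similarity score via stdlib difflib."""
    return difflib.SequenceMatcher(None, a, b).ratio()

def noise_threshold(names, n_random=1000, length=10, seed=0):
    """Mean noise similarity plus three standard deviations."""
    rng = random.Random(seed)
    scores = [
        similarity(rng.choice(names),
                   "".join(rng.choices(string.ascii_lowercase, k=length)))
        for _ in range(n_random)
    ]
    return statistics.mean(scores) + 3 * statistics.stdev(scores)
```

The idea is that any pair scoring above this is very unlikely to look that similar by chance, though as HoboMan points out next, how you generate the random strings matters.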

HoboMan
Nov 4, 2010

remember to make sure your random strings have uniform distribution!

Wheany
Mar 17, 2006

Spinyahahahahahahahahahahahaha!

Doctor Rope

i used some semi-inappropriate word as a temporary variable name while i was testing something and then immediately regretted it when i forgot to remove it and committed it. i caught it in review and removed it.

that was the first time i forgot to remove a temporary testing variable like that. it was also the first time i used something that was not just "qqqqq" as the name.

immediately it bit me.

HoboMan
Nov 4, 2010

Wheany posted:

i used some semi-inappropriate word as a temporary variable name while i was testing something and then immediately regretted it when i forgot to remove it and committed it. i caught it in review and removed it.

that was the first time i forgot to remove a temporary testing variable like that. it was also the first time i used something that was not just "qqqqq" as the name.

immediately it bit me.

lol, i came here to post that i just sent a poo and fart filled test framework for code review, woops

Bloody
Mar 3, 2013

HoboMan posted:

remember to make sure your random strings have uniform distribution!

hmm actually shouldn't their distribution match like typical letter distribution?

Luigi Thirty
Apr 30, 2006

Emergency confection port.

that's what they want you to think

HappyHippo
Nov 19, 2003
Do you have an Air Miles Card?

YeOldeButchere posted:

since it fits in memory then you can do more or less whatever you want with it, so that's good

if you do go with fuzzy matching stuff, then yeah, there's no way around the fact that you'll have to define some arbitrary threshold as to what constitutes a match and what doesn't. most of the time it gets chosen through a very technical empirical process called "loving around with it until it looks good enough". i mean, doing that usually means writing a bit of code to figure out basic stats like how many matches a given threshold gives you or how many rows in one db match to more than one row in the other (which will skyrocket if your threshold is too lenient), but that sort of stuff is exactly why cleaning up data always sucks

you should try to do as much processing to make things homogeneous before you try the fuzzy matching, though. the stuff like removing common prefixes or suffixes should be easy enough to do if you have a lot of that, and it will help with the fuzzy matching afterwards. for example if you're using edit distance (the levenshtein distance bloody mentioned earlier), then "Mr. Bob Smith" with "Mr. " removed would match right away with "Bob Smith" instead of requiring a threshold of 4 to account for the deletion of the prefix, which would also make it match with "Mr. Bob Smithwick" (4 character insertions) or "Mr. John Smith" (3 character replacements and 1 insertion) neither of which are what you want

also if the number of close matches is small enough you can possibly resolve them manually. assuming this is an operation you only want to do once.

HoboMan
Nov 4, 2010

Bloody posted:

hmm actually shouldn't their distribution match like typical letter distribution?

probably

ok, be sure to find the character probability of your set and then make a distribution skewed to match that probability (including average string length)!


at least i think for the matching problem you want the probability of your set and not the general occurrence probability
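HoboMan's refinement, sketched: draw random strings whose letter frequencies (and average length) mirror the names actually in the set, instead of uniform noise, so the noise-floor estimate isn't unrealistically low. The function name and defaults are invented.

```python
import collections
import random

def skewed_random_strings(names, count, seed=0):
    """Random strings matching the set's letter frequencies and mean length."""
    rng = random.Random(seed)
    freq = collections.Counter(c for name in names for c in name)
    chars = list(freq)
    weights = [freq[c] for c in chars]
    avg_len = round(sum(len(n) for n in names) / len(names))
    return ["".join(rng.choices(chars, weights=weights, k=avg_len))
            for _ in range(count)]
```

Feeding these into the calibration instead of uniform lowercase noise should push the estimated threshold up a bit, since frequency-matched junk looks more like real names.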


Sapozhnik
Jan 2, 2005

Nap Ghost
I'm gonna open source some hobby code I wrote a while back and the poo poo I was writing even five years ago is goddamn embarrassing. And it's all there in the Git history for people to point and laugh at.

At least I'm in the right thread!
