Jo
Jan 24, 2005

:allears:
Soiled Meat
I'm totally stoked because I finally solved an issue that had been plaguing me for probably two weeks.

I had made some changes to a job that ran regularly. Due to a bad merge (or a good merge, but with unexpected integration side-effects), one of the tasks was failing at startup. I dug around for a while, found the issue, and fixed it. Shortly thereafter, a subtask was raised saying the task was running three times. I, having just hosed with the code, figured I'd hosed up. I tried digging around, thinking maybe there was an old copy of the application scheduler sitting around on the machine, but nope. Okay, said I, I'll just roll back my change. Still seeing it. Maybe the integration/import screwed up things? Ugh.

The job was scheduled to run every five minutes, so it was a matter of "make change, recompile, deploy quickly, wait," while juggling other tickets and keeping the higher-ups happy with sprint velocity. "Maybe the task is failing and rescheduling itself? Maybe it's a race condition with two workers grabbing the same task? Maybe it's Maybelline."

After getting nowhere for a week, setting breakpoints and marching up and down the function, I had to concede the stupidly obvious possibility I should have considered from the get-go. Maybe the task and code are fine, but there are multiple copies of the scheduler running. I shut down the scheduler on the machine and wait, letting tail -f dump the log to console.

Two copies scheduled.

Progress!

Now I know there are two machines on the network running the scheduler. How? We've only got maybe three machines under our supervision. The QA environment is fine. Staging is fine. The staging scheduler is fine. I've killed the scheduler on all those boxes, but two tasks are still created every five minutes. I sit, listening with netstat for connections. Unfortunately, the connection is negotiated and closed faster than netstat can pick it up. I finally dump the connections made to the database and see two machines I didn't know about. One of them is a sparkly new development machine that was just given to us. The other is unknown. I ask around the office.

Turns out that someone on another team asked for an instance of our application so he could play with our not-yet-released software. DevOps made a clone of our staging environment and didn't change any of the environment variables, so we had two schedulers pointed to the same workers.

In hindsight, it's a stupidly simple problem I should have solved quickly, had I not been confused by a coincidence in timing. Am I :downs:? Yes, but I'm so glad it's solved.


Jo
Jan 24, 2005

:allears:
Soiled Meat

ChickenWing posted:

Java devs: how do you feel about intellij idea? I'm working with Spring Tool Suite at work (Spring-focused eclipse distro) and I'm interested in seeing what idea has to offer, but I'm having issues finding out how to do all the stuff I'm used to doing in eclipse and I want to know if it's worth it or not.

IntelliJ is brilliant and I will fight anyone who says otherwise. Coming from the world of Eclipse, it's so much faster and less cumbersome, and it seems to integrate better with my custom Gradle files. I can say, "Run this Gradle file and attach the debugger," and it works.

It's not perfect. Their update system is stuck in an older era and loving with language levels is a pain in the rear end (particularly for multi-project builds). All the same, I swear by it.

Jo
Jan 24, 2005

:allears:
Soiled Meat

Khisanth Magus posted:

I hate places where you can't point out the person who actually broke something and tell them to fix their poo poo. At a minimum you should be able to talk directly to the person and tell them that they broke X and need to fix it.

I pushed a change to our QA branch and the database access count went through the roof after a deploy. I said, "Hey X, I think the problem is related to this operation. Could you take a look?" to give him an out in the hopes that he'd recognize his mistake (a missing prefetch) and fix the problem. Instead, he replies all, "Hey, Jo, your code broke this and caused the storm. I fixed it." He removed a non-functional line that I had added and, in his commit, also subtly included a fix for his fuckup.

I still get a bit flustered when I think about it.

Jo
Jan 24, 2005

:allears:
Soiled Meat

I should clarify "non-functional": in this context, I mean a replacement operation that evaluates to the same value.

Django REST Framework supports objects called serializers. Someone had written the following:

code:
class ModelSerializer(blah):
  this_output = serializers.SerializerMethodField("someMethod")

  def someMethod(self, ob):
    return ob.child.value

The equivalent operation is

code:
class ModelSerializer(blah):
  this_output = serializers.CharField(source="child.value")
The second form does the same thing but saves the overhead of a method call on each object; whoever wrote the original likely didn't know about the source argument, or didn't realize it applied. Unit tests, eyeball tests, and theory all show that the two versions return the same values.

Jo
Jan 24, 2005

:allears:
Soiled Meat

IAmKale posted:

Do you guys have any good resources for practicing for technical interviews? I'm trying to make a slight career adjustment away from overly broad "do everything IT-related" into a more backend development-oriented line of work, but I've never had to face interviews more suitable for programming jobs.

I've done a few technical interviews, so if you want I'm happy to grill you over a specific position.

Jo
Jan 24, 2005

:allears:
Soiled Meat
Jumping on the earlier topic: we used to do pair programming at my office, and I rather liked it for specific tasks. It was handy when we didn't know precisely how to architect a solution but knew what we wanted. We never ended up writing a LOT of lines of code, maybe 100 at most. The biggest part was designing what kind of models we wanted in our solution and figuring out the best way to save and modify them.

Revisiting a topic I touched on even earlier: a lot of the work that I do (when I'm not on fire covering for people) involves long-running background tasks. We've got workers that are supposed to churn through terabytes of text data and run LDA. That takes a long time and leaves tickets worth three or four points sitting in the sprint for months. It's kinda' bumming me out to see my tickets still open across two months. Not sure if there's a better way to handle it.

Jo
Jan 24, 2005

:allears:
Soiled Meat

Vulture Culture posted:

JawnV6 started to get to this, and it's something that's obvious but needs to be explicitly stated: it's equally important to actually review problematic estimates in your retrospectives. Was something much easier than you expected because some of the plumbing was done for some unrelated story some number of weeks ago? Was it much harder because the interaction flow for this feature was incompatible with the flow through the rest of the product and substantial portions had to be completely redesigned? These are qualitative lessons that the team should be learning, that should be part of the oral history of your team. Like Lean, Agile is beyond useless if you're not constantly learning. Moving the goalposts to make yourselves feel better about your estimates isn't learning.

I very much agree that it's important to look back at estimates, but I also have to wonder how high the variability in a single developer's performance is. I'd imagine the natural variance in one person's output contributes about as much error as a bad estimate does.

Jo
Jan 24, 2005

:allears:
Soiled Meat
I think I'm burning out pretty hard and pulling down the rest of my team. Not entirely sure how to broach the subject with my boss.

:siren: Whiny rant. Venting. Please ignore. :siren:

I've been doing the AI/ML work for our project, particularly as it relates to search and document clustering. My stuff is way behind, but I get to the office, sit down with a task I've broken into what I think is a tiny, digestible piece like "write document models to preproduction," and can scarcely muster the energy to do it. Partially, I think it has to do with the knowledge that it's going to take a few days to write all the documents to the database, there's a 50/50 chance it will crash in the middle and I'll have to start over, and inevitably I'll catch poo poo from our DBA, who insists I need to find a smarter way to insert records. He's right, of course -- I'm not a good developer. But loving hell, it's a multi-million dollar company and we have maybe 60 users of the database. Is it really unreasonable to insert a few million rows on a Friday afternoon?

I feel like everything I do takes longer than it should by a factor of 10. I'm about 99% sure it's my fault, too, but when it comes to actionable solutions, I'm coming up blank and the continuous pressure of underperforming has crushed my will. I don't even feel like working on my personal projects any more.

:siren: This has been a test of the emergency whiny rant system. :siren:

Jo
Jan 24, 2005

:allears:
Soiled Meat

AskYourself posted:

I'm not sure you were looking for technical advice or not but if yes :
Are you using SQL Bulk Insert and SQL Merge or the equivalent of your RDBMS ? Inserting a few million rows using individual insert statement is a big no-no.

Bulk insert, though it could conceivably be improved further by making a file and running COPY. Of course, to do that I'd have to write to disk first, but my disk isn't big enough, so I could get an external hard drive. I tried that once, but it turns out the network copy speed for big files is a little too slow, so instead I wanted to put it on a machine that's on the network, except none of them have drives big enough. That's okay, I can rewrite stuff to produce batches of data and push each incrementally. How do I know I've gotten all the records inserted? That's not hard, I'll just sort them.
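For what it's worth, the "produce batches of data and push each incrementally" part is easy to sketch with the stdlib alone. This is a hypothetical helper (the name and sizes are mine, not anything from our codebase):

```python
from itertools import islice

def batched(iterable, size):
    """Yield lists of up to `size` items from any iterable (rows, lines, ...)."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk

# Each chunk would become one bulk insert instead of millions of single INSERTs.
print(list(batched(range(10), 4)))  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```

Because islice pulls lazily, this works on a streaming query cursor just as well as on a list, so nothing has to fit in memory at once.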

Jo
Jan 24, 2005

:allears:
Soiled Meat

Mniot posted:

I'm not sure what DB you're using. "Insert 1 million rows" should absolutely take less than a day, but it requires implementation-specific knowledge and getting it wrong is fairly easy. If this is PostgreSQL, you are making a huge mistake to not use COPY, which was 20x faster than INSERT last time I measured. (The actual difference for you will completely depend on your data and tables, of course.) You can stream the data to the server; you don't need to have a file on the server's disk.

But yeah, if you've got a DBA, they should be coming to you saying "I noticed you've got an INSERT job that's been running for the last 10 hours. Please read this documentation of bulk ingestion via COPY and tell me before you try again so that we can watch the performance together." Ask him to help you search for the solution on StackOverflow.

Yup. Postgres. I'll talk to him about COPY.

EDIT: He says I need to make files. No streaming. I'm probably misunderstanding.
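For the record, psycopg2 can feed COPY from any file-like object, so an intermediate file shouldn't strictly be necessary. A minimal sketch of the streaming idea -- table and column names are invented, and the copy_expert call is left as a comment since it needs a live connection:

```python
import csv
import io

def rows_to_copy_buffer(rows):
    """Serialize an iterable of tuples into an in-memory CSV buffer for COPY FROM STDIN."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerows(rows)
    buf.seek(0)  # rewind so COPY reads from the start
    return buf

buf = rows_to_copy_buffer([(1, "alpha"), (2, "beta")])
# With a live cursor, the whole load is one streamed statement, no file on disk:
# cur.copy_expert("COPY my_table (id, name) FROM STDIN WITH (FORMAT csv)", buf)
print(buf.getvalue().splitlines())  # ['1,alpha', '2,beta']
```

copy_expert accepts any object with a read() method, so a generator-backed buffer works too; the rows never have to exist on disk anywhere.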

Jo fucked around with this message at 18:59 on May 12, 2017

Jo
Jan 24, 2005

:allears:
Soiled Meat
:drat: COPY is fast. Even with my poo poo connection, that's a huge performance difference. Took like 30-60 minutes instead of all day.

Jo
Jan 24, 2005

:allears:
Soiled Meat

smackfu posted:

We just have an always releasable master, short lived feature branches, and a production branch off of master when we do a release. Pretty simple and has worked well for us.

After a lot of bickering, we finally decided on this system, too, and it's working pretty well.

To get a sense of the before and after, we used to have QA, Stage, and Production/Master. Development would happen on short-lived branches and get merged into QA for testing. If approved, QA got merged into staging. Staging got deployed to the preproduction boxes where full regression happened (and emergency bugfixes were tested). Finally, everything was merged to master and deployed.

Now: we branch off of master, make changes, and merge into master. QA is always deployed from master. When we prep for a release, we tag the master branch and deploy that to staging and eventually to prod. The biggest advantage to this is just avoiding merge issues. Whatever gets deployed to stage gets deployed to prod.
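The day-to-day of that flow is short enough to sketch; here it's played out in a throwaway repo, with branch and tag names invented:

```shell
# Trunk-based flow: short-lived branch off master, merge back, tag for release.
set -e
scratch=$(mktemp -d)
cd "$scratch"
git init -q -b master
git config user.name dev
git config user.email dev@example.com
git commit -q --allow-empty -m "start"
git checkout -qb feature/short-lived                          # branch off master
git commit -q --allow-empty -m "feature work"
git checkout -q master
git merge -q --no-ff feature/short-lived -m "merge feature"   # QA always deploys master
git tag v1.0.0                                                # release prep: tag master
git tag --points-at HEAD                                      # the same tag goes to staging, then prod
```

The tag is the whole trick: staging and prod deploy the identical commit, so there's no second merge to go wrong.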

Jo
Jan 24, 2005

:allears:
Soiled Meat
It's been a long day and I need to take a minute to vent.

For the past year or so we've been moving away from Postgres to Solr, a change which I was rather opposed to. It has turned out... okay. More and more, though, our application feels like it's built upon twigs. We've removed what I think was a good, solid foundation made of concrete, with solid iron IO pipes, and taped on this popsicle-stick Solr thing connected with leaky plastic tubes. Fortunately for me, I can mostly ignore the horrifying thing that our application has become and focus on more interesting things like research. However, as I've been the only backend dev on staff the past couple of weeks, I've been unable to avoid the unpleasant reality.

I've spent three solid days trying to figure out why this HTTP call to Solr's ELB is failing.

OPS insists that the nodes are fine and the ELB is performing as expected.

The calls fail intermittently. Same code. Same call. Intermittent failure. Fire ten of the same request, get back two replies and eight 400 errors. They say they don't see the failures.

If I switch things up and point the requests at a node directly instead of the load balancer, they all magically work. Between that and the intermittent nature of the failure, it seemed like pretty compelling evidence that something was FUBAR'd with the load balancer. Again, though, OPS is very insistent it's fine and the problem is in our application.

I can suck it up, bite my tongue, and try to find someone who will work with me. Today, though, all the Solr cloud instances went down. Again. In all our applications.

I'm out of ideas on how to approach this. I've traced the failure to a single line in our code: where we make the POST request. It happens with curl, too, but they say the problem is in our app.

:sigh: I'm dancing the line between wrathfully angry and completely out of energy.

Jo
Jan 24, 2005

:allears:
Soiled Meat

sarehu posted:

Elasticsearch not being a database is only true in the sense that it's a marketing decision. The same might be true of Solr, I don't know anything about it, but there's no reason why a search index would be a distinct piece of software from a database.

There are some guarantees made by a traditional database that are not made by Solr. ACID versus BASE, if memory serves: atomicity, consistency, isolation, and durability, while Solr favors Basically Available, Soft-state, Eventual consistency.

Jo
Jan 24, 2005

:allears:
Soiled Meat
Another rant. Sorry everybody.

I hate Apache Spark so loving much. I'm so sick of it. I spent at least a solid 40-hour week trying to get it to read data from S3. Unfortunately, you need v1.7.4 (from 2012) of the Hadoop-S3 jar and v1.7.81 of the AWS SDK, downloaded from http://fuckery.clownpenis.fart before Jan 16, and and and.

Finally I said gently caress it and dumped a small subset of our data to a CSV and tried to run it, but Spark won't push your local data to the workers.

Okay, whatever, I'll read from the DB directly. Now I'm fighting with the JDBC drivers.

I already feel like enough of a moron without being forced to set up and use a cluster.

Jo
Jan 24, 2005

:allears:
Soiled Meat

return0 posted:

Hmm I dunno. What's better at general etl/analysis than spark?

I'd take just about anything over Spark at this point. I spent about two hours trying to do a concat/group-by of a string column in Spark. Last month I had a project I was going to do in it, but I gave up and wrote my own netcode + job system in less time than it's taken me to get Spark to do the concat I mentioned above. It's a surreal, frustrating experience where I feel like I'm in grad school again, defending a dissertation and unable to get anything to work or answer questions. It should be so simple! Why do I need a special class of function to do a map? Why can't I use a plain Java lambda on this agg? Why can't map or flatmap take a lambda?

Oh. It crashed after 18 hours. Okay. :suicide:
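For scale: the group-by-then-concat I was fighting Spark over is a few lines of plain Python. The data here is made up, and the PySpark equivalent in the comment is roughly what I was attempting:

```python
from collections import defaultdict

# Group rows by key, then join the string column per group.
# (The PySpark version is roughly:
#   df.groupBy("k").agg(concat_ws(",", collect_list("s"))) )
rows = [("a", "x"), ("b", "y"), ("a", "z")]

groups = defaultdict(list)
for key, value in rows:
    groups[key].append(value)

result = {key: ",".join(values) for key, values in groups.items()}
print(result)  # {'a': 'x,z', 'b': 'y'}
```

Obviously this doesn't distribute across a cluster, which is the whole argument for Spark; it just shows how small the actual operation is.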

Jo
Jan 24, 2005

:allears:
Soiled Meat

GutBomb posted:

Because some exec told some other exec on the golf course that they used Spark.

In short, this. I've expressed my opposition. We do have a few TB of data to process, so I guess they think it's the only option.

Jo
Jan 24, 2005

:allears:
Soiled Meat

Keetron posted:

Do we all need to learn how to use Python?
https://insights.stackoverflow.com/...e-growth-python
https://stackoverflow.blog/2017/09/06/incredible-growth-python/

Just when I was getting the hang of this java thing...

Pollyanna hit it pretty much on the head. Python can do a lot of things acceptably, but only a few things well.

"But should I learn Python?"

If you...
1) Do a lot of numerical or scientific computation, then yes.
2) Need more functionality than a standard calculator or spreadsheet, but not enough to merit building an entire application, then yes.
3) Have spoken with your doctor and decided Python is right for you, then yes.
4) Need to be able to quickly prototype algorithms and play with them interactively, then yes.

If you...
1) Are learning a programming language for the first time, then no.
2) Need to be able to ship a standalone project with minimal effort that people can just run, then no.
3) Are doing embedded work or CPU critical work, then no.
4) Are building a large project which you expect to keep maintained for many years to come, then no.

IMHO, 90% of Python's strength is numpy. You can build some prototype applications very quickly (I still have a good amount of respect for Django and its ORM), but once you have to refactor, you're almost better off burning your app to the ground and rebuilding it. The REPL/IPython/Jupyter, though, are probably some of my favorite things. For data munging, modeling, and general machine learning, I've not found anything I like better.

Some use cases I run into and the language I tend to use:

Games: I like Java and libGDX for these. It has about the right level of hands-free-ness that I can pay attention to gameplay. If I need to test an equation, though, I'll use the IPython shell and quickly type out an algorithm to see if it works and interact with it.

Databases: I had to run a sorta' complicated query that kept getting killed by our DBA, so I used Psycopg2 to connect to the database, run the query in smaller batches (offset+limit), detect when it was disconnected, wait, reconnect, and continue. (Before anyone gets up in arms: I did tell them I was doing this. I maintain it's bullshit I get kicked off our dev box while doing dev work on the weekend.)
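The batch-plus-reconnect loop from that query is generic enough to sketch without a database. Here fetch_page stands in for the Psycopg2 offset/limit query, and all the names are mine, not the real script's:

```python
import time

def fetch_all(fetch_page, batch_size=1000, max_retries=3, wait_seconds=0):
    """Pull rows in offset/limit batches, retrying a page if the connection drops."""
    rows, offset = [], 0
    while True:
        for _attempt in range(max_retries):
            try:
                page = fetch_page(offset, batch_size)
                break
            except ConnectionError:
                time.sleep(wait_seconds)  # wait, reconnect, retry the same page
        else:
            raise RuntimeError("gave up after repeated disconnects")
        if not page:
            return rows
        rows.extend(page)
        offset += batch_size

# Fake "database" that drops the connection once, mid-way through.
data = list(range(2500))
state = {"failed": False}

def flaky_page(offset, limit):
    if offset == 1000 and not state["failed"]:
        state["failed"] = True
        raise ConnectionError("server closed the connection unexpectedly")
    return data[offset:offset + limit]

assert fetch_all(flaky_page) == data  # every row arrives exactly once, in order
```

Because each retry re-requests the same offset, a mid-run disconnect costs one page rather than the whole job, which was the point of batching in the first place.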

Visualizations: If you want to do stuff like histograms or pick out subtle things in images, PIL/Pillow are very useful. It's handy to be able to quickly open an image, normalize contrast, and save it out.

Text munging: one of the few places dynamic typing comes in handy is when munging CSVs or TSVs. Pandas can technically be used, but I've not found it easier in general.
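As an illustration of the CSV case, the stdlib alone covers the usual read-filter-rewrite loop. The columns and data here are invented; normally you'd open() files instead of the inlined strings:

```python
import csv
import io

# Read a CSV, keep the rows that pass a filter, write a new CSV.
raw = "name,score\nalice,10\nbob,3\ncarol,7\n"

reader = csv.DictReader(io.StringIO(raw))
keep = [row for row in reader if int(row["score"]) >= 5]

out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=["name", "score"])
writer.writeheader()
writer.writerows(keep)
print(out.getvalue())  # header plus the alice and carol rows
```

DictReader is where the dynamic typing earns its keep: every row is just a dict of strings, and you coerce only the fields you actually care about.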

Python fills the hole in my heart that Perl and Matlab left so long ago. It offers the benefits they provided, with less insanity and fewer of their weaknesses.

Jo
Jan 24, 2005

:allears:
Soiled Meat

Ither posted:

Why shouldn't Python be someone's first language?

I believe it hides too many of the important details of the underlying system and induces some bad habits. If you're first forced to work out for yourself how to do an array copy or check for duplicates in a list, it gives you a badly needed understanding of the hardware underneath. You can teach program flow and logic with most any language, but you can't teach someone how computer memory works if you don't have pointers. It's harder to drive home _why_ a single lookup in a 1D array (onto which a 2D space is mapped) incurs less overhead than an array of arrays requiring two lookups.
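The 1D-vs-2D point can be made concrete. A pure-Python sketch (a real flat array would be one contiguous block of memory, which lists only approximate):

```python
# A 3x4 grid stored two ways: one flat list, indexed with a tiny
# computation, versus a list of lists, which needs two lookups.
width, height = 4, 3
flat = [y * width + x for y in range(height) for x in range(width)]
nested = [[y * width + x for x in range(width)] for y in range(height)]

y, x = 2, 1
# Same element either way; the flat form is one lookup plus arithmetic,
# the nested form is two pointer chases.
assert flat[y * width + x] == nested[y][x] == 9
```

The index arithmetic is exactly the detail a beginner never sees in a language that hands them multi-dimensional containers for free.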

It feels to me like teaching a drawing class and using a perspective distortion tool instead of learning how perspective works.

Or maybe I'm an old fart. :rotor:

EDIT: Is the rotor emoticon no longer a thing? Jesus. Maybe I am getting there.

Jo
Jan 24, 2005

:allears:
Soiled Meat
Before anyone gets the wrong idea: don't think for a moment I'm making GBS threads on Python. It's part of my daily work and I'd not have it any other way. I do, however, want to be realistic about its flaws and imperfections.

Jo
Jan 24, 2005

:allears:
Soiled Meat
I'm debating a change of positions, too. I really like all the members of my immediate team* but I feel utterly incompetent at completing my work. I've been trying to get some models running since basically when I started, but I've not had much luck improving things over the baseline. I'm torn between changing jobs so that I don't interfere with people who are _actually_ getting stuff done and sticking with it because I'm one of a few people who understands the machine learning component of things, even if I'm really bad at it. I had told myself I'd consider switching after the product had gone public, but it's been pushed back by about half a year.

*Except our DBA who I professionally loathe and is 50% of my reason for wanting to leave.

Jo
Jan 24, 2005

:allears:
Soiled Meat

Keetron posted:

Anyone here experienced with RStudio?
https://www.rstudio.com/

What kind of thing is this?

Reason for asking is that they have a bunch of remote positions that seem like it could work for people in this thread...

It's not bad for general stats stuff. The UI is actually pretty nice. R itself descends from S, the old Bell Labs language, if memory serves. I wouldn't use it for general-purpose development, but it's the right tool for some jobs.


Jo
Jan 24, 2005

:allears:
Soiled Meat

geeves posted:

I rather like Jersey, but we're writing a new API from scratch for another project and using Spring and Kotlin and while I'm not working on it directly, I have done some code reviews and I really like what I see. I'm hoping that eventually the API will be the backbone of new development in time.

Keep us posted on this. I enjoy Kotlin and have been thinking about building my next app in Spring.
