Twerk from Home
Jan 17, 2009

This avatar brought to you by the 'save our dead gay forums' foundation.
I inherited a project with a production Spark cluster that has... a grand total of 45GB of memory across the entire cluster, driver + workers. Even this seems to be oversized.

What are the advantages of Spark at very small scale? My layman's assumption is that Spark is only a tool you reach for when a single dataset in memory outgrows what commodity instances can provide, but I'm deep diving on spark docs and exercises to make sure I'm not missing something.

Twerk from Home
I've got some fairly basic Thread-related questions while coming back to Java from a long time away. The last time I wrote Java professionally, it was an applet talking CORBA to the backend, I was using Eclipse, and checked code into CVS. I feel like I've got a working understanding of common concurrency issues and safety from spending time in Go recently, but Java's heavier thread abstraction leaves me with some questions.

Specifically, my team inherited an Android app. It is using Go mobile bindings for some of its functionality, which we're working to port over to Kotlin or Java. I need to get up to speed on good JVM networking practices in general though because we also will be developing the Scala backend going forward. It's doing a lot of not-HTTP network calls, which means things that would normally be handled by a library have been hand-rolled.
  1. Are there any reasons to prefer sync over async IO on Android in 2020? This app is doing a lot of sync i/o with what feels to me like big threadpools, 32+ threads. Where it's easy and convenient to rip out & make async, is this generally a good idea?
  2. For that matter, what's a smell that you have too many threads? Is hundreds in an Android app client perfectly normal? What about on the backend, is thousands of threads when targeting middle-weight VMs (~8 core/16GB) perfectly appropriate? Or is number of threads not a smell and I should just monitor and see?
  3. Where I pass callbacks to a library (OkHttp for doing async http) am I likely to cause problems if I do CPU-heavy stuff or blocking I/O on the thread that ran the callback, managed by the library?
  4. Am I making a mountain out of a molehill? Should I just make even bigger threadpools and think about Threads more like goroutines, AKA just use them whenever I/O is going to happen?

Finally, is this a sane way to handle doing blocking tasks in response to UDP packets? I know it needs error handling. Let's say that doPacketProcessing is either CPU heavy, does blocking I/O, or both.

code:
    Executors.newSingleThreadExecutor().execute(() -> {
        while (true) {
          byte[] buffer = new byte[8192];
          DatagramPacket packet = new DatagramPacket(buffer, buffer.length);
          socket.receive(packet);

          executorService.submit(() -> doPacketProcessing(packet));
        }
    });

Twerk from Home
Assigning variables is atomic, right? Long story short, I've got a config provider that needs to be giving other threads a reference to a cached version of a bit of config. It stores this in a private field that is only updated by a specific thread whose job is to update the config.

Many other threads get the config from this provider. When the thread that gets a newer copy of the config to cache does so, it constructs a new object, assigns it to a local variable, and then assigns it to the private field.

My belief when setting this up is that because I have one writer, many readers, AND it's not important that readers ALWAYS get the newest version of what is written, I don't need any synchronization or volatile keyword or anything at all. Is this accurate? Am I setting myself up for mysterious pain down the road?

Twerk from Home

Jabor posted:

The setup as you describe might be safe, but it's very brittle. For example, if any of the fields of your Config object aren't final, then it's not safe. Are you sure that nobody is ever going to introduce a non-final field in any later code changes?

Just make your shared field volatile.

They all are. I'm actually using Kotlin and the Config object is a data class, and all fields are val, the equivalent of final. I may go ahead and mark it volatile anyway, to guarantee ordering of reads/writes.

Twerk from Home

Jabor posted:

I hope you have a plan for informing other developers of the requirement to never add any non-final fields. Perhaps a unit test that reflects on the class and makes sure everything is final?

The other thing to keep in mind is that, even if it's acceptable for a thread to not see the updated value immediately, it's often a requirement that the updated value will eventually be seen at some point. Depending on how the rest of your code is structured, this may not actually be true, and it can be difficult to reason about.

What are you trying to avoid by not using volatile?

Honestly? My team and I are either novices or really rusty at dealing with true concurrency issues, and by default fall back on "just add locks until it works". It's pretty clear to me here that we don't need a lock, and I don't have a great understanding of what volatile actually gets you, which is part of why I'm asking here.

If the Config object is marked volatile, will that carry over to all of its fields? If someone does add a mutable field in the future, will volatile carry down to all objects referred to by the Config object, or at that point would we need to synchronize reads anyway?

We've done concurrency in Go more recently, with the general model of "one thread owns writing to any given thing", and are just looking to carry that pattern over to the JVM.

Twerk from Home fucked around with this message at 17:57 on May 15, 2020

Twerk from Home

Jabor posted:

Essentially, the Java memory model defines a partial ordering (called happens-before) on various things that you do in your program. If thing A happens-before thing B, then any thread that observes thing B will subsequently also observe thing A. If there is no happens-before relationship, then you don't get that guarantee.

Some examples of happens-before are that a thread releasing a lock happens-before any other thread takes that same lock. Another example is that a write to a field while a lock is held happens-before that lock is released. When you combine those together, you get the very useful property that if you write something to a field while holding a lock, then release the lock, the next thread to take that lock is guaranteed to see your write.

volatile fields get similar happens-before guarantees - any previous write happens-before the write to a volatile field, and so any thread which sees the new value of the field will also see all the writes you made beforehand. So, it's sufficient to mark the top-level field as volatile - anything that sees the new value of the field will also see all the writes that were made as part of creating the object.

E: if you do want to change individual fields after the object has been made available to other threads, you're back at needing to think carefully about things.

I appreciate this. It seems like the safest thing is to just acquire a lock on the object before updating anything or reading anything, and be done with it.

If we did want no lock and to have mutable fields on the Config object after construction, then those fields would also need to be marked volatile? Assuming still that there is only the one single thread ever writing to it.
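
The single-writer, many-reader setup from the last few posts can be sketched roughly like this. It is a minimal sketch and not the actual project code: `Config`, `ConfigProvider`, and their fields are hypothetical names, and it is written in Java rather than Kotlin. It assumes the config object is immutable and that only the reference is swapped, through a volatile field.

```java
// Hypothetical immutable config: every field is final, so a fully
// constructed instance can never be observed changing.
final class Config {
    final String endpoint;
    final int timeoutMillis;

    Config(String endpoint, int timeoutMillis) {
        this.endpoint = endpoint;
        this.timeoutMillis = timeoutMillis;
    }
}

// Hypothetical provider: one updater thread calls update(), many threads call get().
final class ConfigProvider {
    // volatile: the write in update() happens-before any later read in get()
    // that sees the new reference, so readers also see the object's contents.
    private volatile Config current;

    ConfigProvider(Config initial) {
        this.current = initial;
    }

    // Readers never block; they may briefly see the previous Config.
    Config get() {
        return current;
    }

    // Called only by the single writer thread. The new object is built fully
    // by the caller, then published with one reference assignment.
    void update(Config fresh) {
        this.current = fresh;
    }
}
```

If fields on `Config` were later made mutable and changed after publication, the volatile reference alone would not cover those later writes, which is the situation Jabor's edit warns about.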

Twerk from Home

smackfu posted:

I have one thread that has to generate a String value every ten minutes, and a bunch of other threads that have to access that value, and also not block while the value is being generated.

I don’t do much manual thread stuff anymore... Is there any more modern way to do this than just throwing the synchronized keyword around a lot?

I believe that’s exactly the scenario I asked about above, so you want to have the member marked volatile and that’s about it as long as there’s only one writer.

Twerk from Home
I've got a set of AWS lambdas that I'm wanting to fold into an existing Spring Boot application. However, they're deployed in a way right now where the hostname is basically what's being used for routing. Is there any sane way to accomplish hostname-based routing in what is intentionally a very vanilla Spring Boot setup?

Existing API as deployed with many API Gateway instances and lambdas:

cat.api.mycompany.com/doSomething
dog.api.mycompany.com/doSomething

I'm wanting to use the hostnames to be able to route cat and dog to different Controllers. I could stick a proxy in there to do hostname based routing like:

cat.api.mycompany.com/doSomething -> unified.api.mycompany.com/cat/doSomething
dog.api.mycompany.com/doSomething -> unified.api.mycompany.com/dog/doSomething

but I haven't dug into whether I can configure the existing ALB to do that, which is the only proxy in the system right now, or whether I should be attaching a sidecar container to the Spring Boot container. I could whip up an nginx config in no time flat, but I'm sure that Envoy or Zuul could handle that as well. I'd prefer if I could do host-based routing in Spring Boot itself in addition to path-based, so that calls to either end up in the same controller.

Edit: It looks pretty easy to make the ALB do that, but I'm still curious if anybody could manage to do it inside of the application itself.

Twerk from Home fucked around with this message at 05:14 on Jul 24, 2020

Twerk from Home
For greenfield Spring development, is it pretty much always worthwhile to start off with WebFlux nowadays, or is going with the much simpler imperative + JPA still fine in the long run? Usage case is calling out to external http services in parallel, which I can do with WebClient from either reactive or non-reactive just fine, and using Postgres as persistence.

Twerk from Home
Kotlin Spring question: If I've got an incredibly slow network call (6+ seconds) that can safely be cached for moderate periods of time (5 minutes), and using stale old cached values is acceptable, is there any clean way for me to get background threads to refresh the cache without any real requests paying that horrible penalty?

Basically, what I want is for this to schedule cache renewal, and somehow keep serving the old cached value while waiting for the new one to populate.

code:
@Scheduled(fixedRate = 300000)
fun evictTheSlowCallCache() {
	cacheManager.getCache("slowAssCall").clear()
	slowAssCall()
}

@Cacheable("slowAssCall")
fun slowAssCall(): ResponseType
That's clearly not what these annotations actually do together, as this would first evict the cache and then refill it, leaving all of the requests that come in before it's filled again to make the slow call themselves.

If I set sync = true in @Cacheable, then they'll at least use the first response rather than each waiting on their own call, but they still won't use the previously cached value and will be stuck waiting. I don't see an easy way to do this other than stepping pretty far outside of easy Spring norms and managing a volatile property and the in-memory caching myself.

I guess what I would do there is have a @Scheduled method that makes the slow call, reads it into a new object, and then in a single atomic operation reassigns the volatile property on the class.
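
The manual approach described here could look something like the following in plain Java. This is only a sketch under assumptions: `RefreshingValue` is a hypothetical name, a `ScheduledExecutorService` stands in for Spring's `@Scheduled`, and the slow call is passed in as a `Supplier`.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hypothetical holder that keeps serving the last good value while a
// background thread fetches a replacement.
final class RefreshingValue<T> {
    private final Supplier<T> slowCall;
    // Reassigning a volatile reference publishes the new value in one step.
    private volatile T cached;

    RefreshingValue(Supplier<T> slowCall) {
        this.slowCall = slowCall;
        this.cached = slowCall.get(); // pay for the slow call once, up front
    }

    // Request threads only read the field; they never wait on slowCall.
    T get() {
        return cached;
    }

    // Runs on the background thread: the old value stays visible until
    // the new one has been fully fetched.
    void refresh() {
        try {
            cached = slowCall.get();
        } catch (RuntimeException e) {
            // Keep the stale value; real code would log this. Catching also
            // stops scheduleAtFixedRate from cancelling future runs.
        }
    }

    // Plain-Java stand-in for @Scheduled(fixedRate = ...). The caller is
    // responsible for shutting the returned scheduler down.
    ScheduledExecutorService scheduleRefresh(long period, TimeUnit unit) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(this::refresh, period, period, unit);
        return scheduler;
    }
}
```

The trade-off is that readers can see a value up to one refresh period old, plus however long the slow call takes, which matches the "stale is acceptable" requirement above.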

Twerk from Home

Jabor posted:

@CachePut seems like the obvious solution.

CachePut does exactly that and I have no idea how I had missed it, thank you!

Twerk from Home
I prefer how Kotlin and Scala just use if as the ternary.

https://kotlinlang.org/docs/reference/control-flow.html
https://alvinalexander.com/scala/scala-ternary-operator-syntax/

Twerk from Home

Manager Hoyden posted:

Speaking of learning Java, I really need some instruction carrying me from intermediate to real-world-actual-development skills. Is there anything like that out there? Basically getting over the hump from theory to real-world practice.

I am an old (kinda) who took the condescending advice and is learning to code, but I'm getting nervous about my level of knowledge. I have learned a ton of concepts, but I have no drat idea how to put them together to do anything useful and I don't see my last few classes remaining in the degree changing that level of knowledge much.

This is like learning the theory of how like 50% of the parts of a car work but nothing about how a whole car works. And also the parts I learned about mostly aren't even used in the real world anymore. And from there I'm supposed to know how to design and build a car.

If you want, I've got a bunch of tiny applications I made for fun that I'd be willing to give you access to source, I could even suggest some features to implement. For example, https://www.tromd.com/ displays horrifying photos of the president, random and new every time you reload.

Twerk from Home
Is anybody running Spring in resource constrained environments lately? What even counts as resource constrained nowadays? I tend to size Spring Boot REST API containers with around 2-4GB of RAM per container and heap set to 75% of that. Definitely overkill, but for normal web API type stuff CPU tends to be more of a limit than memory anyway, and I like to be able to use at least 1 whole vCPU.

Anyway, I dropped into a new team at work, and for some reason they're running Spring Boot in docker containers limited to 0.128 of a single vCPU and -Xmx256m. They report "it's always been like that, and always worked well", but my eyeballs fell out of my head seeing them horizontally scale out containers sized like that instead of scaling up a little bit. Giving Spring less than 1 full thread seems silly to me: you've got the overhead of running 8 JVMs and Spring contexts just to get a whole vCPU of CPU power.

Anybody really experienced with container / instance sizing that can explain what Spring Boot container sizing you've been using and why? I had put very little thought into my own choice of 1-2 vCPU and 2-4GB of RAM, other than that it's the highest vCPU-to-RAM ratio that ECS Fargate allows, compute is so cheap, and we didn't have anything at a scale where optimizing sizing would have a big payoff.

Twerk from Home

Combat Pretzel posted:

Tell me about data structures. Java doesn't formally have structs, everyone tells me to use final classes. If I do that, will it act like a struct internally? I mean in regards to allocating huge arrays. Will I get a packed array in memory, or will I get an array of N object references with N objects being spawned, including all the overhead? I seem to remember this being a complaint nearly two decades ago, is it still true?

If you allocate an array of primitives, it will be contiguous in memory. If you need to model your own data types like this, Java isn’t your language. If you can make a small test and actually measure performance though, you might be pleasantly surprised.

If you know for a fact that your application will benefit massively from value types, then you may want to look at C# or Go, both of which have structs that are value types. In the vast majority of applications all three of these perform pretty similarly, though.
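
To make the layout difference concrete, here is a small sketch contrasting an array-of-objects design with parallel primitive arrays, which is one common way to approximate a packed struct array in Java. The `Point` and `Points` names are hypothetical, and whether the second layout is actually faster for a given workload is something to measure.

```java
// Array-of-objects layout: a Point[] holds references to separately
// allocated Point objects, each with its own object header.
final class Point {
    final double x;
    final double y;

    Point(double x, double y) {
        this.x = x;
        this.y = y;
    }
}

// Struct-of-arrays layout: two primitive arrays, each one contiguous
// block of doubles with no per-element objects.
final class Points {
    final double[] xs;
    final double[] ys;

    Points(int n) {
        xs = new double[n];
        ys = new double[n];
    }

    void set(int i, double x, double y) {
        xs[i] = x;
        ys[i] = y;
    }

    // A scan over one field touches only that field's array.
    double sumX() {
        double total = 0;
        for (double x : xs) {
            total += x;
        }
        return total;
    }
}
```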

Twerk from Home

F_Shit_Fitzgerald posted:

I'm trying to relearn Java after not touching it for years, and that's taking the form of jumping into the deep end of the pool and just writing code. My current project is writing a program to simulate a deck of cards, but I'm feeling dumb because I can't seem to figure out how to use the enum I declared for my Card class.

In my Card class I declared my enum as

code:
public enum Suit {HEART, DIAMOND, CLUB, SPADE};
with the intention to declare a Card object as something like this

code:
Card c = new Card(HEART, "2");
But HEART "cannot be resolved to a variable" when I compile. I know I'm probably making a boneheaded mistake, but I'm wondering if someone can fix my stupid (thanks in advance).

Are you importing Card.Suit.HEART or just Card? If your Suit enum is a member of the Card class, you'll either need to reference it as Card.Suit.HEART, or import the member you want to reference directly.
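
A minimal sketch of the qualified-name fix, assuming the enum is nested inside `Card` as described. The fields, getters, and the `CardDemo` class are made up for illustration, and the access modifiers are left package-private so everything fits in one file.

```java
class Card {
    enum Suit { HEART, DIAMOND, CLUB, SPADE }

    private final Suit suit;
    private final String rank;

    Card(Suit suit, String rank) {
        this.suit = suit;
        this.rank = rank;
    }

    Suit getSuit() { return suit; }
    String getRank() { return rank; }
}

class CardDemo {
    public static void main(String[] args) {
        // Outside Card, the nested enum constant needs its qualified name...
        Card c = new Card(Card.Suit.HEART, "2");
        System.out.println(c.getSuit() + " " + c.getRank()); // prints "HEART 2"
        // ...unless the file has a static import such as
        //   import static com.example.Card.Suit.HEART;
        // (the package name is hypothetical), after which a bare HEART resolves.
    }
}
```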

Twerk from Home

dantheman650 posted:

Hey ya'll - I just accepted a new job offer and am making a transition from five years of React/JS front-end development to Java. The OP is super old - are there any recommended resources for Java beginners with programming backgrounds? I like physical books and structured courses the best. I obviously need to learn the syntax but I'm also curious about best practices, directory structure, testing, etc.

What environment are you doing Java in? Is it maintenance + extension of applications from the same era as the OP? If so, keeping those projects internally consistent is probably what you'll be doing. Are you doing Android development? Android Java got stuck somewhere between Java 7 and 8, so Java for Android is similarly out of date.

If you're doing modern small services with a widely-used framework (Spring Boot, Dropwizard, Quarkus, Micronaut), then you could just stand up one of those from any of their project initializers and explore your way around that.

Twerk from Home

Good Will Hrunting posted:

I'd go this route. Spin one up and create an endpoint or two, do some serde, mess with ORM of your choice, pull in a few free AWS service libraries to toy around with even maybe? Wire it all together with Guice or another DI framework of your choosing (maybe Spring Boot includes DI? I've never used it).

Spring boot, Quarkus, and Micronaut all include a perfectly acceptable DI framework, and I'd use the out of the box solution as much as possible. If you want an actual small project to do that's motivating, you could use Spring Boot, Quarkus, or Micronaut to make a slack bot: https://slack.dev/java-slack-sdk/guides/supported-web-frameworks

If you want to take something primitive and make it better, you could give me a hand on the Somethingawful Slack integration that I've been toying with, PM me for details. It's spring boot and at 3 hours in, I considered it "shipped" because it can expand the text of SA posts in slack.

Twerk from Home
I feel great shame in asking this, but what's the best way to run java web start on a Mac in 2021?

I'm giving https://openwebstart.com/ a shot, which looks promising, is currently supported on latest java 11, and has active development so I would assume that there's years of support ahead of them, along with a move to Java 17.

Anybody else have solutions for using java web start applications?

Twerk from Home

Small White Dragon posted:

Which JDK is best for use on Apple Silicon?

I cannot believe I have to say this, but it depends, and I'm having to juggle multiple JDKs depending on usage case, above and beyond having both 8 and 11 available.

If you are running a packaged application that uses JNI and has a native component, it probably doesn't have ARM binaries included if it targets desktop/laptop/server. This means that you need to use an x86_64 JVM, because Rosetta cannot handle inter-mixing aarch64 and x86_64 in a single application, so you need the whole thing to be x86. If you are compiling the application yourself, is the native part portable? Are you sure? A ton of JNI code assumes x86, and in a best case you may be able to slap something in there like sse2neon, which I've used with several applications, or worst case it will die in cryptic ways. In this situation, you want an x86 JDK again.

The crappy part is that running x86 code is markedly harder on battery life, so if you have an application that is pure Java, no JNI, or explicitly targets Apple Silicon, then you really do want to run an aarch64 JVM and enjoy better performance and longer battery life.
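
When juggling several JDKs like this, a quick way to confirm which kind of JVM is actually running is to print its `os.arch` system property. This is a small sketch with a hypothetical class name; on Apple Silicon an aarch64 build typically reports `aarch64`, while an Intel build running under Rosetta typically reports `x86_64`.

```java
// Prints which architecture the running JVM was built for, plus the
// vendor and version, to tell installed JDKs apart.
class JvmArch {
    static String arch() {
        return System.getProperty("os.arch");
    }

    public static void main(String[] args) {
        System.out.println("os.arch = " + arch());
        System.out.println("java.vendor = " + System.getProperty("java.vendor"));
        System.out.println("java.version = " + System.getProperty("java.version"));
    }
}
```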

Twerk from Home

Jabor posted:

An easy way to do this is to use a static class field:

code:
public class MyClass {
	private static int nextId = 0;

	private int id;

	public MyClass() {
		this.id = nextId++;
	}
}

Is incrementing always atomic, or would this also require the constructor to be synchronized?
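
For reference, `nextId++` on a plain `int` is a read, an add, and a write, so two threads constructing objects at the same time can end up with the same id. Synchronizing would work; another common option is `AtomicInteger`. The sketch below reuses the quoted `MyClass` shape, with `getId()` added only for illustration.

```java
import java.util.concurrent.atomic.AtomicInteger;

class MyClass {
    // getAndIncrement() performs the read-modify-write as one atomic step.
    private static final AtomicInteger nextId = new AtomicInteger(0);

    private final int id;

    MyClass() {
        this.id = nextId.getAndIncrement();
    }

    int getId() {
        return id;
    }
}
```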

Twerk from Home
I'd appreciate some guidance about throughput-oriented thread scaling with the Parallel GC. We've got some Java batch processing tools that have no latency sensitivity, and we care much more about throughput per CPU-hour than throughput per wall-clock hour. We're using a job scheduler that is not limiting process CPU consumption with cgroups or anything similar, but all of the tools take the number of worker threads to use as an argument anyway. Based on previous testing, we run jobs with 4-16 threads, as some of them are known to scale much worse with threads, so we'll just do 4 threads with more processes at once.

I don't know working set size for all of these. We have more RAM than we need for these jobs because some other genomic tools are incredibly memory hungry, so for most of these I use a 31GB heap, although they'd probably run happily in 8GB.

Our compute nodes range in size from 64 threads to 112 threads per node at this time. We're using Java 17 for tools that don't need Spark, and Java 8 for tools that do need Spark.

I would like to know a decent place to start testing things in our own environment to improve the situation. My baseline assumptions are:
  • Parallel GC is going to be more CPU throughput efficient than G1, or especially newer lower-pause GCs.
  • Parallel GC multithreading will scale non-linearly with the number of threads doing GC, so I certainly want to limit it to a set number that is less than the total thread count of the system.
  • Running much larger heaps than these jobs actually need will result in greater total throughput.
  • Keeping heap less than 32GB will still enable compressed OOPs, which should help with memory / cache pressure a tiny bit.
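
One way to start testing these assumptions is to have each job report what the JVM itself says about its collectors, then compare runs launched with different settings, for example `java -XX:+UseParallelGC -XX:ParallelGCThreads=4 -Xmx31g ...`. Those are standard HotSpot flags, but the thread count and heap size shown are placeholders, not recommendations. The sketch below uses the standard `java.lang.management` API; the `GcReport` class name is hypothetical.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

class GcReport {
    // Total milliseconds the JVM reports having spent in all collectors so far.
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 if the collector does not report it
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // Collector names show which GC is actually in use for this run.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName() + ": " + gc.getCollectionCount()
                    + " collections, " + gc.getCollectionTime() + " ms");
        }
        System.out.println("max heap bytes: " + Runtime.getRuntime().maxMemory());
        System.out.println("available processors: " + Runtime.getRuntime().availableProcessors());
    }
}
```

Calling something like `totalGcMillis()` at the end of a job and comparing it against total CPU time gives a rough per-run view of GC overhead for each configuration.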

I'm in new-to-me territory, because most of my previous Java experience is various types of web or network service where we cared a LOT about latency, but for big batch processing it doesn't matter at all.

I did find a paper about this that covers a couple of the tools we're using: https://www.researchgate.net/publication/337114965_Recommendations_for_performance_optimizations_when_using_GATK38_and_GATK4, but it mentions that they suggest users do analysis themselves, after seeing performance regressions with 4-16 Parallel GC threads on 40C Skylake nodes, and their overall conclusions are pretty bizarre, recommending either 2 or 20 Parallel GC threads.

What else am I missing for throughput-optimized Java?

Twerk from Home
You might try the Jetbrains decompiler? I've had success with that, it's pretty nice.

Twerk from Home
I'm curious if H2 has some benefits that I'm missing, or if I've somehow been using it wrong?

I make some little toy apps in Spring Boot just because I'm very familiar with it, but I've learned the hard way over the years that H2 is not to be trusted in the long run with data, because it seems to be very vulnerable to corruption if the process ever has a dirty shutdown in a way that I haven't seen other databases fail. I've seen this basically as long as I've attempted to use H2, and I get the impression that H2 is only ever used for testing applications locally with transient data.

I've compensated by replacing H2 with SQLite, but sqlite usage seems to be outside of the (non-android) Java culture, and I'm curious to hear what people are reaching for in an embedded database. Is SQLite king of the space like I think it is, or should I check out any other options?

Twerk from Home

Janitor Prime posted:

I really dislike teaching newbies the streaming APIs, what was wrong with doing it the old fashioned way with for loops. I feel like stuff was so much simpler to reason about.

I realize you were just having a minor gripe, but the streams interface is so much nicer for simple parallelization than alternatives like OpenMP #pragma omp parallel for or something in C++-land. Also, the streaming interfaces are much easier to turn into reactive non-blocking concurrent code as well if your workload works better that way.
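
As a small illustration of the parallelization point, here is the same computation written as a loop and as a stream, where adding `.parallel()` is the only change needed to spread the work across cores. The `SumOfSquares` class is a made-up example, and for work this cheap the parallel version is not necessarily faster.

```java
import java.util.stream.LongStream;

class SumOfSquares {
    // Old-fashioned loop: single-threaded.
    static long withLoop(long n) {
        long total = 0;
        for (long i = 1; i <= n; i++) {
            total += i * i;
        }
        return total;
    }

    // Same computation as a stream; parallel() splits the range across
    // the common fork-join pool.
    static long withParallelStream(long n) {
        return LongStream.rangeClosed(1, n).parallel().map(i -> i * i).sum();
    }

    public static void main(String[] args) {
        System.out.println(withLoop(1_000) + " " + withParallelStream(1_000));
    }
}
```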
