  • Locked thread
Monkey Fury
Jul 10, 2001

KernelSlanders posted:

Has anyone had any success using Spark's DataFrame object? Am I missing something, or is the whole thing just a horrendously designed API that can't possibly be useful for anything? Like in pandas you can do df['c'] = df['a'] + df['b']. Is there a simple way to do that in Spark DataFrames? What about df['idx'] = df.id.map(lambda x: np.where(ids == x)[0][0])?

Nope. You can try something like .withColumn(), I guess, but IIRC there's no straightforward way to do it.
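For what it's worth, here's a rough sketch of how the quoted pandas idioms would map onto Spark via .withColumn() and a UDF. All the names here are made up for illustration, and the Spark lines are left as comments since they'd need a running SparkSession:

```python
# The pandas idioms from the quoted post, with (assumed) Spark DataFrame
# equivalents noted in comments. Column names and data are hypothetical.
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30], "id": [7, 8, 9]})
ids = np.array([9, 7, 8])  # external lookup array

# pandas: new column as the sum of two others
df["c"] = df["a"] + df["b"]

# Spark equivalent (not run here):
#   sdf = sdf.withColumn("c", sdf["a"] + sdf["b"])

# pandas: map each id to its position in the external array
df["idx"] = df["id"].map(lambda x: np.where(ids == x)[0][0])

# In Spark this would need a UDF, roughly:
#   from pyspark.sql.functions import udf
#   from pyspark.sql.types import IntegerType
#   idx_udf = udf(lambda x: int(np.where(ids == x)[0][0]), IntegerType())
#   sdf = sdf.withColumn("idx", idx_udf(sdf["id"]))

print(df["c"].tolist())    # [11, 22, 33]
print(df["idx"].tolist())  # [1, 2, 0]
```

So the column-sum case is one .withColumn() call; it's the arbitrary-Python-function case where you get pushed into UDF boilerplate.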

How many other folks in here are doing Scala and Spark work? We just launched a small test cluster recently, and my life is now figuring out how we leverage that plus our existing Vertica (please kill me) store. I'm also learning Scala, because I've found functional programming in Python - my primary day-to-day language - a bit of a hassle, especially since most Spark docs, examples, and use cases are in Scala or apparently Clojure.

