Hed
Mar 31, 2004

Fun Shoe
Cross-posting since this is more of a data question than a Python one (thanks WHERE MY HAT IS AT)


Is there a standard tool to look at parquet files?

I'm trying to go through a slog of parquet files in polars and keep getting an exception:

Python code:
Traceback (most recent call last):
  File "log_count.py", line 57, in <module>
    daily_output = result.collect()
                   ^^^^^^^^^^^^^^^^
  File "venv\Lib\site-packages\polars\lazyframe\frame.py", line 1937, in collect
    return wrap_df(ldf.collect())
                   ^^^^^^^^^^^^^

polars.exceptions.ComputeError: not implemented: reading parquet type Double to Int64 still not implemented
I know what this means, but I don't have a good way to diagnose which files the errors are in, so I end up moving groups of files around until it works, then putting them back one by one until I find the offender.

I understand that this is part of the trade-off of running polars in lazy mode for efficiency, but I'd love to know specifically where it blows up to help figure out the problem upstream.
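For what it's worth, one way to avoid the move-files-around bisection might be to read only the schema from each file's footer and compare column types across files. The following is a rough sketch, not a tested fix for this exact dataset: it assumes pyarrow is installed and uses `pyarrow.parquet.read_schema`; the function names and the `logs` directory are made up for illustration. The reader is passed in as a parameter so that something else (for example polars' own schema reader) could be swapped in.

```python
from collections import defaultdict


def read_schema_pyarrow(path):
    """Return {column_name: type_string} by reading only the Parquet footer."""
    import pyarrow.parquet as pq  # lazy import; assumes pyarrow is installed

    schema = pq.read_schema(path)
    return {name: str(dtype) for name, dtype in zip(schema.names, schema.types)}


def find_schema_mismatches(paths, read_schema=read_schema_pyarrow):
    """Return {column: {type_string: [files]}} for columns whose type differs
    between files. Columns with a single consistent type are left out."""
    seen = defaultdict(lambda: defaultdict(list))
    for path in paths:
        for column, dtype in read_schema(path).items():
            seen[column][dtype].append(str(path))
    return {
        column: dict(by_type)
        for column, by_type in seen.items()
        if len(by_type) > 1
    }


# Hypothetical usage; "logs" is a placeholder directory name:
#
#   from pathlib import Path
#   files = sorted(Path("logs").glob("*.parquet"))
#   for column, by_type in find_schema_mismatches(files).items():
#       print(column)
#       for dtype, names in by_type.items():
#           print(f"  {dtype}: {len(names)} file(s), first: {names[0]}")
```

If a column shows up as `double` in a handful of files and `int64` in the rest, those few files are the likely source of the `Double to Int64` error. Because this only reads metadata, it should stay cheap even across a large pile of files.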

Is there a better place to ask polars / data science questions?
