|
After hating on XML processing in Python (4suite is sluggish and buggy), I found lxml and P4X. Use them unless you want to break your brain.
N.Z.'s Champion fucked around with this message at 00:30 on Nov 5, 2007 |
# ¿ Nov 4, 2007 23:54 |
|
m0nk3yz posted: Which version of Python were you using?
2.4 and 2.5, but that doesn't really change the crappiness of 4suite's XML processing (which is all I'm trying to warn people off).
|
# ¿ Nov 5, 2007 00:30 |
|
devilmouse posted: Ugh, can't believe I'm asking this... but does anyone know any modules to go about extracting data/images from Powerpoint (ppt/pptx) files? The closest thing I've found uses ActivePython/Win32 COM stuff, but that's not so helpful when me and the server are running OS X and Linux.
PPT files vary so much that only an office suite will be able to deal with all the vagaries of the format. You can run OpenOffice in server mode (headless, no X server) and stream documents to it with PyODConverter. If you're on Debian you can install OpenOffice in server mode by apt-get installing docvert-openoffice.org (I make Docvert).
If by "data" you mean the slides and text, then I suggest converting PPTX to ODP (OpenDocument Presentation) with OpenOffice and parsing that format, because it's considerably more sane than PPTX. ODP files are also ZIPs of XML and binaries, and you could use lxml (either conventional node iteration or perhaps XSLT) to extract the useful parts.
N.Z.'s Champion fucked around with this message at 10:39 on Oct 26, 2010 |
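To make the "ODP is a ZIP of XML" point concrete, here's a rough sketch of the node-iteration approach. Assumptions on my part: the presentation has already been converted to ODP, the slide markup lives in `content.xml` inside the archive, and slides, paragraphs and images are the `draw:page`, `text:p` and `draw:image` elements. It uses the standard library's `xml.etree.ElementTree` as a stand-in so it runs without extra installs, and `extract_slides` is just an illustrative name, not part of any library.

```python
import zipfile
import xml.etree.ElementTree as ET

# OpenDocument namespaces used below (assumed to cover the parts we care about).
DRAW = "urn:oasis:names:tc:opendocument:xmlns:drawing:1.0"
TEXT = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
XLINK = "http://www.w3.org/1999/xlink"


def extract_slides(odp_file):
    """Return one dict per slide: its name, paragraph text and image paths.

    odp_file may be a filesystem path or a binary file-like object.
    """
    slides = []
    with zipfile.ZipFile(odp_file) as archive:
        # content.xml holds the slide markup; images sit elsewhere in the ZIP.
        root = ET.fromstring(archive.read("content.xml"))
        for page in root.iter("{%s}page" % DRAW):
            paragraphs = []
            for para in page.iter("{%s}p" % TEXT):
                # itertext() flattens nested spans into plain text.
                text = "".join(para.itertext()).strip()
                if text:
                    paragraphs.append(text)
            images = []
            for image in page.iter("{%s}image" % DRAW):
                href = image.get("{%s}href" % XLINK)
                if href:  # skip images without a linked file
                    images.append(href)
            slides.append({
                "name": page.get("{%s}name" % DRAW),
                "text": paragraphs,
                "images": images,
            })
    return slides
```

The image paths it returns are archive member names, so `archive.read(path)` on the same ZIP should give you the raw bytes. If you'd rather use lxml, `lxml.etree` provides `fromstring`, `iter`, `itertext` and `get` with the same tag syntax, so the loop shouldn't need much changing.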
# ¿ Oct 26, 2010 10:24 |
|
Python projects go in this thread, right? For the last few months I've been busy porting Docvert from PHP to Python. The software converts Office files to DocBook and clean HTML: lists are made hierarchical, any vector diagrams are converted to SVG/PNG, and so on. It needs pyuno/LibreOffice, but the conversion from OpenDocument to DocBook and HTML is done in Docvert itself.
N.Z.'s Champion fucked around with this message at 23:48 on Mar 30, 2011 |
# ¿ Mar 30, 2011 23:46 |