Parahexavoctal posted:
I believe the 'removing the DRM' stage may be illegal, which I'd normally scoff at, but you said the legalities are important here. You could limit your project to publishers that omit DRM (e.g., Baen, Tor), or the vast "free!" sections on Smashwords, Kobo, etc.?

Even if it's a publisher like Tor that does not publish with DRM, the library borrowing software (OverDrive, CloudLibrary, etc.) will add its own DRM layer to prevent you from keeping books checked out indefinitely or sharing them with other people. And most ebook DRM stripping tools do not support library DRM in the first place anyway.

So yeah, the easiest approach is going to be free books. Next easiest (but quite expensive given your book counts) is going to be purchasing DRM-free books. Next easiest after that is going to be purchasing DRMed books and stripping the DRM from them. Using library ebooks for this is going to be quite difficult just on technical grounds.

Actually, the very easiest approach, on a technical level, is probably to find someone you know who is a voracious reader, has lots of recent books specifically, buys DRM-free or removes DRM from the books they buy as a matter of policy, and is willing to let you run your data science project on their personal library. I don't know if your project would be damaged by the bias inherent in a dataset curated by a single reader's tastes, though, nor do I know anyone who has that many recent books specifically; most people do not read a thousand books a year. I'm also not sure if that would satisfy your legal requirements.
# ¿ Oct 12, 2023 01:28 |
# ¿ May 22, 2024 10:43 |