Prosecraft is the latest example of ripping off art for “helpful” tech

Rows and rows of books, from the floor to the ceiling, inside The Book Exchange, a used books and puzzles shop at 8719 South U.S. Highway 1, in Port St. Lucie. "It's like a dream come true," said Meghan Wood, the new owner. The shop specializes in selling and trading quality used paperback and select hardback books. Tcn Book Exchange

This week, a website by the name of Prosecraft.io caused quite a bit of drama and sparked renewed conversations about the way art is being safeguarded (or not) in the era of AI. If you follow authors on social media, you’ve likely seen it talked about quite a bit. If not, strap in for a cautionary tale about artistic consent.

The website Prosecraft and its related writing program Shaxpir (pronounced Shakespeare) were designed by a tech entrepreneur named Benji Smith. Prosecraft is “dedicated to the linguistic analysis of literature, including more than 25,000 books by thousands of different authors.” It counts up how many words are in a given novel and tries to provide breakdowns such as which words add to its “vividness.” The site is framed as a service meant to help authors improve their writing.

The rub is that Smith fed those 25,000 books into his program without seeking any consent from the authors or publishers responsible for putting them out into the world. And while Prosecraft isn’t new (Smith has actually been working on iterations of it for upwards of a decade), it came under a wave of scrutiny this week when authors finally caught wind of how their works were being repurposed for this site. Those 25,000 books are all copyrighted material; Smith didn’t seek any permission to use them, instead stating that he thought he was “honoring the spirit of the Fair Use doctrine” since he only published snippets of the 25,000 books, not the full text (although the full text was run through Prosecraft’s AI-powered algorithms).

Author Zach Rosenberg was one of the first to bring this issue to the attention of the wider authorial community when he asked that his book be removed from Shaxpir’s analysis program, emphasizing that neither he nor his publisher consented to Smith’s use of his work. From there, the snowball gathered speed. Little Fires Everywhere author Celeste Ng counted 20 novels each from Stephen King and Jodi Picoult in Prosecraft’s library, as well as plenty more from authors like The Song of Achilles writer Madeline Miller and The Hate U Give author Angie Thomas.

Days earlier, prominent sci-fi and fantasy authors like Devin Madson, Megan E. O’Keefe, Andrea Stewart and Richard Swan noticed that their works were being used by Prosecraft without their consent as well. Their inquiries about how Smith acquired their novels went unanswered until the pressure from Rosenberg’s viral thread finally prompted Smith to respond.

Prosecraft has been taken down

Under pressure from so many authors, Smith wrote on his blog that he was taking down Prosecraft. He discussed how he had the idea for Prosecraft while he was writing a memoir, and realized that he needed more information about how many words the average book has in it in order to better craft his own. That led to constructing spreadsheets based around the books on his own shelves, before he eventually gathered many, many more books by “crawling the internet.” Smith doesn’t go into detail about what exactly that means, but it’s worth noting that many of the novels in question are not available for free online, barring piracy.

While Smith sounds sincere enough in his apology, people are still wary. He makes no mention of actually purging Prosecraft’s library of dubiously obtained books, which means that there’s nothing to stop them being used in a future version. Nonetheless, Smith told “the community of authors” that “I hear your objections” and offered his “sincerest apologies.”

"In the future, I would love to rebuild this library with the consent of authors and publishers. I truly believe these tools are useful for creative people. But now is not the right time. I understand. And I’m sorry."

Part of why people got so upset is the ongoing uncertainty about how AI might change the profession of writing. As Gizmodo’s Linda Codega points out, Smith’s “AI algorithm” isn’t exactly the same as a Large Language Model AI like ChatGPT. But the fact that Prosecraft similarly obtained all this data without consent essentially means that it runs on…well, stolen data. It’s a similar enough issue to what we’re seeing with generative AI that authors’ alarm is more than understandable, especially when large publishers are already pushing to exploit authors’ works with AI.

Dear tech bros: If you want to be writers, just write more

I’d like to close with a thought. One of the main arguments for “helpful” AI-based programs like Shaxpir and Prosecraft is that they’ll help democratize things like writing and art, allowing anyone to get their ideas out on the page without actually putting in the hours to master a craft. In his apology, Smith says that he wanted to bring authors “a suite of ‘lexicographic’ tools” so they could “compare their own writing with the writing of authors they admire.” He says that after working in computational linguistics and machine learning for upwards of two decades, he was frustrated that “the fancy tools were only accessible to big businesses and government spy agencies.”

I have to push back on this, because all of this data that Smith claims is so hard to find, such as word counts, is just…not? Most of the relevant data, such as how many words are in your average book, can be found relatively easily. It’s not some great, impenetrable mystery; it just requires gumption and Google. Yes, it can be a pain to research things, but the argument that these tools will be the great equalizer falls apart pretty quickly when you realize that most of the secrets they’re claiming to make freely available are already freely available, provided you have the motivation to seek them out.

So I guess what I’m trying to say is, tech bros, rather than spending untold hours on AI-based programs to help you write stories, maybe just…write more? Move into a cabin in the woods, forsake the bonds of your former human life, and devote yourself to the written word and observing the geese like T.H. White. It’s not rocket science! Just writing.

This remains an interesting time in the arts. Something tells me this is far from the last time we’ll be talking about a story like this one.
