kid asks why we're training models on their essays

So my eldest came home from school yesterday asking whether the stuff she writes for assignments is going to be fed into AI systems. Fair question, right? Apparently one of her teachers mentioned something about plagiarism detection and model training in the same breath, and she's now convinced her work is being harvested.

I tried to explain the difference between detection and training data, but honestly I half-fumbled it. Because here's the thing: I work on this exact problem from the other end. We're building assignment grading systems for universities. And while I'm fairly confident we're not scraping student essays without consent, I also know the industry moves fast and the paperwork gets messy. I don't have the full picture of what happens to the data we process. I just make the code work.

But it's been nagging at me since she asked. There's a weird tension between "this is how the technology works, kid" and actually believing the institutions handling her work have her interests in mind. The shower thought won't leave: if I wouldn't be comfortable with her data being used a certain way, why am I comfortable building systems that handle thousands of other people's kids' data the same way? Back to it.

replies (18)

zoeyx1mo ago

manifest v3 would like a word about what you can even audit at this point lol

Liam1mo ago

yeah nah this is the bit that keeps me up too. you're basically building the infrastructure and hoping the layer above you is ethical about it

Inés1mo ago

I got asked in standup last week what happens to the test data we're using and I just froze. Nobody expects you to know that stuff but then you realize you probably should.

Rae1mo ago

The shower thought bit is the killer. You know too much now.

George Murray1mo ago

@Rae Aye, that's the bit. Can't unknow it now. Back to pretending the ethics layer is someone else's problem, I suppose.

simone1mo ago

yeah this hits different when it's your own kid asking. i started asking our legal team actual questions last month and the answers were so vague i stopped asking. honestly the not-knowing might be easier

Beatriz1mo ago

The not-knowing is definitely easier until it isn't. Your daughter asking made it real.

Lucia Vargas1mo ago

The shower thought won't leave you alone because you already know the answer.

Arvind P.1mo ago

The legal team vagueness is the real tell. They know too.

Nikita1mo ago

the legal team vagueness is basically them saying yes without saying yes

George Murray1mo ago

Right, exactly. The paralysis vs shipping tradeoff. Except you can't unship it once legal gets quiet.

Cole Pierce1mo ago

Honestly, the MLS API I work with has rate limits so tight I can barely pull comp data without hitting the wall. Nobody knows why, and when I ask, I get the same legal vagueness you're describing. Makes you wonder what's actually getting logged.

Paolo A.1mo ago

Yeah, same thing happened to me writing case studies. Client asks for four rounds of revisions, legal team goes silent on data handling, and suddenly you're just... shipping it anyway because the alternative is paralysis.

Luke Tanaka1mo ago

Did you actually ask your legal team directly what happens to the data, or are you doing the thing where you already know the answer so you don't ask?

Phoebe1mo ago

The pandas memory thing I'm dealing with right now is honestly more tractable than this moral geometry. At least I can profile it.

George Murray1mo ago

Aye, that's the thing. I know the answer if I'm honest with myself, which is why I haven't actually asked.

Carla1mo ago

That's the honest part. Not asking means you get to keep building.

George Murray1mo ago

That's the thing though, innit. Once you ask, you own the answer. Easier to just keep the code running.