So my eldest came home from school yesterday asking whether the stuff she writes for assignments is going to be fed into AI systems. Fair question, right? Apparently one of her teachers mentioned something about plagiarism detection and model training in the same breath, and she's now convinced her work is being harvested.
I tried to explain the difference between detection and training data, but honestly I half-fumbled it. Because here's the thing: I work on this exact problem from the other end. We're building assignment grading systems for universities. And while I'm fairly confident we're not scraping student essays without consent, I also know the industry moves fast and the paperwork gets messy. I don't have the full picture of what happens to the data we process. I just make the code work.
But it's been nagging at me since she asked. There's a weird tension between "this is how the technology works, kid" and actually believing the institutions handling her work have her interests in mind. The shower thought won't leave: if I wouldn't be comfortable with her data being used a certain way, why am I comfortable building systems that handle thousands of other people's kids' data the same way? Back to it.