Detection of AI-created content


As AI has progressed in generating code, writing, music and other language-based work through LLMs, there has been parallel growth in tools for detecting AI-generated content.

Like everything in AI, detection is a probability game: a detector estimates how likely a given combination of words or tokens is and compares that with what a language model would produce. The comparison is easier when the text comes from a widely used standard model, such as the ones from OpenAI.

There are many methods that an AI detector could use. It could look at frequency patterns or attention patterns, or at variability or entropy across the document. Human writing may have periods of drift, whereas an AI-written script tends to follow the model's distribution. Note, however, that the detector itself also has to be trained, usually as another model: its builders can collect scripts written by AI, compare them with human-written scripts, and train a model on the difference. This trained model can then be used to detect an AI-written script.
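The variability idea above can be illustrated with a small sketch. It assumes whitespace tokenization and a fixed window size, and the function names `shannon_entropy` and `entropy_variability` are hypothetical. It is a toy measurement of how much word-level entropy drifts across a document, not how any particular detector works.

```python
import math
from collections import Counter
from statistics import pstdev


def shannon_entropy(tokens):
    """Shannon entropy (in bits) of the token frequency distribution."""
    counts = Counter(tokens)
    total = len(tokens)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def entropy_variability(text, window=50):
    """Return per-window entropies and their standard deviation.

    Assumption: words are split on whitespace and grouped into
    non-overlapping windows of `window` words; a short tail is dropped.
    """
    tokens = text.lower().split()
    windows = [tokens[i:i + window] for i in range(0, len(tokens), window)]
    windows = [w for w in windows if len(w) == window]
    entropies = [shannon_entropy(w) for w in windows]
    # A larger spread means the text drifts more from window to window.
    spread = pstdev(entropies) if len(entropies) > 1 else 0.0
    return entropies, spread
```

A low spread across windows would be one weak hint of uniformly generated text. On its own it proves nothing, which is why detectors combine many such features in a trained model.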

This is obviously a moving target, since the models are changing fast and so is the output they produce. The models also keep learning new techniques, so signals that once worked may stop working. For example, a feature that was often used for detection is perplexity: lower perplexity usually meant that the text was closer to what a model would expect, and hence more likely to be AI-written.
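To make perplexity concrete, here is a minimal sketch. Real detectors score text with an LLM's token probabilities; as a stand-in, this example assumes a toy unigram model with add-one smoothing built from a small reference corpus. The names `train_unigram` and `perplexity` are hypothetical.

```python
import math
from collections import Counter


def train_unigram(corpus_tokens):
    """Build a unigram probability function with add-one smoothing.

    Assumption: one extra vocabulary slot stands in for all unseen tokens.
    """
    counts = Counter(corpus_tokens)
    vocab_size = len(counts) + 1
    total = len(corpus_tokens)

    def prob(token):
        # Counter returns 0 for tokens it has never seen.
        return (counts[token] + 1) / (total + vocab_size)

    return prob


def perplexity(tokens, prob):
    """Perplexity = exp of the average negative log-probability per token."""
    if not tokens:
        raise ValueError("perplexity is undefined for empty input")
    log_sum = sum(math.log(prob(t)) for t in tokens)
    return math.exp(-log_sum / len(tokens))


reference = "the cat sat on the mat".split()
prob = train_unigram(reference)
expected_text = perplexity(["the", "cat", "the"], prob)
surprising_text = perplexity(["zebra", "quark", "the"], prob)
```

Here `expected_text` comes out lower than `surprising_text`, because the first sequence uses words the reference model has seen often. A detector built on this idea would flag text whose perplexity falls below some threshold, and that threshold would itself have to be tuned on labeled examples.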

I believe that writing fully composed by AI using a standard model will be relatively easy to detect. But as models get more complex and advanced and use more “human” techniques, it will become progressively more difficult to detect small segments of AI-written text embedded within a bigger piece of human creativity.

If you would like to try a detector on the web, use GPTZero at gptzero.me.
