Mimecast Threat Intelligence: How ChatGPT Upended Email
Mimecast researchers built a detection engine to determine whether a message is human- or AI-generated, trained on a mixture of current and historical emails plus synthetic AI-generated emails
Key Points
- AI tools enable threat actors to generate well-constructed and contextually accurate emails.
- Overall, generative AI emails have a polished and professional tone, making them more convincing.
- Analyst investigations should increasingly look for words and phrases associated with generative AI models and not just the sender information and payloads.
UPDATE: This blog was originally published on September 30, 2024 as a follow-up to Mimecast's Global Threat Intelligence Report 2024 H1. We have since updated the data reflected herein as we have continued to closely monitor the use of generative AI tools like ChatGPT by cybercriminals in their efforts to craft more believable phishing emails and other malicious content. We will continue to monitor this attack method and report updated findings when relevant.
References to generative AI are rising sharply across cybersecurity media, and publications increasingly cite its use by malicious actors, a development with strong potential to negatively impact any targeted organization.
When interviewing Mimecast threat researchers for our Global Threat Intelligence Report 2024 H1, we asked about the pervasiveness of AI in phishing emails, but no metrics could be quantified at the time. That left open questions: how prevalent is AI-generated phishing, and can it be measured? Our data science team took on the challenge, building a detection engine to determine whether a message is human- or AI-generated, trained on a mixture of current and historical emails and synthetic AI-generated emails.
The research pinpointed the moment we began observing an upward trend in AI-generated emails, correlating with the release of ChatGPT. We also observed malicious AI-generated BEC, fraud, and phishing emails. The practical takeaway is that analysts, security teams, and end users need to understand the indicators of AI-generated content so they can spot these attacks.
Telltale Signs of AI-Generated Emails
ChatGPT made AI-assisted email writing accessible to everyone, including malicious actors, and it is not the only tool available to them; in a previous blog post we outlined some of the generative AI tools they use. Previously, such tools were aimed mainly at businesses. Now, anyone can use AI to write well-crafted emails suited to various situations. As AI-generated content becomes more prevalent, discerning between human-written and machine-generated text has become increasingly difficult. One of the most notable characteristics of AI language models is their use of complex words and sentence structures, which can reveal their involvement in writing. Researchers at Cornell University found that AI language models favor certain words in scientific writing. “Analyzing 14 million papers from 2010-2024, they noticed a sharp increase in specific ‘style words’ after late 2022, when AI tools became widely available. For example, ‘delves’ appeared 25 times more often in 2024 than before. Other AI-favored words include ‘showcasing,’ ‘underscores,’ and ‘crucial.’”
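The style-word signal described above is easy to illustrate. The sketch below counts occurrences of a few of the words the Cornell study highlighted; the word list here is a small illustrative subset, not the study's full list or any production detection logic.

```python
import re
from collections import Counter

# Illustrative subset of the "style words" the Cornell study found to spike
# after late 2022; the study's full list is considerably larger.
AI_STYLE_WORDS = {"delves", "showcasing", "underscores", "crucial"}

def style_word_hits(text: str) -> Counter:
    """Count occurrences of known AI-favored style words in a message body."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(t for t in tokens if t in AI_STYLE_WORDS)

sample = ("This report delves into our findings, showcasing data "
          "that underscores a crucial trend.")
print(style_word_hits(sample))
```

A high hit count alone is weak evidence, since humans also use these words; it becomes meaningful only in aggregate, as in the Cornell paper's year-over-year comparison.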
How We Know ChatGPT Changed Email
Mimecast’s data science team set out to train a model on the differences between human- and AI-written emails. In total, over 20,000 emails were used: human-written messages from Mimecast’s data, plus synthetic data generated by several LLMs (OpenAI’s GPT-4o, Anthropic’s Claude 3.5 Sonnet, Cohere’s Command R+, AI21’s Jamba Instruct, and Meta’s Llama 3). The resulting deep learning model learned which linguistic characteristics mark each data point as human- or AI-written. For testing, to ensure that our model did not overfit to the training set but could generalize well, we used four datasets:
- 10,000 human-written email samples from Mimecast
- 7,000 LLM-generated synthetic emails
- Human and LLM dataset from Kaggle (link)
- Fraud dataset from Kaggle (link). All emails are assumed to be human-written, as they were collected before the rise of LLMs
Once training was complete, the model was shown one email at a time and asked to determine whether it was written by a human or by AI. We repeated this exercise 200,000 times on different sets of emails, using the model to predict, for each sampled email, whether it was human- or AI-written. The results, shown in Figure 1, highlight the increase in AI-written emails. Importantly, the model was not built to identify malicious AI-written emails, but to estimate the pervasiveness of AI. Before this study we knew AI-written messages were being seen; we did not know at what scale.
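The evaluation loop itself is straightforward. The sketch below uses a hypothetical stand-in classifier (a toy phrase-count scorer with a made-up threshold) purely to show the batch-prediction shape; Mimecast's actual model is a trained deep learning classifier, not this heuristic.

```python
from collections import Counter

# Toy stand-in for the trained classifier: flag a message as AI-written when
# it contains enough phrases associated with LLM output. The phrase list and
# threshold are illustrative assumptions, not the production model.
LLM_PHRASES = ("i hope this message finds you well", "delve",
               "showcasing", "underscores")

def predict(email: str) -> str:
    body = email.lower()
    score = sum(body.count(p) for p in LLM_PHRASES)
    return "ai" if score >= 2 else "human"

def evaluate(emails):
    """Run the classifier over a batch and tally predicted labels."""
    return Counter(predict(e) for e in emails)

batch = [
    "I hope this message finds you well. This report delves into Q3 results.",
    "Hey, running 10 min late - start without me.",
]
print(evaluate(batch))  # Counter({'ai': 1, 'human': 1})
```

Running this kind of loop over monthly samples is what produces the per-month counts plotted in the figures that follow.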
Figure 1 – Human- vs. AI-Written Benign and Malicious Emails Per Month
We sampled 2,000 emails per month from January 2022 to March 2025. In this dataset, as much as 10% of emails in a single month were AI-written, as depicted in Figure 2. Just as importantly, the line chart shows not only a marked increase in the use of AI to write emails but a corresponding reduction in purely human writing, consistent with what other publications are reporting. Whether this is driven by non-native English speakers or by writers using AI to polish their messages is unknown at present.
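The per-month percentages behind a chart like Figure 2 reduce to a simple aggregation over (month, label) pairs from the classifier. A minimal sketch, with made-up sample values:

```python
from collections import defaultdict

def ai_share_by_month(labels):
    """Given (month, label) pairs, return the AI-written percentage per month."""
    totals, ai = defaultdict(int), defaultdict(int)
    for month, label in labels:
        totals[month] += 1
        if label == "ai":
            ai[month] += 1
    return {m: 100 * ai[m] / totals[m] for m in sorted(totals)}

# Illustrative sample, not the real monthly data.
sample = ([("2022-01", "human")] * 19 + [("2022-01", "ai")]
          + [("2024-06", "human")] * 18 + [("2024-06", "ai")] * 2)
print(ai_share_by_month(sample))  # {'2022-01': 5.0, '2024-06': 10.0}
```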
Figure 2 – Percentage of Human- vs. AI-Written Emails Per Month
We then used the end-user-submitted data identified as malicious to rank the most common topics by word frequency. As you can see in Figure 3, there are many commonalities relating to tax, banking, and other phishing topics. It also shows that recurring yearly themes are more pervasive than anything else.
Figure 3 – Top Topics in LLM-Written Emails
We then analyzed distinctive features of malicious LLM-generated emails and identified the 30 most common words, as seen in Figure 4. The frequent presence of formal greetings like "I hope this message finds you well" and the repetitive closing phrase "Thank you for your time" further validates the use of LLMs, consistent with our earlier findings.
Figure 4 - Top Words in LLM-Written Emails
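A top-words analysis like the one behind Figure 4 can be sketched with a stopword-filtered frequency count. The stopword set and two-message corpus below are illustrative placeholders, not the production pipeline or dataset.

```python
import re
from collections import Counter

# Minimal stopword list for illustration; a real analysis would use a
# fuller list and a much larger corpus.
STOPWORDS = {"i", "a", "an", "the", "to", "of", "and", "is", "in",
             "this", "that"}

def top_words(emails, n=30):
    """Return the n most common non-stopword tokens across a corpus."""
    counts = Counter()
    for email in emails:
        counts.update(w for w in re.findall(r"[a-z']+", email.lower())
                      if w not in STOPWORDS)
    return counts.most_common(n)

corpus = [
    "I hope this message finds you well. Thank you for your time.",
    "I hope this message finds you well. Please review the attached invoice.",
]
print(top_words(corpus, n=5))
```

On real data, stock LLM phrasings such as "I hope this message finds you well" push their component words toward the top of exactly this kind of ranking.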
Examples of AI-Generated Emails
During the process of reviewing the submissions, a few malicious examples were found containing distinctive language.
Example #1 of Gen AI spam message
Indicators:
- "delves into the intricacies of", "navigating through the complexities of"
- Overuse of bullets
Example #2 of Gen AI BEC message
Indicators:
- ‘I hope this message finds you well'
- Repetition of the words ‘gift cards’ and ‘surprise’
Example #3 of Gen AI BEC message
Indicators:
- ‘Hello!’
Example #4 of Gen AI phishing message
Indicators:
- ‘delve deeper into this’
- ‘stumbled’ or ‘stumbled upon’
- Long dashes (em dashes), which ChatGPT frequently uses
Recommendations
These findings indicate that manual phishing investigations should remain a crucial layer of defense, especially for messages flagged by end users. Threat researchers should scrutinize the language for specific markers that align with our findings. By cross-referencing indicators such as “delve deeper into this” or “Hello!” (particularly from senders who do not normally use such language) with known threat patterns, analysts can identify phishing threats more effectively, reducing remediation time and mitigating organizational risk.
As always, security teams should ensure their indicators evolve alongside large language models and new data sets.
To learn more about Mimecast's most recent threat research discoveries, please be sure to read our Global Threat Intelligence Report 2024 H2 and visit our Threat Intelligence Hub.