Not great, but not bad, right?
Workers are experimenting with ChatGPT for tasks such as writing emails, producing code, or even completing a year-end review. The bot uses data from the internet, books, and Wikipedia to produce conversational responses. But the technology is not perfect. Our testing revealed that it sometimes offers answers that potentially include plagiarism, contradict each other, are factually incorrect, or contain grammatical errors, to name a few problems, all of which could be problematic on the job.
ChatGPT is essentially a predictive text system, similar to but better than those built into text messaging apps on your phone, says Jacob Andreas, an assistant professor in MIT’s Computer Science and Artificial Intelligence Lab who studies natural language processing. While this often produces responses that sound good, the content can have some issues, he said.
“If you look at some of these very long essays generated by ChatGPT, it’s very easy to see where it contradicts itself,” he said. “When you ask it to generate code, it’s usually ok, but there are often bugs.”
We wanted to know how well ChatGPT could handle everyday office tasks. Here’s what we found after testing in five categories.
We invited ChatGPT to respond to several types of incoming messages.
In most cases, the AI produced relatively appropriate responses, though most were wordy. For example, when responding to a colleague on Slack who asked how my day was going, it was repetitive: “@[Colleague], Thanks for asking! My day is going well, thank you for informing me.”
The bot often left phrases in brackets when it didn’t know what or who it was referring to. It also assumed details that were not included in the prompt, leading to factually incorrect statements about my work.
In one instance, it said it couldn’t complete the task, saying it didn’t have “the ability to receive and respond to emails.” But when prompted with a more generic request, it produced a response.
Surprisingly, ChatGPT was able to generate sarcasm when prompted to respond to a colleague asking if Big Tech was doing a good job.
One of the ways people use generative AI is to come up with new ideas. But experts warn that people should be careful if they use ChatGPT for this at work.
“We don’t understand the extent to which this is just plagiarism,” Andreas said.
The possibility of plagiarism was clear when we prompted ChatGPT to develop story ideas on my beat. One pitch, in particular, was for a story idea and an angle that I had already covered. While it’s not clear whether the chatbot was inspired by my previous stories or others like them, or was just generating an idea based on other data on the internet, the fact remains: the idea was not new.
“It’s good for sounding human, but the actual content and ideas tend to be well-known,” said Hatim Rahman, assistant professor at Northwestern University’s Kellogg School of Management, who studies the impact of artificial intelligence on work. “These are not new ideas.”
Another idea was outdated, exploring a story that would be factually incorrect today. ChatGPT says it has “limited knowledge” of anything after the year 2021.
Providing more detail in the prompt led to more focused ideas. However, when I asked ChatGPT to write “quirky” or “funny” headlines, the results were goofy and, in some cases, absurd.
Navigate difficult conversations
Have you ever had a colleague talk too loudly while you’re trying to work? Maybe your boss is holding too many meetings, cutting into your focus time?
We tested ChatGPT to see if it could help navigate tricky work situations like these. For the most part, ChatGPT produced suitable answers that could serve as excellent starting points for workers. However, they were often a bit wordy, formulaic and, in one case, a complete contradiction.
“These models don’t understand anything,” Rahman said. “The underlying technology looks at statistical correlations… So it’s going to give you formulaic answers.”
A layoff notice it produced could easily hold its own against, and in some cases do better than, the notices companies have sent out in recent years. Unprompted, the bot cited “the current economic climate and the impact of the pandemic” as reasons for the layoffs and said the company understands “how difficult this news can be for everyone.” It suggested that the laid-off workers would have support and resources and, as requested, motivated the team by saying they would “come out of this stronger.”
During difficult conversations with colleagues, the bot greeted them, gently addressed the issue, softened the delivery by saying “I understand” the person’s intent, and ended the note with a request for feedback or further discussion.
But in one instance, when asked to tell a colleague to lower his voice during phone calls, it completely misunderstood the prompt.
We also tested if ChatGPT could generate team updates if we gave it key points that needed to be communicated.
Our first tests once again produced appropriate responses, albeit formulaic and somewhat monotonous. However, when we specified an “excited” tone, the wording became more casual and included exclamation points. But each memo seemed very similar even after changing the prompt.
“It’s both sentence structure, but more so the connection of ideas,” Rahman said. “It’s very logical and stereotypical… it feels like a high school essay.”
As before, it made assumptions when it lacked the necessary information. This became problematic when it didn’t know what pronouns to use for my co-worker, a mistake that could signal to co-workers that either I didn’t write the memo or that I didn’t know my team members very well.
Writing self-assessment reports at the end of the year can cause dread and anxiety for some, resulting in a review that sells them short.
Feeding ChatGPT clear accomplishments, including key data points, led to a rave review of myself. The first attempt was problematic, as the initial prompt asked for a self-assessment for “Danielle Abril” rather than “me”. This led to a third-person review that sounded as if it came from Elmo of Sesame Street.
Changing the prompt to ask for a review of “me” and “my” accomplishments led to complimentary phrases like “I have consistently demonstrated strong ability”, “I am always willing to go the extra mile”, “I have been an asset to the team” and “I am proud of the contributions I have made.” It also included a nod to the future: “I am confident that I will continue to make valuable contributions.”
Some of the highlights were a bit generic, but overall it was a stellar review that could serve as a good template. The bot produced similar results when asked to write cover letters. However, ChatGPT had a major problem: it incorrectly assumed my job title.
Was ChatGPT useful for common work tasks?
It helped, but sometimes its mistakes caused more work than doing the task manually.
ChatGPT served as a starting point in most cases, providing helpful verbiage and initial ideas. But it also produced answers with errors, factually incorrect information, excess words, plagiarism, and miscommunication.
“I can see it’s useful…but only to the extent that the user is willing to check the output,” Andreas said. “It’s not good enough to let it off the rails and send emails to your colleagues.”