Proof of AI Garbage In, Garbage Out: Incorrect Results Traced to Reddit and Quora

We have repeatedly pressed readers not to use AI because its output is unreliable. For instance, a commenter managed to post an AI-generated definition of fiduciary duty. It missed the critical aspect that fiduciary duty is the highest standard of care under the law and requires the agent to put the principal’s interest before his own. If AI can’t get right something so fundamental, so widely discussed, and not that hard to state correctly, how can it be trusted on any topic?

And that’s before factoring in that AI makes regular users stoopider. Or that Sam Altman has warned: What you share with ChatGPT could be used against you.

If you are still so hapless as to use Google for search and have it sticking its AI search results in your face, those are unreliable too. AI can’t even compile news sources correctly. From Ars Technica:

A new study from Columbia Journalism Review’s Tow Center for Digital Journalism finds serious accuracy issues with generative AI models used for news searches. The researchers tested eight AI-driven search tools by providing direct excerpts from real news articles and asking the models to identify each article’s original headline, publisher, publication date, and URL. They discovered that the AI models incorrectly cited sources in more than 60 percent of these queries, raising significant concerns about their reliability in correctly attributing news content.
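To see how simple such a test is to run, here is a minimal sketch of a Tow Center-style check in Python. The query_model function and the field names are hypothetical placeholders of our own (swap in whichever AI search tool you care to test); the point is only that scoring attribution is a mechanical comparison against known metadata.

from dataclasses import dataclass

@dataclass
class Article:
    excerpt: str
    headline: str
    publisher: str
    url: str

def query_model(prompt: str) -> dict:
    # Hypothetical placeholder: send the prompt to the AI search tool under test
    # and return its claimed attribution as {'headline': ..., 'publisher': ..., 'url': ...}.
    raise NotImplementedError

def misattribution_rate(articles: list[Article]) -> float:
    wrong = 0
    for a in articles:
        claim = query_model(
            "Which article is this excerpt from? "
            "Give the headline, publisher, and URL.\n\n" + a.excerpt
        )
        # Count the query as wrong if either the publisher or the URL is off.
        if (claim.get("publisher", "").strip().lower() != a.publisher.lower()
                or claim.get("url", "").strip() != a.url):
            wrong += 1
    return wrong / len(articles)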

We got another example by e-mail from a personal contact in Southeast Asia. He has taught IT in universities here and in the UK. He’s also an inventor and had a UK business with over 40 employees based on one of his creations. He is now working on two other devices and has a patent issued on one of them. He showed me an early model of one and the super-powerful custom magnets he’d had fabricated to make it work better. His message:

I’ve been using different AIs (ChatGPT, DeepSeek and Luna) for doing some calculations and finding info on stuff like metal properties and then I started noticing errors. Being autistic I pointed this out – Luna said “oops – don’t worry it’ll be right this time”, ChatGPT said it’s right I’m wrong and DeepSeek sulked and refused to interact anymore.

Anyway, I then used some plagiarism-detection tools I got when I was at the uni to find the sources of the data, and the majority came from Reddit and Quora – which are hardly sources of accurate information. There appear to be no mechanisms to check whether the data is correct; they just scrape websites and take it as gospel.

Bottom line is that a lot of what they present is junk. God help us if say medical professionals rely on it. And I can’t see any way out of it except by getting professionals to check the data and that is very expensive.
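For the curious, the core idea behind the sort of plagiarism-style tracing our contact describes is not exotic: break the AI’s answer into overlapping word n-grams (“shingles”) and see which candidate web pages share the most of them. His actual tools are doubtless more sophisticated; the toy Python sketch below is our illustration of the general technique, not his software.

def shingles(text, n=5):
    # Overlapping word n-grams ("shingles") from a piece of text.
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(max(len(words) - n + 1, 0))}

def overlap_score(ai_answer, candidate_page, n=5):
    # Fraction of the answer's shingles that also appear in the candidate page.
    a, c = shingles(ai_answer, n), shingles(candidate_page, n)
    return len(a & c) / len(a) if a else 0.0

# Rank candidate pages (however collected: search results, scraped threads, etc.):
# ranked = sorted(candidates.items(), key=lambda kv: overlap_score(ai_answer, kv[1]), reverse=True)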

Regular readers may recall that we have previously posted on the fact that AI is now being heavily used in medicine, with IM Doc describing the planned outsourcing of diagnosis to AI. From a February 2024 post:

There will be cameras and microphones in the exam room. Recording both the audio and video of everything that is done. The AI computer systems will then bring up the note for the visit from thin air – after having watched and listened to everything in the room. Please note – I believe every one of these systems is done through vast web services like AWS. That means your visit and private discussions with your doctor will be blasted all over the internet. I do not like that idea at all. This is already being touted to “maximize efficiency” and “improve billing”. My understanding from those that have been experimented upon as physicians is that as you are completing the visit, the computer will then begin demanding that you order this or that test because its AI is also a diagnostician and feels that those tests are critical. It will also not let you close the note until you have queried the patient about surveillance stuff – ie vaccines and colonoscopy, even for visits for stubbed toenails. And unlike now when you can just turn that stuff off, it is in control and watching and listening to your every move. The note will not be completed until it has sensed you discussing these issues with your patient and is satisfied that you pushed hard enough.

I understand also that there is a huge push to begin the arduous task of having AI take over completely things like reading x-rays and path slides. Never mind the medicolegal issues with this – ie does the AI have malpractice insurance? Does it have a medical license? Who does the PCP talk to when there is sensitive material to discuss with a radiologist, as in new lesions on a mammogram etc? Are we to discuss this with Mr. Roboto?…

The glee with which the leaders of this profession are jumping into this and soon to be forcing this upon us all gives one a very sick feeling. Complete disregard for the ethics of this profession dating back centuries.

IM Doc later provided a horrorshow example of the hash it makes of transcribing patient notes. In one case, it invented multiple serious illnesses the patient had never had and even a pharmacy that did not exist. Extracted from his message:

This is happening all the time with this technology. This example is rather stark but on almost 2/3 of the charts that are being processed, there are major errors, making stuff up, incorrect statements, etc. Unfortunately – as you can see it is wickedly able to render all this in correct “doctorese” – the code and syntax we all use and can instantly tell it was written by a truly trained MD.

This patient actually came into the office for an annual visit. There was nothing ground-shaking discussed….

This patient is on no meds that are not supplements. There are no prescriptions – and yet we supposedly discussed 90 day supplies from Brewer’s Pharmacy in Bainesville. There is no pharmacy nor town anywhere around here that even remotely sounds like either one. A quick google search revealed a Bainesville MD, far away from where we are – but as far as I can tell there is no Brewer’s Pharmacy there – the only one in the country I could find was in deep rural Alabama.

The last paragraph was literally the only part of this entire write up which was accurate…

This is what I do know, however:

1) Had I signed this and it went in his chart, if he ever applied to anything like life insurance – it would have been denied instantly. And they do not do seconds and excuses. When you are done, you are done. If you are on XXX and have YYY – you are getting no life insurance. THE END.

2) This is yet another “time saver” that is actually taking way more time for those of us who are conscientious. I spend all kinds of time digging through these looking for mistakes so as not to goon my patient and their future. However, I can guarantee you that as hard as I try – mistakes have gotten through. Furthermore, AI will very soon be used for insurance medical chart evaluation for actuarial purpose. Just think what will be generated.

3) These systems record the exact amount of time with the patients. I am hearing from various colleagues all over the place that this timing is being used to pressure docs to get them in and get them out even faster. That has not happened to me yet – but I am sure the bell will toll very soon.

4) When I started 35 years ago – my notes were done with me stepping out of the room and recording the visit in a hand held device run by duracells. It was then transcribed by secretary on paper with a Selectric. The actual hard copy tapes were completely magnetically scrubbed at the end of every day by the transcriptionist. Compare that energy usage to what this technology is obviously requiring. Furthermore, I have occasion to revisit old notes from that era all the time – I know instantly what happened on that patient visit in 1987. There is a paragraph or two and that is that. Compare to today – the note generated from the above will be 5-6 back and front pages literally full of gobbledy gook with important data scattered all over the place. Most of the time, I completely give up trying to use these newer documents for anything useful. And again just think about the actual energy used for this.

5) This recording is going somewhere and this has never been really explained to me. Who has access? How long do they have it? Is it being erased? Have the patients truly signed off on this?

6) This is the most concerning. I have no idea at all where the system got this entire story in her chart. Because of the fake “Frank Capra movie” style names in the document I have a very unsettled feeling this is from a movie, TV show, or novel. Is it possible that this AI is pulling things “it has heard” from these kinds of sources? I have asked. This is not the first time. The IT people cannot tell me this is not happening.

I have no idea why there is such a push to do this – this is insane. Why the leaders of my profession and our Congress are all behind this is a complete mystery.

After we sent IM Doc the sightings from the inventor, he replied:

This week, the students and I had a patient in the office with COVID. A woman with multiple co-morbid conditions, very ill at baseline. She is on both a statin and an SSRI, and amiodarone for her heart issues. There are 3 other drugs – HCTZ, ASA, and occasionally some Advil for pain.

The student was getting ready to give her Paxlovid for her COVID. When confronted with the fact that she is on 3 drugs which are absolutely contraindicated with Paxlovid, and one other that is conditionally contraindicated she informed me that ChatGPT had told her that all were just fine. This young woman is a student at one of our very elite medical schools – and she looked at me and said “Your human brain tells you this is a problem, the AI has millions of documents to look through and has told me this is not a problem…….I will trust the AI”

I said, “Not with my patient, you don’t”.

I have to be honest – I was so concerned about this I did not even know where to start with the student. AI has now officially become a part of the youth brain’s neocortex. I am just about to give up on this entire generation of medical students. It is a lost cause at best.

KLG had a more mundane example:

Trivial but real failure of AI/LLM on a simple question I used as a test after reading about the medical student who loves her some AI.

Query: Oklahoma cheats to win against Auburn
Answer: There are no verified reports or evidence of cheating in the game between Oklahoma and Auburn.

In fact I have a list that includes about 20 links that prove Oklahoma cheated by using the dishonest move of having a wide receiver pretend to leave the field for a substitution and then scoring on a pass play because he was not covered by the Auburn defense. This has been illegal, as in cheating in the form of a “snake in the grass play,” at every level since I played football from third grade through high school. Presearch.com AI is clueless, though.

Harry Shearer has made AI a personal project:

So we again exhort readers: do not use AI. Please discourage others from using AI. Large language models need so much training content that they not only can’t afford to be selective about what they ingest, but are now even eating their own bad output as part of their training sets. If you need remotely accurate answers, you need to opt out.


12 comments

  1. The Rev Kev

    ‘If you are still so hapless as to use Google for search and have it sticking its AI search results in your face, those are unreliable too.’

    You got that right. A few weeks ago I was looking for information on a very small village in Devonshire called Morcombe so I typed in Morcombe Devonshire. The Google AI summary and every single entry on the first page was all about a famous local murder victim named David Morcombe from over twenty years ago and nothing from Devonshire at all. Not the first time I have seen this behaviour either where the Google AI will pick one word in a search term, even if it is spelt differently, and then shove results back on the whole page of what it guesses you want, ignoring the original correct spelling.

      1. TiPi

        There is an Earl of Devon with a castle in Devon… yet ‘Devonshire’ has been the Dukedom… and of course with the English love of geography the Duke of Devonshire’s Cavendish family estates are mainly in Derbyshire with the ancestral haemorrhoid being Chatsworth House in Derbyshire plus Bolton Abbey in Yorkshire. And then there is the Irish estate in County Cork….

        So the answer is that ‘Devonshire’ is in Derbyshire, Yorkshire and County Cork.
        Wherever he lays his hat ‘n’ all that.

        1. The Rev Kev

          And just to throw a spanner in the works you get places like Cornwall. In early WW2 the British were taking down street and town/village signs to confuse any invading Germans. To really confuse them, it would have been better if they had kept the signs in place. :)

      2. Revenant

        Devonians unite, against Victorians and their shiring! But knock it off with the “in England” business, please, it’s the kingdom of Devon! Or possibly Greater Cornwall! ;-)

        There’s a deep historical point here. Neither Devon nor Cornwall was part of the Anglo-Saxon core of the UK. The land was enclosed in ancient times, from the Bronze Age, and the land form is different.

        The Dumnonii tribe held Devon and Cornwall as a Cornish-speaking Brittonic kingdom (indeed, Devon was referred to as East Cornwall, part of a Greater Cornwall covering the whole peninsula) but the kingdom dissolved around the 9th century. A substantial exodus settled Brittany.

        Anglo-Saxons gradually migrated west into Devon, displacing Cornish-speakers. Most placenames in Devon are Anglo-Saxon in origin and most placenames in Cornwall are celtic but if you look carefully, a lot of celtic placenames survive in Devon. The most obvious are the names of the rivers, which are of ancient use (e.g. the Exe is a derivative of the same celtic root as iasc in Irish, meaning fish) but villages like Poltimore (Pwll ty Mor, the pool of the big house) and towns like Dawlish are also celtic in origin.

        This difference in settlement can be seen in the gene pool, which is distinct from that of the rest of England and carries more markers of Wales and Ireland (historic sources of migration to Devon and Cornwall) and Brittany (a recipient – and possibly an ancient source) and fewer of Vikings and Germanic peoples (because it was a long way to come raiding!).

        https://peopleofthebritishisles.web.ox.ac.uk/population-genetics

        I don’t know how fully the Anglo-saxon shire system, of hundreds and reeves, was ever operated in Devon and Cornwall. I suspect not much given their successors the Normans never fully imposed feudalism: the landscape comprised large isolated farms with ancient enclosed fields, rather than villages with the open field system.

        Cornwall proper remained Cornish-speaking and a separate kingdom until much later in English history. Even in Victorian times, acts of Parliament applying in Cornwall were explicitly named as such and even now, the Duchy of Cornwall is very quietly treated differently in Acts of Parliament because the Duke of Cornwall is a duke palatine and has the rights of a monarch unless the King or Queen of England is physically present in the county.

  2. Louis Fyne

    Tesla “Autopilot” and even Uber are one reason why tech is such a mess in that…

    the ***bipartisan*** regulatory pushback—essentially none—against Autopilot and Uber (jitney laws), sure there are many more, essentially green-lit the “ask for forgiveness, not permission” roll-out model when it came to regulatory approval of new widgets.

    I will have my popcorn handy when the tort lawyers arrive.

  3. matt

    I’ve been hesitant to use LLMs because I just don’t see the point. Would rather read through a manual or stackoverflow for answers. But my professors will answer questions with “ask chatgpt” and it’s so confusing. In undergrad classes all the professors were like “dont use chatgpt do it yourself” but the minute i take a grad class the professors are all “use chatgpt to debug your code, use chatgpt to explain this line by line.” I have started incorporating LLMs into my workflow because they are ok at helping with code generation and act like the summary of several stackoverflow questions. I spent several hours doing homework with a chinese friend of mine and she would have chatgpt explain what was going on in chinese then translate pieces into english so I could read them. Chatgpt was wrong about the underlying physics (unsurprisingly) but was once again helpful in producing code to use as a template.
    I say this because people hate on chatgpt but in engineering grad school the professors really want you to use it for homework. (Also some undergrad classes. My fluid mechanics professor was a grad student and not a professor but he too had us use chatgpt for the homework and it gave us incorrect physics but correct code.) The only professors who are like “dont use this” are either liberal arts professors or teaching 100 level classes to freshmen.
    I miss the pre-chatgpt era… i made one of my best friends from a deal where i wrote her english essay, she did my online chemistry homework. I used to help my friend run an essay writing business. No longer.

  4. Another Anon

    Recently, I used an AI to search for research papers done in my scientific specialty. What I got back was a single one written by myself about twenty years ago. Apparently, the AI thought that my paper was the only significant one in the field. I was kind of flattered that the AI thought so highly of a work that only got seven citations when I knew of others that got many more.

  5. TiPi

    I did a recent Google search on Harold Wilson, as there was a debate on how much of a socialist the Huddersfield lad really was. The Colne Valley, his birth place, was actually a Liberal stronghold for many years, and his father had been a Liberal.

    I knew he had been strongly influenced by the Guild Socialism of GDH Cole, the 20thC historian, while at Oxford, and it was actually Cole who had persuaded Wilson to join the Labour Party.
    (Now Guild Socialism is definitely due a re-appraisal for those on the left.. but that is another story)

    Yet the first Google AI search result claimed they might have met, but probably never even did…

    Reportedly, Google intends to make its AI the default, with AI mode as the automatic search response…

    If AI is already erroneously rewriting history at this most basic level, then sadly there will be many future career opportunities at MiniTrue lost to this wondrous technology… Hey-ho…

