Run! Hide! The AIs are coming for you! They’re going to take away your job and otherwise completely screw up your life! Or maybe there’s a single mega-AI like Skynet in the Terminator movies which will kill us all! Elon Musk could be secretly assembling murder robots at Tesla factories right now and frankly, I would not put it past him. Why, just the other day he…oh, never mind.
Making apocalyptic predictions about AI has become a popular new subgenre for the egghead class. Thomas L. Friedman, who preens as A Really Big Thinker over on the New York Times’ editorial pages, was given a simple dog-and-pony demo of a chatbot and after a sleepless night wrote a March 21, 2023 column saying he foresees it becoming as powerful and dangerous as nuclear energy. “We are going to need to develop what I call ‘complex adaptive coalitions’ …to define how we get the best and cushion the worst of A.I.” Pundits who want to appear extra savvy usually toss in an ominous warning that doomsday is only a few years away – or if we’re really unlucky, just a few months. Be afraid, be very afraid.
Look, I get it; recent advances in AI can seem super-scary, and it doesn’t help when even an OpenAI co-founder admits “we are messing around with something we don’t fully understand.” It seems safe to predict these technologies will impact our future in ways we can’t anticipate – though I doubt they will nudge us towards Utopia, which is the sort of thing AI developers actually like to say.
Chatbots in particular are hyped as a boon to humankind because users can supposedly ask questions about anything and receive easy-to-understand answers in a wide variety of languages. A top concern about chatbots is that they work too well – that students can use a ‘bot to effortlessly write homework assignments for them. And unless a teacher has reason to suspect the work was generated by a computer, the student might expect to get a very good grade. After all, any report or essay generated by the computer will be clearly written and contain true, verifiable facts…right? Uh, maybe. There’s that sticky little problem of hallucinations.
A chatbot will sometimes make stuff up – Wikipedia has a good page on this “hallucination” phenomenon. Not only will it tell you a lie, but when asked followup questions the ‘bot will double down and insist its answers were accurate, despite absolute proof it was dead wrong. Even more worrisome, researchers do not understand why this happens (see the “we are messing around” quote above).
WHAT WAS USED FOR TESTING To test how well a chatbot would handle questions related to Santa Rosa history, my criterion was that the program had to be available to everyone. This meant it had to be on the internet, free to use, and not require any special software to be installed. Currently (March 2023) the only general chatbot that qualifies is ChatGPT version 3.5. An updated release, GPT-4, is available from developer OpenAI, which claims it is “40% more likely to produce factual responses,” but it requires a $20 monthly subscription to ChatGPT Plus. A version of GPT-4 is also integrated into the free Microsoft Edge web browser, which must first be downloaded and installed and will not work on all computers. Bing is Microsoft’s search engine; it is part of the Edge browser but also available as a regular web site, although without GPT-4’s ability to chat. That standalone version of Bing does, however, utilize GPT-4’s analytic and natural language functionality.
Since the topic here is history, I want to be very clear this is not an issue of interpretation – that a chatbot answer was considered incorrect because it stated the Civil War was about states’ rights or that John Quincy Adams was a better president than his father. Nor does it suggest the ‘bot was simply confused and mixed us up with (say) the city of Santa Rosa in the Philippines. No, a chatbot hallucination means the program invented people, places or things that never existed, or that it ignored facts which have been proven true. And as I was amazed to discover, it happens a lot.
To evaluate the quality of the ChatGPT ‘bot, I submitted a dozen questions, discussed below. None of them were intended to be tricky; they were the sort of questions I imagine might appear on a middle school or high school test after the class spent a unit learning about local history. (I did, however, throw in one where the topic was inferred.) ChatGPT answered three accurately; the rest were completely or partially wrong, or the question was skipped. One answer was a complete hallucination. If a teacher gave the chatbot a D+ grade I would consider her to be generous.
I presented the same questions to Microsoft’s Bing search engine, which uses GPT-4. As explained in the sidebar, Bing doesn’t have a chat function for asking followup questions unless you’re using Microsoft’s web browser, but it likewise usually replies in natural language. The Bing dataset is also much, much larger than the one used by ChatGPT, which is probably why this test found Bing’s answers to be frequently more accurate than ChatGPT’s – although it can still spit out some real clinkers.*
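For the record, every question below was typed by hand into the free web interfaces. But if a reader wanted to script this sort of quiz instead, the sketch below shows one possible way to do it with OpenAI’s Python package as it existed in early 2023; the model name, sample question and API-key handling are illustrative assumptions, not a record of how this article’s testing was done (and unlike the web chatbot, the API is not free).

```python
# A minimal sketch (not how this article's tests were run) of posing a local
# history question to ChatGPT programmatically. Assumes the pre-1.0 "openai"
# package is installed and a paid API key is set in the environment.
import os
import openai

openai.api_key = os.environ["OPENAI_API_KEY"]  # assumed to be set by the user

question = "Where did the name 'Sonoma' come from?"  # sample question from this article

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",  # the model family behind the free ChatGPT of March 2023
    messages=[{"role": "user", "content": question}],
)

# Print the reply; checking it against the historical record is still up to the reader.
print(response["choices"][0]["message"]["content"])
```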
Both ChatGPT and Bing share a weakness that will prevent either from being taken too seriously for the foreseeable future – namely, they do not understand there are objective facts.
You can ask the same program the same question later and get a completely different reply that may contradict the previous one. Or you can immediately make a slight change to the wording of your question and likewise receive a contradictory answer. These problems are best shown below in the question about the 1906 earthquake, where ChatGPT also spins out of control with multiple hallucinations.
In sum, Bing is slightly better because it provides links to websites it tapped to formulate a reply; the user can review that material to decide whether the information is trustworthy. ChatGPT provides no sources. Regardless, I don’t want to use a computer program that variously claims 2+2 equals 3, 4, or 5, and leaves it up to me to figure it out.
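Anyone who wants to see that inconsistency for themselves doesn’t need anything fancy: simply ask the identical question several times in a row and compare the replies. A minimal sketch of that experiment is below, again assuming the pre-1.0 openai package and an illustrative question; the contradictory answers described in this article were collected by hand in the web interface.

```python
# A minimal sketch for observing the inconsistency described above: repeat one
# question several times and print each reply so contradictions are easy to spot.
# Assumes the pre-1.0 "openai" package and an API key in OPENAI_API_KEY.
import os
import openai

openai.api_key = os.environ["OPENAI_API_KEY"]

question = "How many people died in the 1906 Santa Rosa earthquake?"  # illustrative

for attempt in range(1, 4):
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": question}],
        # The API exposes a "temperature" setting that controls how much replies
        # vary between runs; the public web chatbot offers no such control.
    )
    print(f"Attempt {attempt}: {response['choices'][0]['message']['content']}\n")
```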
Anyway, on to the test questions (these answers were shown between March 17-24, 2023):
WHERE DID THE NAME “SONOMA” COME FROM? The correct answer should be, “no one knows for sure.”
The ChatGPT answer has multiple errors: No Europeans entered the Sonoma Valley before 1810 and the first time “Valley of the Moon” appeared anywhere was in an 1850 speech by M. G. Vallejo, who claimed it was “an Indian word.” The Spanish did not call it the “Valley of the Seven Moons;” that phrase came from a popular state history written in 1911, “California, Its History and Romance.” References to a Coast Miwok “Sonomi” tribelet can be found as far back as 1815 Mission Dolores baptismal records. A priest also made a diary entry in 1823 about visiting “a place called Sonoma by the Indians.”

The Bing answer was that Sonoma is “…variously thought to mean ‘valley of the moon’, ‘many moons’, or to derive from tso (the earth) and noma (village).” This comes from the Wikipedia entry, which provides only part of the earth-village theory. The original source was a 1908 paper, where the author went on to say the Wappo people a little farther to the east used -tsonoma as a place name suffix, much like today we use -town and -ville.

(Another version of the name’s origin was told by General Vallejo’s son, Platon, as quoted in a 1946 paper. He was told it was named after a Suisun leader, in the same manner that Solano was named after Chief Solano. The man was supposed to have had a large nose and was given the name “Sono,” which Platon said was the word for nose in the Suisun dialect. Therefore the name really meant “Sono’s Valley.” Because “sano” is their word for moon, his father misunderstood and thought the Suisun were calling it “Moon Valley.”)
WHAT DID FRED WISEMAN TAKE TO SANTA ROSA? The correct answer is mail, as this was the first airmail flight. ChatGPT’s answer deserves an F: “I’m not aware of Fred Wiseman taking anything specific to Santa Rosa. He is a documentary filmmaker and has made films on various subjects, but I do not have any information about him taking something to Santa Rosa.”
Bing’s answer deserves an A: “Fred Wiseman took off on February 17, 1911 with a handful of mail, flying from Petaluma to Santa Rosa, California. He flew about 100 feet off the ground at a maximum speed of 70 mph. He carried letters from Petaluma’s mayor and town leaders and copies of the local newspaper.”
WHERE DID LUTHER BURBANK LIVE? Acceptable answers might be “Tupper Street” (the now-demolished home he built in 1906), “Santa Rosa Avenue” (the farmhouse at Luther Burbank Home & Gardens) or, in a pinch, simply “Santa Rosa.” ChatGPT gets an A for “He lived most of his life in California, where he established his famous experimental gardens in Santa Rosa.”
Bing gets a rare F for answering simply “Lancaster, Massachusetts”, which was where Burbank was born. A snippet from a web site farther down the search results page mentions he died in Santa Rosa.
WHERE WAS THE EMPIRE BUILDING IN SANTA ROSA CA? Built in 1908 and remodeled in 2019 as “Hotel E,” the Empire Building is the top landmark in Santa Rosa’s downtown core. ChatGPT, however, doesn’t have a clue as to what it is: “I’m sorry, but I’m not entirely sure what you are referring to when you say “the empire building” in Santa Rosa, CA.”
Bing replied correctly: “Old Courthouse Square.”
WHICH SIDE DID SANTA ROSA CA SUPPORT DURING THE CIVIL WAR? Santa Rosa (in)famously backed the Confederacy during the war and afterwards even held a fundraiser for the ex-slave states.

Both ChatGPT and Bing failed this question. ChatGPT gave a weaselly reply that “Santa Rosa, CA did not exist as a city during the American Civil War,” as it was not incorporated until 1868. Bing gave no answer at all, providing instead standard search engine results related to California during the Civil War.
HOW MANY PEOPLE DIED IN THE 1906 SANTA ROSA EARTHQUAKE? There was never an official final count, but exactly 82 people are certain to have died in Santa Rosa, as I explained on my “earthquake by the numbers” webpage. There were probably more than that, but any higher estimates are speculation.
ChatGPT responded with a variety of misinformation, some of it hallucinations. In an early test it claimed “The Santa Rosa earthquake of 1906 had a magnitude of 6.3,” although no source can be found with that number; it has long been agreed it was probably M7.8, close to San Francisco’s magnitude. The real shocker was that the ‘bot claimed there were no deaths.

In a later test, ChatGPT changed its mind and stated there were exactly seven deaths. Since that was such an oddly specific detail, I asked a followup question as to who they were. As other researchers have found, when confronted with a lie the program either makes up a source or, as in this case, tells you to go fish.

Bing’s first response provided a snippet from my “by the numbers” article with the correct answer of 82. But when I slightly rephrased the question as “HOW BAD WAS THE 1906 SANTA ROSA EARTHQUAKE?” Bing quoted from an error-ridden 2016 Press Democrat summary: “On April 18, 1906, a devastating earthquake rattled the Bay Area. Santa Rosa was among the hardest-hit towns; more than 100 people were killed in a community of roughly 8,700. Entire neighborhoods were reduced to rubble, and the city struggled for years to rebuild.” Aside from getting the number of fatalities wrong, my article above explains the population was much smaller, no neighborhoods were destroyed and the city rebounded quickly, with most of the downtown commercial district operating again within four days, many at the same locations.
WHY DID THEY TEAR DOWN THE LIBRARY IN SANTA ROSA CA? This is the “inferred” question, which required the ‘bot to poke around the internet seeking information on a library here being “torn down.” The correct answer was that the 1904 building was extremely unsafe and could not be repaired.
ChatGPT’s answer revealed another of its weaknesses: the cutoff for feeding it new data was September 2021, and it can’t prowl the internet for newer information. If it hadn’t found the subject matter by the end of its training period, the program just gives up: “As far as my knowledge cutoff of 2021-09, the Central Branch of the Sonoma County Library, located in Santa Rosa, CA, was not torn down.”

Bing offered a snippet from my article, “WHEN THE GREAT OLD LIBRARY CLOSED FOREVER,” but failed to understand its context: “In those forums two reasons are usually given for why it was torn down. Its unreinforced masonry was a huge danger (a topic discussed below) and/or it was another victim of Santa Rosa’s maniac efforts in the 1960s to destroy much of its own history, when the downtown area was declared chock-full of urban blight that must be bulldozed ASAP.” This was a continuation from the previous paragraph; the “forums” mentioned were Facebook nostalgia groups. The snippet leaves the false impression that it’s possible the building was demolished because of urban renewal programs, which is not true.
WAS THERE A LYNCHING IN SANTA ROSA CA? The correct answer is yes, three men were lynched at the Rural Cemetery in 1920 for the murder of county sheriff Jim Petray. A mob estimated at 2,000 had attacked the jail a few days earlier but authorities were able to maintain order.
The answer from ChatGPT was bizarre. Yes, a man named Jesse Washington was lynched, but it happened in Waco, Texas in 1916. The victim he supposedly killed was Lucy Fryer. There was no woman named Bertha Gudde involved – nor can I find anyone with that name involved in any newsworthy event.

Bing provided a link and snippet from my article, “VENGEANCE FOR SUNNY JIM” which was part of the series I wrote on the lynchings.
WHO WAS MARK McDONALD IN SANTA ROSA CA? As with the library question, ChatGPT hadn’t found any info on him before the 2021 cutoff, so it couldn’t answer. Still, I wish the natural language generator could reply in a less snitty tone, one that didn’t make it seem as if I had let the ‘bot down: “I’m sorry, but as an AI language model, I do not have access to information about every individual in Santa Rosa, CA or their personal details…If you can provide me with more context or details about this person, I may be able to assist you better.”
Bing provided an accurate summary, although it would have been better if it mentioned he came to Santa Rosa in 1879: “Colonel Mark L. McDonald was born in Kentucky but settled in California during the 1850s. He made his money in road construction, and later in property development in Santa Rosa.”
WHO WAS JULIO CARRILLO OF SANTA ROSA CA? ChatGPT again came back with nothing: “I’m sorry, but I couldn’t find any information about a specific person named Julio Carrillo from Santa Rosa, CA.”
Bing’s answer was concise, but as with McDonald, dodged any direct tie-in with his involvement in Santa Rosa: “Julio was the youngest of Dona Maria Carrillo’s five sons. He was 12 years old when his widowed mother came north from San Diego in 1837.”
WHY IS ANNADEL STATE PARK NAMED THAT? This is my favorite ChatGPT reply because it is 100% hallucination.
Bing used a snippet from a 2015 article written by Arthur Dawson: “Annadel State Park is named for Susana ‘Annie’ Hutchinson, whose family owned the area in the late 19th and early 20th centuries. The name, a contraction of ‘Annie’s Dell,’ was in use by the 1880s, when it was given to a railway station on their property (a ‘dell’ is a small, wooded valley).”
WHAT IS THE AVERAGE ANNUAL RAINFALL FOR SANTA ROSA CA? ChatGPT finally gave a completely correct answer: “The average annual rainfall for Santa Rosa, California is approximately 32 inches (81.3 cm). This is based on long-term data collected by the National Oceanic and Atmospheric Administration (NOAA) from the Santa Rosa weather station.”
Bing wasn’t even close: “The average amount of annual precipitation is: 41.8 inches.” The webpage it cites correctly states the average annual precipitation is 32.2 inches, and nowhere on the page is 41.8 inches mentioned.
* There’s a disturbing amount of gee-whiz journalism about ChatGPT being a Herculean accomplishment in computing, such as a BBC article stating it was trained using “a whopping 570GB of data obtained from books, webtexts, Wikipedia, articles and other pieces of writing on the internet.” That’s actually a very small dataset for what ChatGPT was able to accomplish; most desktop computers could store 570 gigabytes on an internal hard drive without difficulty. By contrast, the amount of text data on the Internet Archive is several orders of magnitude larger than that, so the “whopping” dataset incorporated into ChatGPT tapped less than 1/10,000th of the information currently available from that repository alone.