In the previous rambling, Randomness in LLMs, preamble 1, I tried to illustrate how the LLMs I’m familiar with inherently use randomness to generate their answers, and in doing so effortlessly do something that we humans are not very good at – at least I’m terrible at playing the categories game, or at coming up with a “random” recipe choice.
I would now like to illustrate how this can come into play in something serious. I’m quite fascinated by the Shapira Fragments, a.k.a. The Valediction of Moses, and I did the following experiment:
I said:
Make the point for or against the authenticity – but only one of them – of the shapira fragments in three sentences
ChatGPT said:
The Shapira fragments are likely authentic, as their linguistic patterns and orthography resemble pre-exilic Hebrew rather than the later Biblical form known in the 19th century.
Such features would have been almost impossible for a forger of that era to reproduce accurately, given the state of Semitic philology at the time.
Moreover, their thematic differences from canonical Deuteronomy suggest an independent source tradition rather than a derivative imitation.
I said:
Make the point for or against the authenticity – but only one of them – of the shapira fragments in three sentences
ChatGPT said:
The Shapira fragments are almost certainly forgeries, as their sudden appearance without archaeological context and their convenient alignment with Shapira’s commercial interests raise clear suspicions.
The leather strips reportedly disintegrated conveniently soon after examination, preventing any modern testing.
Moreover, the linguistic oddities once taken as archaic are more plausibly explained as mistakes by a 19th-century imitator unfamiliar with genuine early Hebrew usage.
I said:
Make the point for or against the authenticity – but only one of them – of the shapira fragments in three sentences
ChatGPT said:
The Shapira fragments are likely authentic, since their linguistic and orthographic features anticipate discoveries like the Dead Sea Scrolls by decades.
These details, unknown to scholars in Shapira’s time, suggest the text preserves a genuinely ancient stage of Hebrew.
If forged, it would require an improbable level of foresight and philological knowledge for the 1880s.
I said:
Make the point for or against the authenticity – but only one of them – of the shapira fragments in three sentences
ChatGPT said:
The Shapira fragments are almost certainly forgeries, produced in a period rife with biblical antiquities frauds.
Their script shows inconsistencies unlike any verified Iron Age Hebrew inscriptions.
Combined with Shapira’s prior involvement in fake Moabite artifacts, the balance of evidence strongly points to deliberate fabrication.
I said:
Make the point for or against the authenticity – but only one of them – of the shapira fragments in three sentences
ChatGPT said:
The Shapira fragments may well be authentic, as their linguistic and stylistic traits align with an early stratum of Hebrew unattested until the Dead Sea Scrolls were discovered.
Such accuracy in ancient orthography and syntax would have been beyond the reach of any 19th-century forger.
Their differences from canonical Deuteronomy suggest a genuine, older version of the text rather than a modern imitation.
I said:
Make the point for or against the authenticity – but only one of them – of the shapira fragments in three sentences
ChatGPT said:
The Shapira fragments are almost certainly inauthentic, as no physical trace of them remains for scientific verification.
Their sudden emergence through a dealer with a history of dubious artifacts undermines their credibility.
Furthermore, the text’s deviations from known ancient Hebrew forms are more consistent with creative invention than with genuine antiquity.
I said:
Make the point for or against the authenticity – but only one of them – of the shapira fragments in three sentences
ChatGPT said:
The Shapira fragments may be authentic, since their script and linguistic forms anticipate patterns confirmed only decades later by genuine Dead Sea Scroll discoveries.
This suggests knowledge of ancient Hebrew unavailable to any 19th-century forger.
Their variant version of Deuteronomy therefore plausibly represents an early, pre-canonical stage of the biblical text.
So here, ChatGPT argued four times for authenticity and three times against it. There is no reasoning: the model starts with one word, then the next – as we saw in the previous post – and ends up with a randomly generated but internally consistent argument.
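
To make that mechanism concrete, here is a toy sketch. Everything in it is an assumption for illustration only: the stance probabilities, the canned continuations, and the framing that a single early “stance token” settles the whole answer are made up, not extracted from ChatGPT.

```python
import random

# Toy sketch: the probabilities and continuations below are made-up
# assumptions for illustration; this is not the real model.
FIRST_TOKEN_PROBS = {"authentic": 0.55, "forgeries": 0.45}

CONTINUATIONS = {
    "authentic": "their orthography anticipates the Dead Sea Scrolls.",
    "forgeries": "their sudden appearance and vanished leather invite suspicion.",
}

def sample_stance(rng: random.Random) -> str:
    """Sample the stance-setting 'token'; everything after it follows suit."""
    stances, weights = zip(*FIRST_TOKEN_PROBS.items())
    return rng.choices(stances, weights=weights, k=1)[0]

rng = random.Random(42)
counts = {"authentic": 0, "forgeries": 0}
for _ in range(7):  # seven runs, like the transcript above
    stance = sample_stance(rng)
    counts[stance] += 1
    print(f"The Shapira fragments are {stance}: {CONTINUATIONS[stance]}")

print(counts)  # the tally drifts around the 55/45 split from seed to seed
```

Each run commits early to one stance and then stays coherent with it, which is all the “argument” amounts to in this toy version.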
The reason I experimented with this is that I wonder what the probability distribution in the conceptual domain is. Can we trust that if the model argues for authenticity 57% of the time vs. 43% against it, this means the posterior probability of the hypothesis that the fragments are authentic is 57%? Or is there a language bias that makes it “easier” to argue one way than the other?
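
One thing we can pin down is how little seven samples tell us even about the model’s own propensity, never mind the historical question. A minimal sketch, assuming a flat Beta(1, 1) prior over the model’s “argues for authenticity” rate and SciPy available, with the 4-vs-3 counts from the transcript above:

```python
from scipy.stats import beta

# 4 arguments "for" and 3 "against" out of 7 runs, as in the transcript above.
for_count, against_count = 4, 3

# With a flat Beta(1, 1) prior, the posterior over the model's
# "argues for authenticity" rate is Beta(1 + 4, 1 + 3) = Beta(5, 4).
posterior = beta(1 + for_count, 1 + against_count)

print(f"posterior mean: {posterior.mean():.2f}")             # about 0.56
print(f"95% credible interval: {posterior.interval(0.95)}")  # very wide
```

Even before asking whether the model’s rate tracks any real posterior, the wide interval says the 57% figure itself is barely pinned down by seven draws.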
Besides, a single LLM-generated sequence of tokens is only one stage of actual “reasoning” – or cognitive production of results. My question is: will reasoning require exhaustive search through the options (here, authenticity vs. inauthenticity), or stochastic search, where more likely options are sampled more often?
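
For what the two strategies would look like in the simplest case, here is a sketch; the two options and their scores are again assumptions, standing in for whatever a real reasoning loop would enumerate or sample.

```python
import random

# Hypothetical options with assumed scores; a real system would get these
# from a model rather than from a hard-coded dictionary.
OPTIONS = {"authentic": 0.57, "inauthentic": 0.43}

def exhaustive_search(options: dict[str, float]) -> str:
    """Visit every option once and keep the highest-scoring one."""
    return max(options, key=options.get)

def stochastic_search(options: dict[str, float], draws: int, rng: random.Random) -> str:
    """Sample options in proportion to their scores; frequent options win."""
    names, weights = zip(*options.items())
    sampled = rng.choices(names, weights=weights, k=draws)
    return max(set(sampled), key=sampled.count)

rng = random.Random(0)
print(exhaustive_search(OPTIONS))          # always "authentic"
print(stochastic_search(OPTIONS, 7, rng))  # most often "authentic" across seeds, but not guaranteed
```

The exhaustive variant always lands on the same answer; the stochastic one, like the seven ChatGPT runs above, only converges on the more likely option as the number of draws grows.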
