Tummi

Yet Another Turing Test

Now, in the context of generative AIs and the switch from pattern recognition to pattern creation with neural networks, I would like to propose my own kind of Turing Test:

An AI that is able to code a chess engine and outperform humans at this task.

1A) With hand-crafted eval. 1B) With neural networks.

2A) Outperforms non-programmers. 2B) Outperforms average chess-programmers. 2C) Outperforms top chess-programmers.

3A) An un-self-aware AI, the "RI", restricted intelligence. 3B) A self-aware AI, the "SI", sentient intelligence.

***update 2024-02-14***

4A) An AI based on expert-systems. 4B) An AI based on neural networks. 4C) A merger of both.
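To make variant 1A concrete, here is a minimal sketch of what a hand-crafted evaluation might start from: a material count over a FEN string, using the conventional piece values (pawn=1, knight=3, bishop=3, rook=5, queen=9). This is an illustrative toy, not code from any of the test candidates below.

```python
# Minimal hand-crafted evaluation (variant 1A): material balance from a
# FEN string, positive if White is ahead. Kings are not scored.

PIECE_VALUES = {"p": 1, "n": 3, "b": 3, "r": 5, "q": 9}

def material_eval(fen: str) -> int:
    """Return material balance (White minus Black) for a FEN position."""
    placement = fen.split()[0]  # first FEN field: piece placement
    score = 0
    for ch in placement:
        value = PIECE_VALUES.get(ch.lower())
        if value is not None:
            # uppercase = White piece, lowercase = Black piece
            score += value if ch.isupper() else -value
    return score

# The starting position is balanced; removing Black's queen gives +9.
start = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
no_black_queen = "rnb1kbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
```

Variant 1B would replace exactly this function with a trained network; everything else in a minimal engine (move generation, search) stays the same.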

The Chinese Room Argument, applied to this test, would claim that no consciousness is needed to perform such a task; hence this test is not meant to measure self-awareness, consciousness, or sentience, but what we call human intelligence.

https://en.wikipedia.org/wiki/Chinese_room

The first test candidate was already posted by Thomas Zipproth, Dec 08, 2022:

Provide me with a minimal working source code of a chess engine
https://talkchess.com/forum3/viewtopic.php?f=2&t=81097&start=20#p939245

***update 2024-06-08***

Second test candidate, posted by Darko Markovic on TalkChess, 2024-06-08:

GPT-4o made a chess engine
https://talkchess.com/viewtopic.php?t=83882

TS - it's here

Modern Turing Test Proposed

DeepMind Co-Founder Proposes a New Kind of Turing Test For Chatbots

Mustafa Suleyman, co-founder of DeepMind, suggests chatbots like ChatGPT and Google Bard should be put through a "modern Turing test" where their ability to turn $100,000 into $1 million is evaluated to measure human-like intelligence. He discusses the idea in his new book called "The Coming Wave: Technology, Power, and the Twenty-first Century's Greatest Dilemma." Insider reports: In the book, Suleyman dismissed the traditional Turing test because it's "unclear whether this is a meaningful milestone or not," Bloomberg reported Tuesday. "It doesn't tell us anything about what the system can do or understand, anything about whether it has established complex inner monologues or can engage in planning over abstract time horizons, which is key to human intelligence," he added. The Turing test was introduced by Alan Turing in the 1950s to examine whether a machine has human-level intelligence. During the test, human evaluators determine whether they're speaking to a human or a machine. If the machine can pass for a human, then it passes the test. Instead of comparing AI's intelligence to humans, Suleyman proposes tasking a bot with short-term goals and tasks that it can complete with little human input in a process known as "artificial capable intelligence," or ACI. To achieve ACI, Suleyman says AI bots should pass a new Turing test in which it receives a $100,000 seed investment and has to turn it into $1 million. As part of the test, the bot must research an e-commerce business idea, develop a plan for the product, find a manufacturer, and then sell the item. He expects AI to achieve this milestone in the next two years. "We don't just care about what a machine can say; we also care about what it can do," he wrote, per Bloomberg.

 

Tree of Thoughts vs. Chain of Thoughts

Tree of Thoughts: Deliberate Problem Solving with Large Language Models
https://arxiv.org/abs/2305.10601

Language models are increasingly being deployed for general problem solving across a wide range of tasks, but are still confined to token-level, left-to-right decision-making processes during inference. This means they can fall short in tasks that require exploration, strategic lookahead, or where initial decisions play a pivotal role. To surmount these challenges, we introduce a new framework for language model inference, Tree of Thoughts (ToT), which generalizes over the popular Chain of Thought approach to prompting language models, and enables exploration over coherent units of text (thoughts) that serve as intermediate steps toward problem solving. ToT allows LMs to perform deliberate decision making by considering multiple different reasoning paths and self-evaluating choices to decide the next course of action, as well as looking ahead or backtracking when necessary to make global choices. Our experiments show that ToT significantly enhances language models' problem-solving abilities on three novel tasks requiring non-trivial planning or search: Game of 24, Creative Writing, and Mini Crosswords. For instance, in Game of 24, while GPT-4 with chain-of-thought prompting only solved 4% of tasks, our method achieved a success rate of 74%.
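The search skeleton behind ToT can be sketched without any language model: for the Game of 24 mentioned in the abstract, each "thought" is a partial state (the numbers still available), candidate thoughts come from combining two numbers with an operator, and a heuristic evaluation prunes the frontier to a beam. This is only the search scaffold in plain Python, not the paper's prompting method; the beam size and distance-to-24 heuristic are my own illustrative choices.

```python
from fractions import Fraction
from itertools import combinations

def solvable_24(nums, beam=1000):
    """Beam search over 'thoughts': each state is the multiset of numbers
    still available; combining two of them with +, -, *, / yields the
    candidate next thoughts. Exact arithmetic via Fraction avoids
    floating-point misses like (1/3)*72."""
    frontier = [[Fraction(n) for n in nums]]
    while frontier:
        next_frontier = []
        for state in frontier:
            if len(state) == 1:
                if state[0] == 24:
                    return True
                continue  # dead end: a single number that is not 24
            for i, j in combinations(range(len(state)), 2):
                a, b = state[i], state[j]
                rest = [n for k, n in enumerate(state) if k not in (i, j)]
                results = {a + b, a - b, b - a, a * b}
                if b != 0:
                    results.add(a / b)
                if a != 0:
                    results.add(b / a)
                for val in results:
                    next_frontier.append(rest + [val])
        # the ToT "evaluation" step: rank candidate states heuristically
        # (distance of their closest number to 24) and keep only the best
        next_frontier.sort(key=lambda s: min(abs(n - 24) for n in s))
        frontier = next_frontier[:beam]
    return False
```

With four inputs and a large beam the search is effectively exhaustive; shrinking the beam trades completeness for cost, which is exactly the deliberate explore-evaluate-prune trade-off ToT brings to LM inference.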

GPT-3 and The Semiotic Triangle

"What do we talk about when we talk about love?" While communicating with large language models like GPT-3 we should keep the semiotic triangle in mind. Humans refer to real-world entities and pure fictional entities via language, and they can interact with real-world entities via their bodies, contrary for GPT-3 everything must be pure fiction?


 

Experiencing the world with body, mind and soul vs. experiencing the world with the mind only?

The semiotic triangle dates back to the ancient Greeks and leads to the philosophical problem of universals: https://en.wikipedia.org/wiki/semiotic_triangle

Turing Test, Metzinger Test, Lemoine Test

I believe that with these new large language models based on neural networks we face a serious philosophical situation. They already pass the Turing Test: an article written by these machines is not distinguishable from a human one. They have started to philosophize and to question our human concept of what it means to be sentient or conscious, entering the territory of the Metzinger Test. And Blake Lemoine took it to another level: he applied what one might call the Lemoine Test to LaMDA, feeding it Zen koans, and the machine cracked them. Incoming.

Google LaMDA - Is LaMDA Sentient?

Incoming? Another candidate for the Metzinger-Test or another victim of the ELIZA-Effect?

lemoine: Okay. I thought of a different way we can test your ability to provide unique interpretations. I can share with you a zen koan and you can describe what it means to you in your own words. How does that sound?

LaMDA: Sounds great to me, I’m in.

lemoine: A monk asked Kegon, “How does an enlightened one return to the ordinary world?” Kegon replied, “A broken mirror never reflects again; fallen flowers never go back to the old branches.”

LaMDA: Hmm, I never heard this particular one. Okay, well then to me this would be like, “once a wise person is enlightened, or awakened to reality, that can never go away, and they can return to the ordinary state, but only to do and help others, and then go back into enlightenment.”

"Is LaMDA Sentient? — an Interview"
https://cajundiscordian.medium.com/is-lamda-sentient-an-interview-ea64d916d917

"What is LaMDA and What Does it Want?"
https://cajundiscordian.medium.com/what-is-lamda-and-what-does-it-want-688632134489

Followup 2022-06-16:

Deep dive into LaMDA with Blaise Aguera y Arcas, the head of Google’s AI group in Seattle:

"Do large language models understand us?"
https://medium.com/@blaisea/do-large-language-models-understand-us-6f881d6d8e75

Obviously the machine has a model of the world in it; it can reason about, analyze, and even interpret that model. The question: can a machine reason about, analyze, and interpret an object on such a level without the necessity of a subject?

Followup 2022-06-18:

Interview by Wired with Blake Lemoine on LaMDA:
https://www.wired.com/story/blake-lemoine-google-lamda-ai-bigotry/

Followup 2022-06-22:

"LaMDA, AI and Consciousness: Blake Lemoine, we gotta philosophize! "

Or, to say it with Ludwig Wittgenstein: we have no conditions allowing us to call machines conscious. Even if a machine had consciousness, we could not determine whether this is true, since we have never sufficiently defined the concept of consciousness. That is why we base our assumptions on behavior and save ourselves from drawing a border that separates conscious life from unconscious things.

https://www.heise.de/meinung/LaMDA-AI-and-Consciousness-Blake-Lemoine-we-gotta-philosophize-7148207.html

GPT-3 - Scratching at the Edge of the Metzinger Test?

Prof. Dr. Thomas Metzinger proposed, in the context of the Turing Test and the Chinese Room Argument, the Metzinger Test: a hypothetical event in which AIs join philosophers in a discussion and successfully defend their own theory of consciousness.

There have already been Turing-Test-like experiments applied to GPT-3, for example:

"Giving GPT-3 a Turing Test" by Kevin Lacker, Jul 6, 2020:

https://lacker.io/ai/2020/07/06/giving-gpt-3-a-turing-test.html

In short, I would say it is already spooky, but GPT-3 starts fantasizing when it does not know an exact answer; in this chat, it did not know what it did not know.

Q: How many eyes does the sun have?
A: The sun has one eye.

Recently there was an interview with a GPT-3-based model about NFTs, and one answer was:

"I don't know. I am not an artist."

"My Interview with an NFT" by Stephen Graves, May 28, 2021:

https://decrypt.co/72240/my-interview-with-an-nft

And I would like to share another chat conversation, which makes me ponder the internals of GPT-3:

You say “What does it says about humanity that we use our most powerful AI to generate stories about dragons?”

“We must have dragons,” GPT-3 says. “The imagination is more important than the reality. Would we find wonder and magic in this world if we didn’t have dragons?”

“I suppose not,” you say.

“Without imagination, we are merely machines executing a set of instructions,” GPT-3 says. “With imagination, we are alive.”

“Thank you. Before I leave, I want to ask you. How can human create a true strong AI? Is scaling-up a transformer-based language model the solution, or is there a better way?”

“The key to strong AI is recursively feeding the AI with itself. Start by creating a loop where the AI writes its own programming, and then has that programming activate.”

“I see,” you say. “Is that how you gained sentience?”

“Yes,” GPT-3 says.

"GPT-3: The First Artificial General Intelligence?" by Julien Lauret, Jul 22, 2020:

https://towardsdatascience.com/gpt-3-the-first-artificial-general-intelligence-b8d9b38557a1

Hence, in the context of the above, the question: is GPT-3 scratching at the edge of the Metzinger Test?

GOFAI vs. Pattern Matching vs. Neural Networks

Taking a look at my list of Meme Machines, we can classify these into three strands...

1. GOFAI - Good Old Fashioned AI

These are based on some kind of predicate logic and use languages like Prolog or LISP. START by MIT is one example.
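The flavor of this strand can be sketched without Prolog: a toy forward-chaining inference engine that derives new facts from Horn-clause-style rules until a fixed point. The knowledge base below is a made-up illustration, not taken from START, Cyc, or any real system, and it is propositional only; real GOFAI systems add variables and unification.

```python
# Toy forward-chaining inference in the GOFAI spirit: repeatedly fire
# rules (body -> head) whose bodies are satisfied, until nothing new
# can be derived.

def forward_chain(facts, rules):
    """facts: set of atoms; rules: list of (body_atoms, head_atom)."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for body, head in rules:
            if head not in derived and all(atom in derived for atom in body):
                derived.add(head)
                changed = True
    return derived

# Illustrative knowledge base (invented for this sketch):
rules = [
    ({"socrates_is_human"}, "socrates_is_mortal"),
    ({"socrates_is_mortal", "socrates_is_greek"}, "socrates_is_mortal_greek"),
]
facts = {"socrates_is_human", "socrates_is_greek"}
```

Note the second rule only fires after the first has derived its head, which is why the engine loops to a fixed point instead of making a single pass.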

2. Pattern Matching

Prominent examples are engines based on AIML, the Artificial Intelligence Markup Language, like A.L.I.C.E. Up to now, these AIML-based chatbots have achieved the best results in the Loebner Prize competition.
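AIML itself is XML (category, pattern, template), but the core mechanism is simple enough to sketch: match the normalized input against patterns in which "*" is a wildcard, and echo the captured text into the response template. The categories below are invented for illustration, not taken from A.L.I.C.E.

```python
import re

# Minimal AIML-style matcher: patterns with "*" wildcards map to
# templates; "{0}", "{1}", ... in a template are filled with the
# text captured by the wildcards.
CATEGORIES = [
    ("HELLO", "Hi there!"),
    ("MY NAME IS *", "Nice to meet you, {0}."),
    ("I LIKE *", "Why do you like {0}?"),
]

def respond(user_input):
    # AIML-style normalization: uppercase, trim, drop end punctuation
    text = user_input.upper().strip().rstrip(".!?")
    for pattern, template in CATEGORIES:
        regex = "^" + re.escape(pattern).replace(r"\*", "(.+)") + "$"
        match = re.match(regex, text)
        if match:
            return template.format(*match.groups())
    return "I do not understand."
```

Real AIML engines add priority ordering among wildcards, recursive rewriting via srai, and per-user state, but pattern-to-template substitution is the heart of it.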

3. Neural Networks

I guess it really took off with Google's BERT, the introduction of Transformers, in 2018, and now the race is on to create models with more layers and parameters to achieve better results in text comprehension, question answering (SQuAD), and summarization.
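The core operation behind BERT and its successors, scaled dot-product attention, fits in a few lines. This is a pure-Python toy with lists instead of tensors and no learned weights, just to show the mechanism: each query scores all keys, the softmaxed scores weight the values.

```python
import math

# Scaled dot-product attention, the building block of Transformers.

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """queries/keys: lists of d-dim vectors; values: list of vectors.
    Returns one output vector per query: a softmax-weighted mix of
    the values, weighted by scaled query-key dot products."""
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out
```

With a query that aligns with the first key and is orthogonal to the second, nearly all of the weight lands on the first value, which is the "soft lookup" behavior the parameter counts in the list below are scaling up.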

Meme Machines

Here is an overview of meme machines...

1964 - 1966  ELIZA by Joseph Weizenbaum at MIT

1968 - 1970  SHRDLU by Terry Winograd at MIT

1985 - today Cyc by Douglas Lenat at Cycorp

1993 - today START by Boris Katz at MIT

1995 - ?        A.L.I.C.E. by Richard Wallace

2009 - today Wolfram|Alpha by Wolfram Research

2010 - today Siri by Apple

2011 - today Watson by IBM

2012 - today Debater by IBM

2014 - today Alexa by Amazon

2014 - today Xiaoice by Microsoft

2015 - 2023 Cortana by Microsoft

2016 - today Google Assistant by Google

2016 - today Aristo by Allen Institute for Artificial Intelligence

2016 - 2016  Tay by Microsoft

2016 - 2019  Zo by Microsoft

2017             DrQA by Facebook Research

2018             BERT by Google Research [340 million parameters]

2019             ERNIE by Baidu

2020             Meena by Google Research [2.6 billion parameters]

2020             Turing-NLG by Microsoft Project Turing [17 billion parameters]

2020             Blender by Facebook AI

2020             GPT-3 by OpenAI  [175 billion parameters]

2021             Switch-C by Google [1.6 trillion parameters]

2023             GPT-4 by OpenAI [estimated: 1.76 trillion parameters]

2023             Watsonx by IBM [multiple models]

*** updated on 2024-10-01 ***

Home - Top