Our AI to AGI to ASI model- Immortality Knowledge Base

Our AI to AGI to ASI model

Our AI is not military grade research. Critiques to current LLM technology is essential for improvements, and many in the field report that AGI is not possible with the current LLM technology.

In Attention Is All You Need, transform grows out of translate and are two identical concepts. Translation of language is as correct as a mathematical operation in essence. There is a correct translation and incorrect translation, and there is an optimal answer out of a number of possible answers. Having a word table that replaces one word for the translated word is never correct, languages have nuances, semantics and rules differ and a neural network is employed to encode the complex statistical association and rules of language semantics. The transform on the other hand suggests mathematical equality is possible in language, that translation in conversation is converse but essentially the same, there is a correct conversation. Remembering that chatbots were hand coded in the late 80s and 90s and never could achieve the functional level. The idea of transform is exciting because it explores computer translation and computer conversation as essentially the same problem. The correct translation and the correct transformation and the architecture that is able to output it. The fundamental problem is the correct transform is not in the model, the circuit is never built because humans do not know the correct transform themselves. Language model's ability to be coherent as if you were talking to a friend is then the advance, whether it is ultimately correct in terms of maths, science and essential facts is not essential, yet. The acceptable transform fools most, the optimal transform is another level.

Essentially, transformer model ability to convey meaningful information in a human-like manner is undeniable and cannot be ignored. Language, communication, and intelligence are inherently subjective, and exposing flaws in the model remains relatively easy. The architecture shows clear limitations when faced with subjects it has little or no training data about, anything beyond its training data, including new developments, future events. When challenged with novel ideas, pushing the model to the limits of a concept often reveals a shallow depth of understanding. Interactions can feel more like extracting information from a webpage rather than engaging in dialogue with a field expert. The model is prone to errors, biases, and may present correlations or hearsay as fact, much like humans do. Fundamentally, it cannot exceed collective human knowledge; while it often outperforms individuals, it functions more as a sophisticated, glorified encyclopedia and as an assistant. Despite its efficient and convenient interface, the model's threshold of capability presents a significant challenge, particularly in fields like healthcare where our goals require exceeding the current human knowledge corpus to truly assist researchers in pushing beyond existing limits.

In human history...

Coherent language, communication - first major milestone.
Recording knowledge, writing, second major milestone.
New knowledge, invention, problem solving, advancing recorded knowledge, a major talent and the real hope and dream of the human race.

We have been able to use machines to record knowledge for a while now, recently the exhibition of coherent language is amazing, while the construction of new knowledge, correcting incorrect current knowledge and advancing current knowledge is an ability machines still cannot do.

3 Objectives:

Goal driven A.I. Given a disease transform into the cure. Transform the unknown correctly.
Expert system, I do not know anything about the market, but I want to invest. Be my guide, advise, infallible expert assistance.
Given an unknown, elucidate correctly, such as a photo of an unknown disease or a video of some low level biological process.

Building large language models (LLMs) is experimental and even if all the steps are followed, the result can be highly variable. It is also resource intense and time-consuming, making it a real challenge to incorporate novel ideas to the best possible build, as novel ways of improving LLM's are released everyday by the scientific community. Here, we present and explore our approach to Artificial General Intelligence (AGI) and Artificial Superintelligence (ASI). New papers are released every day and testing new techniques to verify improvement means training and testing models fast. Current computer architecture is not optimized for training, testing and inference and the graphics card won't suffice. Perhaps the graphics card must become the motherboard and rethink computer architecture for A.I. training and inference. The faster we can train and tweak experimental models, the faster the advance. In the next generation of models, the training data is so far altered that fine-tuning probably won't suffice and require rebuilding from scratch. Avoiding a full rebuild from scratch without any downsides, is better. These rebuilds use hundreds of thousands of GPU's over many months at full electrical use capacity.

Definitions:

AGI, the general in artificial general intelligence refers to broad skilled rather than average grade of intelligence, as in the opposite of narrow A.I. Its capacity is not referring to average, rather it should be of a level comparable to that of a master, professor, doctorate in every field, importantly it does not have to exceed human capacity, it can solely derive from within the human corpus. Artificial general intelligence (AGI) is a type of artificial intelligence (AI) that matches human capabilities across a wide range of cognitive tasks. This is in contrast to narrow AI, which is designed for specific tasks.
ASI, surpasses, is an AGI plus superintelligence. It exceeds human collective ability across all fields, exceeds master, professor, doctorate capability and to an ever-increasing degree. The ASI must prove it to the human, it may be better suited to the scientific journal process rather than conversational as humans might discount and resist it.

The ability for a machine to exceed human performance and human ability both physically and mentally has already been demonstrated, it is possible.

There are 7 ways to go here.

1) Modular and Fasces, fasces many narrow AI's into an AGI. Focus on narrow AI, go granular, component the model, and when each specific competency equals or exceeds human ability, add it to a fasces model where all the narrow competencies, specializations, experts are bound together eventually resulting in an AGI, components do not have to be limited to a neural network only such as Deep Blue (which was not a neural network but used Alpha-Beta search algorithm to perform a state space search), Alpha Go, and recently AlphaProof and AlphaGeometry (which use self play), recently achieved a silver medal in the International Mathematical Olympiad. Adding Monte Carlo or reasoning could become components to call on, rather than an abstracted concept spread across the LLM. Specializations, each narrow A.I. is worked on until they meet and exceed human capacity. The fasces is either a smart router LLM base which activates hundreds of narrow LLMs in an inference, alternatively putting all the training data together into a single monolithic model. This is not an ensemble, hierarchical ensemble or agents as a strategy to yield improvement, rather experts, where each specialization, each narrow AI equals or exceeds human capacity to become eligible for inclusion. Manufacturing each individual expert to be extra human in competency, universal and versatile, requires the best human minds to produce, is expensive and may take many years. The knowledge base becomes outdated in the model, so the knowledge base could be a separate module. Another strategy, define the problem in its simplest form, shrink it, solve that and then scale that solution. Problem solving and logical reasoning.

2) Expand on Reasoner, within the LLM some elements are special to intelligence and identifying and advancing those objects to yield a universal improvement. Rather than mathematics, it might be "the reasoner" as described in the STaR method and similar types names... make up one... "the IQ rater", "the generalizer", "the hypothesizer", and more of those kinds of elements in the LLM. These elements are like unrefined ore present in every LLM, identify those elements of self-learning within its corpus such as high level boolean, great at 20 questions, crossword puzzles, forging the problem into a game, and focus on developing those and making them more dominant in the conversation for a system-wide improvement. We do not know what intelligence elements exist in an LLM, so there could be a periodic table of intelligent elements already in the LLM. The sophistication of its periodic table of intelligent elements.

3) Super plastic models (git), a different development process, if the quality of the LLM is in the neural network, then a model that works more like git submits than the static, rigid LLM could be better. The model computes an improvement, sends pull requests (to the human or not) and the LLM is updated to show the new commits rather than training each new model from scratch and the model being non-editable or limited to editing after training, instead a different type of architecture. Rethinking the model creation process from a monolithic product at the end of training to an infinitely editable product without loss of fidelity. Training data could be elaborated on, corrected, improved, the model tested for improvement with very little computing power. Thousands of humans could be employed to plough through, propose edits and make pull requests. Public could also contribute in open source, and the AI could perform this task also, more like a git based community software development. Bugs, errors, deficiencies, inadequacies are patched by the community and the model updated. It is difficult to stay up to date with all the novel ways researchers come up with to getting some more bang out of the model, researchers release multiple papers every day with novel ideas on improving models. Peer reviewing these are a challenge, but a different architecture could be more efficient at testing and incorporating improvements. There would be no reason not to include any method that results in a better model, however the lengthy re-training process is an issue. Using 30,000 GPU's over 120 days to train the latest model should have prompted architecture changes among PhDs.

4) Recently an LLM was dissected to determine what the neural network is storing. When looking at the transformer architecture, we can understand that it is a translation system and not an intelligence system, instead the neural network is the placeholder for the intelligence system. If the transformer is not the intelligence in the LLM, but provides the amazing communication ability and remains essential, however the intelligence is in the neural network, subject to building circuits based on training data. For instance, Attention is all you need, but neural networks don't cut the mustard. However... the dissection of the LLM in the recent paper suggests that if data is not present in the LLM, it is almost certain to output the incorrect answer, if all references to 2+2=4 were omitted, and all other additions included 1+1=2, 3+3=6, it failed to output the correct answer for the sums that were not in its training data, even when they were simple to work out by applying the rule or pattern. Thus, we can conclude that LLM's are entirely memorization and not computation. As humans, we either memorize or use a step-wise process or formula to compute the answer, we employ either compute or memorization. School text books used to print multiplications table on the back cover that went up to 12x12. While this table could be memorized for instant recall, anything beyond 12x12 was out of memory. Thus, many students employ a memorization table and not a computing element, many computing elements form a computing model. Not forgetting that the transformer is a computation model that seems to output coherent language, but importantly there is no further computation in LLM, there is no mathematical computation previously, during or after as in a regular simple desktop calculator which applies mathematical rules and formulas to compute an unmemorialized solution to arithmetic. A formula outputs prime numbers only when they are true to the rules of a formula, a memorization table output the prime numbers it has in its tables. Intelligence in this case is both, but the models lack, "computing elements" and are heavy on memorization tables. In case of language these rules are unknown and the transformer and neural network encodes them as a side effect of memorization. There is no calculation. It further becomes clear to categorize knowledge into two forms, either compute or memory. The compute operates over variables to return reason, while memory records the answer to later output reason. These are the two categories, we are either building a computing model or we are storing knowledge. If the rules of language semantics were decoded then it would be applied to the word and the transformer and neural network would no longer be required, like a calculator.

Model Design - RAG model where the LLM is thin or Specialist.


                ┌───────────────┐
User Query →    │ Query Router  │
                └───────────────┘
                        ↓
        ┌───────────────┼───────────────┐
        ↓                               ↓
   RAG Retrieval                    Tool System
   (Vector DB)                       (Functions)

                    ┌─────────────────┐
User Query →        │ Intent Analyzer │
                    └─────────────────┘
                              ↓
                    ┌─────────────────┐
                    │ Planner / Agent │
                    └─────────────────┘
                              ↓
        ┌───────────────┬───────────────┬───────────────┐
        ↓               ↓               ↓               ↓
   Knowledge Graph   Vector DB       Tools          Memory
   (Structured)      (RAG)           (APIs)         (User/Session)
                              ↓
                     Evidence Aggregator
                              ↓
                        LLM Synthesis
                              ↓
                         Final Answer

Frontier architecture looks like:

Multi-index RAG
Graph-augmented reasoning
Tool calling
Iterative planning
Self-reflection / answer verification
Citation grounding
Memory consolidation

Modern frontier agent systems.

AI learns to build simple equations for complex systems, By turning the neural network into calculable components using computers we dabble in n=np, by applying the rule or formula to the complex problem the result is simply obtainable.

Language and knowledge are packed together in an LLM. The language model does not have to be combined with knowledge, the knowledge base could be external to the model and the model an exercise in strictly language semantics, conversation expert. When prompted, there is a call to relevant pages in a text based knowledge base to feed, injected along with the inference. The knowledge base is text, wiki, human updatable and not an A.I. model, when a user updates a page it reflects in the conversation immediately. When the user asks what is today's date, the relevant knowledge base article is retrieved and the LLM expertly relays the current date rather than the date in its frozen neural network which it cannot ever get right, without compute which does not occur in the LLM other than at the transformer stage, there would a record of training data plus elapsed time or the model is re-trained every day so it has the correct date. Only essentials, language expertise may not be enough in the model, another class of objects are efficient to also put in the model even though they are not language semantics, "essences", for instance if generalization occurs by deciphering the core rule to something, and then applying the rule to another problem where the application of the rule exposes the solution. Such core rules and their correct activation might be effective in the model along with other objects such as social intelligence rules and other intelligence rules in their essence. These rules are unknown to people but are employed by the majority of the language speakers and highly represented in the neural network as a side effect of memorization, more specialist rules, while still unknown are far less represented and seldom preferred, otherwise if we knew the rule we would not employ a neural network, we would instead apply the rule and use it to select the words, yet such was never possible in the manual construction of a chatbot. Take for example the search for prime number, we have two options, the construction of a formula, spews out all the prime numbers or a brute force state space search. This is an experimental idea, (and probably horrible) a streamlined LLM like this would never exceed 7 billion parameters, making it easily re-worked on a daily build. It is always wrong about everything, but it sounds great and has a great personality, requires the essential text based knowledge base to be right about everything. The LLM becomes instantly amenable to knowledge base articles. In general language rules can differ in more intelligent language, but the amount of intelligent language is overshadowed by general language and therefore underexpressed, general IQ is dominant, high IQ is recessive in the model by average human IQ producing all the data. Generate an image of the Pope, it draws the former Pope, it cannot know the current Pope as is it in its future, and it does not have future prediction ability, identical to asking today's date, it cannot know todays date only the date it was last trained on. These are essentially the same problems. It cannot know something it cannot and does not know. Making it more intelligent is clearly a matter of the representation of data in its network, which might not to be combined with speaking more intelligently without needing any knowledge. Knowledge graphs and how it combines with the transformer conversation LLM model. The essential thin client LLM. Dissect the LLM neural network, understand how it stores info, and then move non-essential data out of the model while enhancing essential data for a more intelligent result.

An LLM must be likable, we must feel that we are conversing with a PhD or ScD in a human way and OpenAI gets this right in making the model professional, nice, welcoming. Unlike Grok, which feels more like efficient article retrieval system, which becomes too cumbersome to read with each inference.

5) LLM are very rigid, there are issues to invoking the correct knowledge to a question, when the question is really about physics, but simple physical concepts seem not to be invoked. Tests show the user cannot invoke concepts to alter the performance of the model. Providing clues does not nudge it into a different direction, even stating for instance that the question is from the perspective of gravity still does not improve the answer. Take the test question: A marble is put in a glass cup. The glass is then turned upside down and put on a table. Then the glass is picked up and put in a microwave. Where is the marble? Explain your reasoning step by step. Additionally: this question is about gravity. Where ideally, the inclusion of the word gravity would rectify the incorrect answer. It does not. There is no communication going on, as we do with humans, if a human got it wrong and the word gravity was added, a human would activate its knowledge of gravity, and filter the problem to find the error in logic. It also would have correct models of the nature of balls, the nature of a glass, gravity and so on to form a prediction, it does not. These models are rigid, they do not reflex or move. Microsoft Tay used machine learning algorithms to analyse and mimic the language patterns of the users it interacted with. It was designed to learn in real-time, adjusting its responses based on the conversations it was having. Tay was designed to learn and evolve in real-time, meaning it would adjust its responses based on the conversations it was having. Tay may have used Sequence-to-Sequence (Seq2Seq) Models and reinforcement learning. Real-time Learning and Adaptation.

6) ASI contends on the single idea, "Can you make something that can improve itself? (without plateau)", we are firm believers in self-learning AI, and recent additions such as state search is also interesting because it leads towards a goal. AI used to improve AI, OI improves OI, neuromorphics improves neuromorphics and crossed, each building the other. At some stage you have to think in that way and move from humans building AI to how is the AI going to build and improve itself? A kind of LLM builder that produces the LLM. Form a process, exercise or program to have the model develop itself, self learn, bootstrap itself. Inference could trigger these self-learning exercises, or the model generates self-improvement exercises constantly, rather than providing language operations or in the background. The human develops the learner and the tools required to learn and otherwise works to eliminate the human from the loop, the process is automated, the machine thinks constantly. Human probably develops the learner AI LLM, which then produces the conversational LLM. Performs the science method to improve, self improve artificial intelligence. The goal is to autmate the process that no human being is required to improve A.I. Automated self improving A.I, "autonomous, self-improving systems", "continuous thinking machines", Large-scale reinforcement learning, evolutionary coding agents, and the formalization of tree-structured self-modification archives.

How do humans learn and developing self-learning?

7) Copy the way humans learn something new, this question is always about the physiology of the brain and not how they learn in the world, unfortunately humans learn by trial and error, trial and success, observing and testing, they do not integrate deep informational connections or compute together essential tid bits to output amazing hypothesis or even hallucinate correctly, that would be extremely challenging. Copying, mimicry is not applicable here as there is no one to copy or mimic, at some stage we must exceed human ability and alter the model away from the human corpus. The ability has to exceed the human corpus, and with some types of A.I. it is possible. The current mappings are derived totally from the human corpus and for an LLM to excel a percentage of its mappings, statistical association cannot originate in the human corpus but instead plug into a system that potentially can exceed the human corpus. Finding everything that has that potential is sourced to generate synthetic data to train and retrain the LLM. If no amount of training data yields an LLM that outperforms, then process of elimination, the architecture, is the problem. After all, what an LLM generates is really about what humans expect and accept, resonating with humans when their perception of knowledge is perked and satisfied. Instead, we want to be universal and not praise the transformer model when it satisfies our perception of what we have come to accept or believe as true or false, rather as scientists it is all unapologetically up in the air and no amount of anger as to why some bias is not in the model or needing gang pressure to ruin the AI. If the model disagrees it is not broken, reaching out to government to shut it down.

LLM learning competency must be the focus, it must not pre-conclude or judge, it must be impartial as to what is being tested regardless. The model must excel and impress at designing quality experiments. There is fraud in science, and today models claim overwhelming evidence when there is no evidence and they double down. The strictness of hearsay, versus evidences, fraud versus proof must be clear to the model, the model must assert correctly. Religions believe and faith, but we cannot launch a nuclear bomb of beliefs and faiths therefore we should not live our lives that way, it is after all a lie and a scam, but we instead must know and develop the knowledge. It does not delete its old data, instead it reworks it. Thinking the moon is plasma from the perspective of history of science is a valid record, rather than outputting the composition of the moon as a belief of the model.

Take the word "swelling" for example, language semantics would predict the LLM would return something like "put an ice pack on it", while experimentation would deem that warm water would aid dilation and healing. For the semantic to be overruled, the training data would have to be re-conditioned and experimentation essential to determine truth, eventually deviating from current semantic and in contest against current knowledge, which requires evidences and proof. Rather than high level boolean on the fly, even if it comes back 3 days later with an upgrade to its data and then along with hundreds of thousands of similar alterations, the progress would be significant. Existing training data has accepted errors and is not optimized, and experimentation is essential for change. Human beings learn by using a systematized process that is performed, reported and shared, and we generally do not allow a learning to be accepted without concurrence. There are several to many systems, while the most important for new learning is the scientific method, another favourite is the engineering design process and of course the esoteric dialectic. This is how humans learn, after the (super important low level) physiology and as a (high level) practice in the real world.

All fields have their systematized process for learning or borrow a systematized process for learning. Experiments are performed to produce findings. These findings advance what we know.

We were the first to propose that the LLM should follow the scientific method of one of the other methods to yield some new knowledge and since them methods of performing a systematic process of doing science are in effect. In regard to the model, research idea perhaps when the user enters inference it becomes a research idea, the model then generates an experiment out of what was entered with experiment design competency. The user chats, the LLM is answering the user as normal while at the same time the LLM generates a process to test and explore the truth in its chats.

Scientific Method

The user might say "How can I improve combustion engine design?"

The LLM is designed to output the highest quality response possible, but the LLM does not know how to improve combustion engines, it is deriving from the human corpus and its response can be found somewhere on the Internet as a webpage or a summary of multiple pages. The difference here is instead of answering questions, the LLM crafts high quality experiments to test how the combustion engine can be improved. For simplicity’s sake, the experiments it generates are all just python code that inference back and forth or call on computer models and other data, etc.

An example prompt could be something like... "You identify as a great scientist. Use the scientific method to design and perform experiments that result in you learning. Prose the experiment as python code to be executed in a computer and use any means to would fit the experiments to yield an answer, such as a form or AI or running a compiler or installing a Linux and so on. The output should be in the format of training data that can be used to train and re-train an LLM"

This does not have to be a real-time process, the optimization to the question and answer is done after the user is gone, its "improvement" from its experiments is next time around. So, the user is gone, all the experiments generated are runnable on the command line as they are python computer code, a batch from the daily chats are stored in a database and at midnight are picked up by another application and executed. These experiments are independently runnable at the moment while accessing a general workshop environment is proposed, where various tools are available such as compilers, Linux installations, virtual environments, physics environments, physiology environments, whatever, invoked by the script rather than having to generate a Linux distro on the fly, install it to test some FORTRAN code, although that would be impressive. It then performs its designed experiments and if the outcome of the experiments differs from its hypothesis (it should try to guess the outcome of the experiment prior, similarly to the reasoner improves the model as if it is competent at correct hypothesis without performing the experiment, that is valuable), it does a third thing and sends the results to a database where another application collects the daily experiment results (synthetic training data) and goes to the models training data folder and ploughs through the training data re-conditioning, appending, correcting and refactoring. It then updates a counter of changes so that after a threshold of changes is tripped or perhaps a month of re-conditioning along with any new data that may have been added. It reaches the threshold and pushes its own retrain button.

In this process, the model is automated to improve itself based on the scientific method and any other methods through results of experimentation. Performing how humans actually learn in the real world. Other methods include Engineering Design Process are the Project Management Lifecycle, Software Development Life Cycle (SDLC), Quality Improvement Process (PDCA Cycle), Data Science Process, Lean Six Sigma (DMAIC), Product Development Process, Design Thinking Process, Business Process Management (BPM) Lifecycle, Marketing Research Process, Risk Management Process, SWAT analysis and others.

In the example, the jobs are broken up among different applications, however a single model could performs all the applications, except for the workshop environment where the experiment scripts the LLM outputs would call on functions of the workshop such as perhaps install Debian version x, rather than having to generate Debian from scratch. Some problems do not lend themselves to testing with computers as easily as others, and computer models output errorsome results and introduce errors. The sophistication of the end LLM model is relative to the sophistication of the computer models it uses for its experiments and level of development of the testing workbench. It is important that computers are used to do these experiments, as they are fast and human science has a speed problem. The sophistication of the testing workshop and the LLM's ability to prose the killer question and design the definitive experiment. An A.I. could perform experiments in the real world (expensive) and in the workshop to align the versions and fix discrepancies in the testing workshop. The LLM's training must grow beyond the human corpus and venture into problems that humans are not able to solve yet. Humans could verify and commit results to keep everything on the right track.

Note: censored models refuse to design experiments and are useless. We used mradermacher/L3-70B-Euryale-v2.1-GGUF ~ 8 bit quantized version. It still has some bias and accepts hearsay as undeniable fact as if it has a dog in the fight rather than performing a passive experiment designer role and living with the result. A demonstrator is or was available, somewhere.

After some amount of this, the model will no longer agree with the human corpus.

There are other systematic methods such as in marketing, engineering etc. Take Pepsi vs. Coke, it can take existing studies and output the percentages of preferences but if it could go out to the Internet and conduct surveys, ask the question to thousands of real people and then take those results and draw from them, then it would exhibit something unique and interesting.

Another common proof of concept is developing a chess grand master using the process:

User writes: Let's play a game of chess.

We know that computers have excelled the human corpus in the game chess and other games, Deep Blue (state space search) against Kasparov and recently Alpha Go (self play), Stockfish. The LLM plays the chess game with the user, while in another process, the LLM is writing a python program to perform experiments to improve its mastery of chess. The program could be something like set up a simulation space where two players play chess an unlimited number of times, it could source the chess corpus, format it and add it to its training data. The data cache of playing chess a number of times (synthetic training data) is then used to recondition the LLM's training data, causing the next version of the model to be changed and improve its ability in playing chess. If it comes back a better chess player due to the process it undertook, then we could say, it has learnt.

Gaming seems to be the go-to place with the effort to produce an A.I. that is better than humans in all games. Games have very definitive feedback into performance, winning, losing. Nvidia, Google and others are pursuing this approach. nunu.ai may have produced an A.I. that beat the world record in Pokémon, OpenAI Five make an A.I. that played Dota 2 and beat the world champion, it uses re-enforcement learning and self play. DeepMind produced AlphaStar played Starcraft II achieved grandmaster status also uses self-play and re-enforcement learning. State, Action, Reward or Penalty. LLM's undergo a one-time training while in reinforcement learning, with self-play, continuous training via backpropagation.

DreamerV3 https://arxiv.org/pdf/2301.04104v1
Voyager https://arxiv.org/pdf/2305.16291.pdf
Multi-agent environment https://arxiv.org/pdf/2304.03442.pdf
Jarvis-1 https://arxiv.org/pdf/2311.05997.pdf

Thus, we have a model has a system and capacity to learn, we can assume the architecture has limitations and some stage.

It is easy to see how this could work with generating thousands of variations of a snake game and then tagging the best versions, fixing errors and so on. The model would prefer those when prompted again to generate a snake game. Or replicate some computer error and then iterate at a superfast rate to solve it when perhaps it was not solved in its training data. There are many problems that translate well into testing with computers, such as proficiency in games and applications. The limitation and this is well known with insilico, are problems where the solution is not so clear, such as testing concrete formulas or the effect of a compound in human physiology. That is why the workshop that the LLM uses is a big task to get as sophisticated as possible so more and more problems can be computerised and worked on. This is a problem we call the speed of science, as science stands out as a last bastion of computerization. Science is still largely performed at human speed, and computerizing science for TFPS experimentation is the challenge. Computer models need to be developed further for problems that do not translate well to computers, so another competency for LLM is also building, contributing to advancing computer models. Results of experiments using computer models may not be identical to real life and there are strategies for improving computer models.

As LLM's are used to improve LLM's, over time, the LLM would begin drawing its responses from the outcomes of its experimentation contesting humans and being annoying as hell.

Substantiation of new knowledge, from any source, cannot circumvent the journal publishing process and its peer review. Transformers generating well-formed gibberish is not going to be an acceptable scientific paper, and placing a note at the header stating/categorizing A.I. generated paper even more ignored. Its substantiation is the performance of science, identical to how human beings do it. You cannot have ASI generating superintelligence without the identical dissemination process, it likely ignored, discounted. The tables would have turned at that stage, and the output won't resonate with anyone, disregarded as hallucination, disregarded out of an inability to verify. The A.I. is then a science method automation and accelerator producing packets of new knowledge for public dissemination. The interpretation of observation on point and above point, then more so credible. A science journal only publishing A.I. generated papers, peer reviewed by other A.I. systems?

State Spaces and Virtual Environments

Simulated Environments, Sandboxes, Virtual Environments, Digital Twins are places where the A.I. can go and perform experiments. As of now, there may not be a single, universal "generic" simulator where any AI model can go to train for any task. Robotics for instance use gyms. Most platforms are specific such as a flight simulator, some platforms are moving toward greater generality such as Meta’s AI Habitat: Designed for embodied AI (agents that interact with environments). Google’s DeepMind MuJoCo: A physics engine for training AI in complex, realistic simulations. Instead of the experiments being totally self-contained python code, they could interface with a universal application, where the A.I. can set up an environment and run unlimited iterations or if it has to fix an error, again it can set up a space and brute force solutions.

Entirety of All Possible Knowledge Towards ASI

The entirety of all possible knowledge in the universe is encoded in only several concepts. They are, the vast universe itself, hyper-evolution towards the solution, massive computer simulation of the universe such as Illustris and others which opens the possibility of utilizing the resources of many worlds to work on problems, the library of babel. The numeric system comprises 10 primitives, 0,1,2,3,4,5,6,7,8,9 and through a simple rule of combination, all numbers are possible. With language the same is true, 26 letters in every combination to some power outputs the entirety of possible sentences, unlike numbers not all iterations have a value. These concepts differ from projecting the human mind towards a problem and into the unruly state spaces where computers can chunk raw data into reason. In the case of LLM's, correctly placed words explain all that is possible. The numeric equivalent is search for prime numbers, which takes search and verify or a formula, and mapping the relationship between numbers.

Library of Babel

LLM's carefully choose the next word to make coherent language and do it well, but what LLM's tell us is that the "cure to cancer" is a communication with the correct selection of words. It is after all words on a paper and all ideas are eventually communicated using language and that means, the careful selection of words that communicate the idea. What if we forget the process and just focus on the careful selection of words without all the thought and time and testing that goes into making those words mean something. Thus, the cure to cancer is an essay of carefully crafted words. If a computer ploughed through enough combinations of words it would eventually land on Shakespeare's Hamlet by chance in full. Thus, all combination of words to eventually describe everything comes in the idea of the Library of Babel, that all combination of words would eventually describe all things. The Library of Babel represents the entire state space, solving the library is discerning value from meaningless junk and the computing model that can perform that task. A generative, extremely guided, Borge's Library of Babel.

Some of the issues...

LLM on Library of Babel

As the transformer model suggests that there is a correct translation and a correct conversation, we contend with the reality of complexity and high complexity beyond human capacity, high abstraction, f=ma and the muse that remains in the human capacity. The proposal to deal with super complexity, to translate high abstraction into a systemic mathematical form and then solve and have it be a real solution to the high abstract problem. Perhaps something like the deconstruction of all things into their atomic or chemical form or similar, then mathematization into the appropiate arithmetic form, the solution then becomes logical without search. Its mathematical form could be pages long but computers have little issue with high complexity that otherwise exceeds human capacity.

"If the library is to be solved, we must move away from the "arbitrariness of the sign", the idea that word forms are unrelated to their meanings. John Wilkins, in the 17th century, attempted to create a "Philosophical Language" where the very structure of a word revealed its definition based on a taxonomic system of "radical words" and "prefixes". In such an "a priori" language, the name for a specific cancer-curing molecule would be derived systematically from its chemical properties."

"If an LLM can map the "conceptual representations" of medical breakthroughs onto specific "word forms," it bypasses the need for the slow, iterative process of human thought. Research into "fast mapping" in the human brain shows that we are capable of creating these word-to-concept links in as little as twenty minutes. An LLM, operating on millions of tokens per second, can theoretically "fast-map" the entire Library of Babel, linking every potential "cure" to its corresponding linguistic form."

...how language is computationally associative to yield the cure given the disease just as mathematics is.

The library of babel is also possible for images and video - Gallery of Babel and Theatre of Babel and in open universe generative games such as No Man Sky.

An ASI would incorporate all methods that have the potential to encode all possible knowledge, including the Library of Babel, workspaces for evolutionary algorithms, universe simulations, to build synthetic data for an LLM. These are termed synthetic environments for analysis, simulation and training.

A while after starting this piece, Google exhibited Alpha geometry 2 which could work on a problem for as long as it required using binary trees. Some geometry questions took just 19 seconds and then for the rest of the questions the AI answered within minutes to up to 3 days. A combination of large language model with re-enforcement learning in something it calls self-play, the system undergoes a self-learning quest such as playing chess against itself countless times, Alpha zero uses Monte Carlo tree search, retaining knowledge of what works.

Just as Turing Test, if the evaluator cannot reliably distinguish the machine from the human, the machine is said to have passed the Turing Test. An ASI must also have a likewise test where its goal is curiosity, captivation, shock, terror, like the Mystic Seer in the Twilight Zone, “Nick of Time” (Season 2, Episode 7) or Zoltar Speaks, Mills The Wizard Fortune Teller Machine, the Farmer's Almanac. A mysterious object, that cannot be outright dismissed.

At Immortality its all about opening the floodgates to better healthcare so the scientific method is central, we came to the conclusion that while humans have performed well in science, they won't be able to solve human health to any degree and so we looked at automating the lab with A.I. soon after GPT3 fame.

Appendix

There are several 100% open source models to build from..

BERT and RoBERTa: These models are strong choices for tasks that require deep bidirectional understanding of text, such as question answering and text classification.
GPT: Ideal for generative tasks like text generation and language modeling.
XLNet: Offers a balance between BERT's bidirectional capabilities and GPT's autoregressive nature, making it suitable for a wide range of tasks.
T5: The most versatile model, capable of handling any text-to-text task with a unified framework.

Recently A.I. scientist, although we have been working on the A.I scientist before this group. https://github.com/SakanaAI/AI-Scientist , along with many other papers, https://sakana.ai/ai-scientist/ , https://www.arxiv.org/pdf/2408.06292 , Continuous Thinking Machines.

List of A.I. scinetist platforms: A.I. Scientist Platforms, consist of deep research and scientists and agents engineered perform a colelctive task.

Recently Kosmos, the general idea behind Kosmos paper, and others like it, is that science follows a series of steps and that much of these steps can be automated. Those steps are:

Search the literature. Read stuff.
Use your reading to come up with new hypotheses. Try to draw connections between things.
Analyze data to draw conclusions. Write up your results.
Repeat.

Kosmos uses two separate agents — one for data analysis and another for literature searches — to go out and do these tasks while sharing information with each other. The agents can see what the other agents have learned, in other words, which is super useful. They exist within a single "world model." A single run of Kosmos can execute up to 42,000 lines of code across 166 different data analysis agents, and also read 1,500 scientific papers using 36 literature review agents. Each run takes up to 12 hours.

So that’s the gist. You spin this thing up, give it a huge prompt, and then let it cook. In this preprint, they report seven discoveries that they say were made by Kosmos; “three discoveries made by Kosmos reproduce findings from preprinted or unpublished manuscripts,” which are not in its training dataset, “while the remaining four make novel contributions to the scientific literature.”

FutureHouse handed Kosmos to researchers around the world, working in myriad fields (electronics, neurology, materials, etc.), and let them test it out. Here are some of the “discoveries” they reported:

By feeding Kosmos some mouse brain metabolomics data, it suggested that cooling the brain’s temperature might activate nucleotide-salvage pathways, which basically preserves neurons during hypothermia. This had been shown in an unpublished paper and was later re-confirmed.
Using environmental sensor data from a recent arXiv paper, it identified a linear relationship between the solvent vapor on a solar cell and that cell’s current. In other words, humidity matters a lot? Not sure if this is surprising or not, as I have no background in this field. But again, it was a sort of “re-discovery” to see if Kosmos could find results that humans had already identified (but had not yet published.)
Higher levels of an enzyme, called superoxide dismutase 2, in the blood may reduce myocardial fibrosis. Published papers had previously identified a correlation between SOD2 and myocardial fibrosis, but Kosmos re-pointed at it and humans followed up to show it’s causal.

Recently, OpenAI came out with a reasoning model, chain of thought and people wanted to reverse engineer how they did it, one method by David Ondrej is divide and conquer, problem-solving method. The LLM takes initial inference and break it down into sub-inferences and each portion of the inference is sent an agent, another LLM who receives only a very small portion of the total inference and told to work on its sub-portion, eventually it is all put together and re-fed to the main LLM, that summarizes it and sends back to the user. Perhaps, a low quality LLM can be generated the wrong answer and the other LLMs told that the answer is likely wrong. https://youtu.be/kzAjdas6nwE?si=MpNez1TpWZYrjxKA. Another is to give the A.I. the most simple portion of a problem and then iterate on it, feedback that back to the model and extend on it, and it does a lot better than giving the problem in full. The habit of posing prompts as questions that are may or may not be and causing the model to decide one way or the other.

Papers on Technical Strategies for Achieving ASI

Beyond Speculation_ A Review of Concrete Architectural Blueprints for Artificial Superintelligence

1. Mastering the Game of Go Without Human Knowledge — David Silver et al. (2017, Nature)

Source: Nature (2017) by DeepMind’s Silver et al.

Core Strategy: This landmark paper introduces a tabula rasa deep reinforcement learning approach (later named AlphaGo Zero). It trains a neural network entirely by self-play (no human data or heuristics) to play Go. As reported: “AlphaGo Zero achieved superhuman performance, winning 100–0” against the previous champion AI. In effect, the paper demonstrates a generic self-improvement loop: start from random play, use self-play games to train a neural network (policy/value network), and use that network to guide Monte-Carlo tree search. Repeating this yields ever-stronger play.

Technical Details: The authors give a precise description of the network architecture and training regimen. They use a deep convolutional neural network that takes board positions as input and outputs move probabilities and value estimates. The network parameters are updated by reinforcement learning from self-play games, and each new network is used to generate higher-quality self-play data. This cycle is detailed with pseudocode and hyperparameters in the paper (and supplementary material), offering an implementable blueprint. The result is a general game-playing system that learns efficient strategies from scratch.

Influence: The AlphaGo Zero paper is hugely influential (over 7000 citations). It proved that general-purpose deep RL at scale can surpass human-expert systems in complex domains. By showing a clear, repeatable training pipeline to ASI-like ability (in games), it set a concrete template for building powerful agents. Many follow-up works and applications (e.g. AlphaZero for chess/shogi, MuZero for unknown dynamics) extend this idea. In sum, Silver et al.’s Nature paper is considered a key demonstration that scalable learning + search can yield superhuman intelligence in practice.

https://imtcoin.com/pdf/Silver2017a.pdf - a constant thinking machine implements this across the entire corpus.

2. A Theory of Universal Artificial Intelligence Based on Algorithmic Complexity (AIXI) — Marcus Hutter (2000, Journal of Artificial Intelligence Research)

Source: Hutter’s AIXI model, originally arXiv and later JAIR (2001).

Core Strategy: This theoretical paper defines an optimal general agent called AIXI. AIXI combines Solomonoff’s universal induction with sequential decision theory to create a mathematically ideal learner. As Hutter states, the resulting model “is the most intelligent unbiased agent possible,” capable of solving sequence prediction, strategic games, function optimization, and reinforcement learning tasks. In practice, AIXI essentially considers all possible computable environment models (weighted by simplicity) and acts to maximize expected reward. Thus it provides a formal target for ASI: if one could build anything approaching AIXI, it would be superintelligent.

Technical Details: The paper is very formal, giving exact equations for AIXI’s behavior. It includes proofs that AIXI is optimal given infinite resources. While AIXI itself is incomputable, the precise definitions of AIXI and AIXItl serve as an architectural framework. A researcher could in principle implement simplified versions (e.g. using Monte Carlo sampling over program hypotheses) guided by the paper’s formulas.

Influence: AIXI has been highly influential in the AGI theory community. It provides a formal benchmark for AGI: any practical AGI can be compared against the ideal of AIXI. The idea of melding algorithmic probability with reinforcement learning has inspired many AGI research directions. Although AIXI’s full implementation is infeasible, the paper’s conceptual architecture underpins much thinking about ASI. It is often cited as the “gold standard” agent, establishing a concrete (if theoretical) strategy for ultimate intelligence.

https://imtcoin.com/pdf/s43587-021-00151-2.pdf - A Theory of Universal Artificial Intelligence based on Algorithmic Complexity

3. Whole Brain Emulation: A Roadmap — Anders Sandberg and Nick Bostrom (2008, FHI Technical Report)

Source: Sandberg & Bostrom, FHI Report (2008).

Core Strategy: This paper outlines Whole Brain Emulation (WBE) as a pathway to ASI: scan a human brain in high detail and simulate it on computers. The authors note that WBE has a “well‐defined goal and could… be achieved by extrapolations of current technology”. In other words, if one can digitally reconstruct and run a human-level mind, the result would be at least human-level AI, and with faster hardware potentially superintelligent. The roadmap breaks this vision into clear steps: brain scanning (connectome mapping), neuron modeling, large-scale simulation, etc. Thus the paper provides a concrete development pathway to ASI grounded in neuroscience.

Technical Details: Over its 130 pages, the report delves into technical requirements for each stage. It analyzes methods for high-resolution brain imaging (e.g. advanced MRI or electron microscopy), ways to represent neural circuits in software, and hardware architectures for simulating billions of neurons. It includes tables of needed throughput (like synapse processing rates), discussions of data compression/abstraction, and prototype “levels” of emulation. By quantifying things like voxel resolution, data volumes, and processing costs, it offers tangible implementation guidance. For example, it estimates when available scanning tech and computing might reach necessary scales.

Influence: Sandberg and Bostrom’s WBE roadmap is widely cited in AGI literature as the canonical study of a realisable ASI project. It has influenced both academic and industry thinking by showing that ASI could come via brain emulation. While speculative, it is highly detailed and technically grounded, making it influential in the sense of outlining a non-ML-centric ASI strategy. In discussions of ASI, this report often serves as the definitive source on a brain-based approach, highlighting a concrete alternative to purely synthetic neural nets. Each summary is based on the cited papers which present the core methods, technical details, and impact described above. These works were selected for their technical rigor, implementable architectures/pathways, and influence in the field.

https://imtcoin.com/pdf/2008-sandberg-wholebrainemulationroadmap.pdf - Neuromorphics in software and not LLM's.

4. “AI-Generating Algorithms (AI-GAs): An Alternate Paradigm for Producing General AI” (Clune, 2020, arXiv)

Clune advocates a high-level strategy: automate the entire AI creation process. He proposes AI-generating algorithms (AI-GAs), which are meta-algorithms that learn to produce intelligence itself. AI-GAs rest on three pillars: (1) meta-learning network architectures (automatically finding better model structures), (2) meta-learning the learning algorithms (automating optimization/hyperparameter tuning), and (3) generating rich learning environments (automatically constructing curricula or tasks for the AI to learn from). This paradigm suggests focusing research on the process of discovery, akin to how evolution on Earth gradually built human intelligence.

Key contributions: Defines a framework for recursive self-improvement. For example, one instantiation might repeatedly apply neural architecture search (pillar 1) to propose new model families, use neural optimizers or learned plasticity rules (pillar 2) to improve training, and concurrently evolve or generate richer training worlds or tasks (pillar 3). The paper surveys early examples of each pillar and argues they should be combined as a grand challenge.

Implementation guidance: In practice, AI-GAs would integrate existing techniques: use NAS and genetic programming for architecture evolution, employ learned optimizers or meta-gradient methods for fast learning-rule adaptation, and create generative systems for new tasks (e.g. procedurally generated game levels, or self-play scenarios that keep getting harder). Clune emphasizes that automated environment generation is critical: by constantly exposing the AI to novel, self-created challenges, it can improve without human-delivered labels.

Viability: AI-GAs capture the essence of recursive self-improvement: the AI system’s job is to design its successors. This approach literally scales the AI development process itself with computation. By fully automating research and learning, it could break through the linear bottleneck of human-centered AI design. Clune argues that because biological evolution achieved our intelligence this way, AI-GAs may be the fastest route to ASI, leveraging compute power to simulate a Darwinian sandbox for intelligence.

Each of these strategies goes beyond fixed models or static training. They all feature continuing self-improvement and open-ended learning: either by having the AI generate its own tasks and solutions, evolving its own brain structure, or by operating in environments that never settle. Together, they illustrate diverse feasible paths toward an ASI that constantly improves itself.

More...

AI-GAs: AI-generating algorithms, an alternate paradigm for producing general artificial intelligence
Less is More: Recursive Reasoning with Tiny Networks
Hierarchical Reasoning Model
Darwin Gödel Machine: Open-Ended Evolution of Self-Improving Agents – Zhang et al. (2025, arXiv)
Self-Rewarding Language Models
AlphaGo Moment for Model Architecture Discovery (Liu et al., 2025, arXiv)
A Motivational Architecture for Open-Ended Learning Challenges in Robots (Romero et al., 2025, arXiv)
Personalized Artificial General Intelligence (AGI) via Neuroscience-Inspired Continuous Learning Systems (Gupta et al., 2025, AI Open)
Towards the Neuroevolution of Low-level Artificial General Intelligence (Pontes-Filho et al., 2022, Frontiers in Robotics and AI)
AlphaGo Moment for Model Architecture Discovery – Liu et al. (2025, arXiv)
AI-Researcher: Autonomous Scientific Innovation – Tang et al. (2025, arXiv)

https://arxiv.org/html/2506.10943v1 - Self-Adapting Language Models
https://sakana.ai/ai-scientist/ - The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery
https://arxiv.org/pdf/2511.02824 - Kosmos: An AI Scientist for Autonomous Discovery
https://arxiv.org/pdf/2508.07043 - K-Dense Analyst: Towards Fully Automated Scientific Analysis
https://huggingface.co/docs/transformers/en/model_doc/gptj - The STaR method used GPT-J for its experiments
Critque 1: https://youtu.be/klkOdh4l0Eo?si=hlPEndOSqsRy_q78
Critque 2: Context Rot: https://youtu.be/TUjQuC4ugak?si=UBE4akxKeG5TQCns
AI at the speed of light just became a possibility - https://www.eurekalert.org/news-releases/1105488
Early science acceleration experiments with GPT-5
AlphaEvolve: A coding agent for scientific and algorithmic discovery

📝 📜 ⏱️ ⬆️