The next generation of foundation models will destroy any data moat.
Next week, Grok, which Musk has held back for a long time, will launch.
Amid OpenAI's string of blockbuster announcements, the news seems to have drawn relatively little attention.
Yet the quieter the strike, the greater the damage.
Behind Grok's year-end surprise lies a "secret weapon" that Musk is using against OpenAI.
That "secret weapon" is actually very simple: the constant stream of real human data on the X platform.
Now that training data for large models is growing scarce, even OpenAI has begun training directly on user data.
In this situation, owning X as a never-ending "data fountain" undoubtedly gives Musk a trump card for a future comeback.
That said, X is still a social platform, and most of the information on it is unverified and of uneven quality. Can it really be used to train large models directly without risking hallucinations?
Musk, who understands large models, must know this.
Even so, launching Grok still does Tesla and Musk more good than harm.
Without this move, Musk would at best become one more regional "data feudal lord" in the coming AI race, and would eventually see his moat breached.
That is likely the fate awaiting many Chinese companies as well.
1. Data Feudal Lords
What is a "data feudal lord"?
Simply put, the term refers to small and medium-sized companies that dominate particular vertical industries or niches on the strength of unique data.
The concept was first put forward by Yanis Varoufakis, the former Greek finance minister, as a lament over European countries being exploited by American technology companies.
Since OpenAI's developer conference in early November, calls to become a "data feudal lord" have echoed constantly across the industry.
After all, with the application layer squeezed by GPTs, unique data has become the only advantage many companies have left.
So is Musk launching Grok to become a new-generation "data feudal lord" built on X's data?
No. The fate of such lords is that technological progress will steadily weaken their barriers, until they are gradually eroded by more powerful general-purpose models.
Here, Musk's Grok demonstrates two ways to break through such moats.
The first is to place the model at a "data hub", so that its reach extends into industries and fields that were previously out of reach.
Many people know that the United States has no all-in-one platform like China's WeChat that combines social networking, payments, entertainment, and information. This is not for lack of ability, but because major US financial institutions compete with the technology giants.
Musk's acquisition of X has, to some extent, begun to break down this fragmentation, and has laid the foundation for a WeChat-style super app.
If X can truly become a global marketplace centered on audio, video, messaging, and payments/banking, linking products, services, and opportunities, then Grok will sit at the center of that data hub, drawing massive amounts of data across industries, regions, and modalities.
Grok would then no longer be just a joke-cracking chatbot on a social platform, but a comprehensive entry point connecting many fields.
Over time, the synergy between this all-in-one entry point and a large model will widen the gap between users who rarely or never use Grok or X and those who use them heavily.
Data barriers in individual verticals would still exist, but users would find it hard to accept services that lack Grok.
Cornered in this way, the "data feudal lords" would have to pledge allegiance to Grok in order to survive.
2. Synergistic Effects
Besides encroaching on "digital enclaves" from the data hub, the other major way for large models like Grok to break through data moats is device-cloud collaboration, weaving a vast encirclement.
Specifically, Tesla would supply computing power (Dojo), X and Tesla would supply training data, xAI would develop the models, and the resulting models would be fed back into X and Tesla's products (cars and humanoid robots), forming a formidable triangular alliance.
As large models increasingly move onto devices, how would such a triangle break through each data moat?
Consider an example from e-commerce.
Suppose a company in local e-commerce trains a proprietary large model on industry-specific data, and Grok sets out to enter that market. Grok would likely adopt a "cross-domain" tactic during data collection.
Specifically, the traffic, geographic, and user-behavior data Tesla collects while its cars are driving, the environmental and operational data its robots collect in homes and factories, and the social information on X would all give Musk's team rich sources of information.
By combining these data, Musk's team might uncover new patterns and correlations, eroding some of the proprietary model's unique advantages.
The core idea of this tactic is that no industry or field exists in complete isolation.
Beyond these flanking attacks, such a triangle can also dismantle isolated data moats by combining vertical and horizontal integration.
Simply put, as large models become increasingly horizontal and general-purpose, Musk's triangle offers an end-to-end solution.
It covers the entire pipeline, from collecting data from different sources to processing, training, and deployment. Companies would no longer need to find separate technology and service providers for each stage, which lowers implementation difficulty and cost.
A simpler pipeline helps companies adopt AI faster and strengthens their data processing and analysis.
In this situation, many companies may choose to give up the uniqueness of their data and join the triangle's ecosystem in exchange for more efficient AI deployment.
The logic is essentially the same as in the mobile-internet era, when many businesses put up with high platform commissions in exchange for cheaper customer acquisition.
Under such a siege, isolated "data castles" will eventually struggle to withstand an ever-growing Grok.
3. The Road to AGI
Faced with Grok's potential and its aggressive offensive, where should AI companies that rely on data go?
Before answering, there is a more important question:
Is clinging to the "data moat" really the right direction?
Sequoia Capital's report "Generative AI's Act Two" contained this passage:
"Data moats are untenable: the data generated by application companies has not created insurmountable moats, and the next generation of foundation models is likely to destroy any data moats built by startups. Instead, workflows and user networks seem to be creating more durable competitive advantages."
So will the future really unfold as Sequoia says, with "the next generation of foundation models" destroying every data moat?
At least from a technical perspective, this possibility exists.
Earlier, while discussing OpenAI's leaked Q* project, NVIDIA senior AI scientist Jim Fan debated synthetic data with Musk and LeCun on Twitter.
Jim Fan believes that computer-generated (synthetic) data could supply the next high-quality dataset of tens of trillions of tokens. The only problem is finding ways to keep that data consistently high in quality and diversity.
LeCun, one of the three "godfathers of AI", said: "Animals and humans can quickly become very intelligent with just a small amount of training data. I believe that new architectures can learn as efficiently as animals and humans."
Overall, Jim Fan and LeCun represent two different approaches to the data problem.
One is to synthesize data; the other is to develop new architectures (such as world models) that let models learn from very little data and generalize from it.
Whatever the merits of each approach, these ideas reflect a shared determination in the research community to break through the data limitation.
Likewise, users would rather have one large, versatile model that handles many tasks well than switch between different models for every scenario.
When a technological direction becomes the shared will of scientists and the public, its realization is only a matter of time.
From this perspective, so-called "data barriers" will eventually disappear.
Many internet companies today build their businesses on user-behavior data and models. Once users have access to larger models with stronger integration capabilities, many of those businesses and features (music listening, for example) may be reduced to plugins, and their data barriers will vanish.
In this transition toward AGI, the most promising teams will be those that find core competitive advantages beyond the "data barrier".
As Yang Zhilin, CEO of Moonshot AI (whose Chinese name means "Dark Side of the Moon"), put it: different organizations give rise to different cultures, cultures give rise to different systems, and systems in turn give rise to different outcomes.
As technology and data become increasingly homogeneous, soft and abstract factors such as development paradigms, systems, and ideas become the key to winning.
And these factors beyond data are also where humanity's greatest strength lies in the AI era.