
Dealing with vast sets of data requires the ability to scale on demand.
Developing a high-level content management architecture in the age of big data uncovers challenges that engineers and program designers could not have anticipated even a few years ago. While computer storage and processing capacities continue to expand, the sheer volume of data being produced has created computing problems of scale.
The von Neumann bottleneck
Even the most sophisticated CMS on the market cannot hope to handle every single piece of content being churned out. On standard personal computers, this strain runs into what experts have dubbed the von Neumann bottleneck. As Astoria Software CEO Michael Rosinski recently discussed with The Content Wrangler's Scott Abel, the term refers to a common latency problem that arises when discrete processing tasks are completed linearly, one at a time.
The von Neumann bottleneck is easy to understand. Even though processors have grown faster and computer memory has grown denser, data transfer speeds between the CPU and RAM have not kept pace. This means that even the most powerful CPU can spend as much as 75% of its time waiting for data to load from RAM. CPU designer Jonathan Kang goes even further, arguing that this architecture model results in "wasteful" transfers and implying that much of the data need not be retrieved from RAM in the first place.
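To see the imbalance in practice, here is a minimal sketch that performs the same arithmetic on a working set that streams through RAM and on one small enough to stay in the CPU cache. Python and NumPy are our choice here; the array sizes, repeat counts and the throughput() helper are illustrative assumptions, not figures from the interview.

```python
# Minimal sketch: identical arithmetic, different memory footprints.
import time
import numpy as np

def throughput(arr: np.ndarray, repeats: int) -> float:
    """Sum the array `repeats` times and return elements processed per second."""
    start = time.perf_counter()
    total = 0.0
    for _ in range(repeats):
        total += arr.sum()                        # same arithmetic either way
    elapsed = time.perf_counter() - start
    return (arr.size * repeats) / elapsed

big = np.ones(100_000_000, dtype=np.float64)      # ~800 MB: streams through RAM
small = np.ones(100_000, dtype=np.float64)        # ~800 KB: fits in CPU cache

print(f"RAM-bound:   {throughput(big, 1):.2e} elements/s")
print(f"Cache-bound: {throughput(small, 1000):.2e} elements/s")
```

The size of the gap will vary from machine to machine, but any gap at all comes from the memory channel, since the processor performs the identical amount of arithmetic in both runs.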
"To scale effectively, implementing predictive algorithms is necessary."
Predictive and intelligent
The solution, as Mr. Kang sees it, is to associate data with the instructions for its use. In that way, as the instructions move into the CPU, the data pertaining to those instructions moves into the CPU cache, or is accessible through an alternate addressing channel designed specifically for data.
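Mr. Kang's proposal is a hardware change, but the pairing he describes can be sketched in software. The snippet below is our own illustration, not his design: the WorkItem class, the toy "scale" operation and the data_store lookup are invented to contrast instructions that merely reference data held elsewhere with work items that carry their operands along with them.

```python
from dataclasses import dataclass

# Conventional split: instructions reference data held elsewhere, so every
# step triggers a separate fetch from the "memory" side.
data_store = {"a": 2.0, "b": 3.0, "c": 5.0}
instructions = [("scale", "a"), ("scale", "b"), ("scale", "c")]

def run_split(factor: float) -> float:
    return sum(data_store[key] * factor for _op, key in instructions)

# Paired layout: each work item carries its operand, so loading the
# instruction also brings along the data it pertains to.
@dataclass
class WorkItem:
    op: str
    operand: float

paired = [WorkItem("scale", 2.0), WorkItem("scale", 3.0), WorkItem("scale", 5.0)]

def run_paired(factor: float) -> float:
    return sum(item.operand * factor for item in paired)

assert run_split(10.0) == run_paired(10.0) == 100.0
```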
Another approach, and one more amenable to large sets of data, is to preprocess the data as it is ingested. We recognize that not all content will be of high value in a particular set – most of it may in fact be of relatively low value – so the ability to approach content management with a sense of data relevance allows programmers to apply CPU and RAM resources to the data with the highest value.
This is at the core of intelligent computing – and intelligent content. Since existing hardware architectures limit a computer's ability to transfer data, there is much to be gained by creating data ingestion programs that can mimic a human's ability to determine data relevance and recognize content insights – techniques that overcome latency by quickly and efficiently pinpointing only the data we need.
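As a rough illustration of that kind of relevance-aware ingestion, the sketch below filters documents at intake with a cheap scoring pass so that only high-value content reaches the expensive processing stages. The score_relevance() heuristic, the priority terms and the 0.2 threshold are our own illustrative assumptions, not features of any particular CMS.

```python
from typing import Iterable, Iterator

PRIORITY_TERMS = {"contract", "release", "safety", "compliance"}  # illustrative

def score_relevance(document: str) -> float:
    """Cheap heuristic run at ingestion time, before any heavy processing."""
    words = document.lower().split()
    if not words:
        return 0.0
    hits = sum(1 for w in words if w in PRIORITY_TERMS)
    return min(1.0, hits / 5)

def ingest(documents: Iterable[str], threshold: float = 0.2) -> Iterator[str]:
    """Spend CPU and RAM only on content that clears the relevance bar."""
    for doc in documents:
        if score_relevance(doc) >= threshold:
            yield doc        # forward to full parsing, indexing and enrichment
        # low-value content is archived or dropped instead of fully processed

high_value = list(ingest([
    "Q3 safety compliance release notes",
    "lunch menu for tuesday",
]))
print(high_value)            # only the compliance document survives preprocessing
```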
"Every time a piece of intelligent content is processed, the machine 'learns.'"
Content versus computing
Fully intelligent computing (essentially a form of AI) remains elusive, but within the realm of content management, great strides are being made every day. One of the biggest innovations is the shift from placing the full burden on the computing side to building "intelligent" features into the content itself. With the richer architecture of structured content languages like XML and DITA, we can add extensive semantic context, aiding the CMS by tagging and flagging data relevance. Every time a piece of intelligent content is processed, the machine "learns" patterns and uses those patterns to help deal with issues of scale.
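As a rough illustration, the snippet below reads a simplified, DITA-flavored fragment and lets its semantic attributes decide which passages deserve attention. The sample topic and the high_value_fragments() helper are invented for illustration; this is not a validated DITA document or an API from any real CMS.

```python
import xml.etree.ElementTree as ET

DOC = """
<topic id="t1">
  <title>Battery replacement</title>
  <body>
    <p audience="technician" importance="high" product="model-x">
      Disconnect power before opening the battery compartment.
    </p>
    <p audience="marketing" importance="low" product="model-x">
      The new battery lasts longer than ever.
    </p>
  </body>
</topic>
"""

def high_value_fragments(xml_text: str, audience: str):
    """Let the markup, not a downstream guess, decide what is relevant."""
    root = ET.fromstring(xml_text)
    for p in root.iter("p"):
        if p.get("audience") == audience and p.get("importance") == "high":
            yield p.text.strip()

for fragment in high_value_fragments(DOC, audience="technician"):
    print(fragment)   # only the safety-critical technician content is surfaced
```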
Over time, the combination of machine learning and structured, intelligent content will lead to faster, more accurate decision-making and the ability to keep up with a constant influx of new data. It can also connect multiple data sets, recognizing common metadata tags across platforms, devices and channels to make data aggregation easier. This will have an immense impact on all industries, from retail to customer service to education to medicine.
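A small sketch of that kind of tag-based aggregation might look like the following; the source names, records and metrics are invented for illustration.

```python
from collections import defaultdict

web_analytics = [
    {"tags": {"model-x", "battery"}, "metric": "page_views", "value": 1200},
]
support_tickets = [
    {"tags": {"model-x", "battery"}, "metric": "open_tickets", "value": 7},
]
retail_sales = [
    {"tags": {"model-x"}, "metric": "units_sold", "value": 340},
]

def aggregate_by_tag(*sources):
    """Group records from every channel under the metadata tags they share."""
    by_tag = defaultdict(list)
    for source in sources:
        for record in source:
            for tag in record["tags"]:
                by_tag[tag].append((record["metric"], record["value"]))
    return dict(by_tag)

combined = aggregate_by_tag(web_analytics, support_tickets, retail_sales)
print(combined["model-x"])
# [('page_views', 1200), ('open_tickets', 7), ('units_sold', 340)]
```

Because every channel shares the same metadata vocabulary, records about the same product line up without any bespoke integration work.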