Is a Document Management System a half measure?

While the direction of this blog is forward-looking, it is instructive at times to consider the history of technologies and techniques. One such technology is document management, along with its predecessor, electronic document imaging; both are precursors to modern content management. This is not to say that document management is dead as a technology or a solution; in fact, in some operational circles document management is very much alive and useful.

The earliest document management systems addressed the problem of paper proliferation.  "Electronic document imaging systems" combined document scanning with database-driven storage, indexing, and retrieval to form libraries of what were once reams of paper files. "Document management" became a solution in its own right as vendors added support for digital file formats generated by word processors, spreadsheets, and other office-productivity products. The descendants of those earlier systems are the document-based enterprise content management systems of today, such as Microsoft SharePoint, OpenText Documentum, and Hyland OnBase.

When is a DMS useful?
One question to consider: is a document management system (DMS) relevant in the modern world of digital content management? At its core, a DMS knows nothing about the information within a document; that is, users don't link to content within a document managed in a DMS. Instead, users tag whole documents and link one whole document to another whole document, and the DMS simply maintains those inter-document links. Hence, in the context of digitized content, a DMS is something of a half measure, because each document under management exists as a static element.
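To make the "whole document" granularity concrete, here is a minimal Python sketch of a DMS-style data model. The class and method names (`ManagedDocument`, `DocumentStore`, `link`, `find_by_tag`) are hypothetical and not drawn from any product; the point is only that tags and links attach to documents, never to passages inside them.

```python
from dataclasses import dataclass, field


@dataclass
class ManagedDocument:
    """A document as a DMS sees it: an opaque file plus document-level tags."""
    doc_id: str
    filename: str
    tags: set[str] = field(default_factory=set)


class DocumentStore:
    """Hypothetical minimal DMS: tags and links attach only to whole documents."""

    def __init__(self) -> None:
        self.docs: dict[str, ManagedDocument] = {}
        self.links: dict[str, set[str]] = {}

    def add(self, doc: ManagedDocument) -> None:
        self.docs[doc.doc_id] = doc
        self.links.setdefault(doc.doc_id, set())

    def link(self, source_id: str, target_id: str) -> None:
        # A link joins one whole document to another; nothing inside either
        # document (a clause, a figure, a paragraph) can be addressed.
        if source_id not in self.docs or target_id not in self.docs:
            raise KeyError("both documents must be under management")
        self.links[source_id].add(target_id)

    def find_by_tag(self, tag: str) -> list[str]:
        # Retrieval works on document-level metadata, not on the text itself.
        return sorted(d.doc_id for d in self.docs.values() if tag in d.tags)


store = DocumentStore()
store.add(ManagedDocument("D1", "lease-2015.pdf", {"contract", "real-estate"}))
store.add(ManagedDocument("D2", "lease-amendment.pdf", {"amendment"}))
store.link("D1", "D2")
print(store.find_by_tag("contract"))  # ['D1']
```

Because the store never looks inside `lease-2015.pdf`, a question such as "which contracts mention a renewal clause?" cannot be answered without extracting the document's text, which is exactly the half measure described here.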

"DMS is closer to DAM rather than CMS."

This may be sufficient for some organizations and in some applications. If the document itself is significant in its own right, in addition to the data it contains, then a DMS represents what could be a supremely useful form of content management. For instance, it's one thing to have a database containing the collected works of William Shakespeare intricately tagged and linked via hypertext. It's an entirely different concern, though, to digitize a specific document written in Shakespeare's own hand.

In a way, a DMS is closer in function to a digital asset management (DAM) system than to a content management system, especially in its ability to protect and preserve the original form of a document. A DMS can also be a very low-cost solution, given the dozens of open-source document management systems available today. Enterprises looking to bring organization and clarity to large physical archives of documents may choose from a wide variety of free and fee-based DMS solutions. Using existing resources such as cloud services, scanners, and simple image editing and management software, an enterprise can digitize its documents without having to build or acquire a more complicated CMS.

The limits of DMS
However, by leaning on a DMS, enterprises may find themselves running up against the software's innate lack of sophistication. Because tags describe the document as a whole rather than the information inside it, valuable insight contained within the document is easy to miss. Documents cannot easily be interrelated with similar content, nor can their data be recombined into something new.

Enterprises have found value in linking digital asset management with content management, so it's likely that a DMS working in conjunction with a CMS is the ideal solution. If the physical document itself – or at least its visual representation – is of value, the ability to tag and separate the data within the document while still preserving it in static form will lead to more agile, comprehensive information management.

The importance of simplicity in content languages

Everyone agrees: When designing a content or markup language, simple is better. Yet as intuitive as this may seem, the development arc of technology runs counter to this imperative – always evolving toward greater complexity. If we are looking to get more done with content, why do we want a relatively unsophisticated language to assist us?

Building blocks of complexity
First, it helps to understand the rationale for this growing complexity. As technological capabilities expand – and as users come of age expecting those expanded capabilities – innovation naturally pushes the boundaries of current content languages, particularly as we find ourselves needing to express and support more complex and dynamic content. The companies at the forefront of innovation have consequently made developing new programming languages to support their expanding infrastructure an imperative. This has led to a boom in programming languages of varying complexity, particularly among big tech companies, according to Viral Shah, one of the creators of the Julia programming language.

"Lightweight languages have endured for years."

"Big tech companies tend to have their own programming languages — Go at Google, Hack at Facebook, Swift at Apple, Java at Oracle [sic; Sun developed Java for different reasons], C# at Microsoft, or Rust at Mozilla," Shah told VentureBeat. "If you think about it, this makes sense: Software is the core competency of traditional tech companies — they can afford to have their legions of professional programmers use 'hard' languages like C++ and Java, which are great for performance and deployment, but less good for exploration and prototyping."

What Shah is pointing out reflects one of the key principles that make lightweight markup languages crucial: while these companies have the capability to design their own languages to suit development needs, at the core of each are older, less sophisticated languages. In this respect, languages like Java and C++ are the building blocks supporting increased complexity.

The lasting power of markup
Similarly, when it comes to content – including tagging and metadata – lightweight languages have endured for years alongside more complex, proprietary solutions. Markdown, for example, has long been a favorite of bloggers, web writers and editors, developers, academics, technical writers and scientists looking for simple ways to turn plain text into HTML and XML; in effect, it acts as a shorthand.

"Years ago, I started coding websites with HTML and then structuring documentation with XML, but Markdown allows me to use plain text for similar purposes," Carlos Evia, Ph.D., director of professional and technical writing and associate professor of technical communication in the Department of English and Center for Human-Computer Interaction at Virginia Tech told The Content Wrangler. "My Markdown files can become HTML and XML deliverables with one or two lines of commands or a few keystrokes."

Lightweight markup languages like Markdown thrive on their simplicity, which brings built-in constraints. As such, they are rarely the be-all and end-all for content creators; instead, they act as a vital component in a more sophisticated authoring tool chain. But this is the key to their staying power: with no end in sight for the innovation and development of new languages, authoring content in a simple language makes that content more portable to future iterations. Rather than having to parse artifacts of an outmoded language when bringing in older content, authors working in simpler languages find that their content remains relatively "pure" and thus more easily repurposed.

"The constraints of markup languages are one of their virtues."

Constrained, yet free
Mark Baker, writing for Every Page Is Page One, argues that the constraints of markup languages are one of their primary virtues. Those constraints effectively function as a style guide, limiting the possibility of errors or deviations from house style. Baker also notes that simpler languages can interface with software and algorithms more easily, supporting automation and creating more naturalistic content.

"Every markup language has at least one program to process it and turn it into output (at a minimum, HTML)," Baker writes. "Those programs work because they know the constraints of the language. They know all the structures that are allowed to exist in the content, and all the combinations they are allowed to exist in, and they know how to format each of them."

He goes on to outline how this extends well beyond formatting into API documentation, enabling more sophisticated source tracking, the combination of multiple sources into a single reference entry, error checking, and validation that the written content conforms to the actual function definitions in the code.
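The validation step Baker describes, checking written content against the actual function definitions, might look something like the sketch below. It assumes the documented parameter names have already been extracted from a reference entry into a list; `compare_params`, `connect` and its parameters are hypothetical, while Python's standard `inspect.signature` supplies the real parameter names.

```python
import inspect


def compare_params(func, documented: list[str]) -> dict[str, list[str]]:
    """Compare a function's real parameters with those named in its reference entry."""
    actual = list(inspect.signature(func).parameters)
    return {
        # In the code but never described in the documentation.
        "undocumented": [p for p in actual if p not in documented],
        # Described in the documentation but absent from the code (stale docs).
        "stale": [p for p in documented if p not in actual],
    }


# Hypothetical function and reference entry, used only for illustration.
def connect(host: str, port: int = 5432, timeout: float = 10.0):
    ...


print(compare_params(connect, ["host", "port", "retries"]))
# {'undocumented': ['timeout'], 'stale': ['retries']}
```

A check like this is only possible because the documentation's parameter list lives in a known, constrained structure that a program can extract.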

Which brings us back to the main point: a relatively unsophisticated language with known constraints leaves authors free to create more compelling and dynamic content. It is informal evidence for the mantra that "simple is better."

Who ‘owns’ content design features?

Content is ever-changing. This is both its greatest virtue and the most significant challenge for designers. In the pursuit of even more intelligent and efficient user interfaces, CMS vendors are tasked with constantly redesigning their software to accommodate innovations in content format and design.

In the search for a CMS that can successfully manage new forms of optimized content, one major obstacle stands in the way of innovation: the provenance of content throughout its lifecycle. With the rise of cross-platform giants like Amazon and Google, content is now being repurposed, reinterpreted or filtered through any number of proprietary formats, any one of which allows a company to stake an ownership claim. But where is the line between content that can exist safely and comfortably within a multi-platform ecosystem and content that can be designated "property"?

The changing definition of content
The challenge of deciding what counts as "proprietary" content begins with defining content itself in our modern data economy. If you are a user investigating a certain product on an ecommerce platform like Amazon, you will encounter a product description possibly submitted by the manufacturer or drafted by an author at Amazon itself. It is nearly impossible to trace authorship and ownership of that content, since by the time you search for the product it may have been repurposed many times across a variety of platforms. The content may also be repurposed into different formats: written copy becomes spoken audio, which can in turn be captured on film. If all of these versions are connected to the product and carry the same copy, should they be considered different content, or the same?

"Experts suggest that the definition of content should be expanded."

This has led experts within the content and CMS design community to suggest that we move away from the traditional definition of content as "copy produced by a single author," embracing instead a broader definition that is independent of where content occurs and what format it takes.

"We need to shift our definition of content to be what the user needs right now," says Jared Spool, founder of User Interface Engineering. "It has nothing to do with how it's produced or where it lives on the server. If the user needs it, it's content."

Can you 'own' a need?
In this regard, Spool identifies content as the solution to an operational problem. Creating it comes down to identifying a need and producing something that satisfies that need. This, however, becomes complicated once you introduce the idea of commercial platforms producing and managing content to meet the demands of their customers.

"If we want content seen as a business solution to a problem, we need to change expectations around what it is and what it is supposed to do," wrote AHA Media Group's Ahava Leibtag. Leibtag points to the obligations that organizations have, not only to produce and disseminate content, but to protect branding and control what it considers "proprietary."

Companies cannot patent an identified consumer "need," and no robust infrastructure exists for asserting original authorship of content above all else. What organizations can do, however, is develop proprietary design features. These features essentially act as a lens through which content is viewed: the basic content would remain outside the reach of patents, but the design features woven into the overall platform interface could be copyrighted. Much as Microsoft and Apple a generation ago sought to protect the look and feel of their respective products, modern companies can use formatting and user behavior as a mechanism to protect their proprietary interests in data they did not create.

Most frequently used content creation and editing tools

In the world of content creation and editing, tracking the tools used across the entire industry can be tricky. For example, tools at one enterprise that facilitate collaboration while preserving authorship may be less valuable to another enterprise that needs integrated formatting and the ability to embed rich media.

In its exploration of content creation and management trends in 2016, the Center for Information-Development Management issued a survey to 328 individuals across the entire content creation spectrum. Writers, managers, information architects, content strategists, editors and a small contingent of IT support, customer services and publishers were represented – with the overwhelming majority of respondents representing computer software companies.

The survey sought to answer a few basic questions: What tools do you use to create and manage content? What kind of content do you most frequently develop? How will this content be published in years to come?

Tools of the trade 
As one might expect, DITA played a significant role in content creation across all respondents. Roughly 74 percent of those surveyed reported using some kind of DITA-capable XML editor as their primary content creation tool, far exceeding other tools. Following that, 66 percent reportedly used MadCap Flare, 53 percent unstructured Adobe FrameMaker, 43 percent Adobe InDesign and, finally, 38 percent Microsoft Word.

Microsoft Word's fall from preeminence among content creators is somewhat predictable. Content experts across the industry have been forecasting the end of generic word processors, with many saying that lightweight text-to-HTML formats like Markdown will render Word virtually obsolete among professional content creators.

One of the more fascinating insights in this data is the role native HTML authoring plays in content creation: while only a quarter of survey respondents (25 percent) named an HTML editor as a primary tool, it was overwhelmingly the favorite secondary tool across all categories, at 52 percent. From this, we can infer that content creators:

  • Are shifting away from creating HTML first/only content.
  • Still require HTML editing tools to fully leverage content production and publishing.

Where is content being published?
This seems consistent with data on the falling use of HTML-based delivery: while HTML is still the preferred means of publishing for almost 75 percent of survey respondents, mobile is coming up rapidly – albeit with content creators seemingly unsure how to fully leverage it.

"We were interested to learn how organizations are approaching publishing to mobile devices, since we advocate designing content differently for mobile devices," the authors of the CIDM survey stated. "Fully 38 percent report that their content is the same on all devices. Some publish more content on mobile devices (only 4 percent); more publishing less content (24 percent)."

The one not mentioned: Localization
This suggests that mobile content creation is still not being considered separately from traditional content creation. One facet not mentioned in the CIDM survey is localization, an omission that seems to ignore one of the fundamental tenets of the mobile experience: localized UX is a crucial element of consumer engagement and must be taken into account when creating specific content. Tech.Co emphasizes that, for mobile experiences related to e-commerce, localization tools that go beyond simple translation are key as well.

"If you're doing this, be sure to use widely accepted localization packages or hire an expert to work on the content for you as there will be nuances across languages that even Google Translate doesn't quite get yet," wrote Tech.Co's Joe Liebkind. While mainstream content creators may be focused on issues related to format conversions, the greater topic of authoring content for diverse audiences seems to be underrepresented.

Style Guides: Internal or external?

The extensive capabilities of an open-standard XML vocabulary like DITA mean that you can design and automate the creation of content modules with minimal loss of usefulness across different platforms and applications.

However, cleverly implemented content automation does not necessarily imply good content marketing. In fact, it's through content marketing that an enterprise, business, organization or brand bridges the gap between data and consumer. In other words, there must be rules for configuring data so that it meets specific brand, linguistic and cultural guidelines and people want to read what is produced. This is where the style guide comes into play.

"Content management goes hand and hand with content marketing."

Where did authors traditionally encounter style guides?
Style guides have long served as the means by which an institution achieves commonality and consistency. Two long-standing guides, the AP Stylebook and the Chicago Manual of Style, have been in heavy use by editors and journalists since the 1950s. Some brands have developed their own variations along the way to distinguish themselves in the market and ensure readability across different audience demographics.

With manual authorship, the interaction with style guides is simple: An author writes a piece of content with the style guide in mind. Editors may verify or adjust style guide usage, but the process still remains relatively contained to authorship.

In this way, style guides are primarily an internal tool, put into action by the person creating the content. Yet in the age of automation – where content can be created, reconfigured and managed without ever being touched by human hands – where should style guides live?

CMS style guide implementation: Internal versus external
With a CMS that enables automated content creation, there are essentially two scenarios in which a style guide can be implemented.

The first is within the code of the content generation module itself. The data module creates new content automatically and formats it according to the style guide, which has been implemented algorithmically within the module. In other words, the style guide is woven into the content creation software, a process that might be termed "internal" implementation. Internal implementations are relatively simple to install and activate, but at the cost of added complexity in the CMS architecture, which makes it less agile and leaves room for configuration errors.

The second scenario is implementing a style guide outside of the data-creation module. This is a more "external" process and is akin to the traditional copy-editing function. It could involve the work of a human editor/author reading "copy" with the style guide in mind, or it could be an automated, rule-driven package applied in a "secondary" content configuration process. The human approach limits the complexity of the CMS at the expense of low-cost scalability.
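An automated, rule-driven external pass might, in its simplest form, resemble the Python sketch below. The rules are hypothetical examples of house style, and a production checker would need to handle tokenization, exceptions and context far more carefully than regular expressions allow.

```python
import re
from dataclasses import dataclass


@dataclass
class StyleRule:
    pattern: str  # regular expression for the construction to flag
    message: str  # guidance shown to the author or editor


# Hypothetical house rules, for illustration only.
HOUSE_RULES = [
    StyleRule(r"\butilize\b", "Prefer 'use' over 'utilize'."),
    StyleRule(r"\bclick on\b", "Write 'click', not 'click on'."),
    StyleRule(r"!", "Avoid exclamation points in product copy."),
]


def check_copy(text: str, rules: list[StyleRule] = HOUSE_RULES) -> list[tuple[int, str]]:
    """Return (character offset, message) for every rule violation, in reading order."""
    findings = []
    for rule in rules:
        for match in re.finditer(rule.pattern, text, flags=re.IGNORECASE):
            findings.append((match.start(), rule.message))
    return sorted(findings)


for offset, message in check_copy("Click on Save to utilize the new layout!"):
    print(offset, message)
```

Because the rules live in data rather than in the generation module, they can be revised without touching the CMS code, which is the trade-off between the internal and external scenarios.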

Creating your style guide
Choosing which of these two implementation styles is right for your organization and CMS design is something that depends entirely on your resources and needs. However, one way to gauge the best approach is to explore what makes up your style guide.

The three basic elements of a style guide are:

  • Content attributes
  • Tone and voice
  • Rules

Content attributes typically involve the basic building blocks of the content, i.e. what data you will be feeding into the CMS. From there, tone and voice tie closest to the core of hypertext and data tagging. If your voice, for instance, is casual yet authoritative, having data tagged according to these descriptions can help guide the way content is subsequently assembled. Finally, rules dictate the parameters of the content – what phrases or structures must be avoided.

Through exploring the full scope of your style guide, you can more clearly see whether or not it can be integrated into CMS architecture without causing future issues.

Parsing the difference between HTML, XHTML and HTML5

Astoria’s support for the DITA Open Toolkit allows users to transform their DITA-style XML content into several forms of HyperText Markup Language (HTML), the cornerstone technology for creating web pages. HTML has gone through many revisions, and three main variants for web markup – HTML, XHTML and HTML5 – are all currently in use by developers. The following is a quick summary of the history and distinctions that give each its own character and capabilities.

The original markup language of the web, HTML is the basis for every subsequent web markup language. HTML’s enduring utility is its simplicity: a small set of elements describes the structure and content of a web page. Layout and appearance rely on CSS, and interactivity on JavaScript or, historically, plugins such as Flash. However, this can often lead to frustration for designers, since these more dynamic elements are difficult to construct natively in HTML.

The Extensible Hypertext Markup Language, XHTML, began as a reformulation of HTML 4.01 using XML 1.0. XHTML was designed to allow for greater cross-browser compatibility and the construction of more dynamic, detailed sites. As XHTML evolved, it lost most of its compatibility with HTML 4.01. Today, XHTML is largely confined to specialized projects that require markup to be well-formed XML.
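Because XHTML reformulates HTML as XML, an XHTML document must be well-formed XML, while HTML parsers tolerate omitted end tags and unclosed void elements. The Python sketch below uses the standard library's XML parser to show the difference on a single fragment; `is_well_formed_xml` is a hypothetical helper, and it checks well-formedness only, not validity against an XHTML DTD.

```python
import xml.etree.ElementTree as ET


def is_well_formed_xml(markup: str) -> bool:
    """Return True if the markup parses as XML, as XHTML must."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# HTML 4.01 allows a bare <br>; XHTML requires the self-closing form <br/>.
html4_fragment = "<p>First line<br>Second line</p>"
xhtml_fragment = "<p>First line<br/>Second line</p>"

print(is_well_formed_xml(html4_fragment))  # False
print(is_well_formed_xml(xhtml_fragment))  # True
```

That strictness is what lets XHTML flow through XML tool chains, and also what made it less forgiving than HTML for hand-authored pages.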

As the latest version of HTML, HTML5 is often described as a combination of three families of code: HTML, CSS, and JavaScript. It is significantly more versatile than previous HTML iterations, and it enjoys much broader support than XHTML. Its cross-platform capabilities and native support for what were once third-party plugin features (e.g., drawing, video playback, and drag-and-drop) make it a favorite of web designers.

IRS turning to XML to handle tax exemption forms

XML is not just an encoding format for technical documentation. In a move that touches on XML's original purpose, United States Internal Revenue Service Form 990 – detailing tax-exempt organizations' financial information – is shedding its paper-based roots and going digital in its native format. The IRS announced that Form 990 data will now be available in machine-readable XML format.

"This will have an impact on the speed and efficiency of requests."

"The publicly available information on the Form 990 series is vital to those interested in the tax-exempt community," IRS Commissioner John Koskinen wrote in a statement regarding the transition, as quoted by AccountingWEB. "The IRS appreciates the feedback we've received from a variety of outside partners as we've worked together to explore improvements to make this data more easily accessible."

With more than 60 percent of Forms 990 filed electronically, according to FCW, making that data – with relevant redactions – available in a native machine-readable format is a logical move. Covered forms include electronically filed Form 990, Form 990-EZ and Form 990-PF from 2011 to the present.

"The IRS' move is a very good thing," Hudson Hollister, executive director of the pro-transparency Data Coalition, told FCW. "There is no reason why public information that the government already collects in a machine-readable format can't be published in that same format!"

Some industry experts, like The Sunlight Foundation's Alex Howard, emphasize that requesting and obtaining information still remains difficult for the public and that improving accessibility should be an ongoing focus, given that public requests for non-profit or tax-exempt organizations' filing information are commonplace. Nevertheless, the IRS's announcement will presumably have a tangible impact on the speed and efficiency of compliance with requests. It will also ease backend integration of IRS Form 990 data into XML-based content management systems, such as Astoria.
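To show why machine-readable filings ease backend integration, the sketch below pulls a few fields from a simplified Form 990-style XML record using Python's standard library. The record, its element names and its figures are invented for illustration and do not reproduce the IRS e-file schema; a real integration would work from the published schemas and their namespaces.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical record; real IRS e-file returns use a much larger schema.
SAMPLE_FILING = """
<Filing>
  <OrganizationName>Example Community Foundation</OrganizationName>
  <TaxYear>2014</TaxYear>
  <FormType>990</FormType>
  <TotalRevenue>1250000</TotalRevenue>
  <TotalExpenses>1100000</TotalExpenses>
</Filing>
"""


def summarize_filing(xml_text: str) -> dict:
    """Extract a few fields from a filing record and derive net income."""
    root = ET.fromstring(xml_text.strip())
    revenue = int(root.findtext("TotalRevenue"))
    expenses = int(root.findtext("TotalExpenses"))
    return {
        "organization": root.findtext("OrganizationName"),
        "tax_year": int(root.findtext("TaxYear")),
        "form": root.findtext("FormType"),
        "net": revenue - expenses,
    }


print(summarize_filing(SAMPLE_FILING))
```

With paper or scanned forms, each of these values would have to be keyed in or recovered by OCR; with XML, they are addressable by name.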

Understanding and utilizing adaptive content modeling

The Astoria Portal gives end-users the ability to interact with content managed by the Astoria Content Management System. The Astoria Portal is a web site customized to match the client's expectations for user experience. Not long ago, discussions about site design and user experience would involve arcane terminology and technical jargon. Today, the world of web architecture has become increasingly democratic, so that discussions about the Astoria Portal use terms and concepts that are increasingly common knowledge. With the increased emphasis on user-friendly web interfaces, the average consumer may have a solid grasp of the concept of "responsive design." Yet adaptive web design – and its sibling, adaptive content – remains a relatively unknown aspect of new technology.

In truth, adaptive design is one of the most important drivers of innovation within the world of content management. Rather than simply offering the ability to flip between mobile and desktop optimization, adaptive design allows for content to be reconfigured at will, taking the burden off designers as the focus shifts to more impactful content.

"Adaptive content is completely flexible on the back end."

What's the difference between responsive and adaptive?

While there is overlap between the concepts of responsive and adaptive design, "responsive" suggests design that fluctuates between a fixed number of outcomes and focuses on fluid grids and scaling. With adaptive design, the possibilities are virtually limitless. This is enabled by a fundamentally modular approach to content and data, which allows for a completely device-agnostic content model.

"[Web site] CMS tools have largely been built on a page model, not on data types," Aaron Gustafson, coiner of the phrase "adaptive web design," told CMSWire. "We need to be thinking more modularly about content. We need to design properties of content types rather than how it's designed."

Embracing omnichannel 
What does this mean? Adaptive content is completely flexible on the back end, with a solid model able to publish across any number of channels – a highly desirable ability. This is because most enterprises are fundamentally operating in an omnichannel world already, with effective personalization as the final hurdle. And what's the end game of adaptive? SES Magazine recently found that eCommerce sites featuring personalized content were able to increase conversion by up to 70 percent.

With responsive design, some content may be reconfigured in response to the device it is viewed on, but in general the content is static. Adaptive design allows for new levels of personalization, with machine learning giving an enterprise the ability to analyze a user's habits and taste and present customized, on-demand content matching their preferences. The key isn't just content that looks different – it's content that is different, depending on the device and the viewer. This is a valuable capability since device usage itself implies different behavioral patterns. 

"The key isn't just content that LOOKS different – it's content that IS different."

A 'multi-year journey'
The issue for many enterprises in embracing adaptive content, of course, is implementation. Not all portal systems driven by an XML content management system can easily convert from a static content management system to a dynamic one.

"For many organizations, especially those in business-to-business or those with large, complex or regulated content sets, implementation will be a multi-year journey, with many iterations and evolutions along the way," wrote Noz Urbina in Content Marketing Institute. "Organizations struggle to transform themselves to keep pace with communications options and customer demand. Delivering major changes in two years might mean having gotten started two years ago."

The main thrust of the conversation is the ability to create data hierarchies that exist independently of the eventual design functions. Rather than creating content with the end in mind, this requires strong hypertext conversion and structuring, as well as the integration of analytics and content-building apps. Yet once that conversion is complete, the possibilities for how content is presented, and how effective it can be, are potentially limitless.

Essential vocabulary: Transclusion

Transclusion is one of the foundational concepts of DITA. Coined by hypertext pioneer Ted Nelson, the term "transclusion" refers to the inclusion of part or all of an electronic document into one or more other documents by hypertext reference.

"Transclusion allows content to be reused far more efficiently."

The concept of transclusion took form in Mr. Nelson's 1965 description of hypertext. However, widespread understanding of transclusion was limited by the slow adoption of markup languages, including Standard Generalized Markup Language (whose origins date to the 1960s), Hypertext Markup Language (released in 1993), and eXtensible Markup Language (first drafted in 1996 and published as a W3C Recommendation in 1998). In fact, it wasn't until DITA, an XML vocabulary donated by IBM to the OASIS standards consortium in 2004, that the power of transclusion enjoyed broader reception.

Transclusion differs from traditional referencing. According to The Content Wrangler's Eliot Kimber, traditional content had "…to be reused through the problematic copy-and-paste method." With transclusion, a hyperlink inserts content by reference at the point where the hyperlink is placed. Robert Glushko adds, "Transclusion is usually performed when the referencing document is displayed, and is normally automatic and transparent to the end user." In other words, the result of transclusion appears to be a single integrated document, although its parts were assembled on-the-fly from various separate sources.

"In the information management sense, transclusion makes content easy to track, removes redundant information, eliminates errors, and so on," writes Kimber. "Use-by-reference serves the creators and managers of content by allowing a single instance to be used in multiple places and by maintaining an explicit link between the reused content and all of the places it is used, which supports better tracking and management."

Transclusion is not without its limitations. It's rarely used in web pages, where processing transclusion links can become cumbersome or can fail when the page is displayed. For that reason, people writing content for the Web "…do the processing in the authoring environment and deliver the HTML content with the references already resolved." However, transclusion, which doesn't rely directly on metadata, is superior to conditional preprocessing when working with content that has a large number of variations.
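The authoring-time resolution described here can be sketched with DITA's content reference (`conref`) attribute, whose value takes the form `file.dita#topic-id/element-id`. The Python function below resolves such references at build time, using an in-memory dictionary in place of real files. The file names, IDs and the `resolve_conrefs` helper are hypothetical, and the sketch ignores much of what a real DITA processor such as the DITA Open Toolkit handles (nested conrefs, `conkeyref`, ranges, constraints).

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical in-memory "files"; a real build would read .dita files from disk.
SOURCES = {
    "warnings.dita": """<topic id="warnings"><body>
        <note id="power">Disconnect power before servicing.</note>
    </body></topic>""",
    "install.dita": """<topic id="install"><body>
        <p>Remove the cover.</p>
        <note conref="warnings.dita#warnings/power"/>
    </body></topic>""",
}


def resolve_conrefs(filename: str) -> ET.Element:
    """Return a topic tree with each conref replaced by a copy of its target."""
    root = ET.fromstring(SOURCES[filename])
    # Snapshot the elements first so replacing children does not disturb iteration.
    for parent in list(root.iter()):
        for i, child in enumerate(list(parent)):
            ref = child.get("conref")
            if ref is None:
                continue
            target_file, _, path = ref.partition("#")
            topic_id, _, element_id = path.partition("/")
            source = ET.fromstring(SOURCES[target_file])
            if source.get("id") != topic_id:
                raise KeyError(f"topic {topic_id!r} not found in {target_file}")
            target = source.find(f".//*[@id='{element_id}']")
            if target is None:
                raise KeyError(f"element {element_id!r} not found via {ref}")
            replacement = copy.deepcopy(target)
            replacement.tail = child.tail  # keep surrounding whitespace intact
            parent[i] = replacement
    return root


resolved = resolve_conrefs("install.dita")
print(resolved.find(".//note").text)  # Disconnect power before servicing.
```

The warning text exists once, in `warnings.dita`; every topic that references it picks up edits automatically at the next build, which is the tracking benefit Kimber describes.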

Content gets predictive with analytics

Within the world of customer engagement, predictive analytics have revolutionized enterprises' ability to match materials with the audiences most likely to appreciate them. With regard to content, this has traditionally meant creating channels based on customer profiles and then funneling content to the appropriate market. Now, with the increasing sophistication of analytic algorithms – combined with a component-based, hypertext approach to content creation that XML vocabularies such as DITA enable – content can be configured on demand and customized to match the consumer profile.

"Content can now be configured on demand, customized to match the consumer profile."

Descriptive versus predictive

The key to this evolution is the transition from descriptive to predictive and on to prescriptive marketing. In the traditional customer profile, hindsight is 20/20 in the eyes of the marketer: existing materials are evaluated based on how they did previously. This is a simple descriptive process, but it limits the ability to better match content to customer needs except by small evolutions or by accident.

With predictive analytics, marketers can build "a fluid and multi-dimensional map of prospect interests," according to Ilan Mintz, Marketing Coordinator at Penguin Strategies. Mintz describes how predictive content analytics aggregates data within the pieces a user reads, then builds and catalogs a topic composite akin to a word cloud. From there, the composite is tied to the profile of that user or user group.
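The "topic composite" Mintz describes can be approximated, very crudely, by counting the terms in the pieces a user has read and attaching the tally to that user's profile. The Python sketch below does this with a small hypothetical stop-word list and an invented reading history; it stands in for the far richer models that predictive analytics products use and is not Mintz's or Penguin Strategies' method.

```python
import re
from collections import Counter

# Hypothetical stop-word list; real systems use larger lists or NLP topic models.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "for", "in", "with", "how", "your"}


def topic_composite(pieces: list[str], top_n: int = 3) -> list[tuple[str, int]]:
    """Tally non-stop-word terms across everything a user has read."""
    counts = Counter()
    for text in pieces:
        words = re.findall(r"[a-z]+", text.lower())
        counts.update(w for w in words if w not in STOP_WORDS)
    return counts.most_common(top_n)


# Tie the composite to a (hypothetical) user profile.
profiles: dict[str, list[tuple[str, int]]] = {}
reading_history = [
    "How to migrate your DITA content to a new CMS",
    "DITA reuse strategies for technical documentation",
    "Choosing a CMS for DITA publishing",
]
profiles["user-42"] = topic_composite(reading_history)
print(profiles["user-42"])  # begins [('dita', 3), ('cms', 2), ...]
```

A composite like this, refreshed as the user reads, is the kind of profile against which new components could be selected on demand.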

Mintz claims that this approach to content marketing allows for a graphical view of content-related interest. This in turn facilitates new insights, such as:

  • Content personalization.
  • Competitor analysis.
  • Anticipation of trends.
  • Lead nurturing and tracking/predicting sales cycles.

In all, Mintz points to the increased ability to target audiences with personalized content as the return on investment in content data analysis. 

More content than ever
This has given new power to marketers and content authors, particularly in an environment that is already awash in materials. Digital content is in greater demand than ever, according to the Content Marketing Institute, with 70 percent of B2B content marketers in 2014 saying they created more content than in the previous year, and no end in sight for the trend. However, increased volume isn't a meaningful measure of success for marketers. The impact of the content, which is itself defined by the goals of those who run lines of business, must be measured and interpreted – and, according to Tjeerd Brenninkmeijer and Arjé Cahn, co-founders of Hippo, engagement metrics are a "notoriously fluffy" and increasingly unhelpful way to appraise successful content.

"Over the next two years, predictive content analytics will provide smart businesses a means of gaining better insight into customer's interactions with content," the Hippo co-founders told CMSWire. "And by equipping their marketers with better access to analytics and more decision-making power, businesses will reap the benefits."

Through a deepened understanding of the role that predictive analytics can play in modern content marketing, authors and marketers have more effective tools to affect customer engagement.