Parsing the difference between HTML, XHTML and HTML5

Astoria’s support for the DITA Open Toolkit allows users to transform their DITA-style XML content into various forms of HTML, the cornerstone technology for creating web pages, known formally as HyperText Markup Language. While HTML has gone through many permutations and evolutions, the three main variations for web markup – HTML, XHTML and HTML5 – are all currently in use by developers. The following is a quick summary of the history and distinctions that give each language its own character and capabilities.

There are three related languages for web markup – HTML, XHTML and HTML5.

HTML
The first internet markup language, HTML is the basis for every subsequent web design language. HTML’s enduring appeal is its simplicity: a small set of elements describes the structure and content of a web page, while layout, appearance and interactivity rely on the more advanced capabilities of CSS, JavaScript or Flash. However, this can often lead to frustration for designers, since these more dynamic elements are difficult to construct natively in HTML.
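As a minimal sketch (the file names here are placeholders, not part of any particular site), an HTML fragment confines itself to structure and content, handing appearance and behavior off to a stylesheet and a script:

    <html>
      <head>
        <title>Product overview</title>
        <link rel="stylesheet" href="styles.css">  <!-- appearance delegated to CSS -->
        <script src="behavior.js"></script>        <!-- interactivity delegated to JavaScript -->
      </head>
      <body>
        <h1>Product overview</h1>
        <p>A short paragraph describing the product.</p>
        <ul>
          <li>Feature one</li>
          <li>Feature two</li>
        </ul>
      </body>
    </html>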

XHTML
The Extensible Hypertext Markup Language, XHTML, began as a reformulation of HTML 4.01 using XML 1.0. XHTML was designed to allow for greater cross-browser compatibility and the construction of more dynamic, detailed sites. As XHTML evolved, it lost most of its compatibility with HTML 4.01. Today, XHTML is relegated to specialized projects where ordinary HTML output is not desired.
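The practical difference shows up in the markup itself. Sketched below, a comparable XHTML fragment must be well-formed XML: a declared namespace, lowercase element names, quoted attribute values and an explicit close for every element, including empty ones (the image file name is a placeholder):

    <?xml version="1.0" encoding="UTF-8"?>
    <html xmlns="http://www.w3.org/1999/xhtml">
      <head>
        <title>Product overview</title>
      </head>
      <body>
        <p>A short paragraph.<br /></p>
        <img src="diagram.png" alt="Product diagram" />
      </body>
    </html>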

HTML5
As the latest permutation of HTML, HTML5 is a combination of three families of code: HTML, CSS, and JavaScript. It is significantly more versatile than previous HTML iterations, and it enjoys much more support than XHTML. Its cross-platform capabilities and native integration of what were once third-party plugin features (e.g., drawing, video playback and drag-and-drop effects) make it a favorite of web designers.
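As a small illustration (the media file name and element IDs are placeholders), HTML5 handles natively what once required a plugin such as Flash:

    <video src="demo.mp4" controls>
      Your browser does not support the video element.
    </video>
    <canvas id="drawing-area" width="400" height="200"></canvas>  <!-- native drawing surface, scriptable via JavaScript -->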

IRS turning to XML to handle tax exemption forms

XML is not just an encoding format for technical documentation. In a move that touches on XML's original purpose, United States Internal Revenue Service Form 990 – detailing tax-exempt organizations' financial information – is shedding its paper-based roots to go digital as its native format. The IRS announced that Form 990 will now be available in the machine-readable XML format.

"This will have an impact on the speed and efficiency of requests."

"The publicly available information on the Form 990 series is vital to those interested in the tax-exempt community," IRS Commissioner John Koskinen wrote in a statement regarding the transition, as quoted by AccountingWEB. "The IRS appreciates the feedback we've received from a variety of outside partners as we've worked together to explore improvements to make this data more easily accessible."

With more than 60 percent of Forms 990 filed electronically, according to FCW, the move to making data – with relevant redactions – available in a native machine-readable format is intuitive. Covered forms include electronically filed Form 990, Form 990-EZ and Form 990-PF from 2011 to the present.

"The IRS' move is a very good thing," Hudson Hollister, executive director of the pro-transparency Data Coalition, told FCW. "There is no reason why public information that the government already collects in a machine-readable format can't be published in that same format!"

Some industry experts, like The Sunlight Foundation's Alex Howard, emphasize that requesting and obtaining information still remains difficult for the public and that improving accessibility should be an ongoing focus, given that public requests for non-profit or tax-exempt organizations' filing information are commonplace. Nevertheless, the IRS's announcement will presumably have a tangible impact on the speed and efficiency of compliance with requests. It will also offer benefits to backend integration of IRS Form 990 data into XML-based content management systems, such as Astoria.

Understanding and utilizing adaptive content modeling

The Astoria Portal gives end-users the ability to interact with content managed by the Astoria Content Management System. The Astoria Portal is a web site customized to match the client's expectations for user experience. Not long ago, discussions about site design and user experience would involve arcane terminology and technical jargon. Today, the world of web architecture has become increasingly democratic, so that discussions about the Astoria Portal use terms and concepts that are increasingly common knowledge. With the increased emphasis on user-friendly web interfaces, the average consumer may have a solid grasp on the concept of "responsive design." Yet adaptive web design – and its sibling, adaptive content – remains a relatively unknown aspect of new technology.

In truth, adaptive design is one of the most important drivers of innovation within the world of content management. Rather than simply offering the ability to flip between mobile and desktop optimization, adaptive design allows for content to be reconfigured at will, taking the burden off designers as the focus shifts to more impactful content.

"Adaptive content is completely flexible on the back end."

What's the difference between responsive and adaptive?

While there is overlap between the concepts of responsive and adaptive design, "responsive" suggests design that fluctuates between a fixed number of outcomes and focuses on fluid grids and scaling. With adaptive design, the possibilities are virtually limitless. This is enabled by a fundamentally modular approach to content and data, allowing for a completely device-agnostic content model.

"[Web site] CMS tools have largely been built on a page model, not on data types," Aaron Gustafson, coiner of the phrase "adaptive web design," told CMSWire. "We need to be thinking more modularly about content. We need to design properties of content types rather than how it's designed."

Embracing omnichannel 
What does this mean? Adaptive content is completely flexible on the back end, with a solid model able to publish across an infinite number of channels – a highly desirable ability. This is because most enterprises are fundamentally operating in an omnichannel world already, with the final hurdle being effective personalization. And what's the end-game of adaptive? SES Magazine recently found that eCommerce sites featuring personalized content were able to increase conversion by up to 70 percent.

With responsive design, some content may be reconfigured in response to the device it is viewed on, but in general the content is static. Adaptive design allows for new levels of personalization, with machine learning giving an enterprise the ability to analyze a user's habits and taste and present customized, on-demand content matching their preferences. The key isn't just content that looks different – it's content that is different, depending on the device and the viewer. This is a valuable capability since device usage itself implies different behavioral patterns. 

"The key isn't just content that LOOKS different – it's content that IS different."

A 'multi-year journey'
The issue for many enterprises in embracing adaptive content, of course, is implementation. Not all portal systems driven by an XML content management system can easily convert from a static content management system to a dynamic one.

"For many organizations, especially those in business-to-business or those with large, complex or regulated content sets, implementation will be a multi-year journey, with many iterations and evolutions along the way," wrote Noz Urbina in Content Marketing Institute. "Organizations struggle to transform themselves to keep pace with communications options and customer demand. Delivering major changes in two years might mean having gotten started two years ago."

The major thrust of this conversion is the ability to create data hierarchies that can exist independently of the eventual design functions. Rather than creating content with the end in mind, this requires strong hypertext conversion and structuring, as well as the integration of analytics and content-building apps. Yet for organizations that commit to this conversion, the possibilities for how content is presented – and how effective it is – could be limitless.

Essential vocabulary: Transclusion

Transclusion is one of the foundational concepts of DITA. Coined by hypertext pioneer Ted Nelson, the term "transclusion" refers to the inclusion of part or all of an electronic document into one or more other documents by hypertext reference.

"Transclusion allows content to be reused far more efficiently."

The concept of transclusion took form in Mr. Nelson's 1965 description of hypertext. However, widespread understanding of transclusion was limited by the slow adoption of markup languages, including Standard Generalized Markup Language (whose origins date to the 1960s), Hypertext Markup Language (released in 1993), and eXtensible Markup Language (whose first draft appeared in 1996). In fact, it wasn't until DITA, an XML vocabulary donated to the open-source community in 2004 by IBM, that the power of transclusion enjoyed broader reception.

Transclusion differs from traditional referencing. According to The Content Wrangler's Eliot Kimber, traditional content had "…to be reused through the problematic copy-and-paste method." With transclusion, a hyperlink inserts content by reference at the point where the hyperlink is placed. Robert Glushko adds, "Transclusion is usually performed when the referencing document is displayed, and is normally automatic and transparent to the end user." In other words, the result of transclusion appears to be a single integrated document, although its parts were assembled on-the-fly from various separate sources.

"In the information management sense, transclusion makes content easy to track, removes redundant information, eliminates errors, and so on," writes Kimber. "Use-by-reference serves the creators and managers of content by allowing a single instance to be used in multiple places and by maintaining an explicit link between the reused content and all of the places it is used, which supports better tracking and management."

Transclusion is not without its limitations. It's rarely used in web pages, where the processing of transclusion links can become cumbersome or can fail when the page is displayed. For that reason, people writing content for the Web "…do the processing in the authoring environment and deliver the HTML content with the references already resolved." However, transclusion, which doesn't rely directly on metadata, is superior to conditional preprocessing when working with content that has a large number of variations.

Content gets predictive with analytics

Within the world of customer engagement, predictive analytics have revolutionized enterprises' ability to match materials with the audiences most suited to appreciate them. In regard to content, this has traditionally meant creating channels based on customer profiles and then funneling content to the appropriate market. Now, with the increasing sophistication of analytic algorithms – combined with a component-based, hypertext approach to content creation that XML vocabularies such as DITA enable – content can be configured on demand, customized to match the consumer profile.

"Content can now be configured on demand, customized to match the consumer profile."

Descriptive versus predictive

The key to this evolution is the transition from descriptive to predictive and on to prescriptive marketing. In the traditional customer profile, hindsight is 20/20 in the eyes of the marketer: existing materials are evaluated based on how they performed previously. This is a simple descriptive process, but it limits the ability to better match content to customer needs except by small evolutions or by accident.

With predictive analytics, marketers can build "a fluid and multi-dimensional map of prospect interests," according to Ilan Mintz, Marketing Coordinator at Penguin Strategies. Mintz describes how predictive content analytics aggregates data within the pieces a user reads, then builds and catalogs a topic composite akin to a word cloud. From there, the composite is tied to the profile of that user or user group.

Mintz claims that this approach to content marketing allows for a graphical view of content-related interest. This in turn facilitates new insights, such as:

  • Content personalization.
  • Competitor analysis.
  • Anticipation of trends.
  • Lead nurturing and tracking/predicting sales cycles.

In all, Mintz points to the increased ability to target audiences with personalized content as the return on investment in content data analysis. 

More content than ever
This has given new power to marketers and content authors, particularly in an environment that is already awash in materials. Digital content is at a higher premium than ever, according to the Content Marketing Institute, with 70 percent of B2B content marketers in 2014 saying they created more content than the previous year, with no end in sight for the trend. However, increased volume isn't a meaningful measure of success for marketers. The impact of the content – which is ultimately defined by the goals of those who run lines of business – must be measured and interpreted. And, according to Tjeerd Brenninkmeijer and Arjé Cahn, co-founders of Hippo, engagement metrics are a "notoriously fluffy" and increasingly unhelpful way to appraise successful content.

"Over the next two years, predictive content analytics will provide smart businesses a means of gaining better insight into customer's interactions with content," the Hippo co-founders told CMSWire. "And by equipping their marketers with better access to analytics and more decision-making power, businesses will reap the benefits."

Through a deepened understanding of the role that predictive analytics can play in modern content marketing, authors and marketers have more effective tools to affect customer engagement.

Is XML too ‘verbose’?

As one of the most important and comprehensive languages for encoding text, XML has enjoyed great popularity as the basis for the creation of structured writing techniques and technologies. Yet even with continuous refinement and the broad adoption of data models like DITA for authoring and publishing, XML poses major challenges, especially when compared to so-called plain-text-formatting languages such as Markdown.  Many, like The Content Wrangler’s Mark Baker, are criticizing XML for its perceived limitations.

“XML’s complexity makes it hard to author native content.”

A ‘verbose’ language
In a post bluntly titled “Why XML Sucks,” Baker says that, while XML performs a vital function as the basis for structured writing systems, its tagging – which he says makes XML “verbose” – inhibits author productivity.

“If you write in raw XML you are constantly having to type opening and closing tags, and even if your [XML] editor [application] helps you, you still have to think about tags all the time, even when just typing ordinary text structures like paragraphs and lists,” said Baker.

“And when you read, all of those tags get in the way of easily scanning or reading the text. Of course, the people you are writing for are not expected to read the raw XML, but as a writer, you have to read what you wrote.”

The absence of absence
Baker hangs a lantern on the issue of whitespace. He cites the original purpose of XML (“XML was designed as a data transport layer for the Web. It was supposed to replace HTML and perform the function now performed by JSON. It was for machines to talk to machines…”) as the reason why whitespace has no meaning in XML.

And what’s the big deal about whitespace? Says Baker, “…in actual writing, whitespace is the basic building block of structure. Hitting return to create a new paragraph is an ingrained behavior in all writers….”

He goes on.  “This failure [of XML] to use whitespace to mean what whitespace means in ordinary documents is a major contributor to the verbosity of XML markup.  It is why we need so many elements for ordinary text structures and why we need end tags for everything.”
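Baker’s point is easy to see in a short example. In ordinary text, a blank line ends a paragraph and a new line starts the next list item; in XML none of that whitespace carries meaning, so every structure needs explicit start and end tags (the element names below are the familiar HTML/DITA ones, and the wording is illustrative):

    <p>Before you begin, check the following.</p>
    <ul>
      <li>The power cable is connected.</li>
      <li>The device has been restarted.</li>
    </ul>

Written as plain text, the same content would need nothing more than a line break before the paragraph and a dash at the start of each list item.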

“XML performs a vital function…”

No ambiguity
While all this talk of verbosity and whitespace may seem fairly damning to the future of XML, the truth is that XML serves a fundamental purpose – a “vital function,” as Baker himself puts it – that keeps it in wide use and contributes to its longevity. As Charles Gordon of NetSilicon said in 2001, XML is “…a tool that concisely and unambiguously defines the format of data records.”

The “unambiguous” aspect is particularly important. While Baker may lament the loss of readability when viewing XML-encoded content in its raw form, the fact that XML requires authors to make conscious decisions about the structure of what they’re writing – even the placement of whitespace – makes every line purposeful. XML is ideal for communicating with unambiguous intent, which is the precise purpose of structured writing systems and rule-based content architecture. Raw XML is indeed verbose, but its general simplicity has made it a building block of so many improvements in technical communication that its use endures and even flourishes to this day.

3 possible pitfalls in a content management system

Building a streamlined curation process allows owners, authors, editors and even users to fully engage with content in a manner best suited to each person’s needs. However, even the best-laid plans will go awry if oversight responsibilities are murky or absent. The problem simply gets worse as the volume of data-rich hypertext content increases. Curators must maintain thorough reviewing processes to verify that the underlying data is of value. Here are just a few of the flaws that inhibit effective content curation.

“Underpinning each pitfall is a missing aspect of oversight.”

Unclear ownership
As a foundation for establishing acceptability, authority, and proper editing privileges, a robust curation strategy requires a system that maintains ownership of content on a granular level. Otherwise, the system gives rise to “orphaned” content; i.e., content exposed to an editorial gap (because it has no owner) that can result in inaccuracies.
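One way to close that gap in a DITA-based repository is to carry ownership inside each component’s own metadata. The <author> element and <othermeta> name/value pair below are standard DITA prolog markup; the “content-owner” convention, the IDs and the names are illustrative, not built-in semantics:

    <topic id="install-overview">
      <title>Installation overview</title>
      <prolog>
        <author>J. Rivera</author>
        <metadata>
          <othermeta name="content-owner" content="hardware-docs-team"/>
        </metadata>
      </prolog>
      <body>
        <p>This topic summarizes the installation procedure.</p>
      </body>
    </topic>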

Lack of coherent review stage
While it may seem obvious that a review stage is needed for effective content curation, a significant issue is where review should occur. Should individual data components be subject to review and approval? How much should generated content be subject to peer review if those reviewing it have the same editing privileges as the author? The placement of review stages in your curation processes is the essence of content “management” at every phase of the content lifecycle.

Doesn’t consider design and formatting
Content curation programs put extensive thought into information architecture but pay comparatively little attention to the end-user’s experience with the content. This can lead to the selection of a component content management system (CCMS) that does everything expected of it while producing content that is fundamentally not user-friendly. Unless the CCMS integrates end-user presentation into its operating capability, even the most complex CCMS can miss the mark.

Putting together a content lifecycle strategy

For marketers, creating compelling content that connects with the intended audience is the main push of their daily work. But once this content is created, what happens next? How will it be disseminated, redeveloped and warehoused for future use?

“Marketers who have developed a strong content lifecycle have a leg up.”

Content: No longer disposable
Marketers who have developed a strong content lifecycle have a leg up when it comes to managing their materials and potentially reusing them for later campaigns. Columnist Robert Norris recommends the development of lifecycles to help craft content that resonates with different groups of customers and can remain effective across a variety of channels. To do this, he advocates moving away from treating content as a disposable material and toward viewing content as a living, evolving entity worthy of attention and careful consideration.

“Critically, we realize that these audiences have very specific needs for which we have the expertise—if not yet the processes —to craft and maintain targeted knowledge base resources,” Norris writes in The Content Wrangler. “Moreover, we recognize that the task of creating and publishing these resources must receive the same diligent attention to detail that we apply to our goods and services because poor publishing reflects upon our credibility just as harmfully as does a poor product or service.”

The content lifecycle
To ensure that content reaches its full potential, Norris proposes a lifecycle based on constant evaluation and redevelopment. The steps he puts forth include:

  • Production – Content is developed, based on existing data components.
  • Approval – Content is reviewed and vetted by editors and administrators before being slated for release.
  • Publish – Content is configured and fully optimized for a publishing platform, as well as made discoverable by adding metadata and setting prominence.
  • Curate – Ancillary resources are integrated into the content.
  • Improve – Feedback, telemetry and analytics are used to identify and address successful aspects as well as deficiencies in the content. Once identified, the content is tweaked to address these pain points.
  • Re-certify – An often missed step, data used in content must be reverified periodically to ensure it is still relevant and accurate based on more recent findings.
  • Update – Aside from recertification, consideration of timeliness and cultural relevancy can warrant changes from minor updates to major revisions.
  • Retire – Once a piece of content has reached the end of its relevancy, archiving it is warranted. Make sure the content and its metadata are tagged for ease in locating it later.

With a hypertext-based content paradigm like DITA, this lifecycle is made even simpler because content can be evaluated and repurposed at the XML component level. Analytics can show the efficacy of a single data element, and automation driven by content tagging can streamline the delivery of campaign variations to audience segments to gauge impact. From there, each stage of the lifecycle is a chance to refresh metadata and swap components into more compelling content.
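Several of these stages can also be anchored in the topic’s own metadata. In DITA, for example, the standard <critdates> element records creation, revision and expiry dates that a re-certification or retirement review can key on; the dates and the “status” entry below are purely illustrative:

    <prolog>
      <critdates>
        <created date="2015-03-02"/>
        <revised modified="2016-01-15" expiry="2018-01-15"/>
      </critdates>
      <metadata>
        <othermeta name="status" content="published"/>
      </metadata>
    </prolog>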

Defining and implementing ‘Transcreation’

Creating effective globalized content is much more than simply translating text. The context in which text exists forces, in many ways, the creation of new content that has meaning only within that context; the greater the number of contexts, the greater the number of translation-induced content changes.

Translation-induced content creation, or "transcreation," forms a crucial part of your localized content strategy. Let's examine transcreation in greater detail and see how it factors into your fully realized content strategy.

"What exactly is transcreation?"

Defining the cultural lines 
Transcreation improves on word-for-word translation through a top-down focus that gives greater value to content meaning and navigation, and which harnesses cultural norms to convey ideas in a way that avoids the pitfalls of word-for-word translation. As such, the mechanism at its core is conceptual rather than literal.

Why this emphasis on concepts versus specific materials or language? Because some linguistic structures for conveying ideas do not function across regional lines. Cultural interpretations can vary for even the most essential building blocks of language, as pointed out in a study published by the American Psychological Association. Examining the way that different regional and cultural groups interpret facial expressions, lead researcher Rachael E. Jack commented that "East Asians and Western Caucasians differ in terms of the features they think constitute an angry face or a happy face."

"Our findings highlight the importance of understanding cultural differences in communication, which is particularly relevant in our increasingly connected world," Jack told the APA. "We hope that our work will facilitate clearer channels of communication between diverse cultures and help promote the understanding of cultural differences within society."

Breaking it down to build back up
This underscores the importance of transcreation. In our quest to convey content's "true intent" and not be stymied by cultural differences, we must break content down into its component parts and reassemble it locally so as to create the most compelling and clear messaging. For example, look at Coca-Cola's most recent company slogan, "Taste the Feeling". Because Coca-Cola is a global brand, that slogan will be translated into any number of languages.

"Transcreation takes content as written and breaks it into component parts."

Now consider the problem of establishing that slogan in a locale that does not emphasize "feelings" or that considers it shameful to express an excess of emotion. In this context, a direct translation, which would render the equivalent of the words "Taste" and "Feeling", would not reach its audience with the conceptual meaning that Coke triggers a visceral, joyful response.

Transcreation takes a different approach.  Creators and content managers first break the content into its component parts. Next, they tag the content with signifiers, allowing the data to be parsed into components that are pertinent to the target culture. The tagged content is fed to localized content management teams in the form of a creative brief. These teams, complete with their own culturally tagged data, reconfigure the basic content building blocks into new – yet derivative – content. 
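In an XML-based workflow, those signifiers can be ordinary profiling metadata. DITA, for instance, provides the xml:lang attribute and profiling attributes such as audience and otherprops that regional teams can filter on; the attribute values below are illustrative, and the second line stands in for whatever the local team crafts:

    <p xml:lang="en-US" audience="global-campaign" otherprops="emotive-tone">Taste the Feeling.</p>
    <p xml:lang="ja-JP" audience="global-campaign" otherprops="reserved-tone">[locale-specific rendering supplied by the regional team]</p>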

Can automation break down cultural barriers?
While transcreation is already being used to great effect in companies worldwide, it is largely a manual process.  Nevertheless, automated transcreation is on the horizon.  Smart websites already use localization parameters to reconfigure formatting elements, swap images and insert culturally specific elements. Research teams are tackling the problem of cultural computing with sophisticated algorithms that may one day emulate human thought patterns, allowing automated transcreation to be a seamless and instantaneous process.

How DAM impacts content management

Content management is all about the on-demand assembly and reconfiguration of information modules into new products – either autonomously or under human supervision. In a world that accepts the notion of "fair use" of copyrighted material, it should be relatively easy to repurpose information modules, the only limitations being those of imagination (machine-driven or otherwise) or technical capability. It is ironic, then, that the rise of regulations and commerce tied to authorship should have a complicating impact on CMS development.

"DAM handles one granular aspect of content – authorship."

This is where Digital Asset Management (DAM) influences the world of content management tools. DAM handles one granular aspect of content – authorship – and concerns itself primarily with enforcing copyright protection. A DAM system (DAMS) functions by tracking the use of copyrighted material and flagging improper, unauthorized or unattributed use. A digital media asset is entered into the DAMS in the form of a high-resolution "essence" along with detailed metadata about the asset.
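As a hedged sketch (the element names and values are illustrative, not any particular DAM vendor's schema), such a record typically pairs a pointer to the high-resolution essence with descriptive and rights metadata:

    <asset id="img-20481">
      <essence href="masters/img-20481.tif"/>
      <title>Factory floor, main assembly line</title>
      <creator>A. Chen</creator>
      <rights>
        <copyright-holder>Example Media Co.</copyright-holder>
        <license>editorial-use-only</license>
      </rights>
    </asset>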

From there the DAMS can be used to pull logged materials as needed and identify uses of the asset, flagging a violation of copyright – or, as a secondary function, ensuring that the copyright owner is compensated for the authorized use of the asset. This can be a crucial revenue stream for authors and copyright owners, though it may also become complicated once an asset has been combined as a module into other pieces of information.

When DAM and content management are combined, the CMS has a broader scope than the DAMS and largely does the heavy lifting of content assembly. A content creator working in a CMS can pull digital material from a DAMS. A content curator may choose to push finished content from a CMS to a DAMS.