The Rationale for Using Metadata in Learning Content

What is metadata? (Types of metadata)

Metadata is a standardized set of variables and their values that gives us information about information. It is usually part of an asset (video, audio, or HTML file) and/or a set of assets (files) such as a course. There are many standards for metadata, such as Dublin Core. There are a few standards for learning metadata. The Learning Resource Metadata Initiative (LRMI) is an extension of the Dublin Core and the base definitions.

Why do we need metadata?

We need to be able to know what our digital assets contain. Without metadata it is much more difficult to manage and reuse digital assets. This leads to duplicate effort as well as wasted time and money. It is important, however, that a component content management system (CCMS) of some kind is employed so that reusable assets can be found in a library during course creation. Without a CCMS, content reuse is much harder, if not impossible.

Additionally, metadata makes content more discoverable.  In the case of site search, instead of relying solely on the title of a course or course description, the metadata of files in a course could be searched for keywords in order to find courses that contain specific skills.  The return of the search would be much richer and allow the learner to choose to consume only part of a course.
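As a minimal sketch of the search idea above, the following Python matches a search term against the keywords of each file in a course rather than the course title. The `Asset` and `Course` classes, the `keywords` field, and the sample catalog are all hypothetical; a real LMS would have its own data model.

```python
from dataclasses import dataclass, field


@dataclass
class Asset:
    """One file in a course, with keyword metadata (hypothetical shape)."""
    name: str
    keywords: set[str] = field(default_factory=set)


@dataclass
class Course:
    """A course is a titled collection of assets."""
    title: str
    assets: list[Asset] = field(default_factory=list)


def search_courses(courses: list[Course], term: str) -> dict[str, list[str]]:
    """Return {course title: [matching asset names]} for a keyword.

    Matching is case-insensitive and looks only at asset keywords,
    not at the course title or description.
    """
    wanted = term.lower()
    hits: dict[str, list[str]] = {}
    for course in courses:
        matches = [
            asset.name
            for asset in course.assets
            if wanted in {k.lower() for k in asset.keywords}
        ]
        if matches:
            hits[course.title] = matches
    return hits


# Made-up catalog for illustration only
catalog = [
    Course("Cloud Basics", [
        Asset("intro.html", {"overview"}),
        Asset("storage.mp4", {"Blob storage", "backup"}),
    ]),
    Course("Data Protection", [
        Asset("restore.html", {"backup", "restore"}),
    ]),
]

results = search_courses(catalog, "backup")
```

Because the result names the matching files, the learner can go straight to the part of the course that covers the skill.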

Content goes stale, especially with the constant releases of the Agile/Scrum methodology used in software development for SaaS applications. Metadata and a component content management system could aid in keeping track of and announcing what content has gone stale so that it can be reviewed and replaced.
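One way this could work, sketched in Python: compare each asset's dateModified value against a product release date and list anything older. The record shape, the sample file names, and the rule "modified before the release means stale" are assumptions for illustration, not a prescribed policy.

```python
from datetime import date


def find_stale(assets: list[dict], release_date: date) -> list[str]:
    """Return names of assets last modified before the given product release."""
    return [
        asset["name"]
        for asset in assets
        if date.fromisoformat(asset["dateModified"]) < release_date
    ]


# Made-up asset records; dateModified is an ISO date string
assets = [
    {"name": "install.html", "dateModified": "2015-01-10"},
    {"name": "dashboard.mp4", "dateModified": "2015-06-02"},
]

stale = find_stale(assets, date(2015, 3, 1))
```

A CCMS could run a check like this on every release and send the resulting list to the content owners for review.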

And finally, metadata will allow a content development team to map skills and proficiency levels to competencies, and competencies to courses. This will be necessary for Competency Based learning. A curated learning path will start with a Competency Dictionary that includes a list of skills and proficiencies. Proficiencies map to the difficulty level of courseware, such as Beginner, Intermediate and Advanced. (Numbering courses is specific to US academic settings and is not a global standard.)
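The mapping described above can be sketched as a small lookup in Python. The competency dictionary, the course records, and the function name are hypothetical examples; the field names skill and proficiency follow the extra metadata items described later in this document.

```python
# Hypothetical competency dictionary: competency -> skill -> proficiency levels
competency_dictionary = {
    "Data Management": {
        "Back up a database": ["beginner"],
        "Design a restore strategy": ["intermediate", "advanced"],
    },
}

# Hypothetical course metadata
courses = [
    {"name": "Backup 101", "skill": ["Back up a database"], "proficiency": "beginner"},
    {"name": "Restore Planning", "skill": ["Design a restore strategy"], "proficiency": "advanced"},
    {"name": "Unrelated", "skill": ["Write a report"], "proficiency": "beginner"},
]


def courses_for_competency(competency: str, dictionary: dict, courses: list[dict]) -> list[str]:
    """Return names of courses whose skill and proficiency appear under the competency."""
    skills = dictionary.get(competency, {})
    matched = []
    for course in courses:
        for skill in course["skill"]:
            if course["proficiency"] in skills.get(skill, []):
                matched.append(course["name"])
                break  # one matching skill is enough to list the course
    return matched


learning_path = courses_for_competency("Data Management", competency_dictionary, courses)
```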

Modular Content

One of the effects of employing metadata is to inform the way that content is created.  Learners sometimes need only some of the information in a course related to a task they need to perform or to fill gaps in their overall understanding of systems.  If content is created in a modular way, creating one learning piece that maps to a particular skill or concept, the learners can consume the content as Just-In-Time (JIT) training.

When is metadata needed?

Metadata is needed throughout the content life cycle.

At content creation

When content is created, its metadata is created with it. This allows the CCMS to read the metadata and inform the content developer of what is available for reuse.

At content aggregation

If the content is created in a modular way (one learning objective mapped to one skill) then the metadata should contain the learning objective and the skill. When content is being aggregated into a course, metadata is read by the end user of the CCMS in order to decide whether or not the content is useful to the course.

At content search

When a learner searches an LMS for content, they will be able to decide if a course meets their need for obtaining new skills.  Google, Bing and other search engines read inline metadata. Additionally, use of metadata facilitates the creation of an interface for learners to decide how they want an LMS course catalog to be displayed.  Take this one step further, and you have the foundation of a course recommendation engine (Cortana, Bing Predicts, Machine Learning anyone?) that matches a learner’s already gained knowledge with the delta they are attempting to fulfill.
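The "delta" idea can be sketched with plain set arithmetic in Python. This is only an illustration under stated assumptions: the skill names, the course records, and the ranking rule (most missing skills covered first) are invented here, and a real recommendation engine would be far more involved.

```python
def recommend(learner_skills: set[str], target_skills: set[str], courses: list[dict]) -> list[str]:
    """Rank courses by how many of the learner's missing skills they cover."""
    gap = target_skills - learner_skills  # skills still to be learned
    scored = []
    for course in courses:
        covered = gap & set(course["skill"])
        if covered:
            scored.append((len(covered), course["name"]))
    # Most gap skills first; ties broken by name so the order is stable
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored]


# Made-up course metadata
courses = [
    {"name": "Backup 101", "skill": ["Back up a database"]},
    {"name": "Operations Deep Dive", "skill": ["Back up a database", "Monitor jobs"]},
    {"name": "Intro to Reports", "skill": ["Write a report"]},
]

learner = {"Write a report"}
target = {"Write a report", "Back up a database", "Monitor jobs"}

suggestions = recommend(learner, target, courses)
```

The course on a skill the learner already has is left out, which is the point of matching against the gap rather than the full target.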

At content delivery and curation

As already stated, assets with metadata could be presented alone, outside of a course, if the learner wants JIT training rather than an entire course. Additionally, an asset (such as a video) can be presented when needed as part of a help system contained in an application, such as any SaaS application that keeps track of how long the user spends attempting an activity. Content creators have used RoboHelp in the past to achieve this goal.

At content evaluation

Ideally, a content development team is using ADDIE or SAM to review the freshness of their content.  Metadata can be used to keep track of versioning, product release dates, or cycles.  Again, if content is encapsulated and modular, it can be archived and replaced more easily.

Where is metadata?

Metadata is found at different levels of courseware.  It is at the asset level, unit or module level, chapter or section level, and course level.

By the way, some people call a section a module, some people call a lesson a module and some people call a single interactive activity a module.  It usually depends on whether or not they wrote textbooks or classroom material in a previous life. I am used to calling a single interactive activity a module which can also mean a lesson. It’s going to be important to have terms mean the same thing to all the people.

An asset in this case is an HTML file, an image, or a SWF; in other words, the most granular piece of content. Asset metadata can be found in the header of scripted documents such as HTML, or it is found by “reading” a compiled or rendered digital asset such as a video, audio or SWF file. It is not recommended that metadata for compiled or rendered assets be kept in a place decoupled from the asset itself, such as a database, because if an asset is moved from one repository to another, the metadata will be lost.

HTML and other plain text documents

Metadata in plain text documents is usually found in the document header and can be expressed with <meta> tags. These are easily readable by content management systems and have traditionally been written using Dublin Core. Metadata can also be expressed as microdata in child objects of the DOM.
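To show how a content management system might read these tags, here is a minimal Python sketch that uses the standard library's html.parser module. The sample page and the MetaTagReader class are made up for illustration; the "DC." prefix on the name attribute follows the common convention for embedding Dublin Core in HTML.

```python
from html.parser import HTMLParser


class MetaTagReader(HTMLParser):
    """Collect name/content pairs from <meta> tags (illustrative helper)."""

    def __init__(self):
        super().__init__()
        self.metadata: dict[str, list[str]] = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        if name and content is not None:
            # A name can repeat (for example several subjects), so keep a list
            self.metadata.setdefault(name, []).append(content)


SAMPLE_PAGE = """<!DOCTYPE html>
<html>
<head>
  <title>Backing Up a Database</title>
  <meta name="DC.title" content="Backing Up a Database">
  <meta name="DC.language" content="en">
  <meta name="DC.subject" content="backup">
  <meta name="DC.subject" content="restore">
</head>
<body><p>Lesson text.</p></body>
</html>"""

reader = MetaTagReader()
reader.feed(SAMPLE_PAGE)
print(reader.metadata)
```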

Compiled or rendered documents

Metadata is usually added to compiled or rendered assets on or after compilation or rendering, depending on the tool used to create them. For example, in the case of video creation, metadata can be added to a video file using tools such as Adobe Bridge after it has been rendered.

The content management system will then need to be able to read the metadata of the file.  There are open source, command line tools that can be used for reading metadata from these files such as ExifTool.  This tool can also add metadata to assets such as GIF, JPG, MP4, or PDF files.
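As a rough sketch of how a CMS could call ExifTool from Python: the tool's -json option prints a JSON list with one object per file, which is easy to parse. The function names and the sample output below are assumptions for illustration (the sample values are made up), and read_metadata simply returns an empty dictionary when ExifTool is not installed.

```python
import json
import shutil
import subprocess


def exiftool_read_command(path: str) -> list[str]:
    """Build the command that asks ExifTool for a file's metadata as JSON."""
    return ["exiftool", "-json", path]


def parse_exiftool_json(output: str) -> dict:
    """The -json output is a list with one object per file; return the first."""
    records = json.loads(output)
    return records[0] if records else {}


def read_metadata(path: str) -> dict:
    """Run ExifTool if it is on the PATH; otherwise return an empty dict."""
    if shutil.which("exiftool") is None:
        return {}
    result = subprocess.run(
        exiftool_read_command(path), capture_output=True, text=True, check=True
    )
    return parse_exiftool_json(result.stdout)


# Hand-written sample in the shape ExifTool emits (values are made up)
sample_output = '[{"SourceFile": "lesson.mp4", "Title": "Backing Up a Database"}]'
record = parse_exiftool_json(sample_output)
```

Because the metadata travels inside the file, the same call works after the asset has been moved to another repository.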

Sample Metadata Dictionary

The following is a table of the learning metadata that a learning organization would need in order to be able to keep track of the content. Every line in the table describes a required metadata item.

The dictionary is based on the CreativeWork schema as base definitions, together with the Learning Resource Metadata Initiative (LRMI) extensions. Additionally, the IEEE Learning Object Metadata (LOM) standard has been consulted. It is assumed that all Dublin Core metadata would be used where appropriate. A couple of the Dublin Core items have been called out specifically. By using already established schemas, a learning organization would be in alignment with what the rest of the learning industry does with metadata. In 2014, stewardship of the LRMI schema passed to the well-known Dublin Core organization. By the way, Google, Bing, Yahoo, and Yandex contributed to the LRMI schema.

Competency Based Metadata

Since corporate learning organizations typically adhere to a Competency Based model, it follows that metadata should include competency, skill and proficiency on each object in the courseware.  However, the LRMI standard does not account for competency based learning, although many people in the learning industry have asked for it.  These are extra metadata items in the metadata dictionary. They map to a competency dictionary created for a role.

In the table, Standard, Dublin Core and LRMI items have been bolded. Extra metadata is not bolded.

Extra Metadata

  • UID – The UID has been included in the list as structured content usually relies on this identifier. It is not necessarily required for asset level objects.  However, it does exist for content created by the Open edX Studio.
  • competency – the overall competency of the course, and the competency that a skill fulfills.
  • skill – most closely aligned with a learning objective, it is the measurable outcome of the learning object (Bloom’s taxonomy).
  • proficiency – the level of learning offered, suggested values are beginner, intermediate, advanced
  • modifiedBy – last editor of courseware
  • replaces – the object that this object replaces.  For historical purposes.
  • tags – additional random information.  Included here because it is part of the MVA metadata and could be valuable.
  • prerequisites – at the course level only.  Content developers will have to pay close attention to this one, unless the LMS has functionality for associating one course as a prerequisite to another.
  • systemRequirements – included because it is part of MVA metadata and could be valuable.

Metadata Dictionary

Property | Expected Type/Origin | Description
name | | The title of the resource.
about | (about property) | The subject of the content.
description | (description property) | A description of the item.
dateCreated | | The date on which the resource was created.
dateModified | | The date on which the CreativeWork was most recently modified or when the item’s entry was modified within a DataFeed.
datePublished | | Date of first broadcast/publication.
version | (version property) | The version of the CreativeWork embodied by a specified resource.
author | | The individual credited with the creation of the resource.
contributor | | A secondary contributor to the CreativeWork or Event.
publisher | | The organization credited with publishing the resource.
inLanguage | | The primary language of the resource.
audience | | An intended audience, i.e. a group for whom something was created.
product | (name property) | The name of the item. In this case, MS product covered.
license | Url or CreativeWork | A license document that applies to this content, typically indicated by URL.
rights | dublinCore/rights | Information about rights held in and over the resource.
instructionalMethod | dublinCore/instructionalMethod | The way instructional materials are presented (self-paced, instructor led, weekly, blended).
educationalUse | | The purpose of the work in the context of education. Ex: “assignment” Ex: “group work”
timeRequired | (ISO | Approximate or typical time it takes to work with or through this learning resource for the typical intended target audience. Ex: “PT30M” Ex: “PT1H25M”
typicalAgeRange | | The typical range of ages of the content’s intended end user. Ex: “7-9” Ex: “18-”
interactivityType | | The predominant mode of learning supported by the learning resource. Acceptable values are active, expositive, or mixed. Ex: “active” Ex: “mixed”
learningResourceType | | The predominant type or kind characterizing the learning resource. Ex: “presentation” Ex: “handout”
isBasedOnUrl | | A resource that was used in the creation of this resource. This term can be repeated for multiple sources.
educationalAlignment | | An alignment to an established educational framework. For example:
alignmentType | Text | A category of alignment between the learning resource and the framework node. Recommended values include: ‘assesses’, ‘teaches’, ‘requires’, ‘textComplexity’, ‘readingLevel’, ‘educationalSubject’, and ‘educationLevel’.
educationalFramework | Text | The framework to which the resource being described is aligned. For example:
targetDescription | Text | The description of a node in an established educational framework.
targetName | Text | The name of a node in an established educational framework.
targetUrl | URL | The URL of a node in an established educational framework.
educationalRole | | The role that describes the target audience of the content.
accessibilityAPI | Text | Indicates that the resource is compatible with the referenced accessibility API. (WebSchemas wiki lists possible values).
accessibilityControl | Text | Identifies input methods that are sufficient to fully control the described resource. (WebSchemas wiki lists possible values).
accessibilityFeature | Text | Content features of the resource, such as accessible media, alternatives and supported enhancements for accessibility. (WebSchemas wiki lists possible values).
accessibilityHazard | Text | A characteristic of the described resource that is physiologically dangerous to some users. Related to WCAG 2.0 guideline 2.3. (WebSchemas wiki lists possible values).
uid | generated unique identifier | This comes from the method used by structured documentation, but it is also how Open edX references components of the course as seen in the XML files.
competency | String | Parent competencies of the item.
skill | Array | Skill covered in item, learning objective.
proficiency | String: beginner, intermediate, advanced | Level of skill.
instructor | String | Person owning and responsible for the item in the LMS.
modifiedBy | UTC or other global | Person making the last modification.
replaces | String | If the item replaces another item. Useful for change management.
tags | Array | Tag cloud, may not be necessary.
prerequisites | Array | Course level; requires curation of all courses mapped in curriculum matrix; course UID of prerequisite.
systemRequirements | Array | Not needed as metadata for most items.
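
A few of the items in the dictionary have a fixed set of values or a fixed format, which makes them easy to check automatically. The Python sketch below is one possible check, not part of any standard: it validates proficiency and interactivityType against the values listed above, and reads timeRequired values such as “PT30M” as durations. The function names are hypothetical, and the pattern only handles the hours, minutes and seconds form shown in the examples.

```python
import re

PROFICIENCY_VALUES = {"beginner", "intermediate", "advanced"}
INTERACTIVITY_VALUES = {"active", "expositive", "mixed"}

# Assumption: only time-based durations such as PT30M or PT1H25M are supported
DURATION_PATTERN = re.compile(r"^PT(?=\d)(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?$")


def duration_minutes(value: str) -> float:
    """Convert a duration such as 'PT1H25M' to minutes."""
    match = DURATION_PATTERN.match(value)
    if not match:
        raise ValueError(f"not a supported duration: {value!r}")
    hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return hours * 60 + minutes + seconds / 60


def validate(record: dict) -> list[str]:
    """Return the names of the checked properties that have invalid values."""
    problems = []
    if record.get("proficiency") not in PROFICIENCY_VALUES:
        problems.append("proficiency")
    if record.get("interactivityType") not in INTERACTIVITY_VALUES:
        problems.append("interactivityType")
    try:
        duration_minutes(record.get("timeRequired", ""))
    except ValueError:
        problems.append("timeRequired")
    return problems


# Made-up records for illustration
good_record = {"proficiency": "beginner", "interactivityType": "active", "timeRequired": "PT30M"}
bad_record = {"proficiency": "expert", "interactivityType": "mixed", "timeRequired": "30 minutes"}

good_problems = validate(good_record)
bad_problems = validate(bad_record)
```

A check like this could run when content is saved to the CCMS, so that bad values are caught at creation rather than at search time.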