Document history
This version: November 29, 2004
Previous version: February 6, 2004
Most recent version: November 29, 2004
Overview
This document describes how to prepare metadata for inclusion in the Gateway to Educational Materials. It is intended primarily for organizations that intend to use batch processes to generate and submit descriptions of their educational resources.
To make your educational resources accessible through the Gateway, your organization must be a GEM consortium member. If your organization is not a member, please read about the GEM Consortium and then complete the application form.
After being accepted as a GEM consortium member, your organization will need to commit staff time to creating metadata records for your resources. This is because the Gateway is more than a list of links. It is a searchable collection of metadata records that describe and point to educational resources.
In general, there are two ways to go about creating GEM metadata:
1. Output metadata (converted to conform to the GEM 2.0 element set and syntax requirements) in batch from your local database or other data source.
2. Use the GEMCat metadata creation tool to create separate metadata files.
In the future GEM will also support data exchange through the OAI Metadata Harvesting Protocol.
Considerations for creating batch output
Metadata records submitted in batch must be in either XML or RDF/XML format. They must conform to the guidelines specified in the remaining sections of this document and posted on the Gateway documentation pages.
Elements must conform to the GEM 2.0 element set and each metadata record must contain a URL pointer to the resource at your site.
Depending on the data elements that are currently defined in your data set, and how well they correspond to the GEM elements, outputting data in batch may be simple or quite complex. If you choose to output metadata in batch, GEM can assist you in performing the intellectual task of mapping your data elements to ours. It is your responsibility to produce the data with the correct content and syntax.
Adding resources to the Gateway usually takes several rounds of sample files, testing, and feedback. Once the process is in place, you can provide us with new or updated data at your discretion.
Process for submitting batch output
Here is an outline of the steps you can expect if you decide to create batch output:
1. Your staff: Become familiar with the GEM element set, as documented at GEM 2.0 Application Profiles and its individual element pages.
2. Your staff: Analyze your data and create a "crosswalk" document mapping your data to the GEM metadata element set and vocabularies. (If you have already implemented the GEM element set at your site, this process is quite simple.)
Some organizations prefer that GEM staff take the lead in creating the crosswalk. If you would like us to create the crosswalk document, you will need to send a list of your fields, with detailed descriptions (such as field lengths and values). Also send several sample records that contain all your fields, in plain text or in your local XML format.
3. Your staff and GEM staff: Review the crosswalk document and jointly resolve any problems or questions that arise.
4. Your staff: Create a script or program to produce the output containing GEM metadata elements, following the crosswalk created in step 2 above and the guidelines in this document. Produce a sample of the output.
5. GEM staff: Test the sample output; provide feedback to you.
6. Your staff: Resolve any problems; if necessary, send another sample output. Once all problems have been resolved, determine a schedule for regular production/harvesting of batch output.
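As an illustration of step 4, the following is a minimal Python sketch of a crosswalk-driven export script. It is not GEM's own tooling. The local field names (lesson_title, summary, author_name) and the function name build_record are hypothetical; the root element, namespace declarations, prolog, and dc:identifier syntax are taken from the requirements section of this document. A real crosswalk would cover many more elements and should follow the application profile pages.

```python
from xml.sax.saxutils import escape
import xml.etree.ElementTree as ET

# Hypothetical crosswalk: local field name -> element tag in the output.
# Replace these entries with the mapping agreed in your crosswalk document.
CROSSWALK = {
    "lesson_title": "dc:title",
    "summary": "dc:description",
    "author_name": "dc:creator",
}

# Root element and namespace declarations for XML records,
# copied from the requirements section of this document.
RECORD_OPEN = (
    '<record xmlns:dc="http://purl.org/dc/elements/1.1/"\r\n'
    'xmlns:dcterms="http://purl.org/dc/terms/"\r\n'
    'xmlns:gem="http://purl.org/gem/elements/"\r\n'
    'xmlns:gemq="http://purl.org/gem/qualifiers/"\r\n'
    'xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">'
)


def build_record(local_row, url):
    """Render one row of local data as an XML record string."""
    lines = ['<?xml version="1.0" encoding="UTF-8"?>', RECORD_OPEN]
    for local_field, tag in CROSSWALK.items():
        value = local_row.get(local_field)
        if value:  # skip fields that are missing or empty in the local data
            # escape() replaces &, < and > with predefined XML entities
            lines += [f"<{tag}>", escape(value), f"</{tag}>"]
    # Every record needs a URL pointer to the resource.
    lines += ['<dc:identifier xsi:type="dcterms:URI">', escape(url), "</dc:identifier>"]
    lines.append("</record>")
    # One tag or value per line, separated by CR LF.
    return "\r\n".join(lines)


row = {"lesson_title": "Balancing your checkbook.", "summary": "Budgets & banking basics"}
xml_text = build_record(row, "http://url.of.the.resource.here")

# Parsing the result raises ParseError if the record is not well-formed.
ET.fromstring(xml_text.encode("utf-8"))
print(xml_text)
```

Keeping the mapping in a single table such as CROSSWALK means that changes agreed during review in step 3 only need to be made in one place.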
Requirements for XML and RDF/XML data submissions
(GEM does not have a DTD or an XML schema for the GEM element set at this time. Please follow the guidelines that appear below.)
1. Syntax: Metadata records submitted in batch must be in XML or RDF/XML format.
2. Element set: Elements and their content must conform to the GEM 2.0 element set and either the "Gateway Lite" or "Gateway Full" application profiles, as documented at GEM 2.0 Application Profiles and its individual element pages.
Consult the application profiles to determine which elements are mandatory, optional, or advised, and which are repeatable.
The Guidelines for Implementing Dublin Core in XML are useful in understanding the XML syntax used in the examples on the GEM 2.0 Application Profiles pages.
3. Tags, namespaces and attributes: Use tags, namespaces and attributes as shown at GEM 2.0 Application Profiles and its individual element pages.
4. XML prolog: Begin each record with the following XML prolog:
<?xml version="1.0" encoding="UTF-8"?>
5. Root element:
The root element for XML records should be: <record>
The root element for RDF/XML records should be: <rdf:RDF>
6. Namespace declarations: The following namespace declarations should appear at the beginning of each record:
for XML records:
<record xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:dcterms="http://purl.org/dc/terms/"
xmlns:gem="http://purl.org/gem/elements/"
xmlns:gemq="http://purl.org/gem/qualifiers/"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
for RDF/XML records:
<rdf:RDF xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:dcterms="http://purl.org/dc/terms/"
xmlns:dctype="http://purl.org/dc/dcmitype/"
xmlns:gem="http://purl.org/gem/elements/"
xmlns:gemq="http://purl.org/gem/qualifiers/"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
7. Additional requirements for RDF/XML records:
Use the rdf:Description element to contain the other metadata elements. Put the URL of the resource in the rdf:about attribute, as follows:
<rdf:Description rdf:about="http://url.of.the.resource.here">
Include an rdf:type property in each record, as follows:
<rdf:type rdf:resource="http://purl.org/gem/qualifiers/GEM2"/>
8. URL pointers: Each metadata record must contain a URL pointer to the resource at your site. The URL pointer must be encoded in the <dc:identifier> element as follows.
for XML records:
<dc:identifier xsi:type="dcterms:URI">
http://url.of.the.resource.here
</dc:identifier>
for RDF/XML records:
<dc:identifier rdf:resource="http://url.of.the.resource.here"/>
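Requirements 4 through 8 can be combined into a record skeleton. The Python sketch below assembles an RDF/XML record from the prolog, root element, namespace declarations, rdf:Description, rdf:type, and dc:identifier forms given above. The function name build_rdf_record and its (tag, text) input format are assumptions made for illustration, and it only handles elements with plain text content; consult the application profile pages for the correct encoding of each element.

```python
from xml.sax.saxutils import escape, quoteattr
import xml.etree.ElementTree as ET

# Root element and namespace declarations for RDF/XML records,
# copied from requirement 6 of this document.
RDF_OPEN = (
    '<rdf:RDF xmlns:dc="http://purl.org/dc/elements/1.1/"\r\n'
    'xmlns:dcterms="http://purl.org/dc/terms/"\r\n'
    'xmlns:dctype="http://purl.org/dc/dcmitype/"\r\n'
    'xmlns:gem="http://purl.org/gem/elements/"\r\n'
    'xmlns:gemq="http://purl.org/gem/qualifiers/"\r\n'
    'xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"\r\n'
    'xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"\r\n'
    'xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">'
)


def build_rdf_record(url, elements):
    """Assemble an RDF/XML record.

    elements is a list of (tag, text) pairs, for example
    ("dc:title", "Balancing your checkbook.").
    """
    # quoteattr() adds the surrounding quotes and escapes & < > in the URL.
    quoted_url = quoteattr(url)
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        RDF_OPEN,
        f"<rdf:Description rdf:about={quoted_url}>",
        '<rdf:type rdf:resource="http://purl.org/gem/qualifiers/GEM2"/>',
        f"<dc:identifier rdf:resource={quoted_url}/>",
    ]
    for tag, text in elements:
        lines += [f"<{tag}>", escape(text), f"</{tag}>"]
    lines += ["</rdf:Description>", "</rdf:RDF>"]
    return "\r\n".join(lines)


rdf_text = build_rdf_record(
    "http://url.of.the.resource.here",
    [("dc:title", "Balancing your checkbook.")],
)
ET.fromstring(rdf_text.encode("utf-8"))  # raises ParseError if not well-formed
print(rdf_text)
```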
9. Character encoding and character entities:
Use UTF-8 character encoding.
You may use the five predefined XML named character entities (&apos; &quot; &gt; &lt; &amp;) in field content. Do not use any other named character entities.
If you are unable to provide UTF-8 multi-byte sequence character encoding for characters outside the Basic Latin character set (Unicode 0000 through 007F), provide numeric character references containing the correct Unicode value expressed in either hexadecimal or decimal.
For example:
| character | name | UTF-8 multi-byte sequence (in hexadecimal) | numeric character reference, with Unicode value (in hexadecimal) | numeric character reference, with Unicode value (in decimal) |
| ¿ | inverted question mark | C2 BF | &#xBF; | &#191; |
| é | small e with acute accent | C3 A9 | &#xE9; | &#233; |
| — | em dash | E2 80 94 | &#x2014; | &#8212; |
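If your export pipeline cannot emit UTF-8 multi-byte sequences, numeric character references can be generated programmatically. The following Python sketch shows one way to do this; the function name to_ascii_xml is an assumption for illustration. It escapes the predefined XML characters first and then replaces every character outside Basic Latin with a decimal or hexadecimal reference.

```python
from xml.sax.saxutils import escape


def to_ascii_xml(text, hexadecimal=False):
    """Return field content that uses only Basic Latin characters.

    &, < and > become predefined entities; every character above
    U+007F becomes a numeric character reference.
    """
    escaped = escape(text)  # must run first, before references add new '&' characters
    if not hexadecimal:
        # The standard "xmlcharrefreplace" error handler emits decimal references.
        return escaped.encode("ascii", "xmlcharrefreplace").decode("ascii")
    return "".join(
        ch if ord(ch) < 0x80 else f"&#x{ord(ch):X};" for ch in escaped
    )


print(to_ascii_xml("¿Qué?"))                    # &#191;Qu&#233;?
print(to_ascii_xml("—", hexadecimal=True))      # &#x2014;
print("é".encode("utf-8").hex(" ").upper())     # C3 A9 (the UTF-8 byte sequence)
```

The hexadecimal and decimal forms are equivalent to an XML parser, so either one may be used.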
10. Delete HTML markup: Delete any HTML markup tags that appear within field content (for example, <b>, <i>, <a href>, <br>, <p>).
11. Line separators: To support human readability, each XML tag and value should appear on a separate line. (This is a recommendation but not an absolute requirement.) For the line separators, use a carriage return followed by a line feed (hex values 0D 0A).
For example:
<dc:title>
Balancing your checkbook.
</dc:title>
Not:
<dc:title>Balancing your checkbook.</dc:title>
12. Individual files: Submit each metadata record as an individual file. (This is a recommendation but not an absolute requirement.)
13. File naming: Do not use spaces in individual XML or RDF/XML file names.
14. Well-formed records: Before submitting records to GEM, use an XML tool to ensure that they are well-formed. Records that are not well-formed significantly slow the process of adding them to the Gateway.
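Requirements 11 through 14 lend themselves to an automated final step. The Python sketch below writes one record per file with CR LF line separators, builds a file name without spaces, and refuses to write a record that does not parse. The function names and the underscore naming scheme are assumptions for illustration, and the standard library parser stands in for whichever XML tool you prefer. Note that it checks well-formedness only, not conformance to the GEM element set.

```python
import re
import tempfile
from pathlib import Path
import xml.etree.ElementTree as ET


def safe_file_name(record_id, extension=".xml"):
    """Build a file name without spaces (hypothetical naming scheme)."""
    return re.sub(r"\s+", "_", record_id.strip()) + extension


def is_well_formed(xml_bytes):
    """Return True if the bytes parse as XML, False otherwise."""
    try:
        ET.fromstring(xml_bytes)
        return True
    except ET.ParseError:
        return False


def write_record(directory, record_id, xml_text):
    """Write one record to its own UTF-8 file with CR LF line separators."""
    # splitlines() accepts LF, CR, or CR LF input; the join normalizes to CR LF.
    normalized = "\r\n".join(xml_text.splitlines()) + "\r\n"
    data = normalized.encode("utf-8")
    if not is_well_formed(data):
        raise ValueError(f"record {record_id!r} is not well-formed XML")
    path = Path(directory) / safe_file_name(record_id)
    path.write_bytes(data)
    return path


# Abbreviated record, for illustration only.
sample = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<record xmlns:dc="http://purl.org/dc/elements/1.1/">\n'
    "<dc:title>\nBalancing your checkbook.\n</dc:title>\n"
    "</record>"
)

with tempfile.TemporaryDirectory() as tmp:
    written = write_record(tmp, "lesson 42", sample)
    print(written.name)  # lesson_42.xml
```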
Use of controlled vocabularies in GEM metadata records
For the following fields and their refinements, you are encouraged to use terms from the GEM controlled vocabularies:
Audience (Mediator, Beneficiary, Education Level)
Pedagogy (Assessment, Grouping, Teaching Methods)
Price Code
Subject
Type
If you submit XML or RDF/XML records, do not use literal values for GEM controlled vocabulary terms. Instead, enter URIs that point to the values as they appear in the RDF versions of the controlled vocabularies. The pages at GEM 2.0 Application Profiles include examples of how to encode GEM controlled vocabulary terms.
The Gateway is also able to support terms from other controlled vocabularies, within certain constraints. If you are interested in using vocabulary terms from a controlled vocabulary other than GEM, please consult with the GEM staff regarding technical requirements and implementation issues.
Considerations for using the GEMCat metadata creation tool
If you are unable to output metadata in batch, you may want to consider using the GEMCat metadata creation tool to create individual metadata records for selected resources at your site.
The GEMCat tool is available for download (at no cost) from the GEM site. GEMCat presents the user with input windows for describing resources using the GEM element set, and saves each metadata record in a format that is appropriate for inclusion in the Gateway. The advantage of this approach is that GEMCat automatically creates data with content and syntax that GEM can process. A disadvantage is that in the long term it can be more labor-intensive than the batch approach, since each metadata record is created individually. In addition, if your site undergoes revisions or resources are removed, you must remember to notify GEM or submit updated metadata records.
For more information
For more information, please contact the GEM staff at geminfo@geminfo.org and include the phrase "metadata preparation question" in the subject line.
