Search, Sitemaps, and Quality Control
Last Updated: August 08, 2016; First Released: September 26, 2014
Author: Kevin Boyle, President, DevTreks
Version: DevTreks 2.0.0

A. Introduction
This reference explains how to search for and find the main content in DevTreks, including calculator and analyzer results, multimedia, and stories. It addresses the following content discovery requirements:
1. Search engines must be able to index all site content and provide concise, relevant summaries of that content.
2. Humans must be able to find relevant content using mobile, tablet, and desktop hardware, and they must have confidence in the quality of that content.
3. Machines must be able to access raw datasets, follow links to related data, and interpret the data uniformly.

B. Linked Data Management
The following w3.org definition introduces general guidance for web content management:
Linked Data [LINKED-DATA] is a way to create a network of standards-based machine interpretable data across different documents and Web sites. It allows an application to start at one piece of Linked Data, and follow embedded links to other pieces of Linked Data that are hosted on different sites across the Web.
Search engines use Linked Data to find and index site content, which means the content will appear in search engine results and can be reached by consumers of the data. Machines use Linked Data to find datasets and then analyze them.

C. Hyperlinks (IRIs) (1*)
Every page in DevTreks is accessed using standard hyperlinks, or IRIs. We recommend linking only to the Recommended IRI displayed at the bottom of the Preview panel, because that IRI works well with search engines (2*). The following is an example of a typical IRI:
https://www.devtreks.org/commontreks/preview/commons/resourcepack/Search and Quality Control/1536/none/
This IRI can be decomposed into the following properties:
* Network group: commontreks (3*)
* Html view: preview (i.e. display a preview of the IRI content)
* Network: commons (4*)
* Content node: resourcepack
* Name: Search and Quality Control
* Id: 1536
* FileExtensionType: none; this property identifies the specific linked view to load from the list of linked views associated with the IRI; it is also used in the names of raw data files

The following images of the Preview panel show that page navigation uses two different types of hyperlinks. The first type (e.g. views IRI) appears as a conventional link: an underlined name in blue. These links allow search engines to navigate the site. The second type (e.g. Views) appears as a conventional touch screen button and allows humans to navigate the site. The first type can be distinguished from the second because its link text ends with the suffix “IRI”. These links do not retain panel content, or state, so each one displays only the content associated with one specific panel. The “Recommended IRI” found at the bottom of the page is the IRI that humans should use when linking to content. The remaining “IRI” hyperlinks allow search engines to navigate to other pages.

D. Sitemaps and Quality Control
Sitemaps provide lists of IRIs to site content. Desktop and mobile sitemaps are submitted to search engines. Many sitemaps document the results of running specific calculators, analyzers, and other applications, and serve as a software quality control mechanism (5*). Periodically (e.g. version 1.7.2, version 1.7.7), this content is rechecked to ensure that the calculators and analyzers work as advertised. Detailed analysis of the numeric calculated results is made only when the calculator or analyzer itself is upgraded, rather than during periodic overall reviews. The display of any content may be improved at any time.
DevTreks publishes sitemaps with links to selected calculations and analyses that are referenced in DevTreks tutorials.
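Search engines commonly accept XML sitemaps in the format defined by the sitemaps.org protocol. The following is a minimal Python sketch, not the DevTreks implementation, that builds such a sitemap from a list of IRIs. It assumes the sitemaps.org 0.9 schema; the helper names encode_iri and build_sitemap are hypothetical. One detail worth noting: the IRIs in this reference contain spaces (e.g. “Search and Quality Control”), so the path must be percent-encoded before it is placed in a sitemap loc element.

```python
from urllib.parse import quote, urlsplit, urlunsplit
from xml.etree import ElementTree as ET

# Namespace of the sitemaps.org protocol, version 0.9 (assumed target format).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def encode_iri(iri: str) -> str:
    """Percent-encode the path of an IRI so it can be used as a sitemap <loc>.

    Spaces become %20 and non-ASCII characters are UTF-8 percent-encoded.
    '/' and '%' are left as-is so existing %XX sequences are not double-encoded.
    """
    parts = urlsplit(iri)
    path = quote(parts.path, safe="/%")
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


def build_sitemap(iris: list[str]) -> str:
    """Return a sitemaps.org XML document listing each distinct IRI once, in order.

    Hypothetical helper for illustration; not part of DevTreks.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    seen = set()
    for iri in iris:
        loc = encode_iri(iri)
        if loc in seen:
            continue  # skip duplicate entries
        seen.add(loc)
        url = ET.SubElement(urlset, "url")
        # ElementTree escapes &, < and > in text content automatically.
        ET.SubElement(url, "loc").text = loc
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    print(build_sitemap([
        "https://www.devtreks.org/commontreks/preview/commons/resourcepack/"
        "Search and Quality Control/1536/none/",
    ]))
```

Deduplicating on the encoded form means two spellings of the same IRI (one with spaces, one already percent-encoded) produce a single sitemap entry.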
The IRIs in these sitemaps, or more specifically, this evidence, will be actively maintained by DevTreks. Descendants, such as grandchildren data, will be less actively maintained. As part of software testing, additional IRIs are also tested; these IRIs may not be maintained if they are not essential for long term software quality control (6*). Html sitemaps to selected content can be downloaded at:
https://www.devtreks.org/commontreks/preview/commons/resourcepack/Search and Quality Control/1536/none

Search Engines
Every page title corresponds to the name of the base element associated with the IRI, and every page’s meta description corresponds to the description of that base element. Search engines often use both attributes to index pages, so pay particular attention to the name and description given to every base element.
The Preview panel uses basic microdata (7*) to describe images, videos, technical articles, and datasets. Some search engines use this microdata in their search results. The specific microdata will evolve as search engines evolve (see the following section).
Images should be compatible with search engine recommendations, employing bmp, gif, png, or jpg extensions. DevTreks primarily uses png extensions.
Videos should also comply with the data formats recommended by search engines, including mpg, mpeg, mp4, m4v, mov, wmv, asf, avi, ra, ram, rm, flv, or swf. DevTreks primarily uses mp4 extensions.
Technical articles include Resource base elements that use pdf or txt data formats. They also include base elements that contain linked views which are not formal calculators or analyzers. These linked views are known as stories (see the Story Telling Tutorial).
Datasets include base elements that are linked to formal calculators or analyzers. Three self-explanatory html views of Dataset content are supported: Media (the default), Mobile, and Desktop (8*).
DevTreks has no control over whether or not search engines will index specific pages.
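The page title, meta description, and Dataset microdata described above can be illustrated with a small sketch. It assumes a base element exposes a name, a description, and the IRI of its raw data; the BaseElement type and render_preview function are hypothetical, not DevTreks code, and the markup DevTreks actually emits may differ. The Dataset is described with standard HTML microdata attributes (itemscope, itemtype, itemprop) and schema.org vocabulary, with the raw XML data exposed as a DataDownload.

```python
from dataclasses import dataclass
from html import escape


@dataclass
class BaseElement:
    """Hypothetical stand-in for a DevTreks base element."""
    name: str
    description: str
    data_iri: str  # IRI of the raw data, assumed here to be an XML file


def render_preview(element: BaseElement) -> str:
    """Render a minimal HTML page whose <title> and meta description come from
    the base element, plus a schema.org Dataset described with microdata."""
    # html.escape quotes &, <, >, " and ' by default, so values are safe in attributes.
    name = escape(element.name)
    desc = escape(element.description)
    iri = escape(element.data_iri)
    return (
        "<!DOCTYPE html>\n"
        "<html>\n<head>\n"
        f"  <title>{name}</title>\n"
        f'  <meta name="description" content="{desc}">\n'
        "</head>\n<body>\n"
        '  <div itemscope itemtype="https://schema.org/Dataset">\n'
        f'    <h1 itemprop="name">{name}</h1>\n'
        f'    <p itemprop="description">{desc}</p>\n'
        '    <div itemprop="distribution" itemscope itemtype="https://schema.org/DataDownload">\n'
        '      <meta itemprop="encodingFormat" content="application/xml">\n'
        f'      <a itemprop="contentUrl" href="{iri}">{name}</a>\n'
        "    </div>\n"
        "  </div>\n"
        "</body>\n</html>\n"
    )


if __name__ == "__main__":
    # Illustrative values only; example.org is a placeholder host.
    print(render_preview(BaseElement(
        name="2 - Corn Soybeans",
        description="Crop budget results",
        data_iri="https://example.org/data.xml",
    )))
```

Escaping matters here because base element names and descriptions are user-entered text that may contain quotes or ampersands, which would otherwise break the meta description attribute.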
Prior to version 1.7.2, DevTreks paid very little attention to search engines because the software features were still being developed and the quality of the content may not have justified search engine indexing. As a result, very few pages were indexed. A quality control review conducted as part of version 1.7.2 resulted in the development of this tutorial, along with new Linked Data management techniques.
Individual search engines explain strategies for ranking higher in search results. Their basic advice is to focus on delivering high quality and high value content. The nature of the social budgeting content in DevTreks supports the “high value” advice; the “high quality” advice needs constant attention. All base element content can be linked to multimedia, including images and videos, which should be used to enhance the quality of the content.

E. Machines
The following image of the Views panel shows that a typical IRI hyperlink (e.g. 2 - Corn Soybeans) appears at the top of the panel. This link uses microdata markup to tell machines that it is a Dataset (see www.schema.org). The link points to the raw data results of whichever calculator, analyzer, or story has been selected.
Search engines are starting to use JSON-LD as another markup format to describe content (9*). JSON-LD is primarily designed for JSON raw data formats, while DevTreks currently uses mainly XML raw data formats. JSON-LD, along with JSON raw data, may be supported in future versions (most analyzers already have a Save as Text option).
Another mechanism commonly used to support machine-accessible data is web application programming interfaces (Web APIs). DevTreks has experimented with this approach and found that the standard hierarchical data used in DevTreks can be accessed through Web APIs. Web APIs may be supported in future releases.

F. Social Budgeting Markup
Scientific and technical data, such as social budgeting data, is often defined using custom markup that helps machines interpret the data uniformly. For example, the NPV Calculators Tutorial explains that budgets contain properties that include Operating Costs, Allocated Overhead Costs, Capital Costs, Total Costs, Total Revenues, and Net Returns. These types of properties can be defined as part of a Social Budgeting Markup language. Future releases will address this issue.

G. Internal Search Content Management
The internal search engine supports two ways to find base element content:
Club Search: The following image shows that, by default, logged-in clubs can search through their own content.
Public Search: The following image shows that the public must search through all network content. Shortfalls with public searches will be addressed in future releases.

Summary
Clubs using DevTreks can easily access evidence demonstrating the basic quality of this social budgeting software. They can manage their content in ways that make it easy for both humans and machines to find. High quality, easily found, social budgeting content may help people to improve their lives and livelihoods. Please assist these efforts by linking to social budgeting content.

Footnotes
1. HTTP addresses, or hyperlinks, will be referred to by their w3.org-recommended formal name: Internationalized Resource Identifier (IRI).
2. All IRIs on the Preview panel have standard html hyperlinks which can be indexed by search engines. DevTreks strongly recommends linking only to the Recommended IRI found at the bottom of the Preview panel. We don’t recommend simply linking to the address shown in the browser’s address bar because that address may not uniquely identify the IRI (e.g. to increase the speed of page loading, the address isn’t changed when touchscreen hyperlinks are used).
3. Good practice is for network groups and networks to start with a cleaned up database that does not contain software and content testing data. As pointed out in many tutorials, test data should be kept on a development server rather than a knowledge bank server. Further instructions can be found in the Source Code tutorial.
4. Networks are designed to allow all content, including database content, to be stored separately from other networks. Currently, all content from all networks is stored in one database and one file/blob storage location. This feature, which will be a technical challenge, will be fully implemented when the need for it arises.
5. These IRIs provide basic quality control. The software industry has developed more advanced quality control techniques, some of which will be used in future releases.
6. DevTreks’ primary role is software development rather than content development; we mainly maintain the data associated with the IRIs displayed in the sitemaps. The actual numeric results are mainly reviewed when the associated calculator or analyzer is upgraded. Data that is not maintained can be recognized by messages on the Views panel stating that the calculated results don’t exist. Also refer to footnotes 3 and 4.
7. The w3.org defines microdata as follows: “This mechanism allows machine-readable data to be embedded in HTML documents in an easy-to-write manner, with an unambiguous parsing model. It is compatible with numerous other data formats including RDF and JSON.”
8. The w3.org recommends 20 KB as the maximum size of mobile page content. The Mobile view does not comply with that recommendation, but the default view, Media, can comply by keeping the linked multimedia file sizes down. It’s not clear whether that recommendation reflects technological advances in mobile technology. For the most part, the Mobile view displays the same data as the Desktop view, but arranges it vertically rather than horizontally.
9. JSON-LD is a lightweight syntax to serialize Linked Data in JSON [RFC4627]. Its design allows existing JSON to be interpreted as Linked Data with minimal changes. JSON-LD is primarily intended to be a way to use Linked Data in Web-based programming environments, to build interoperable Web services, and to store Linked Data in JSON-based storage engines. Since JSON-LD is 100% compatible with JSON, the large number of JSON parsers and libraries available today can be reused.

References
1. HTML Microdata. http://www.w3.org/TR/2013/NOTE-microdata-20131029/
2. JSON-LD 1.0: A JSON-based Serialization for Linked Data. http://www.w3.org/TR/2014/REC-json-ld-20140116/
3. Google. Search Engine Optimization Starter Guide. Last accessed September 2014.

References Note
We try to use references that are open access or that do not charge fees.

Improvements, Errors, and New Features
Please notify DevTreks (devtrekkers@gmail.com) if you find errors or can recommend improvements. Video tutorials explaining this reference can be found at:
https://www.devtreks.org/commontreks/preview/commons/resourcepack/Search and Quality Control/1536/none/

DevTreks – social budgeting that improves lives and livelihoods