Social Performance Analysis 3
Last Updated: March 27, 2019; First Released: May 17, 2018
Author: Kevin Boyle, President, DevTreks	
Version: DevTreks 2.1.8
Appendix E.  Examples 5 to 8

This Appendix uses online datasets to explain how to carry out Social Performance Analysis, primarily related to stakeholder Performance Monitoring, Impact Assessment, and Impact Evaluation. The first 4 examples can be found in the Social Performance Analysis 2 (SPA2) reference, which should be read first. The RCA Framework can be found in Appendix A of the Social Performance Analysis 1 (SPA1) reference. The CTAP algorithms, covering Disaster Risk Prevention, can be found in the CTA-Prevention reference in the SPA tutorial.

ExamplePage* 5. Rural Stakeholder Resource Conservation Value Accounting (RCA6)2* 6. Disaster Stakeholder Resource Conservation Value Accounting (RCA6)88* 6A. Stakeholder Abbreviated Resource Conservation Value Accounting (RCA7)
* 6B. Correlated Hazards135
149* 7. SDG Stakeholder Resource Conservation Value Accounting160* 8A. SDG Stakeholder Resource Conservation Value Accounting using .NetStandard Libraries180* 8B. SDG Stakeholder Resource Conservation Value Accounting using R204* 8C. SDG Stakeholder Resource Conservation Value Accounting using Python206* Appendix A. Standard Statistical Impact Evaluation Analysis208* Appendix B. Machine Learning Impact Evaluation Analysis242
Several of the algorithms introduced in this reference employ machine learning techniques that are under active development in their underlying statistical libraries. Like all DevTreks algorithms, they’ll evolve in future releases. Most of the algorithms referenced in this tutorial have been tested using the upgraded 2.1.6 calculator patterns. Although Version 2.1.6’s upgraded security will automatically redirect http://localhost:5000 URLs to https://localhost:5001, the SSL URL should be used with all algorithms.
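For readers who script requests against a local install, the redirect described above can be mirrored on the client side so that the SSL URL is always the one used. The following is a minimal Python sketch and is not part of DevTreks; the function name to_ssl_localhost is hypothetical, and it assumes only the two ports named in the text.

```python
from urllib.parse import urlsplit, urlunsplit


def to_ssl_localhost(url: str) -> str:
    """Rewrite an http://localhost:5000 URL to its https://localhost:5001 form.

    Any other URL is returned unchanged. This helper is an illustration only.
    """
    parts = urlsplit(url)
    if parts.scheme == "http" and parts.hostname == "localhost" and parts.port == 5000:
        return urlunsplit(("https", "localhost:5001", parts.path, parts.query, parts.fragment))
    return url


# The localhost resource pack URL listed for Example 5, rewritten to its SSL form.
print(to_ssl_localhost(
    "http://localhost:5000/greentreks/preview/carbon/resourcepack/Coffee Firm RCA Example 5/543/none"
))
```

The cloud URLs (https://www.devtreks.org/...) pass through this helper untouched.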
A video tutorial explaining this reference can be found at:	
The Social Performance Analysis tutorial on the DevTreks home page.



Example 5. Rural Stakeholder Resource Conservation Value Accounting (RCA6)
Algorithms: RCA algorithms and algorithm1, subalgorithm17
URLs: 
https://www.devtreks.org/greentreks/preview/carbon/resourcepack/Coffee Firm RCA Example 5/1558/none
https://www.devtreks.org/greentreks/preview/carbon/output/Coffee Firm RCA5 Stock/2141223484/none
http://localhost:5000/greentreks/preview/carbon/resourcepack/Coffee Firm RCA Example 5/543/none
http://localhost:5000/greentreks/preview/carbon/output/Coffee Firm RCA5 Stock/2141223499/none

A. Introduction

This example continues with the Coffee Farm, and surrounding community, introduced in the SPA2 reference to demonstrate tying stakeholder socioeconomic characteristics, such as population, age, and gender, to the Social Performance Assessment metrics introduced in that reference. Beside that reference’s emphasis on private sector materiality impact and performance reporting, this example also illustrates how community service organizations, such as Producer Organizations, RCA Districts, Community Sustainability Offices, or National Statistical Offices (NSOs), can report on periodic accomplishments in efforts to conserve scarce resources for targeted stakeholders.

B. Introduction to Sustainable Development Data Systems

The Sustainable Development Solutions Network (Espey et al, 2017) summarizes the challenges faced by most countries, and developing countries in particular, in developing the data systems needed to accomplish the UN’s Sustainable Development Goals, or SDGs. The authors describe the primary challenge as follows:

“The adoption of the SDGs by the global community in 2015 represents a concerted attempt to ensure sustainable and equitable development over the period from 2015 to 2030. Meeting the goals and their associated targets will be challenging, particularly in the context of the overarching principle that no one should be left behind. However, there is a real risk that the efforts made to meet the goals and targets may be compromised by the inadequacy of the data, particularly in the Global South, to benchmark, monitor and track the 232 unique indicators of progress in meeting the SDGs.”
Using the context of climate change, the UN (2011) summarizes why the SDGs are important: they are designed to help ameliorate conditions endemic to most poor people in most countries. As the climate change-related SDG Indicators make clear, they also will help ameliorate conditions that have the potential to wreak havoc on the majority of the world’s countries and citizens (1*).
“In sum, the people most vulnerable to climate change are usually poor, undernourished, of poor health, live in precarious housing conditions, farm on degraded lands, have low levels of education, lack rights, have little opportunities to influence decision making, work under precarious conditions, and/or reside in countries and regions with non-resilient health systems, limited resources and sometimes poor governance systems.  Social, cultural or political circumstances, often including inequalities and discriminatory practices, deprive them of the basic assets and entitlements and the institutional support needed to make a living and ensure their well-being even under normal conditions, let alone for mastering the increased and additional challenges posed by climate change. These non-climatic factors and the socioeconomic context in which climatic problems occur is likely to be as important, if not more so, than climate-related hazards themselves.”
Espey et al (2017) document how National Statistical Offices (NSOs) have primary responsibility for the censuses, household surveys, civil registration and vital statistics systems, administrative systems, and geospatial systems, needed to report civil sustainability data. Although the authors may be right about the central roles played by country-based NSOs in ensuring success with the SDGs (i.e. as contrasted to contemporary alternatives such as international, Cloud Statistical Offices, or CSOs), this tutorial emphasizes the roles of social networks, clubs, stakeholders, producer organizations, community service organizations, companies, investors, end-product consumers, and supply chain participants, in collecting, reporting, and acting on sustainability data (2*). 
To anchor these data challenges, and the reality of their moral imperative, the following images (UNEP/SETAC, 2017; UN, 2017) display the SDG Targets and the first few of the 230+ SDG Indicators.


Gordon (2016) uses the following image to explain the relation between the SDG and the related Sendai Disaster Risk Reduction (SDRR) Framework. As with the SDG, the SDRR’s targets and indicators have also been adopted by most countries, and most countries face similar challenges in collecting and applying the data. Country leaders, or more accurately, informed country leaders, are motivated to comply because they have first-hand experience proving that climate change is wrecking their countries and citizens. Writing billion-dollar checks for disaster recovery provides clarity for the informed country leaders. Espey et al (2017) mention that adoption of these global data systems “has created a rare but significant opportunity to build coherence across different but overlapping policy areas”.

KPMG (2016, in the SPA1 reference) points out the SDGs contain a specific target (12.6) for private sector companies to integrate sustainability [Indicators] into their reporting cycles. As documented in the SPA1 and SPA2 references, recent business news reports continue to verify the increasing importance of ESG (Environmental, Social, and Good Governance) reporting and investment in the private sector. Investors, supply chain participants, end-product consumers, employees, and local communities, want this evidence, and more companies are responding to the demand (i.e. Footnote 1’s “taking independent action”). 

As an example for the agricultural sector, the IFC (2013) confirms that, like many investors in this sector, “[they require their] clients to identify, avoid, mitigate, and manage [Environmental and Social], or E&S, risks and impacts as a way of conducting sustainable business”. They incorporate the SDG throughout their corporate strategy for the years 2018 to 2020 (IFC, 2017). They document the immense role that the private sector can play in accomplishing the SDG and clearly expect their private sector clients to also incorporate the SDG in their corporate strategy. For example, they document how 34 development finance institutions have adopted their Corporate Governance Development Framework, which helps companies to fully employ the management accountability systems needed to carry out the SDG. Like many investors and asset managers, they recognize that the SDG Indicators are comprehensive and flexible enough to be incorporated into their client companies’ sustainability accounting and reporting systems (3*).

Vanclay et al (2015) mention that the IFC’s performance standards (see IFC, 2012) have, in fact, become international benchmarks, as suggested by IFC (2013): “Whether or not a company is an IFC client, IFC Performance Standards provide a useful framework for identifying risk assessment variables. Companies may choose to use all eight IFC Performance Standards as the framework to manage risk in their supply chain”. The SPA1 reference introduced equivalent sustainable reporting systems, such as EMAS, SASB, GSSB, and B-Lab, that companies also use as sustainability standards. Examples 1 to 4 demonstrate how specific industries have developed industry-specific supplements to these data systems, such as WHO’s health care reporting framework and COSA’s agricultural smallholder performance framework. The goal of most sustainability standards systems, including the SDG and related standards systems, is to institutionalize “integrated and cross sectoral approaches” for sustainable social, environmental, and economic development.

The Task Force on Climate Related Disclosures (2017) provides professional financial accounting recommendations showing companies how to tie domain-specific sustainability reporting (i.e. related to climate change) to “publication of [domain-specific] financial information in mainstream financial filings”. The 17 SDG Goals address many domain-specific topics, including climate change and gender equality, which can directly lead to company materiality impacts and therefore require “publication of [SDG]-related financial information in mainstream financial filings [that] will ensure that appropriate controls govern the production and disclosure of the required information”.

Many private sector companies are realizing that, in order to remain competitive with their peers, they must follow suit and adopt sustainable business data systems that assist them to prove the “social soundness” of their business activities. IFC (2015) uses the term Environmental and Social Management System (ESMS) and Vanclay et al (2015) use the term Social Impact Management Plan in similar ways to sustainable business data systems. To assist in the adoption of these systems and their international reporting norms, this reference substitutes the globally-accepted term, “Sustainable Development Goals”, and the sustainable value accounting framework term, Resource Conservation Value Accounting (RCA), for the hodgepodge of terms commonly used in past eras for materiality reporting, such as “Environmental, Social, and Good Governance”, “Environmental and Social Management System”, or “Corporate Social Responsibility”. As with the SDRR system, adoption of these global data systems “has created a rare but significant opportunity to build coherence across different but overlapping policy areas [, including private sector accountability via SDG materiality impact reporting]”.

Espey et al (2017) imply that no clear data platform has appeared, or will likely appear, that is appropriate, acceptable, and useful for meeting the needs of every country’s sustainable development data systems and every company’s sustainable business data systems. The authors conclude that a broad array of “data actors” have roles to play in collecting, managing, and employing this data (2*). They further speculate about public sector data roles “[morphing] from producer to coordinator of a broad data ecosystem”.

Appendix A in the SPA1 reference points out that an important goal of this tutorial is to harmonize private and public sector sustainability reporting. This example demonstrates how both the public and private sector can monitor, evaluate, and report, their SDG, and general sustainability, accomplishments. In particular, this example demonstrates harmonizing NSO data systems, which document SDG goal accomplishment, with this reference’s emphasis on drilling down to explore the why and how behind the accomplishments –specifically, “what factors caused sustainability to change for different groups of stakeholders? How will sustainability change for particular groups of stakeholders? How did businesses, communities, and public agents contribute to these accomplishments?” 

Ideally, NSOs and industries institutionalize both sets of data via “administrative data systems”. One-time, “very sound”, “outside expert”, assessments are actively discouraged in favor of ongoing, “reasonably professional”, “local administrative”, assessments. As DevTreks demonstrates, evolution in data development is an ongoing task.

C. Introduction to Stakeholder Impact Assessment (SIA)

Vanclay et al (2015) use The International Principles for Social Impact Assessment to introduce SIA:

“[SIA is defined as being] the processes of analysing, monitoring and managing the intended and unintended social consequences, both positive and negative, of planned interventions (policies, programs, plans, projects) and any social change processes invoked by those interventions”.

This reference extends SIA definitions and purposes beyond special projects and public sector actions to include regular activities carried out by private sector firms and community service organizations (CSO). In addition, SIA is extended beyond community stakeholders to include company, CSO, and local community stakeholders, including employees, supply chain participants, consumers, investors, and asset managers. Although these companies and organizations are not in the public census-taking business often needed for full SIAs, they are, or will be, fully involved in the performance monitoring, impact evaluating, and transparent reporting business needed for SDG materiality reporting. As this reference will make clearer, if the companies and organizations can’t or won’t fulfill this enhanced reporting, their stakeholders may either do it for them or engage in other types of “ameliorating actions”.
Vanclay et al (2015) use the following statement to further clarify the relation between company and organization activities, SIA, the SDGs, and “sustainable social development”. The statement makes clear that “social soundness”, as proven by SDG materiality impact reporting, involves more than just the charitable activities carried out by many private sector firms and CSOs. A key SIA goal is to improve the company and community institutions that many people in many countries have come to distrust (as demonstrated by several of the Footnotes) but are often the main avenue for improving local society. Even hard headed business people who question the validity of this statement may agree that modern stakeholders have “modern issues” with current institutions.

“Social development means more than just providing a few jobs and providing funding for a new school building or swimming pool, it requires that the project partner with the local communities in being a force for positive social change and beneficial social development. Social development should be a participatory process of planned social change designed to improve the wellbeing of the community as a whole and especially of the vulnerable, disadvantaged or marginalised groups within a region. Rather than being about benefits to individuals per se, social development is more about facilitating change in institutions and society to reduce social exclusion and fragmentation, to promote social inclusion and democratisation, and to build capacity in institutions and governance.”

Vanclay et al (2015) use the following description and image of “community capitals” to illustrate the relation between the frameworks that underlie SIA and the SDG.  The RCA Framework introduced in the SPA1 reference employs the same capitals but uses the terms Physical Capital for Built Capital, Economic Capital for Financial Capital, and Institutional Capital for Political Capital. These frameworks aim to achieve better societal, or public service, outcomes and impacts from private and public sector activities. 
“The Sustainable Livelihoods Approach considers the capabilities, livelihood resources (assets, capitals) and livelihood strategies (activities) people undertake to make their living and conduct their way of life. At the heart of the model is the notion that all community resources or assets can be represented as a set of capitals. The assessment of social investment strategies can consider these capitals and how strengthening one or more of these capitals might increase the overall wellbeing in the community.”



In terms of private sector materiality impact reporting, Vanclay et al (2015) introduce a glossary of SIA definitions to help companies understand “international reporting norms”, especially related to human rights, which, in this digital age, can quickly impact their companies’ reputations, product sales, and existence (i.e. sexual harassment in the USA movie industry). For example, in the following SIA definition, the term “local impacted communities” can easily be replaced by the term “investors, supply chain participants, end-product consumers, employees, stakeholders, and impacted communities”:
“Social Licence to Operate refers to the level of acceptance or approval of the activities of an organization by its stakeholders, especially local impacted communities. Leading corporations now realize that they need to meet more than just the regulatory requirements, they also need to consider, if not meet, the expectations of a wide range of stakeholders, including international NGOs and local communities. If they don’t, they risk not only reputational harm and the reduced opportunities that might bring, they also risk being subject to strikes, [boycotts, disinvestment, whistle blower disclosures, social media campaigns, hacking, inclusion contracts,] protests, blockades, sabotage, legal action and the financial consequences of those actions. In some countries, ‘social licence’ has become an established element of the language of business, actively influencing, if not driving, the business strategy of many companies, and is part of the governance landscape”.
Vanclay et al (2015) and IFC (2013f) use the following images to summarize the business value that SIA, and its applied SDG (i.e. E&S) risk management, offers companies. Although these positive impacts are reason enough for companies to take SIA and the SDG seriously, more companies are realizing that the modern digital age makes Vanclay’s (2015) formal SIA instruments, such as Social Licenses to Operate, Impacts and Benefits Agreements, and this reference’s algorithms, much more practical, transparent, and verifiable, as well. And even without formal SIA instruments, recent SIA-related social media campaigns (e.g. sexual harassment in the USA) prove that modern digits can quickly impact industries and companies. Finally, TFCD (2017) confirms that the materiality impacts of this increased “social accountability” actually require companies to carry out formal SDG-related financial reporting. Failure to comply may be grounds for investor lawsuits (e.g. those recently filed for sexual misconduct in the casino industry) and consequential digital activism (7*).


Whether companies like it or not, digits shift power away from conventional institutions, which they have some control over (but which many people distrust), to stakeholders, investors, supply chain participants, consumers, community service organizations, and impacted communities. Many of these stakeholders, or more accurately, informed stakeholders, are realizing their power and recognizing the stark difference between marketing fiction and sustainability evidence. That’s leading them to take independent “ameliorating” actions (i.e. purchases of verifiably sustainable products, protest marches, disinvestment, lawsuits, local activism, app development, social contract riders, and social media campaigns). Bad actor organizations, companies, and their executives have reason for concern. Misinformed employees have reason to become informed (or to find misinformed companies to work for).
The following image (IFC, 2015) demonstrates how companies can integrate sustainable business data systems into their overall management systems. The IFC references provide detailed instructions that companies can follow to implement these systems. They use the term “You can’t improve what you don’t measure” to explain why companies adopt the formal instruments used to apply these systems. Note carefully the emphasis on stakeholders, affected communities, monitoring and evaluating, risk and impact identification, and transparent reporting. Many of the concrete examples used in this IFC reference relate to agricultural firms.


Vanclay et al (2015) describe the relationship between the impact pathways introduced in Examples 1 to 4 and this example’s SIAs:
“[S]ocial impacts are rarely singular cause-effect relationships. There are complex patterns of intersecting impact pathways.”
This example introduces basic, but professional, SIA. The “impact transition states” explained in this reference and the “impact pathways” used throughout this tutorial will partially address Vanclay’s requirement to understand “complex intersecting patterns”. The algorithm addresses several IFC (2015) recommendations for incorporating SIA and the SDG in formal company management systems (i.e. stakeholder engagement, monitoring and evaluating, managing risks and impacts, and transparent community reporting). The machine learning algorithms introduced in this reference begin to demonstrate fuller SIAs (i.e. because the algorithms are getting better at understanding “complex intersecting patterns”).

D. Stakeholder System Boundary and Stakeholder Engagement

System boundaries are a key initial step in defining the primary stakeholders being impacted by a community service organization intervention, or a company activity. The following images (Vanclay et al, 2015, IFC 2014, UNEP/SETAC 2017) introduce this phase. Although the Vanclay reference relates primarily to capital improvement projects that impact community stakeholders, this reference extends SIA to company and organization activities that impact employees, stakeholders, supply chain participants, consumers, investors, and asset managers.
Example 3 in SPA2 explains that UNEP/SETAC (2017) recognizes stakeholder engagement as a key element in effective LCIA and Hot Spots studies –stakeholders need to be engaged “early and often”. Vanclay et al (2015) discuss using stakeholder engagement to gain stakeholder “trust and respect”. Those authors distinguish between the “statutory” basis for many public participation outreach efforts and the “actual” stakeholder engagement needed to conduct SIA (10*). 



IFC (2014) uses the following image to illustrate the use of a Stakeholder Engagement Plan as a basis for identifying and working with impacted communities and stakeholders when conducting SIA.




Vanclay et al (2015) use the following image to illustrate the use of a Community Engagement Plan as a basis for getting more participation from impacted communities and stakeholders when conducting SIA.



IFC (2015) uses the following statement to clarify that SIA is fully applicable to small and medium size enterprises (SMEs). As the previous images demonstrate, standards setting organizations, such as the producer organizations and CSOs introduced in the SPA2 reference, often work at national and industry scale to help SMEs with these systems (5*). SPA2 verifies that many of the SMEs are actually privately run businesses, such as family farms, rather than publicly traded corporations.

In terms of natural resources conservation, the following watershed management framework (FAO, 2017) confirms that stakeholder boundaries often relate to watersheds, land use areas, ecosystems, and Resource Conservation Districts. These scales are important because the impacts of individual private sector, CSO, and local community activities take place at landscape, watershed, and ecosystem scale. Coordination of multiple stakeholders is required to achieve landscape-wide, and watershed-level, SDG “materiality” impacts.

Although companies themselves don’t usually lead watershed management teams, they often are stakeholders in, and sponsors of, a community’s overall watershed, or ecosystem, management plan (i.e. BLM or Forest Service plans in the USA). FAO (2017) describes the central role played by the SDG in these integrated landscape management approaches and provides practical advice for incorporating the SDG, or “integrated and cross sectoral approaches”, in resource conservation planning. Business support for these approaches plays an important role in improving “community institutions”, because as FAO states “The private sector has had little role in the mobilization of resources [for watershed planning]”, but “Incorporating [SDG-related] projects in existing institutional structures is crucial at both the local and national levels.” (10*) 
Watershed and landscape scales are also important because tradeoffs are needed to accommodate the needs of impacted stakeholders who hold different value systems. Example 4B in SPA2 discusses the importance of carrying out value assessments, including Cost Effectiveness Analysis, using “perspectives” that reflect the value systems held by different groups of impacted stakeholders. Understanding the value systems, as measured using value assessment “perspectives”, offers a better chance of reducing conflict and achieving equity, and thereby accomplishing successful results. Several references endorse allowing each stakeholder group to take ownership of their own sustainability assessment, including the required M&E system. In the latter case, RCA Assessors play consultant roles to each stakeholder group. These experts focus on understanding each stakeholder perspective, identifying potential tradeoffs among stakeholders, and exploring possible sustainability transition states (i.e. low sustainability to high sustainability) that can keep enough of the stakeholders satisfied to achieve the ultimate sustainability goals. In practice, that means carefully considering and recommending refinements to Example 4B-like Natural Capital Care Perspective, Stakeholder Groups Perspectives, and the Societal Perspective.

E. Risk and Impact Identification

The IFC references related to agriculture introduce several agricultural case studies and examples relevant for SIA purposes. For example, the following images (IFC, 2014 and 2015, and TFCD, 2017) demonstrate how to more thoroughly identify crop production risks and impacts. Although similar studies can be found throughout the agricultural sustainability literature, most reflect the era before the SDGs, cloud computing open source apps, and machine learning algorithms.



The Process Map used to identify risks in the following image (IFC, 2014) is similar to the “impact pathways” introduced in SPA2 for conducting short term Performance Monitoring and to DevTreks hierarchical base elements. Examples 1, 2, and 3 demonstrate how to use the impact pathways to monitor and evaluate company SDG-related accomplishment. Examples 3, 3A, and 3B discuss deriving the Resource Inventory Phase of LCAs by using standard Operating and Capital Budgets to define a company’s Inputs, Operations (or Processes, or Activities), Outputs, Outcomes, and Budget (i.e. holding Impacts such as Net Returns) base elements. Example 4 demonstrates using TEXT datasets to define the same base elements. DevTreks applications, Budgets, impact pathways, and this Example’s impact transition states offer companies and communities several applied tools to thoroughly understand these work processes, to identify and reduce these risks, and to manage the impacts.
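The base elements named above (Inputs, Operations, Outputs, Outcomes, and Budget) can be pictured as a small typed structure to which risks are attached, in the spirit of a Process Map. The following Python sketch is only an illustration and does not reflect the DevTreks data model; the names BaseElement and risks_by_stage are hypothetical, and the sample elements and risks are invented.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Stage names follow the base elements listed in the text.
STAGES = ("Input", "Operation", "Output", "Outcome", "Budget")


@dataclass
class BaseElement:
    """One step of an impact pathway, with the risks identified for it (hypothetical)."""
    stage: str
    name: str
    risks: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.stage not in STAGES:
            raise ValueError(f"unknown stage: {self.stage!r}")


def risks_by_stage(pathway: List[BaseElement]) -> Dict[str, List[str]]:
    """Group identified risks by stage, in pathway order, skipping stages with none."""
    grouped: Dict[str, List[str]] = {stage: [] for stage in STAGES}
    for element in pathway:
        grouped[element.stage].extend(element.risks)
    return {stage: risks for stage, risks in grouped.items() if risks}


# Invented example data for a coffee firm; not taken from the DevTreks datasets.
coffee_pathway = [
    BaseElement("Input", "Fertilizer", ["nutrient runoff"]),
    BaseElement("Operation", "Harvest", ["seasonal labor shortage"]),
    BaseElement("Output", "Green coffee"),
    BaseElement("Budget", "Net Returns"),
]
print(risks_by_stage(coffee_pathway))
```

Keeping the risks on the element they belong to makes it straightforward to report them stage by stage, which is the same reading order a Process Map uses.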


The following image (Taylor-Powell et al, 2003) of an impact pathway demonstrates that many longer term Impact Evaluations, especially those involving local community interventions or company long term strategy, will be less interested in identifying project or business risks and instead focus on choosing Indicators that can verify results –that can be used to evaluate cause and effect attribution. The primary long term business risks arise from wasting resources on ineffective Activities and failing to achieve acceptable levels of SDG-related Impacts. For developing countries, the IFC documents the importance of identifying and reducing risks that are external to a company’s internal activities, such as social discord, disease outbreaks, and institutional corruption. For companies in developed countries, this example identifies external risks related to informed stakeholder actions, such as consequential digital activism. Activities must be broadly defined in these Logic Frames and impact pathways. Example 7 [may] demonstrate using the impact pathways in machine learning algorithms to more fully identify and reduce these risks and evaluate their long term impacts (i.e. causal inference and causal tree algorithms). The image also highlights how activities must be grounded in the socioeconomic root causes of societal problems, with less emphasis on superficial band-aids (i.e. gun violence in the USA).


The following image (Independent Evaluation Group, World Bank, 2017) verifies that impact pathways, or theories of change, make good frameworks for helping companies and local communities to understand, account for, and report on, their own accomplishments, including SDG-related achievements (8*). Section H, Communication, demonstrates how to use Results Chains to communicate this algorithm’s raw Math Results to stakeholders.

The RCA framework introduced in Appendix A, SPA1, used the following image to introduce terms used in ecosystem and disaster assessment that are also used to reduce risk, including stressors, drivers, pressures, exposure, vulnerability, and resiliency. These techniques, which will be revisited in Example 6, are especially important when carrying out risk identification and reduction at watershed, landscape, and planet (i.e. climate change) scales.

The FAO’s (2017) review of 12 watershed management projects found that “capacity gaps identified in the watershed assessments were mainly in the socio-economic disciplines and linked to inadequate analytical skills.” Many natural resource planning groups are bad at incorporating humans and their institutions, and identifying socioeconomic-related risks and impacts, in their work. The instruments, or algorithms, introduced in this reference reinforce the FAO recommendation to employ “standardized format[s] for data and information collection” as a means of institutionalizing the missing socioeconomic and institutional analysis skills. New types of jobs, such as RCA Assessors and RCA Technologists, which focus on applying modern IT to understand “value perspectives” and facilitate “stakeholder tradeoffs”, offer a promising means to institutionalize the needed skills at scale and scope.
Dataset 1. Business Resource Conservation Value Accounting (5*)
The following image displays part of a typical, but stylized, TEXT dataset used with this Business Value Report to identify the risks and impacts introduced in Example 1 to 4’s impact pathways and datasets. Many of these socioeconomic characteristics derive from SPA2’s COSA and Sustainable Food Lab references.
1st TEXT dataset. Demographics and SDG Indicators. The following dataset shows that the CIs record socioeconomic characteristics for different stakeholder groups while each CI’s children Indicators record SDG-related measurements that derive from related subalgorithms, such as Examples 1 to 4. In this dataset, the 2nd CIs are actually measuring land use transition states, rather than population transitions.

2nd TEXT dataset. Population transition states and SDG allocations. The following dataset shows that the CIs record uncertain population measurements for the 1st dataset’s stakeholder groups while the CI Indicators allocate the 1st dataset’s SDG-related measurements to these stakeholders. Additional population algorithms may employ alternative population modeling techniques.

Vanclay et al (2015) point out that SIA is often conducted concurrently with, or before, fuller sustainability assessments. In many cases, Example 1 to 4’s RCA-style datasets may need to be completed specifically for the impacted populations first identified using this algorithm. For example, once populations have been characterized and aggregated into typical TEXT hierarchies, such as this dataset’s poverty-related CIs, the SPA2’s RCA tools are completed for those specific groups. This algorithm is rerun after the SPA2 data results have been transferred back to these datasets. This technique allows more fine-tuned control in identifying SDG-related risks that impact both the companies and their stakeholders. The Monitoring and Evaluation section, below, discusses how companies and local communities carry this out as part of their standard management system.
The following features of these 2 datasets highlight how companies use this algorithm for RCA reporting. 

1st TEXT file. Demographics and SDG Indicators

Stakeholder Characteristics Titles (3*): In this dataset, the first 2 rows hold the titles for the stakeholder socioeconomic characteristics being recorded. At least 1 row must be used to hold these titles, but the algorithm supports additional rows. These particular characteristics come from several of this tutorial’s references. The Sustainable Food Lab and COSA references found in SPA2, in particular, explain the importance of developing standard characteristics that can be used throughout industries (i.e. agricultural stakeholders in developing countries). Ultimately, this data supports the end goal of cause and effect attribution pursued by Impact Evaluation to explain sustainability and stakeholder impacts.

The specific reason for these 2 particular sets, or rows, of socioeconomic characteristics is that farm sustainability is usually measured in terms of both stakeholder characteristics and supply chain, and in this example, farm, characteristics. Did conservation practices cause farmers’ income to go up or down? Were farmworkers protected against safety hazards? Did cultivation practices cause soil erosion rates and farm productivity to go up or down? Did grazing practices result in better wildlife protection? Depending on the purpose of the SIA, additional characteristics may be based on food security, health care, public infrastructure, and specific ecosystems.

Only the most important socioeconomic variables, those needed to explain attribution, should be included in these datasets. Datasets that employ more than 11 to 22 variables are more likely to be “fishing for attribution” rather than “explaining attribution” and should be analyzed using other algorithms prior to being aggregated into these datasets (i.e. refer to Example 7, IITA and COSA, 2016, and Hoebink, 2014, for examples).

Total Risk Indexes (TR) and Locational Indexes (LI): The examples in this reference will demonstrate a variety of purposes for these Indexes, including further subdivision of socioeconomic data and additional life cycle stages.

Categorical Index (CI) Stakeholder Characteristics: Each CI records measurements for each of the title rows’ characteristics. The goal behind this particular set of properties is to support data collection for single stakeholder members or for groups of stakeholders. The 2 CIs for this dataset relate to stakeholder groups, such as small-scale, low income, coffee farmers, and agricultural areas or “sub-ecosystems”, such as moderately sloped, higher elevation, coffee plantations. The Indicators include averages and percentages, such as average poverty index and age, and percent of the population that has at least 1 disability or that has a minority ethnicity. Stakeholder groups that can’t be described using averages or percentages may require a separate CI, or a different Indicator property.

The first CI documents demographic characteristics of the CI’s stakeholders. These measurements can be based on single stakeholders, straight population counts, or on common numerator/denominator population metrics, such as 1,000 very low income workers per 10,000 total worker population.

The second CI documents land characteristics for single farm plots or for groups of farms. These measurements can be based on single farms, straight area measurement, or on common ecosystem-related metrics, such as land use areas or watersheds.

This example demonstrates using combinations of single demographic characteristics, such as age and gender, with multi-variate indexes, such as poverty indexes and visual soil assessment indexes (refer to IITA and COSA (2016) for an example of how to use index-based evaluations). The decision on whether to use single characteristics or indexes should be based on their causal relationship to sustainability, to international norms for reporting, and to data collection cost. For example, companies that already use SDG M&E systems, or local communities who use watershed simulation models, can take data from those systems to develop land use characteristic metrics for soil quality, water quality, air quality, plant quality, and animal quality, indexes.

The RCA-style instruments introduced with Examples 1 to 4 have complementary relationships with these socioeconomic characteristics. For example, the Poverty Index used with this dataset can also be considered a Categorical Index in Examples 1, 1A, 2, and 3B that documents a COSA or SAFA theme, such as Livelihood and Well-Being. This algorithm’s socio-economic characteristics serve 2 primary purposes, which help to distinguish them from the RCA impact pathway Indicators (5*):

1. Disaggregated SDG Reporting. They support the demographic disaggregation required to fully report SDG and SDRR accomplishments. The UN recommends disaggregating this data by demographic characteristics such as geography, income, sex, age, and disability.
2. Stakeholder Impact Cause and Effect Attribution. The socioeconomic characteristics and indexes lead to a better understanding of the cause and effect attribution leading to the RCA impacts, particularly related to stakeholder social impact. This algorithm allocates the general RCA Impacts documented in Examples 1 to 4 to specific groups of stakeholders. Equity, in particular, can be understood better and acted upon. 

SDG-related Indicators: The CI or LI math results from subalgorithms 13, 14, 15, or 16, are transferred to separate CI Indicators. As the MathResult section explains, this algorithm will relate these SDG measurements to stakeholder population and/or land use measurements stored in a 2nd TEXT dataset to define stakeholder SDG impact. Custom data derived from other sources can also be used. Example 6 will use this technique with data from the CTAP algorithms to support the SDRR system.

The reason that the natural capital stock Categorical Indexes, Air Pollution and Water Consumption, have been included as Indicators with the stakeholder CI and not the ecosystem CI, is that their units of measurement include Disability Adjusted Life Years (i.e. daly / kg PM25 and daly / m3). Examples 3, 3A, and 3B in the SPA2 reference confirm that many RCA measurements will be based on environmental and socioeconomic impacts that harm humans (1*). This example allocates those damages to particular stakeholder groups and land use areas for subsequent tradeoff analysis (i.e. do some groups experience higher impacts than others? why?). In this tutorial, analyzing equity is a critical part of understanding the term “stakeholder impact” (6*).

2nd TEXT file. Population and Land Use Impact Transition States or Adoption and Diffusion Paths: 
A 2nd TEXT dataset stores Categorical Indexes that measure a population and/or land use for the same CI stakeholder group and/or land use area defined in the 1st TEXT dataset. CI Indicators are used to allocate the same SDG Indicators from the 1st dataset to the 2nd dataset’s specific population. Additional calculator Indicators (i.e. 2 to 15) can be used to model population and SDG “transition states” that change over time, often as a result of mitigation and adaptation activities.

USHHS (2012) introduces the term “health states” in a similar manner to “impact transition states” and demonstrates how the health care sector carries out Performance Monitoring and Impact Evaluation, or, their preferred term, “Epidemiology”. The transition states are modeled on these “health states” which measure how populations move between states such as healthy, sick, seriously ill, and death (i.e. quality of life states measured as QALYs or DALYs). The Impact Transition states define how the population and their ecosystem move from a baseline, no mitigation action, state, to a final, fully sustainable, impact state. The health care sector has developed several algorithms to analyze and simulate these population transitions (i.e. Markov transition algorithms; the WHO, 2003, reference introduces advanced population models). 
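The Markov transition approach mentioned above can be sketched in a few lines of Python. This is a minimal illustration only, not the DevTreks, USHHS, or WHO implementation. The state names, the transition probabilities, and the function names `step_population` and `simulate` are hypothetical and were chosen only to show the mechanics of moving a population between states over several periods.

```python
# Hypothetical health states and one-period transition probabilities.
# Each row gives the probabilities of moving from that state to every state
# and must sum to 1. All values are illustrative assumptions.
STATES = ["healthy", "sick", "seriously_ill", "dead"]
TRANSITIONS = [
    [0.90, 0.08, 0.01, 0.01],  # from healthy
    [0.30, 0.55, 0.10, 0.05],  # from sick
    [0.05, 0.20, 0.55, 0.20],  # from seriously_ill
    [0.00, 0.00, 0.00, 1.00],  # from dead (absorbing state)
]


def step_population(counts, transitions):
    """Advance the population counts in each state by one period."""
    n = len(counts)
    return [
        sum(counts[i] * transitions[i][j] for i in range(n))
        for j in range(n)
    ]


def simulate(counts, transitions, periods):
    """Return the list of state counts for period 0 through `periods`."""
    history = [list(counts)]
    for _ in range(periods):
        history.append(step_population(history[-1], transitions))
    return history


if __name__ == "__main__":
    start = [1000.0, 0.0, 0.0, 0.0]  # assumption: everyone starts healthy
    for period, counts in enumerate(simulate(start, TRANSITIONS, 3)):
        print(period, dict(zip(STATES, (round(c, 1) for c in counts))))
```

In an impact transition setting, the states could instead be the baseline, low, medium, and high sustainability states, with the probabilities estimated from monitoring data rather than assumed.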

In terms of agricultural sustainability, transition states often measure adoption and diffusion “states”, or paths, for new agricultural technologies and policies, such as conservation practices or carbon pricing schemes. IITA and COSA (2016) provide an example for coffee production that shows how the poorest stakeholders have significantly lower adoption rates for sustainable practices, with related causal factors, than the richest. A stylized, but timely, example of a “technology adoption and diffusion path” can be found in the definition for Technology Development, Diffusion, and Adoption in SPA1, Appendix D, and below, in Section G, Quality of Life Scenarios.

This example makes arbitrary allocations of the SDG measurements to each stakeholder population and land use area. In practice, better rules are needed for making the allocations. More sophisticated algorithms can be used to codify the rules.

Advanced SIA
Example 7 [may] demonstrate the use of algorithms to train machine learning algorithms to define likely technology diffusion and adoption paths. Once trained, the algorithms predict SDG and SDRR impacts on targeted stakeholders that arise from company activities and community service organization interventions. More advanced algorithms predict likely transition states and advise companies how to dynamically tweak their mitigation and adaptation actions to achieve SDG-related Impacts.

Dataset 2. Community Resource Conservation Value Accounting (5*)
Gertler et al (2016) use the following reference dataset, containing 19,800+ rows of population socioeconomic characteristics, to conduct an Impact Evaluation of a health insurance social intervention. Examples 7 and 8 employ the same dataset and explain more about the data.

The goal of this algorithm is not to conduct formal Impact Evaluations for capital projects and social interventions, as demonstrated by Gertler et al (2016). The broader goal is for companies and organizations to achieve sustainable development, as defined by SDG-related targets, by using causal attribution to selfishly identify and reduce risks that impact stakeholders, and to selflessly improve the well-being of targeted stakeholders. Appendix A demonstrates how to use this dataset to conduct a formal Impact Evaluation and then add those results to this algorithm’s datasets to more thoroughly understand organization and stakeholder long term impacts.
To support the broader SDG goal, Gertler’s reference dataset is aggregated into the following hierarchical TEXT datasets. The Indicators include averages and percentages, such as average poverty index and age, and percent of the population that is indigenous or that has dirt floors. The main purpose of this dataset is to illustrate “harmonizing” existing socioeconomic data, including Gertler’s partial SDG data, with the purposes of this algorithm. Given the transparent nature of all DevTreks data, these aggregated hierarchies also help to address the “misuse of highly personal, private, data” (Espey et al, 2017). A more typical dataset might address SDG-related risks and stakeholders at watershed and landscape level and would be carried out as part of an overall “watershed management plan” (see FAO, 2017).
Indicator 1. Benchmark or Baseline
1st TEXT dataset. Demographics and SDG Indicators.

2nd TEXT dataset. Populations and Allocated SDG Indicators. 
The following dataset was first run unsuccessfully.

The following dataset was then run successfully. TEXT csv datasets need careful attention. Good practice is to always run searches for commas in the TEXT files prior to saving them and to inspect the final TEXT dataset in a text editor, such as Notepad, to check for anomalies.
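The manual check described above can be partly automated. The following Python sketch flags rows whose number of comma-separated fields differs from the first row, which is the usual symptom of a stray comma inside a description. The function name `find_ragged_rows` and the sample column titles are hypothetical; the actual DevTreks TEXT layout should be taken from the datasets themselves.

```python
import csv
import io


def find_ragged_rows(csv_text, expected_columns=None):
    """Return (line_number, field_count) for rows whose field count differs
    from the expected count (by default, the count in the first row)."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return []
    if expected_columns is None:
        expected_columns = len(rows[0])
    return [
        (number, len(row))
        for number, row in enumerate(rows, start=1)
        if len(row) != expected_columns
    ]


if __name__ == "__main__":
    # Hypothetical 4-column sample; the 3rd row has an unintended comma.
    sample = (
        "label,location,factor1,factor2\n"
        "A,1,small farms,10\n"
        "B,1,small, low income farms,12\n"
    )
    print(find_ragged_rows(sample))
```

A check like this does not replace inspecting the file in a text editor, because a row can have the right number of fields and still hold values in the wrong columns.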

The following features of these datasets highlight how local communities use this algorithm for RCA reporting. 

Total Risk Indexes (TRs). This example only uses the TR Indexes to aggregate the children LIs. Other datasets may choose to use the TRs to further subdivide the data.
Locational Indexes (LIs). A total of 200 separate villages are included in the full dataset. The treatment_locality variable is used to aggregate the data into the following 2 LIs.
1. Treatment Villages (SA): the SDG intervention causing impacts
2. No Treatment Villages (SB): the counterfactual used to verify impact
For convenience, the actual dataset uses the same data for both the Treatment and No Treatment Villages. 
Categorical Indexes (CIs). The poverty index variable is used to aggregate the socioeconomic characteristics of the full population. The Poverty Index was chosen because of its importance in the SDG. As discussed, the actual purpose of the Gertler dataset was to evaluate whether or not the intervention reduced household health care expenditures. Whether these particular Poverty Indexes can really support machine learning algorithms is explored in Example 7. 
1. Stakeholder Group 1: Poverty Index range: <= 50, high poverty
2. Stakeholder Group 2: Poverty Index range: 51 to 70: medium poverty
3. Stakeholder Group 3: Poverty Index range: >= 71: low poverty
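The aggregation described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the procedure actually used to build the datasets: the record keys `poverty_index` and `dirt_floor` are hypothetical names, `treatment_locality` is assumed to be coded 1 for treatment villages, and fractional poverty index values that fall between the listed ranges (for example 50.5) are assumed to belong to the next higher group.

```python
from collections import defaultdict


def poverty_group(poverty_index):
    """Map a poverty index to one of the three stakeholder groups.
    Assumption: values between the listed ranges (e.g. 50.5 or 70.5)
    fall into the next higher group."""
    if poverty_index <= 50:
        return "Stakeholder Group 1 (high poverty)"
    if poverty_index <= 70:
        return "Stakeholder Group 2 (medium poverty)"
    return "Stakeholder Group 3 (low poverty)"


def aggregate(households):
    """Group household records by (LI, CI) and return counts, an average,
    and a percentage for each group."""
    buckets = defaultdict(list)
    for h in households:
        # Assumption: treatment_locality == 1 marks a Treatment Village (SA).
        li = "SA" if h["treatment_locality"] == 1 else "SB"
        buckets[(li, poverty_group(h["poverty_index"]))].append(h)
    summary = {}
    for key, members in buckets.items():
        n = len(members)
        summary[key] = {
            "count": n,
            "avg_poverty_index": sum(m["poverty_index"] for m in members) / n,
            "pct_dirt_floor": 100 * sum(m["dirt_floor"] for m in members) / n,
        }
    return summary
```

Each key of the returned dictionary corresponds to one CI row within one LI, and the averages and percentages are the kind of values that would be entered as that CI’s socioeconomic characteristics.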
Indicator 1 Indicators. The “round”, or survey round, variable is used to document the starting and actual, allocated, SDG impacts for the specific stakeholder populations. For this dataset, the target allocation has to be surmised (i.e. from the SDG targets). 
Indicator 2 to 15 Indicators and Performance Monitoring. Additional transition states define how the populations’ socioeconomic characteristics and their SDG risks and impacts will change over time. The Resource Stock and M&E calculators support up to 15 Indicators, each of which can define a transition state.

These additional Indicators can be defined using the following 2 techniques.

Categorical Index-based Data: Each CI holds updated socioeconomic characteristics for these transition state Indicators. Supports thorough SIAs.

Indicator-based Data: Each CI holds the same socioeconomic characteristics for these transition state Indicators. Supports abbreviated SIAs.

The 2nd TEXT file’s population measurements show that the SDG measurements have been allocated equally to each stakeholder group, even though the number of people in each group varies considerably. In practice, better rules are needed for making the allocations. 
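One simple candidate rule, shown here only as a sketch, is to allocate the SDG measurement in proportion to each group’s population. This is an assumption for illustration, not a rule defined by this algorithm; the function name `proportional_allocations` and the population counts are hypothetical.

```python
def proportional_allocations(populations):
    """Return each group's allocation multiplier, as a percentage of the
    SDG Indicator measurement, in proportion to its population count."""
    total = sum(populations.values())
    if total <= 0:
        raise ValueError("total population must be positive")
    return {group: 100.0 * count / total for group, count in populations.items()}


if __name__ == "__main__":
    # Hypothetical population counts for the 3 stakeholder groups.
    groups = {"Group 1": 6000, "Group 2": 3000, "Group 3": 1000}
    print(proportional_allocations(groups))
```

A proportional rule only makes sense when the SDG measurement is a total that can be divided among people; rates, indexes, and per-person measurements would need a different rule.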
With the exception of health care expenditures, Gertler’s socioeconomic characteristics did not change much after the intervention took place. That allows the SDG “Actual” measurements, or “round 1” rows, from the intervention to be realistically included in a 1 Indicator dataset. When the socioeconomic characteristics, such as Poverty Index, change significantly, additional Indicators should be used to document the changes, or transition states. In the latter case, each of the 15 calculator Indicators then serves as a short term Performance Monitoring instrument, while the entire group of Indicators, as summarized in the Score, serves as a long term, audit-like, Impact Evaluation instrument.
Scores and Impact Evaluation. The Score can be used to hold the same 2 TEXT datasets used by each Indicator, except Score datasets hold data for the full period being evaluated and serve a summary, long term, Impact Evaluation role (i.e. from benchmark to final impact state). Example 1 explained that full Impact Evaluation requires using counterfactual evidence to measure social performance. Example 7 demonstrates these fuller Impact Evaluations.

The aggregation of this type of survey and census data into these hierarchies needs careful thought. For example, if SDG target accomplishment data is needed for each village, Example 7 shows that the LIs can be broken down into the 200 individual villages, and 2 TRs can be used to define the Treatment and No-Treatment villages. Caution must be exercised because the resultant village data could lead to the “misuse of highly personal, private, data” (Espey et al, 2017). In addition, the ultimate destination for this data is likely to be the machine learning algorithms –until those algorithms are fully developed, the best use of this data can’t be determined.
F. Mitigation and Adaptation Actions
The following images (FAO, 2017 and IFC, 2015) demonstrate translating Section E’s risks (or problem areas) into management actions that companies use to mitigate and adapt to risks that reduce “shared value” or that generate negative SDG materiality impacts. One of the main goals of the algorithms in this tutorial is to use the Performance Monitoring and Impact Evaluation techniques to identify sustainable business activities that have been customized to work for specific companies and organizations.


G. Quality of Life, or Sustainability, Scenarios

The TFCD (2017) discuss, in detail, the important role that Scenario Analysis plays in helping companies [and local communities] to understand the risks they face from transitioning to higher levels of sustainability. These risks come from highly uncertain, but increasingly likely, SDG target-related materiality impacts, such as the transition risks associated with climate change [and gender equality].

The following image (IFC, 2015) uses internal company management system development to show how companies themselves transition between adoption states. Although the IFC case study companies are not small-scale coffee farms, nor small and medium sized enterprises, IFC makes clear that the whole supply chain is accountable for sustainability, including SMEs, producer organizations, cooperatives, regulatory agencies, industry support professionals, input sellers, output buyers, and advisory service companies. Example 3 in SPA2 identified the need to educate consumers about sustainability in ways that result in behavioral changes, specifically so that they’ll prefer buying sustainable products and services.

TFCD (2017) use the following Adoption and Diffusion Path to illustrate how industries can use sustainability reporting systems to accomplish SDG targets, such as those related to climate change [and gender equality].


The following Scenario Analysis is used by companies and local communities to more fully understand the transition risks they face from changes needed in their business activities and social interventions to accomplish the SDG targets.

General Scenario
Threatened Quality of Life: High GHG emissions result in a 1.5ºC temperature increase with higher incidence of biodiversity loss, droughts, severe heat waves, crop and livestock production risks, air pollution, floods, migration, and social discord that leads to the incapacity to achieve the SDG goals (7*).
Targeted Social Performance Risks and Targets: Combinations of the 17 SDG Goals such as climate change and gender equality
Mitigation and Adaptation Actions: Portfolio 1 consists of a) …, b)…, and c)…. 

Sustainability Scenarios. 
In terms of SDG goal accomplishment, logical sustainability scenarios relate directly to a company’s or local community’s transition to higher levels of sustainability, as illustrated by the following scenarios:
A. Transition State _A: baseline (i.e. current GHG emissions and lack of full gender equality)
B. Transition State _B: low sustainability
C. Transition State _C: medium sustainability
D. Transition State _D: high sustainability (i.e. fully sustainable GHG emissions and full gender equality)

Simple data conventions, including the use of sibling base elements and Example 1’s “_xx” labelling convention, can be used to model these more comprehensive transition scenarios.
Section J, Decisions, demonstrates how to use the Reference Case Cost Effectiveness Results introduced in Example 4B to assist making decisions about the degree of sustainability, or stage of sustainability, appropriate for specific companies and local communities. 

Example 7 may demonstrate how machine learning algorithms anticipate transition states and alternative scenarios, given current circumstances, and tailor sustainability recommendations to specific companies and communities, based on their assessment of the company and society’s most probable sustainable development paths.

H. Social Performance Score

The following algorithm properties define the socioeconomic characteristics and an “impact transition state” for the stakeholder groups identified in these datasets. 

Indicator 1. Benchmark or Baseline
The following image of Indicator 1 displays the “start-target-actual” properties, explained more thoroughly in this section, that produce the benchmark (i.e. start), target, and actual scores. The Indicator and Score metadata displayed in this image, and in reports, are calculated as the sum of the TR Indexes for all locations. The TR Indexes, in turn, are calculated as the sum of their children normalized and weighted LIs. This dataset uses 2 locations for testing purposes. These scores will be used with the “standard sustainability scorecards” that will be explained in Section I, Decisions.


1st TEXT dataset. Demographics and SDG Indicators

Stakeholder Characteristics Titles. Rows used to describe socioeconomic variables must include a 0 in the 2nd data column, the location property.

Total Risk and Locational Indexes. The final 11 columns of data for TRs and LIs are not used to change calculations. 

* factor1 to factor11: not used – the SDG CI Indicators use normalized data

Categorical Indexes. The following list defines how the final 11 columns of data for Categorical Indexes are used. 

* factor1 to factor11: socioeconomic characteristic measurements for separate stakeholder groups

SDG-related Indicators. The following list defines the final 11 columns of data that are taken from the CI or LI normalized impacts from subalgos 13, 14, 15, or 16 or custom data. Example 6 demonstrates using the results from the CTAP algorithms.
* factor1: Most likely quantity of associated CI. 
* factor2: Hotspots unit of measurement for most likely quantity CI.
* factor3: Low quantity of associated CI. 
* factor4: High quantity of associated CI. 
* factor5: Unit of measurement for low and high quantity CI (i.e. upper and lower 80% CI). 
* factor6: certainty1 of associated CI (i.e. likelihood of CI risk)
* factor7: certainty2 of associated CI (i.e. severity of CI risk) 
* factor8: distribution type (if PRA was used with associated CI) 
* factor9: Hotspots production process (i.e. farm operation label)
* factor10: Hotspots life cycle stage (i.e. production) or Results Chain stage (i.e. outcome)
* factor11: date of measurement

Examples 3, 3A, and 3B explain the importance of the Hotspots-related factors. This algorithm is documenting which company, CSO, and local community activities and “stages” caused the SDG allocations in the 2nd dataset to change. Those activities include mitigation and adaptation activities and work processes that companies adopt to achieve the SDG-related targets. Although this algorithm, by itself, can’t determine cause and effect attribution, when multiple datasets are added to machine learning algorithms, attribution may be found. Example 7 [may] demonstrate how.

2nd TEXT Dataset: Populations and SDG Allocations. This TEXT file is added as the 2nd URL in the Indicator.URL property (using a semicolon delimiter). Additional algorithms may use different properties and Indexes in these datasets to conduct other types of population simulations (i.e. Markov simulations).

Locational Index Measurement. The following properties show that only the last 2 columns of data are used to run calculations. These 2 columns will normalize and sum all of the calculated LIs within each TR Index. This is the same normalization and weighting as used in subalgorithm15.
* factor1 to factor9: none
* factor10: normalization type
* factor11: weight

Categorical Index Population Measurement. These properties support the basic probability density functions demonstrated in Example 2 and throughout the CTA and CTAP tutorials. The data must be put in the exact same CI row index as its related SDG data (i.e. by copying the 1st TEXT and then changing the CIs).
* factor1: start date
* factor2: end date
* factor3: QT population or land use count, or most likely estimate (i.e. QTM)
* factor4: QT unit
* factor5: QTD1 PRA shape (i.e. mean), or lower estimate (i.e. QTL)
* factor6: QTD1 unit
* factor7: QTD2 PRA scale (i.e. standard deviation), or higher estimate (i.e. QTU)
* factor8: QTD2 unit
* factor9: PRA distribution type (i.e. normal); set to “none” if PRA is not being conducted
* factor10: none
* factor11: none
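The two ways of reading the population properties above (direct low and high estimates, or distribution parameters) can be sketched as follows. This is an analytic shortcut for illustration, not the PRA routine used by the calculators. The function name `population_estimates` is hypothetical, only the “normal” and “none” distribution types are handled, and the central 80% interval is an assumption borrowed from the interval mentioned for the 1st dataset’s factor5.

```python
from statistics import NormalDist


def population_estimates(qtm, qtd1, qtd2, distribution):
    """Return (most_likely, low, high) population estimates.

    distribution == "none": qtm is the most likely estimate and qtd1 and
    qtd2 are taken directly as the lower and upper estimates.
    distribution == "normal": qtd1 is the mean and qtd2 the standard
    deviation; low and high bound a central 80% interval (assumed width).
    """
    if distribution == "none":
        return qtm, qtd1, qtd2
    if distribution == "normal":
        dist = NormalDist(mu=qtd1, sigma=qtd2)
        return dist.mean, dist.inv_cdf(0.10), dist.inv_cdf(0.90)
    raise ValueError(f"unsupported distribution type: {distribution}")


if __name__ == "__main__":
    # Hypothetical population: mean 1,000 people, standard deviation 100.
    print(population_estimates(0, 1000, 100, "normal"))
    # Hypothetical direct estimates with no PRA.
    print(population_estimates(500, 450, 560, "none"))
```

Other distribution types would need their own branch, and a sampling approach would be required where no closed-form quantile is available.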

Indicator SDG and Population M&E. These properties define how the SDG measurements from the 1st TEXT dataset and the parent CI’s population are expected to change between the CI’s start date and the end date. Indicators that are initially measured using Threshold systems may find it easier to make these allocations based on the 0-100% SAFA Thresholds introduced in Example 1. The data must be put in the exact same Indicator row index as its related SDG dataset.
* factor1: start SDG allocation multiplier; a multiplier, expressed as a percentage, for allocating the SDG Indicator measurement to this population; acts as an initial 100% measurement for factor2 and factor3.
* factor2: end SDG target multiplier; a multiplier, expressed as a percentage, for defining a target SDG Indicator measurement for the end date.
* factor3: end SDG actual multiplier; a multiplier, expressed as a percentage, for defining the actual SDG Indicator measurement at the end date; the target multiplier can be entered as a placeholder in the initial data.
* factor4: start Population/land use allocation multiplier; a multiplier, expressed as a percentage, for allocating the CI population measurement to this SDG Indicator; acts as an initial 100% measurement for factor5. 
* factor5: end Population/land use actual multiplier; a multiplier for defining the actual Population measurement at the end date; an “expected” multiplier can be entered as the initial data. 
* factor6: certainty1; severity, and probable consequence, of this SDG risk on this population/land use as of end date
* factor7: certainty2; likelihood of this SDG risk on this population/land use as of end date
* factor8: certainty3; probable impact of this population/land use on company/CSO/local community from this SDG risk as of end date
* factor9: end Cost per Unit SDG Quantity taken from a CEA conducted using the techniques introduced in Examples 4A, 4B, and 4C. Section I, Decisions, shows that this property supports the Reference Case Cost Effectiveness Results introduced in those examples.
* factor10: normalization type
* factor11: weight

Factor 8, certainty3, documents the degree of responsibility by the local community, CSO, or company for the allocated SDG risk. If the risk is caused by internal activities, the degree of responsibility is very high. If the risk is external to company/CSO/local community activities, the degree of responsibility is low, provided the company does not have indirect responsibility for the risk. As explained in Section I, Decisions, objective, evidence-based, 3rd party, standards organizations periodically carry out Impact Evaluations to verify the accuracy of these accounting metrics and to complete their own scorecards.

Indicators 2 to 15. Population Impact Transition States

Exactly the same properties are filled out to define additional population and land use impact transition states. This algorithm supports all 15 Indicators. These “states” can be defined flexibly. For example, they can be used in the same manner as the Results Chains, or impact pathways, used throughout this tutorial. As introduced in Example 1, each Indicator models the elements of the chain (i.e. Indicator 1 = Inputs-> Indicator 2 = Activities-> Indicator 3 = Outputs -> Indicator 4 = Outcomes -> Score = Impacts). Alternatively, instead of using the 1st TEXT dataset’s factor10 property to define a life cycle stage, that property can be used to define a Results Chain stage. 

Business Resource Conservation Value Accounting MathResults. The following table displays the results for the company SIA.


Indicator Math. The following 8 columns of Indicator data are added in the MathResult. These measurements are summed, normalized, and weighted, into their parent CI, LI, and TR Indexes.
Existing Properties (updated population allocations)
* Population Start Count = popStartCount * (popStartAllocation / 100)
* Population End Count = popStartCount * (popStartAllocation / 100) * (popEndAllocation / 100)
New Properties (added as new data columns)
* qtmost: most likely SDG quantity per total population (not per population member because the calculations are difficult to interpret). The mathematical formula used in this calculation is:
sdgPerPopulation = (sdgQuantity * (sdgStartAllocation / 100) * (sdgEndActualAllocation / 100)) 
* percenttarget: (qtmost / (sdgQuantity * (sdgStartAllocation / 100) * (sdgEndTargetAllocation / 100))) * 100
* qtlow: low estimate of likely SDG quantity per total population 
* qthigh: high estimate of likely SDG quantity per total population
* certainty1: factor6 averages
* certainty2: factor7 averages
* certainty3: factor8 averages
* totalcost: total cost of most likely SDG quantity (qtmost) (only found in the local community dataset)
totalSDGCost = factor9 (Cost per Unit SDG) * qtmost (most likely quantity of SDG prior to normalization and weighting)
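The Indicator formulas above can be collected into a single function. The following Python sketch restates them before any normalization and weighting; it is not the calculator’s source code. The function name `indicator_math`, the snake_case parameter names, and the zero-target guard are assumptions added for illustration.

```python
def indicator_math(pop_start_count, pop_start_allocation, pop_end_allocation,
                   sdg_quantity, sdg_start_allocation,
                   sdg_end_target_allocation, sdg_end_actual_allocation,
                   cost_per_unit_sdg=0.0):
    """Compute the Indicator columns from percentage multipliers."""
    # Existing properties: updated population allocations.
    population_start = pop_start_count * (pop_start_allocation / 100)
    population_end = population_start * (pop_end_allocation / 100)

    # New properties: SDG quantity allocated to the total population.
    allocated = sdg_quantity * (sdg_start_allocation / 100)
    qtmost = allocated * (sdg_end_actual_allocation / 100)
    target = allocated * (sdg_end_target_allocation / 100)
    # Assumption: report 0 when no target quantity has been allocated.
    percenttarget = (qtmost / target) * 100 if target else 0.0

    return {
        "population_start": population_start,
        "population_end": population_end,
        "qtmost": qtmost,
        "percenttarget": percenttarget,
        "totalcost": cost_per_unit_sdg * qtmost,
    }


if __name__ == "__main__":
    # Hypothetical inputs: 1,000 people, 200 SDG units, cost of 2.5 per unit.
    print(indicator_math(1000, 50, 110, 200, 50, 80, 60, 2.5))
```

With these hypothetical inputs, half of the 1,000 people and half of the 200 SDG units are allocated at the start, the population grows to 110% of its starting allocation, and the actual SDG quantity of 60 units is 75% of the 80-unit target, for a total cost of 150.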

The final 3 columns display the following properties from the 1st TEXT dataset.
* sdgunit (Elementary Flow): Hotspots unit of measurement from the 1st TEXT SDG Indicator.
* productionprocess: Hotspots production process from the 1st TEXT SDG Indicator (i.e. in terms of results chains, company Operation, Component, or Outcome)
* lifecyclestage: Hotspots life cycle stage (i.e. if using Results Chains, can be Inputs,  Operations, Components, Outputs, Outcomes, or Impacts) 

Categorical Index Math. The first 11 columns display the results of the PRA population calculations. The final 7 columns are calculated by normalizing and weighting all of the Indicators in each parent Locational Index and then summing the normalized and weighted Indicators into their respective CIs, as displayed in the following list. 
* qtmost: sum of CIs
* percenttarget: average of CIs
* qtlow: sum of CIs
* qthigh: sum of CIs 
* certainty1: factor6 averages
* certainty2: factor7 averages
* certainty3: factor8 averages
* totalcost: sum of total costs
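The aggregation rules in this list can be sketched as follows. The sketch assumes that each Indicator row has already been normalized and weighted, because that step is not specified here. The "ci" key, the function name, and the sample values are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Columns that are summed versus averaged, following the list above
SUM_COLUMNS = ("qtmost", "qtlow", "qthigh", "totalcost")
MEAN_COLUMNS = ("percenttarget", "certainty1", "certainty2", "certainty3")


def aggregate_by_ci(indicator_rows):
    """Roll normalized, weighted Indicator rows up into their parent CIs."""
    groups = defaultdict(list)
    for row in indicator_rows:
        groups[row["ci"]].append(row)

    results = {}
    for ci, rows in groups.items():
        totals = {col: sum(r[col] for r in rows) for col in SUM_COLUMNS}
        totals.update({col: mean(r[col] for r in rows) for col in MEAN_COLUMNS})
        results[ci] = totals
    return results


# Hypothetical Indicator rows belonging to a single Categorical Index
rows = [
    {"ci": "CI_A", "qtmost": 10.0, "qtlow": 8.0, "qthigh": 12.0, "totalcost": 100.0,
     "percenttarget": 80.0, "certainty1": 3.0, "certainty2": 2.0, "certainty3": 1.0},
    {"ci": "CI_A", "qtmost": 20.0, "qtlow": 15.0, "qthigh": 25.0, "totalcost": 300.0,
     "percenttarget": 60.0, "certainty1": 1.0, "certainty2": 4.0, "certainty3": 3.0},
]
ci_results = aggregate_by_ci(rows)
```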

Locational Index Math. Locational Indexes are calculated as the sum of the normalized and weighted Indicators contained in all of the LI’s children CIs.

TR Index Math. The TR Indexes are calculated as the sum of the normalized and weighted children LIs, and display the same properties as the LIs. The LIs can be normalized and weighted separately from their children Indicators. Section I, Decisions, demonstrates displaying these Scores in uniform, “SDG-ASB Scorecards”.
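The LI and TR Index roll-ups can be sketched as a simple hierarchy. This is an illustration only: the normalization method is left as a caller-supplied function (identity by default) because it is not specified here, and the function names, weights, and scores are hypothetical.

```python
def locational_index(ci_scores):
    """LI score: sum of the already normalized and weighted scores in its children CIs."""
    return sum(ci_scores)


def tr_index(li_scores, li_weights, normalize=None):
    """TR Index score: sum of the normalized and weighted children LIs."""
    if len(li_scores) != len(li_weights):
        raise ValueError("li_scores and li_weights must have the same length")
    if normalize is None:
        # Assumption: leave the LI scores unchanged when no normalization is supplied
        normalize = lambda value: value
    return sum(weight * normalize(score)
               for score, weight in zip(li_scores, li_weights))


# Hypothetical qtmost scores: two LIs, each holding two children CIs
li_scores = [locational_index([5.0, 10.0]), locational_index([20.0, 25.0])]

# LIs weighted separately from their children Indicators: 0.4 * 15 + 0.6 * 45
tr_score = tr_index(li_scores, [0.4, 0.6])
```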

Indicator Metadata Math. The displayed Indicator properties (i.e. Q1 to Q5, QTMost, QTLow, QTHigh) are summations of all of the final TR Index rows of data, across all locations.

Local Community and CSO Resource Conservation Value Accounting MathResults. The following image displays the results for the local community’s SIA. For testing purposes, the datasets were added to Indicator 15.


Score. The following Score properties show that the Score employs the same 2 TEXT datasets as Indicators, but supports longer term Impact Evaluations. Although Scores only partially display the “start-target-actual” properties returned in Indicators, the raw MathResults contain all of that data for reporting. For testing purposes, Indicator 1’s datasets were reused in this Score. Resource Stock and M&E Analyzers, as explained in their respective tutorials, can be used to further analyze all of the “actual” scores for both the Score and the Indicators (i.e. Totals, Statistics, Change Bys, and Progress Analyzers).
 
 
I. Decisions

Vanclay et al (2015) use the following statement to reinforce SIA’s role in reducing corporate risk. The IFC (2013) presents concrete examples of using similar techniques for reducing Environmental and Social, or SDG (i.e. E&S), risks in agricultural supply chains (8*).

 “A key point underpinning […] this [reference] is that rather than seeing SIA as being a cost to business, SIA should be seen as an appropriate, useful management process that reduces risk and brings benefits to companies and to communities – in other words, that operationalises the concept of shared value. There is thus a rock-solid, strong business case for [companies and organizations] to do effective social impact assessment and management.”

The following image (Vanclay et al, 2015) illustrates how to use this algorithm’s SDG Indicator certainty properties, factor6 and factor7, to prioritize and manage business risks and shared value. Those factors derive largely from the 2 certainty factors included in the 1st TEXT dataset. The third certainty factor, factor8, measures the degree of responsibility for the SDG risk held by the company, CSO, or local community, and can be reported on a 3rd plane, such as a 3D Column graph.

The following image (IFC, 2013) summarizes a concrete example of assessing SIA risks in agricultural supply chains. DevTreks hierarchical base elements, including Data Services, support the IFC recommendation to use “supplier databases” to document these types of risk ratings. The aggregation of SDG-related risks is especially important for watershed level planning, or for applying “integrated and cross sectoral approaches” to regional resource conservation planning.

Full SDG Monitoring and Evaluation Report: The following table illustrates how to use this algorithm’s primary decision support variables, including the 3 risk factors, to develop a full M&E Report for decision making purposes. In this report, Indicators 1 to 15 serve as Performance Monitoring instruments (i.e. Indicator 1 = 2018, Indicator 6 = 2028), and the Score acts as a summary Impact Evaluation of total SDG accomplishment. In practice, this image’s SDG Indexes consist of combinations of the 17 SDG Goals (i.e. poverty, climate change, and gender equality). RCA Assessors work at industry, company, and community scale to develop combinations of the SDG Targets relevant to impacted stakeholders, along with recommended targets and completion percentages for each Monitoring period.



Reference Case Cost Effectiveness Results Report. Example 4B used Reference Case Cost Effectiveness Results to illustrate analyzing the cost effectiveness of alternative sustainability scenarios. That example also completed separate “value perspectives” that supported tradeoff analysis for the value systems held by industry, society, and community stakeholders. These perspectives support equitable decisions and reduce conflict among impacted stakeholders. The following image illustrates using this algorithm’s decision support data to complete similar Reference Cases, or basic value assessments, for impacted stakeholders (i.e. industry, community, society). The following Reference Case Cost Effectiveness Results compare the incremental costs and effectiveness of transitioning to higher states of sustainability, or “impact transition states”, for each value perspective.
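The incremental comparison described above can be sketched with a standard incremental cost effectiveness ratio (the added cost divided by the added effectiveness between 2 consecutive states). This is a generic cost effectiveness calculation, not necessarily the exact calculation behind the image, and the state names, costs, and effectiveness scores are hypothetical.

```python
def incremental_cost_effectiveness(states):
    """Compare each impact transition state with the state that precedes it.

    states: list of (name, total_cost, effectiveness) tuples, ordered from
    the starting state to the most sustainable state.
    """
    results = []
    for (prev_name, prev_cost, prev_eff), (name, cost, eff) in zip(states, states[1:]):
        delta_cost = cost - prev_cost
        delta_eff = eff - prev_eff
        # The ratio is undefined when effectiveness does not change
        ratio = delta_cost / delta_eff if delta_eff != 0 else None
        results.append({
            "from": prev_name,
            "to": name,
            "incremental_cost": delta_cost,
            "incremental_effectiveness": delta_eff,
            "icer": ratio,
        })
    return results


# Hypothetical totalcost and qtmost values for one value perspective
community_states = [
    ("start", 100.0, 10.0),
    ("transition state 1", 160.0, 25.0),
    ("transition state 2", 250.0, 40.0),
]
community_icers = incremental_cost_effectiveness(community_states)
```

Running the same function once per value perspective (i.e. industry, community, society) gives a side-by-side comparison of what each additional unit of SDG accomplishment costs each stakeholder group.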


Hotspots Analysis Report. Example 3 demonstrated how Hotspots Analysis can be used to “identify and prioritise potential actions around the most significant economic, environmental and social sustainability impacts or benefits associated with a specific country, city, industry sector, organization, product portfolio, product category or individual product or service”. The following table comes directly from this algorithm’s MathResults. The 3 Hotspots elements in the last 3 columns (i.e. SDG Unit, Production Process, and Life Cycle Stage) readily support ranking mitigation and adaptation actions and sustainability stages, or Production Processes and Life Cycle stages, according to their degree of SDG Target Accomplishment and “fairness” for each stakeholder group.

The previous 3 images imply that fully sustainable impact transition states, or technology and diffusion paths, can be defined with a level of accuracy needed to establish these initial targets. In practice, that’s unlikely. RCA Assessors can establish the initial targets, but the most appropriate technology for fully accomplishing the targets needs experimentation. In general, with the exception of “low hanging fruit”, or proven sustainable business activities, the business practices causing the SDG scores have to be discovered and then customized for each business and community via algorithms like those introduced in this reference. The purpose of this SDG M&E accounting system is to provide the feedback needed by managers and advisors to identify these practices and then adjust their activities to achieve the necessary targets.
SDG Accounting Standards Board Scorecards (SDG-ASB) (10*)

The institutional improvements recommended throughout this tutorial include the independent, evidence-based, 3rd party, verification of the content in these reports. SPA2 demonstrated the role of standards setting groups, such as the ISEAL Alliance, in setting standards for CSOs who administer commodity and producer organization standards for their members. TFCD (2017) explains the importance of financial accounting standards, such as those enforced by the International Accounting Standards Board, in ensuring the integrity of private sector financial reporting. These reports require similar enforcement. 

CSO certification standards setters (EMAS, SASB, GSSB, B-Lab, FairTrade, Rainforest Alliance, …), make logical enforcers for their members. In addition, many local communities are adopting a variety of accounting techniques to measure sustainability. For example, many U.S. states now require standard energy efficiency reports for all home sales. These types of reports are completed by 3rd party professionals (i.e. sustainability officers or RCA Assessors) who specialize in sustainability reporting. Local community sustainability offices make logical enforcers for their stakeholders.

The importance and international reality of the SDG justify a separate “SDG Accounting Standards Board” (SDG-ASB) to verify and “harmonize” the disparate standard setting CSOs and their reporting requirements. The UN references suggest such an organization may already exist (i.e. refer to the United Nations Forum on Sustainability Standards (UNFSS) reference in SPA1). The following images illustrate several ways to use this algorithm’s metrics to develop uniform Scorecards used by the SDG-ASB, standards setting CSOs, and local community sustainability offices, to evaluate sustainability target accomplishment. One advantage to these uniform reports is that they eliminate the need for hundreds of disparate reports, all of which ultimately have the same purpose. Importantly, their uniform “data standards” support more advanced algorithms, such as machine learning algorithms, which can facilitate SDG Target accomplishment at industry and landscape scale.

1. Scored Full M&E SDG Report. The following image shows that Scores have been added to the full M&E SDG Report generated from this algorithm’s results. 

Scored 1 Page M&E SDG Report. The following image demonstrates using the Full SDG M&E Report directly as a 1 page Scorecard. This instrument recognizes that the nature and size of many SMEs do not justify full sustainability accounting, such as the use of sustainability algorithms. These annual, 1 page, “running record” sustainability reports are completed by 3rd party sustainability officers, many of whom work for CSOs supported through local, fee-based, sustainability programs.

The use of indexes, such as this image’s SDG Target Indexes, for sustainability reporting is also discussed throughout this tutorial. For example, Example 2 demonstrates how to generate the Indexes using simple MCDA techniques that can employ Most Likely-Low-High Estimates for the Index’s Indicators.


2. Scored Stakeholder Reference Case CEA. The following image shows that Scores have been added to the full Reference Case Cost Effectiveness Results displayed above. Equity is the primary concern of the SDG-ASB. They use the company, CSO, and local community rating matrixes to independently evaluate stakeholder equity, which can be assessed from these types of reports.

Appendix 1 to Example 4B discusses the important role that QALYs, DALYs, and QASYs can play in these scorecards (i.e. the 4th column holds QASYs rather than SDGs). The overall goal of any sustainability system is to increase human quality of life. Measuring all SDG accomplishment according to that yardstick may be the best metric for any of these accounting systems because they derive from stakeholder preferences. Example 4B confirms that the health care sector in many countries is already using that yardstick in these CEA instruments.

3. Scored Scenario Analysis. The following image from SPA2 shows that Scores have been added to the full Reference Case Cost Effectiveness Results displayed for alternative scenarios.


4. Scored Hotspots Analysis. The following image shows that Scores have been added to rank the effectiveness of coping and resilience work processes and population epidemiology stages in achieving SDG Targets. Example 6 explains that coping capacity and resiliency work processes, and population transition states, make appropriate replacements for Production Processes and Life Cycle Stages in many types of Hotspots-related sustainability accounting.


The SPA2 reference documented that many industries are already developing sustainability standards for their member companies. It’s not farfetched for an organization such as the SDG-ASB to provide incentives to these industry groups to maintain “Reference Case Scorecards” that document the most probable, cost effective, equitable, practices needed to achieve specific sustainability targets in those industries. 

In terms of tackling landscape-wide SDG impacts, many local communities are already mandating the use of “sustainability reporters” for a variety of purposes. These impacted communities and their newly hired, or fee-supported, reporters may welcome having these types of uniform scorecards to carry out their reporting requirements of actual SDG target accomplishment. 

Impacted stakeholders, such as consumers, may also welcome having uniform scorecards so that they can make “informed” purchases, support “informed” stores, judge how well their public officials are delivering public services and cast resultant “informed” votes, and lead meaningful lives. The latter likelihood reinforces the role that objective, evidence-based, science must play in these scorecards.

These independent scorecards also address the “gaming of the financial accounting system” by companies and accounting firms that indefinitely postpone financial disclosures while they conduct poorly defined, opaque audits. Rather than wait for the audit results, impacted stakeholders conduct their own Social Performance Assessments and communicate the results directly to other “informed” stakeholders.

J. Communication 

The following list summarizes the full storyline that companies and local communities communicate to stakeholders using this algorithm’s approach. 
 
* CSOs work at industry and watershed scale to help businesses and other CSOs achieve more effective SDG-related activities and to ensure that independent business mitigation and adaptation efforts are coordinated in a way that leads to sector and landscape level SDG-related results. They maintain the “Scorecards” introduced in Section I, Decisions, which lay out industry-wide and community-wide best sustainable practices and targets.
* Business managers and CSO executives employ impact pathways and theories of change to identify and prioritize the primary SDG risks associated with their activities and work processes and to assess the impacts of those activities on targeted stakeholders.
* Managers take mitigation and adaptation actions based on the SDG-related risks and impacts they identify as relevant to company, CSO, and local community stakeholders.
* Formal Performance Monitoring data systems allow companies and CSOs to account for periodic, short term, SDG-related accomplishments. These instruments are submitted to the sustainable business accounting system to establish “cause and effect” attribution.
* Formal Impact Evaluation data systems allow companies and CSOs to account for periodic, SDG-related, long term strategy and to conduct external audits. These instruments are submitted to the sustainable business accounting system to verify the accuracy of the initial “cause and effect” conclusions.
* Full theories of change, impact pathways, and results chains, help companies and CSOs to fully communicate periodic SDG-related accomplishments to relevant stakeholders and to make adjustments to their mitigation and adaptation activities to achieve more efficient and effective results.
* The accounting metrics not only assess company, CSO, and local community performance, they also help stakeholders take ownership of their “own impacts”. That can mean completing their own SIAs, proposing their own mitigation and adaptation actions, monitoring and evaluating their own performance, and using their own “perspective” to report their SDG accomplishments.

Section H, Social Performance Score, demonstrated basing SIA on final SDG Impacts. Examples 1 to 4 demonstrated that Impacts are the final part of the full impact pathways, or Results Chains. The following summary Results Chain uses both the results of Examples 1 to 4 (or direct DevTreks Operating and Capital Budgets) and this algorithm’s results to communicate final results to decision makers. The report also supports communicating this storyline to impacted stakeholders who hold specific “value systems”, or Example 4B-like “perspectives”.

Business RCA report (8*)
The following report answers the evaluation question:

To what extent has the [company] [organization] [community] been effective and efficient in achieving SDG-related targets that are relevant to our stakeholders?

Example 7 [may] demonstrate the use of machine learning algorithms to codify this report. 


The following image demonstrates using a calculator’s Media View to communicate these reports and scorecards to stakeholders.


One of the most comprehensive examples of communicating SDG target accomplishment relates to the SDG Index and Dashboards Report project summarized in the following image (Bertelsmann Stiftung and SDSN, 2018). Example 9 in the SDG Plan reference, demonstrates using a similar approach for assessing the feasibility of mitigation and adaptation technologies for local communities and local businesses within watersheds or landscapes.




K. Performance Monitoring and Impact Evaluation (M&E)
The following image (IFC, 2014) illustrates how the Indicators used in this tutorial’s RCA instruments relate to company monitoring and reporting systems. Unlike the “benchmark-target-actual” M&E approach introduced in Example 1, the “start-target-actual” Indicator properties used in this algorithm support direct M&E (i.e. without the “_xx” suffixes in the labels).
IFC (2015) uses the term “audits” in a similar manner to the Impact Evaluations introduced in SPA2 and demonstrated in Example 7. 


This reference endorses engaging impacted stakeholder groups to use their own “perspective” to carry out SIA and M&E. These types of “participatory M&E systems”, or “Stakeholder Perspectives”, must be accompanied by an independent “Societal Perspective” that doesn’t lose sight of the overall public service, SDG, objectives.

Although the purpose of this example is not to demonstrate full Impact Evaluation, as explained by Gertler et al (2016), the local community dataset employs 2 LIs to document impact results for Treatment and No Treatment villages. These 2 datasets allow basic, but statistically invalid, comparison between the intervention (Treatment villages) and counterfactual (No Treatment villages) used to support full Impact Evaluations. Example 6A demonstrates how to use these datasets to reach initial conclusions about the effectiveness of an intervention or company activity. Example 7 shows how to conduct full, statistically valid, Impact Evaluations.
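The basic comparison described above amounts to a simple difference between the 2 LIs’ results. The following sketch uses hypothetical scores and a hypothetical function name. Because it is only a naive difference in means, with no controls, sampling design, or significance testing, it is not a substitute for the full Impact Evaluations in Example 7.

```python
from statistics import mean


def naive_treatment_difference(treatment_scores, no_treatment_scores):
    """Difference in mean scores between Treatment and No Treatment villages.

    This is a descriptive comparison only; it does not establish
    statistically valid cause and effect attribution.
    """
    return mean(treatment_scores) - mean(no_treatment_scores)


# Hypothetical percenttarget results from the 2 Locational Indexes
treatment_villages = [70, 80, 90]      # LI 1: Treatment villages
no_treatment_villages = [60, 65, 70]   # LI 2: No Treatment villages (counterfactual)

difference = naive_treatment_difference(treatment_villages, no_treatment_villages)
```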

L. Additional SDG M&E Tools

The following image (IFC, 2013) introduces additional M&E-related tools that companies can use in agricultural supply chains to further accomplish the SDGs. This tutorial introduces tools that support many of these mechanisms. For example, Example 3’s Score Hot Spots analysis can be used as a “supplier score card”. Example 7’s Impact Evaluations can be used in the same manner as “supplier audits”.



RCA-style instruments can digitize many of these tools and make the consequences of local community and company activity fully transparent. Regular financial disclosures that include SDG accomplishments, disclosures completed by 3rd party RCA Assessors, and new types of SIA-related techniques, such as this reference’s algorithms, SDG-ASB Scorecards, Social Licenses to Operate, social contract riders, and social media campaigns, will make SDG accomplishment much more transparent. Stakeholders, investors, asset managers, supply chain participants, consumers, employees, and impacted communities, can more easily assess the degree of SDG target accomplishment, or “social soundness”, achieved by specific communities, companies, and their executives. Companies and communities can anticipate a large proportion of stakeholders, or more accurately, informed stakeholders, to act accordingly (11*).

Footnotes

1. As mentioned or implied throughout this tutorial, given that the conventional institutions in some countries appear incapable of understanding, let alone ameliorating, these conditions, networks and clubs may need to take independent action.
2. Espey et al (2017) also mention the need for “vetting [of NGO data products] by the international community”. That’s why DevTreks’ source is open and the tutorials plentiful. It’s not necessary for DevTreks itself to be proven right with this approach, nor is it absolutely essential to get “international vetting” (although, by all means, vet); it’s necessary to teach the next generation about alternative approaches that might help them to be right. As stated in the SPA2 reference, reinforced in Footnote 10, and doubled down on in Example 6’s footnotes, too much is at stake to keep repeating the same mistakes over and over again.
3. The SDGs are best viewed as an “integrated and multi-sectoral approach” that companies and communities can use for achieving sustainability goals. For example, although the SDG Indicators use terms associated with country populations or broad ecosystems, many of those terms can be adjusted for use by private sector companies and community service organizations. Examples include, “16.b.1. Proportion of population [employees] reporting having personally felt discriminated against or harassed in the previous 12 months …”, “12.5.1 National [Company] recycling rate, tons of material recycled”, and “9.4.1 CO2 emission per unit of value added [product gross revenue]”. Most progressive sustainable business indicator systems will make, or have made, similar adjustments. 
4. The Hoebink et al (2014) reference highlights the challenges of using socioeconomic characteristics to explain cause and effect attribution related to agricultural sustainability. The reference runs through a gamut of feasible socioeconomic explanations –from on-farm income, producer organization support, and gender roles, to informal institution development. This tutorial’s consistent advice that networks need to work with experts and information technologists to fine tune these algorithms is no more specious than researchers’ continual requests to sponsor more research. This tutorial is authored by a technologist who believes applied IT offers a better chance of “ameliorating” than the approaches taken by conventional institutions, including many academic research ones. If you have evidence proving otherwise, take these recommendations with a grain of salt (i.e. but for some reason, the word “futile” comes to mind –where are the reference datasets and source code?).
5. SMEs should recognize that the RCA instruments used throughout this tutorial and the CTAP tutorial employ TEXT csv datasets –even companies that don’t have direct access to computers can probably find a way to complete these instruments. The TEXT files can be submitted to the sustainable business data system when access to an Internet connection is available. Even if the international community develops “international standards to ensure data integrity” (Espey et al, 2017), the simplicity of the TEXT files increases the likelihood of compliance with the new standards. For example, the following image demonstrates harmonizing IFC (2015) Company Risks, related to natural resources, with the SDG Indicators (2017). SPA2 and Footnote 3 demonstrate how to further harmonize disparate sustainability standards.

6. Example 3 in SPA2 explains that the current UNEP/SETAC LCIA characterization factors (cfs) are most useful at country or regional scale. DevTreks assumes that science and technology will continue to evolve to make local cfs available as well. As stated in SPA2, the take home message is that people need to be getting their hands dirty and gaining experience building these algorithms and applying their tools so that they know when and how to use them properly. And as this tutorial makes clear, science and technology’s end goal is evidence-based societal wealth via sound IT, not marketing-based technologist or company wealth via sound bites. While it may be easy for an IT company to attract investment for this reference’s algorithms, it may be impossible for most of them to supply scientific evidence of the required societal end goals (i.e. stock market returns are not societal end goals).
7. Version 2.1.8 changed the targeted increase in global temperatures from 2ºC to 1.5ºC because of recent IPCC (2018) recommendations. IPCC documents how the higher target poses too many costly risks to too many local stakeholders.
8. The IFC 2013 reference dealing with agricultural supply chains includes a case study of Malaysian palm oil processing. The authors acknowledge “There are serious concerns regarding the environmental sustainability of palm oil production, particularly in Southeast Asia” and mention how the company addresses those concerns. That doesn’t seem to have stopped the biodiversity loss and reduction in carbon sequestration. DevTreks emphasizes the importance of institutional improvement because that’s where these broader issues have to be addressed. Specifically, real private sector SDG materiality impact accountability only works when SDG-related institutions are developed that can independently monitor and evaluate the accounting metrics and then have the power to influence private sector behavior. SPA2 confirms that product certification standards organizations, with some international scrutiny, have the potential to fulfill this role and those standards systems are expanding. This example explains that many local communities are expanding their sustainability accounting requirements and devising new incentives to pay for improved sustainability reporting. These local sustainability efforts also have the potential to fulfill this role. “Consequential digital activism” that helps CSOs, companies, and local communities, to further achieve the SDG has the potential to be a powerful institutional improvement.
9. The author has assisted U.S. Resource Conservation Districts in Southern California to report on their annual accomplishments. He employed abbreviated Theories of Change, or impact pathways, to structure those annual reports. Similar to the IEG evaluation (2017) of pollution reduction activities, RCA Districts face the annual evaluation question “To what extent has the RCA District been relevant, effective, and efficient in addressing SDG-related natural resources concerns for our stakeholders?”. Unlike the qualitative World Bank Results Chain, the Districts preferred quantified numbers in the reports, especially related to money received as Inputs, number of Activities carried out with the money, and quantified natural resource and socioeconomic Impacts for each Activity. The original motivation for this reference came from the desire to institutionalize, by automating, the annual RCD [RCA] reports.
10. Many of the references used in this tutorial emphasize the importance of improving “existing [or conventional] institutions”. That poses a quandary to a technologist/economist who has worked for and with many of those institutions. Personal experience suggests that many of these institutions are ill equipped to make innovative changes and even less equipped to handle the technological changes required by this tutorial. Even well intentioned institutions can’t succeed when their leaders simply don’t understand public service (or this reference). Many are stuck carrying out statutory, contract-related, special interest group-influenced, or peer-siloed, work. In many cases, they’re good at spending money, accommodating special interests, or pandering to peers, because the special interests and peers like the money or the rigged rules, but not accounting for how well the money is spent. Consequential digital activism may require more of a start-up, moral accountability, culture and less of an existing, rote bureaucratic, special interest-infested or peer-siloed, institution. FAO (2017) provides practical guidance for achieving institutional improvement at national scale. The “next generation” needs to experiment and find the exact nature of the needed institutional improvements. They better hurry up (i.e. by reading Footnote 11).
11. After completing the first draft of the SDG-ASB Scorecards, the author felt that companies and communities might find the idea of “universal scorecards” far-fetched or hubristic, until he read of a popular web application used throughout the U.S.A. that allows employees to rate their companies for how well the company contributes to their “personal quality of life”. “Informed stakeholders” should recognize that rating companies and communities for their contribution to “societal quality of life” may be inevitable, if more complicated and consequential. This reference recommends basing the ratings on “international SDG reporting norms”, “evidence-based” science, and concrete, open-source, IT that employs “next generation”, M&E-based machine learning algorithms. Refer to the Version 2.1.8 SDG Plan reference for additional guidance.

Case Study References

Bertelsmann Stiftung and the Sustainable Development Solutions Network (SDSN). SDG Index and Dashboards Report 2018. Implementing the Goals. Global Responsibilities. 2018
Espey, Jessica et al. Counting on the World. Building Modern Data Systems for Sustainable Development. Sustainable Development Solutions Network. Thematic Research Network on Data and Statistics. 2017
FAO. 2017. Watershed management in action – lessons learned from FAO field projects. Rome. 
Gertler, Paul J., Sebastian Martinez, Patrick Premand, Laura B. Rawlings, and Christel M. J. Vermeersch. 2016. Impact Evaluation in Practice, second edition. Washington, DC: Inter-American Development Bank and World Bank. doi:10.1596/978-1-4648-0779-4. License: Creative Commons Attribution CC BY 3.0 IGO
Marc Gordon, United Nations Office for Disaster Risk Reduction (UNISDR). Monitoring progress in disaster risk reduction in the Sendai Framework for Action 2015-2030 and the 2030 Agenda for Sustainable Development, slide presentation, 2016.
Hoebink, Paul; Ruben, Ruerd; Elbers, Willems (eds). The Impact of Coffee Certification on Smallholder Farmers in Kenya, Uganda and Ethiopia. Centre for International Development Issues Nijmegen (CIDIN), Radboud University. Nijmegen, The Netherlands, 2014
Intergovernmental Panel on Climate Change. IPCC Special Report on Global Warming of 1.5ºC. 2018
International Finance Corporation. Environmental and Social Management System Implementation Handbook. General. 2015
International Finance Corporation. Environmental and Social Management System Toolkit and Case Studies. Crop Production. 2014
International Finance Corporation. Good Practice Handbook. Assessing and Managing Environmental and Social Risks in an Agro-Commodity Supply Chain. 2013
International Finance Corporation. IFC Performance Standards on Environmental and Social Sustainability. 2012
International Finance Corporation. Strategy and Business Outlook. FY18-FY20. Creating Markets and Mobilizing Private Capital. 2017
International Institute for Tropical Agriculture (IITA) and Committee on Sustainable Agriculture (COSA). Impacts of Certification on Organized Small Coffee Farmers in Kenya. Baseline, 2016.
Task Force on Climate-related Financial Disclosure (TFCD). Recommendations of the Task Force on Climate-related Financial Disclosure. Final Report. 2017
Ellen Taylor-Powell, Larry Jones, and Ellen Henert. Enhancing Program Performance with Logic Models, University of Wisconsin-Extension, Feb. 2003
United Nations (20 organizations collaborated). The Social Dimensions of Climate Change. Discussion Draft. 2011
United Nations. Global indicator framework for the Sustainable Development Goals and targets of the 2030 Agenda for Sustainable Development. 2017	
United Nations. SDG Indicators 2017. Available from: https://unstats.un.org/sdgs/indicators/indicators-list/ [last accessed: September, 2017].
UNEP/SETAC. Hotspots Analysis. An overarching methodological framework and guidance for product and sector level application. 2017
U.S. Department of Health and Human Services. Centers for Disease Control and Prevention (CDC). Principles of Epidemiology in Public Health Practice. Third Edition. An Introduction to Applied Epidemiology and Biostatistics. 2012
Vanclay, Frank, Ana Maria Esteves, Ilse Aucamp, Daniel M. Franks. Social Impact Assessment: guidance for assessing and managing the social impacts of projects. Fargo, ND: International Association for Impact Assessment. 2015
World Bank. Data for Development. An Evaluation of World Bank Support for Data and Statistical Capacity. Independent Evaluation Group. Washington DC, 2017
World Bank. Toward a Clean World for All. An IEG Evaluation of the World Bank Group’s Support to Pollution Management. Independent Evaluation Group. Washington DC, 2017
World Health Organization. Guide to Cost-Effectiveness Analysis. 2003
References Note
We try to use references that are open access or that do not charge fees.



Example 6. Disaster Stakeholder Resource Conservation Value Accounting

Algorithms: CTAP algorithms and algorithm1, subalgorithm17
URLs
Monitoring and Evaluation Output Calculator
https://www.devtreks.org/greentreks/preview/carbon/output/Disaster Risk Management, Example 6/2141223485/none
https://www.devtreks.org/greentreks/preview/carbon/resourcepack/Disaster Accounting, Example 6/1560/non
http://localhost:5000/greentreks/preview/carbon/resourcepack/Disaster Accounting, Example 6/545/none 
http://localhost:5000/greentreks/preview/carbon/output/Disaster Risk Management, Example 6/2141223500/none

A. Introduction to Disaster Risk Management

As with the SDG, the SDRR’s targets and indicators have been adopted by most countries, and most countries face similar challenges in collecting and maintaining the data. In particular, this example, like the CTAP examples, supports local DRR efforts carried out by companies, producer organizations, community service organizations, investors, supply chain participants, consumers, and impacted communities. The following image (UNISDR, 2017) verifies that most countries do not have local DRR practices in place that can support the SDRR or their national DRR efforts.

Poljanšek et al (2017) use the following quote from the Sendai Framework 2015 to reinforce the relationship between the SDG “drivers” of sustainability and the SDRR “drivers” of disaster risk. 
 “more dedicated action needs to be focused on tackling underlying disaster risk drivers, such as the consequences of poverty and inequality, climate change and variability, unplanned and rapid urbanization, poor land management and compounding factors such as demographic change, weak institutional arrangements, non-risk-informed policies, lack of regulation and incentives for private disaster risk reduction investment, complex supply chains, limited availability of technology, unsustainable uses of natural resources, declining ecosystems, pandemics and epidemics”.
The following paragraph (UNISDR, 2015) relates these drivers to government policies that fail to manage risk adequately, such as the failure to price carbon properly. TFCD (2017) discuss the equivalent risk to the financial system posed by climate change. The 2 statements explain why both global data systems need concurrent action and why socioeconomic and institutional analysis skills, or “disaster [and sustainable business] risk assessment capacities”, must be learned and applied to tackle these drivers, causes, and needed policies.
“The continuous mispricing of risk means that consequences are rarely attributed to the decisions that generate the risks. This lack of attribution and accountability creates perverse incentives for continued risk-generating behaviour, as those who gain from risk rarely bear the costs.”
Example 1 includes the Physical Capital category, Flood Control, which qualitatively addresses the coffee firm’s concern about reducing the risk of losses from floods. That capital stock can be expanded to include additional categories that identify additional risks from disasters. In addition, the UNEP (2016) recommendation to include biodiversity and ecosystem services in disaster loss estimates can be accommodated in a manner similar to Examples 1, 2, and 3. The most straightforward way for companies, CSOs, and local communities to carry out Social Performance Assessments for local disaster risks is to include more Physical Capital Stock categories and Indicators when using subalgorithms 13, 14, 15, 16, and 17.

The following image (IFC, 2015) verifies that the importance of SDRR, or CTA-Prevention, justifies separate Social Performance Assessments. The International Recovery Platform (2016) refers to these CTAP-related instruments as business continuity plans while IFC uses the term emergency preparedness plans. In these types of specialized instruments, the RCA-style algorithms focus on the environmental and socioeconomic impacts of disasters. Example 1’s company, CSO, and local community activities measure how those actions change the disaster-related impacts. This example demonstrates using Example 5’s population transition states to record impacts related to population epidemiology. These algorithms can be used flexibly; some disaster accounting systems may also need to use the transition states to record impacts related to specific disaster hazards or specific hazard events.

Example 6A’s dataset uses Example 1 to 4’s semi-qualitative, and quantitative, company data to complete these CTAP plans (i.e. “business continuity plans” or “emergency preparedness plans”). Unlike the SPA2 approaches, the Indicators and Indexes used in these reports relate directly to disasters and hazards that impact specific businesses and their stakeholders. 



The CTAP algorithms demonstrate that an important aspect of disaster loss estimating is to correctly identify the Hazards, Exposures, and Vulnerabilities that generate Loss Exceedance Probability distributions (i.e. disaster impact pathways). The CTAP reference addressed disaster risk analysis for single hazards, such as typhoons, droughts, floods, and earthquakes, and human-induced hazards, such as bad management of the drivers of disaster risk (i.e. weakened environmental rules). 

This example’s first dataset, Watershed Disaster Risk Management Plans, demonstrates taking the data from multiple, single hazard, CTAP, or CTAP-like, analyses and adding their results to Example 5’s datasets for specific impacted stakeholders and/or land use areas and using multiple “impact transition states” to understand the resultant population epidemiology. This approach begins to address the UNISDR (2017b) recommendation for estimating “sequential, simultaneous, cascading and interrelated effects of some hazards” and for thoroughly understanding the resultant “population epidemiology”. Individual business continuity and emergency preparedness plans must support this type of overall community disaster management plan. 

Although incentives to complete these instruments are not yet offered by companies, insurers, or locally impacted communities, they add another potential revenue stream to support the uniform sustainability reporting carried out by RCA Assessors and RCA Technologists. Local sustainability offices and industry standards organizations can develop fee-supported reporting requirements for these purposes. Given the potential savings from losses and business disruption and the strong possibility of reputational enhancement, informed businesses may become strong supporters of this approach (i.e. especially when evidence starts coming out from disaster areas that either confirms the approach or points to ways to improve it).

B. Stakeholder System Boundary and Stakeholder Engagement

The following statement (UNISDR, 2017a) summarizes the UN’s recommendation for data disaggregation by hazard, location, [CTAP] hazard events, and Example 5’s impact transition states. USHHS (2012) confirms that the health care sector uses health states, or impact transition states, to analyze and simulate population “epidemiology” (1*):

“Temporal aspects for attribution and cut-off for data collection: Countries may choose to have different timeframes for each type of hazard, because they have different epidemiology. If so decided, timeframes for each hazard should be based on the epidemiology of injury and illness rates during the event and the feasibility of recording those injuries and diseases.”
FAO (2017) uses the following statement to confirm the importance of carrying out these assessments in the larger context of integrated landscape management, including watershed planning (2*).
“[The goals of] the Sendai Framework for Disaster Risk Reduction […] should be seen as a call for watershed management to play its part and take on a stronger role in risk management and resilience building. Institutional strengthening and capacity development for risk management, coordination and contingency planning will be crucial in this regard."
The CTAP reference introduced using the Exposure and Vulnerability elements of CTAP algorithms to identify and define stakeholder boundaries and impacted stakeholders. All of the references used in this example emphasize the importance of using those elements for the same purpose. Serious disaster risk management requires CTAP-like instruments, with their full disaster impact pathways, to be completed prior to this algorithm.
C. Disaster Risk Management Planning
This example customizes Example 5’s techniques for disaster risk management or, to use DevTreks’ IT-rooted term, Conservation Technology Assessment-Prevention (CTAP). Rather than repeat the same techniques, the following list summarizes the prominent differences used when assessing disaster risks.

Disaster Risk Management Pathways (for cause and effect). The CTAP reference introduced algorithms that employ a disaster impact pathway consisting of Hazards -> Exposure -> Vulnerability -> Mitigation Actions -> Impacts. Poljanšek et al (2017) confirm that this is similar to the SDRR impact pathway. UNISDR (2017b) defines a related, typical impact pathway for disaster risks: Drivers -> Hazards -> Exposure -> Vulnerability -> Capacity -> Impacts. Version 2.1.4 upgraded several subalgorithms by allowing them to be used in 6 or more separate calculator Indicators so that they can be used to define 6 or more of these “paths”.


This example follows Example 5’s approach of identifying drivers of disaster risk in related algorithms’ impact pathway stages (i.e. see Example 5’s risk and impact identification stage). Poljanšek et al (2017) identify additional examples of risk reduction pathways, such as HAZOP and FMEA, which can also be modeled using this example’s approach. The following image (UNISDR, 2017b) illustrates how these impact pathways support “holistic disaster risk management”. 
Disaster Risk Management Indicators and Indexes: UNISDR (2017a) provides “minimum standards” for SDRR Indicators that “allow for consistent measurement of progress towards the global targets across countries and over the duration of the Sendai Framework and Sustainable Development Goals”. They also mention that individual governments have the latitude to develop alternative guidelines and Indicators that are consistent with these minimum standards.  

Example 3B in SPA2 demonstrated the use of “ecopoints” as the final, uniform score generated from multi-indexed sustainable Indicator systems. Examples 8 and 9 in CTAP demonstrated the use of Multi Criteria Decision Analysis (MCDA or MCA) for the same purpose. This example’s references (UNISDR, 2017; Poljanšek et al 2017) provide similar examples, such as INFORM and the World Risk Index, for multi-index disaster risk management systems. The following image (UNISDR, 2017b) confirms that INFORM employs a basic disaster impact pathway of Hazards & Exposures -> Vulnerability -> Lack of Coping Capacity. INFORM (2018) demonstrates how a fully developed, mature MCDA system can support disaster risk management at global scale. In the context of this reference, the authors explain how their 1 Total Risk Score, 4 Indexes, 6 Categories, 14 Components, and 53 Indicators can be “harmonized”, to a partial degree, with the SDG and SDRR Targets and Indicators (7*). 

This tutorial will refine the existing “ecopoints” and MCDA approaches to demonstrate how these MCDA systems can support disaster risk management at industry, company, community, and stakeholder group scale. The fact that many of INFORM’s, CTAP’s, and related comprehensive disaster risk management Indexes and Indicators extend well beyond the SDRR Indexes and Indicators suggests that comprehensive disaster and crisis management must also look beyond just the UN “metadata” system.
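As a minimal sketch of how a multi-index MCDA system of this kind can roll Indicators up into a single score, the following Python example averages indicator scores into 3 dimension indexes and combines them with an equally weighted geometric mean, an aggregation style similar to the one INFORM reports for its Hazard & Exposure, Vulnerability, and Lack of Coping Capacity dimensions. The indicator values, the 0 to 10 scale, and the function names are hypothetical, and the sketch is not the DevTreks subalgorithm itself.

```python
from statistics import mean

# Hypothetical 0-10 indicator scores for one community (illustrative only).
dimensions = {
    "hazard_exposure": [6.0, 8.0],
    "vulnerability": [4.0, 5.0, 6.0],
    "lack_of_coping_capacity": [2.0, 4.0],
}


def dimension_index(scores):
    """Arithmetic mean of the indicator scores within one dimension."""
    return mean(scores)


def total_risk(dims):
    """Geometric mean of the dimension indexes (equal weights are an assumption)."""
    product = 1.0
    for scores in dims.values():
        product *= dimension_index(scores)
    return product ** (1.0 / len(dims))


risk_score = total_risk(dimensions)
print(round(risk_score, 2))
```

A geometric mean lets a low score in one dimension pull the total down more strongly than an arithmetic mean would, which is one reason composite risk indexes often prefer it; local assessments may reasonably choose different weights or aggregation rules.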
Multiple Disaster Hazards. Disaster risk management requires understanding how multiple hazards impact populations, which is another way of saying understanding “population epidemiology”. The first image (UNISDR, 2017a) shows a Hazard Classification system that includes earthquakes, typhoons, droughts, floods, biological epidemics, and human-caused hazards (i.e. terrorism, hazardous waste spills, weakened environmental laws, weakened protected area boundaries). Many of the drivers of human-caused hazards can be traced to bad public service management and corrupted institutions, which must be addressed as “drivers of disaster risk” in any serious sustainability accounting system. 


The correlation matrix in the following image (Poljanšek et al 2017) demonstrates a basic approach for understanding the interaction of multiple hazards. Examples 1b to 1e in the CTA 01 reference begin to demonstrate formal techniques that can be used in more advanced risk assessments to account for correlated Indicators, such as interacting hazards (i.e. see Example 6B).


Multiple Hazard Events. The CTAP reference introduced algorithms that employ typical hazard event periods for disasters, such as 2 year, 5 year, 10 year, 25 year, 50 year, and 100 year events. In a related fashion, Example 5 demonstrates how impact transition states model population epidemiology over time. The following image (Poljanšek et al 2017) shows that the aggregation of multi-hazard event losses can be carried out using data derived from the PRA techniques introduced in CTAP. UNISDR (2017b) explains how Average Annual Loss estimates or MCDA qualitative ratings, both of which are demonstrated in CTAP, also make appropriate metrics for aggregating losses from multiple hazards and their events. 
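The following Python sketch illustrates one common way to turn hazard event losses into an Average Annual Loss: each return period is converted to an annual exceedance probability (1 / return period) and the loss curve is integrated with the trapezoidal rule. The loss figures, hazard names, and function name are hypothetical, the integration ignores events rarer than 100 years and more frequent than 2 years, and the sketch is not the CTAP PRA algorithm.

```python
# Hypothetical losses (currency units) keyed by return period in years, per hazard.
losses_by_hazard = {
    "flood": {2: 10.0, 5: 40.0, 10: 90.0, 25: 200.0, 50: 350.0, 100: 500.0},
    "drought": {2: 0.0, 5: 20.0, 10: 50.0, 25: 100.0, 50: 150.0, 100: 200.0},
}


def average_annual_loss(losses_by_return_period):
    """Trapezoidal integration of loss over annual exceedance probability."""
    # Sort points by exceedance probability, from rare to frequent events.
    points = sorted((1.0 / rp, loss) for rp, loss in losses_by_return_period.items())
    aal = 0.0
    for (p_low, loss_rare), (p_high, loss_frequent) in zip(points, points[1:]):
        aal += 0.5 * (loss_rare + loss_frequent) * (p_high - p_low)
    return aal


aal_by_hazard = {name: average_annual_loss(curve) for name, curve in losses_by_hazard.items()}

# Expected values add across hazards, so the multi-hazard AAL is the simple sum.
total_aal = sum(aal_by_hazard.values())
print(aal_by_hazard, total_aal)
```

Summing AALs is valid even when hazards are correlated because expected values are additive; correlation matters when the full loss distribution, rather than its mean, is being aggregated.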
Mitigation and Adaptation Activities and Life Cycle Stages. The mitigation and adaptation activities “causing” changes in impact scores must be defined more broadly for disaster risk management. Much of a community’s ability to recover from disasters relates to its resilience and coping capacity (Poljanšek et al 2017 define these terms in the following image; UNISDR, 2017b defines these terms in Annex 3). Example 5’s Hotspots Work Processes can be expanded to include improvements to community resilience and coping capacity.

In addition, the following image (Poljanšek et al 2017) shows that several disaster stages require mitigation, including prevention, preparedness, response, recovery, and reconstruction. Example 5’s Hotspots Life Cycle Stages can be replaced with these stages for community disaster management. Alternatively, some damage assessments may prefer to use multiple hazards as replacements.

Stakeholder Impact Analysis (SIA). UNISDR (2015) discuss the relation between Example 5’s SIA approach and the need for a similar approach for ensuring the fairness of disaster risk management. UNISDR (2017b) discuss the centrality of understanding “socially constructed risks” in a related manner.
“Fewer countries addressed the need for a social impact analysis (SIA) even though SIA is important because the scale of disasters differ depending on the vulnerability of the community. Poor people, children, the elderly and the disabled are more vulnerable to hazards. SIA is an important tool for supporting social policy planning and requires disaggregated data (e.g. age and gender) to identify the vulnerable segments of society that need support.”
Actual, or Post-Disaster, Impacts: Poljanšek et al (2017) use the following image to discuss the role that post-disaster data collection can play in improving the accuracy of the pre-disaster risk assessment models and techniques. The “start-target-actual” properties used with this algorithm’s Indicators require this approach. The “start and target” properties can still be generated from CTAP-style pre-disaster risk modelling and the “actual”, or post-event, properties support further improvement in those algorithms.

Indirect and Direct, Tangible and Intangible, Benefits and Costs: The SDG impacts being evaluated must relate to both physical losses (i.e. loss of life) and monetary losses (i.e. loss of property). The following images (UNISDR, 2015, and V. Meyer et al, 2014) summarize the full benefits and costs that must be evaluated using “Reference Case Socioeconomic Results”. The CTAP reference introduced algorithms (i.e. subalgorithm10) that address several of these dimensions of costs and benefits. Disaster risk management requires expanded documentation of costs and benefits. Specifically, Example 6A will demonstrate using socioeconomic SDG measurements, such as QASYs, QALYs, and DALYs, rather than relying solely on physical SDG measurements. Example 4B confirms that they measure stakeholder preferences better.

 
Disaster Impact Scenario Analysis: Example 5’s scenario analysis addressed alternative scenarios for SDG Target-related Impacts. The following image (UNISDR, 2017b) summarizes the impacts to analyze in disaster scenario analysis. These authors also discuss the use of logarithmic scales, related to the probability of disaster events, for this algorithm’s 3 certainty factors in disaster scenario analysis.



Stakeholder Communication: Unlike most DRR references, Poljanšek et al (2017) devote several chapters to this topic. The authors address public perception of risk, communication of uncertainty, the role of modern social media, and the importance of interactive communication between experts and audience. UNISDR (2017b) describe the ultimate purpose of these communication aids: “The end result of the risk evaluation is a decision by the authorities […] after stakeholder consultations and […] public participation, on the “prioritization of risk””. The following image (Michelini et al, 2015) demonstrates the use of standard communication aids proposed for an EU-wide hazard reduction system.


D. Risk and Impact Identification
The following 2 datasets demonstrate one approach that RCA Assessors can use to carry out these assessments. This approach focuses on harmonizing the approaches taken with the CTAP algorithms with the most important features of this algorithm: population epidemiology, or stakeholder impact, and equity.

Dataset 1. Watershed Disaster Risk Management Plans
The CTAP reference datasets (i.e. typhoons, earthquakes, floods, droughts, related risk management indexes) are summarized in related SDRR-related Targets and Indicators that impact specific stakeholder groups. For this example, Example 5’s 2nd dataset is stylistically modified for these purposes (i.e. disaster losses are being illustrated but not measured). Key differences from Example 5 include:
1st TEXT dataset. Demographics and SDRR Indicators (CTAP-like indexes): The Indicators derive from CTAP-like Categorical Indexes. Many of the following examples of SDRR Indexes and Indicators from UNISDR (2017a) can be derived directly from the CTAP datasets. However, some CTAP or CTAP-like datasets may need further harmonization, or modification, to make them fully compatible with the SDRR (i.e. INFORM (2018) demonstrates how to harmonize existing datasets with SDG and SDRR targets). The CTAP and SPA2 references demonstrate that several algorithms related to MCDA (i.e. subalgorithms 10, 11, 12 and 13) also support direct measurement of these Indexes and Indicators.
Target A. Mortality. Substantially reduce global disaster mortality by 2030, measured as the number of deaths and missing persons attributed to disasters, per 100,000 population.
Indicator 1. Number of deaths attributed to disasters, per 100,000 population.
Indicator 2. Number of missing persons attributed to disasters, per 100,000 population. 
Target B. Affected People. Substantially reduce the number of affected people globally by 2030, measured as the number of directly affected people attributed to disasters, per 100,000 population.
Indicator 1. Number of injured or ill people attributed to disasters, per 100,000 population.
Indicator 2 to 4: …
Target C. Economic Loss. Reduce direct disaster economic loss in relation to global gross domestic product (GDP) by 2030, measured as the direct economic loss attributed to disasters in relation to global gross domestic product
Indicator 1. Direct agricultural loss attributed to disasters
Indicator 2 to 6: …
Target D. Infrastructure Damage. Substantially reduce disaster damage to critical infrastructure and disruption of basic services, among them health and educational facilities, including through developing their resilience by 2030, measured as the damage to critical infrastructure attributed to disasters.
Indicator 1. Number of destroyed or damaged health facilities attributed to disasters.
Indicator 2 to 8: …
Target F. Total International Support. Total official international support, (official development assistance (ODA) plus other official flows), for national disaster risk reduction actions.
Target F is an example of Sendai Targets and Indicators that must be adapted to work at landscape and business scale. For example, a better Target for watershed planning might be Total State or Province Support.
* Categorical Indexes (CIs for specific stakeholder groups): This plan uses exactly the same stakeholder groups and land use types as Example 5’s 1st dataset.
* Location Indexes (LIs for specific hazards) and Total Risk Indexes: Annex 1 in UNISDR (2017a) demonstrates the use of a Hazard Classification System to fully identify all of the hazards that can impact targeted stakeholders. This example uses this classification system for Total Risk Indexes (i.e. Geophysical, Hydrological) and Locational Indexes (Earthquakes, Tsunamis). Alternatively, the Hotspots Life Cycle Stages recorded in the 2nd dataset can be used for this purpose.
* Indicator Metadata: Most of the properties displayed in each Indicator (i.e. Q1 … QTMost) are generated automatically and calculated as the summation of all of the TR Indexes in a dataset.
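The “per 100,000 population” normalization used by the Target A and Target B Indicators listed above can be sketched in Python as follows. The stakeholder group names, counts, and populations are hypothetical, and the helper function is an illustration rather than part of any DevTreks algorithm.

```python
def rate_per_100k(count, population):
    """Express a raw count as a rate per 100,000 population."""
    if population <= 0:
        raise ValueError("population must be positive")
    return count * 100_000 / population


# Hypothetical stakeholder groups within a watershed (illustrative numbers only).
groups = {
    "treatment_village": {"deaths": 3, "missing": 1, "population": 12_500},
    "non_treatment_village": {"deaths": 6, "missing": 2, "population": 8_000},
}

# Target A combines deaths and missing persons attributed to disasters.
target_a = {
    name: rate_per_100k(g["deaths"] + g["missing"], g["population"])
    for name, g in groups.items()
}
print(target_a)
```

Normalizing by population makes small and large stakeholder groups comparable, which is what allows the same Indicator to be disaggregated by hazard, location, or stakeholder group.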

…

The Treatment and Non Treatment Villages in the previous table support the use of counterfactual evidence in Impact Evaluations. In the case of disasters, the ethical consequences of that approach have to be balanced with the need to understand what works and what doesn’t work. Gertler et al (2016) discuss the use of review boards and ethics committees for these purposes.
2nd TEXT dataset. Population transition states and SDRR allocations. The following dataset employs the same techniques as introduced with Example 5: the CIs record uncertain population measurements for the 1st dataset’s stakeholder groups while the CI Indicators allocate the 1st dataset’s SDRR Targets to these stakeholders. Additional population algorithms may employ alternative population modeling techniques. 

…

Indicators 2 to 7. Population Transition States using Performance Monitoring Approach: The separate calculator Indicators measure population epidemiology using the same M&E approach introduced in Example 5. The CTAP Average Annual Loss (AAL) and risk metrics can be used for this purpose. Using Example 5’s approach, the 1st Indicator measures AALs and risks for the period 2017 to 2019 and subsequent Indicators measure the trend in losses and risks during 2020 to 2030 as new mitigation and adaptation efforts are undertaken. This approach requires including all of the probable hazards in each separate Indicator’s datasets. Example 1A in SPA2 demonstrates using SAFA-like Total Risk Indexes to report each separate hazard in each Indicator’s dataset (i.e. TRA = earthquakes, TRB = landslides, TRC = droughts, TRD = human-induced hazards caused by weakened environmental rules, …).
Alternatively, some assessments may need to use CTAP-like hazards or hazard events in each Indicator. With the latter approach, each Indicator holds the loss data associated with specific hazards or hazard events.
Score using Impact Evaluation Approach: The Score serves as an Impact Evaluation by summarizing the losses and risks (i.e. for the period 2017 to 2030) in the 7 Performance Monitoring Indicators (Indicator 1 = 2017 to 2018, Indicator 2 = 2019 to 2020, …).
Dataset 2. Example 6A. Business Continuity and Emergency Preparedness Plans
The Watershed Disaster Management Plan becomes the “reference plan” needed to complete separate plans for individual firms and local communities within the watershed. The plans demonstrated in Example 6A complement the watershed plan and may contain the exact same Indicators and Indexes, except they use abbreviated SIAs and measurements related to individual firms, local communities, and impacted stakeholders. Example 6A modifies Example 5’s 1st dataset for these purposes.
Advanced Disaster Risk Management
This reference [may] demonstrate the use of algorithms to train machine learning algorithms to define likely disaster impact paths. Once trained, the algorithms predict SDG and SDRR impacts on targeted stakeholders that arise from company and community service organization disaster risk reduction, and CTA-Prevention, activities. More advanced algorithms predict likely transition states and advise companies and communities how to dynamically tweak their mitigation and adaptation actions to achieve SDRR-related Targets.

E. Mitigation and Adaptation Activities and Stages

The first paragraph below (Poljanšek et al, 2017) explains why the definition for CTAP in the second paragraph focuses on disaster prevention. The role of disaster risk management is to prevent disasters from causing costly losses by “adopting and diffusing” mitigation and adaptation activities that increase the resiliency and coping capacity of communities and companies.

“Based on an analysis of the benefits arising from avoided losses, mitigation and prevention measures are widely considered more cost-effective than expost disaster interventions. An increase in mitigation investment has occurred in some … countries, but the lack of public and therefore political interest in prevention and mitigation remains a problem.”

“CTA-Prevention (CTAP) is the numeric assessment of the costs and benefits of a portfolio of mitigation and adaptation interventions that prevent or correct resource stock damages. Assessments use relevant Conservation Technology Assessment (CTA) algorithms to quantify the risk and uncertainty associated with resource stock measurement and valuation.”
F. Quality of Life, or Sustainability, Scenarios

The following Scenario Analysis is used by communities and companies to more fully understand the transition risks they face from changes needed in their disaster risk management activities and social interventions to accomplish the SDRR targets.

General Scenario
Threatened Quality of Life: High GHG emissions result in a 1.5ºC temperature increase with higher incidence of biodiversity loss, droughts, severe heat waves, crop and livestock production risks, air pollution, floods, migration, and social discord that leads to the incapacity to achieve the SDRR targets. 
Targeted Social Performance Risks and Targets: Combinations of the SDRR targets, adapted to local stakeholders, including (1*):
* [Local] target A: Substantially reduce [local] disaster mortality by 2030 
* [Local] target B: Substantially reduce the number of affected people [locally] by 2030
Mitigation and Adaptation Actions: Portfolio 1 consists of a) …, b)…, and c)…. 

Sustainability Scenarios. 
In terms of SDRR goal accomplishment, logical sustainability scenarios relate directly to a community’s or company’s transition to higher levels of sustainability, as illustrated by the following scenarios:
A. Transition State _A: baseline (i.e. current disaster mortality and high economic losses from disasters)
B. Transition State _B: low sustainability
C. Transition State _C: medium sustainability
D. Transition State _D: high sustainability (i.e. substantial reduction in disaster mortality and affordable economic losses from disasters)

Simple data conventions, including the use of sibling base elements and Example 1’s “_xx” labelling convention, can be used to model these more comprehensive transition scenarios.
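As a rough illustration of how a suffix-based labelling convention can separate transition scenarios, the following Python sketch groups Indicator values by the text after the last underscore in each label. The label format (such as “MORT_A”), the indicator names, and the values are assumptions made for this sketch and may differ from Example 1’s actual “_xx” convention.

```python
from collections import defaultdict

# Hypothetical (label, value) rows; the suffix after the last underscore
# is assumed to name the transition state (_A baseline ... _D high sustainability).
records = [
    ("MORT_A", 12.0), ("ECON_A", 30.0),
    ("MORT_B", 9.0), ("ECON_B", 24.0),
    ("MORT_D", 3.0), ("ECON_D", 10.0),
]


def group_by_transition_state(rows):
    """Return {state: {indicator: value}} keyed by the label suffix."""
    grouped = defaultdict(dict)
    for label, value in rows:
        indicator, _, state = label.rpartition("_")
        grouped[state][indicator] = value
    return dict(grouped)


by_state = group_by_transition_state(records)
for state in sorted(by_state):
    print(state, by_state[state])
```

Keeping the scenario in the label, rather than in a separate column, means the same TEXT dataset layout can hold the baseline and every target scenario side by side.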
Section J, Decisions, demonstrates how to use the Reference Case Cost Effectiveness Results introduced in Example 4B to assist making decisions about the degree of sustainability, or stage of sustainability, appropriate for specific communities and companies.

Example 7 may demonstrate how machine learning algorithms anticipate transition states, given current circumstances, and tailor sustainability recommendations to specific scenarios, based on their assessment of the company and community’s most probable sustainable development paths. 

E. Social Performance Score

Dataset 1. Watershed Disaster Risk Management Plans
The following images show the properties of the 1st Indicator. The SDRR Indicators used in this dataset derive from results summarized from CTAP or CTAP-like algorithms. In this example, the first Total Risk Index, or TRG, measures disaster risks associated with earthquakes while the second, or TRC, measures droughts. The MathResults properties show the correct convention of storing the results in TEXT files identified by URLs derived from the Resources Application.

  The MathResults appear as follows. The data is stylistic and has no meaning other than to debug the algorithm.

…

Indicators 2 to 7. Population Transition States using Performance Monitoring Approach. The following image displays the results for the 2 year Performance Monitoring periods contained in 6 additional Indicators (6*). For convenience, the same datasets were used for the 7 Indicators and the Score. 



Score using Impact Evaluation Approach. The following Score properties show that the Score employs the same TEXT dataset as the Indicators, but supports longer term Impact Evaluations. Although Scores only partially display the “target-actual” properties returned in Indicators, the raw MathResults contain all of that data for reporting. For testing purposes, Indicator 1’s datasets were reused in this Score. Resource Stock and M&E Analyzers, as explained in their respective tutorials, can be used to further analyze all of the “actual” scores for both the Score and the Indicators (i.e. Totals, Statistics, Change Bys, and Progress Analyzers).

 

Example 6A demonstrates additional approaches, including the use of related algorithms, for using Scores to support disaster risk management decisions.

F. Decisions
The following image (UNISDR, 2017b) introduces the use of scoring criteria in national risk assessment efforts. The image shows that UNISDR (2017b) recommends using the 3rd risk factor to describe the “level of uncertainty” of the first 2 risk factors.

Example 5 used the 3rd “certainty”, or risk factor, for measuring the degree of responsibility for the disaster risk by a “driver” of hazards, such as a company or community. This risk factor focuses on further drivers, or causes, of human-induced disasters. For example, community leaders, or more accurately, misinformed leaders, who fail to adequately manage disasters (i.e. by weakening environmental laws or allowing special interest groups to pillage public lands) must be held accountable for whatever carnage ensues (7*). 
SDG and SDRR Accounting Standards Board Scorecards (SDG-ASB)

The following 4 Scorecards illustrate the use of SDRR metrics in the same manner as Example 5’s SDG metrics.

The CTAP algorithms demonstrate direct measurement of monetary benefits and costs. The following Reference Case CEA is based on these benefit and cost metrics. Benefit metrics for many disaster-related assessments, as documented in Column 4 and explained in Example 4B, are based on QALYs, DALYs, or QASYs.



The evaluation question being answered by the following Business Value Report is:

To what extent has the [company] [organization] [community] been effective and efficient in achieving SDRR-related targets that are relevant to our stakeholders? 

Example 7 [may] demonstrate the use of machine learning algorithms to codify this report.


G. Communication

The following image demonstrates using a calculator’s Media View to communicate the Math Results to keep impacted stakeholders informed.



One of the most comprehensive examples of communicating disaster risk information relates to the INFORM project summarized in the following image (INFORM, 2018) (4*). This tutorial demonstrates that INFORM-like MCDA techniques can be adapted to work in a similar fashion at landscape and community, or “sub-national”, scale. Example 10 in the SDG Plan reference demonstrates using a similar approach for assessing the feasibility of mitigation and adaptation technologies for local communities and local businesses within watersheds or landscapes.


The following image (UNISDR, 2017b) summarizes the types of communication aids commonly used to communicate disaster management information to stakeholders. CTAP introduces algorithms that contain the raw data needed by several of these communication aids.


H. Performance Monitoring and Impact Evaluation (M&E)
The references largely discuss disaster risk management M&E systems in terms of Internet of Things (IoT) device metrics, such as ocean wave height measurements, earthquake tremors, snow quantity, and soil moisture levels. The references discuss how many of these systems are highly developed and used currently for real life disaster planning purposes (i.e. global tsunami monitoring systems, radiation discharge networks). These IoT systems add an important, missing, dimension to the M&E accounting systems introduced in this tutorial. They suggest that additional interfaces are needed that allow both M&E systems to communicate dynamically with one another to advance the overall, disaster risk management, objective. This reference [may] introduce additional algorithms for this purpose.

This reference focuses on disaster risk management Performance Monitoring and Impact Evaluation accounting systems. CTAP introduced 3 examples showing how some of these systems work for disaster risk management. For example, the following image (Khazai  et al, 2015) introduces 3 Indexes related to these purposes.


I. Conclusions

The World Bank (2017) uses the following statement to summarize the role played by Performance Management frameworks, such as the RCA Framework that underlies this tutorial, in improving government accountability and helping voters to become “informed”. The citizens being impacted by disasters may welcome having accountable governments in place that know how to prevent disasters from taking place, and, in the event of their occurrence, can take care of them, affordably and equitably, after the disasters strike (8*).

“The more citizens hold their governments accountable, the greater the demand and use for data will be for measuring government performance against indicators and targets. One way to make government agencies, [legislators, and executives,] more accountable and more efficient is to widely publicize data about their achievements and shortfalls, and then [to cast informed votes that will lead to adjustments in the] funds delivered in the next budget cycle to reward strong performers [and chasten bad performers].”

Footnotes

1. As mentioned or implied throughout this tutorial, given that the conventional institutions in some countries appear incapable of understanding, let alone ameliorating, these conditions, networks and clubs may need to take independent action.
2. The Poljanšek et al (2017) reference, in particular, highlights how much more advanced IT technologies, many involving satellite imagery, IoT technologies, and emerging GIS software, are improving disaster risk identification and reduction at national and international scales. The reference also highlights how thousands of separate, peer-reviewed, silos are carrying out the work, with little to no coordination. Section 7.8 in UNISDR (2015) further discusses the drawbacks to this conventional approach. The reference offers no reference dataset that demonstrates alternative ways to carry out each chapter’s algorithms and recommendations (i.e. see Gertler et al 2016, or any of DevTreks tutorials for examples). The reference makes passing mention of the fundamental role of a standard IT data platform[s] to actually accomplish all of the needed objectives, partnerships, and “innovative approaches” (i.e. see Section K and L of Appendix A in SPA1 for examples; keep in mind that the author is a technologist).

DevTreks emphasizes the importance of “doing it right” – by starting with the basics: community capital improvements, cause and effect impact pathways, population transition states, multi-stakeholder value assessments, standardized TEXT datasets accessed via URIs, uniform reporting, scientific reference datasets applied in online tutorials, simple software object models, mathematical algorithms derived from open source software libraries, networks, clubs, generic-open source-IT platforms, and morally-grounded $28 donations. 

The “next generation” needs to critically assess whether, although this type of reference represents a valuable legacy from the “last generation”, their generation can, and must, do better. They can start by doing better than the algorithms introduced in these tutorials. Then proceed to automate all of the best algorithms in the Poljanšek reference and include complete reference TEXT datasets, URLs, tutorials, and open source code. Demonstrate how the quality of life of impacted stakeholders is improving. That is, do it right.
3. Although this GIS-style map can make a lot of complex information understandable to stakeholders, the author has found that the use of “pretty pictures and maps” can distort reality, foster audience complacency, and lead to misinformation. UNISDR (2015) use the phrase “decision makers need numbers not maps” to make a similar point. That’s why this reference uses a lot of images of tabular TEXT datasets. The TEXT files can still be inaccurate and boring, but raw data has less tendency to be misleading.
4. Prior to becoming a technologist, the author worked “in the field” for more than 15 years. Although this reference makes a good faith effort to follow “best disaster risk management practices”, that experience raises warning flags about data intensive efforts like this. On the other hand, the technologist experience suggests these practices, which follow UNISDR recommendations, are inevitable because they’re necessary. Footnote 14 in CTAP is worth repeating. “These are all tools that have the potential to assist decision making. Assess which ones make sense for local contexts, experiment with them, adapt, but ultimately figure out how to make affordable and transparent decisions for [improving peoples’ lives and livelihoods].”
5. A new algorithm for automating INFORM was decided against because the goal of this reference is not global reporting using existing international datasets. The goal is to use impact pathways to understand cause and effect attribution at national, landscape, and business scale. This reference encourages national sustainability efforts to investigate duplicating INFORM’s approach using national datasets (i.e. census, SDG, and SDRR). Given that this tutorial’s algorithms can use Indicators derived from these national datasets, priority is placed on further development of the SPA2 and SPA3 algorithms. 
6. This dataset could not, at first, be run with subalgorithm17 because of a bug displayed in the following image’s function:

The bug highlighted several problems: a. this was completely the wrong function to use with subalgorithm17, b. even though 17 was under active debugging, several sloppy areas of coding also surfaced associated with previous releases, c. DataSet2[0] has no error checking and is the immediate bug (but also the symptom of another bug), and d. this code only supported a minor part of row headers but had the potential to completely prevent the results of a CTAP algorithm from being printed. Even if 99% of an algorithm’s code is basically sound, or even if an algorithm appears to work for 99% of datasets, that’s not good enough for algorithms with these potential consequences. Footnote 10 in CTAP is worth repeating: “Serious [local], national, and international, IT shops must plan on having larger IT staffs.” And adding “Serious local, national, and international, SDG and SDRR efforts must divert staff from last generation jobs to next generation jobs.”
7. For example, many western states in the USA now require the persons responsible for causing fires to pay for the ensuing maelstroms. Impacted communities and stakeholders can require similar accountability for the persons, past and present, responsible for causing losses attributed to their actions and policies that contributed to the human-induced hazards. Modern digital evidence supports such expanded accountability. Consequential digital activism can help to deal with related institutional “issues”, such as rigged judicial systems.
8. “Informed stakeholders” should recognize that rating government agencies, legislators, judicial bodies, and public executives for their contribution to “societal quality of life” may be inevitable and necessary. This reference recommends basing the ratings on “international SDG and SDRR reporting norms”, “evidence-based” science, and concrete, open-source, IT that employs “next generation”, M&E-based machine learning algorithms. Refer to the Version 2.1.8 SDG Plan reference for additional guidance.

Case Study References

Example 5, plus the following.
INFORM Global Risk Index Results 2018 (last accessed February 29, 2018: www.inform-index.org) (see the related reference by Marin-Ferrer et al, 2017)
International Recovery Platform. Guidance Note on Recovery. Private Sector. 2016
Khazai, Bijan; Bendimerad, Fouad; Cardona, Omar Dario; Carreno, Martha-Lilliana; Barbat, Alex H.; Burton, Christopher G. A Guide to Measuring Urban Risk Resilience. Principle, Tools and Practice of Urban Indicators (Prerelease Draft). Earthquakes and Megacities Initiative - EMI. 2015
Marin-Ferrer, M. Vernaccini, L. Poljansek, K. INFORM. Index for Risk Management Concept and Methodology Version 2017
V. Meyer, N. Becker, V. Markantonis, R. Schwarze, J. C. J. M. van den Bergh, L. M. Bouwer, P. Bubeck, P. Ciavola, E. Genovese, C. Green, S. Hallegatte, H. Kreibich, Q. Lequeux, I. Logar, E. Papyrakis, C. Pfurtscheller, J. Poussin, V. Przyluski, A. H. Thieken, and C. Viavattene. Review article: Assessing the costs of natural hazards – state of the art and knowledge gaps. Nat. Hazards Earth Syst. Sci., 13, 1351–1373, 2013
Alberto Michelini, Gerhard Wotawa, Delia Arnold-Arias, Daniela Pantosti, et al. EU ARISTOTLE Project. 2017
Poljanšek, K., Marín Ferrer, M., De Groeve, T., Clark, I. (Eds.), 2017. Science for disaster risk management 2017: knowing better and losing less. EUR 28034 EN, Publications Office of the European Union, Luxembourg, ISBN 978-92-79-60678-6, doi: 10.2788/688605, JRC102482.
United Nations Office for Disaster Risk Reduction. Technical Guidance for Monitoring and Reporting on Progress in Achieving the Global Targets of the Sendai Framework for Disaster Risk Reduction 2015-2030. 2017a
United Nations Office for Disaster Risk Reduction. National Disaster Risk Assessment Words into Action Guidelines. Governance System, Methodologies, and Use of Results, 2017b
United Nations Office for Disaster Risk Reduction. Global Assessment Report (GAR). 2015
United Nations Environment Programme (UNEP) Loss and Damage: The Role of Ecosystem Services. 2016


Example 6A and 6B. Stakeholder Abbreviated Resource Conservation Value Accounting (RCA7)
Algorithms: algorithm1, subalgorithm18; CTA algorithms
URLs: 
Example 6A. Resource Stock Output Calculator
https://www.devtreks.org/greentreks/preview/carbon/output/Disaster Risk Management, Example 6A/2141223486/none
https://www.devtreks.org/greentreks/preview/carbon/resourcepack/Disaster Accounting, Example 6A/1559/none
http://localhost:5000/greentreks/preview/carbon/resourcepack/Disaster Accounting, Example 6A/546/none
http://localhost:5000/greentreks/preview/carbon/output/Disaster Risk Management, Example 6A/2141223501/none

Example 6B. Correlated Indicators 
https://www.devtreks.org/greentreks/preview/carbon/resourcepack/Disaster Accounting, Example 6B/1561/none
https://www.devtreks.org/greentreks/preview/carbon/output/SDRR, Example 6B/2141223487/none
http://localhost:5000/greentreks/preview/carbon/resourcepack/Disaster Accounting, Example 6B/547/none
http://localhost:5000/greentreks/preview/carbon/output/SDRR, Example 6B/2141223502/none

A. Introduction

This example offers an abbreviated alternative to Example 5 and 6’s fuller SIA techniques (1*). The primary differences from those examples include:
1. SDG Allocations and Socioeconomic Characteristics in 1 TEXT dataset: Example 5 and 6’s 1st and 2nd datasets are combined into 1 TEXT dataset. This algorithm serves as an abbreviated SIA compared to Example 5 and 6. Rather than allocate the 1st dataset’s SDG measurements to the 2nd dataset, this algorithm requires the direct SDG allocated measurements. RCA Assessors must explain the reason for the allocated measurements in separate documentation. Either CTAP-like data can be used as the background data for making the allocations or rules can be developed for making the allocations from Example 6’s watershed plans.
2. QASY, QALY, or DALY Measurements: The preferred SDG measurements are based on human quality of life-related measurements, not physical or social Indicator measurements. Example 4B explained the reason for preferring these metrics in socioeconomic decision making. That is, they relate directly to stakeholder preferences which do a better job of measuring stakeholder perspectives and understanding equity and tradeoffs. Example 4B also discusses the need to institutionalize these measurements at local, national, and international, scales.
3. Hotspots Categorical Indexes: Example 5 and 6’s Hotspots-related factors move from the Indicators to the Categorical Index. This approach implies that the original CI measurements taken from related algorithms, must be grouped together in this dataset based on both Hotspots factors, Production Processes and Life Cycle stages. In the context of this example, these processes and stages should focus on Coping Capacity and Resiliency Processes and Disaster Risk Management Stages. The goal is to understand the most effective and equitable ways to achieve SDRR targets at individual business scale (2*).
4. CEAs use the SPA2 techniques. This algorithm does not support direct CEA measurement because no costs are added to the datasets. CEA and Reference Cost Effectiveness Results must be completed using the techniques introduced in Examples 4A, 4B, and 4C.
5. Additional DRM support from additional algorithms. Example 6B demonstrates how to supplement the Example 6A decision making by using additional risk assessment algorithms.
B. Social Performance Score

The following algorithm properties define the socioeconomic characteristics and an “impact transition state” for the stakeholder groups identified in these datasets. 

Indicators
The following images of Indicator 1 display the “target-actual” properties, explained more thoroughly later in this section, that produce the target and actual scores. Unlike Example 6, these properties don’t include “starting”, or benchmark, properties. These “scores” are calculated as the sum of the TR Indexes for all locations. The TR Indexes are calculated as the sum of their children’s normalized and weighted LIs. This dataset uses 2 locations for testing purposes. These scores will be used with the “standard sustainability scorecards” that will be explained in Section I, Decisions.
  

TEXT dataset. Demographics and SDG Allocated Indicators. This stylized dataset applies Example 5’s coffee farm dataset with 2 hazards, earthquakes and droughts, and 2 location ids.

…


Stakeholder Characteristics Titles. Rows used to describe socioeconomic variables must include a 0 in the 2nd data column, the location property.

Locational Index Measurement. The following properties show that only the last 2 columns of data are used to run calculations. These 2 columns will normalize and sum all of the calculated LIs within each TR Index. This is the same normalization and weighting as used in subalgorithm15.
* factor1 to factor9: none
* factor10: normalization type
* factor11: weight

Categorical Indexes. The following list defines how the final 11 columns of data for Categorical Indexes are used. 
* factor1 to factor6: socioeconomic characteristic measurements for separate stakeholder groups; most of the UN references identify the most important characteristics for reporting: age, gender, race, disability, income, and education. 
* factor7: population/land use count at end date
* factor8: population/land use unit of measurement 
* factor9: Hotspots Coping and Resiliency Mechanism
* factor10: Hotspots Disaster Risk Management Stage
* factor11: end date of measurement

Indicators. These properties measure the actual and target SDRR and CTAP-like measurements at the parent CI.end date. QASY, QALY, or DALY measurements are preferred for these allocated measurements.
* factor1: actual most likely, allocated, SDRR and CTAP-like measurement count at end date
* factor2: unit of measurement (i.e. QASY)
* factor3: actual low, allocated, CTAP count at end date
* factor4: actual high, allocated, CTAP count at end date
* factor5: unit of measurement for low and high estimates
* factor6: target most likely, allocated, CTAP measurement count at end date
* factor7: certainty1; severity, and probable consequence, of this SDG risk on this population/land use as of end date
* factor8: certainty2; likelihood of this CTAP risk on this population/land use as of end date
* factor9: certainty3; probable impact of this population/land use on company/CSO from this CTAP risk as of end date
* factor10: normalization type
* factor11: weight

Business Resource Conservation Value Accounting MathResults. The following table displays the partial results for this abbreviated SIA. 

…


Indicator Math. This algorithm employs the following calculations for Indicators. As mentioned, RCA Assessors must provide separate documentation about how they made the SDRR allocations for each CI Indicator.
* percenttarget: (factor1 / factor6) * 100
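The percenttarget arithmetic can be sketched as follows. This is a minimal illustration, assuming that the numerator is the actual most likely measurement and the denominator is the target most likely measurement from the Indicators property list. The function name, the zero-target rule, and the sample values are hypothetical and are not taken from the DevTreks source code.

```python
def percent_target(actual_most_likely: float, target_most_likely: float) -> float:
    """Return the actual measurement as a percentage of its target."""
    if target_most_likely == 0:
        # Assumption: a missing or zero target yields 0 rather than an error.
        return 0.0
    return (actual_most_likely / target_most_likely) * 100


# Hypothetical allocation: 45 QASYs achieved against a target of 60 QASYs.
print(percent_target(45, 60))  # 75.0
```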

Categorical Index Math. The first 11 columns display the initial data. The final 7 columns are calculated by normalizing and weighting all of the Indicators in each parent Locational Index and then summing the normalized and weighted Indicators into their respective CIs, as displayed in the following list. These properties differ from Example 6 in not having population PRA calculations in the CIs (i.e. no high and low confidence interval) and requiring the totalcosts property to be filled in manually (i.e. to support the Reference Case CEA reports).
* qtmost: sum of CIs
* percenttarget: average of CIs
* qtlow: sum of CIs
* qthigh: sum of CIs 
* certainty1: factor7 averages
* certainty2: factor8 averages
* certainty3: factor9 averages
* totalcost: 0 (fill in the quantity manually)
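The Categorical Index roll-up in the list above can be sketched as follows. This is an illustration only: it assumes that the child Indicator values have already been normalized and weighted, and the class name, field layout, and sample numbers are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class IndicatorResult:
    """One normalized and weighted Indicator row (hypothetical field layout)."""
    qtmost: float
    qtlow: float
    qthigh: float
    percenttarget: float
    certainty1: float
    certainty2: float
    certainty3: float


def categorical_index(indicators: list[IndicatorResult]) -> dict[str, float]:
    """Aggregate child Indicators into the CI properties: sums and averages."""
    return {
        "qtmost": sum(i.qtmost for i in indicators),
        "percenttarget": mean(i.percenttarget for i in indicators),
        "qtlow": sum(i.qtlow for i in indicators),
        "qthigh": sum(i.qthigh for i in indicators),
        "certainty1": mean(i.certainty1 for i in indicators),
        "certainty2": mean(i.certainty2 for i in indicators),
        "certainty3": mean(i.certainty3 for i in indicators),
        "totalcost": 0.0,  # filled in manually for the Reference Case CEA reports
    }


# Two hypothetical Indicators that belong to the same Categorical Index.
ci = categorical_index([
    IndicatorResult(2.0, 1.0, 3.0, 80.0, 3.0, 2.0, 1.0),
    IndicatorResult(4.0, 3.0, 5.0, 60.0, 1.0, 2.0, 3.0),
])
print(ci)
```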

Locational Index Math. Locational Indexes are calculated as the sum of the normalized and weighted Indicators contained in all of the LI’s children CIs.

TR Index Math. The TR Indexes are calculated as the sum of the normalized and weighted children LIs, and display the same properties as the LIs. The LIs can be normalized and weighted separately from their children Indicators. Section I, Decisions, demonstrates displaying these Scores in uniform, “SDG-ASB Scorecards”.
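The Locational Index and TR Index roll-ups can be sketched as a weighted sum of normalized children. The normalization options shown here ("none" and "minmax") are hypothetical stand-ins; the actual normalization types supported by subalgorithm15 are documented elsewhere and may differ. All values are fictitious.

```python
def normalize(values: list[float], kind: str = "none") -> list[float]:
    """Normalize sibling values; the option names here are hypothetical."""
    if kind == "none":
        return list(values)
    if kind == "minmax":
        lo, hi = min(values), max(values)
        span = hi - lo
        # Identical siblings carry no relative information, so they map to 0.
        return [0.0 if span == 0 else (v - lo) / span for v in values]
    raise ValueError(f"unknown normalization type: {kind}")


def roll_up(values: list[float], weights: list[float], kind: str = "none") -> float:
    """Sum the normalized, weighted children into their parent index."""
    normalized = normalize(values, kind)
    return sum(n * w for n, w in zip(normalized, weights))


# Two Locational Indexes built from fictitious child scores.
li_a = roll_up([2.0, 4.0, 6.0], [1.0, 1.0, 1.0], "minmax")
li_b = roll_up([10.0, 20.0], [0.5, 0.5])

# The TR Index sums its children LIs, which can carry their own weights.
tr_index = roll_up([li_a, li_b], [1.0, 1.0])
print(li_a, li_b, tr_index)  # 1.5 15.0 16.5
```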

Indicators 2 to 15. Hazard Impact Transition States. The following image demonstrates a similar M&E approach for documenting population epidemiology as introduced in Examples 5 and 6. The Indicator metadata displayed in these reports results from summing all of the TR Index data across all locations in each dataset. Note that this algorithm’s datasets do not include starting, or benchmark, SDRR allocations, resulting in benchmark Indicator scores of zero.



Score using Multi-Index, MCDA-style, approach. Example 9 in CTAP demonstrates using MCDA to support disaster risk management. INFORM (2018) demonstrates using a uniform 10 point, low risk = 0, high risk = 10, MCDA scoring system for disaster risk management at global scale. They demonstrate using advanced MCDA techniques to rank countries by degree of risk, to cluster countries together according to risk threshold levels, and to assess the trend of each country’s risk. Importantly, they also demonstrate how Indicator-based decision support systems can take several years to mature and require substantial effort and evolution to get the Indicators and Indexes “right”.
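Ranking and threshold clustering on a 10 point scale can be sketched as follows. The class boundaries, labels, and location scores are hypothetical placeholders chosen for illustration; they are not INFORM's published thresholds or results.

```python
from bisect import bisect_right

# Hypothetical class boundaries on the 0 to 10 risk scale.
BOUNDS = [2.0, 3.5, 5.0, 6.5]
LABELS = ["very low", "low", "medium", "high", "very high"]


def risk_class(score: float) -> str:
    """Map a 0 to 10 risk score to a threshold class."""
    if not 0 <= score <= 10:
        raise ValueError("score must be on the 0 to 10 scale")
    return LABELS[bisect_right(BOUNDS, score)]


# Fictitious TR Index scores for 3 locations.
scores = {"location1": 7.2, "location2": 3.4, "location3": 5.0}

# Rank from highest to lowest risk, then attach each location's class.
ranked = sorted(scores, key=scores.get, reverse=True)
for name in ranked:
    print(name, scores[name], risk_class(scores[name]))
```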

Rather than explain again the Score properties explained in Example 6A that are used for these purposes, the following images demonstrate using the Resource Stock and M&E Analyzers to conduct further analyses of the metadata Indicators and Scores. The Scenarios sections of this reference recommend using sibling base elements, which can be analyzed using these techniques, to document sustainability transition states or scenarios.

Totals Output Stock Analysis. Three children Output Series were added to this Output and then the Stock Calculator was copied into the 3 children to produce these results.


Statistical Output Stock Analysis. The image shows that Statistical Analyzers produce basic total, mean, median, variance, and standard deviation, statistics for the 3 children observations.
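The statistics named above can be reproduced for 3 observations with a short sketch. The observation values are fictitious, and the use of sample (n - 1) variance is an assumption; the Statistical Analyzers may use a different convention.

```python
from statistics import mean, median, stdev, variance

# Fictitious scores for the 3 children Output Series.
observations = [12.0, 15.0, 18.0]

summary = {
    "total": sum(observations),
    "mean": mean(observations),
    "median": median(observations),
    "variance": variance(observations),  # sample variance (assumption)
    "stddev": stdev(observations),       # sample standard deviation
}
print(summary)
```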

Change By Id Output Stock Analysis. The zero changes in the following image correctly result from copying the same Stock Calculator into the 3 children Output Series, including the same URLs to the same datasets. Changes will be shown when each of the children datasets are changed.



Example 6B. Score using Correlated Indicator Approach. This example demonstrates using related algorithms to conduct more advanced disaster risk assessment. INFORM (2018) uses the following image to explore the correlation between the Categories, or Categorical Indexes, in their MCDA disaster risk management system. The authors explain how to use more advanced statistical analysis techniques, including correlation analysis and factor analysis, to ensure that Indicator-based Indexes, as demonstrated throughout this tutorial, are “well-structured and balanced”. OECD (2008) also introduces several statistical techniques, referred to as algorithms in DevTreks, for improving the applied use of composite indicators and indexes.


The Conservation Technology Assessment 1 (CTA01) tutorial introduced examples of algorithms that conduct basic correlation analysis for “interacting” Indicators. Their purpose is to ensure that Indicators that use simulated data, such as the CTAP datasets, maintain correct correlations when the simulated data is used in joint calculations, such as the Gross Revenue Scores generated in CTA01 from separate Price and Yield Indicators. Example 6, Section C introduced an example of studying the interrelationship of multiple hazards. Poljanšek et al (2017) discuss why it’s important to study how hazards interact, as follows: “various kinds of interactions between hazards that often lead to significantly more severe negative consequences for the society than when they act separately.”

The following example demonstrates the basic approach introduced in CTA01 for analyzing correlated Indicators. In this example, the correlated Indicators are final, aggregated, fictitious, Total Risk Indexes taken from 11 locations for 3 earthquakes that are correlated in some manner with 3 droughts (i.e. the earthquakes destroyed reservoirs). This example’s Score measures some type of aggregated risk derived from the separate Indicators, such as aggregated, multiplicative, costs, benefits, net benefits, or, in this example, qualitative, MCDA-style, disaster risk TR Indexes.

The following 2 dataset URLs are added to the Score and used with algorithm1, subalgorithm3, to generate the final calculations. That algorithm uses a multivariate correlation technique known as Spearman rank correlation. In preparation for the machine learning algorithms, Version 2.1.6 upgraded older calculator patterns to make them compatible with the newer patterns used with R, Python, and Machine Learning (i.e. the pattern found in the DevTreksStatsAPI app).
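Spearman rank correlation can be sketched as a Pearson correlation computed on ranks, with tied values sharing their average rank. This is a generic illustration of the technique, not the subalgorithm3 implementation; the function names and the E1 and D1 values for the 11 locations are fictitious.

```python
from math import sqrt


def average_ranks(values: list[float]) -> list[float]:
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (sx * sy)


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman correlation: Pearson correlation of the rank series."""
    return pearson(average_ranks(x), average_ranks(y))


# Fictitious TR Indexes for earthquake 1 (E1) and drought 1 (D1), 11 locations.
e1 = [6.1, 4.3, 7.8, 5.0, 3.2, 8.4, 2.9, 6.7, 5.5, 4.8, 7.1]
d1 = [5.8, 4.0, 7.2, 5.6, 3.9, 8.0, 3.1, 6.0, 5.1, 4.4, 7.5]
print(round(spearman(e1, d1), 3))
```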

Dataset 1. Script File or Training Dataset. The following dataset specifies the type of correlation technique to use, such as pearson or spearman, and the Indicators that are correlated (i.e. E1, earthquake 1 is correlated with D1, or drought 1).
Stock Calculator using Score.JointDataURL[0]

M&E Calculator using Score.URL[0] 

Dataset 2. Full Data File or Test Data File. The following dataset generates the correlation matrix for the interacting Indicators. The assumption is that separate datasets were collected from 11 different entities, such as locations or businesses.
Stock Calculator using Score.JointDataURL[1] 

…

M&E Calculator using Score.URL[1] 


…

Score. The following Score properties display the final aggregated risk generated from the 3 correlated Indicator datasets. The main purpose of this example is to introduce additional statistical techniques that can be automated in algorithms and then used to improve MCDA-style Indicators and Indexes. This image also shows the simpler calculator patterns used in Version 2.1.6.

 

  

The reason that the previous images’ TEXT datasets have names that end with “4” (…Inds4.csv) is that three more complex datasets were unsuccessfully tested with this algorithm before defaulting back to the simpler, CTA01 introductory datasets (confirming that these introductory algorithms need dedicated staff working on the next generation algorithms introduced in Example 7).

C. Reporting
The differences from Example 5 and 6’s reports and scorecards include:
1. CEA measurements come directly from the techniques in Examples 4A, 4B, and 4C.
2. Starting SDG count comes from the “ending” SDG count from the previous Indicator. The first Indicator measures starting benchmark, or baseline, conditions.

Advanced risk analysis, such as illustrated in the following chart from Example 6B’s Score.MathResults, supports the manual completion of the following types of decision support aids. The CTA01 and CTAP references demonstrate adding monetary benefit and loss metrics, such as supported by the Reference Case Cost Effectiveness Results reports, to these decision support aids. INFORM (2018) demonstrates how standardizing these scoring systems to a 10 point scale supports comparisons over time and place.


 
Footnotes

1. Although supporting abbreviated SIAs, this algorithm still does not target “lowest common denominator” sustainability officers, or RCA Assessors. The author has found that professional work is best accomplished with professional workers. Training those professionals for the work is better than assuming the workers aren’t capable of advancement. Workers may appreciate being trained for “next generation” jobs, rather than being led to believe that “last generation” jobs have a future on a sustainable planet.
2. The only way to fully prove the validity of these algorithms is field testing and running controlled experiments with businesses. It’s likely that all of the algorithms in these references need further refinement as well as the development of related algorithms that offer alternative approaches for achieving the desired targets. DevTreks doesn’t raise funds for these purposes because the author dislikes administrative work and because equating success with money sends the wrong message to any potentially “wiser generation”.
3. The following error message was generated when the Make Base command element was first clicked. The Calculators and Analyzers tutorial explains that, since only children Input and Output Series are used in Budgets, each Output and Input must have at least 1 child series before calculations can be run. That tutorial also explains that this contended error message has been retained because DevTreks is not consumer software; it requires study and hard work (and because clicking on an Add Output Series command element is not part of the hard work).


References

Example 5, plus the following.
OECD. Handbook on Constructing Composite Indicators. Methodology and User Guide. 2008


Example 7. SDG Stakeholder Resource Conservation Value Accounting using Machine Learning (ML)
A. Introduction to Impact Evaluation
Gertler et al (2016) explain how to conduct formal Impact Evaluation for social interventions. The authors provide concrete examples that use standard statistical analysis techniques, such as regression analysis, to evaluate, or verify, the impact of the interventions. The authors summarize the “fundamental equation” underlying impact evaluation in the following images.

…

Appendix A introduces automated versions of Gertler’s standard statistical analysis techniques. Examples 8A, 8B, and 8C introduce complementary Machine Learning (ML) techniques. These techniques will be applied during the Impact Evaluation stage of the sustainable M&E accounting system demonstrated throughout this tutorial.
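The general idea behind impact evaluation, comparing outcomes with and without an intervention, can be sketched with the simplest possible estimator: the difference in mean outcomes between a treatment group and a comparison group. This is an illustration only, not one of the Appendix A algorithms. It assumes the comparison group is a valid counterfactual (for example, through random assignment), and the function name and outcome values are fictitious.

```python
from statistics import mean


def estimated_impact(treated: list[float], comparison: list[float]) -> float:
    """Difference in mean outcomes: treatment group minus comparison group."""
    return mean(treated) - mean(comparison)


# Fictitious outcome measurements (i.e. QASYs) for each stakeholder group.
treated = [14.0, 16.0, 18.0]
comparison = [10.0, 12.0, 14.0]
print(estimated_impact(treated, comparison))  # 4.0
```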
B. Introduction to RCA Technologists and RCA Assessors Roles, or Automated Statistical Analysis
This example differs from Gertler’s techniques, and from most conventional statistical analyses, in several IT-based ways.
1. Developers or Data Scientists, such as RCA Technologists, build online algorithms. Analysts, such as RCA Assessors, apply the online algorithms. Current statistical packages require every user to learn the statistical languages and data conventions and to work on a local server. That’s inefficient and wasteful. While the developers of algorithms need to understand the intricacies of algorithms, the majority of analysts only need to understand the logic behind an algorithm. For example, is it really necessary for every analyst to understand every scripting language convention demonstrated in Examples 8A to 8C, or can 1 IT team build the scripts and then have thousands of analysts employ the packaged algorithms? Once a Social Performance Analysis, or sustainable accounting, data system has been set up, RCA Assessors consult with the RCA Technologists on the best algorithms that will achieve the desired targets. The Assessors focus on the mechanics of using the algorithms: determining the base element data context, facilitating data collection, setting up TEXT datasets needed by the algorithm, understanding stakeholder perspectives, assessing tradeoffs, forecasting scenarios, and completing the final communication aids that support decision making.
2. Developers use APIs to Automate Everything. DevTreks started with algorithm1 because it employs an open-source mathematical library, MathNet Numerics, with a straightforward API that can be used as a standard C# module, and easily compiled into each released version of DevTreks. Standard statistical analysis packages, including R, Python, or Stata, require separate deployment and awkward coding, such as the use of batch files or DLL syntax (1*) (i.e. see the CTA02 and CTA03 references). Most statistical analysis software is evolving in ways that might make it easier to code and deploy. The machine learning algorithms will employ the latest developments in this field.
3. Occam’s Rule supports imperfect socioeconomic decision making. DevTreks applies Occam’s Rule for data management because it supports the final, imperfect decision making, objective in 95% of all cases: a) Basic algorithms should be used over complex algorithms, b) When possible, TEXT datasets should be limited to 10 columns of explanatory variables, and c) Indicator metadata should be limited to a minimal number of uniform quantitative properties (i.e. Q1-Q5, QTM-QTL). Occam appears to have recognized that the data, the analyst, the algorithms, the statistical techniques, and the decision makers, are all imperfect. Chakraborty and Joseph (2017) provide a formal explanation about under-fitting and over-fitting of algorithms and discuss ML and Occam further.
4. Qualitative analysis complements quantitative analysis. This tutorial emphasizes “mixed methods analyses” techniques that include qualitative techniques such as informant interviews, case studies, multimedia story-telling, and review of historical context. Besides triangulation of evidence and better understanding of actual stakeholder needs, they also recognize that quantitative techniques are imperfect and have a tendency to exaggerate claims of proof.

C. Introduction to Machine Learning (ML) Impact Evaluation
The following images (Chakraborty and Joseph, 2017; Varian, 2016) explain some of the primary differences between how ML and standard statistical algorithms are used. Physical scientists may note that the validation and testing of algorithms closely matches their approach for building simulation models (i.e. of climate change).




Athey (2018) highlights the major difference between ML’s emphasis on using algorithms and the data itself to make predictions and statistics’ emphasis on using statistical models to infer causal attribution. The author suggests that the 2 approaches are gradually becoming complements. The following image (Mullainathan and Spiess, 2017) highlights the approach taken by many analysts to discern the differences between standard statistical techniques, in this case ordinary least squares regression, and alternative ML approaches. Most of these authors conclude that ML complements, and in some cases surpasses, conventional statistical techniques, such as regression.

The following image (self-referenced) summarizes prominent differences between the techniques employed by statistics and ML.

D. Stakeholder System Boundary and Stakeholder Engagement

Examples 5 and 6 explain the importance of ensuring that individual business and CSO activity takes place as part of a larger, integrated landscape management, or watershed planning, approach. Applying that principle to Gertler’s dataset means that improved health care delivery is 1 target needed in an overall sustainable development end goal, or SDG target. Those targets become the basis for the measurements being made in the sustainable accounting M&E business systems introduced in this tutorial.
Several references discuss basing Impact Evaluations on the needs and expectations of the impacted stakeholders. These references point out that, prior to conducting a full scale evaluation, smaller scale, mixed methods, qualitative and quantitative studies, such as case studies, key stakeholder interviews, and preparation of operating and capital budgets, can help to ensure that the evaluation is really addressing the greatest concerns of the stakeholders. The point is that, as Gertler et al (2016) explain, Impact Evaluation must take place as part of an ongoing, comprehensive, societal improvement context. Evaluations must also be included at the start of societal improvement interventions. This reference argues that the SDG and SDRR make good contexts for societal improvement.
Datasets
Dataset A. Standard Statistics. Gertler et al (2016) use the following reference dataset, containing 19,800+ rows of population socioeconomic characteristics, to conduct an Impact Evaluation of a health insurance CSO social intervention. This dataset is the primary dataset used by all algorithms introduced in this example.

The following images (Gertler, 2016) define these socioeconomic characteristics. 


[Dataset B. Big Data Statistics. An additional dataset will introduce a simplified “big data” technique that breaks Gertler’s dataset into separate datasets, derived from randomly changing each of Gertler’s village datasets, using Example 5 and 6’s data structures. This dataset explores potential ML techniques that aggregate the individual datasets from this tutorial’s M&E sustainable accounting data system.]
E. Risk and Impact Identification
Gertler et al (2016) employ the following image of their results chain to summarize the logic for an Impact Evaluation being conducted for a CSO social intervention. These pathways serve as longer term risk identification and reduction mechanisms, or Impact Evaluations, than the shorter term Performance Monitoring techniques introduced in Examples 1 through 6. 

The authors use the following statement to describe the purpose of this CSO intervention. The statement confirms that Impact Evaluation focuses on the “Final Outcomes” (i.e. Impacts) part of the results chain. The key evaluation question addressed by Gertler is:
“What is the impact of HISP on poor households’ out-of-pocket health expenditures?”
The algorithms introduced in this reference supplement Gertler’s evaluation question with the following types of additional impact evaluation questions (2*). Given that Gertler’s dataset may not have been designed for these purposes, additional datasets may be introduced. Setting up these types of evaluation questions at the start of the social intervention (i.e. setting up the underlying M&E system) helps to identify the data that needs to be collected.
Evaluation Question 1. Which villages have the highest risk of not achieving targeted savings in poor households’ health expenditures (i.e. or more generally, SDG-SDRR targets)?
This evaluation is similar to how the INFORM (2018) global index of disaster risk assessment in Example 6 identifies countries that face the greatest risk of negative impacts from disasters. Chakraborty and Joseph (2017) provide a similar case study demonstrating how to use ML to detect banks in financial trouble that need to be alerted of the central bank’s concerns.
Evaluation Question 2. How accurately can the impact of recommended M&E actions be predicted for individual village, poor households’ health expenditures (i.e. or more generally, SDG-SDRR targets)?
This evaluation demonstrates a typical use of ML techniques –making predictions. The example also explores the relation between prediction and “cause and effect” attribution.
Evaluation Question 3. What are the principal factors causing improvements in individual village, poor households’ health expenditures (i.e. or more generally, SDG-SDRR targets)?
This evaluation explores the relation between prediction and “cause and effect” attribution at greater depth.
Evaluation Question 4. What are the trends in poor households’ health expenditures (i.e. or more generally, SDG-SDRR targets)?
This evaluation is similar to how the INFORM (2018) global index of disaster risk assessment in Example 6 identifies trends in individual country social risks. Chakraborty and Joseph (2017) provide a similar case study demonstrating how to use ML to forecast trends in short term inflation.
F. Quality of Life, or Sustainability, Scenarios

Typical scenarios for assessing alternative ways to achieve the SDG and SDRR targets include:

General Scenario
Threatened Quality of Life: High GHG emissions result in 1.5ºC temperature increase with higher incidence of health care cost increases, biodiversity loss, droughts, severe heat waves, crop and livestock production risks, air pollution, floods, migration, and social discord that leads to the incapacity to achieve the SDG and SDRR targets.
Targeted Social Performance Risks and Targets: Combinations of the SDG and SDRR targets, adapted to local stakeholders, including (2*):
* [Local] target A: Substantially reduce [local] household expenditures on health care while increasing health outcomes
* [Local] target B: Substantially reduce the number of people [locally] who do not have adequate health care
Mitigation and Adaptation Actions: Portfolio 1 consists of a) …, b)…, and c)…. 

G. Introduction to Advanced Machine Learning Impact Evaluation
Examples 1 to 5 demonstrate that full SDG-related accountability involves considerably more complexity. Watersheds, ecosystems, the 7 “community capitals”, demographics, stakeholder perspectives, and land use management, are melded together into business and organization sustainable accounting data systems. The impact pathways, impact transition states, scenarios, and hierarchical base elements, used with these accounting systems and applied through algorithms, help companies and organizations to identify and reduce risks. Gertler’s dataset, and most of the Impact Evaluation references, have little or nothing to do with private and public sector sustainable accounting systems.
The real value of machine learning comes from turning all of that complexity into actionable decision support. Specifically, companies and organizations want to know the most effective and efficient ways, such as concrete mitigation and adaptation actions, or management policies, needed to achieve the SDG and SDRR targets while advancing the company’s financial performance goals. The holy grail of sustainable development is to more fully understand the cause and effect attribution that leads to better knowledge of what works and what doesn’t work. This reference [may] demonstrate additional machine learning algorithms that turn sustainable accounting system complexity into actionable decision support results. 
Advanced machine learning algorithms will answer evaluation questions similar to the following (3*):
1. M&E: Given the changing circumstances identified by the current Performance Monitoring System, how can long term SDG and SDRR Targets be achieved, given the constraints associated with the socioeconomic and physical characteristics present in this ecosystem and community?
2. Business Value Reports: To what extent has the [company] [organization] [community] been effective and efficient in achieving SDG and SDRR targets that are relevant to our stakeholders? [examine ML causal chain algorithms]
3. Tradeoff Analysis: Given the limited resources this company or community can muster, what are the most effective, efficient, and equitable, mitigation and adaptation activities that will further the SDG and SDRR Targets for targeted stakeholder perspectives?
4. Scenario Analysis: Given the predicted impact transition states for the scenarios relevant to this company or community, which alternatives have the greatest probability of achieving cost effective, equitable, SDG and SDRR Targets for targeted groups of stakeholders?
5. Public Official and Company Executive Accountability: Given the known costs and negative societal impacts associated with human-induced hazards caused by bad public service or private sector management, what are the best mechanisms for holding the transgressors accountable for their past irresponsibility?

H. Introduction to Standard Impact Evaluation using Statistical Software
The following image (Varian, 2016) summarizes most of the statistical techniques employed by Gertler et al and their relation to machine learning.

Appendix A demonstrates carrying out Impact Evaluation using some of the standard statistical analysis techniques introduced in the Gertler et al (2016) reference. The Appendix modifies Gertler’s Stata scripts (see the Technical Companion, Version 1, September, 2016) to related statistical software scripts.
I. Introduction to Machine Learning Impact Evaluation (4*)
Appendix B introduces several ML algorithms that address specific impact evaluation questions.  Wikipedia URLs provide the definition for each algorithm. Bontempi (2017) provides the mathematical basis for each algorithm. Smola and Vishwanathan (2008) provide the actual algorithm. Examples 8A, 8B, and 8C demonstrate using alternative statistical software to run each algorithm.
Appendix B introduces the following algorithms:
Example a. Naïve Bayes. Predicts the poverty index classification of households and villages in the Gertler dataset.
Example b. Deep Neural Network. Predicts a household health care expenditure category for households and villages in the Gertler dataset.
Example c. Time Series Deep Learning. Predicts health care expenditures for upcoming years for fictitious households and villages in the Gertler dataset.
J. Decisions
Athey (2018) makes the following predictions about how ML will change the role of applied socioeconomic experts, including economists, in assisting decision making.


K. Conclusions

Gertler et al (2016), IITA and COSA (2016), and Hoebink (2014) prove that standard statistical analysis techniques have tremendous value for understanding the cause and effect attribution that helps to achieve sustainable social performance. Examples 7 and 8 demonstrate that all of those techniques can be automated as online algorithms, run from subnational, national, or cloud computing data centers, and used by greater numbers of sustainability workers to monitor social performance, evaluate social impacts, report transparently, keep governments and companies aligned with societal objectives, and improve people’s quality of life.
Athey (2018) uses the following statement to predict a prominent change in the role of socioeconomic experts, including applied sustainability workers. That is, they’ll become akin to RCA Technologists and RCA Assessors (5*).
“As digitization spreads across application areas and sectors of the economy, it will bring opportunities for [sustainability workers] to develop and implement policies that can be delivered digitally.”
Footnotes

1. The DevTreksStatsApi web application (DTSA) was built because standard statistical packages, such as R and Python, can only be deployed in cloud computing data centers using dedicated servers, or custom data science virtual servers. DTSA exposes a simple API that allows remote software developers to access the remote statistical packages by sending standard http request commands, containing this example’s Script URL and Data URL, and receiving http responses that return the URL holding the statistical results. These types of applications often support long running remote calculations using server-side techniques such as data queues. DevTreks will clearly document whether or not the cloud computing app will support long running calculations, and if not, DTSA will be upgraded to support that purpose. In the latter case, the remote client processes the Stat Result URL after it returns (i.e. in an email message).
2. As mentioned or implied throughout this tutorial, given that the conventional institutions in some countries appear incapable of understanding, let alone ameliorating, these conditions, networks and clubs may need to take independent action.
3. An appropriate question for “informed” software developers to ask is:
Will algorithm 350, subalgorithm 2154, really help to solve this societal problem?
4. Larger datasets take longer to load and run, which online algorithms must take into account. The first 3 examples of regression were added to 3 consecutive Indicators and run concurrently. The results were displayed after a noticeable delay, suggesting: 1) don’t run an excessive number of algorithms concurrently, 2) break the dataset into the subset data used by the algorithm prior to running the calculation, 3) do not run children series algorithms from their parents –run them separately when conducting long running calculations, and 4) use the next evolution of the DevTreksStatsAPI app to run long running calculations on remote data sciences machines.
5. This reference continually points out that “doing it right” will only occur if substantial changes take place in existing, conventional, institutions. Software development is 1 means towards an institutional reform, or replacement, end goal. For example, the primary goal of RCA Technologists and RCA Assessors is evidence-based societal improvement through sound IT. In this context, open source repository publishing is far more important than academic publishing. Conventional institutions may have skewed incentives (or at least, pre-IT era performance targets). This reference predicts that consequential digital activism will find its own path.
Case Study References

Example 5, plus the following.
Susan Athey. The Impact of Machine Learning on Economics. January, 2018. last accessed: April, 2018: http://www.nber.org/chapters/c14009.pdf 
Gianluca Bontempi, Handbook Statistical foundations of machine learning. Machine Learning Group Computer Science Department Universite Libre de Bruxelles, ULB Belgique. June 2, 2017
Chiranjit Chakraborty and Andreas Joseph. Staff Working Paper No. 674 Machine learning at central banks. Bank of England. 2017. last accessed April 18, 2018: https://www.bankofengland.co.uk/-/media/boe/files/working-paper/2017/machine-learning-at-central-banks.pdf?la=en&hash=EF5C4AC6E7D7BDC1D68A4BD865EEF3D7EE5D7806
S. Mullainathan and J. Spiess. Machine learning: an applied econometric approach. Journal of Economic Perspectives, 31(2):87–106, 2017. Online Appendix. April, 2017
Smola, Alex and S.V.N. Vishwanathan. Introduction to Machine Learning. Yahoo! Labs Santa Clara –and– Departments of Statistics and Computer Science Purdue University –and– College of Engineering and Computer Science Australian National University. Cambridge University Press. 2008
Varian, Hal R. Big Data: New Tricks for Econometrics. Journal of Economic Perspectives—Volume 28, Number 2—Spring 2014—Pages 3–28
Varian, Hal R. Causal inference in economics and marketing. pgs. 7310–7315, PNAS, July 5, 2016, vol. 113, no. 27
World Bank. Miscellaneous blogposts.
http://www.worldbank.org/en/news/video/2018/02/27/machine-learning-future-of-poverty-prediction


Example 8A. Introduction to Machine Learning (ML) Algorithms using .NetStandard Libraries
Algorithms: algorithm1, multiple subalgorithms

A. ML NetStandard Examples (algorithm1, subalgorithm_xx)
This example uses the .NetStandard mathematical libraries, MathNet, Accord, Cognitive Toolkit, and System.Math (1*), to demonstrate examples of basic ML algorithms. All of these libraries are compiled and run directly within DevTreks with no need for separate statistical or mathematical software (2*). 
Version 2.1.4 began distinguishing machine learning (ML) subalgorithms from standard statistical subalgorithms for algorithm1 using the following conventions:
a. Subalgorithm Names: Names must use a “_xx” suffix, such as “subalgorithm_01”. 
b. Indicator.URL and Score.URL Training and Testing Datasets: ML datasets must use 2 separate TEXT files –the first is for an ML training dataset, the second for an ML test dataset. The URLs to these datasets must be stored using the Indicator.URL property for Indicators and Score.URL for Scores. Most algorithms will return a new testing dataset that holds row by row calculations. This testing dataset will be stored in either an Indicator.MathResults or Score.MathResults URL.
c. Datasets broken into subsets. Example 7, Footnote 4, explained that online algorithms must be treated differently than desktop algorithms. Specifically, 1) datasets should be broken into the subsets of data used by the algorithm prior to running it, 2) children algorithms should not be run from their parents, and 3) long running calculations should be run remotely using the DevTreksStatsAPI app.
d. Meta MathExpression determines dataset columns. The Indicator.MathExpression and Score.MathExpression must define the columns of data to include in the analysis.
e. Score.Iterations determine number of training rows to use. Certain machine learning algorithms take a long time to process and not all algorithms need every row to adequately train the model. The Score.Iterations property determines the number of rows to use in the training dataset.
f. Data Transformation: Version 2.1.4 started using the 2nd row of the training and test datasets and the Indicator and Score metadata to pass instructions to ML algorithms. The following table defines the data transformation commands available in this release. 
2nd Row and MetaCommandPurposecolumns 1 to 3variedAlgorithm-specific instructionscolumns 4+noneDo not transform the data columncolumns 4+indexTransform the data column to integer categoriescolumns 4+textDo not transform the data column. Treat the data as a string datatype.columns 4+ and MetaqindexTransform the data column to integer categories derived from index positions for thresholds defined in Indicator.Q1 to Indicator.Q5. At least 2 thresholds must be defined.columns 4+ and MetaqcategoryTransform the data column to double categories derived from double thresholds (i.e. midpoints) used in Indicator.Q1 to Indicator.Q5. At least 2 thresholds must be defined.columns 4+ and MetaqtextTransform the double data column to double categories derived from double thresholds (i.e. midpoints) used in Indicator.Q1 to Indicator.Q5. Then transform the double categories to their corresponding Indicator.Q1Unit to IndicatorQ5Unit text categories. At least 2 thresholds must be defined.columns 4+minmax, zscore, logistic, logit, tanhNormalize the data column according to the command.Meta: Indictor.RelatedLabelAlternative statistical library name (under further planning)Run the algorithm using the library specified
g. Automatic Indicator Meta Completion: The final 3 rows of the Test TEXT file will be used to automatically fill in Indicator.meta properties. 
h. Manual Indicator Meta Completion: The quantitative Indicator.meta properties (i.e. Q1, Q2, Q3, Q4, Q5, QT, QTD1, QTD2, QTM, QTL, and QTU) are only replaced by the algorithms when their starting values = 0 or “none”. These algorithms assume that when their starting values are not 0, the analyst has made manual calculations based on the mathematical results (i.e. manual predicted values, confidence intervals, or algorithm instructions). In the same vein, the text Indicator.meta properties (i.e. Q1Unit, Q2Unit, Q3Unit, Q4Unit, Q5Unit, QTUnit, QTD1Unit, QTD2Unit, QTMUnit, QTLUnit, and QTUUnit) are only replaced when their values are string.empty, null, or “none”. 
i. TEXT csv Math Results: When possible, statistical results are returned as row-column TEXT, csv, datasets (i.e. see Examples 1 to 5). 
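Convention b's separate training and testing TEXT files can be prepared with a short script. The following Python sketch is illustrative only: the function name, the 20% test fraction, the shuffling step, and the sample column names and values are assumptions, not DevTreks code. It copies the column name row and the 2nd row of instructions into both files, consistent with convention f.

```python
import csv
import io
import random


def split_dataset(text, test_fraction=0.2, seed=7):
    """Split CSV text into (training_text, test_text).

    Row 1 (column names) and row 2 (ML instructions) are copied into both
    outputs; the remaining data rows are shuffled and divided.
    """
    rows = list(csv.reader(io.StringIO(text)))
    header, instructions, data = rows[0], rows[1], rows[2:]
    rng = random.Random(seed)          # fixed seed keeps the split repeatable
    rng.shuffle(data)
    n_test = max(1, int(len(data) * test_fraction))
    test, train = data[:n_test], data[n_test:]

    def to_text(subset):
        out = io.StringIO()
        writer = csv.writer(out, lineterminator="\n")
        writer.writerows([header, instructions] + subset)
        return out.getvalue()

    return to_text(train), to_text(test)


# Hypothetical 10-row dataset: 3 identifier columns, a label, and 1 feature.
sample = "\n".join(
    ["label,date,locality,poverty_index,hhsize",
     "none,none,none,qcategory,minmax"]
    + [f"hh{i},2018,v1,{40 + i},{1 + i % 5}" for i in range(10)]
)
training_text, test_text = split_dataset(sample)
```

Because convention g reads the final 3 rows of the Test TEXT file, an analyst who relies on that feature may prefer to append those rows after the split rather than shuffle them.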
Example a. Naïve Bayes (algorithm1, subalgorithm_01) (3*)
System.Math and MathNet URL: 
Many of the machine learning datasets exceed the 500KB db storage limit. They are not stored in the db and therefore can’t be automatically stored in file systems and blobs by previewing them. They must be manually uploaded to their referenced Resource. The Source Code reference has a URL to a zip file where these files can be obtained.
https://www.devtreks.org/greentreks/preview/carbon/resourcepack/Resource Conservation Value Accounting Example 8A/1563/none
https://www.devtreks.org/greentreks/preview/carbon/output/Machine Learning 8A/2141223488/none
http://localhost:5000/greentreks/preview/carbon/resourcepack/Resource Conservation Value Accounting Example 8A/549/none
[requires first uploading the dataset manually because of its size]
http://localhost:5000/greentreks/preview/carbon/output/Machine Learning 8A/2141223503/none 
McCaffrey (MSDN, February, 2013) uses System.Math to demonstrate using this algorithm to make predictions about how a dependent variable, or label, can be classified based on the independent variables, or features. The following image introduces McCaffrey’s code for this algorithm. The code is available as open access from the referenced issue. McCaffrey cautions that the purpose of these algorithms is to teach software developers how to build better algorithms. All of McCaffrey’s code has been modified for this tutorial’s M&E sustainability accounting system.

The following images show that the dataset and code have been modified to run as a standard DevTreks machine learning algorithm. The data conventions for ML subalgorithms for algorithm1 include:
* Statistical library name specified using the Indicator.RelatedLabel property
* separate training and test datasets stored in Indicator.URL, 
* 2nd row of datasets passes instructions to the algorithm (in this image, the 1st column tells the algorithm whether or not to include Laplacian data transformation, and columns 4 to 7 retain their text categorical data). The 1st source code image shows that all of the initial data goes through data normalization prior to being used in the analysis.
* 3 columns of row identifiers, 
* dependent variable in the 4th column.
* remaining columns store the independent variables.
* MathExpression defines the columns of independent variables to include in the analysis
* Score.Iterations determines number of training dataset rows to use
* Categories of dependent variable data optionally defined by the Indicator.Q1 to Indicator.Q5 properties
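The dataset layout in the list above can be sketched as a small parser. This Python example is a reading aid, not the DevTreks parser: the function name, the dictionary keys, and the sample rows (including the "true" flag in the 1st instruction column) are hypothetical.

```python
import csv
import io


def parse_ml_dataset(text):
    """Separate a row-column TEXT dataset into its conventional parts."""
    rows = list(csv.reader(io.StringIO(text)))
    names, instructions, data = rows[0], rows[1], rows[2:]
    return {
        # columns 1 to 3 of the 2nd row hold algorithm-specific instructions
        "algorithm_instructions": instructions[:3],
        # columns 4+ of the 2nd row hold one transformation command per column
        "column_commands": dict(zip(names[3:], instructions[3:])),
        "ids": [row[:3] for row in data],        # 3 columns of row identifiers
        "labels": [row[3] for row in data],      # dependent variable, 4th column
        "feature_names": names[4:],
        "features": [row[4:] for row in data],   # independent variables
    }


# Hypothetical sample rows, loosely modeled on the Gertler column names.
sample = (
    "label,date,locality,poverty_index,dirtfloor,land,hhsize\n"
    "true,none,none,qcategory,text,text,text\n"
    "hh1,2018,v1,45.2,1,2,5\n"
    "hh2,2018,v1,60.1,0,4,3\n"
)
parsed = parse_ml_dataset(sample)
```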

In the following table, the 2nd row data instructions include the command, qcategory, which converts the double data type to a) double category thresholds defined by Indicator.Q1 to Q5 for analysis and b) text thresholds defined by Indicator.Q1Unit to Indicator.Q5Unit for reporting. The poverty_index and hhsize columns are normalized by their respective commands.
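The threshold-based commands (qindex, qcategory, qtext) can be illustrated with the following Python sketch. The binning rule is an assumption: each value is assigned to the first threshold that is greater than or equal to it, and values above the last threshold fall into the last category. DevTreks may use a different rule (for example, nearest midpoint). The function name, thresholds, and unit labels are hypothetical.

```python
from bisect import bisect_left


def q_transform(value, thresholds, units=None, command="qindex"):
    """Map a double to a category using ascending Q1..Q5 thresholds."""
    if len(thresholds) < 2:
        raise ValueError("at least 2 thresholds must be defined")
    # Index of the first threshold >= value, clipped to the last category.
    index = min(bisect_left(thresholds, value), len(thresholds) - 1)
    if command == "qindex":
        return index                 # integer category
    if command == "qcategory":
        return thresholds[index]     # double category
    if command == "qtext":
        if units is None:
            raise ValueError("qtext needs the Q1Unit..Q5Unit labels")
        return units[index]          # text category
    raise ValueError(f"unknown command: {command}")


# Hypothetical Indicator.Q1 to Q3 thresholds and Q1Unit to Q3Unit labels.
q_values = [40.0, 55.0, 70.0]
q_units = ["high", "medium", "low"]
as_index = q_transform(45.2, q_values)
as_category = q_transform(45.2, q_values, command="qcategory")
as_text = q_transform(45.2, q_values, q_units, command="qtext")
```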


…

The following image shows that the algorithm is predicting the poverty index classification of households in the Gertler dataset. These poverty_index categories were introduced in Example 5. This example used Indicator.Q1 to Indicator.Q3 to categorize the poverty_index label into 3 poverty categories (high, medium, and low). The low probabilities reflect the large number of categories in both the household size and land size independent variables. Example 7 and Appendix B discuss the importance of testing and validation to fine tune these algorithms.
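For readers who want to see the logic behind this type of classifier, the following Python sketch implements a textbook categorical Naïve Bayes with optional add-one (Laplacian) smoothing. It is not McCaffrey's code or the DevTreks subalgorithm; the function names and the toy data are assumptions made for illustration.

```python
from collections import Counter, defaultdict


def train_naive_bayes(features, labels):
    """Count label frequencies and per-column value frequencies by label."""
    label_counts = Counter(labels)
    value_counts = defaultdict(Counter)   # (column, label) -> value counts
    vocab = defaultdict(set)              # column -> distinct values seen
    for row, label in zip(features, labels):
        for col, value in enumerate(row):
            value_counts[(col, label)][value] += 1
            vocab[col].add(value)
    return label_counts, value_counts, vocab


def predict(model, row, laplace=True):
    """Return (best_label, probabilities) for one row of categorical features."""
    label_counts, value_counts, vocab = model
    total = sum(label_counts.values())
    k = 1 if laplace else 0               # add-one smoothing when enabled
    scores = {}
    for label, n in label_counts.items():
        score = n / total                 # prior P(label)
        for col, value in enumerate(row):
            count = value_counts[(col, label)][value]
            score *= (count + k) / (n + k * len(vocab[col]))
        scores[label] = score
    evidence = sum(scores.values())
    probs = {lab: (s / evidence if evidence else 0.0)
             for lab, s in scores.items()}
    return max(probs, key=probs.get), probs


# Hypothetical rows: dirtfloor, land category, household size category.
toy_features = [["1", "small", "large"], ["1", "small", "large"],
                ["0", "large", "small"], ["0", "large", "small"],
                ["1", "large", "large"]]
toy_labels = ["high", "high", "low", "low", "medium"]
model = train_naive_bayes(toy_features, toy_labels)
best, probs = predict(model, ["1", "small", "large"])
```

The sketch also hints at why many distinct values in a feature column lower the reported probabilities: each additional category enlarges the smoothing denominator and spreads the counts more thinly.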
The best way to fill in the Indicator.meta properties using classifying algorithms is not clear yet (i.e. what is a confidence interval in this type of algorithm?). This example manually set the Indicator.QTM, QTL, and QTU, properties.
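The manual override rule described in convention h can be sketched as follows. This Python example is an illustration under stated assumptions, not DevTreks source code: the function name and the dictionary representation of Indicator.meta are hypothetical, and treating a missing property as unset is an added assumption.

```python
def fill_meta(current, calculated):
    """Return meta properties, replacing only values the analyst left unset.

    Quantitative properties count as unset when 0 or "none"; text properties
    (names ending in "Unit") count as unset when "", None, or "none".
    """
    result = dict(current)
    for name, new_value in calculated.items():
        old = current.get(name)
        if name.endswith("Unit"):
            unset = old in ("", None, "none")
        else:
            unset = old in (0, "none", None)   # None: property missing (assumption)
        if unset:
            result[name] = new_value
    return result


# The analyst set QTL and QTLUnit manually; QTM and QTMUnit were left unset.
manual = {"QTM": 0, "QTL": 12.5, "QTMUnit": "none", "QTLUnit": "low estimate"}
from_algorithm = {"QTM": 17.2, "QTL": 15.0, "QTMUnit": "usd", "QTLUnit": "usd"}
merged = fill_meta(manual, from_algorithm)
```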
 

[The Accord Nuget packages were removed from Version 2.1.6 due to their alpha status].
Accord URL:
http://accord-framework.net/docs/html/T_Accord_MachineLearning_Bayes_NaiveBayes.htm
http://accord-framework.net/docs/html/T_Accord_Statistics_Filters_Codification.htm
https://www.devtreks.org/greentreks/preview/carbon/output/ML 8A Accord/2141223489/none
The following images show that Version 2.1.4 allows multiple statistical libraries to be used to carry out most machine learning algorithms by setting the Indicator.RelatedLabel to the name of the library (or in the source code, Label2). The default uses standard .netstandard libraries. This feature will not be fully functioning until R and Python ML algorithms are fully supported and documented in the CTA02 and CTA03 references in the Technology Assessment tutorial.

This is the first algorithm built using Accord and that machine learning library is still being critically assessed. The following 2 tables contrast the MathResults using System.Math and Accord.  
System.Math MathResults: Only the final row of data has not been classified properly, but inspection of the 4 predictor inputs explains why (i.e. dirtfloor = 0, land = 0, householdsize = 0)

Accord MathResults: For testing purposes, the following table displays Accord’s integer transformation of the previous table’s data using Accord’s codebook. These results failed to classify the low poverty indexes properly. Accord’s documentation for codebooks explains the need to preprocess some data and use more advanced processing options. The need for these transformations suggests a learning curve (and closer attention to the R and Python examples).

Example b. Deep Neural Network Predictions (algorithm1, subalgorithm_02)
System.Math and MathNet URLs:
https://www.devtreks.org/greentreks/preview/carbon/input/Machine Learning 8Ab/2147397565/none
http://localhost:5000/greentreks/preview/carbon/resourcepack/Resource Conservation Value Accounting Example 8A/549/none
http://localhost:5000/greentreks/preview/carbon/output/Machine Learning 8A/2141223503/none
McCaffrey  (MSDN, August and September, 2017) uses System.Math to demonstrate using this algorithm to make predictions for categories of dependent variables. The following image displays McCaffrey’s model. All of McCaffrey’s algorithms are available as open access.


The following image shows that the algorithm is trying to predict the health expenditure classification of households in the Gertler dataset. The 2nd row in the table specifies the types of data transformation to use in each column.
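The normalization commands that can appear in the 2nd row (minmax, zscore, logistic, logit, tanh) are sketched below using their standard textbook definitions. The source does not give the exact formulas DevTreks applies, so the function name, the use of the population standard deviation, and the handling of constant columns are assumptions.

```python
import math
import statistics


def normalize(values, command):
    """Apply one 2nd-row normalization command to a column of floats."""
    if command == "none":
        return list(values)
    if command == "minmax":
        low, high = min(values), max(values)
        span = high - low
        return [0.0 if span == 0 else (v - low) / span for v in values]
    if command == "zscore":
        mean = statistics.fmean(values)
        sd = statistics.pstdev(values)   # population std. dev. (assumption)
        return [0.0 if sd == 0 else (v - mean) / sd for v in values]
    if command == "logistic":
        return [1.0 / (1.0 + math.exp(-v)) for v in values]
    if command == "logit":
        # Only defined for values strictly between 0 and 1.
        return [math.log(v / (1.0 - v)) for v in values]
    if command == "tanh":
        return [math.tanh(v) for v in values]
    raise ValueError(f"unknown command: {command}")


hhsize = [1.0, 3.0, 5.0]                 # hypothetical household sizes
scaled = normalize(hhsize, "minmax")
standardized = normalize(hhsize, "zscore")
```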

The following Indicator.MathExpression specifies the subset of column data used in the analysis.
I1.Q1.treatment_locality + I1.Q2.hhsize + I1.Q3.dirtfloor
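One way to read a MathExpression of this form is as a list of column names to keep. The following Python sketch assumes each term has the shape Indicator.Q.columnname and that terms are joined by "+". It is not the DevTreks parser, and the function names and sample rows are hypothetical.

```python
def columns_from_expression(expression):
    """Return the column names referenced by terms such as 'I1.Q2.hhsize'."""
    names = []
    for term in expression.split("+"):
        parts = term.strip().split(".", 2)
        if len(parts) == 3:              # indicator, Q property, column name
            names.append(parts[2])
    return names


def select_columns(header, rows, wanted):
    """Keep only the wanted columns, in the order they were requested."""
    positions = [header.index(name) for name in wanted]
    return [[row[i] for i in positions] for row in rows]


expression = "I1.Q1.treatment_locality + I1.Q2.hhsize + I1.Q3.dirtfloor"
wanted = columns_from_expression(expression)

# Hypothetical header and data rows.
header = ["health_expenditures", "dirtfloor", "hhsize", "treatment_locality"]
rows = [["17.2", "1", "5", "1"], ["9.8", "0", "3", "0"]]
subset = select_columns(header, rows, wanted)
```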

The Score.Iterations in the following image specifies the subset of training dataset rows used in this analysis (i.e. 2000 out of 5600+). The complete dataset runs very slowly and is a candidate for remote data sciences machines. When the confidence interval is filled in, it will be used to calculate a confidence interval for the Indicator.QTM and Score.QTM. Alternatively, Example c demonstrates that a threshold value can be added to the first column of the ML instructions row. In the latter case, the value is subtracted from Indicator.QTM to derive Indicator.QTL or added to get Indicator.QTU.
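The row limit set by Score.Iterations can be pictured with the following Python sketch. It assumes the limit keeps the first rows of the training dataset after the column name row and the instructions row, and that a value of 0 or less means "use every row". Both are assumptions, as is the function name.

```python
def limit_training_rows(rows, iterations, header_rows=2):
    """Keep the header rows plus at most `iterations` data rows."""
    if iterations <= 0:
        return list(rows)                # assumption: no limit requested
    return rows[:header_rows + iterations]


# Hypothetical training dataset: 2 header rows plus 5600 data rows.
training_rows = [["name", "y"], ["none", "none"]]
training_rows += [[f"hh{i}", str(i)] for i in range(5600)]
limited = limit_training_rows(training_rows, 2000)
```

Taking the first rows of a sorted dataset trains the model on one end of the distribution only, which is consistent with the behavior reported for Example c.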

The following 2 images of the Indicator.MathResults compare the results when the dataset is trained with 2000 rows (first table) versus the full 5600+ rows (second table). Neural networks are important machine learning algorithms and will continue to be explored as more ML algorithms are developed and tested.


Cognitive Toolkit URL:
The following image displays the code for this algorithm. Note that the current version of this toolkit is only compatible with netframework, not the netstandard framework used by DevTreks. This example assumes the toolkit will eventually migrate over, but until it does, the libraries are not included with DevTreks. For now, this example is included as evidence that .Net libraries support machine learning.
Under planning for a future release.
Example c. Time Series Neural Network (algorithm1, subalgorithm_03)
System.Math and MathNet URL:
https://www.devtreks.org/greentreks/preview/carbon/input/Machine Learning 8Ac/2147397566/none
http://localhost:5000/greentreks/preview/carbon/input/Machine Learning 8Ac/2147409851/none
McCaffrey (MSDN, October, 2017, April, 2018) uses System.Math to demonstrate using this algorithm to make time series predictions. Example b’s dataset was altered to time series data by adding 4 additional columns of health expenditures collected for 4 quarters (+-5% and +-10% of actual health cost). The threshold used to determine the accuracy of the dataset has been set to $2.50 in the first column of the ML instructions row. Predictions that are within $2.50 return TRUE in the accuracy column. The following Indicator.MathResults occur when the Score.Iterations is set to 2000, thereby truncating the data used to train the network (from the full 5600+ row datasets). Given that the dataset has been sorted from low to high health expenditures, it’s not surprising that the algorithm does a reasonable job of making time series predictions for lower health care costs (i.e. compare the health_expenditures column with the qtm column).
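The threshold logic described above can be sketched in a few lines of Python. The sketch assumes the accuracy test is inclusive (a prediction exactly $2.50 away counts as accurate) and that the same threshold yields the lower and upper estimates around qtm. The function name and the sample numbers are hypothetical.

```python
def score_predictions(actual, predicted, threshold=2.50):
    """Return one result per row: qtl, qtm, qtu, and an accuracy flag."""
    results = []
    for observed, qtm in zip(actual, predicted):
        results.append({
            "qtm": qtm,
            "qtl": qtm - threshold,      # threshold subtracted from QTM
            "qtu": qtm + threshold,      # threshold added to QTM
            "accurate": abs(observed - qtm) <= threshold,
        })
    return results


# Hypothetical health expenditures and network predictions.
health_expenditures = [10.0, 20.0, 30.0]
predictions = [11.0, 25.0, 27.5]
results = score_predictions(health_expenditures, predictions)
accuracy_rate = sum(r["accurate"] for r in results) / len(results)
```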

The following results take place when the full dataset, consisting of 5600+ rows, is used to train the network. Now the algorithm does a reasonable job only with high initial health care costs. Production machine learning algorithm development requires additional understanding and work.

Cognitive Toolkit URL:
The following image displays the code for this algorithm.
Under planning for a future release.
B. Conclusions

Algorithm1 uses open source statistical libraries that have been compiled and run within DevTreks to develop custom algorithms. This approach accesses the libraries directly through their exposed APIs. Unlike Examples 8B and 8C, these libraries require no separate statistical packages with separate deployment, related batch files, and DLL syntax. The primary advantages of this approach include fine-tuned control over algorithm development, increased performance, flexible deployment, and no reliance on “black box” algorithms. The primary drawbacks involve the maturity and institutional backing of the platforms when compared to alternative statistical libraries (with allowances for System.Math and CNTK).
Footnotes
1. System.Math is the internal mathematical library found in the open source .NET Standard 2.0. MathNet and Accord are open source statistical libraries that use MIT licenses, which are similar to the DevTreks license (see the References section). This release runs the libraries from their latest Nuget packages (i.e. compiled and distributed with the DevTreks source code). Prior to this release, the MathNet source had been included as a separate project, but proved hard to keep updated. DevTreks may also employ a Nuget package approach for the data layer in an upcoming release. Version 2.1.6 removed the Accord Nuget package because the source is still in alpha testing status.
2. MathNet and Accord, like DevTreks, appear to originate with one person. These technologists may have been displeased with the conventional approaches used by most of their peers and decided to take an independent, automated, digital approach in their field of specialty. Caution must be exercised because these efforts seldom have the institutional backing of large teams with sufficient budgets. They have their role, but the next generation needs to critically assess whether they can not only “do it right”, but also “do it better”.
References
McCaffrey, James. MSDN Magazine. Various issues referenced in the Example. Microsoft Corporation. Source code is available as open access by going to the referenced issues.
Accord. Last accessed April 24, 2018 (currently being reviewed): 
http://accord-framework.net/
https://github.com/accord-net/framework
MathNet. Last accessed April 24, 2018: 
https://numerics.mathdotnet.com/
https://github.com/mathnet/mathnet-numerics
System.Math. Last accessed April 24, 2018: 
https://msdn.microsoft.com/en-us/library/system.math(v=vs.110).aspx
https://docs.microsoft.com/en-us/dotnet/api/?view=netstandard-2.0&term=math
https://github.com/dotnet/standard
Cognitive Toolkit (not yet compatible with netstandard 2.0 and not distributed with DevTreks). Last accessed April 24, 2018:
https://docs.microsoft.com/en-us/cognitive-toolkit/
https://github.com/Microsoft/CNTK
ML.NET (to be reviewed):
https://blogs.msdn.microsoft.com/dotnet/2018/05/07/introducing-ml-net-cross-platform-proven-and-open-source-machine-learning-framework/


Example 8B. SDG Stakeholder Resource Conservation Value Accounting using R
Algorithms: algorithm2, various subalgorithms; 
URLs: under planning for 216

A. ML R Examples (algorithm2, subalgorithm_xx)
This example uses the R statistical library to demonstrate the following examples of basic ML algorithms. The R Library, as explained in the CTA02 reference, must be deployed as a stand-alone software library that is accessed using batch TEXT files.
Version 2.1.4 began distinguishing machine learning (ML) subalgorithms from standard statistical subalgorithms for algorithm2 using the following conventions:
a. Subalgorithm Names: Names must use a “_xx” suffix, such as “subalgorithm_01”. 
b. Indicator.URL and Score.URL Datasets: ML datasets must use 2 separate TEXT files: the first is an R statistical script TEXT file and the second is an ML TEXT dataset. The URLs to these datasets must be stored using the Indicator.URL property for Indicators and Score.URL for Scores.
c. Training, Test, and Subset Datasets: The R statistical script TEXT file must hard code how the Data TEXT file is broken into training, testing, and subset datasets.
d. Data Transformation: The R statistical script TEXT file must hard code any dynamic data transformation needed in the dataset.
e. Manual and Automatic Indicator Meta Completion: uses the same rules as Example 8A, applied through the R statistical script.
Example a. Naïve Bayes (algorithm2, subalgorithm_01)
The following image displays the code for this algorithm.
Under planning for a future release.
Example b. Logistic Regression (algorithm2, subalgorithm_02)
The following image displays the code for this algorithm.

Under planning for a future release.
Example c. Time Series Neural Network (algorithm2, subalgorithm_03)
The following image displays the code for this algorithm.
Under planning for a future release.

B. Conclusions
R (algorithm2) fully supports standard and ML algorithm development.



Example 8C. SDG Stakeholder Resource Conservation Value Accounting using Python (1*)
Algorithms: algorithm3, various subalgorithms; 
URLs: under planning for 216

A. ML Python Examples (algorithm3, subalgorithm_xx)
This example uses the Python statistical library to demonstrate the following examples of basic ML algorithms. The Python Library, as explained in the CTA03 reference, must be deployed as a stand-alone software library that is accessed using batch TEXT files.
Version 2.1.4 began distinguishing machine learning (ML) subalgorithms from standard statistical subalgorithms for algorithm3 using the following conventions:
a. Subalgorithm Names: Names must use a “_xx” suffix, such as “subalgorithm_01”. 
b. Indicator.URL and Score.URL Datasets: ML datasets must use 2 separate TEXT files: the first is a Python statistical script TEXT file and the second is an ML TEXT dataset. The URLs to these datasets must be stored using the Indicator.URL property for Indicators and Score.URL for Scores.
c. Training, Test, and Subset Datasets: The Python statistical script TEXT file must hard code how the Data TEXT file is broken into training, testing, and subset datasets.
d. Data Transformation: The Python statistical script TEXT file must hard code any dynamic data transformation needed in the dataset.
e. Manual and Automatic Indicator Meta Completion: uses the same rules as Example 8A, applied through the Python statistical script.
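Convention c can be sketched as follows. This is a minimal illustration rather than a DevTreks script: the 80/20 split, the 3 row final subset, the column names, and the function name split_dataset are all assumptions, and the small dataset is generated inline instead of being read from an Indicator.URL.

```python
import csv
import io


def split_dataset(rows, train_share=0.8, holdout=3):
    """Split rows into training, testing, and a final `holdout` subset.

    The last `holdout` rows are set aside first; the remaining rows are
    divided, in their existing order, into training and testing sets.
    """
    if len(rows) <= holdout:
        raise ValueError("dataset needs more rows than the holdout size")
    body, final_rows = rows[:-holdout], rows[-holdout:]
    cut = int(len(body) * train_share)
    return body[:cut], body[cut:], final_rows


# Hypothetical comma-delimited TEXT dataset with 13 data rows.
text = "label,hhsize,dirtfloor\n" + "\n".join(
    f"{i},{i % 5 + 1},{i % 2}" for i in range(13)
)
rows = list(csv.DictReader(io.StringIO(text)))

train, test, final_rows = split_dataset(rows)
print(len(train), len(test), len(final_rows))
```

Because the split is positional, a dataset that has been sorted (for example, from low to high health expenditures) should usually be shuffled before splitting; otherwise the training rows may not resemble the testing rows.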
Example a. Naïve Bayes (algorithm3, subalgorithm_01)
The following image displays the code for this algorithm.
Under planning for a future release.
Example b. Logistic Regression (algorithm3, subalgorithm_02)
The following image displays the code for this algorithm.
Under planning for a future release.
Example c. Time Series Neural Network (algorithm3, subalgorithm_03)
The following image displays the code for this algorithm.
Under planning for a future release.
B. Conclusions
Python (algorithm3) fully supports standard and ML algorithm development.



Appendix A. Standard Statistical Impact Evaluation Analysis
A. Introduction to Standard Impact Evaluation
The following examples demonstrate carrying out Impact Evaluation using some of the standard statistical analysis techniques introduced in the Gertler et al (2016) reference. Sections B and C modify the following statistical scripts found in Gertler et al’s Technical Companion (Version 1, September, 2016) to related statistical software scripts.
Example a. Randomized Assignment in a Regression Framework
The following Stata script concludes that health expenditures decreased by $10.14 from the intervention (with an average expenditure without the treatment of $17.98).

The following images of multivariate regression analysis explain why, with this randomized dataset, multivariate regression doesn’t contribute much to the statistical conclusions.


Example b. Randomized Treatment Regression Analysis
The following Stata script concludes that health expenditures decreased by $10.72 from the intervention (with an average expenditure without the treatment of $20.06).



Example c. Difference in Differences Regression Analysis
The following Stata script concludes that health expenditures decreased by $8.16 from the intervention (with an average expenditure without the treatment of $20.79).

The following Stata script concludes that health expenditures decreased by $8.16 from the intervention (with an average expenditure without the treatment of $17.02).

Example d. Discontinuity Regression Analysis
The following Stata script concludes that health expenditures decreased by $11.19 from the intervention (with an average expenditure without the treatment of $20.55).

Example e. Propensity Score Matching Regression Analysis
The following Stata script concludes that savings in health care expenditures can be explained best by the dirt floor and household size variables.



The following Stata script concludes that health expenditures decreased by $9.97 from the intervention.


B. Introduction to Standard Impact Evaluation using .NetStandard Libraries (algorithm1)
Stock Input Calculator
http://localhost:5000/greentreks/preview/carbon/input/Impact Evaluation 7, 1 and 2/2147409847/none
Example a. Randomized Assignment in a Regression Framework (algorithm1, subalgorithm6)
The following script and image display the equivalent results using MathNet.

The following image shows the result when the MathExpression didn’t follow the required algorithm1 dataset conventions: the 1st 3 columns are row identifiers, the 4th column is the dependent variable, and the remaining independent variables are identified by the MathExpression (i.e. healthexpenditures is the dependent variable (y) column and does not belong in the expression).

Examples b to e. Additional Regression 
Example a is the only regression algorithm completed using the .NetStandard libraries. It’s very likely the remaining regressions can be developed using these libraries, but this reference’s current focus is on developing Machine Learning algorithms. Demonstration of the remaining regression techniques is deferred to a possible future release.
C. Introduction to Standard Impact Evaluation using R
Cloud testing is planned for 216.
https://www.devtreks.org/greentreks/preview/carbon/resourcepack/Resource Conservation Value Accounting, RCA Example 7B/1562/none
http://localhost:5000/greentreks/preview/carbon/resourcepack/Impact Evaluation, RCA Example 7B/548/none

M&E Input Calculator
http://localhost:5000/greentreks/preview/carbon/input/Impact Evaluation 7, 1 and 2/2147409847/none
Stock Input Calculator
http://localhost:5000/greentreks/preview/carbon/input/Impact Evaluation 7A-3/2147409848/none

The following examples modify the Stata statistical scripts found in Gertler et al’s Technical Companion (Version 1, September, 2016) to R statistical package scripts (4*). Given that related CTA references already introduce statistical libraries, including R and Python (i.e. algorithms 2 and 3), which support those techniques, this example does not provide additional documentation for the statistical techniques.
Example a. Randomized Assignment in a Regression Framework (algorithm2, subalgorithm1)
The following script and image demonstrate using R to run this regression. The image also demonstrates that Version 2.1.4 has been upgraded to more fully support the R and Python algorithms (algorithms 2, 3, 4, and 5 are being upgraded for use with machine learning algorithms).
args <- commandArgs(TRUE)
url <- args[1]
print(url)
dataset1 <- read.table(url, header=TRUE, sep=",")
#dataset2 is dataset1 minus the last 3 lines of data
dataset2 <- head(dataset1, n=-3)
#dataset3 is the last 3 lines of data used for ci
dataset3 <- tail(dataset1, n=3)
model <- lm(health_expenditures ~ treatment_locality, data=dataset2)
f1 <- summary(model)
print(f1)
ci <- predict(model, dataset3, interval='confidence')
print(ci)


The following image shows that a DevTreks convention with several algorithms is to use the last 3 rows of a dataset to generate confidence intervals for 3 predicted ranges of values. Version 2.1.4 upgraded the rules for several algorithms to use these 3 predicted ranges in the same manner as introduced in Example 5: to fill in the most likely, low, and high predictions for 3 sets of Indicator.metadata properties. The analyst must fill in appropriate units for the 3 predictions (i.e. Example 5’s benchmarks, targets, and actuals).
In this image, Gertler’s last 3 rows of data generate the same prediction even though each row has different household expenditures. The cause has not been explored here (i.e. F value = 2416?, binomial explanatory variables?) (3*). One plausible explanation is that the script’s model uses treatment_locality as its only explanatory variable, so rows that share the same treatment_locality value receive identical predictions from predict().
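The behavior of a regression with a single binary explanatory variable can be checked with a small, self-contained Python sketch. The data below are hypothetical (they are not Gertler's), and fit_simple_ols is an illustrative helper rather than part of any DevTreks algorithm. With one 0/1 regressor, the slope equals the difference between the two group means, so every row with the same treatment value gets the same prediction regardless of its other columns.

```python
def fit_simple_ols(x, y):
    """Ordinary least squares for y = intercept + slope * x (one regressor)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data: 3 untreated and 3 treated households.
treatment = [0, 0, 0, 1, 1, 1]
expenditures = [18.0, 20.0, 22.0, 8.0, 10.0, 12.0]
intercept, slope = fit_simple_ols(treatment, expenditures)

# Three new rows that differ in household size but share the same treatment value.
new_rows = [
    {"treatment_locality": 1, "hhsize": 3},
    {"treatment_locality": 1, "hhsize": 6},
    {"treatment_locality": 1, "hhsize": 9},
]
# Only treatment_locality enters the model, so hhsize cannot change the prediction.
predictions = [intercept + slope * row["treatment_locality"] for row in new_rows]
print(intercept, slope, predictions)
```

Adding further explanatory variables to the model (as in the multivariate script later in this example) is one way to obtain predictions that vary from row to row.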

 
Gertler’s Stata regression scripts include a cluster option, cl(locality_identifier), which can be replicated in R using the miceadds package with the following script. Given that this script only changes the standard errors, but not the coefficients or statistical validity of the results, the remaining scripts do not cluster this dataset.

The following R Script and image confirm that the 10 variable limit for explanatory variables is only enforced with algorithm1. Even if not required, DevTreks still recommends the 10 explanatory variable limit. 
model <- lm(health_expenditures ~  treatment_locality + age_hh + age_sp + educ_hh + educ_sp + female_hh + indigenous + hhsize + dirtfloor + bathroom + land + hospital_distance, data=dataset2)
 









The reason why the final 3 predicted ranges differ from the previous results has not been explored for this release.




Example b. Randomized Treatment Regression Analysis (algorithm2, subalgorithm1)
The following script and image demonstrate using R to run this regression. This statistical analysis technique, known as two stage least squares regression analysis, requires loading a custom R package, AER. The model’s formula employs “subset” syntax because this dataset is the full 19,000+ row dataset but only 9,000+ rows are needed by this regression (5*). The R predict() function did not return valid confidence intervals for the 3 predicted ranges demonstrated in the previous regression example and was replaced by the alternative confint() function.
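#install the AER package only when it is missing (non-interactive runs need an explicit CRAN mirror)
if (!requireNamespace("AER", quietly = TRUE)) install.packages("AER", repos = "https://cloud.r-project.org")
library(AER)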
args <- commandArgs(TRUE)
url <- args[1]
print(url)
dataset1 <- read.table(url, header=TRUE, sep=",")
model <- ivreg(health_expenditures ~ enrolled | treatment_locality, data=dataset1, subset = round == "1")
f1 <- summary(model)
print(f1)
ci <- confint(model, 'enrolled', level=0.80)
print(ci)

The following images show that the R results match the Stata results. The confint() function returned an 80% confidence interval for the estimated housing savings coefficient (i.e. $10.72 ± 0.44), which was used to manually fill in the QTM, QTL, and QTU properties. Similar techniques can be used for the benchmark and targets but are not demonstrated in this example (3*).
 

 
Example c. Difference in Differences Regression Analysis (algorithm2, subalgorithm1)
The following script and image show that the R results match the Stata results. In this example, the QTM, QTL, and QTU values were manually entered from the R results and the calculations were run a second time to process the edits.

args <- commandArgs(TRUE)
url <- args[1]
print(url)
dataset1 <- read.table(url, header=TRUE, sep=",")
#did regression
model <- lm(health_expenditures ~ round * eligible, data=dataset1, subset = treatment_locality == "1")
f1 <- summary(model)
print(f1)
ci <- confint(model, 'round:eligible', level=0.80)
print(ci)
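The interaction coefficient in a two-period, two-group regression such as round * eligible has a simple interpretation that can be verified by hand. The Python sketch below uses hypothetical values (not the Gertler dataset) and an illustrative helper, diff_in_diff, to compute the same quantity directly from four group means.

```python
from statistics import mean


def diff_in_diff(rows):
    """Difference-in-differences estimate computed from four group means."""
    def group_mean(round_, eligible):
        return mean(
            r["health_expenditures"]
            for r in rows
            if r["round"] == round_ and r["eligible"] == eligible
        )

    # Change over time for eligible households minus the change for ineligible ones.
    change_eligible = group_mean(1, 1) - group_mean(0, 1)
    change_ineligible = group_mean(1, 0) - group_mean(0, 0)
    return change_eligible - change_ineligible


# Hypothetical observations: 2 households per group and round.
rows = [
    {"round": 0, "eligible": 1, "health_expenditures": 14.0},
    {"round": 0, "eligible": 1, "health_expenditures": 16.0},
    {"round": 1, "eligible": 1, "health_expenditures": 7.0},
    {"round": 1, "eligible": 1, "health_expenditures": 9.0},
    {"round": 0, "eligible": 0, "health_expenditures": 20.0},
    {"round": 0, "eligible": 0, "health_expenditures": 22.0},
    {"round": 1, "eligible": 0, "health_expenditures": 21.0},
    {"round": 1, "eligible": 0, "health_expenditures": 23.0},
]
did = diff_in_diff(rows)
print(did)
```

In this made-up data, eligible households' expenditures fall by 7 while ineligible households' expenditures rise by 1, so the estimate is a decrease of 8. A regression with both indicators and their interaction returns this same number as the interaction coefficient, along with standard errors that the group means alone do not provide.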

 

The following R script and image show similar results. The confint() function for generating a confidence interval did not work with the plm model in this example. In practice, the interval must be calculated from the xtenrolled t-statistic. For convenience, the interval values were taken from the previous example.

args <- commandArgs(TRUE)
url <- args[1]
print(url)
dataset1 <- read.table(url, header=TRUE, sep=",")
#generate a new column
dataset1$xtenrolled <- '0'
#change column values based on condition
dataset2 <- within(dataset1, xtenrolled[enrolled == '1' & round == '1'] <- '1')
#fixed effects regression
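#install the plm package only when it is missing (non-interactive runs need an explicit CRAN mirror)
if (!requireNamespace("plm", quietly = TRUE)) install.packages("plm", repos = "https://cloud.r-project.org")
library(plm)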
model <- plm(health_expenditures ~  xtenrolled + round, data=dataset2, subset = treatment_locality == "1", index=c("household_identifier", "round"), model="within")
f1 <- summary(model)
print(f1)
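Calculating an interval from a coefficient and its t-statistic can be sketched in a few lines of Python. The coefficient and t-statistic below are hypothetical, the function name interval_from_t is illustrative, and the sketch assumes a normal approximation, which is reasonable only for large samples such as the 9,000+ rows used here.

```python
from statistics import NormalDist


def interval_from_t(coefficient, t_statistic, level=0.80):
    """Return (low, most likely, high) for a coefficient given its t-statistic."""
    # The t-statistic is the coefficient divided by its standard error.
    standard_error = abs(coefficient / t_statistic)
    # Two-sided critical value under a normal approximation (assumption).
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    half_width = z * standard_error
    return coefficient - half_width, coefficient, coefficient + half_width


# Hypothetical regression output: coefficient of -8.0 with a t-statistic of -20.0.
low, mid, high = interval_from_t(-8.0, -20.0)
print(round(low, 4), mid, round(high, 4))
```

The three returned values correspond to the low, most likely, and high quantities that an analyst would enter manually in the QTL, QTM, and QTU properties.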

 

Example d. Discontinuity Regression Analysis (algorithm2, subalgorithm1)
The following R script and image derive from the Baumer reference. Although not a perfect match to the Stata results, the statistical results are similar. A new Version 2.1.4 rule caused the quantitative Indicator.meta properties to be filled in automatically. When the final lines of script in algorithms 2, 3, 4, and 6 return a 3 row, 4 column dataset with titles that include the term “predict”, or when a predict() function is used in the script, the Indicator.meta quantitative properties will be filled in automatically based on the final 3 rows of data. The analyst must still fill in their units of measurement manually.

args <- commandArgs(TRUE)
url <- args[1]
print(url)
dataset1 <- read.table(url, header=TRUE, sep=",")
dataset2 <- subset(dataset1, dataset1$round == '1'& dataset1$treatment_locality == '1')
model<-lm(dataset2$health_expenditures~I(dataset2$poverty_index-58)*dataset2$eligible) 
f1 <- summary(model)
print(f1)
mean = coef(summary(model))[1]
mean2 = mean-2.5
mean3 = mean+2.5
savings <- coef(summary(model))[3] * -1
df1<-data.frame(predict=(c(mean,mean2,mean3)),low=(c(mean-savings,mean2-savings,mean3-savings)),high=(c(mean+savings,mean2+savings,mean3+savings)))
print(df1)


Example e. Propensity Score Matching Regression Analysis (algorithm2, subalgorithm1)
The following script and image demonstrate using R to run this regression. This script does not reshape the data but returns similar results.

args <- commandArgs(TRUE)
url <- args[1]
print(url)
dataset1 <- read.table(url, header=TRUE, sep=",")
#data_wide <- dcast(dataset1, household_identifier ~ round, 
#value.var = c("health_expenditures", "age_hh", "age_sp", "educ_hh", "educ_sp", "hospital"))
model <- glm(enrolled ~ age_hh+age_sp+educ_hh+educ_sp+female_hh+indigenous+hhsize+dirtfloor
+bathroom+land+hospital_distance, family = binomial(link = "probit"), data = dataset1)
f1 <- summary(model)
print(f1)
dataset2 <- tail(dataset1, n=3)
ci <- predict(model, dataset2, interval='confidence')
print(ci)

The following R script and image show similar results. R contains several more advanced packages for conducting propensity score matching. The Indicator.meta displays 3 predicted household expenditures (i.e. benchmark, target, actual) based on the last 3 rows of data.

args <- commandArgs(TRUE)
url <- args[1]
print(url)
dataset2 <- read.table(url, header=TRUE, sep=",")
dataset3 <- subset(dataset2, round == '1')
# generate propensity scores for all of the data
ps.model <- glm(enrolled ~ age_hh+age_sp+educ_hh+educ_sp+female_hh+indigenous+hhsize+dirtfloor
	+bathroom+land+hospital_distance, data = dataset3, family=binomial(link="logit"), na.action=na.pass)
#summary(ps.model)
# add pscores to study data
dataset3$pscore <- predict(ps.model, newdata = dataset3, type = "response")
# distribution of ps
#summary(dataset3$pscore)
dim(dataset3)
# restrict data to ps range .10 <= ps <= .90
dataset4 <- dataset3[dataset3$pscore >= .10 & dataset3$pscore <=.90,]
summary(dataset4$pscore)
 
# regression with controls on propensity score screened data set
model <- lm(health_expenditures~enrolled+age_hh+age_sp+educ_hh+educ_sp+female_hh+indigenous+hhsize+dirtfloor
	+bathroom+land+hospital_distance, data = dataset4)
 
f1 <- summary(model)
print(f1)
dataset2 <- tail(dataset4, n=3)
ci <- predict(model, dataset2, interval='confidence')

print(ci)
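The screening step in the script above (keeping only households whose propensity score lies between 0.10 and 0.90) can be illustrated in isolation. The Python sketch below does not fit a model: the intercept and coefficients are hypothetical stand-ins for the output of a logit regression, and the households are made up.

```python
import math


def propensity(row, coefficients, intercept):
    """Logistic (logit link) propensity score for one household."""
    linear = intercept + sum(coefficients[name] * row[name] for name in coefficients)
    return 1.0 / (1.0 + math.exp(-linear))


def trim_common_support(rows, scores, low=0.10, high=0.90):
    """Keep only the rows whose score lies inside [low, high]."""
    return [row for row, score in zip(rows, scores) if low <= score <= high]


# Hypothetical logit coefficients (not estimated from the Gertler dataset).
intercept = -3.0
coefficients = {"hhsize": 0.5, "dirtfloor": 1.0}

households = [
    {"hhsize": 1, "dirtfloor": 0},   # score near 0.08, screened out
    {"hhsize": 4, "dirtfloor": 1},   # score 0.50, kept
    {"hhsize": 6, "dirtfloor": 0},   # score 0.50, kept
    {"hhsize": 10, "dirtfloor": 1},  # score near 0.95, screened out
    {"hhsize": 3, "dirtfloor": 0},   # score near 0.18, kept
]
scores = [propensity(h, coefficients, intercept) for h in households]
kept = trim_common_support(households, scores)
print(len(kept))
```

Screening removes households that are almost certain to enroll or almost certain not to enroll, for which comparable counterparts are hard to find. The 0.10 and 0.90 bounds come from the script above; other bounds are a judgment call.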


D. Introduction to Standard Impact Evaluation using Python (algorithm3) (1*)
The following examples demonstrate applying these principles. This example uses algorithm3 to modify Example 7’s Stata statistical scripts (1*).
Example a. Randomized Assignment in a Regression Framework (algorithm3, subalgorithm1)
The following script and image display the equivalent results. 

Example b. Randomized Treatment Regression Analysis (algorithm3, subalgorithm1)
The following script and image display the equivalent results. 

Example c. Difference in Differences Regression Analysis (algorithm3, subalgorithm1)
The following script and image display the equivalent results.

Example d. Discontinuity Regression Analysis (algorithm3, subalgorithm1)
The following script and image display the equivalent results.

Example e. Propensity Score Matching Regression Analysis (algorithm3, subalgorithm1)
The following script and image display the equivalent results.

Case Study References

Example 5, plus the following.
https://cran.r-project.org/doc/manuals/r-release/R-intro.html
https://cran.r-project.org/web/views/
https://cran.r-project.org/doc/manuals/r-release/R-data.html
Standard Stats
James, Gareth, Daniela Witten, Trevor Hastie, Robert Tibshirani. An Introduction to Statistical Learning with Applications in R. Springer Texts in Statistics. 2017
https://www.rdocumentation.org/packages/AER/versions/1.2-5/topics/ivreg
https://dss.princeton.edu/training/
Baumer, Patricia. Regression Discontinuity. last accessed: April 13, 2018: faculty.smu.edu/kyler/courses/7312/presentations/baumer/Baumer_RD.pdf
https://sejdemyr.github.io/r-tutorials/statistics/tutorial8.html
Footnotes
1. Although Python examples haven’t been prepared yet, that doesn’t imply any preference for statistical packages. DevTreks endorses any package that is open source or does not charge fees, can be automated by software developers, and carries out calculations that can be verified using reference datasets. DevTreks assumes additional statistical packages, such as Julia (algorithm6), will be supported in future releases.



Appendix B. Machine Learning Statistical Impact Evaluation Analysis
A. Introduction to Machine Learning Impact Evaluation
The following examples demonstrate carrying out Impact Evaluation using Machine Learning techniques that correspond to Gertler et al’s (2016) standard statistical techniques.
This section introduces several ML algorithms that address specific impact evaluation questions.  Wikipedia URLs provide the definition for each algorithm. Bontempi (2017) provides the mathematical basis for each of these algorithms. Smola and Vishwanathan (2008) provide the actual algorithm. Only the first example demonstrates how these authors explain the algorithms. Examples 8A, 8B, and 8C demonstrate using alternative statistical software to run each algorithm.
In the following examples, the ML term “labels” refers to the equivalent statistical term “y = dependent variable”. The ML term “features” refers to the equivalent statistical term “x = independent variables”. ML training datasets teach ML algorithms how to make predictions for ML testing datasets.
Example a. Naïve Bayes. This classification algorithm uses training datasets to predict the classification of the members of a testing dataset. The following examples predict the poverty index classification of households and villages in the Gertler dataset.
Wikipedia definition: https://en.wikipedia.org/wiki/Naive_Bayes_classifier
The following image (Smola and Vishwanathan, 2008) displays this algorithm.

The following image (Bontempi, 2017) displays the underlying mathematics.


The impact evaluation question addressed by this example is:
How accurately can the poverty status of individual households and villages be predicted?
The following image shows that the dependent variable, poverty_index, for the dataset used in this example has not yet been classified into categories. The 2nd row passes the following instructions to the algorithm: 
	column1: laplacian=true (use Laplacian data transformation), 
	column4: qcategory (classify the poverty_index into categories that can be defined by Indicator.Q1 to Q5). 
This dataset used all of the rows in the Gertler dataset but only a subset of columns. Up to 5 categories can be classified this way by filling in properties for Q1 to Q5. Example 8A demonstrates that the category names will be taken from the corresponding Q1Unit to Q5Unit properties.
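The two instructions above can be sketched in Python under explicit assumptions. The sketch treats laplacian=true as add-one (Laplace) smoothing of the conditional counts and treats qcategory as binning a numeric label by upper bounds; how DevTreks actually derives category boundaries from Q1 to Q5 is not shown in this reference, so the bounds, category names, features, and data below are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def to_category(value, upper_bounds, names):
    """Map a numeric value to the first category whose upper bound it does not exceed."""
    for bound, name in zip(upper_bounds, names):
        if value <= bound:
            return name
    return names[-1]


def train_naive_bayes(rows, labels):
    """Count label frequencies and per-feature value frequencies within each label."""
    label_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (feature index, label) -> value counts
    feature_values = defaultdict(set)    # feature index -> distinct values seen
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
            feature_values[i].add(value)
    return label_counts, value_counts, feature_values


def predict(model, row, laplacian=True):
    """Return the label with the highest log posterior for one row of features."""
    label_counts, value_counts, feature_values = model
    total = sum(label_counts.values())
    k = 1 if laplacian else 0  # add-one smoothing when laplacian is True
    best_label, best_score = None, -math.inf
    for label, count in label_counts.items():
        score = math.log(count / total)
        for i, value in enumerate(row):
            numerator = value_counts[(i, label)][value] + k
            denominator = count + k * len(feature_values[i])
            if numerator == 0:  # unseen value and no smoothing: probability is zero
                score = -math.inf
                break
            score += math.log(numerator / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical category names and upper bounds for the poverty_index label.
names = ["poor", "near_poor", "non_poor"]
bounds = [40, 60]
poverty_index = [30, 35, 38, 55, 70, 65]
features = [("dirt", "large"), ("dirt", "large"), ("dirt", "small"),
            ("solid", "large"), ("solid", "small"), ("solid", "small")]

labels = [to_category(v, bounds, names) for v in poverty_index]
model = train_naive_bayes(features, labels)
print(predict(model, ("dirt", "large")), predict(model, ("solid", "small")))
```

Without smoothing, a feature value that never appears with a category in the training data would force that category's probability to zero; add-one smoothing avoids this by giving every value a small nonzero count.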

Examples
https://web.stanford.edu/class/cs124/lec/naivebayes.pdf
Example b. Deep Neural Network. This algorithm uses training datasets to predict the value of a dependent variable (label) for the members of a testing dataset. The following examples predict a household health care expenditure category (high costs, medium costs, low costs) for households and villages in the Gertler dataset.
Wikipedia definitions: 
https://en.wikipedia.org/wiki/Deep_learning
https://en.wikipedia.org/wiki/Artificial_neural_network
The impact evaluation question addressed by this example is:
How accurately can the approximate size of household health care expenditures be predicted for individual households and villages?
In the following dataset, the label is health_expenditures. The features are the hhsize, dirtfloor, land, and genderhh parameters. By 2.1.4 conventions, the final 3 rows of the Test TEXT dataset are used to fill in Indicator.meta properties.

Examples:
https://github.com/Microsoft/CNTK/blob/master/Tutorials/CNTK_101_LogisticRegression.ipynb
Example c. Time Series Neural Network. This deep learning algorithm uses training datasets to predict the health care expenditures for upcoming years for households and villages in the Gertler dataset.
Wikipedia definitions:
https://en.wikipedia.org/wiki/Time_series
https://en.wikipedia.org/wiki/Artificial_neural_network
The impact evaluation question addressed by this example is:
What are the trends in household expenditures for individual poor households and villages?
The following table shows that Gertler’s dataset has been modified to include time series data. Example b’s dataset was converted to time series data by adding 4 columns of health expenditures collected for 4 quarters (±5% and ±10% of actual health cost).
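The construction of the 4 quarterly columns can be sketched as follows. The order of the adjustments (−10%, −5%, +5%, +10%), the rounding to cents, and the function name add_quarters are assumptions made for illustration; the reference states only that the columns are ±5% and ±10% of the actual health cost.

```python
def add_quarters(expenditure, factors=(0.90, 0.95, 1.05, 1.10)):
    """Return 4 quarterly values derived from one actual health expenditure."""
    # Assumed ordering: -10%, -5%, +5%, +10% of the actual cost, rounded to cents.
    return [round(expenditure * factor, 2) for factor in factors]


# Hypothetical actual health expenditure of $20.00 for one household.
quarters = add_quarters(20.0)
print(quarters)
```

Series generated this way are deterministic functions of a single observed value, so they are useful for exercising a time series algorithm but carry no additional information about real trends.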

Examples:
https://www.bankofengland.co.uk/-/media/boe/files/working-paper/2017/machine-learning-at-central-banks.pdf?la=en&hash=EF5C4AC6E7D7BDC1D68A4BD865EEF3D7EE5D7806


 DevTreks –social budgeting that improves lives and livelihoods

