CSRHub Blog Research on ESG metrics and comments on sustainability best practice

A Big Data Approach to Gathering CSR Data

Oct 28, 2014 9:19:20 AM / by Bahar Gidwani

The following is part 2 of a 3-part series on “Big Data.”


We have previously defined “Big Data” and shown how we feel a Big Data system built by CSRHub could help address some problems that exist in collecting corporate social responsibility (CSR) and sustainability data on companies.  We have also further described the problems with the currently dominant method of gathering this data—an analyst-based method.

CSRHub uses input from investor-driven sources (known as “ESG” for Environment, Social, and Governance or “SRI” for Socially Responsible Investment), non-governmental organizations, government organizations, and “crowd sources” to construct a 360-degree view of a company’s sustainability performance.

The illustration below shows the steps in our process.

CSRHub Ratings

The steps are:

  •  A. Convert the measurements from each data source into a 0 (low) to 100 (high) scale.  This requires understanding how each source evaluates company performance.
  •  B. Connect each rating element with one or more of our twelve subcategory ratings.  (Some elements may also map partially or exclusively to special issues such as animal testing, fracking, or nuclear power.)
  •  C. Compare each source’s ratings with those for all other sources.  Each company we study gives us more opportunities to compare one source’s ratings with another.  The total number of comparisons possible is very large and growing exponentially.  We use the results of our comparison to adjust the distribution of scores for each rating source so that they fall into a “beta” distribution that has a central peak around 50.
  •  D. Weight the sources.  Some sources match up well with all of our other data.  Some sources don’t line up.  We add weight to those that match well but continue to “count” those that don’t.

We then repeat steps A to D as many times as needed to find a “best fit” for the available data.  Each time we add a new source, we go through an initial mapping, normalization, and weighting process.
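To make the loop above more concrete, here is a minimal Python sketch of steps A, C, and D. It is an illustration only, not CSRHub's actual algorithm: the linear rescaling, the weight formula `1 / (1 + gap)`, the function names, and the sample sources and scores are all assumptions made for this example, and the “beta” distribution adjustment is not modeled.

```python
from statistics import fmean


def rescale(value, low, high):
    """Step A (assumed linear): map a source's native scale onto 0-100."""
    return 100.0 * (value - low) / (high - low)


def fit_weights(ratings, rounds=50, tol=1e-6):
    """Repeat steps C and D until the source weights stop changing.

    ratings: {source: {company: score on the 0-100 scale}}
    Returns (weights, consensus), where consensus is {company: score}.
    """
    weights = {s: 1.0 for s in ratings}
    companies = {c for scores in ratings.values() for c in scores}
    consensus = {}
    for _ in range(rounds):
        # Step C: weighted consensus per company over the sources covering it.
        for c in companies:
            covering = [s for s in ratings if c in ratings[s]]
            total = sum(weights[s] for s in covering)
            consensus[c] = sum(weights[s] * ratings[s][c] for s in covering) / total
        # Step D: sources close to the consensus gain weight. The "+ 1.0"
        # keeps every weight positive, so outliers still "count".
        new = {}
        for s, scores in ratings.items():
            gap = fmean(abs(v - consensus[c]) for c, v in scores.items())
            new[s] = 1.0 / (1.0 + gap)
        # Rescale so that the weights average 1.0.
        scale = len(new) / sum(new.values())
        new = {s: w * scale for s, w in new.items()}
        converged = max(abs(new[s] - weights[s]) for s in new) < tol
        weights = new
        if converged:
            break
    return weights, consensus


# Hypothetical sources, each on its own native scale.
raw = {
    "source_a": {"X": 4.0, "Y": 2.0, "Z": 3.0},     # 1 to 5 scale
    "source_b": {"X": 78.0, "Y": 40.0, "Z": 55.0},  # already 0 to 100
    "source_c": {"X": 0.2, "Y": 0.9, "Z": 0.5},     # 0 to 1 scale, disagrees
}
scales = {"source_a": (1, 5), "source_b": (0, 100), "source_c": (0, 1)}
ratings = {
    s: {c: rescale(v, *scales[s]) for c, v in scores.items()}
    for s, scores in raw.items()
}
weights, consensus = fit_weights(ratings)
```

In this toy data set, `source_c` disagrees with the other two sources, so it ends up with the smallest weight while still contributing to the consensus scores.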

An Example

A specific example may help explain our data analysis process.  Hewlett Packard is a heavily tracked company.  We have 154 sources of data for this company that together provide 17,571 individual data elements.  Only 62 of these data sources provided data for our July 1, 2014 rating; the rest provided data for previous periods (our data set goes back to 2008).  The 62 current data sources provided 575 different types of rating elements and a total of 610 different ratings values that do not affect/apply to special issues.

After their conversion to our 0 to 100 scale, we map the rating elements into our twelve subcategories.  We now have 1,403 ratings factors.  We selected our subcategories to allow an even spread of data across them. You can see that we have a reasonably even spread for Hewlett Packard:

| CSRHub Category | Number of Data Elements |
| --- | --- |
| Board | 95 |
| Community Dev & Philanthropy | 78 |
| Compensation & Benefits | 63 |
| Diversity & Labor Rights | 95 |
| Energy & Climate Change | 149 |
| Environment Policy & Reporting | 154 |
| Human Rights & Supply Chain | 77 |
| Leadership Ethics | 205 |
| Product | 93 |
| Resource Management | 156 |
| Training, Health & Safety | 48 |
| Transparency & Reporting | 190 |
| Total | 1,403 |

Before we can present a rating, we need to check first that we have enough sources and enough “weight” from the sources we have, to generate a good score.  In general, we require at least two sources that have good strength or three or four weaker ones, before we offer a rating.  As you can see, we have plenty of sources to rate a big company such as HP.
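One way to picture this check is as a threshold on the total weight of the available sources. The short Python sketch below is hypothetical: the function name, the threshold of 2.0, and the sample weights are assumptions chosen so that two strong sources or three to four weaker ones pass, and they are not CSRHub's actual values.

```python
def can_rate(source_weights, required_weight=2.0):
    """Return True when the combined source weight reaches the threshold.

    required_weight is an assumed value for illustration only.
    """
    return sum(source_weights) >= required_weight


two_strong = can_rate([1.0, 1.0])             # two sources with good strength
three_weaker = can_rate([0.7, 0.7, 0.7])      # three moderately weak sources
four_weaker = can_rate([0.5, 0.5, 0.5, 0.5])  # four weak sources
too_few = can_rate([0.5, 0.5, 0.5])           # not enough weight to rate
```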

CSRHub subcategory sources

Even after normalization, the curve of ratings for any one subcategory may have a lot of irregularities.  However, we have enough data to provide a good estimate of the midpoint of the available data, for those ratings we report.  Below you can see that some sources have a high opinion of HP’s board while others have a less favorable view.  The result is a blended score that averages to less than the more uniform Leadership Ethics rating.

CSRHub subcategory rating variations


The overall effect of our process is to smooth out the rating inputs and make them more consistent.  As you can see in the illustration below, the final ratings distribution is organized well around a central peak.  The average overall rating of 64 is below the peak, which is around 80.  The original average rating was 61.

analysis charts part 2

By making a few assumptions about how the errors in data are distributed, one can assess the accuracy of ratings.  In a previous post, we showed that CSRHub’s overall rating accurately represents the values that underlie it to within 1.8 points at a 95% confidence level.
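As a rough illustration of this kind of accuracy estimate, the Python sketch below computes the half-width of a 95% confidence interval for the mean of a set of ratings. It assumes independent, roughly normally distributed errors and uses the standard normal multiplier of 1.96. The sample values are hypothetical, and the sketch is not meant to reproduce the 1.8-point figure from the earlier post.

```python
from math import sqrt
from statistics import stdev


def margin_of_error(values, z=1.96):
    """Half-width of the confidence interval for the mean of `values`.

    Assumes independent, roughly normal errors; z=1.96 corresponds to 95%.
    """
    return z * stdev(values) / sqrt(len(values))


# Hypothetical normalized ratings for one company.
sample = [60, 62, 64, 66, 68]
half_width = margin_of_error(sample)
```

Under these assumptions, adding more sources narrows the interval, because the half-width shrinks with the square root of the number of values.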

In our next post, we will discuss the benefits and drawbacks of using this complex and data-intensive approach to measuring company CSR performance.

See part 1, Using “Big Data” to Rate Corporate Social Responsibility: One Company’s Approach.


Bahar Gidwani is CEO and Co-founder of CSRHub.  He has built and run large technology-based businesses for many years. Bahar holds a CFA, worked on Wall Street with Kidder, Peabody, and with McKinsey & Co. Bahar has consulted to a number of major companies and currently serves on the board of several software and Web companies. He has an MBA from Harvard Business School and an undergraduate degree in physics and astronomy. Bahar is a member of the SASB Advisory Board.  He plays bridge, races sailboats, and is based in New York City.

CSRHub provides access to corporate social responsibility and sustainability ratings and information on 9,200+ companies from 135 industries in 106 countries. By aggregating and normalizing the information from 348 data sources, CSRHub has created a broad, consistent rating system and a searchable database that links millions of rating elements back to their source. Managers, researchers and activists use CSRHub to benchmark company performance, learn how stakeholders evaluate company CSR practices and seek ways to change the world.

 


A Big Data Approach to Gathering CSR Data

Sep 26, 2012 9:45:50 AM / by Bahar Gidwani

The following is part 2 of a 3-part series on “Big Data.”


We have previously defined “Big Data” and shown how we feel it could help address some problems that exist in collecting corporate social responsibility (CSR) and sustainability data on companies.  We have also further described the problems with the currently dominant method of gathering this data—an analyst-based method.

CSRHub uses input from investor-driven sources (known as “ESG” for Environment, Social, and Governance or “SRI” for Socially Responsible Investment), non-governmental organizations, government organizations, and “crowd sources” to construct a 360-degree view of a company’s sustainability performance.  To better understand this process, let’s consider an example.

Hewlett Packard is a heavily tracked company.  We have 56 sources of data for this company that together contribute 494 different rating elements.  We map each of these elements into one of twelve different CSR subcategories.  For instance, here are mappings for 20 of the elements that contribute to the Hewlett Packard rating:

| Description of Data Element | Subcategory Mapping | Source |
| --- | --- | --- |
| Participant in the Walmart Sustainability Assessment | Environment Policy & Reporting | Carbon Disclosure Project 2010 Full Data |
| Better World product rating | Product | Better World Companies |
| Board Structure/Board Diversity | Board | Thomson Reuters Asset4 |
| Commitment to Society and to Human Rights Protection Policies | Leadership Ethics | ISOS Group Assessments |
| Committed to improving sustainability performance | Human Rights & Supply Chain | BSR Member |
| Corporate Governance Rank | Transparency & Reporting | CR’s 100 Best Corporate Citizens 2011 |
| Green House Gas (GHG) Footprint | Energy & Climate Change | Trucost |
| Human Rights/Child and Forced Labor Issues | Community Dev & Philanthropy | MSCI ESG Intangible Value Assessment |
| Member of the Electronic Industry Citizenship Coalition | Human Rights & Supply Chain | Electronic Industry Citizenship Coalition |
| Most Admired Companies for Minority Professionals in 2011 | Diversity & Labor Rights | BlackEngineer Most Admired Companies 2011 |
| North America 300 Carbon Rank | Energy & Climate Change | Environmental Investment Organisation |
| Number of corporate sustainability reports issued | Transparency & Reporting | CorporateRegister.com |
| Number of EPEAT certified products | Environment Policy & Reporting | EPEAT |
| On FCPA Corporate Investigations List | Leadership Ethics | FCPA Corporate Investigations |
| Same-sex benefits | Compensation & Benefits | IW Financial |
| Statement references corruption | Leadership Ethics | UN Global Compact 2010 |
| Top 100 most accountable companies according to AccountAbility | Transparency & Reporting | AccountAbility |
| Top 50 Socially Responsible | Environment Policy & Reporting | Top 50 Socially Responsible |
| Supports UN Drugs and Crime Anti-Corruption Measures | Leadership Ethics | UN Office on Drugs and Crime Anti-Corruption Measures |
| Working Mother list 2010 | Compensation & Benefits | Working Mother List 2010 |

Some of these data elements could map to more than one subcategory.  For instance, a company that is on the list of “Best Workplaces for Commuters” would get credit both for its energy saving effort (in “Energy & Climate Change”) and for the benefit its programs bring to its employees (in “Compensation & Benefits”).
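A simple way to picture this one-to-many mapping is a table that lists, for each data element, every subcategory it feeds. The Python sketch below is an assumed data layout for illustration, not CSRHub's actual schema. The first three entries come from the table above; “Best Workplaces for Commuters” is the general example from the text and is not necessarily one of HP's elements.

```python
from collections import Counter

# Each data element maps to one or more subcategories (assumed layout).
element_to_subcategories = {
    "Board Structure/Board Diversity": ["Board"],
    "Green House Gas (GHG) Footprint": ["Energy & Climate Change"],
    "Same-sex benefits": ["Compensation & Benefits"],
    "Best Workplaces for Commuters": [
        "Energy & Climate Change",
        "Compensation & Benefits",
    ],
}

# Count how many element-to-subcategory links each subcategory receives.
links_per_subcategory = Counter(
    sub for subs in element_to_subcategories.values() for sub in subs
)
total_elements = len(element_to_subcategories)
total_links = sum(links_per_subcategory.values())
```

Here four elements produce five links, which shows how the number of mappings can exceed the number of data elements.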

The list above includes examples of each of the three main contributors to the system: Investment-related sources (Asset4/Thomson Reuters, Carbon Disclosure Project, GovernanceMetrics International/Corporate Library, IW Financial, MSCI, Trucost, Vigeo); Activists and NGOs (Accountability, BSR, CorporateRegister, CR 100, EIO, FCPA, Top 50 Socially Responsible); and Government & Consumer (Better World, Black Engineer, EICC, EPEAT, UN Global Compact, UNODC, Working Mother).  The completed mapping process connects the 494 data elements from the 56 sources for HP into our twelve subcategories in 971 different ways.

| Subcategory | Investment-Related | Activists & NGOs | Government & Consumer | Total By Subcategory |
| --- | --- | --- | --- | --- |
| Board | 67 | 12 | 1 | 80 |
| Community Dev & Philanthropy | 39 | 16 | 10 | 65 |
| Compensation & Benefits | 34 | 6 | 6 | 46 |
| Diversity & Labor Rights | 51 | 8 | 13 | 72 |
| Energy & Climate Change | 37 | 44 | 15 | 96 |
| Environment Policy & Reporting | 46 | 69 | 14 | 129 |
| Human Rights & Supply Chain | 40 | 19 | 8 | 67 |
| Leadership Ethics | 80 | 27 | 14 | 121 |
| Product | 56 | 8 | 6 | 70 |
| Resource Management | 46 | 31 | 11 | 88 |
| Training, Health & Safety | 30 | 6 | 2 | 38 |
| Transparency & Reporting | 51 | 30 | 18 | 99 |
| Total By Type | 577 | 276 | 118 | 971 |

While investment-related sources contribute more data elements than the other types, there are at least some of each type present in each subcategory.  Another way to look at this is to see that many sources contribute to each subcategory:

| Subcategory | Number of Sources | Total Elements |
| --- | --- | --- |
| Board | 11 | 80 |
| Community Dev & Philanthropy | 21 | 65 |
| Compensation & Benefits | 18 | 46 |
| Diversity & Labor Rights | 23 | 72 |
| Energy & Climate Change | 24 | 96 |
| Environment Policy & Reporting | 25 | 129 |
| Human Rights & Supply Chain | 23 | 67 |
| Leadership Ethics | 29 | 121 |
| Product | 14 | 70 |
| Resource Management | 24 | 88 |
| Training, Health & Safety | 13 | 38 |
| Transparency & Reporting | 25 | 99 |

Each value from each data element is converted into a zero to 100 rating (zero = lowest, 100 = highest).  These scores are then adjusted by comparing them to each other.  In the example above, there are 11 sources for HP’s board performance.  Suppose three of them gave it a great rating, six a medium rating, and two a poor one.  Computer analytics would guess that the six scores that agree are correct and that HP’s board rating is in the medium range.  The assumption is that three sources tended to be biased towards high scores and two towards low scores.  This chart shows the actual distribution of scores at the subcategory level, along with a calculation of the “normal” error curve that results.

CSRHub big data

When the analysis is repeated across thousands of companies, a picture emerges as to which sources tend to be overly positive or negative and which tend to predict the “mean” of the other sources.  All sources can be adjusted, based on this feedback—moving them up or down so they more accurately match the opinion of all other sources.  After a large number of iterations in this process, there is a consensus score for each subcategory for each company analyzed.
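The board example and the feedback adjustment described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not CSRHub's production method: it uses the median as the consensus, treats each source's bias as a constant offset, and all scores, source names, and function names are hypothetical.

```python
from statistics import fmean, median

# Hypothetical scores from 11 board sources: 3 high, 6 medium, 2 low.
board_scores = [90, 88, 85, 55, 54, 52, 50, 50, 48, 20, 15]
board_consensus = median(board_scores)  # lands inside the medium cluster


def estimate_bias(ratings, rounds=20):
    """Estimate a constant offset per source relative to the consensus.

    ratings: {source: {company: score on the 0-100 scale}}
    Returns {source: offset}; a positive offset means the source runs high.
    """
    bias = {s: 0.0 for s in ratings}
    companies = {c for scores in ratings.values() for c in scores}
    for _ in range(rounds):
        # Consensus per company, using the bias-corrected scores.
        consensus = {
            c: median(ratings[s][c] - bias[s] for s in ratings if c in ratings[s])
            for c in companies
        }
        # A source's bias is its average gap from the consensus.
        bias = {
            s: fmean(v - consensus[c] for c, v in scores.items())
            for s, scores in ratings.items()
        }
    return bias


# Three hypothetical sources rating the same three companies.
ratings = {
    "rosy": {"A": 50, "B": 70, "C": 80},     # tends to score high
    "neutral": {"A": 40, "B": 60, "C": 70},
    "harsh": {"A": 30, "B": 50, "C": 60},    # tends to score low
}
bias = estimate_bias(ratings)
adjusted = {
    s: {c: v - bias[s] for c, v in scores.items()}
    for s, scores in ratings.items()
}
```

In this toy case the adjustment moves the “rosy” source down and the “harsh” source up, so that all three sources agree on each company after correction.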

CSRHub ratings process

By making a few assumptions about how the errors in data are distributed, one can assess the accuracy of ratings.  In a previous post, we showed that CSRHub’s overall rating accurately represents the values that underlie it to within 1.8 points at a 95% confidence level.

In our next post, we will discuss the benefits and drawbacks of using this complex and data-intensive approach to measuring company CSR performance.


Bahar Gidwani is a Cofounder and CEO of CSRHub. Formerly, he was the CEO of New York-based Index Stock Imagery, Inc, from 1991 through its sale in 2006. He has built and run large technology-based businesses and has experience building a multi-million visitor Web site. Bahar holds a CFA, was a partner at Kidder, Peabody & Co., and worked at McKinsey & Co. Bahar has consulted to both large companies such as Citibank, GE, and Acxiom and a number of smaller software and Web-based companies. He has an MBA (Baker Scholar) from Harvard Business School and a BS in Astronomy and Physics (magna cum laude) from Amherst College. Bahar races sailboats, plays competitive bridge, and is based in New York City.

CSRHub provides access to corporate social responsibility and sustainability ratings and information on nearly 5,000 companies from 135 industries in 65 countries. By aggregating and normalizing the information from over 170 data sources, CSRHub has created a broad, consistent rating system and a searchable database that links millions of rating elements back to their source. Managers, researchers and activists use CSRHub to benchmark company performance, learn how stakeholders evaluate company CSR practices and seek ways to change the world.

 

