Team 13 Mission 1

Potential questions that Team 13 would like to explore:


Notes: As we add to the list, maybe we should start choosing the top 3 key questions that we would like to answer from the data, or the factors that we think have the biggest impact? We could agree that between us first and then see if we need any other related information. (Vilinda)
Gus had noted this key impact factor, which may be useful as a starting point:

Data needed: Country GDP per capita, by year (the World Bank indicator linked below is GDP per capita in current US$)

Location: http://data.worldbank.org/indicator/NY.GDP.PCAP.CD

Reason: Explore relationship between CO2 emissions and GDP

Team 13 (Team members - Vilinda, Gus and Shrividya)

Mission 1: Questions
1. Are CO2 emissions related to the wealth of the nation or GDP per capita?
Team action(s): Explore relationship between CO2 emissions and GDP.
Also explore the relationship after adjusting for each country's population (per capita).

2. Is a decline in CO2 emissions related to (a) economic growth, (b) energy use, (c) fuel type and/or (d) government policy or other initiatives put in place by the country?
Team action(s): Explore the decline in CO2 emissions in relation to possible key impact factors.

3. Are geographical factors related to CO2 emissions? Do countries that endure colder winters or warmer climates have more or less CO2 emissions?
Team action(s): Explore the relationship between geography and CO2 emissions, and possibly climate change factors.
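
A possible starting point for question 1, sketched in Python (standard library only) rather than in R or Excel. It converts totals to per capita values and computes a Pearson correlation between two columns. The function names `per_capita` and `pearson` are made up for illustration, and the numbers are placeholders, not real CO2 or GDP figures. A strong correlation would only show that the two move together; it would not show that one causes the other.

```python
import math


def per_capita(total, population):
    """Divide a country total by its population."""
    if population <= 0:
        raise ValueError("population must be positive")
    return total / population


def pearson(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Placeholder values only (NOT real data): one entry per country.
gdp_per_capita = [1.0, 2.0, 3.0, 4.0]
co2_per_capita = [2.0, 4.0, 6.0, 8.0]

r = pearson(gdp_per_capita, co2_per_capita)
print(f"correlation between GDP per capita and CO2 per capita: {r:.2f}")
```

The same calculation could be repeated with total emissions instead of per capita emissions to see whether adjusting for population changes the picture.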

Some of these questions could be explored through a case study of a specific country?

General Question: It is likely that a combination of various factors produces changes in the rise or fall of CO2 emissions. If so, how do we unpick these factors to tell a coherent story?



Mission 2: Cleaning the Data 

Some comments (Shrividya)

Summary
- I have been trying to analyse and visualise the main spreadsheet in R. The mission is centred around spreadsheet analysis, but I believe that R is a more flexible solution. It includes a lot of functions for data munging as well as some neat visualisation libraries.

What I found
- Loading the data into R was a little non-trivial. I had to load the data and the column names separately due to poor parsing of the column names by the read.csv function. 
- To visualise the top N (N > 1) CO2 emitters in 2009, I had to remove the 'NA' values from the 2009 column alone before plotting the data. See the plot in the shared Dropbox folder for a time series of the top 5 emitters of 2009.
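
The two steps above (reading the column names separately from the data, then dropping 'NA' in a single year's column before ranking) can be sketched roughly as follows. This is written in Python with only the standard library, not the R code actually used, and it makes several assumptions: the function names `load_table` and `top_emitters`, the column name `Country`, and the tiny inline table are all hypothetical placeholders, not the real spreadsheet.

```python
import csv
import io


def load_table(header_text, data_text):
    """Read column names and data rows separately, then combine them into dicts."""
    columns = [c.strip() for c in next(csv.reader(io.StringIO(header_text)))]
    rows = []
    for values in csv.reader(io.StringIO(data_text)):
        if not values:  # skip blank lines
            continue
        rows.append(dict(zip(columns, (v.strip() for v in values))))
    return rows


def top_emitters(rows, year, n, missing=("NA", "")):
    """Drop rows whose value for `year` is missing, then return the n largest."""
    usable = []
    for row in rows:
        value = row.get(year)
        if value is None or value in missing:
            continue  # only this one column is checked for missing values
        usable.append(row)
    usable.sort(key=lambda row: float(row[year]), reverse=True)
    return usable[:n]


# Placeholder table (NOT real emissions data).
header = "Country,2008,2009"
data = "A,1.0,5.0\nB,2.0,NA\nC,3.0,7.5\n"

rows = load_table(header, data)
for row in top_emitters(rows, "2009", 2):
    print(row["Country"], row["2009"])
```

Only the chosen year's column is filtered, so a country with gaps in other years is still kept for the ranking, which matches the approach described in the note.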

To do:
- Legend for plots! (still learning R) ;-)
- Plot the contribution from different fuel sources for the top N emitters. I have downloaded the data but I still need to look at it.
- Identify important features in the time series data. Research connections between features and indicators like population, GDP etc.

Some Comments (Vilinda)
What I have done: 
Data: I have looked at the main data set (Total CO2 Emissions) and the Per Capita CO2 data. I have been trying to get acquainted with the data at a basic level, and have looked at whether the 'change in place' (last column) shows any overall pattern of decline, increase or no change. From the list provided, Down is 80, Up is 81 and Same is 59, so declines and increases are almost evenly matched. I sorted the data in a separate worksheet to do that.
I also looked at removing the missing data and N/A values in the Per Capita worksheet. Again, I removed them in a separate worksheet/workbook.
Missing Values: I have looked at how you would deal with missing values in Excel.
Duplicates: I also looked for duplicates. I couldn't find any apart from Germany, which also appears as Germany East and Germany West. I had never heard of 'Reunion' (it is an island).
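
As a cross-check on the manual sorting and duplicate search, here is a rough sketch in Python (standard library only) of how the Up/Down/Same tally and a simple duplicate check could be automated. The function names `tally_changes` and `possible_duplicates` are made up for illustration, the label list is a placeholder, and the country names are taken from the note above, so their exact spelling in the spreadsheet may differ.

```python
from collections import Counter


def tally_changes(labels):
    """Count each change label, ignoring case, surrounding spaces and blanks."""
    return Counter(
        label.strip().capitalize() for label in labels if label.strip()
    )


def possible_duplicates(names):
    """Group names that share the same first word, as candidates for review."""
    groups = {}
    for name in names:
        words = name.split()
        if not words:
            continue
        key = words[0].strip(",").lower()
        groups.setdefault(key, []).append(name)
    return {key: group for key, group in groups.items() if len(group) > 1}


# Placeholder labels (NOT the real column).
print(tally_changes(["Up", "down ", " SAME", "up", ""]))

# Names as mentioned in the note; spreadsheet spelling may differ.
print(possible_duplicates(["Germany", "Germany East", "Germany West", "Reunion"]))
```

Grouping by first word is deliberately crude: it would also flag unrelated names that happen to start with the same word, so anything it reports still needs a manual look.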

Challenges: So far the challenges I have are:
(a) There is a lot of information to take in from the main data set.
(b) My geography has never been that good, which shows when I am looking at the data without a visual :)
(c) Accuracy of doing spreadsheet actions - my lack of confidence is making me take small sections of data and look at them closely. Probably a good thing when it is not your own data!!
(d) Related to (c) above, this is more time consuming than I thought.
(e) I usually use Excel only for graphics, and very seldom, so my knowledge/skills with this package are very limited.

Will do more when I am back in the office on Wednesday.