STEP 1: Choose a data set that you would like to work with.
After analyzing the data sets in the suggested codebooks, I decided to use Gapminder. I chose this data set because it includes subjects related to my areas of interest.
STEP 2: Identify a specific topic of interest.
There are many different subjects in Gapminder, but the one that most drew my attention was “Breast cancer, new cases per 100,000 women”, because this disease was the subject of my master's degree research.
STEP 3: Prepare a codebook of your own.
The codebook below was created from the Gapminder codebook. It contains the details of all the variables for both of my topics.
Variable Name | Description of Indicator | Main Source |
---|---|---|
breastCancerAll | Total number of new female breast cancer cases in 2002. | IARC (International Agency for Research on Cancer) |
breastCancer100th | Number of new breast cancer cases per 100,000 female residents in 2002. | IARC (International Agency for Research on Cancer) |
meanSugarPerson | Mean food consumption of sugar and sweeteners (grams per person per day) between 1961 and 2002. | FAO modified |
meanFoodPerson | Mean total food supply (kilocalories per person per day) available in a country, divided by the population and by 365 (the number of days in a year), between 1961 and 2002. | FAO modified |
meanCholesterol | Average of the mean TC (total cholesterol) of the female population, in mmol/L (calculated as if each country had the same age composition as the world population), between 1980 and 2002. | MRC-HPA Centre for Environment and Health |
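The averaged variables in this codebook can be written compactly as follows. The symbols are introduced here only for illustration: $x_{c,t}$ is the value of an indicator for country $c$ in year $t$, and $T_c$ is the set of years with data for that country from the start of the data set up to 2002. For meanFoodPerson, the yearly value is the total food supply $K_{c,t}$ (kilocalories) divided by the population $P_{c,t}$ and by 365.

$$
\text{mean}_c = \frac{1}{|T_c|}\sum_{t \in T_c} x_{c,t},
\qquad
x^{\text{food}}_{c,t} = \frac{K_{c,t}}{P_{c,t} \times 365}
$$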
STEP 4: Identify a second topic that you would like to explore in terms of its association with your original topic.
Looking through Gapminder again, I found three topics that I thought could be linked to the incidence of new breast cancer cases:
- Sugar per person (g per day);
- Food supply (kilocalories / person & day);
- Cholesterol (fat) in blood, women (mmol/L).
Each data set is organized by country and year. The first data set (Sugar per person) covers 1961 to 2004, the second (Food supply) covers 1961 to 2007, and the last (Cholesterol) covers 1980 to 2008. Because the breast cancer data set refers only to 2002, I decided to average each country's yearly values from the first year of each data set through 2002. Since all three topics relate to diet, this will allow me to explore whether there is a relationship between diet and the incidence of breast cancer.
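The averaging step described above can be sketched in Python as follows. This is a minimal illustration, not the procedure actually used: it assumes each Gapminder indicator has already been loaded into a dictionary mapping each country to `{year: value}`, with missing years stored as `None`. The function name `average_until` and the toy numbers are hypothetical.

```python
from statistics import mean

# Assumed (hypothetical) layout: {country: {year: value}}, with None for missing years.
CUTOFF_YEAR = 2002  # the breast cancer data set covers 2002 only


def average_until(rows, cutoff=CUTOFF_YEAR):
    """Average each country's values from its first year up to and including `cutoff`."""
    result = {}
    for country, by_year in rows.items():
        values = [v for year, v in by_year.items()
                  if year <= cutoff and v is not None]
        # Countries with no usable years before the cutoff are left out
        # instead of being given an artificial value of zero.
        if values:
            result[country] = mean(values)
    return result


# Toy numbers for illustration only (not Gapminder data).
toy_sugar = {
    "Country A": {2000: 10.0, 2001: 20.0, 2002: 30.0, 2003: 1000.0},
    "Country B": {2001: None, 2002: 4.0},
    "Country C": {2003: 5.0},
}
print(average_until(toy_sugar))  # {'Country A': 20.0, 'Country B': 4.0}
```

Years after 2002 (such as 2003 for Country A) are ignored, so every averaged variable describes the period up to the year of the breast cancer data.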
My research can therefore address up to three related questions:
- Is sugar consumption related to the incidence of breast cancer?
- Is the quantity of food supplied related to the incidence of breast cancer?
- Is blood cholesterol related to the incidence of breast cancer?
I will focus on the first question initially and try to answer all three during the course.
STEP 5: Add questions/items/variables documenting this second topic to your personal codebook.
Done in the codebook above.
STEP 6: Perform a literature review to see what research has been previously done on this topic.
Breast cancer is the second leading cause of death among women worldwide.
For my review, I searched Google Scholar using the following terms:
- “breast cancer diet”
- “breast cancer sugar”
From the results, I selected two studies:
Study [1] investigates breast cancer incidence and mortality in relation to known risk factors and dietary practices. To do so, it draws on information from different data sets, such as height, weight, and food consumption. The information gathered consists of average values for several countries. The study found that height and weight are both highly correlated with total fat consumption, and that total fat consumption is the variable most highly correlated with the mortality rates.
Study [2] calculated multivariate odds ratios and population attributable risks for breast cancer associated with dietary β-carotene and vitamin E intake, alcohol consumption, physical activity, and, for postmenopausal women, body mass index. The data came from a case-control study conducted in Italy from June 1991 through April 1994. The study found that the risks associated with alcohol and β-carotene intake were larger among premenopausal women, while the risk related to physical activity was larger among postmenopausal women. Finally, the study indicates that about one third of the breast cancer cases in this Italian population could be avoided through intervention on a few selected, modifiable risk factors: reducing alcohol intake, eating a diet richer in fruit, vegetables, and vegetable oil, and increasing physical activity.
The relationship between diet and the incidence of breast cancer has already been explored. Some of the studies found analyze food consumption in general terms; others examine only one country. It would therefore be interesting to explore this relationship further and study whether sugar consumption is directly correlated with the incidence of breast cancer.
References
[1] Gray, G. E., Pike, M. C., & Henderson, B. E. (1979). Breast-cancer incidence and mortality rates in different countries in relation to known risk factors and dietary practices. British Journal of Cancer, 39(1), 1-7.
[2] Mezzetti, M., La Vecchia, C., Decarli, A., Boyle, P., Talamini, R., & Franceschi, S. (1998). Population attributable risk for breast cancer: diet, nutrition, and physical exercise. Journal of the National Cancer Institute, 90(5), 389-394.
STEP 7: Based on your literature review, develop a hypothesis about what you believe the association might be between these topics.
Hypothesis:
As the review suggests, a healthy diet can be an important factor in preventing this disease.
- I believe that sugar consumption is positively correlated with the incidence of breast cancer.
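One way to check this hypothesis is to compute the Pearson correlation coefficient r between meanSugarPerson and breastCancer100th across countries. The Python sketch below is illustrative only: it assumes both variables are dictionaries keyed by country name, and the helper names and toy values are hypothetical. A positive r would be consistent with the hypothesis, but it would not show causation, and because the values are country averages the comparison is ecological rather than individual-level. (In Python 3.10 and later, `statistics.correlation` computes the same Pearson coefficient.)

```python
from math import sqrt


def paired_values(sugar, cancer):
    """Keep only countries present in both indicators, in a fixed (sorted) order."""
    common = sorted(set(sugar) & set(cancer))
    return [sugar[c] for c in common], [cancer[c] for c in common]


def pearson(xs, ys):
    """Pearson correlation coefficient r between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return cov / (sx * sy)


# Toy per-country values for illustration only (not Gapminder data).
mean_sugar_person = {"Country A": 1.0, "Country B": 2.0, "Country C": 3.0, "Country D": 9.0}
breast_cancer_100th = {"Country A": 2.0, "Country B": 4.0, "Country C": 6.0, "Country E": 1.0}

xs, ys = paired_values(mean_sugar_person, breast_cancer_100th)
r = pearson(xs, ys)
print(f"r = {r:.2f}")  # r = 1.00 for these perfectly linear toy values
```

Countries missing from either indicator (Country D and Country E here) are dropped before r is computed, so both lists always describe the same set of countries.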
Review Criteria
Your assessment will be based on the evidence you provide that you have completed all of the steps. When relevant, gradients in the scoring will be available to reward clarity (for example, you will get one point for having a research question presented in an unclear fashion, but two points for being clear). In all cases, consider that the peer assessing your work is likely not an expert in the field you are analyzing. You will be assessed equally on your reflection of the literature you’ve discovered and the project you are proposing.
Specific rubric items, and their point values, are as follows:
- Has the learner selected a data set and indicated that selection? (1 point)
- Has the learner clearly stated a research question and hypothesis? (2 points)
- Does the literature review include clear information about search terms used? (1 point)
- Does the literature review clearly identify references used? (2 points)
- Does the literature review clearly present a summary of findings (e.g., variables considered, patterns of findings, etc.)? (2 points)