Author Archives: Marianne Chirica

Data Sharing and Code Commenting: Best Practices for Graduate Students

Scientific computing has become increasingly important in psychological science research. However, proper data management and analytic workflow techniques are rarely taught directly to graduate students. This short piece highlights a few best practices for data sharing and code commenting that graduate students can incorporate to facilitate reproducibility and replicability.

Why Should I Share My Data and Code?

Based on concerns of poor self-correction in psychological science (Klein et al., 2018; Open Science Collaboration, 2015), much attention has been drawn to the replicability crisis, also known as the “credibility revolution” (Vazire, 2018). Accordingly, the open science movement has strengthened the scientific community’s expectation of access to key components of research (e.g., protocols, resources, data, and analysis code) in order to assess, validate, and replicate prior research (Ioannidis, 2012). Data sharing is one of several key practices that research organizations and major funders have begun to mandate (Houtkoop et al., 2018). Despite this, data sharing remains quite rare in the psychological sciences, for two primary reasons:

  • 1) a lack of knowledge about how to get started
  • 2) a lack of awareness of the benefits of data sharing, or a lack of confidence in the quality of one’s data

Here, we will focus on the latter concern; for specific guidelines on how to prepare and share your data, see Klein et al. (2018).

Benefits of Data and Code Sharing

When a researcher makes their data and code widely available, they are, in effect, enabling:

  • analytic reproducibility (i.e., statistical analyses that can be re-run to detect unintended errors or bias and to verify the logic and sequence of data analysis steps; Hardwicke et al., 2018; Wilson et al., 2017)
  • analytic robustness (i.e., alternative analytic decisions that may be used to verify results)
  • analytic replication (i.e., replication of the same analytic steps with new data to investigate generalizability; Houtkoop et al., 2018)

Importance of Code Commenting

Data sharing should, however, be the bare minimum. Above and beyond this, making your data management and analysis code publicly available requires that the code be readable (i.e., understandable) and reproducible. There are some basic scientific computing practices that ensure research is not just reproducible, but also efficient, transparent, and accessible in the future (Crüwell et al., 2019). As an example, one way to ensure analytic code can be easily understood is to provide a detailed, commented version of the code. When you come back to modify or review code you wrote weeks, months, or even years ago, will you be able to remember what you did and what that code means? Even more important, will other people be able to understand what you did? Within an open science framework, it’s essential for other people to be able to easily interpret your code for data quality checks and reproducibility. Although incorporating code comments may seem tedious at the time, the long-run benefits afforded to your future self, your peers, your co-authors, and other researchers in the field cannot be overstated.

How to Comment your Code

Now that we’ve covered the why, let’s talk about the what. Here are some concrete steps to take when commenting your code (adapted from Wilson et al., 2017). First, create a commented-out section, i.e., the “header,” at the top of your code. Here, create an overview of your project to self-reference. List the project title, filename, the co-authors, a description of the purpose (e.g., initialization, data cleaning, analysis), and any dependencies, including required input data files, software version, and calendar date.
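As a sketch, such a header might look like the following (shown here in a Python script for illustration, though the same idea applies in SAS or R; every project detail below is a placeholder):

```python
# ============================================================
# Project:  Graduate Student Survey Analysis (placeholder title)
# File:     01_data_cleaning.py
# Authors:  A. Student, B. Advisor
# Purpose:  Data cleaning -- recode survey items, handle missing values
# Inputs:   data/raw_survey.csv (placeholder path)
# Depends:  Python 3.11, pandas 2.x
# Date:     2024-01-15
# ============================================================
```

The exact layout matters less than consistency: anyone opening the file should immediately see what it does, what it needs, and who to ask about it.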

Then insert a table of contents that describes the sections of the code. An example table of contents can include:

  • 1) Loading in Data and Libraries
  • 2) Descriptive Statistics
  • 3) Preliminary Analysis
  • 4) Main Analysis for Aim 1
  • 5) Main Analysis for Aim 2
  • 6) Sensitivity Analysis
  • 7) Tables

Adjust this based on your project, your aims, and your workflow. It might also be helpful to include a list of all the variable names and a brief description of each variable in the dataset at the top of the code.
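For illustration, the table of contents and variable list can live in a comment block near the top of the script (again shown in Python; the section names and variable names are hypothetical):

```python
# ------------------------------------------------------------
# Table of Contents
#   1. Loading in data and libraries
#   2. Descriptive statistics
#   3. Preliminary analysis
#   4. Main analysis for Aim 1
#   5. Main analysis for Aim 2
#   6. Sensitivity analysis
#   7. Tables
#
# Variable dictionary (hypothetical dataset):
#   id     - participant identifier
#   age    - participant age in years
#   cond   - experimental condition (0 = control, 1 = treatment)
#   score  - total score on the outcome measure
# ------------------------------------------------------------
```

Matching section-divider comments (e.g., `# ---- 2. Descriptive statistics ----`) in the body of the script then let a reader jump straight to the part they need.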

Next, it’s important to record all the steps used to process the data. Find out how comments are written in the software you are using (e.g., SAS, R), and place a brief explanatory comment at the start of each data step or analytic step.

An example of commented code in SAS:

/* Load in Library */
libname gradPSYCH 'N/project/APAGS/data';

/* Designate data */
data APAGS; set gradPSYCH.APAGS; run;

/* See Contents of Data File */
proc contents data=APAGS; run;

If you carefully comment chunks of functionally related code, writing out what you are doing and why, other researchers (and your future self) will be able to easily reproduce your data steps.

Final Thoughts

As a scientist, being committed to open science means engaging in responsible data management, embracing transparency, and preparing your data for a reproducible research workflow. Publicly sharing your data, along with well-organized, carefully commented data management and analysis code, enables other researchers to engage in analytic reproducibility, analytic robustness, and analytic replication, and is a good start. Overall, these practices will also improve your personal research efficiency and external credibility.


References

  • Crüwell, S., van Doorn, J., Etz, A., Makel, M. C., Moshontz, H., Niebaum, J. C., Orben, A., Parsons, S., & Schulte-Mecklenbeck, M. (2019). Seven easy steps to open science: An annotated reading list. Zeitschrift für Psychologie, 227(4), 237-248. https://doi.org/10.1027/2151-2604/a000387
  • Hardwicke, T. E., Mathur, M. B., MacDonald, K. E., Nilsonne, G., Banks, G. C., Kidwell, M., … Frank, M. C. (2018). Data availability, reusability, and analytic reproducibility: Evaluating the impact of a mandatory open data policy at the journal Cognition. Royal Society Open Science, 5, 180448. https://doi.org/10.1098/rsos.180448
  • Houtkoop, B. L., Chambers, C., Macleod, M., Bishop, D. V. M., Nichols, T. E., & Wagenmakers, E.-J. (2018). Data sharing in psychology: A survey on barriers and preconditions. Advances in Methods and Practices in Psychological Science, 1(1), 70–85. https://doi.org/10.1177/2515245917751886
  • Ioannidis, J. P. (2012). Why science is not necessarily self-correcting. Perspectives on Psychological Science, 7(6), 645–654.
  • Klein, O., Hardwicke, T. E., Aust, F., Breuer, J., Danielsson, H., Mohr, A. H., Ijzerman, H., Nilsonne, G., Vanpaemel, W., & Frank, M. C. (2018). A Practical Guide for Transparency in Psychological Science. Collabra: Psychology, 4(1), 20. https://doi.org/10.1525/collabra.158
  • Open Science Collaboration. (2015). Estimating the reproducibility of psychological science. Science, 349(6251), aac4716. https://doi.org/10.1126/science.aac4716
  • Vazire, S. (2018). Implications of the credibility revolution for productivity, creativity, and progress. Perspectives on Psychological Science, 13(4), 411–417. https://doi.org/10.1177/1745691617751884
  • Wilson, G., Bryan, J., Cranston, K., Kitzes, J., Nederbragt, L., & Teal, T. K. (2017). Good enough practices in scientific computing. PLOS Computational Biology, 13(6), e1005510. https://doi.org/10.1371/journal.pcbi.1005510

The Argument Against P-Values

There is concern that a substantial proportion of published research presents largely false findings (Ioannidis, 2005). This problem stems, in part, from social science’s reliance on null hypothesis significance testing (NHST), given the incentive to achieve statistical significance (e.g., publications, grant funding). Research in the social sciences has historically adopted a frequentist perspective, primarily reporting results using a dichotomous reject-or-fail-to-reject decision strategy based on whether a test statistic surpasses a critical value and yields a statistically significant p-value (usually p < 0.05). Although useful in several ways, p-values are largely arbitrary metrics of statistical significance (Greenland et al., 2016), and they are often used incorrectly (Gelman, 2016). The use of p-values encourages a binary mindset in which effects are classified as either null or real; however, this binary outlook provides no information on the magnitude or precision of the effect. P-values can vary dramatically with the population effect size and the sample size (Cumming, 2008). This reliance on an unstable statistical foundation has been discussed in the literature (Wasserstein & Lazar, 2016), and while some journals have taken matters into their own hands (for example, Basic and Applied Social Psychology banned p-values and NHST), the field of psychology has largely failed to address the concerns raised by the use of NHST.

Research is moving toward adopting the “new statistics” as best practice, relying instead on estimation based on effect sizes, confidence intervals, and meta-analysis (Cumming, 2014). We, as graduate students in training, are in a position to push toward thinking in terms of estimation and away from dichotomously constrained interpretations. In contrast to the binary nature of p-values, a confidence interval is a set of plausible values for the parameter being estimated. Although perhaps wide, the confidence interval accurately conveys the magnitude of uncertainty in the point estimate (Cumming, 2014). For example, a 95% confidence interval for a population mean, μ, means that if the sampling procedure were repeated many times, about 95% of the intervals constructed this way would contain μ. The APA Publication Manual (APA, 2020) specifically recommends reporting results based on effect size estimates and confidence intervals rather than p-values alone. P-values are not well suited to drive our field forward in terms of the precision and magnitude of estimates. Researchers should therefore focus on what the data can tell us about the magnitude of effects and the practical significance of those results. It is important for graduate students to adopt practices that produce reproducible and reliable research. One way to do so is to move beyond p-values.

How to move beyond p-values:

  • Prioritize estimation instead of null hypothesis testing or p-values
    • Formulate research questions in terms of estimation (e.g., How large is the effect of X on Y? To what extent does X impact Y?)
  • Report confidence intervals and corresponding effect sizes
  • Include confidence intervals in figures (preferred over standard error bars)
  • Make interpretations and conclusions based on the magnitude of the effects rather than a dichotomous decision based on “statistical significance”
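To make the estimation mindset concrete, here is a minimal sketch (in Python, with made-up placeholder data) of reporting a mean difference alongside a standardized effect size and a 95% confidence interval, rather than a lone significance decision; a normal approximation is used for the interval for simplicity, though a t-based interval is preferable for small samples:

```python
from statistics import NormalDist, mean, stdev

# Hypothetical outcome scores for two groups (placeholder data)
treatment = [5.1, 6.3, 5.8, 7.0, 6.1, 5.9, 6.5, 6.8, 5.4, 6.2]
control   = [4.8, 5.2, 4.9, 5.6, 5.0, 5.3, 4.7, 5.5, 5.1, 4.6]

# Point estimate: difference in group means
diff = mean(treatment) - mean(control)

# Pooled standard deviation and Cohen's d (magnitude of the effect)
n1, n2 = len(treatment), len(control)
sp = (((n1 - 1) * stdev(treatment) ** 2 + (n2 - 1) * stdev(control) ** 2)
      / (n1 + n2 - 2)) ** 0.5
d = diff / sp

# 95% CI for the mean difference (normal approximation)
se = sp * (1 / n1 + 1 / n2) ** 0.5
z = NormalDist().inv_cdf(0.975)
ci = (diff - z * se, diff + z * se)

print(f"difference = {diff:.2f}, d = {d:.2f}, "
      f"95% CI = [{ci[0]:.2f}, {ci[1]:.2f}]")
```

Reporting all three quantities tells the reader how big the effect is and how precisely it was estimated, which a bare "p < 0.05" cannot.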

References                                                     

American Psychological Association. (2020). Publication manual of the American Psychological Association 2020: the official guide to APA style (7th ed.). American Psychological Association.

Cumming, G. (2008). Replication and p intervals: p values predict the future only vaguely, but confidence intervals do much better. Perspectives on Psychological Science, 3, 286–300. doi:10.1111/j.1745-6924.2008.00079.x

Cumming, G. (2014). The new statistics: why and how. Psychological Science, 25(1), 7–29. https://doi.org/10.1177/0956797613504966

Gelman, A. (2016). The problems with p-values are not just with p-values. The American Statistician, 70(10).

Greenland, S., Senn, S. J., Rothman, K. J., et al. (2016). Statistical tests, P values, confidence intervals, and power: a guide to misinterpretations. European Journal of Epidemiology, 31, 337–350. https://doi.org/10.1007/s10654-016-0149-3

Ioannidis, J. P. A. (2005). Why most published research findings are false. PLoS Medicine, 2(8), e124. https://doi.org/10.1371/journal.pmed.0020124

Wasserstein, R. L., & Lazar, N. A. (2016). The ASA’s statement on p-values: Context, process, and purpose. The American Statistician, 70(2), 129–133. DOI: 10.1080/00031305.2016.1154108

Written by Marianne Chirica, an APAGS Science Committee member and a third-year graduate student in the Psychological and Brain Sciences Ph.D. program at Indiana University. Feel free to reach out to Marianne with any questions you may have!

Mentorship as a Grad Student: How and Why

Mentoring is a dynamic, collaborative relationship wherein a mentor and mentee work together to facilitate a mentee’s professional development and success. As a first-generation American, my success in academia would not have been possible without the help of several mentors along the way. My current mentoring efforts are inspired by positive experiences with my own mentors, and the hope that I can pay it forward to others. Becoming a mentor can help disseminate knowledge, foster an environment for psychological safety, learning, and development, reveal the “hidden curriculum” of graduate school, and enhance diversity in higher education.

Mentoring Prospective Graduate Students

As a graduate student, there are several opportunities to become a mentor. One opportunity is during the graduate school application process. There are many programs designed to connect eager undergraduates and post-baccalaureates with mentors during the graduate school application process. Some are within certain schools or types of programs (e.g., Social Psychology; School Psychology), some are for underrepresented individuals (e.g., NextGen Psych Scholars; Project Short), and others are general programs (e.g., APSSC, APA Division 19). The structure and level of commitment will vary by the program and by each mentor/mentee relationship. Typically, the mentor will meet a couple of times with the mentee, provide helpful materials for crafting personal statements and CVs, and edit materials. Particularly salient for underrepresented mentees, the mentor can help select graduate programs that are a good match and identify scholarships and funding opportunities. If that sounds too time-intensive, there are other ways to help prospective PhD psychology applicants. See the Application Statement Feedback Program (ASFP).

Mentoring Undergraduate and Younger Graduate Students

A more traditional mentorship role in graduate school is to mentor undergraduate students in your research lab. Undergraduates often help with literature reviews, running participants, or other miscellaneous research tasks. As a mentor, you could help lead journal clubs or provide direct research opportunities within one of your research projects, carefully observing and guiding them throughout the process. For example, I am currently mentoring an undergraduate student involved in a systematic review I am working on. As a mentor, I provide didactic instruction on types of reviews and guidance on the research process.

As a graduate student, you might also serve as a teaching assistant. Teaching assistants have a more formal teaching relationship with their students but can still provide mentorship. This can look like holding office hours and meeting one-on-one with students who show interest in your research or in a PhD program.

You can also mentor students early in their graduate school careers. As you progress through the program, you gain valuable knowledge not only in your selected discipline, but also in how to be a graduate student. Younger students may benefit from any wisdom, tips, and skills you can pass on. These topics can range from time management and course selection to clinical practicum advice, advocating for oneself, and handling rejection. Things that seem like second nature now may be valuable wisdom to a first-year graduate student.

Regardless of the type of mentoring relationship, mentees are eager to learn from you. You can disseminate information about the field (e.g., different types of paths post undergrad), research (e.g., how to go about starting a literature review), classes (e.g., which classes to take to best prepare for graduate school or for a certain degree), or other areas that may help your mentee grow professionally. Although it may seem challenging, mentoring is a rewarding experience, fostering a collaborative relationship that benefits both individuals involved!

Want to become a mentor?

  1. Sign up to mentor students applying to graduate programs (see links above).
  2. Reach out to your advisor and ask if you can mentor undergraduates.
  3. Take initiative and offer to help younger graduate students.

Written by Marianne Chirica, an APAGS Science Committee member and a second-year graduate student in the Psychological and Brain Sciences Ph.D. program at Indiana University. Feel free to reach out to Marianne with any questions you may have!