PSR Data Science
FORCE11 has announced that the third annual FORCE11 Scholarly Communication Institute (FSCI) will take place at UCLA from August 5 to 9, 2019. With this move, FORCE11 begins a long-term collaboration with the UCLA Library to plan and present FSCI, and to improve understanding of and engagement with the fast-changing world of research communication on campuses everywhere. FSCI started in 2017 as a partnership between FORCE11 and the University of California at San Diego. Now setting down roots in Los Angeles, FSCI is a week-long summer school in open research for researchers, librarians, publishers, university administrators, funders, students and post-docs that incorporates intensive coursework, seminars, group activities, lectures and hands-on training. Participants learn from leading experts, discuss the latest trends, and gain expertise in new technologies. FSCI is transdisciplinary and relevant across the sciences, social sciences and humanities.
“Working together with the academic community to explore frontiers in research communications is key to changing practices,” said Ginny Steel, UCLA Norman and Armena Powell University Librarian. “The UCLA Library has been actively involved in efforts to enhance and expand scholarly discourse through openness, and the summer institute will be a valuable forum for us to consider the opportunities and challenges in concert with the international research community. We look forward to welcoming everyone in August.”
FSCI courses explore changing practice in data-sharing, authorship, peer review, research assessment, publishing and more. There are courses for those who know very little about current trends and technologies and courses for those ready to pursue advanced topics. FSCI covers scholarly communication from a variety of disciplinary, regional and international perspectives. Course information and registration will be available in the spring. To stay updated on details as they emerge, sign up to receive email updates, join the Facebook page, follow @force11rescomm on Twitter, or visit FSCI2019@UCLA online.
Apply by January 4 for RDM 102: Beyond Research Data Management for Biomedical and Health Sciences Librarians!
Biomedical and health sciences librarians are invited to participate as students or mentors in RDM 102: Beyond Research Data Management for Biomedical and Health Sciences Librarians, a rigorous NNLM online training course going beyond the basics of research data management, sponsored by the National Library of Medicine and the National Network of Libraries of Medicine Training Office (NTO). This course will expand on concepts covered in RDM 101: Biomedical and Health Research Data Management Training for Librarians. The librarian’s role in research reproducibility and research integrity will be threaded throughout the course, which will also include practice in using Jupyter notebooks through an open-source browser-based application (JupyterHub) that allows users to create and share documents that contain live code, equations, visualizations, and narrative text. The major aim of this course is to provide an introduction to the support of data science and open science, with the goal of developing and implementing or enhancing data science training and services at participants’ institutions.
Applicants must have previous training or experience in research data management, either through the RDM 101 course or by attesting that they have met its learning objectives. Applications are open to health science information professionals working in libraries located in the US or, with permission of the instructors, to persons living outside the US who have LIS training and wish to obtain a position in a US-based library. A letter of institutional support is required. Enrollment is limited to 40.
The online asynchronous component of the program lasts six weeks, running from February 20 to April 5 and including a catch-up week, followed by a synchronous online session during the week of April 8. Participants can expect to spend about six hours each week on coursework and the project. There is no charge for participating in the program. MLA CE credit will be awarded (TBD). Mentors will assume the role of a researcher with a dataset seeking data services support. They will work with groups of 4-5 mentees. Mentors will be compensated $1,000 for their time and required to submit a W-9 and a contract with the University of Utah. For more details and knowledge requirements, consult the course description link at the beginning of this message. To apply, submit the online application form and upload PDFs of a current CV and letter of institutional support by January 4, 2019. For questions, contact Shirley Zhao, RDM Project Lead and Training Development Specialist.
Significant advances in technology, coupled with decreasing costs associated with data collection and storage, have resulted in unprecedented access to vast amounts of health- and disease-related data. The National Library of Medicine and the Division of Mathematical Sciences in the Directorate for Mathematical and Physical Sciences (DMS) at the National Science Foundation (NSF) recognize the need to support research to develop innovative and transformative mathematical and statistical approaches to address important data-driven biomedical and health challenges. The goal of this interagency program is the development of generalizable frameworks combining first principles, science-driven models of structural, spatial and temporal behaviors with innovative analytic, mathematical, computational, and statistical approaches that can portray a fuller, more nuanced picture of a person’s health or the underlying processes.
Specific information concerning the application submission and review process is available from the National Science Foundation in solicitation NSF-19-500. Applicants may opt to submit proposals via Grants.gov or via the NSF FastLane system. For applications that are being considered for potential funding by NLM, the PDs/PIs will be required to submit their applications in an NIH-approved format. Anyone invited to submit to NIH will receive further information on submission procedures. Applicants will not be allowed to increase the proposed total budget or change the scientific content of the application in the submission to the NIH. The results of the first-level scientific review will be presented to the NLM Board of Regents for the second level of review. NLM will make final funding determinations and issue Notices of Award to successful applicants. NLM and DMS anticipate making 8 to 10 awards, totaling up to $4 million, in fiscal year 2019. Each award is expected to be between $200,000 and $300,000 (total costs) per year, with durations of up to three years. The application submission window closes in early January 2019.
Collaborative efforts that bring together researchers from the biomedical/health and the mathematical/statistical sciences communities are a requirement for this program and must be convincingly demonstrated in the proposal. While the research may be motivated by a specific application or dataset, the development of methods that are generalizable and broadly applicable is preferred and encouraged. Proposals should clearly discuss how the intended new collaborations will address a biomedical challenge and describe the use of publicly available biomedical datasets to validate the proposed models and methodology. Applicants are expected to list specific datasets that will be used in the proposed research and demonstrate that they have access to these datasets. The Data Management Plan should describe plans to make the data available to researchers if these data are not in the public domain. Some of the important application areas currently supported by the National Library of Medicine include the following:
- Finding biomarkers that support effective treatment through the integration of genetic and Electronic Health Records (EHR) data;
- Understanding epigenetic effects on human health;
- Extracting and analyzing information from EHR data;
- Understanding the interactions of genotype and phenotype in humans by linking human sensor data with genomic data using dbGaP;
- Protecting confidentiality of personal health information; and
- Mining of heterogeneous data sets (e.g. clinical and environmental).
Inquiries should be directed to Jane Ye, PhD at the National Library of Medicine, (301) 594-4882.
NLM Director Patricia Flatley Brennan, RN, PhD, has announced the appointment of Clem McDonald, MD, to the newly created position of Chief Health Data Standards Officer for the National Library of Medicine. His appointment will be effective November 1, 2018. The new position demonstrates NLM’s strong and enduring commitment to health data standards. The Chief Health Data Standards Officer’s responsibilities will involve integrating standards efforts across the Library, including the Fast Healthcare Interoperability Resources (FHIR) interoperability standard, Common Data Elements, and the vocabularies specific to clinical care (e.g., RxNorm, LOINC, SNOMED). The chief will also develop partnerships with industry, academia, and other federal agencies to advance the use of health data standards in clinical practice, public health, and observational data, including sensors.
For the last 12 years, Dr. McDonald served as Director of the Lister Hill National Center for Biomedical Communications (LHNCBC) and Scientific Director of its intramural research program. His research focuses on clinical informatics; tools based on HL7’s FHIR to facilitate the use of electronic health records and research bases; the analysis of large clinical databases; the promotion, development, enhancement, and adoption of clinical messaging and vocabulary standards; and text de-identification. Prior to coming to NLM, Dr. McDonald served as the Regenstrief Professor of Medical Informatics at the Indiana University School of Medicine and the Director of the Regenstrief Institute for Health Care, a privately endowed research institute working to integrate research discovery, technological advances, and systems improvement into the practice of medicine. Dr. McDonald developed the Regenstrief Medical Record, one of the first electronic health record systems, and introduced the use of randomized trials to study health information systems. With NLM support, he and his colleagues developed the first Health Information Exchange, now loaded with 6 billion results from hospitals across Indiana. He also initiated the Logical Observation Identifiers Names and Codes (LOINC) database of observation codes for laboratory tests, clinical measurements, and clinical reports, and he was one of the founders of the Health Level 7 (HL7) message standards, used in hospitals today.
Effective November 1, Milton Corn, MD, Deputy Director of NLM for Research and Education, will also assume the responsibilities of Acting Scientific Director, LHNCBC. Olivier Bodenreider, MD, PhD, Chief of the Cognitive Science Branch at LHNCBC and a Principal Investigator in NLM’s Intramural Research Program, has been selected to become Acting Director, LHNCBC. Jerry Sheehan, NLM Deputy Director, will provide executive oversight and guidance.
The National Library of Medicine has teamed up with the National Academies of Sciences, Engineering, and Medicine (NASEM) to conduct a study on forecasting the long-term costs for preserving, archiving, and promoting access to biomedical data. The study is being conducted as part of the NLM’s efforts to develop a sustainable data ecosystem, as outlined in both the NLM Strategic Plan and the NIH Strategic Plan for Data Science. Such an ecosystem is possible because the products and processes of research are now digital by default, and increasingly sophisticated and powerful computation can now be brought to data, rendering meaning that had previously been hidden. Across the biomedical sciences, decisions must be made about where in this ecosystem to invest limited resources to maximize the value of the data for scientific progress; strategies are needed to address questions such as: What is the future value of research data? For how long must a dataset be preserved before it should be reviewed for long-term archiving? And what are the resources necessary to support long-term data storage?
For this study, NASEM will appoint an ad hoc committee to develop a framework for forecasting these costs and estimating potential benefits to research. The committee will examine and evaluate:
- Economic factors to be considered when examining the life-cycle cost for data sets (e.g., data acquisition, preservation, and dissemination);
- Cost consequences for various practices in accessioning and de-accessioning data sets;
- Economic factors to be considered in designating data sets as high value;
- Assumptions built into the data collection and/or modeling processes;
- Anticipated technological disruptors and future developments in data science in a 5- to 10-year horizon; and
- Critical factors for successful adoption of data forecasting approaches by research and program management staff.
The committee will provide a consensus report and two case studies illustrating the framework’s application to different biomedical contexts relevant to NLM’s data resources. Relevant life-cycle costs will be delineated, as will any assumptions underlying the models. To the extent practicable, NASEM will identify strategies to communicate results and gain acceptance of the applicability of these models. As highlighted in a recent blog post, NASEM will host a two-day public workshop in late June 2019 to generate ideas and approaches for the committee to consider. Further details on the workshop and public participation will be made available in the coming months.
NLM is supporting NASEM’s efforts to solicit names of committee members, as well as topics for the committee to consider. Suggestions should be sent to Michelle Schwalbe, Director of NASEM’s Board on Mathematical Sciences and Analytics, or Elizabeth Kittrie, NLM Senior Planning and Evaluation Officer.