Humans of Data 23

“I’m excited that people are now starting to think about data sharing. For the last few years it’s been me, as the institutional data manager, going to people and saying, ‘You should make your data available!’  Now people are getting in touch and saying they want to do it, because they’re recognising they can get more stuff published that they can get recognition for.

It’s also good that we’re getting more than just the raw or aggregated data – we’re also getting the survey tools, the Stata code and the files for the processing scripts for how the data is analysed.  It’s exploding out into all the different stages of research.  If you’re thinking about reproducibility of research, you still only see tiny snapshots of that.  I’d like to do more about that: my frustration is that we don’t have software to document all stages of the research process.

A lot of those research outputs are useful but also ephemeral.  If you wanted to reapply a questionnaire, you’d have to do an update of it 2 or 3 years down the line.  Research approaches change, the language changes and so on.  But you could actually go back and do a comparison about how interviewing has changed over a specific time period – as long as we start managing those research outputs too, alongside the data and publications.”

Humans of Data 22

“In my previous life as an academic, I always liked interdisciplinary work: to come at things from a slightly sideways perspective. But in this area, I get to encounter more than most people do – collections, ideas, researchers, people, stories … I get to discover everything from every different area of knowledge, from lots of different perspectives.  The data itself is obviously really interesting but it’s what goes into the creation of that data, and what people then do with that data – that’s what’s really fascinating to me.

When people ask me, ‘What do you do?’, I’m still not sure how best to describe it.  Whenever someone asks, I give a different answer, but it doesn’t actually capture what the day-to-day work is about, which is the exchange of social and cultural knowledge.  I think that’s the most appealing thing to me.  There’s always something new to find out about, and this central thing that we call ‘data’ is a conduit into discovery of all kinds of stories and narratives.  It’s a window into lots of different worlds.”

Humans of Data 21

I’m not a data scientist but I know how to read and fiddle with code. This is what drives me – I want to understand and know something practically, not just by reading about it but by getting first-hand experience in collecting data, doing things with it, manipulating it. I enjoy this and find it valuable. I do theory about data practice, so I’m interested in asking what data does to knowledge practices, but I’m looking at it as a philosopher rather than anything else. I’m interested in how data can be used to tell stories, but I want to take this one step further. How do we use data to make arguments? I’m interested in how we can move to a critical way of looking at argumentation – how we can use data as evidence, to convince, to tell stories. I’m asking what is ‘good enough’ knowledge, what is ‘responsible’ knowledge, what is ‘valuable’ knowledge? What are the ethical considerations about data when we use it to make decisions?

Humans of Data 20

“Still, I’m inspired by the fact that the field is cross-disciplinary.  To be able to talk about digital preservation in a holistic way you need data producers and data consumers including people from information sciences, library scientists and researchers.  With every domain we need to understand a whole new idea of how data is produced and consumed and the use cases for the value of data.  It never gets boring.  There will always be work.  And if I have a question about a file format or metadata problem I can ask colleagues in New Zealand or the States or Scotland or the Netherlands and they know what I’m talking about.  I love that.  To me it’s like a cool kids’ domain!”

Summary of Linked Open Data for Global Disaster Risk Research activity involving Dr Bapon Fakhruddin and Professor Virginia Murray

Dr Bapon Fakhruddin

The fourth Pacific Meteorological Council and second Pacific Meteorological Ministers Meeting (PMMM) was held in Honiara, Solomon Islands, 14-17 August, 2017.

Dr Bapon Fakhruddin’s presentation on end-to-end impact based multi-hazard early warning systems and disaster loss data collection for risk assessment, beginning with community ownership and engagement, was exceptionally well received.

Disaster Risk and Resilience Roundtable, 19 June 2017, Wellington, New Zealand

Professor Virginia Murray

The Global Platform disaster loss data working session reinvigorated a high-level roundtable, which followed a seminar on ‘Global experiences on managing disaster risk – rethinking NZ’s policy approach’ by Elizabeth Longworth (formerly of the UN Office for Disaster Risk Reduction). The roundtable emphasized the need to strengthen New Zealand’s risk governance system. There is a very strong business case to be made for investing in disaster risk reduction. It has been estimated that an annual global investment of USD 6 billion in disaster risk management strategies would generate USD 360 billion worth of benefits in terms of reducing risk. On that basis, New Zealand might expect a return of 60 dollars for every dollar spent on reducing disaster risk. In terms of creating shared value, investment in disaster risk management has the co-benefits of strengthening resilience, competitiveness and sustainability.
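The 60-to-1 figure follows directly from the two global estimates quoted above. A minimal sketch of that arithmetic is given here; the function and variable names are illustrative only and are not taken from the roundtable material.

```python
# Global estimates quoted in the roundtable summary (USD per year).
ANNUAL_INVESTMENT_USD = 6_000_000_000
ANNUAL_BENEFITS_USD = 360_000_000_000


def benefit_per_dollar(benefits: float, investment: float) -> float:
    """Return the benefit generated for each dollar invested."""
    if investment <= 0:
        raise ValueError("investment must be positive")
    return benefits / investment


ratio = benefit_per_dollar(ANNUAL_BENEFITS_USD, ANNUAL_INVESTMENT_USD)
print(f"Estimated benefit per dollar invested: {ratio:.0f}")
```

This is a simple ratio of the global totals; applying it to New Zealand assumes that the global ratio carries over to a single country.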

The estimates for direct losses are considered to be perhaps 50% under-reported due to the pervasive nature of smaller scale, localised and recurring disasters. It is concerning that, internationally, the mortality and economic losses from extensive disaster risk are trending upwards. For New Zealand and its Pacific Island neighbours, climate change will magnify disaster risk and increase the costs. With the New Zealand economy heavily reliant on the agricultural sector, it is particularly exposed to weather-related events.

In the same way that New Zealand’s approach to social investment requires improved data and analysis, so too does the production of NZ-based risk information and integrated databases. Greater sensitivity as to the causes and consequences of disaster risk could strengthen accountabilities as to disaster impacts.

A modern-day approach to risk governance also requires greater inclusiveness and transparency. New Zealand needs to pursue an ‘NZ-Inc’ approach. The nature of disaster risk necessitates a whole-of-government response. Dr Bapon Fakhruddin attended the roundtable as an expert.

Workshop on developing a disaster loss database for New Zealand, 28 September 2017

MCDEM will be holding an initial all-day workshop on 28 September to discuss all elements of the Loss Database Project. The 5th Global Platform for Disaster Risk Reduction (DRR) was held in Mexico on 22-26 May 2017. The Platform was hosted by the United Nations Office for Disaster Risk Reduction (UNISDR) and the Mexican government to support continual assessment of progress in implementing the Sendai Framework (SFDRR). The New Zealand delegation was led by the Special Envoy for Disaster Risk Management (Philip Gibson, MFAT), accompanied by officials from MFAT (1) and MCDEM (3), plus a wider NZ Inc. delegation of 20 comprising representatives from academia, NGOs, local government and private sector providers.

Following the Platform, a number of key pieces of work are in progress, or need to be considered to give effect to the Framework, put priorities into action and report on the Global Indicators. Of note, these are:

  • Finalising the National Disaster Resilience Strategy
  • Developing the concept for a National Platform for DRR
  • Developing a National Disaster Loss Database and routine disaster loss reporting
  • Project to develop better methods of pricing risk and forecasting losses

The first project on which MCDEM wishes to seek your engagement is the Loss Database. This has been considered in the past, but is now critical due to its significance to future Sendai reporting. Unlike previous reporting on the Hyogo Framework for Action, which focussed on qualitative data on inputs and outputs, Sendai reporting is focussed on outcomes, i.e. losses from disasters, and whether we are seeing a downward trend.

ISCRAM Asia Pacific 2018 Conference, Wellington, New Zealand

Dr Bapon Fakhruddin and Professor Virginia Murray will chair a session on disaster data issues for situational awareness at the ISCRAM Asia Pacific Conference in late 2018 (http://www.confer.co.nz/iscramasiapacific2018/).

Humans of Data 19

“Digital preservation is a perfect field because it unites two things I’m passionate about: humanities and IT.  I can work on a framework to keep the data for future generations.  It’s always been important to do that whether the data is analogue or not.  Data presents evidence, evidence that’s subject to story telling and interpretation.  It opens up unlimited possibilities.  If you want to understand how a community ticked at a certain time, literature gives you a representation of the time, of what moved people.  Data that we create today can do the same thing.

Data can be literature, poetry, art or factual experimentation.  It’s not just an output of research; it’s an output of creativity and of our life today.  Sometimes we forget that.  
 
But we should spend more time talking about what works and what doesn’t work.  We need to not always invent new models, but apply a model and see what happens – to use models and tools to curate and treat our data, and then it’s very important to look at these tools critically.  And to improve them. There’s a lot of great output that has come out of projects but does anyone use it?  There’s a gap in implementation.  And funding’s becoming scarcer, so we need to find more effective ways to make tools sustainable and useable for the user communities.  It’s frustrating.”

 

ENERGIC-OD: International co-operation to promote FAIR GIS Open Data and the growth of European SMEs

Giuseppe Maio

This post was written by Giuseppe Maio and Jedrzej Czarnota. Giuseppe is a Research Assistant working on innovation at Trilateral Research. You can contact him at giuseppe.maio@trilateralresearch.com. Giuseppe’s Twitter handle is @pepmaio.

Jedrzej is a Research Analyst at Trilateral Research. He specialises in innovation management and technology development. You can contact Jedrzej at Jedrzej.czarnota@trilateralresearch.com, and his Twitter is @jedczar.

The value of the open data business is increasing at a very fast pace. The open data market is projected to be worth over 75 billion in 2020. Yet accessing this expanding market is not easy. Open data sources are difficult to find, not interoperable and hardly reusable, as argued in a recent Open Knowledge Foundation report.

Jedrzej Czarnota

ENERGIC-OD, a European Commission project, aims precisely to facilitate access to the open data market in the Geographic Information System (GIS) sector. The project built a pan-European Virtual Hub (pEVH) that simplifies access to, and use of, GIS open data in Europe. Readers can view and utilise the pEVH here. The pEVH brokers together a virtually unlimited number of geo-spatial open data sources, harmonising them and rendering them accessible through a single API, ready to be reused for various purposes. pEVH-brokered data is available under a freemium licence: the data is free to use, and users can pay for some extra features of the pEVH. The freemium model promotes knowledge exchange and allows value to be extracted both from that exchange and from the services provided by the actors involved.

ENERGIC-OD functions as a data facilitator by improving the quality of the open data available in the GIS sector: the pEVH was designed to ensure that data is aligned with the FAIR principles. These principles advocate that data should be Findable, Accessible, Interoperable and Reusable. pEVH-brokered data is FAIR because the single website where data is stored makes GIS open data sources much more findable than before; the single API adopted by ENERGIC-OD makes data usable and interoperable; and, finally, the freemium model guarantees the re-usability of the data.

To demonstrate the viability of the pEVH, the ENERGIC-OD consortium developed 10 applications based on pEVH-brokered data. These applications range from an app promoting communication between citizens and land consolidation authorities, to a coastline monitoring system that allows people to participate in the scientific observation of coastlines.

 

ENERGIC-OD is committed to enhancing innovation among SMEs and incentivising local economic development across Europe. These objectives appear achievable for three reasons.

Firstly, the FAIR principles characterising pEVH-brokered data make it easier for SMEs to utilise GIS data sources, as ENERGIC-OD lowers the entry barriers that prevent the usage of such data.

Secondly, the main features of GIS render this branch of IT extremely suitable for business (Azaz 2011). These features are: 1) spatial imaging, namely GIS’s ability to convey information with a spatial dimension; 2) database management, or GIS’s capability of storing, manipulating and providing data; 3) decision modelling, or GIS’s potential to provide intelligence supporting decision making; 4) designing and planning, namely GIS’s potential as a design tool (Azaz 2011). Digital mapping, marketing, transportation and logistics, and design and engineering are some of the sectors which have successfully utilised GIS for business. GIS’s potential can be further exploited by coupling GIS with modelling tools, the so-called “intelligent GIS” (Birkin et al 1995). The retail sector has already utilised intelligent GIS, integrating shops’ data and spatial pattern data over time to design spatial interaction models and forecast maintenance costs as well as revenue streams (Altaweel 2016). An example of an ENERGIC-OD intelligent GIS app is the Natural Hazard Assessment for Agriculture application. Using satellite imagery, this app delivers predictions of the yield reduction in specific crops based on statistical models, considering factors such as drought, humidity and frost.

Thirdly, small and medium enterprises are the greatest beneficiaries of the open data movement, as they are guaranteed free access to data they would not normally have access to, and they are more likely to take advantage of open data and become drivers of innovation (Verhulst and Caplan 2015). SMEs constitute the backbone of the European economy, and ENERGIC-OD thus functions as a facilitator for these businesses, enabling them, by putting FAIR principles into practice, to tap more easily into the GIS open data market.

Initial market research conducted by Trilateral Research, a technology consultancy and member of the ENERGIC-OD consortium, confirms SMEs’ high interest in the pEVH. These enterprises will, in the coming years, drive innovation and economic growth across Europe. ENERGIC-OD thus represents an example of international cooperation to promote FAIR GIS open data and the growth and development of European SMEs.

References

Altaweel, M. (2016). GIS and Small Business Planning ~ GIS Lounge. [online] GIS Lounge. Available at: https://www.gislounge.com/gis-small-business-planning/  [Accessed 11 Sep. 2017].

Azaz, L. (2011). The use of Geographic Information System (GIS) in Business. International Conference on Humanities, Geography and Economics, pp.299-303.

Birkin, M., Clarke, G. and Clarke, M. (1995). GIS for Business and Service Planning. [online] Available at: http://www.geos.ed.ac.uk/~gisteac/gis_book_abridged/files/ch51.pdf  [Accessed 11 Sep. 2017].

Verhulst, S. and Caplan, R. (2015). Open Data: A Twenty-First-Century Asset for Small and Medium-Sized Enterprises. [online] Available at: http://www.thegovlab.org/static/files/publications/OpenData-and-SME-Final-Aug2015.pdf  [Accessed 11 Sep. 2017].

Humans of Data 18

“I work in a university library but was trained as an engineer.  When I was doing my PhD, my advisor claimed engineering was a liberal art, which I didn’t understand then but I get it now: statistics and computation are all methods.  You need to think about people, products and processes, and the workflows that connect them.  So I brought that to the library world and the research data management world, and it’s definitely an interesting space for people, products, processes and workflows.

I’ve always felt very welcome in this community. When I came I didn’t have the Library and Information Sciences degree or the background training but even in the early stages of my interaction, the community was very open, welcoming and accepting.  I try to return that to anyone who is new.

I hope we continue those positive trends in diversity and inclusion. There seems to be more awareness now about that but I think we’ve all been to that panel where you think, ‘Hmm, this isn’t right – everyone there looks the same.’  It’s frustrating when those more formal channels of conferences, things like panels, sometimes aren’t reflective of who’s in the audience.  So here, in research data, it’s a healthy community in many ways but we can always look at what can be done better.”

 

Report to ICTP, Trieste CODATA-RDA Research Data Science Summer School (10th – 21st July, 2017)

This post was written by Neema Simon SUMARI, a Tanzanian national working at the Sokoine University of Agriculture (SUA) at the time of writing. Currently, she is a Ph.D. researcher specializing in Remote Sensing, Cartography and Geographical Information Engineering at the University of Wuhan in China. She holds an M.Sc. and B.Sc., both in Computer Science, from Alabama Agricultural and Mechanical University (Alabama A&M) in the United States of America. Her participation was kindly supported by ICTP and Nature Publishing, via CODATA.

I first heard about CODATA in July 2016 when I attended an International Workshop in Beijing. I was very happy and excited to meet new people there, learn new things and see new places. It was the first time I had participated in an international workshop or conference, and my first such experience in China, where I am now doing my Ph.D.  Through that workshop, I made lots of new friends and built a strong network of people in and out of my field of study.

The CODATA-RDA Research Data Science Summer School in Trieste, Italy, in July 2017, was the best for me. The summer school was amazing: we exchanged academic knowledge and built on our existing networks. I wanted to learn, to meet new people, to encounter new ideas and to experience different cultures, and CODATA and Springer-Nature supported me in attaining these goals. It has been amongst the best experiences in my life. I met a lot of fascinating people from all over the world, including expert professors whose lectures were very interesting and helpful to my academic career. I created strong friendships that I hope I will be able to maintain over the next few years, if not longer.

At the closing session ceremony, Dr. Simon Hodson, Executive Director of CODATA, asked the participants: “So, what have you learned? And what will you do next?” The idea of Open Science and its principles was a major theme of the summer school. I learned about the issues that prevent data from being shared, how data can be analyzed, which data has long-term value, and the benefits of storing, protecting, sharing and publishing data among research scientists. It’s true that most researchers would like their data to be publicly stored and accessible by other researchers; however, this is not easy for researchers who do not have clearly defined ways to do this, or do not know how to make their data accessible to others. Knowledge of the data management plans of the hosting research institutes is required to ensure that researchers can define ways to store their datasets in a publicly accessible way after their experiments are done. Once the research data is stored in a publicly accessible manner, it then needs to be preserved in a format which can be reused by other researchers. The courses taught in this summer school were: Programming in R, Cloud Computing, UNIX Shell, ggplot2, Data Visualisation, SQL, Machine Learning, Data Science Profession, Artificial Neural Networks, Research Computational Infrastructure, HPC and HTC, and Research Data Management. These courses gave us very good skills and knowledge about Data Science which can help us to facilitate the sharing of data – it was a great experience. I now know why Open Access and data sharing are important, and I will apply this knowledge and share it with my professional and social media networks.

Last but not least was the wonderful arrangement of having helpers to assist us with any logistical problems occurring during the practical sessions; the use of pink stickers to call for help was an outstanding method.  It was one of the most enjoyable and informative moments of my life.

Thanks to CODATA, RDA, ICTP, and Springer-Nature for your support, as well as to all my fellow participants for making it possible and fun.

Report to ICTP, Trieste CODATA-RDA Research Data Science Summer School (10th – 21st July, 2017)

This post was written by Shaily Gandhi, who is currently pursuing a PhD in Geomatics from CEPT University, India. Shaily recently attended the CODATA-RDA School of Research Data Science, hosted at ICTP, near Trieste, Italy – her participation was kindly supported by ICTP and Nature Publishing, via CODATA.

The CODATA-RDA School of Research Data Science was a great opportunity for me to work with around 45 students from 29 countries (mostly from lower and middle income countries) and from varied educational backgrounds. Such summer schools or short courses can be the best platforms for learning innovative ways of teaching as well as understanding the work done by different people in the same area. The summer school introduced me to various aspects of data science through intensive hands-on training: it has given me the confidence to start working with concepts which I had only read about in books. Now I will be able to implement machine learning and artificial neural networks in my PhD study in Geomatics for developing predictive models.

The school uses the Software Carpentry / Data Carpentry approach of having the students provide daily feedback on pink or green stickers (which signify XXXX). This was a factor which made each of us feel that our opinions count. I am very thankful to the organizers, who were on their toes and worked long hours to make the summer school run smoothly. Working closely with leading academics in the field of data science was one of the most wonderful experiences for me: it not only taught me a great deal but also helped me improve my teaching skills. I observed many small things in their teaching which I would like to implement in the coming semester’s teaching.

Practical teaching and the use of sticky notes (this picture was taken during the Git session, summer school 2017, Trieste)

One of the things which caught my eye on the very first day was the use of pink and green stickers to indicate whether you are getting on well with the practical or need help. I will definitely use this in my teaching, because practicals become very difficult to handle with a large class, and if everyone is waving or calling out it makes the environment very noisy.

Women in Research Data Science

Apart from technical learning there was a wonderful experience of cultural exchange. One of the most interesting topics which I discussed with Gail Clement from the California Institute of Technology (who introduced us to Author Carpentry) was the loss of academic identity that can be experienced by women who change their name after getting married (and in some countries this change of name is obligatory). She explained that, according to the research, men’s research works are cited more than women’s: there are many reasons for this, and the loss of identity can contribute, as computer search mechanisms and bibliographic tools do not necessarily link the works of women prior to and subsequent to a name change. This is one of the important reasons for a recognised and standardised researcher ID system: for women who have changed their names, having an ORCiD account will help keep all your academic work associated with one single researcher ID number. Gail also suggested that it would be better if female researchers could retain both last names, which could “help you build your identity and reputation in the professional world”. Many more interesting discussions regarding the lack of credit for work were also brought up. In few institutions are the people doing data analysis included as co-authors of the publication: Gail suggested that standard criteria should be developed and implemented, such that all contributors (including data analysts and data stewards) are credited and the credit for your contribution stays with you.
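The point about name changes can be illustrated with a small sketch. It assumes a hypothetical list of publication records; the names, titles and the identifier below are invented placeholders (the identifier is not a real ORCiD iD), and this is not how any particular bibliographic tool works. It only shows why grouping by a persistent ID keeps a body of work together where grouping by name splits it.

```python
from collections import defaultdict

# Hypothetical records: (name as printed, researcher ID, title).
# All values are made up for illustration only.
records = [
    ("A. Smith", "ID-0001", "Paper one"),
    ("A. Smith", "ID-0001", "Paper two"),
    ("A. Jones", "ID-0001", "Paper three"),  # same person after a name change
]


def group_titles(rows, key_index):
    """Group publication titles by the chosen field (0 = name, 1 = ID)."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key_index]].append(row[2])
    return dict(groups)


by_name = group_titles(records, 0)  # splits the work across two names
by_id = group_titles(records, 1)    # keeps all three papers together

print(len(by_name), "profiles when grouped by name")
print(len(by_id), "profile when grouped by researcher ID")
```

Grouping by name yields two separate profiles, while grouping by the persistent ID yields one, which is the benefit described above.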

Working with Irma and Oscar on some super cool projects (from left to right)

I had a great learning experience working in groups with people from different countries. Throughout the school, we worked in different groups with different people, which gave us a lot of exposure to the varied situation of data science in different countries. One project had us working together on the same file using Git, and in the second project we coded a neural network model in Python.

The Bring Your Own Data session offered good suggestions regarding my problems with data, and my points of confusion were addressed by other students in the summer school working in the same area. I learned a lot about statistical analysis from other students, including Felix Anyiam (Data Analyst, University of Port Harcourt Teaching Hospital (UPTH)) and Ola Karra (Lecturer, Department of Statistics, University of Khartoum).

Friends who helped with statistics: Laba, Felix and Ola (from left to right)

This summer school gave us first-hand experience of many languages and command line interfaces: topics included DOS, R, Shell, GitHub, visualising data in the most beautiful ways, machine learning, artificial neural networks, other machine learning systems and recommender systems.

Working with GitHub was an excellent experience. I had been using Google Drive to work on shared presentations, but Git looks pretty cool and I would like to use it in my future work to share data and work in a shared environment.

It was great working on the research computation infrastructure with all the participants, who were working on different systems, and learning how to submit a job and get it done using external resources. We were taught how to get access to supercomputers in different geographical locations: this enables researchers to keep going, as it allows you to work from any part of the world. Resources to run the processes can be allotted from different locations.
Finally, we also got a good insight into research data management, referencing systems and wonderful tips for publishing and licensing work.

Friends of Data Science

Map of Student participants:

 

I am very thankful to ICTP for accepting my application and supporting my stay in Trieste. I am very grateful to Nature Publishing, via CODATA, for funding my travel, which gave me the opportunity to attend this summer school on big data science.