Create GeomScale Tutorials


Open call for documentation development

About the organization

GeomScale is a research and development project that delivers open source code for state-of-the-art algorithms at the intersection of data science, optimization, geometric computing, and statistical computing. The current focus of GeomScale is scalable algorithms for sampling from high-dimensional distributions, integration, convex optimization, and their applications. One of our ambitions is to bridge the gap between theory and practice by turning state-of-the-art theoretical tools in geometry and optimization into state-of-the-art implementations. We believe that, in pursuing this goal, we will deliver innovative solutions in a variety of application fields, such as finance, computational biology, and statistics, that will extend the limits of contemporary computational tools. GeomScale aims to serve as a building block for an international, interdisciplinary, and open community in high-dimensional geometric and statistical computing.

The main development currently takes place in volesti, a generic open source C++ library with R and Python interfaces for high-dimensional sampling, volume approximation, and copula estimation for financial modelling. Depending on the problem, the current implementation scales to hundreds or thousands of dimensions. To the best of our knowledge, it is the most efficient software package for sampling and volume computation to date, and in several cases it is orders of magnitude faster than packages that solve the same problems. It can also compute challenging multivariate integrals and approximate optimal solutions of optimization problems.

It has already found important applications in systems biology, for analyzing large metabolic networks (e.g., the latest human network), and in FinTech, for detecting shock events and evaluating portfolio performance in stock markets with thousands of assets. Other application areas include AI, in particular approximate weighted model integration, and data-driven control of power systems.

About the project

The problem

GeomScale develops scientific, research-oriented software; therefore, detailed and well-written documentation is essential for reaching the communities, users, practitioners, and researchers it may concern.

GeomScale’s software can efficiently solve several complex, high-dimensional problems in various fields, so our aim now is to create the essential tools to make it well known and easily accessible across open source communities.

The main bottleneck for onboarding new contributors to GeomScale is the nature of the project, which requires knowledge from various fields of advanced applied mathematics and theoretical computer science. Complete and detailed documentation will be a valuable tool for overcoming that burden. Solid documentation is therefore a stepping stone toward growing our organization into the reference open source software for geometric and statistical computing in high dimensions.

We aim to adopt the Divio documentation system.

According to this system there are four types of documentation:

  • learning-oriented tutorials
  • problem-oriented how-to guides
  • understanding-oriented explanations
  • information-oriented technical reference

The project’s scope

The GeomScale project will:

  • Audit and collect the existing tutorials currently distributed in blogs, wiki pages and presentations and create a friction log.
  • Use the friction log as a guide to identify the gaps in the currently fragmented tutorials.
  • Fix old, out-of-date tutorials (e.g. see this tutorial and a related issue).
  • Write new tutorials that highlight the usage of GeomScale tools in various applications, such as biology, computational finance, and statistics. The set of tutorials should range from simple introductory ones to advanced ones that provide solutions to specific applications. See this example.
  • Incorporate feedback from tutorial testers (volunteers in the project) and the wider GeomScale community.
  • Collaborate with GeomScale project administrators to decide how tutorials should be structured and presented.

Work that is out-of-scope for this project:

  • This project will not create any explanations, reference documentation, or how-to guides.

Measuring your project’s success

GeomScale receives an average of 10 pull requests a quarter that add a new feature or optimization or propose a bug fix. Many of these pull requests come from previous contributors. We believe that new tutorials will lead to more pull requests overall, and to more pull requests from new contributors.

Once the documentation is published, we will track three metrics monthly: (a) the number of new feature pull requests, (b) the number of pull requests from new contributors, and (c) standard metrics (number of views, downloads, web traffic to the GeomScale site, and time on page). Beginning the quarter after the documentation is published, we will also track, quarterly, the number of contributors who have made more than three contributions overall.

We would consider the project successful if, after publication of the new documentation, at least three of the following hold:

  • The number of new feature pull requests increases by 10%
  • The number of pull requests from new contributors increases by 15%
  • The number of contributors who have made >2 contributions increases by 5% (beginning the quarter after the documentation is published)
  • The standard metrics increase by 10% on average
  • The number of forks and stars of our repository increases by 10%
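The "at least three of five" rule above can be written down as a small check. The following is a hypothetical sketch in Python, not GeomScale tooling: the metric keys, the `relative_increase` and `is_successful` helpers, and the before/after numbers are illustrative placeholders rather than real project data. It assumes each criterion is met when the relative increase over the pre-publication baseline reaches the threshold listed above.

```python
# Hypothetical sketch of the success rule; metric keys and numbers are placeholders.

# Required relative increase per criterion, taken from the list above.
THRESHOLDS = {
    "new_feature_prs": 0.10,
    "prs_from_new_contributors": 0.15,
    "repeat_contributors": 0.05,
    "standard_metrics_avg": 0.10,
    "forks_and_stars": 0.10,
}


def relative_increase(before: float, after: float) -> float:
    """Return (after - before) / before; the baseline must be positive."""
    if before <= 0:
        raise ValueError("baseline must be positive")
    return (after - before) / before


def is_successful(before: dict, after: dict, required: int = 3) -> bool:
    """True when at least `required` criteria reach their threshold."""
    met = sum(
        1
        for name, threshold in THRESHOLDS.items()
        if relative_increase(before[name], after[name]) >= threshold
    )
    return met >= required


# Illustrative (made-up) baseline and post-publication values.
before = {"new_feature_prs": 10, "prs_from_new_contributors": 4,
          "repeat_contributors": 20, "standard_metrics_avg": 1000,
          "forks_and_stars": 200}
after = {"new_feature_prs": 12, "prs_from_new_contributors": 5,
         "repeat_contributors": 20, "standard_metrics_avg": 1050,
         "forks_and_stars": 230}

print(is_successful(before, after))  # True: 3 of the 5 criteria are met
```

With these made-up numbers, new feature pull requests (+20%), pull requests from new contributors (+25%), and forks and stars (+15%) pass, while the other two criteria do not, so exactly three hold.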

Project budget

| Budget item | Amount | Running total | Notes/justifications |
|---|---|---|---|
| Technical writer: audit, update, test, and publish tutorials for the GeomScale project | 13000.00 | 13000.00 | |
| Volunteer stipends | 1500.00 | 14500.00 | 3 volunteer stipends x 500 each |
| TOTAL | | 14500.00 | |

Additional information

Previous experience with technical writers or documentation:

GeomScale successfully participated in GSoD 2022, during which volesti’s documentation website was built (using a Doxygen, Sphinx, and ReadTheDocs tech stack) and basic reference documentation was written.

Apart from GSoD participation, members of GeomScale have experience in developing and reviewing documentation, for example, developing the documentation of the CRAN R package volesti and mentoring others in that work. The review process used was similar to a code review. Examples of the documentation review process in GeomScale are available on GitHub. In addition, we usually add a final step in which the final documentation draft is circulated among our peers and users for feedback.

Our previous experience with documentation development gives us confidence in applying best practices and well-established review processes and tools to potential GSoD projects. For example, GSoD project development will take place on GitHub, with issues opened for discussion and pull requests (with a documentation tab) used to submit and review the technical writer’s contributions.

Previous participation in Season of Docs, Google Summer of Code or others:

All members of GeomScale have previously participated in Google Summer of Code as mentors and/or students. Our members mentored 7 GSoC coding projects under the R-project and Boost C++ Libraries organizations (2017-2019). The main software package of GeomScale was substantially enhanced through 3 GSoC projects mentored by two of GeomScale’s members, and one student from those projects (Apostolos Chalkis) became a GeomScale member, GSoC mentor, and admin.

In each of the last four years (2020-2023), GeomScale has been selected as a mentoring organization for GSoC. In 2021 we successfully mentored 6 projects, which resulted in more than 9 pull requests, 6 of which were merged into the development branches of our repositories.

We strongly believe that this experience will play a crucial role in successful GSoD projects. In particular, we have experience communicating with people of diverse educational and cultural backgrounds and in giving the right tips and guidance for successful projects. Moreover, we are capable of setting detailed and realistic schedules based on the technical writer’s profile and potential.
