2024
Project | Logos for various software packages
Client | BIFOLD | Haralampos Gavrilidis

P2D
P2D is a tool that helps data scientists perform analyses efficiently in Python environments with data in DBMSes*.
Background | Existing tools first transfer the data to the Python environment and then perform the operations required for the analysis there. This approach, however, moves large amounts of data and leaves the processing capabilities of the DBMS largely unused. P2D analyzes the Python code and rewrites it into a more performant variant that "pushes down" operations to the DBMS, leveraging its execution capabilities and transferring less data, so that the overall data analysis performs better.
Briefing | A logo incorporating a visual of a database system and a python, reflecting the relevant programming language.
*Database Management Systems
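
The push-down idea described in the P2D background can be illustrated with a minimal sketch. It does not show P2D's actual code rewriting: the two variants are written by hand, an in-memory SQLite database stands in for the DBMS, and the `orders` table and both function names are hypothetical. Each function returns the result together with the number of rows transferred to Python.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for a real DBMS.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "EU", 10.0), (2, "US", 25.0), (3, "EU", 5.0), (4, "US", 40.0)],
)


def total_client_side(conn, region):
    """Transfer every row to Python, then filter and sum there."""
    rows = conn.execute("SELECT id, region, amount FROM orders").fetchall()
    total = sum(amount for _, r, amount in rows if r == region)
    return total, len(rows)


def total_pushed_down(conn, region):
    """Let the DBMS filter and aggregate, so only one row is transferred."""
    rows = conn.execute(
        "SELECT COALESCE(SUM(amount), 0.0) FROM orders WHERE region = ?",
        (region,),
    ).fetchall()
    return rows[0][0], len(rows)


print(total_client_side(conn, "EU"))
print(total_pushed_down(conn, "EU"))
```

Both variants compute the same total, but the pushed-down one transfers a single row regardless of table size.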
polyDMS
PolyDMS is a composable database management system in which multiple DBMS components can be recombined into "new" systems.
Background | DBMSes consist of several components, namely a parser, an optimizer, and execution and storage engines. In existing systems these components are tightly coupled into DBMS "monoliths" and cannot easily be exchanged. PolyDMS proposes a new architecture in which individual DBMS components can be coupled and/or integrated according to diverse use cases, which may lead to better performance.
Briefing | A visualisation of a database fragmented into puzzle pieces, representing the coupling and integration of DBMS components into one system.

XDB
XDB helps data practitioners combine and integrate data from multiple sources.
Background | In a hypothetical scenario, one DBMS holds information about customers and another holds information about orders. Data scientists who need to perform an analysis must integrate data from these two DBMSes. Present approaches centralize the data, importing it into an additional DBMS (a so-called mediating engine) to perform the analysis. XDB decentralizes the execution of the analysis: all execution is done on the existing DBMSes, removing the need for a mediating engine.

sheet reader
SheetReader is a blazingly fast and resource-efficient spreadsheet parser. It helps data scientists load their spreadsheets in data science environments.
Background | SheetReader facilitates reading spreadsheets into data science environments to perform further analyses not possible in spreadsheet systems (such as Excel). Present approaches for training machine learning models in R or Python environments require users to first load the spreadsheet into a particular environment before performing their analysis. Due to the complexity of spreadsheet structures, existing approaches are not practical on commodity machines (e.g., laptops) with constrained hardware resources. SheetReader accomplishes this task by performing multiple operations quickly and in parallel, with reduced CPU load and without wasting memory. https://github.com/polydbms/sheetreader-core