Recent & Upcoming Talks

All Presentations

Research

Software and Data

AlgebraicJulia

AlgebraicJulia is an ecosystem of software tools written in Julia to explore and promote the application of algebraic methods in computer science and scientific computing.

SemanticModels.jl

SemanticModels.jl is a package for representing scientific models at the semantic level to enable augmented scientific reasoning. It applies techniques from applied category theory to build mathematical models of scientific modeling practices.

Julia Graphs

JuliaGraphs is the primary organization dedicated to the advancement of graph theory and algorithms in the Julia Programming language. The flagship project is LightGraphs.jl the premier graph library in the Julia Ecosystem.

Stinger Graph Analysis

Dynamic graphs are all around us. Social networks containing interpersonal relationships and communication patterns. Information on the Internet, Wikipedia, and other datasources. Disease spread networks and bioinformatics problems. Business intelligence and consumer behavior. The right software can help to understand the structure and membership of these networks and many others as they change at speeds of thousands to millions of updates per second.

Recent Publications

Here are some of my recent publications for a complete listing you can find All Publications or All Conference Publications and All Journal Publications.

Scientific computing is currently performed by writing domain specific modeling frameworks for solving special classes of mathematical problems. Since applied category theory provides abstract reasoning machinery for describing and analyzing diverse areas of math, it is a natural platform for building generic and reusable software components for scientific computing. We present Catlab.jl, which provides the category-theoretic infrastructure for this project, together with SemanticModels.jl, which leverages this infrastructure for particular modeling tasks. This approach enhances and automates scientific computing workflows by applying recent advances in mathematical modeling of interconnected systems as cospan algebras.
ACT-2020

Scientists construct and analyze computational models to understand the world. That understanding comes from efforts to augment, combine, and compare models of related phenomena. We propose SemanticModels.jl, a system that leverages techniques from static and dynamic program analysis to process executable versions of scientific models to perform such metamodeling tasks. By framing these metamodeling tasks as metaprogramming problems, SemanticModels.jl enables writing programs that generate and expand models. To this end, we present a category theory-based framework for defining metamodeling tasks, and extracting semantic information from model implementations, and show how this framework can be used to enhance scientific workflows in a working case study.
ACT-2019

Law enforcement requires methods of digital evidence collection from victim or witness devices in a minimally invasive manner. Victims and witnesses are often concerned with minimizing the exposure of data on their phone to authorities. In this paper we describe a system for the secure submission of digital evidence and a micro-service for creating and monitoring chain of custody. These tools minimize device data exposure, encourage cooperation from victims and witnesses, and enforce accountability with regards to handling digital evidence.
HPEC-2018

Community detection in graphs is important to the study of social networks, power grids, and complex biological networks. Graphs can be stored in computer memory in diverse representations with differing performance characteristics. Community detection algorithms create a dynamic graph as an internal data structure for tracking agglomerative merges. This community (block) graph is modified heavily through operations derived from moving vertices between candidate communities. We study the problem of choosing the optimal graph representation for this data structure and analyze the performance implications theoretically and empirically. These costs are analyzed in the context of Peixoto's Markov Chain Monte Carlo algorithm for stochastic block model inference, but apply to agglomerative hierarchical community detection algorithms more broadly. This cost model allows for evaluating data structures for implementing this algorithm and we identify inherent properties of the algorithm that exclude certain optimizations.
HPEC-2018

News media biases and propaganda are a problem for modern societies, reliance on the internet as a primary news source enables the formation of hyper-partisan echo chambers and an industry where outlets benefit from purveying fake news. While modeling text content of articles is sufficient to identify bias, it is not capable of determining credibility. A structural model based on web links outperforms text models for fake news detection.
WSDM-2018

This paper shows that Julia provides sufficient performance to bridge the performance gap between productivity-oriented languages and low-level languages for complex memory intensive computation tasks such as graph traversal. We provide performance guidelines for using complex low-level data structures in high productivity languages and present the first parallel integration on the productivity-oriented language side for graph analysis. Performance on the Graph500 benchmark demonstrates that the Julia implementation is competitive with the native C/OpenMP implementation.
HPEC-2017

Graphs and networks are prevalent in modeling relational datasets from many fields of research. By using iterative solvers to approximate graph measures (specifically Katz Centrality), we can obtain a ranking vector consisting of a number for each vertex in the graph identifying its relative importance. We use the residual to accurately estimate how much of the ranking from an approximate solution matches the ranking given by the exact solution. Using probabilistic matrix norms and applying numerical analysis to the computation of Katz Centrality, we obtain bounds on the accuracy of the approximation compared to the exact solution with respect to the highly ranked nodes. This relates the numerical accuracy of the linear solver to the data analysis accuracy of finding the correct ranking. In particular, we answer the question of which pairwise rankings are reliable given an approximate solution to the linear system. Experiments on many real-world networks up to several million vertices and several hundred million edges validate our theory and show that we are able to accurately estimate large portions of the approximation. By analyzing convergence error, we develop confidence in the ranking schemes of data mining.
ISC-2017

Increasing volumes of data and the desire for real-time query capability make the development of efficient streaming algorithms for data analytics valuable. Streaming graph algorithms that avoid unnecessary recomputation through clever application of data dependency analysis are often more complex to derive than their static counterparts. This paper discusses a method to derive algorithms for streaming graph analysis from static formulations Combining tuned graph algorithms building blocks with an appropriate functional language, a graph query planner should be able to correctly implement most static and streaming versions of an algorithm from a single mathematical formulation. We provide a detailed analysis for the case of updating triangle counts in a streaming graph using linear algebra and an experimental evaluation in Julia.
IPDPS-2017

Many common methods for data analysis rely on linear algebra. We provide new results connecting data analysis error to numerical accuracy in the context of spectral graph partitioning. We provide pointwise convergence guarantees so that spectral blends (linear combinations of eigenvectors) can be employed to solve data analysis problems with confidence in their accuracy. We apply this theory to an accessible model problem, the ring of cliques, by deriving the relevant eigenpairs and finding necessary and sufficient solver tolerances. Analysis of the ring of cliques provides an upper bound on eigensolver tolerances for graph partitioning problems. These results bridge the gap between linear algebra based data analysis methods and the convergence theory of iterative approximation methods. These results explain how the combinatorial structure of a problem can be recovered much faster than numerically accurate solutions to the associated linear algebra problem.
Compl. Networks

Recent Posts

More Posts

I’ve started the AlgebraicJulia blog with Evan Patterson and the rest of the AlgebraicJulia team, so you can find my recent essays there. A particularly interesting piece is some work with Sophie Libkind on compositional dynamical systems.

CONTINUE READING

My colleagues Nigel Campbell, Evan Stuart, Trevor Goodyear, Winston Messer, and I are happy to present our platform for remote evidence collection from volunteers. Digital Witness is an open source platform for evidence submission. Volunteers can upload media files from their phones while reducing the privacy invasion necessary with full disk capture currently used by law enforcement officers.

CONTINUE READING

The Office of the Director of National Intelligence sponsored a challenge competition. I lead a team at GTRI including Natalie Fitch, and Frank Bradfield to win a prize for our entry “Credibility Development with Knowledge Graphs.”

Challenge Website XAMINE

CONTINUE READING

Open source software is stymied by a lack of funds for maintinance tasks, but companies aren’t coughing up charity money to pay open source developers. Open Source in the Enterprise How do we generate the funds to fund development on open source code? Support and services contracts like RedHat Enterprise Version licenses like Mongo or Neo4j which Gil Yehuda thinks are problematic. The Backwards Commerial License proposed by hueniverse Another problem in open source is that enterprise software vendors have a large incentive to make their software as sticky as possible and lock-in their clients.

CONTINUE READING

I have been using hugo to generate this site for quite a while now and I really like it[^1]. But I needed to migrate my old wordpress blog into a static framework. Fortunately, WP practices ethical software development and makes it easy to get your data out of their software. You can get a MySQL database dump, which is useful for migrating from one hosting provider to another, or get a dump in the form of a json array containing all your posts.

CONTINUE READING

Contact

  • 432 Newell Dr. Gainesville, FL 32611
  • Friday 1pm