Specialty in Knowledge Representation (Neo4j, graph databases, semantic technology, data engineering) / Machine Learning / Data Science / Dynamical Modeling,
with a secondary specialty in Full-Stack Engineering
Many of my past jobs included a research component as well as development. I work very well both independently and in teams (former startup founder).
My "home industry" is biomedical, but occasionally switching industries helps circulate ideas and technology.
May 2023 - Feb. 2025, AbbVie Pharmaceuticals, USA / Europe 100% REMOTE
Educate, advise and assist various Business Technology Units across the entire company on how to best utilize the power of graph databases (primarily Neo4j) for their use cases.
Produce a number of white papers and Architecture Position statements on topics related to graph databases, especially Neo4j.
Expand and polish the Neo4j-based open-source BrainAnnex.org Python technology stack, to simplify and streamline the use of graph databases for data science and for building web apps, including visualization.
Supervise an engineer porting an open-source library from Neo4j to AWS Neptune, to offer more database options.
Present at conferences, including Bio-IT World in Apr. 2024 (spoke on effectively using open-source technology for graph databases).
Research vector databases, genAI, and the synergy between LLMs and graph databases (such as Graph RAG), for example for document retrieval.
Technologies Neo4j, Cypher query language, BrainAnnex (Neo4j open-source technology stack), Python, Flask, Vue.js, Plotly, JavaScript, JupyterLab, AWS, Google Cloud Platform (GCP), LLMs
Feb. - May 2023, Arista Networks, San Francisco area 100% REMOTE [SHORT-TERM PROJECT]
Use the Neo4j graph database and the Neo4j-based open-source BrainAnnex.org Python technology stack to explore ways to better manage supply-chain data.
Explain and demonstrate at length, to a group of Data Engineers, how Neo4j and the Brain Annex technology stack work, and how to use them.
Enhance the Brain Annex technology stack, and substantially extend its documentation and tutorials.
Set up the full Neo4j-based technology stack on a VM on Google Cloud.
Technologies Neo4j (community edition, v. 4), Cypher query language, BrainAnnex (Neo4j open-source technology stack), Python, Flask, Vue.js, JavaScript, JupyterLab, Google Cloud Platform (GCP)
(REASON FOR SEPARATION: short-term demonstration project - "what use can Neo4j be put to, in this company" - which lost funding)
Aug. 2022 - May 2023, SciFind.net, Los Angeles 100% REMOTE
Consulting work (attending longevity conferences, plus other research and networking) to help the company extend its services for building collaboration networks for scientific experimentation to longevity-science researchers.
Some of my reports [now archived] can be seen here.
Early consulting on switching the company to a new technology stack, based on a Neo4j graph database.
(REASON FOR SEPARATION: the company was a startup, and its expected funding did not materialize)
Dec. 2021 - July 2022, Zimperium, USA / Europe 100% REMOTE
IN BRIEF: Research project to utilize Neo4j and D3/Cytoscape visualizations to import, analyze and explore large amounts of complex linked data
Design, develop and manage the entire technology stack (cloud hosting, graph database, data modeling/engineering, API back end and front end), all using my open-source project BrainAnnex as the engine for the Neo4j libraries, the API layer and the UI.
Create a Neo4j data model based on existing complex linked data. Bulk-import large numbers of JSON files with that data into Neo4j, and search the graph database for patterns.
Set up and administer the virtual machines and the Neo4j databases on the Oracle cloud.
Technologies Neo4j (community edition, v. 4), Cypher query language, D3.js, Cytoscape.js network visualization library, BrainAnnex, Oracle cloud, Linux, Python, Flask, Vue.js, JavaScript, Docker
(REASON FOR SEPARATION: a new owner reorganized the company and eliminated my research group)
Feb. 2021 - Oct. 2021, GSK (GlaxoSmithKline Pharmaceuticals), USA / Europe 100% REMOTE [TEMPORARY PROJECT]
IN BRIEF: Research project to enhance the management of clinical-trial data using Neo4j (graph database), and to build a versatile bridge between Data Engineering and Data Science
CONTEXT: Management of Clinical-Trial Data
MISSION: Turn Neo4j into a central hub for clinical-trial data
Merge my Python library for interfacing with Neo4j with a related library in development at the company, and expand the combined product, which the company eventually made open source. (I later forked it into the "NeoAccess" library, part of the BrainAnnex.org open-source project I lead.)
Unit and integration testing.
Development of a UI and API, to interface with Neo4j, using Python/Flask/Vue.js
Data transformation for import into Neo4j
Research and development of a data-schema layer (loosely inspired by RDF) to use in conjunction with Neo4j
SIDE PROJECT: in consultation with colleagues and company executives, start and administer a group on Longevity Science. Attend longevity conferences and report on them. (More info)
SIDE PROJECT: research photonic computers, then nearing commercial availability, for possible use in the company (details)
Technologies Neo4j Enterprise server (v. 4), Neo4j Desktop, APOC, Cypher, Python, PyTest, PyCharm, Pandas, CDISC standards for clinical-trial data, Flask, Vue.js, JavaScript, CSS, HTML, Azure cloud, Virtual Machines (Windows and Linux), Docker
(REASON FOR SEPARATION: temporary job, through an agency)
June - Nov. 2020, OpenCures.org (Biomedical/Longevity Science), San Francisco Bay Area, USA / Canada 100% REMOTE
IN BRIEF: Design and oversee implementation of a complete IT stack: cloud infrastructure, Neo4j graph database, data modeling, data transfer, API, web app with back end and front end, visualization tools, biomedical knowledge base, data science and machine learning. Also, be a liaison with scientists and researchers.
CONTEXT: Precision (individualized) medicine and consumer health empowerment.
MISSION: Management of biomarker measurements (especially metabolomics), data access for clients and researchers, development of a biomedical knowledge-base, and data science/machine learning on the data.
KEY ACCOMPLISHMENT: Set up a new IT department, and give the company a platform functioning at every layer of the technology stack.
The company accepted my proposal to drop a planned Postgres database in favor of Neo4j, since the data schema is permanently in flux (a research environment for metabolomics and individualized medicine).
Lead a data migration into Neo4j, overhauling the data model in the process; deploy and administer Neo4j and LAMP stacks on the Google Cloud.
Compare self-installed Neo4j deployments with the managed service (Aura).
Use Bloom and alternatives (such as Commander) for easy Neo4j access and administration.
Set up a Python programmatic interface to Neo4j, and develop a web app based on it. Back end: REST APIs using Flask. Front end: Vue.js.
Integrate into the company platform parts of my open-source Knowledge Representation and Content Management system, Brain Annex
Deployment of the above web app on a Debian VM on the Google cloud.
Development of a biomedical knowledgebase.
Data visualization, integrated into the web app (SVG, D3.js)
Begin to set up a Data Science/Machine Learning layer (clustering, regression, parameter fitting, etc.) to investigate and model existing data.
Assist with marketing, promotion and PR.
Participate in ongoing discussions with investors and collaborating scientists.
Be a liaison with scientific conferences and other companies.
Technologies Neo4j, Cypher, Aura, Bloom & alternatives such as Commander (front ends to Neo4j), Python/Flask, PyCharm, Neo4j Bolt driver for Python, REST APIs, Vue.js, JavaScript, HTML/CSS, SVG, D3.js, Google Cloud, Bitnami VM images (LAMP stack, Neo4j), Data Science tools (Python libraries for clustering, regression, etc.), TensorFlow
(REASON FOR SEPARATION: the company was a startup, and no additional funding materialized)
Dec. 2019 - May 2020, Evolv.ai, San Francisco Mostly REMOTE
IN BRIEF: Research into creating mathematical/statistical models to predict website visitors' behavior in response to UX (user experience) elements. Implementation in Python / TensorFlow
KEY ACCOMPLISHMENT: Turn the employer's vague intuitions into testable mathematical/ML models
Data fitting, using a TensorFlow-based Machine Learning platform I developed (based on my ongoing research combining Machine Learning and Theoretical Neuroscience / Genetic Algorithms), to reconcile site-visitor traffic with the predictions of the mathematical models.
Monte Carlo simulations of paths of web traffic in response to various UX elements.
A genetic-algorithm approach for comparing the relative merits of heuristics: both general Machine Learning heuristics (such as adaptive learning-rate adjustment during gradient descent) and heuristics for assessing the impact of UX elements on a website.
Implementation in Python.
Technologies TensorFlow, Machine Learning, Genetic Algorithms, Python, Flask, NumPy, Probability Theory / Statistics, AWS
(REASON FOR SEPARATION: Covid hit; the company lost major clients, and downsized)
July - Dec. 2019, Green Mars Consulting, San Francisco 100% REMOTE [SHORT-TERM PROJECT]
IN BRIEF: a variety of IT projects for various clients, including Data Science, Machine Learning, and Backend Web Development.
KEY ACCOMPLISHMENT: Help employer initiate a transition from software-only consulting (a troubled industry in the U.S.!) to a mix that includes Data Science/ML consulting
Machine Learning for associating labeled video of iPhone users' activities with iPhone sensor data (accelerometer, etc.)
Develop micro-services in Django/Python for clinician access to the backend of an Ultrasound Medical Imaging system
Large-scale distributed data engineering (with PySpark) on data from medical radiation-delivery devices
Data Visualization (Tableau)
Technologies Python, TensorFlow (1.13), Pandas, Django, MySQL, Linux, AWS (EC2, S3, EBS), Go, PyCharm, Deployment (Git, Git-flow, pipenv), PySpark, Tableau, KNIME
(REASON FOR SEPARATION: short-term consulting project)
Feb. 2018 - May 2019, Tuuyi (Federal sub-contractor), Orinda, CA Mostly REMOTE
IN BRIEF: Machine Learning research using a variety of Neural Network architectures, incl. Deep Learning, with a focus on Classification (object recognition.) Comparison of statistical approaches with ML ones.
KEY ACCOMPLISHMENT: Help employer successfully renew grant from Federal govt. Quote from employer: “You're the only one in the team whose research I included in my grant proposal”
Machine Learning to identify object types from a time series of radar data.
Basic research into a wide variety of ML approaches (including RNNs, CNNs, Attention, etc.), as well as tools (especially TensorFlow).
Exploration of the effects of various architectures on the network's performance.
Ad-hoc approaches, such as complex pruned-tree searches in hyper-parameter space, inspired by my ongoing personal research in Neurocomputing/Theoretical Neuroscience.
Comparison of neural networks vs. traditional statistical models (such as Gaussian Mixture Models).
Cross-disciplinary research (traditional data science vs. theoretical neuroscience). Examples include insights into applying global (downstream) knowledge locally (upstream), discussed in this blog entry, and into decomposing large search spaces into sub-spaces. One of the several research projects is discussed in this blog entry.
A small part of the code I developed for this job, a Python generator of training data for Machine Learning, is available in my GitHub repository
Technologies TensorFlow, Deep Learning (incl. RNNs, Neural Turing Machines, CNNs, Residual Networks, Attention, Squeeze-and-Excitation Networks), Python, Flask, NumPy, SciPy, Scikit-Learn, Matplotlib, Pandas, PyCharm/Jupyter Notebook IDEs, Linux remote connections (Xpra), MATLAB (Octave), Mathematica
(REASON FOR SEPARATION: the owner retired, and terminated the business)
June 2017 - Feb. 2018, Melissa BioInformatics, Berkeley, CA Mostly REMOTE
IN BRIEF: Assimilation and transformation of large biological datasets, using Semantic Technology (especially Graph Databases)
KEY ACCOMPLISHMENT: Demonstrate that technology used by the company was inadequate - and identify much-better technologies for the mission
Tech Lead: recruit, interview, train and lead other Engineers (both local and distributed team).
Research, integrate and manage large biological datasets, such as Reactome, DrugBank and UniProt.
Evaluate the best tools, in particular graph databases such as Neo4j vs. RDF triplestores such as Blazegraph and Virtuoso. Evaluate various semantic software, such as Protégé (ontology editor).
Develop software to convert relational databases to semantic databases, and other inter-conversion (such as between Neo4j and RDF)
Research Semantic Theory and Technologies, including graph databases, RDF, RDF*, RDFS, OWL, N-ary predicates, reification.
Develop overall system architecture and protocols for an Integrated Biomedical Knowledge Hub (datasets import, integration and web/API presentation.)
Research taxonomies and ontologies to form a knowledge base to model and query knowledge graphs.
ETL on patient data. HIPAA certification.
A couple of entries in my science/tech blog were inspired by that job: A Brief Primer on Proteins for Bioinformatics Non-Biologists and [Semantic Technology] RDF Triple Stores vs. Property Graphs : How to Attach Properties to Relationships
Technologies Semantic Technology, Neo4j graph database, Cypher and SPARQL query languages, RDF, RDF*, RDFS, Blazegraph, Virtuoso and other Triplestores, MySQL Server, ETL (KNIME), AWS (EC2, EBS), Windows Server 2008, PHP, Python, Knowledge Explorer (UI front end to Triplestore databases)
(REASON FOR SEPARATION: the company got acquired by a larger entity, and there was consolidation of positions)
Feb. 2015 - June 2017, OncoMed Pharmaceuticals (now part of Mereo BioPharma Group), Redwood City, CA (2 different groups) Mostly REMOTE
IN BRIEF: Assist Senior Data Scientists (Machine Learning and Statistics), following an early phase of software development (3/4 Backend + 1/4 Frontend)
Jan. 2016 - June 2017 (later group):
Modifications and optimizations (such as dropouts and hyper-parameter tuning) to Deep Neural Networks for bioactivity prediction.
Train CNNs (convolutional neural networks) for analysis of cytological images of cancer tissue.
Creation of an in-house science journal recommendation system, using NLP.
Merging large datasets using Spark.
Statistical analysis, using R and MATLAB (Octave), of data on monoclonal antibodies against cancer stem cells, at times collaborating with staff at the partner company Celgene.
Feb. - Dec. 2015 (earlier group):
Supervise other programmers and several technicians.
Backend and frontend development, in particular to manage data from clinical trials of monoclonal antibodies.
Turn scientific software into web apps.
Provide scientists with solutions and support for databases and web apps.
Create APIs and RESTful endpoints, in particular to facilitate collaboration with the partner company Celgene.
Migrate local servers to the AWS cloud.
Technologies TensorFlow, PyTorch, Python, NumPy, SciPy, Neural Networks, CNN, NLP, Matplotlib, PyCharm/Jupyter Notebook IDEs, Linux, Spark, R, MATLAB (Octave), Tableau, Statistical techniques (correlation, regression, random forests, PCA - Principal Component Analysis, etc.), SQL, AWS (especially EC2, S3, Elastic Block Store, Lambda, Auto Scaling/Elastic Load Balancing, Redshift), Flask, JavaScript, Vue.js
(REASON FOR SEPARATION: a corporate merger, leading to the elimination of my group)
2004 - 2014, West Multimedia Content Management, Berkeley, CA (Provider of database, web apps and multimedia solutions)
IN BRIEF: Manage a startup company, and lead about a dozen employees. Provide services including data modeling & web-app development for multimedia content management.
Provide services including web-app development and databases for media management.
Hiring and supervising dozens of independent contractors (incl. programmers, video editors, videographers, production assistants, photo editors, and photographers)
Databases (MySQL), backend (PHP) & frontend (JavaScript) programming.
Manage a mix of local and cloud IT infrastructure (AWS: EC2 instances, etc.)
Develop APIs to interface websites with Payment Processors for online transactions.
Develop specialized custom search engines.
Web design and search-engine optimization.
Production and editing of photography & video.
Develop a number of content-management tools that later formed the basis of the open-source project Brain Annex, and also led to the creation of a new PHP web framework (pForce).
Technologies PHP, MySQL, NoSQL, JavaScript, HTML5, CSS, Amazon Web Services (AWS), RESTful web services, Adobe Premiere, Photoshop, Audition, SEO (Search Engine Optimization). On Windows and Linux
(REASON FOR SEPARATION: the company ceased business after a market downturn and loss of major clients)
2002 - 2003, Chiron Pharmaceuticals (now part of Novartis), Emeryville, CA
IN BRIEF: Dual role, initially in database management, and later in C++ and Java software development.
Database administration and data modeling: primarily MySQL and Oracle. Train staff on database design.
C++ and Java programming for in-house LIMS (Laboratory Information Management Systems). Test and evaluate various commercial LIMS.
Technologies Oracle, MySQL, Java, C++
(REASON FOR SEPARATION: temporary project, through an agency)
1998 - 2002, California Public Health Dept., Berkeley, CA
IN BRIEF: Provide database design, development and support for research scientists, lab technicians & clinical trials
MISSION: Management and expansion of HIV vaccine clinical trials.
Data modeling of clinical and scientific data. Design and administration of Oracle, MS Access and SQL Server databases. Data validation and migration.
Create reports for clinical trials of HIV vaccines. Participate in various aspects of clinical trials.
Document and backup all the databases. Develop and implement QA strategies. Provide Windows computer support.
Train staff in data modeling, design of database architecture, and use of Microsoft Access and other software.
Do consulting and teach seminars on data modeling and database design for various Public Health projects (Case report forms, etc.)
Write VBA programs. Streamline and automate the lab’s data management, resulting in the development of an in-house LIMS.
Test and evaluate various commercial LIMS. Statistical analysis, using SAS, R, Python, MATHEMATICA and Matlab.
Technologies SQL Server, MS Access, VBA, SAS, R, Python, MATHEMATICA, Matlab
1996 - 1997, Adax Data Communications, Berkeley, CA
IN BRIEF: Develop C++ software to analyze data packets and troubleshoot prototype network cards developed by the company
Testing, product evaluation and liaison with large customers for data-communications hardware and software.
Tech writing. UNIX system administration. Training of new employees.
1993 - 1996, Life Sciences Division, Lawrence Berkeley Lab, Berkeley, CA
IN BRIEF: Database administration and C++ scientific programming (revisited with hindsight in my 2019 blog entry).
As part of research to improve PET scanners, design and administer relational databases of chemical information, in a Mac environment.
C programming for scientific simulations, incl. Molecular Dynamics, on UNIX and PCs. Development of graphical user interfaces on Macs.
Data acquisition from an experimental apparatus to investigate the X-ray scintillation properties of various crystals and powders.
1992 - 1993, Physics Division (Particle Accelerator & Nuclear Fusion Research), Lawrence Berkeley Lab, Berkeley, CA
IN BRIEF: Scientific programming in MATHEMATICA and C++
Programming on UNIX workstations for Physics scientific simulations of charged particles moving through magnetic fields, using MATHEMATICA and C++
1991 - 1992, Dept. of Mathematics, UC Berkeley
IN BRIEF: Teach Math discussion sections to college students
Teach discussion sections for Algebra and Calculus. Attend teaching-training workshops and conferences.
A Full-Stack web app (Python/Neo4j/Flask/Vue.js)
It contains NeoAccess, a Python interface to Neo4j graph databases, and NeoSchema, a schema layer for Neo4j, as well as an API and a UI.
Heavily used by several of my recent employers as an engine to power their use cases.
A former employer (GSK Pharmaceuticals) still uses a fork of the NeoAccess library, which I co-developed while working there (graciously made open source by the company).
A (soon to be GPU-accelerated) Python platform to simulate chemical reactions and diffusion in biological cells with cellular compartments. JupyterLab-based.
With visualization in Plotly/D3.js/Cytoscape.js, and ML to tweak parameters so as to attain desired phenotypes.
“Julian has paid exquisite attention to the internals and given us a foundation that we can rapidly iterate on”
“You're the only one in the team whose research I included in my grant proposal” [Note: he successfully renewed the grant from the federal govt.]
“Julian is very knowledgeable about databases structure.
During his tenure, he undertook a major restructuring of our database and implemented a wide array of automated procedures that were previously performed manually. This included the integration of numerous separate databases into a single unified structure, automated assignment of testing to be performed (based on multiple clinical study requirements), and mechanisms for specimen tracking and data analysis. In addition, the separate databases contained a substantial amount of non-uniform data structure and historical key-entry errors, which required major revision before integration into the improved data structure.
In summary, Julian’s database skills are extensive and he made a substantial contribution to the structure, utility, and accuracy of our clinical laboratory database during his 4-year tenure.”
“[I] would be pleased to have him in my department as an advanced graduate student.”
“Julian is a bright, creative, and motivated student. He has a pleasant personality and good communication skills. I would be glad to have him as my own graduate student.”
“Julian is extremely talented, energetic, and determined … He came to New York University as an undergraduate and enrolled in my graduate course in Theory of Computation and was the best student in a class of some 40.”
“Julian is a young man of exceptional drive and talent. …. I rate him in the top 1% of students with whom I have dealt.”