April 2020

Upcoming Events

We are looking into hosting a virtual meetup in the next few weeks. Stay tuned for more details!

Past Events

Making Apache Spark Better With Delta Lake | February 2020
Video available here.

Robotics and Machine Learning: Working with NVIDIA Jetson Kits | December 2019
Video available here.

Connect Data and Devices with Apache NiFi | November 2019
Video available here.

Special Session: Introduction to Machine Learning | September 2019
Video available here.

Additional videos are available here.


Upcoming Baltimore Hackathon
Have an idea for the Baltimore Hackathon taking place in spring 2020? Submit your idea here.

Considering a career change?
Are you a software or system engineer, data scientist, analytic developer, or cybersecurity expert interested in learning about new opportunities?
Please send us an email to learn about opportunities available with our sponsors and partners.

Interested in side projects?
Are you an expert with data and willing to mentor, or are you an up-and-coming hobbyist looking for a side project to work on?
If so, please send us an email to discuss building a side project group.

Get involved!
Want to be more involved in our data science community? If you have experience running workshops, hackathons, curating newsletters, or are just interested in helping to grow the meetup, please send us an email!

Erias Ventures
Erias has an immediate need for Software Engineers, System Engineers, Test Engineers, Data Scientists, and System Administrators. External referral bonuses are available. For more information, please contact us at

COVID-19 Data News


An Unexpected Ally in the War With Bacteria — Scientists have struggled to develop new antibiotics. Enter: the machines. This article discusses how machine learning and AI are being used to make sense of mountains of biomedical data to discover new treatments.
For more, click here.

A.I. Versus the Coronavirus — Advanced computers have defeated chess masters and learned how to pick through mountains of data to recognize faces and voices. Now, a billionaire developer of software and artificial intelligence is teaming up with top universities and companies to see if A.I. can help curb the current and future pandemics.
For more, click here.

How to be Curious Instead of Contrarian About COVID-19: Eight Data Science Lessons From ‘Coronavirus Perspective’ — A detailed review of a recent piece on the coronavirus, used to illustrate data science lessons such as actually caring about the answer to a question, performing thorough literature reviews, and being specific and concrete about your theory.
For more, click here.

AI could help with the next pandemic—but not with this one — It was an AI that first saw it coming, or so the story goes. On December 30, an artificial-intelligence company called BlueDot, which uses machine learning to monitor outbreaks of infectious diseases around the world, alerted clients—including various governments, hospitals, and businesses—to an unusual bump in pneumonia cases in Wuhan, China. It would be another nine days before the World Health Organization officially flagged what we’ve all come to know as Covid-19.
For more, click here.

Researchers Will Deploy AI to Better Understand Coronavirus — In the months since the novel coronavirus emerged in Wuhan, China, last December, almost 2,000 research papers have been published on the health effects of the new virus, possible treatments, and the dynamics of the resulting pandemic.
For more, click here.


Coronavirus: The Hammer and the Dance — An extensive article arguing that strong coronavirus measures taken now should only need to last a few weeks, that there shouldn’t be a big peak of infections afterwards, and that it can all be done at a reasonable cost to society, saving millions of lives along the way. Without these measures, tens of millions will be infected and many will die, along with anyone else who requires intensive care, because the healthcare system will have collapsed.
For more, click here.

AI Can Help Scientists Find a Covid-19 Vaccine — Artificial intelligence has already played a vital role in the outbreak since day 1, a reminder for the first time in a while that it can be a tool for good. Over the last few weeks, teams at the Allen Institute for AI, Google DeepMind, and elsewhere have created AI tools and shared them, along with datasets and research results, freely with the global scientific community.
For more, click here.

Coronavirus: We Need Better Data Hygiene — A quick article on how poor data hygiene is muddling our thinking, and on the irresponsible and harmful behavior of most media outlets.
For more, click here.

Data Set: A Short Review of COVID-19 Data Sources — A brief discussion of the pros and cons of some of the popular COVID-19 datasets available, including 1 Point 3 Acres, Johns Hopkins CSSE, and the COVID Tracking Project.
For more, click here.

Data Set: COVID-19 Open Research Dataset (CORD-19) — In response to the COVID-19 pandemic, the Allen Institute for AI has partnered with leading research groups to prepare and distribute the COVID-19 Open Research Dataset (CORD-19), a free resource of over 45,000 scholarly articles, including over 33,000 with full text, about COVID-19 and the coronavirus family of viruses for use by the global research community.
For more, click here.

Data Set: Kaggle COVID19 Global Forecasting Competition — The White House Office of Science and Technology Policy (OSTP) pulled together a coalition of research groups and companies (including Kaggle) to prepare the COVID-19 Open Research Dataset (CORD-19) in an attempt to address key open scientific questions on COVID-19. Those questions are drawn from the National Academies of Sciences, Engineering, and Medicine (NASEM) and the World Health Organization (WHO).
For more, click here.

Data Set: COVID-19 Image Data Collection — A database of COVID-19 cases with chest X-ray or CT images, with the goal of using these images to develop AI-based approaches to predict and understand the infection.
For more, click here.

Data News and Articles


The Secretive Company That Might End Privacy as We Know It — A little-known start-up helps law enforcement match photos of unknown people to their online images — and “might lead to a dystopian future or something,” a backer says.
For more, click here.

Bayesian Product Ranking at Wayfair — Wayfair has a huge catalog with over 14 million items. This article discusses Wayfair's project to improve product rankings and introduces a new Bayesian system developed there to (1) identify products and (2) present them to its customers.
For more, click here.
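For readers new to the idea, here is a minimal sketch of one common Bayesian ranking technique, Beta-Binomial shrinkage of click-through rates. This is not Wayfair's system, which the blurb doesn't describe; the prior parameters (`ALPHA`, `BETA`) and the three products are hypothetical. Products with little data are pulled toward the prior mean, so 2 clicks out of 2 impressions does not automatically outrank 60 out of 300.

```python
from dataclasses import dataclass


@dataclass
class Product:
    name: str
    clicks: int
    impressions: int


# Hypothetical prior Beta(ALPHA, BETA) with mean 0.1, standing in for a
# catalog-wide click-through rate (an assumption for illustration only).
ALPHA = 2.0
BETA = 18.0


def posterior_mean_ctr(p: Product, alpha: float = ALPHA, beta: float = BETA) -> float:
    """Posterior mean CTR under a Beta(alpha, beta) prior and a binomial likelihood."""
    return (alpha + p.clicks) / (alpha + beta + p.impressions)


def rank(products: list[Product]) -> list[str]:
    """Return product names sorted by posterior mean CTR, highest first."""
    return [p.name for p in sorted(products, key=posterior_mean_ctr, reverse=True)]


catalog = [
    Product("A", clicks=2, impressions=2),     # raw CTR 1.00, very little evidence
    Product("B", clicks=60, impressions=300),  # raw CTR 0.20, lots of evidence
    Product("C", clicks=0, impressions=50),    # raw CTR 0.00
]

ranking = rank(catalog)
print(ranking)  # ['B', 'A', 'C']
```

Sorting by raw CTR would put A first; the posterior mean (about 0.194 for B versus 0.182 for A) reflects that two impressions are weak evidence.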


How-To's and Tutorials

A DIY License Plate Reader with a Raspberry Pi and Machine Learning — A detailed article covering how to build a DIY license plate reader, from research through implementation.
For more information, click here.

Goodreads ETL Pipeline — An end-to-end Goodreads data pipeline for building a data lake, data warehouse, and analytics platform. In short, data is captured in real time from the Goodreads API using the Goodreads Python wrapper, stored on local disk, and then moved to a landing bucket on AWS S3.
For more information, click here.

Overfitting: A Guided Tour — This post introduces overfitting, describes how overfitting influences both prediction and inference problems, provides supervised and unsupervised examples of overfitting, and presents a fundamental relationship between train and test error. The goal is to provide some additional intuition beyond material covered in introductory machine learning resources.
For more information, click here.
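To make the train/test gap concrete, the sketch below fits k-nearest-neighbor regression to a toy noisy sine curve (an assumed setup, not taken from the post). With k=1 each training point predicts itself exactly, so training error is zero while error on fresh test data stays well above zero: the model has memorized the noise. Larger k smooths the fit, so training error is no longer zero.

```python
import math
import random


def make_data(n, noise=0.3, seed=0):
    """Sample n points from y = sin(x) + Gaussian noise (toy data for illustration)."""
    rng = random.Random(seed)
    xs = [rng.uniform(0, 2 * math.pi) for _ in range(n)]
    ys = [math.sin(x) + rng.gauss(0, noise) for x in xs]
    return xs, ys


def knn_predict(train_x, train_y, x, k):
    """Average the targets of the k training points closest to x."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    return sum(train_y[i] for i in nearest) / k


def mse(train_x, train_y, xs, ys, k):
    """Mean squared error of k-NN predictions on the points (xs, ys)."""
    preds = [knn_predict(train_x, train_y, x, k) for x in xs]
    return sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(ys)


train_x, train_y = make_data(100, seed=1)
test_x, test_y = make_data(1000, seed=2)

results = {}
for k in (1, 5, 25):
    train_err = mse(train_x, train_y, train_x, train_y, k)
    test_err = mse(train_x, train_y, test_x, test_y, k)
    results[k] = (train_err, test_err)
    print(f"k={k:2d}  train MSE={train_err:.3f}  test MSE={test_err:.3f}")
```

Training error alone would make k=1 look like the best model; only the held-out test set reveals the overfitting.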

Data Tools and Resources

Aleph — Aleph is a powerful tool for people who follow the money. It helps investigators securely access and search large amounts of data, whether it comes from a government database or a leaked email archive.
For more information, click here.

Data Set: Google's Dataset Search — A powerful tool from Google that gives a snapshot of the data out there on the Web. The largest topics that the datasets cover are geosciences, biology, and agriculture. The majority of governments in the world publish their data and describe it with schema.org. The United States leads in the number of open government datasets available, with more than 2 million. And the most popular data formats? Tables: you can find more than 6 million of them on Dataset Search.
For more information, click here and here.

Data Set: Unprecedented Facebook URLs Dataset — Social Science One and Facebook have completed, and are now making available to academic researchers, one of the largest social science datasets ever constructed. The dataset contains a total of more than 10 trillion numbers that summarize information about 38 million URLs shared worldwide more than 100 times publicly on Facebook.
For more information, click here.
If you are interested in speaking, hosting, or sponsoring a meetup, have opportunities to list, or local news to share, please email

Data Works MD · 101 W Dickman St · Baltimore, MD 21784-9239 · USA