Data Works MD Conference 2021
We are in the early planning stages for a Maryland data-focused conference in June 2021. If you would like to stay informed, please sign up for updates.
Interested in a side project?
Are you an expert with data and willing to mentor, or are you an up-and-coming hobbyist looking for a side project to work on? We have put together a group to focus on a few problems involving Baltimore City data, and we need your help. The current project focuses on data parsing and analysis for the Baltimore Board of Estimates. If interested, please send us an email or join us on Slack to discuss building a side project group.
Considering a career change?
Are you a software or system engineer, data scientist, analytic developer, or cybersecurity expert interested in learning about new opportunities?
Please send us an email to learn about the opportunities available with our partners.
Are you hiring?
If your company is looking for data scientists, data engineers, software engineers, and other data-related experts, please reach out so that we can help our members find new opportunities.
Please send us an email introducing your company and needs.
Want to be more involved in our data science community? If you have experience running workshops or hackathons or curating newsletters, or you are simply interested in helping grow the meetup, please send us an email.
Erias has an immediate need for Software Engineers, System Engineers, Test Engineers, Data Scientists, and System Administrators. External referral bonuses are available. For more information, please contact us at email@example.com
Data News and Articles
Resilience and Vibrancy: The 2020 Data & AI Landscape — In a year like no other in recent memory, the data ecosystem is showing not just remarkable resilience but exciting vibrancy. Cloud and data technologies (data infrastructure, machine learning / artificial intelligence, data-driven applications) are at the heart of digital transformation. As a result, many companies in the data ecosystem have not just survived but thrived in an otherwise challenging political and economic context. Tags: AI
State of AI Report 2020 — Now in its third year, the State of AI Report 2020 discusses the research, talent, industry, politics, and predictions for AI in 2020. Tags: AI
CitiStat's Data-Driven Government — Baltimore was a pioneer in implementing data-driven management in local government, but over 20 years it wasn't always embraced. The lesson? Culture and leadership matter. Tags: Data, Maryland
Unpopular Opinion - Data Scientists Should Be More End-to-End — This article argues against splitting the data science process into multiple roles and contends that data scientists can be more effective by working end-to-end. Tags: Careers
How Tech Companies Can Advance Data Science for Social Good — The data science for social good (DSSG) movement has for years been making datasets about important social issues—such as health care infrastructure, school enrollment, air quality, and business registrations—available to trusted organizations or the public. Large tech companies such as Facebook, Google, Amazon, and others have recently begun to embrace the DSSG movement. Tags: Data, Civic
Data Cleaning Is Analysis, Not Grunt Work — Cleaning data is considered by some people to be menial work that’s somehow “beneath” the sexy “real” data science work. This article claims otherwise. Tags: Data
What If You Were an Evil Data Scientist? — What could an evil data scientist get away with? Are there checks and balances? How bad could it get? Tags: Data
The Next Big Breakthrough in AI Will Be Around Language — The 2020s are going to bring major advances in language-based AI tasks. GPT-3, a state-of-the-art natural language processing tool developed by OpenAI, will soon be able to produce short stories, songs, press releases, technical manuals, text in the style of particular writers, and even computer code. Cloud-AI services will enable the development of a new class of enterprise apps that are more creative (or “generative” — the “G” in GPT) than anything we’ve seen before. Tags: AI, Language
How-To's and Tutorials
Data Orchestration — A Primer — Data scientists and data engineers are responsible for authoring data pipelines and workflows. Historically, individuals wrote cron jobs to orchestrate data, but today there are data orchestration frameworks that let them programmatically author, schedule, and monitor data pipelines. Tags: Data
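To make the idea concrete, here is a minimal sketch of what an orchestrator does at its core: run each task only after the tasks it depends on have finished. It assumes Python 3.9+ for the standard-library `graphlib` module; the `run_pipeline` helper and the task names (`extract`, `clean`, `load`, `report`) are hypothetical, and real orchestration frameworks layer scheduling, retries, and monitoring on top of this.

```python
from graphlib import CycleError, TopologicalSorter  # CycleError signals circular dependencies
from typing import Callable, Dict, List, Set


def run_pipeline(tasks: Dict[str, Callable[[], None]],
                 deps: Dict[str, Set[str]]) -> List[str]:
    """Run every task after its upstream dependencies; return the execution order.

    `run_pipeline` is a hypothetical helper, not part of any orchestration framework.
    """
    # Include tasks with no upstream dependencies so they are scheduled too.
    graph = {name: deps.get(name, set()) for name in tasks}
    order: List[str] = []
    # static_order() yields each node after all of its predecessors; it raises
    # CycleError before yielding anything if the graph contains a cycle.
    for name in TopologicalSorter(graph).static_order():
        tasks[name]()
        order.append(name)
    return order


# Usage: a hypothetical four-step pipeline whose tasks record that they ran.
log: List[str] = []
tasks = {
    "extract": lambda: log.append("extract"),
    "clean": lambda: log.append("clean"),
    "load": lambda: log.append("load"),
    "report": lambda: log.append("report"),
}
deps = {"clean": {"extract"}, "load": {"clean"}, "report": {"load"}}
order = run_pipeline(tasks, deps)
print(order)  # ['extract', 'clean', 'load', 'report']
```

Using a topological sort means a circular dependency is rejected up front rather than causing a task to run before its inputs exist.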
Data Science Meets Devops: MLOps with Jupyter, Git, & Kubernetes — An end-to-end example of deploying a machine learning product using Jupyter, Papermill, Tekton, GitOps and Kubeflow. Tags: Machine Learning, DevOps
Traffic Prediction with Advanced Graph Neural Networks — Researchers at DeepMind have partnered with the Google Maps team to improve the accuracy of real-time ETAs by up to 50% in places like Berlin, Jakarta, São Paulo, Sydney, Tokyo, and Washington D.C. by using advanced machine learning techniques including Graph Neural Networks. Tags: Neural Networks
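For readers new to graph neural networks, the sketch below shows the neighborhood-aggregation ("message passing") step that GNNs build on: each node's features are updated by mixing in the average of its neighbors' features. It is a toy illustration only, not DeepMind's model. The road-segment graph, the single speed feature, and the fixed `self_weight` mixing factor are assumptions; a real GNN replaces the fixed averaging with learned weight matrices and nonlinearities and stacks several such layers.

```python
from typing import Dict, List


def message_passing_step(features: Dict[str, List[float]],
                         neighbors: Dict[str, List[str]],
                         self_weight: float = 0.5) -> Dict[str, List[float]]:
    """One round of mean-aggregation message passing with no learned parameters."""
    updated: Dict[str, List[float]] = {}
    for node, h in features.items():
        nbrs = neighbors.get(node, [])
        if nbrs:
            # "Messages": the element-wise mean of the neighbors' current feature vectors.
            agg = [sum(features[n][i] for n in nbrs) / len(nbrs) for i in range(len(h))]
        else:
            agg = list(h)  # an isolated node keeps its own features
        # Blend the node's own state with the aggregated neighbor message.
        updated[node] = [self_weight * own + (1 - self_weight) * msg
                         for own, msg in zip(h, agg)]
    return updated


# Usage: three hypothetical road segments in a line, A - B - C, each with one feature (speed).
features = {"A": [10.0], "B": [20.0], "C": [30.0]}
neighbors = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
result = message_passing_step(features, neighbors)
print(result)  # {'A': [15.0], 'B': [20.0], 'C': [25.0]}
```

Every node is updated from the previous round's features, so the result does not depend on the order in which nodes are visited; repeating the step lets information travel further along the road network.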
How to Win Kaggle Competitions with Anthony Goldbloom — Anthony Goldbloom, the founder and CEO of Kaggle, talks about his vision for Kaggle, how Kaggle and its competitions have changed over the years, how competitive data science can prepare you for the real world, whether he prefers Python or R, and which jobs we should be worried about losing to AI in the next few decades. Tags: Podcast
Things I Learned to Become a Senior Software Engineer — A great article describing a number of lessons for becoming a senior engineer, such as learning good habits, paying attention, and embracing fear. Tags: Careers
Data Tools and Resources
Apache Arrow: The Hidden Champion of Data Analytics — Arrow is used by open-source projects like Apache Parquet, Apache Spark, and pandas, as well as by many commercial or closed-source services. It provides in-memory computing, a standardized columnar storage format, and an IPC and RPC framework for data exchange between processes and between nodes, respectively. Tags: Tools, Data
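The columnar idea at the heart of Arrow can be illustrated without Arrow itself. The sketch below uses only Python's standard `array` module (it is not the `pyarrow` API) to transpose row records into one contiguous, typed buffer per column, so an aggregate over a single column reads only that column's memory. Arrow's actual format specifies much more, such as validity bitmaps for nulls and a language-independent memory layout; the `to_columnar` helper, the column names, and the data are hypothetical.

```python
from array import array
from typing import Dict, List, Tuple

Row = Tuple[int, float]  # hypothetical record: (id, amount)


def to_columnar(rows: List[Row]) -> Dict[str, array]:
    """Transpose row-oriented records into one contiguous typed buffer per column."""
    return {
        "id": array("q", (r[0] for r in rows)),      # signed 64-bit integers
        "amount": array("d", (r[1] for r in rows)),  # 64-bit floats (C double)
    }


# Usage: three rows become two columns.
rows = [(1, 2.5), (2, 4.0), (3, 1.5)]
cols = to_columnar(rows)
# Summing "amount" scans one packed buffer of doubles; the "id" column is never touched.
total = sum(cols["amount"])
print(total)                              # 8.0
print(memoryview(cols["amount"]).nbytes)  # 24 (3 values x 8 bytes)
```

Because each column is a single packed buffer, it can also be handed to another process or library without converting row objects, which is the kind of zero-copy exchange Arrow's IPC layer is built around.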
Moon — Moon is a novel approach to web applications based on pure functions. An application is a function that uses drivers, programs that provide an interface to the real world. Tags: Tools, Web
Dolt — Dolt is Git for data! Dolt is a relational database, i.e., it has tables, and you can execute SQL queries against those tables. It also has version control primitives that operate at the level of the table cell. Thus Dolt is a database that supports fine-grained, value-wise version control, where all changes to data and schema are stored in a commit log. Tags: Tools, Data
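To show what version control "at the level of the table cell" means, here is a small sketch that compares two snapshots of a table, keyed by primary key, and reports every individual cell that changed. It does not use Dolt or any of its interfaces; the `cell_diff` helper, the table layout (a dict of rows per primary key), and the sample employee data are hypothetical. Dolt itself tracks such changes between commits inside the database.

```python
from typing import Any, Dict, List, Tuple

Table = Dict[Any, Dict[str, Any]]   # primary key -> {column: value}
Change = Tuple[Any, str, Any, Any]  # (primary key, column, old value, new value)


def cell_diff(old: Table, new: Table) -> List[Change]:
    """Report every cell whose value differs between two table snapshots.

    Added rows show old value None and deleted rows show new value None; this sketch
    cannot distinguish a missing cell from one that holds None. Keys must be sortable.
    """
    changes: List[Change] = []
    for pk in sorted(old.keys() | new.keys()):
        old_row, new_row = old.get(pk, {}), new.get(pk, {})
        for col in sorted(old_row.keys() | new_row.keys()):
            before, after = old_row.get(col), new_row.get(col)
            if before != after:
                changes.append((pk, col, before, after))
    return changes


# Usage: two hypothetical snapshots of an "employees" table.
v1 = {1: {"name": "Ada", "dept": "Eng"}, 2: {"name": "Bo", "dept": "Ops"}}
v2 = {1: {"name": "Ada", "dept": "Research"}, 3: {"name": "Cy", "dept": "Ops"}}
for change in cell_diff(v1, v2):
    print(change)
# (1, 'dept', 'Eng', 'Research')
# (2, 'dept', 'Ops', None)
# (2, 'name', 'Bo', None)
# (3, 'dept', None, 'Ops')
# (3, 'name', None, 'Cy')
```

Note that row 1's unchanged `name` cell does not appear: a cell-level diff records only the values that actually changed, rather than flagging the whole row.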
Hugging Face Datasets — Datasets is a lightweight and extensible library to easily share and access datasets and evaluation metrics for Natural Language Processing (NLP) and more. Tags: Tools, Data
TensorFlow Recommenders — TensorFlow Recommenders (TFRS) is an open-source TensorFlow package that makes building, evaluating, and serving sophisticated recommender models easy. Tags: Tools, Machine Learning