Open Source Rollup for Week #37

What a busy week it has been at Lookout! To summarize this week’s GitHub activity, I had to go through six or seven pages of my GitHub News Feed. As with last week, most of the activity was on backend-focused repositories such as:

  • ngx_borderpatrol, our edge-authentication proxy, saw a few bug fixes and enhancements.
  • elementary-rpc, a Ruby gem for protobuf-based RPC, received a few minor bug fixes and documentation improvements.
  • Hermann, a Ruby gem bringing librdkafka into MRI, saw a huge number of bug fixes, API changes, and fixes to some of the lower-level C bindings to librdkafka.

New repositories

This week we’ve open sourced 1 repository: mudskipper, a utility for extracting change-events from MySQL binlogs. In case you missed it, Beau wrote a great introductory post for the Go-based tool.

Finally, the raw GitHub numbers:

  • 19 git pushes
  • 18 pull requests created
  • 22 issues created
  • 2 comments on pull requests
  • 8 commit comments
  • 1 repository open sourced

Contributors


egeste

ismith

phrinx

rothrock

rtyler

rwygand

stancampbell3

trane

wkimeria





Capturing Change Data with Mudskipper

Engineering at Lookout differentiates itself by embracing a heterogeneous coding environment. Although Ruby usually carries the day for server-side components, we also write in Java, Objective-C, Bash, R, C, Scala, Python, and Go.

One such project, built in Go, is Mudskipper, which we’re open sourcing today: a utility for extracting change-data events from MySQL binlogs. At Lookout, Mudskipper currently captures about 20 million events per day from 7 MySQL tables.

When I set out to build Mudskipper, I had two goals in mind:

  1. Find a way to capture change-data for select tables in our MySQL databases.
  2. Explore novel features in Go like channels and goroutines.

What Is Change-Data?

Change-data is metadata, stored in tabular form, that records all the changes made to a database table during a window of time. It differs from a log mainly in that it is stored in a row-column format meant for querying by a database system. For example, a banking application might have an account_balance_audit table that keeps track of all the changes to the account_balance table. Historically, the _audit table would be populated by triggers on the table it shadows. This adds load to the database and complexity to the application.

Mudskipper Takes A Different Approach

With row-based binary logging enabled, Mudskipper can scan the binlog stream and selectively extract change events. This decouples change-data capture from the event or application that caused the change. Decoupling lets us spread the capture effort across multiple independent processes, and possibly across many CPUs. Database applications no longer need to maintain custom logic for capturing change-data, and auditing is kept separate from the application being audited.
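The selective-extraction step can be sketched as a goroutine that reads decoded row events from a channel and forwards only those for watched tables. This assumes the binlog has already been decoded into the hypothetical RowEvent type (the sketch does no binlog parsing), and extract and Filter are illustrative names, not Mudskipper’s actual code.

```go
package main

import "fmt"

// RowEvent is a hypothetical, already-decoded row-based binlog event.
type RowEvent struct {
	Schema string
	Table  string
	Op     string // "INSERT", "UPDATE" or "DELETE"
	Before map[string]string
	After  map[string]string
}

// extract forwards only events whose "schema.table" is in watched,
// closing out once in is exhausted.
func extract(in <-chan RowEvent, watched map[string]bool, out chan<- RowEvent) {
	defer close(out)
	for ev := range in {
		if watched[ev.Schema+"."+ev.Table] {
			out <- ev
		}
	}
}

// Filter wires a producer, extract, and a collector together over channels
// and returns the kept events in their original order.
func Filter(events []RowEvent, watched map[string]bool) []RowEvent {
	in := make(chan RowEvent)
	out := make(chan RowEvent)
	go func() {
		defer close(in)
		for _, ev := range events {
			in <- ev
		}
	}()
	go extract(in, watched, out)
	var kept []RowEvent
	for ev := range out {
		kept = append(kept, ev)
	}
	return kept
}

func main() {
	events := []RowEvent{
		{Schema: "bank", Table: "account_balance", Op: "UPDATE",
			Before: map[string]string{"balance": "100"}, After: map[string]string{"balance": "250"}},
		{Schema: "bank", Table: "sessions", Op: "INSERT", After: map[string]string{"id": "7"}},
		{Schema: "bank", Table: "account_balance", Op: "DELETE", Before: map[string]string{"balance": "250"}},
	}
	watched := map[string]bool{"bank.account_balance": true}
	for _, ev := range Filter(events, watched) {
		fmt.Println(ev.Op, ev.Schema+"."+ev.Table)
	}
}
```

Because extract only sees a channel, the code that produces events and the code that consumes them can change independently, which is the decoupling described above.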

Why Go?

First, it’s fun to be in the avant-garde.

Second, my background is in C, and Go feels like a well-planned, 21st century version of C.

Third, I have never much cottoned to object-oriented languages, and Go dispenses with classes and inheritance. See the Go FAQ entry “Is Go an object-oriented language?”

Finally, goroutines and channels offered a simple approach to distributing the workload over more CPU cycles. Goroutines and channels also encourage a coder to think more about loose coupling and tight cohesion.

In Mudskipper, binlog scanning and extraction are separated from writing the output. Moreover, the scanning process is implemented as a dynamic pool of goroutines: when lots of binlogs show up quickly, the code brings on more scanners, and they spin down when the workload slacks.

Mudskipper is very much a work in progress. I invite you to check out the code, offer comments, and send pull requests.

- Joseph (Beau) Rothrock




Open Source Rollup for Week #36

Over the past week we’ve been doing quite a lot of hacking within our GitHub organization. Unlike previous weeks, we’ve had very few JavaScript-related commits and a plethora of backend tooling changes.

Most of the changes aren’t quite ready to summarize, but if you’re interested in following development more closely as it happens, be sure to watch the following repositories:

The numbers:

  • 29 issues created
  • 12 git pushes
  • 9 pull requests created
  • 4 commit comments
  • 3 repositories forked

Contributors


MadanThangavelu

egeste

ismith

rtyler

stancampbell3

