Zeppelin at Twitter (SF Data Science Meetup, July 2016)
1. Zeppelin at Twitter
Prasad Wagle
Technical Lead, Data Platform
twitter.com/prasadwagle
July 13, 2016
SF Data Science Meetup
Galvanize, San Francisco, CA
2. Twitter Data Pipeline Overview
Production systems
HDFS
Analytics Tools: Presto, Vertica, MySQL, Scalding, Spark, R
Analytics Front-ends: Custom Dashboards, Tableau, Zeppelin, Command line tools
3. One company-wide server
860 notes
4000 paragraphs: 1500 Vertica, 1500 Presto, 300 MySQL, 400 Markdown, 300 Scalding, Spark, etc.
850 users
Zeppelin Usage Metrics
4. Field of (Data) Dreams
Started as a hackweek project in Dec 2015; beta launched Jan 27, 2016
Number of notes created: Feb 270, Mar 350, Apr 470, May 560, Jun 750
5. Report Creators
Product managers (dashboards, product analytics)
Data scientists
Sales analysts
Engineers and SREs
Report Viewers
Anyone in the company
Zeppelin Users
6. "My mind is absolutely blown away by the ease of use, speed, and power of Zeppelin. I've been wanting a tool like this at Twitter my entire time working here."
"started playing with @ApacheZeppelin. amazingly addictive!"
"Thanks for all the updates to Zeppelin - Fabric has fallen in love with it fast (and we're even using it for daily tracking of our OKRs amongst all the other metrics)"
Zeppelin Testimonials
7. Very easy to create and share reports
Web based
Works seamlessly with analytics engines, both JDBC and non-JDBC (Scalding, Spark)
Open source (easy to add features)
Reasons for adoption
8. Drag and drop report builder
Can create complex queries without SQL knowledge (e.g. Top N)
Polished UI
Filters and other transformations work on extracts, so no new database queries are needed (fast)
Row level permissions (for sales reports)
Tableau
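For comparison, the kind of Top N query that a drag-and-drop builder spares users from writing can be sketched in SQL. The sketch below uses Python's built-in sqlite3 module with a hypothetical `daily_events` table and made-up rows; none of these names or numbers come from the talk, and the actual Vertica or Presto schemas are not described in the slides.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a small, invented table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE daily_events (country TEXT, day TEXT, events INTEGER)"
    )
    rows = [
        ("US", "2016-07-01", 120),
        ("US", "2016-07-02", 80),
        ("JP", "2016-07-01", 150),
        ("BR", "2016-07-01", 60),
        ("BR", "2016-07-02", 50),
        ("IN", "2016-07-01", 90),
    ]
    conn.executemany("INSERT INTO daily_events VALUES (?, ?, ?)", rows)
    return conn


def top_n_countries(conn, n):
    """Return the n countries with the highest total event count."""
    query = """
        SELECT country, SUM(events) AS total
        FROM daily_events
        GROUP BY country
        ORDER BY total DESC, country ASC
        LIMIT ?
    """
    return conn.execute(query, (n,)).fetchall()


if __name__ == "__main__":
    for country, total in top_n_countries(build_demo_db(), 3):
        print(country, total)
```

In a notebook, a query like this is typed by hand in a paragraph; in a report builder the same aggregate, sort, and limit are assembled through the UI.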
10. Authentication
Integrated with Twitter’s homegrown single sign-on system
SSL
Integrated with Twitter’s homegrown key distribution system
Notebook authorization
Data source authorization
Work Done (Security)
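Notebook authorization of the kind listed on this slide can be sketched as a per-note permission check. This is a minimal sketch, assuming each note carries sets of owners, writers, and readers, and that an empty set leaves the action open to everyone; the class and method names are hypothetical and this is not Zeppelin's actual code.

```python
from dataclasses import dataclass, field


@dataclass
class NotePermissions:
    """Hypothetical per-note access lists (assumed model, not Zeppelin's API)."""
    owners: set = field(default_factory=set)
    writers: set = field(default_factory=set)
    readers: set = field(default_factory=set)

    def can_read(self, user):
        # Assumption: an empty readers set means the note is readable by anyone.
        if not self.readers:
            return True
        return user in (self.readers | self.writers | self.owners)

    def can_write(self, user):
        # Assumption: an empty writers set means anyone may edit the note.
        if not self.writers:
            return True
        return user in (self.writers | self.owners)

    def is_owner(self, user):
        # Assumption: an unowned note can have its permissions changed by anyone.
        if not self.owners:
            return True
        return user in self.owners


if __name__ == "__main__":
    perms = NotePermissions(owners={"alice"}, writers={"bob"}, readers={"carol"})
    print(perms.can_read("carol"), perms.can_write("carol"))
```

Data source authorization would need a separate check against the credentials used to reach each backend, which this sketch does not cover.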
11. Websocket deadlock issue with Jetty 8
Reduce communication
Remove synchronized block (risky; will move to Jetty 9)
Monitoring
Standby server
Backups
Work Done (Stability, Operations)
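The slides do not describe the actual Jetty change, so the following is only a generic illustration of why holding a lock while sending over connections is fragile. It is a minimal sketch in Python, assuming a hypothetical `Broadcaster` class: the connection list is copied under the lock, and the sends happen after the lock is released, so a connection that calls back into the broadcaster during a send cannot block on the same lock.

```python
import threading


class Broadcaster:
    """Hypothetical broadcaster; names are illustrative, not from Zeppelin."""

    def __init__(self):
        self._lock = threading.Lock()
        self._connections = []

    def add(self, conn):
        with self._lock:
            self._connections.append(conn)

    def remove(self, conn):
        with self._lock:
            if conn in self._connections:
                self._connections.remove(conn)

    def broadcast(self, message):
        # Hold the lock only long enough to copy the list of connections.
        with self._lock:
            snapshot = list(self._connections)
        # Send outside the lock: a slow or re-entrant send() no longer
        # prevents other threads from adding or removing connections.
        for conn in snapshot:
            conn.send(message)
```

The trade-off is that a connection removed after the snapshot may still receive one last message, so `send()` has to tolerate a closed connection.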
13. Notebook authorization
Data source authorization
Run scheduled notes with a user
Scalding interpreter
Reduce websocket communication
Paragraph footer
Row level permissions
Work Contributed to Apache Project
15. Features and UX
Notebook organization (folders)
Email reports and alerts
Row level permissions like Tableau
Operations
Monitoring (end-to-end query)
Admin (view/stop running jobs, resource usage)
Failover
Continuous Integration
Future / Work in Progress
16. There is a real need for a notebook-style interface for data analysis
Zeppelin is enterprise-ready, flexible, and easy to use
The Zeppelin user and dev community is awesome
Takeaways