Knowledge Distillation — Study Notes. Many people working in the DS field share some common pains. One of them is scaling. You trained with some… (Nov 6, 2020)
Data Preprocessing — Deduplication with MinHash and LSH. In text preprocessing, one headache a data scientist has to deal with is duplicated or similar documents. (Nov 2, 2020)
Add API Gateway to AWS Lambda — a simple Python test case. This is an attempt to explore AWS Lambda. The final output is as below. (Aug 9, 2020)
User Sessions with Pyspark. Generally speaking, “sessions” are a very important concept for tracking user behavior. It is the idea of splitting the set of behaviors of… (Mar 31, 2019)
Installing Pyspark on Mac. Pyspark is short for the Spark Python API. I understand it as a Python library providing entry points to Spark functionality. (Mar 26, 2019)