Wenjing Zhan
Knowledge Distillation — Study Notes
Many people working in data science share some common pain points. One of them is scaling. You trained with some…
Nov 6, 2020

Wenjing Zhan
Data Preprocessing — Deduplication with MinHash and LSH
In text preprocessing, one headache a data scientist has to deal with is duplicated or similar documents.
Nov 2, 2020

Wenjing Zhan
Add API Gateway to AWS Lambda — a simple Python test case
This is an attempt to explore AWS Lambda. The final output is as below.
Aug 9, 2020

Wenjing Zhan
User Sessions with Pyspark
Generally speaking, “sessions” is a very important concept for tracking user behavior. It is the idea of splitting the set of behaviors of…
Mar 31, 2019

Wenjing Zhan
Installing Pyspark on Mac
PySpark is short for the Spark Python API. I understand it as a Python library providing entry points to Spark functionality.
Mar 26, 2019