This was the subject of a popular discussion recently posted on Quora: 20 questions to detect a fake data scientist. We asked our own data scientist, and he came up with a very different set of questions: compare his answer (#1 below – 20 questions) with Quora replies (#2 and #3 below – 30 questions). Note that #2 focuses on statistics, and #3 on architecture. The link to the original Quora discussion is also provided in this article. Which questions would you add or remove?
Many other related interview questions and answers (data science, R, Python and so on) can be found here.
Answer from our data scientist (many of these questions are open questions):
- What is the life cycle of a data science project?
- How do you measure yield (over baseline) resulting from a new or refined algorithm or architecture?
- What is cross-validation? How do you do it right?
- Is it better to design robust or accurate algorithms?
- Have you written production code? Prototyped an algorithm? Created a proof of concept?
- What is the biggest data set you have worked with, in terms of training set size, and in terms of having your algorithm implemented in production mode to process billions of transactions per day / month / year?
- Name a few famous APIs (for instance Google search). How would you create one?
- How do you efficiently scrape web data, or collect tons of tweets?
- How do you optimize algorithms (parallel processing and/or a faster algorithm)? Provide examples of both.
- Examples of NoSQL architecture?
- How do you clean data?
- How do you define / select metrics? Have you designed and used compound metrics?
- Examples of bad and good visualizations?
- Have you been involved – as an adviser or architect – in the design of dashboards or alarm systems?
- How frequently must an algorithm be updated? What about lookup tables in real-time systems?
- Provide examples of machine-to-machine communication.
- Provide examples where you automated a repetitive analytical task.
- How do you assess the statistical significance of an insight?
- How do you turn unstructured data into structured data?
- How would you cluster 100 billion web pages very efficiently, for instance with a tagging or indexing algorithm?
- If you were interviewing a data scientist, what questions would you ask her?
Read the full article here: http://bit.ly/1Wp2qUI