I passed the AWS Certified Big Data Specialty exam on 20th January, 2020. I already hold the AWS Solutions Architect Associate certification, and comparing the two exams, I would rate the specialty considerably higher in difficulty and complexity. It took me more than two months to prepare (the last three weeks as full-time study). Your preparation time will vary with how well you already know Big Data tools.
In this post, I will share the details of the study material I found helpful for the preparation and some tips along the way. Let us get into the detailed sections one by one.
Section 1: AWS Recommended Material
The best way to start is to find out what AWS recommends you study and what it expects you to know before you sit the exam. I highly recommend going through all four items below. I personally completed the third and fourth items after finishing the self-paced courses discussed in Section 2.
The official Exam Guide.
The official Sample Questions. They give you a fair idea of what kind of questions to expect. You will find the answers at the end of the document.
The official Data Analytics Fundamentals training. It is a free course and you can access it with your AWS Training and Certification account.
The official Exam Readiness course. It is also a free course and you can access it with your AWS Training and Certification account.
Section 2: Self-Paced Courses
I started with A Cloud Guru’s course on Big Data Specialty. This course covers almost 80% of the exam content, and I believe they are working on updating it. I really liked the detailed explanations in the Redshift and DynamoDB sections.
I also took the second course on Udemy. This course is the most up-to-date course related to the exam topics. You also get a full practice test at the end.
During my preparation, I also found a third course on Linux Academy, although I did not complete it because my exam date was very close. The website offers a 7-day trial period, which you can use to finish the course (at the very least, solve the full practice test within the course, which I highly recommend).
I would strongly recommend doing at least two of these courses, as the teaching style differs in each and you will learn new things about the same topic. After all, the purpose of a certification is to improve your knowledge. If you are short on time and have to choose one, go with the second one.
Very Important -> Do create notes on each topic while you study. They come in really handy for revision at later stages, as there is quite a lot to remember.
Section 3: Whitepapers and re:Invent Videos
During the initial phase of my preparation, I took this section a little less seriously, but as I researched how others had prepared for this exam, I found that everyone emphasized reading the whitepapers and watching some of the re:Invent videos. I highly recommend going through the content listed below at least once!
A Deep Dive into What’s New with Amazon EMR video link
Effective Data Lakes: Challenges and Design Patterns video link
Deep Dive into Best Practices for Amazon Redshift video link
Big Data Analytics Architectural Patterns & Best Practices video link
High Performance Data Streaming with Amazon Kinesis: Best Practices video link
Amazon DynamoDB Deep Dive video link
Big Data Analytics options on AWS whitepaper link
Streaming Data Solutions on AWS using Amazon Kinesis whitepaper link
Migrating applications to AWS whitepaper link
Amazon EMR Best Practices whitepaper link
Enterprise Data Warehousing on AWS whitepaper link
Section 4: Hands-on Experience
In my experience, this is one of the most important things to do before you sit the exam. Create an AWS free tier account and practice the labs, no matter which course from Section 2 you are following.
The exam covers a wide range of topics and services, and each of them has many small but important features. If you skip the labs, there is a high chance that you will either mix up those key features or forget some of them entirely. Therefore, make sure you do the labs in the course. Remember to shut down the services once you have finished the tasks. Otherwise you may incur heavy costs, e.g. if you leave an EMR cluster running for hours. Doing all the labs usually costs around five to ten dollars.
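As a reminder of what "shutting down" can look like for EMR, here is a minimal sketch in Python. It assumes you have the boto3 library installed and AWS credentials configured. The helper name terminate_active_clusters and the list of states treated as "active" are my own choices for illustration, not something from the courses. It only reads the first page of results, which should be enough for a lab account with a handful of clusters.

```python
# Hypothetical helper: the function name and the "active" state list are
# assumptions made for this sketch.
ACTIVE_STATES = ["STARTING", "BOOTSTRAPPING", "RUNNING", "WAITING"]


def terminate_active_clusters(emr_client):
    """Terminate every EMR cluster that is still active and return their IDs.

    emr_client is expected to behave like boto3.client("emr").
    """
    # list_clusters returns a single page of results here; an account with
    # many clusters would need to follow the "Marker" value as well.
    response = emr_client.list_clusters(ClusterStates=ACTIVE_STATES)
    cluster_ids = [cluster["Id"] for cluster in response.get("Clusters", [])]

    # Only call terminate_job_flows when there is something to terminate.
    if cluster_ids:
        emr_client.terminate_job_flows(JobFlowIds=cluster_ids)
    return cluster_ids


# Usage (needs boto3 and configured AWS credentials):
#   import boto3
#   print("Terminated:", terminate_active_clusters(boto3.client("emr")))
```

Running something like this at the end of each study session is a cheap safety net, but it is still worth checking the billing dashboard, since other services (Redshift clusters, Kinesis streams) need their own cleanup.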
Section 5: Time Management and Answering Approach
The exam gives you 170 minutes to answer 65 questions. Most of the questions are scenario based and span three to six lines. Sometimes the answer options are also longer than a line. That breaks down to roughly 2 minutes and 37 seconds per question, which is not a lot of time to read a long question and find the right answer. Some tips:
Read the questions fast and note the key points, e.g. real-time, near real-time, cost-optimized.
While reading the answers, discard the most obviously wrong ones first. For example, if a question demands a real-time solution, you can discard the options that suggest a data pipeline with batch processing. In most cases, you can easily discard two options. The job is then to find the better of the remaining two. Read the question again, and this time look for keywords or hints that eliminate the third wrong option.
Do not forget to read the last line(s) of the question again. Sometimes a question describes, at length, a scenario about a combination of AWS services, but at the end it asks for a cost-optimized solution. In that case, you should choose the option that solves the problem at the lowest cost.
If you are stuck on a question, do not spend more time on it. If you have already spent two minutes on a question and still have no clue about the answer, mark it with the flag button for later review. Use the time to complete other questions first. Remember, time management is extremely important.
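To make the time budget concrete, here is a small Python sketch of the arithmetic behind the numbers above. The function names are made up for illustration, and the "review time" figure assumes the worst case where every question uses the full two-minute cap before you move on.

```python
EXAM_MINUTES = 170
QUESTIONS = 65
FIRST_PASS_CAP_MINUTES = 2  # the "flag it after two minutes" rule of thumb


def per_question_budget(total_minutes, questions):
    """Return the average time per question as (minutes, seconds)."""
    seconds_each = round(total_minutes * 60 / questions)
    return divmod(seconds_each, 60)


def review_minutes_left(total_minutes, questions, cap_minutes):
    """Minutes left for flagged questions if each question takes the cap."""
    return total_minutes - questions * cap_minutes


minutes, seconds = per_question_budget(EXAM_MINUTES, QUESTIONS)
print(f"Average budget: {minutes} min {seconds} s per question")

spare = review_minutes_left(EXAM_MINUTES, QUESTIONS, FIRST_PASS_CAP_MINUTES)
print(f"Time left for flagged questions: {spare} min")
```

With the exam's numbers this gives 2 minutes 37 seconds per question on average, and 40 minutes left over for flagged questions if you never exceed two minutes on the first pass.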
Section 6: Mock Tests and Practice Exams
Do solve practice exams to understand what type of questions can come up in the exam and how far along you are with your preparation. They will also give you an idea of the average time you take to understand and answer a question. I solved the practice exams available in the Udemy and Linux Academy courses (discussed in Section 2) and a free sample test on Whizlabs link, which is also good for practice (although the questions are very lengthy).
Section 7: Cheat Sheet on Most Important Services
Amazon Dynamo DB
Amazon Redshift and Redshift Spectrum
Amazon Elasticsearch Service
AWS Database Migration Service (DMS)
Amazon Machine Learning Service - AML (although it is deprecated, it is still valid for the exam)
AWS IoT Service
Security (Remember it’s job zero as recommended by AWS)
This is one of the most important sections. Do not take it lightly. Around 20% of the exam questions come from this section.
Section 8: Helpful Posts and FAQs
Do your research in finding more tips on social media and blogs. I found the below posts helpful for my preparation:
In addition, if you have time, go through the FAQs of each individual service. During preparation, you might come across many random questions, and the FAQs link can become a handy tool for you.
It is indeed a difficult exam. I say this not to scare anyone but to prepare and motivate you. If you are well prepared, you will definitely ace it. Hands-on experience is very important. Do not skip the labs, especially Redshift, DynamoDB, Kinesis and EMR. If you are already familiar with Big Data technologies, it will be easier for you to understand the landscape of Big Data technologies offered by AWS and how they connect together to build complex data lakes, data pipelines, data warehouses and data analytics solutions.
I hope it helps you in your preparation. If you have any questions, reach out. I wish you all the best for your exam.