
I recently passed the Cloudera Data Analyst exam. But let's not get ahead of ourselves.

After taking notes on the training videos and memorizing the core material with the enthusiasm of a young student, I focused on being able to perform moderately complex tasks smoothly, without having to browse the docs frantically. I don't consider browsing the manual or documentation inherently reprehensible; I had practical reasons for avoiding it. The virtual machine used in the exam is not abundant in resources (although Hue in one tab plus a terminal window ran smoothly), and it is quite easy to run out of time. Only the official Apache docs can be used in the exam, so it's worth taking the time to get to know them and the Sqoop manual: the exam should not be the first time you see them when you get into trouble. As an alternative source on the net, I found another candidate's preparation videos.
The exam is completely practice-oriented (see the example at the bottom of the page), so it is definitely recommended to download the Quickstart images (I used the Docker version; the skinny one ran happily on Ubuntu with 8 GB of RAM). The VM contains a retail_db database with a couple of tables; if you import them into Hive with Sqoop, you can start practicing right away (the root/cloudera username/password pair gives you access to the database).
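As a rough sketch of that import step, the snippet below only assembles and prints a Sqoop command line; it does not run it. The MySQL JDBC URL, the `localhost` host name, the `default` Hive database and the single mapper are assumptions about a typical Quickstart setup, not details confirmed above; only retail_db and the root/cloudera pair come from the text.

```python
import shlex


def build_sqoop_import(host="localhost", database="retail_db",
                       user="root", password="cloudera",
                       hive_database="default"):
    """Return the argv for importing every table of `database` into Hive.

    Host, JDBC driver and Hive database are assumptions; adjust to your VM.
    """
    return [
        "sqoop", "import-all-tables",
        "--connect", f"jdbc:mysql://{host}/{database}",
        "--username", user,
        "--password", password,       # fine for a throwaway VM; -P prompts instead
        "--hive-import",              # create and load matching Hive tables
        "--hive-database", hive_database,
        "--num-mappers", "1",         # keep the load light on a small VM
    ]


if __name__ == "__main__":
    # Print a copy-pasteable command for the VM's terminal.
    print(shlex.join(build_sqoop_import()))
```

Printing the command instead of executing it keeps the sketch usable on a machine without Sqoop; inside the VM the printed line can be pasted into the terminal.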
If you have done a thorough job, you will know the differences between HiveQL and Impala SQL, you can write CTAS (CREATE TABLE AS SELECT) statements with confidence, you can use the built-in functions, and you know when to use OVER (PARTITION BY ...).
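To make CTAS and OVER (PARTITION BY ...) concrete, here is a minimal sketch that uses Python's built-in SQLite (3.25 or newer for window functions) as a stand-in for Hive or Impala. The `orders` table and its columns are invented for illustration and are not the real retail_db schema. The statement has the same general shape in HiveQL and Impala, where a CTAS usually also names a storage format such as `STORED AS PARQUET`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical sample data, not the real retail_db schema.
    CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, total REAL);
    INSERT INTO orders VALUES
        (1, 10, 50.0), (2, 10, 120.0), (3, 11, 80.0), (4, 11, 30.0);

    -- CTAS: the new table takes its columns from the SELECT.
    -- The window function ranks each customer's orders by total.
    CREATE TABLE ranked_orders AS
    SELECT order_id, customer_id, total,
           RANK() OVER (PARTITION BY customer_id ORDER BY total DESC) AS rnk
    FROM orders;
""")

# Keep only the largest order per customer.
top_orders = conn.execute(
    "SELECT customer_id, order_id, total FROM ranked_orders "
    "WHERE rnk = 1 ORDER BY customer_id"
).fetchall()
print(top_orders)  # [(10, 2, 120.0), (11, 3, 80.0)]
```

A window function is the natural choice here because the rank is needed per row while the original rows are kept; a plain GROUP BY would collapse them.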
The first step is to pay the exam fee to Cloudera, after which they send you a welcome email. At one point you are redirected to the PSI site, a site specializing in online exams whose special feature is web design typical of the previous century. The next step was to choose the time of the exam and my time zone (the site politely warned me that although I could reschedule the exam, this is no longer possible in the last 24 hours before it). The PSI site has a compatibility test; for it to succeed, I had to install one of PSI's Chrome extensions.
Latency can be critical when connecting to a remote virtual machine, so if you do not trust the stars to align properly, you can take fate into your own hands and book a meeting room at Cloudera's Budapest headquarters, trusting that you can work there under optimal technical conditions.
The exam itself takes 120 minutes, but there are about 15 minutes of administrative work before it, so it is ideal to occupy the room half an hour before the start. In theory the schedule can slip, but in my case the delay was not significant.

When the time came, I put on my jacket, grabbed my umbrella and took the tram to the point marked by Google Maps. Even though being a little late everywhere is my trademark, I surprisingly managed to get there on time. Of course, this did not go entirely smoothly either: I had naively expected a two-meter 'Cloudera' sign, which I could not find anywhere. After some guidance, I learned that their headquarters were on the sixth and seventh floors of the Roosevelt office building, so I stumbled in. Having failed to blend in with the suit-clad gentlemen there in my tasteless brown faux leather jacket, I asked for a guest card for Cloudera. The elevator works with a card, so desperately pressing buttons will not get anyone up to the seventh floor. At the reception, I quickly made myself a guest sticker, which I proudly stuck to the chest of my jacket. After a few minutes my contact arrived, and after another couple of minutes they managed to find a meeting room and escorted me to it. In total, I was in the room after 5-10 minutes, so I could start rearranging it. I pulled down the shutters, moved a blackboard out, cleared everything I could off the table, took out my laptop and waited with sweaty palms to click the start exam button on the PSI interface.

After clicking the button, I got an interface, for now without a VM, where I was greeted in a chat window with a template text. Then I was asked to verify my identity, carry the laptop around the room, show the walls of the room thoroughly, show the surface of the table, and so on. Since I couldn't take pictures of the exam, the image above is just an illustration I found with a Google image search, but it's absolutely perfect for showing the interface. I was not allowed to sit with my back to the window, and I had to be clearly visible in the camera (the latter could be checked under View webcam & desktop in the top menu bar). Before starting, I had to show the system monitor/top.
The exam itself consisted of nine shorter tasks. Solving a task usually took only a few minutes; the difficulty lay in checking the result. A common requirement was that the output match the format of an existing table (file format, delimiters, column names), so I had to be able to check this reflexively. There is no partial credit on the tasks and they are checked automatically, so it's easy to slip on such “banana peels”.
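One way to make that format check a habit is to write down the reference table's properties and compare your own table against them. The sketch below is only an illustration: the dictionaries are filled in by hand with invented values, and the keys `file_format`, `field_delimiter` and `columns` are my own naming. In Hive the actual values can be read from `DESCRIBE FORMATTED <table>` or `SHOW CREATE TABLE <table>`.

```python
def format_mismatches(expected, actual):
    """Return human-readable differences between two hand-written table specs."""
    problems = []
    for key in ("file_format", "field_delimiter"):
        if expected[key] != actual[key]:
            problems.append(
                f"{key}: expected {expected[key]!r}, got {actual[key]!r}"
            )
    # Column names must match in both name and order.
    if expected["columns"] != actual["columns"]:
        problems.append(
            f"columns: expected {expected['columns']}, got {actual['columns']}"
        )
    return problems


# Hypothetical values, as they might be read off DESCRIBE FORMATTED output.
reference = {"file_format": "PARQUET", "field_delimiter": None,
             "columns": ["order_id", "customer_id", "total"]}
mine = {"file_format": "TEXTFILE", "field_delimiter": "\t",
        "columns": ["order_id", "customer_id", "total"]}

for problem in format_mismatches(reference, mine):
    print(problem)
```

With no partial credit, an empty list from a check like this is the signal to move on to the next task.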
I did not experience performance problems; most of the queries ran in 2-3 minutes. Twice my typing had no effect, and nothing showed up on the virtual machine despite my desperate keystrokes; both times the problem resolved itself after a couple of minutes. I also had a little trouble with the touchpad, but I can easily imagine that this was a local problem. After the exam was over, I received a template text saying that the results would arrive in 2-3 days. As it turned out, I received them surprisingly quickly: by the time I got back to the office after my long-overdue lunch, they had already notified me that I had passed.