I suggest that you practice listening, speaking, and reading all at the same time. Writing can be learned later.
If you use an app like Duolingo or Rosetta Stone, you will notice that each word is presented in a way that lets you understand its meaning without direct translation, then pronounced by a native speaker, then spoken by you. These actions all happen within a few seconds of each other, so I would not list them as separate steps.
You could try to train your ear by listening to music, TV, or movies in English (or whatever language you are interested in). Of course, doing this alone is not enough, because you would not understand what the words mean unless you spent many, many hours listening, the way a baby listens to its mother, father, and others before it begins speaking.
You can do it all at the same time. There is no reason to work on only one skill at a time.
Build a sentence, write it down, then say it out loud. You can start with yes/no questions at the beginning. Don't know how to do that? Read up on the grammar.